Perspectives on Arabic Linguistics XIX
AMSTERDAM STUDIES IN THE THEORY AND HISTORY OF LINGUISTIC SCIENCE General Editor E.F.K. KOERNER (Zentrum für Allgemeine Sprachwissenschaft, Typologie und Universalienforschung, Berlin) Series IV – CURRENT ISSUES IN LINGUISTIC THEORY Advisory Editorial Board Lyle Campbell (Salt Lake City); Sheila Embleton (Toronto) Brian D. Joseph (Columbus, Ohio); John E. Joseph (Edinburgh) Manfred Krifka (Berlin); E. Wyn Roberts (Vancouver, B.C.) Joseph C. Salmons (Madison, Wis.); Hans-Jürgen Sasse (Köln)
Volume 289
Elabbas Benmamoun (ed.) Perspectives on Arabic Linguistics XIX Papers from the nineteenth annual symposium on Arabic Linguistics, Urbana, Illinois, April 2005
Perspectives on Arabic Linguistics XIX Papers from the nineteenth annual symposium on Arabic Linguistics, Urbana, Illinois, April 2005
Edited by
Elabbas Benmamoun
University of Illinois, Urbana-Champaign
JOHN BENJAMINS PUBLISHING COMPANY AMSTERDAM/PHILADELPHIA
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences — Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Perspectives on Arabic linguistics XIX : Papers from the nineteenth annual symposium on Arabic Linguistics, Urbana, Illinois, April 2005 / Edited by Elabbas Benmamoun. (Amsterdam studies in the theory and history of linguistic science. Series IV, Current issues in linguistic theory, ISSN 0304-0763 ; v. 289) Includes bibliographical references and index. ISBN 978 90 272 4804 6 (Hb; alk. paper) © 2007 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Co. • P.O.Box 36224 • 1020 ME Amsterdam • The Netherlands John Benjamins North America • P.O.Box 27519 • Philadelphia PA 19118-0519 • USA
CONTENTS

Acknowledgments

Foreword
Elabbas Benmamoun

Section I: Computational and Corpus Linguistics

Systematicity in the Arabic Mental Lexicon
Ilana Bromberg

Arabic PAPPI: A Principles and Parameters Parser
Sandiway Fong

Corpus-based Linguistic Analyses: Testing Intuitions about Arabic Structure and Use
Salem Ghazali

Learning Arabic Morphology Using Statistical Constraint-Satisfaction Models
Paul Rodrigues and Damir Ćavar

Learning to Use the Prague Arabic Dependency Treebank
Otakar Smrž, Petr Pajas, Zdeněk Žabokrtský, Jan Hajič, Jiří Mírovský, and Petr Němec

Section II: Phonology, Morphology, and Syntax

Intonational and Rhythmic Patterns across the Dialect Continuum
Salem Ghazali, Rym Hamdi and Khouloud Knis

Roots and Patterns in Arabic Lexical Processing
Abdessatar Mahfoudhi

Affrication in North Arabic Revisited
Eiman Mustafawi

The Syntax of Complex Tense in Moroccan Arabic
Hamid Ouali and Catherine Fortin

On Agree and Postcyclic Merge in Syntactic Derivations: First Conjunct Agreement in Standard Arabic
Usama Soltan

Section III: Sociolinguistics and Second Language Acquisition

Null Subjects Use by English and Spanish Learners of Arabic as an L2
Mohammad T. Alhawary

Linguistic Diversity: The Qaaf across Arabic Dialects
Maher Bahloul

Arabic Sociolinguistics and Cultural Diversity in Morocco
Moha Ennaji

The Gendered Use of Arabic and Other Languages in Morocco
Fatima Sadiqi

Index of Subjects
ACKNOWLEDGMENTS

The Nineteenth Annual Symposium on Arabic Linguistics was held at the University of Illinois at Urbana-Champaign in March 2005. The Symposium was sponsored by the Arabic Linguistic Society and the Department of Linguistics at the University of Illinois at Urbana-Champaign. Additional support was provided by a number of departments and centers at the University of Illinois including the Center for African Studies, the Center for Global Studies, the Program in South Asian and Middle Eastern Studies, the Center for Advanced Studies, and the Beckman Institute. I am indebted to all the reviewers for their help with the selection and editing of the papers that are included in this volume. I would also like to thank Hala Jawlakh and Bezza Ayalew for their assistance.
FOREWORD
The fourteen papers in this volume engage various issues in Arabic linguistics. The majority of the papers rely on quantitative methods to analyze data from corpora or data elicited from speakers using experimentally grounded methods. While most of the papers focus on Standard Arabic, some deal with spoken colloquial dialects from the Maghreb and the Gulf region. Section I includes five papers that deal with computational and corpus-based studies of Arabic. The topic of the paper by Bromberg is the relation between form and meaning. More precisely, the paper studies the correlation between the phonetic form of a word and its meaning in Arabic. Bromberg bases her study on the analysis of 1,000 words selected for their frequency from the Linguistic Data Consortium’s Agence France Presse corpus. The author’s aim is to see whether there is predictable similarity along the semantic and phonetic dimensions. She claims that to a certain extent such correlation exists. Then she explores the psycholinguistic implications of the study, particularly whether the observed systematic relation between form and meaning can facilitate acquisition. The paper by Fong describes the properties of PAPPI, a multilingual parser, as it is implemented to handle Arabic clause structure analyzed in the Principles and Parameters (P&P) framework. The author relies on assumptions and principles posited in the P&P framework to develop a parser that captures patterns that relate to clause structure, word order, agreement, placement of verbs, etc. There aren’t many parsers, in the public domain at least, that have been developed for Arabic using the P&P framework. Ghazali provides a large corpus study of the distribution of a number of Arabic words and grammatical particles. By looking at the collocation and colligation patterns, the author is able to demonstrate that some nearly synonymous words have different distributions depending on the words and expressions they co-occur with.
This finding, based on extensive corpus investigations, shows the limits of current studies of the Arabic lexicon based on limited dictionary definitions. The results will be beneficial to researchers and teachers of Arabic alike.
The paper by Rodrigues & Ćavar discusses a machine learning model of Arabic morphology. Assuming a root-based system for Arabic morphology, they developed an unsupervised constraint-based, statistical learning model that does not rely on the use of a dictionary, as previous models have done. The success rate of the model in learning the root system and deciding whether a consonant is part of the root system is 75%. The authors deal mainly with roots that contain three radicals. Smrž, Pajas, Žabokrtský, Hajič, Mírovský, & Němec outline the data structures of the Prague Arabic Dependency Treebank (PADT), available from the Linguistic Data Consortium (LDC). The corpus contains annotated newswire Modern Standard Arabic texts based on the Arabic Gigaword (LDC). They illustrate the working of the corpus and its annotated data by focusing on searching for instances of Arabic improper annexation (part of the Construct State pattern), which involves a complex semantic relation between its members. Section II includes papers on Arabic phonology, morphology, and syntax. The paper by Ghazali, Hamdi & Knis takes up the issues of phonological and prosodic differences between various dialects from the western and eastern regions of the Arab world. For example, experimental studies of intonation patterns in Egyptian Arabic, Syrian Arabic, Iraqi Arabic, Moroccan Arabic, and Tunisian Arabic reveal differences between the dialects. Thus, while the eastern dialects exhibit the declination phenomenon, Moroccan Arabic does not seem to display the same pattern. On the other hand, Iraqi Arabic displays a falling pattern (HL) and seems to be almost unique. These differences are, of course, on top of other differences such as vowel duration. Focusing on one Eastern dialect, Qatari Arabic, Mustafawi discusses affrication of the voiced velar stop. Departing from previous studies, she argues that affrication is confined to contexts where the voiced velar stop is adjacent to high front vowels.
One critical factor in the analysis concerns the fact that the affrication process is limited to the stem. Another interesting fact is that the process applies only within restricted paradigms. For example, it does not apply to broken plurals, verbs, and participles. The analysis is framed within the Optimality Theoretic framework and the author proposes a number of rankable constraints to derive the observed patterns in Qatari phonology.
The role of the root in Arabic morphology is explored by Mahfoudhi from a psycholinguistic perspective. He revisits the longstanding debate regarding the nature of lexical relations in Arabic, whether they are root-based or stem-based. He also explores the role of the patterns in anchoring those relations. Based on experiments he comes to the conclusion that the root has a priming effect. This effect is not dependent on any shared orthography or meaning. The same claim cannot be made about the patterns (both sound and weak), which did not display the same priming effects. The results of the study are in line with recent studies that have argued for the psychological reality of the root with regard to the organization of the Arabic lexicon. The question that remains to be addressed is how these results can be reconciled with other studies that have shown that stems seem to play a role in establishing lexical relations. Ouali & Fortin add to the debate about clause structure in Arabic. The authors focus specifically on complex tenses in Moroccan Arabic and provide an analysis that is in the spirit of the P&P framework and its minimalist incarnation. They discuss the dependency relations that exist between tense and aspect in Moroccan Arabic, which they derive through a selectional relation between the two heads (and their projections). They also put forward an analysis whereby particles that elsewhere are analyzed as aspectual markers (such as the ka/ta morpheme) are claimed to realize the present tense. One innovative idea they advance is that complex tenses are biclausal, contrary to what has been argued before. Also working within the minimalist version of the P&P framework, Soltan deals with the ongoing debate about subject agreement and coordination in Arabic. The paper focuses mainly on Standard Arabic and tries to account for why agreement is with the first conjunct when the latter is in the postverbal position, i.e., in the VS order.
This has been a matter of concern for a number of students of Arabic. The question is why agreement in the SV pattern is with the whole conjunct. To derive the phenomenon of first conjunct agreement, the author argues that a simple analysis deploying the operation Agree, as understood in recent minimalist studies, can explain the facts if one assumes that in coordination the second conjunct is in fact an adjunct that is essentially not present at the point where the agreement relation is computed. This ensures first conjunct agreement. With regard to the
SV order, full agreement follows from the assumption that the real subject is a pronominal that is related to the preverbal conjunct, an assumption that, while not uncontroversial, has been advanced and adopted by a number of students of Arabic. Section III includes papers in Arabic sociolinguistics and acquisition of Arabic as a second language. Bahloul’s paper provides an analysis of the distribution of the phoneme /q/ and its many variants in eighteen Arabic dialects from the Maghreb to the Gulf. The data was collected based on questionnaires given to native speakers of the relevant dialects. In addition to the well-known variants of /q/, namely the voiced velar /g/ and the glottal stop, some eastern dialects of Arabic (but also some pockets of North Africa) display other variants such as /k/. Bahloul shows that within the same country and/or region two variants may have a distribution according to the urban/non-urban split. For example, in Syria the glottal stop variant is found in the major cities, such as Damascus, Hama, and Homs, while /g/ is found in the rural areas. In fact, this split seems to hold across the eastern Mediterranean region, including lower Egypt. The author argues that the distribution of /q/ and its many variants leads to a linguistic map that divides the whole region into five main areas. Of course it remains to be seen how the distribution of /q/ and its variants lines up with other linguistic features that distinguish Arabic dialects from each other. The issue of multilingualism in contexts where Arabic is spoken as a native language is the topic of Ennaji’s paper. The paper discusses the complexity of the linguistic situation in Morocco, where five languages compete for space: Moroccan Arabic, Classical Arabic, Standard Arabic, Berber, and French (and one can add English as well).
Ennaji uses the term quadriglossia to refer to the situation with Arabic, echoing Ferguson who uses the term diglossia (for colloquial and standard Arabic) and Youssi who uses the term triglossia (which adds educated/middle Arabic to the mix). The four varieties according to the author include Classical Arabic, Standard Arabic, Educated Spoken Arabic and Moroccan Arabic. Given this complex linguistic space it is inevitable that competition would exist between the different media. Ennaji traces the history of the linguistic situation in Morocco and the cultural, social, and political dimensions and the factors that have played a role in marginalizing or strengthening a particular language or variety at the expense of the others. For example, Berber, the original
language of the country and the region, has not been given the space that is commensurate with its history and demographic weight. However, Berber has recently been introduced in schools in Morocco and there seems to be a strong push to give it a more prominent role. The author provides a succinct but informative overview of the different facets of the debate and its history. Staying with the same linguistic context of Morocco, Sadiqi turns her attention to language and gender in the country. She focuses on the interplay between Arabic, French and Berber. Given the colonial history of the country, its high illiteracy rate, particularly among women, and its ethnic make-up, language naturally reflects how these issues relate to gender and how language use evolves with the changing position of women in the Moroccan society. Thus, according to the author, Standard Arabic used to be predominantly a male language, partly due to the fact that it is closely associated with the Islamic faith whose leadership and public figures used to be exclusively males. However, this situation has started to change with the increasing prominence of women who write in Standard Arabic and use it to engage in religious debates. Such use of Standard Arabic is seen as a form of empowerment for women who have reclaimed Standard Arabic as a vehicle for their own discourse. With respect to Berber, the author argues that women played an important role in maintaining the language and, with it, Berber identity. The author goes as far as to put forward the thesis that the fate of Berber parallels the fate of women in Morocco. More rights for women in Morocco have also been accompanied by more rights for Berber. On the other hand, Moroccan Arabic is less associated with a specific gender but French is used by women to reflect their social prestige while males use it to assert their economic and political positions. 
Though the issues are complex given the many factors at play in a society where gender issues play a major role in the social system, it is to be expected that language would reflect those dynamics. Acquisition of Arabic as a second language is the main topic of Alhawary’s paper. The paper focuses particularly on the status of the null subject (sentences without overt subjects) in the language of Arabic learners whose first language is Spanish or English. Spanish is a null subject language while English is not. Moreover, like Arabic, Spanish drops the subject because it can be retrieved from agreement
on the verb. Thus, there is a tight connection between the presence of null subjects and agreement. The paper seeks to investigate the distribution of the null subject in the production of Arabic data by native Spanish and American English speakers, and the issue of transfer, i.e., whether native language use of the subjects carries over to Arabic, and the connection between the development of the null subject and agreement inflection in their Arabic data. The subjects of the study are all college students with no prior exposure to Arabic in high school or at home. One interesting finding of the experimental study is the early presence of null subjects in the Arabic data of native English speakers relative to native Spanish speakers, which is surprising since the latter have null subjects in their own native language. Moreover, English speakers also acquired the use of agreement inflection early, which shows the link between null subjects and agreement. The Spanish speakers did “improve” eventually but the differences in the production data for beginners are quite striking.

Elabbas Benmamoun
University of Illinois, Urbana
September 2007
Section I
Computational and Corpus Linguistics
SYSTEMATICITY IN THE ARABIC MENTAL LEXICON
Ilana Bromberg The Ohio State University
1. Introduction

1.1 L’Arbitraire du signe
The relationship between the form of a word and its meaning is arbitrary. This concept, first espoused by Ferdinand de Saussure (1916), states that there is nothing inherent about a worldly object or event that links it to the name it is given in any particular language. Nor is there any inherent property of the sounds of a word that causes it to be linked to a specific meaning. It is this property of languages that allows humans to create new names for new ideas and objects, and it is the reason that a single object has different names in different languages or many names in a single language.

1.2 Deviations from Saussure
However, there are phenomena in natural language that cause one to question the complete arbitrariness of this relationship. For instance, most languages have some words that can be classified as onomatopoetic, in that they are meant to imitate the sound that they represent. In English, some onomatopoetic words are “drip”, “splash”, “bang”, and “beep”. Similarly, the words we attribute to animal noises are, in a way, onomatopoetic, and many of these words sound similar across languages. In these cases, there is an iconic relationship between form and meaning. A non-iconic but systematic relationship between form and meaning exists among sounds identified as phonaesthemes. Studied at length by Margaret Magnus (2000), phonaesthemes are sounds that
recur in words that fall within the same semantic field. For instance, the words “glimmer”, “glow”, and “gleam” all begin with the [gl] phonetic cluster, and semantically, they all appeal to the notion of light. Statistical analysis has shown that this cluster, like other such clusters, appears significantly more often in words having to do with light or vision than in words that do not. Furthermore, a study by Benjamin Bergen (2004) has shown that these groupings are a part of the linguistic repertoire that can be accessed productively by a native speaker. A speaker will be likely to use such a salient cluster in defining a meaning for a new word or in creating a new word for a given meaning. Given these deviations from Saussure’s hypothesis, what can we say about the arbitrariness of the relationship between form and meaning over the whole lexicon? Do the effects of onomatopoeia and phonaesthemes create a significant, perceptible correlation if we look at all words in the lexicon or are these simply isolated examples of lexically limited phenomena? This study shows that there is indeed a systematic relationship between form and meaning over a subset of the Arabic lexicon, specifically a set of one thousand highly frequent, non-morphologically related words. Pairs of words are compared for both semantic and phonetic similarity, in a manner similar to that pioneered in Shillcock et al. (forthcoming). The process for determining semantic similarities is described in section 2, and for phonetic similarities in section 3. Section 4 shows the results of the comparison and includes a discussion of the possible psycholinguistic correlates to this research.

2. Determining Semantic Similarity

2.1 Meaning from context
“You shall know a word by the company it keeps” (Firth 1957). In the spirit of this well-known claim, the semantic similarity of two words shall be, in this research, defined by the number of words they have in common among their various contexts.
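The definition just given can be illustrated with a minimal sketch. It is not the procedure used in this study: the function names, the window size, and the toy English token list are all assumptions made for illustration. It simply collects the words found near each target word and reports which of them the two targets share.

```python
from collections import Counter

def context_counts(tokens, target, window=2):
    """Count the words seen within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            start = max(0, i - window)
            # Words before and after the target, excluding the target itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

def shared_context(tokens, word_a, word_b, window=2):
    """Return the set of context words that the two targets have in common."""
    a = context_counts(tokens, word_a, window)
    b = context_counts(tokens, word_b, window)
    return set(a) & set(b)

# Toy, made-up token list (hypothetical data, not taken from any corpus).
toy = ("heavy rain brought flood water while the storm brought "
       "flood water and wind").split()

common = shared_context(toy, "rain", "storm")
print(sorted(common), len(common))
```

The size of the returned set plays the role of "the number of words they have in common"; a larger overlap would indicate greater semantic similarity under this definition.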
For example, if one were to search for the words “rain” and “storm” in a corpus of written English text, one would be likely to find the following words nearby each of them: “water”, “flood”, “weather”, “temperature”, “snow”, and “wind”. Because “rain” and “storm” occur in similar contexts, they are considered semantically similar. This definition of semantic similarity
is often used in the natural language processing literature (Landauer 1997, Rohde et al. forthcoming). In order to measure the similarity between two words, one measures the degree of similarity between the cumulative contexts of each of the target words. The “context” can be defined as, for example, several words surrounding the target word, the paragraph in which it is found, or the entire document in which it is found. In this way, words can be divided into groups comprising different semantic fields (Cutting et al. 1992). This definition of semantic similarity has some limitations. In general, syntax and morphology do not enter into the description. One could initially tag words for part of speech or some other feature, but this will not naturally fall out of the semantic groupings. Along the same lines, the rating of similarity between two words will say nothing about the actual relationship between the words, i.e., whether they are synonyms or antonyms, have a part-whole relationship, etc. Furthermore, the semantic fields of the words will not be determined automatically. However, if the particular algorithm in use has worked correctly, a human should be able to categorize any of the automatically grouped words into a semantic field fairly easily. These limitations do not affect the outcome of this research, as I am currently only interested in the semantic similarities between pairs of words, not their relationship or semantic field.

2.2 Corpus analysis
The first step in determining semantic similarity is to obtain a corpus. I have used in this study the Linguistic Data Consortium’s distribution of the Arabic Gigaword Corpus, a resource comprising four Arabic newspapers. I make use of only one of these newspapers, specifically Agence France Presse, which includes over 100 million words.
The newspaper is written in Modern Standard Arabic, and the documentation states that there may be some regional dialect disparity due to the international scope of the services. The next step is to process the corpus to extract only the information that is needed, namely, each word and the position in which it occurs. To do this, I use the Buckwalter Morphological Analyzer, henceforth BMA (Buckwalter 2002). The job of the BMA is to produce a complete morphological analysis for each non-vocalized Arabic word shown to it. It does this by first splitting the orthographical form into
every numerically possible combination of prefix, stem, and suffix, with different vocalizations. By numerically possible, I mean that a prefix may have from 0 to 4 letters, a stem from 1 to any number of letters, etc. Each combination of morphemes that results in an actual Arabic word, as determined by combinations dictionaries included in the BMA, is output as a possible solution to the given input word. A solution includes vocalization, morphological decomposition, part of speech, gloss, and lemma. The BMA may produce many solutions for a given input. No formal part of speech analysis is done, i.e., there is no analysis of the syntax of the sentence; therefore, there is no ranking of the given solutions. Rather than introduce the additional step of part of speech tagging, I assume a priori that each of the BMA solutions has equal probability of being the correct solution. Thus, using BMA, I collect all possible solutions for each position in the corpus, recording both the numerical corpus position and the probability of each solution being the correct one. At this point, the target and context lists are produced. The target list includes those words that will be analyzed for phonetic and semantic similarity, and the context list contains the words that will define the semantic relatedness of the target words. Only nouns, verbs, adjectives, and adverbs are included in each list. Furthermore, each word in the lists is derived from a different root, as defined by the BMA, which includes root information in the stem dictionary. In Arabic, if two words are derived from the same root, then they are morphologically related and thus automatically have similarities in both form and meaning. The goal of the study, then, must be to see if the form-meaning relationship exists between words derived from different roots.¹ The target list comprises the one thousand most frequent words in the corpus, taking into account the probabilities discussed in the previous paragraph.
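The probability-weighted frequency count described above might be sketched as follows. This is an illustration only: the function names and the input layout (one list of candidate lemmas per corpus position) are assumptions, the lemma labels are invented placeholders, and the part-of-speech and distinct-root filters are omitted. Each of the k candidate solutions at a position contributes 1/k to the frequency of its lemma.

```python
from collections import defaultdict

def weighted_frequencies(analyses_per_position):
    """Sum, per lemma, the probability mass it receives across the corpus.

    `analyses_per_position` holds one entry per corpus position; each entry
    is the list of candidate lemmas returned for the word at that position.
    Every candidate at a position gets the same probability, 1/k.
    """
    freq = defaultdict(float)
    for candidates in analyses_per_position:
        if not candidates:
            continue  # no analysis available at this position
        share = 1.0 / len(candidates)
        for lemma in candidates:
            freq[lemma] += share
    return dict(freq)

def most_frequent(freq, n):
    """Return the n lemmas with the highest weighted frequency (ties broken by name)."""
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lemma for lemma, _ in ranked[:n]]

# Hypothetical analyses for a five-word corpus; "A".."D" stand in for lemmas.
positions = [["A"], ["A", "B"], ["C"], ["B", "D", "A"], ["C"]]

freq = weighted_frequencies(positions)
print(most_frequent(freq, 2))
```

In this toy input, lemma "A" accumulates 1 + 1/2 + 1/3, so an ambiguous word form still counts toward each of its readings, only with reduced weight.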
The context list includes the next six hundred most frequent words in the corpus. A series of empty semantic vectors is created, such that each vector represents a target word defined by the same six hundred context words. This may be visualized as a matrix in which there are one thousand rows, one for each target word, and six hundred columns, each representing a context word. If context word i is found within the vicinity of target word j, then cell [j,i] is incremented by one.

¹ Note that this study does not directly assess the relationship of form and meaning between roots per se, as the distribution of one word derived from a specific root will not reflect the total distribution of all words derived from that root.

To fill in these semantic vectors, I step through the corpus in its morphologically analyzed form to look for target words and corresponding context words. Each time a target word is found (in the exact form in which it is found in the target list), I look at a window of ten words before and ten words after that target word. Within this window, I look for any instance of a context word and increment the target word’s semantic vector accordingly. A window of ten words on each side of the target word is used for two reasons: 1) This method is simpler, practically and intuitively, than using a window consisting of the entire document, and 2) this method is the one used in the research that this particular project is aiming to reproduce, and thus it is desirable that the results be as comparable as possible. As the process progresses, the one thousand semantic vectors are simultaneously filled in with values reflecting occurrences of the same six hundred context words. When this process has been completed, two calculations take place: the first is a smoothing calculation, the second, spatial comparison of the vectors. Logarithmic smoothing of the context word counts is performed in order to emphasize the first few appearances of a context word over the repeated appearance of the same word. In other words, the first appearance of a context word in the vicinity of a target word is more important than any subsequent appearance. A smoothing calculation of 1+log(count) accomplishes this.

2.3 Vector analysis
Vector comparison is done through the calculation of cosine distance, as is standard in computational literature for comparing the similarity of numeric vectors. The cosine of two vectors is given by:

$$\cos(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

(Manning & Schütze 1999)

where x and y are vectors, n is the number of elements in the vector, and x_i and y_i are elements in x and y, respectively. In order to interpret
the resulting figure as a distance measure, I subtract the result from 1. Thus, two vectors with a distance of 1 are considered maximally far apart (semantically dissimilar), whereas two vectors with a distance close to 0 are considered close together (semantically similar). Each of the one thousand semantic vectors is compared to every other vector, resulting in one million distance measures. Since distance(A,B) = distance(B,A), half of the measures are discarded. Also, we know that distance(A,A) = 0, and since including one thousand measures of 0 in the form-meaning comparison would skew the results, these comparisons are also left out. I am thus left with 499,500 semantic distance measures between the one thousand target words, ranging between just above 0 for pairs of very similar words, to about .6 for pairs of semantically dissimilar words.

3. Determining Phonetic Similarity

3.1 Feature analysis
The phonetic comparison of the target words is undertaken through a phoneme-by-phoneme comparison, which, in Arabic, basically amounts to a letter-by-letter comparison. The phonetic distances between isolated Arabic sounds are calculated, and then the words are compared as strings of phonemes using Levenshtein distance. The letters of the Arabic script are essentially phonemic; that is, each letter stands for exactly one sound. The one major exception to this statement is the process of pharyngealization, through which pharyngeal, or emphatic, consonants tend to cause certain other consonants or vowels within the same word to also be pronounced as pharyngeal. I have not as yet taken this process into account; however, planned future work on this research most certainly will. Each phoneme is characterized by a 28-feature phonetic vector. The features in each vector are typical descriptive features such as consonantal, voiced, strident, lateral, etc.
The initial values are taken from Bruce Hayes’ FeaturePad, which includes values for each feature for every English phoneme. Extra vectors were added to cover the Arabic phoneme set, as sounds such as the emphatic consonants were not included in the initial set. The values for these sounds were collected by gathering pronunciation information from a study by El-Imam (2001), descriptive information from the IPA chart, and articulatory information from an elementary Arabic textbook (Brustad 2001), such that I
was able to encode the features of all of the Arabic sounds in a manner analogous to the sounds already encoded. As such, each phonetic vector has 28 features, each with a value of -1, 0 (feature does not apply), or 1, and no two vectors are alike. With the information encoded in this manner, the phonetic distance between the vectors can be calculated using Manhattan distance, as suggested by Nerbonne & Heeringa (1997). Two sounds with very different feature values, such as /a/ and /z/, have a large distance between them, whereas similar phonemes such as /m/ and /n/ have short distances separating them. These distances come into play in the word-to-word distances, the calculation of which is described next. A second reference for describing Arabic phonemes in terms of features originates from the work of Sami Boudelaa and his colleagues. In this case, each phoneme is described by only 15 features, each feature having a value of either 1 or 0. The features used are similar to those in the set described above; however, the smaller number of features leads to less redundancy in the data, which is desirable. Manhattan distances for the phonemes described in this manner were also calculated, and I will describe the results based on each of these phoneme description sets separately in section 4.1.

3.2 Word comparison
The phonetic makeup of a word is defined by the individual phonemes that comprise that word. Thus, we can refer to a word as a phoneme string. Phoneme strings can be compared for similarities by simply comparing how many phonemes they have in common. A more sophisticated comparison takes into account the positions in which the phonemes are found, as well as differing degrees of similarity between the phonemes themselves. Compare, for instance, the following pairs of words:
a. plus - plush
b. corn - born
c. plug - gulp
Pair (a) is very similar; they differ by only one sound, /s/ vs. /ʃ/, and in fact, these two sounds are themselves very similar, so very little phonetic difference exists between the two words. Pair (b) might be
considered slightly less similar than the previous pair. While they only differ by one phoneme, the difference between /k/ and /b/ is greater than the difference between /s/ and /ʃ/, at least in terms of the features discussed above, and perhaps intuitively as well. The third pair of words share all of the same phonemes; however, since they are arranged differently, the phonetic similarity between the two phoneme strings is reduced.

An algorithm known as Levenshtein distance is used to calculate the distances between phoneme strings by taking into account the phonemes present in each string, the order in which those phonemes appear, and the relative phonetic distances between the phonemes themselves, as calculated above (Sankoff & Kruskal 1999). The algorithm works by transforming a “source” word into an “object” word through a series of insertions, deletions, and substitutions. Each of these operations changes exactly one phoneme of the source word to make it more like the object word. For instance, we can transform the source word “pant” into the object word “arts” by performing one of each operation:

p  a  n  t
p  a  n  t  s    Insertion of s
   a  n  t  s    Deletion of p
   a  r  t  s    Substitution of n by r
Table 1.
Furthermore, a cost is assigned to each of these operations, so that the overall transformation can be compared to other transformations. Most importantly, the cost of a substitution is equal to the distance determined by the feature comparison previously described. Thus, changing an /a/ to a /z/ will cost more than changing an /m/ to an /n/. Insertions and deletions all have the same cost, and the only requirement on that cost is that a deletion and an insertion together must always cost more than a substitution, such that a substitution will be preferred by the algorithm.

In terms of phoneme order, the Levenshtein distance algorithm is exhaustive: it calculates the cost of transformation for every alignment of the two words (without rearranging phonemes within the word). As such, the cost of each of the alignments in Table 2 is calculated:
Table 2. Alignments of “pant” and “arts” (each alignment offsets one word against the other; phonemes within a word are not rearranged).
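As a rough illustration of the procedure described above, the following Python sketch combines a Manhattan distance over feature vectors with a Levenshtein-style dynamic program. It is not the implementation used in this study: the six features and their values in FEATURES are invented for the example (the study uses 28 or 15 features per phoneme), and the names manhattan, substitution_cost and phonetic_distance, as well as the way INDEL_COST is chosen, are assumptions. The one constraint taken from the text is that a deletion plus an insertion must cost more than any substitution.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

# Invented toy feature vectors, for illustration only. Feature order:
# (consonantal, voiced, sonorant, nasal, labial, continuant);
# 1 = present, -1 = absent, 0 = feature does not apply.
FEATURES: Dict[str, Tuple[int, ...]] = {
    "p": (1, -1, -1, -1, 1, -1),
    "t": (1, -1, -1, -1, -1, -1),
    "s": (1, -1, -1, -1, -1, 1),
    "n": (1, 1, 1, 1, -1, -1),
    "m": (1, 1, 1, 1, 1, -1),
    "r": (1, 1, 1, -1, -1, 1),
    "a": (-1, 1, 1, -1, 0, 1),
}


def manhattan(u: Sequence[int], v: Sequence[int]) -> int:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def substitution_cost(a: str, b: str) -> int:
    """Cost of replacing phoneme a by phoneme b (0 if they are identical)."""
    return manhattan(FEATURES[a], FEATURES[b])


# Assumption: one simple way to guarantee that a deletion plus an insertion
# always costs more than the most expensive substitution.
MAX_SUBSTITUTION = max(substitution_cost(a, b) for a, b in combinations(FEATURES, 2))
INDEL_COST = MAX_SUBSTITUTION / 2 + 1


def phonetic_distance(source: Sequence[str], target: Sequence[str]) -> float:
    """Least total cost of turning source into target, phoneme by phoneme."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * INDEL_COST  # delete every source phoneme so far
    for j in range(1, cols):
        dist[0][j] = j * INDEL_COST  # insert every target phoneme so far
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + INDEL_COST,  # deletion
                dist[i][j - 1] + INDEL_COST,  # insertion
                dist[i - 1][j - 1] + substitution_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]
```

With these toy values, phonetic_distance("pant", "arts") is reached by the three operations of Table 1 (one insertion, one deletion, and the substitution of n by r), since replacing all four phonemes position by position would cost more.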
The least cost alignment is always chosen as representative of the phonetic distance between the pair. Thus, the phonetic distance between the words in pair (a) above will be equal to the cost of substituting /s/ with /ʃ/, as this is the only operation necessary to transform “plus” into “plush”. Similarly, only one substitution is necessary to change “corn” into “born”, but since the distance between /k/ and /b/ is greater than the distance between /s/ and /ʃ/, the cost of the transformation of pair (b) will be greater than that of pair (a). As for pair (c), the following operations will take place to complete the transformation:

p  l  u  g
   l  u  g       Deletion of p
   g  u  g       Substitution of l by g
   g  u  l       Substitution of g by l
   g  u  l  p    Insertion of p
Table 3.
Even though the two words contain the same phonemes, the words will be considered phonetically more distant than the previous two
pairs. Levenshtein distance does not allow the phonemes within the word to be transposed, so each of these steps is necessary to transform “plug” into “gulp”. The cost of transformation of pair (c) will be greater than that of pair (a) or (b) due to the additional operations.

In this way, every pair of words in the target list is compared phonetically. For the same 499,000 pairs compared semantically in section 2, a phonetic distance score between 0 and 124.5 is calculated. Again, a word is not compared to itself, so the lowest score is just above 0 for two words that are maximally similar and about 124.5 for two words that are phonetically maximally distant.2

4. Results and Discussion
4.1 Arabic is systematic
Having calculated the semantic and phonetic similarities between the same group of high frequency Arabic words, the final step is to calculate how these two measures compare. As can be seen in Figure 1, there is a positive correlation between the two measures. That is, as the semantic similarities between words increase, so does the phonetic similarity. Alternatively, as phonetic similarities increase, so do semantic similarities.

Note that the correlation between the two measures is small; this is expected. A large correlation might lead us to believe that there existed something very close to a sound-meaning correspondence as described in the introduction. As it stands, we find only a small correlation, but a significant one.

For a data set this large, the test for statistical significance falls outside the efficacy of the χ-squared or other typical significance tests. Instead, I use a method developed by Shillcock et al. (forthcoming) and described in Cohen (1995) as appropriate for this type of data. I begin with my original matrix of semantic distance scores, one score for each pair of words. I then take my phonetic distance matrix and randomize the scores representing the phonetic distance between pairs.
Next, I calculate the correlation between the original semantic matrix and randomized phonetic matrix. After completing this calculation a number of times with differently randomized phonetic matrices, I create a curve representing the likely outcomes of these randomized experiments. Placing the actual experimental results on this curve shows that the reported correlation of actual data is statistically significant, in that it is a very distant outlier on the curve of randomized results. The correlation coefficient of 0.1418 is more than 100 standard deviations away from the average randomized score.

2 This figure is not a predetermined maximum; it is simply the largest distance between two words in the current data set.
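The randomization procedure just described can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used in the study: the two distance matrices are assumed to have been flattened into equal-length lists of pair scores, the correlation is assumed to be Pearson's r, and the names pearson and randomization_test, the default number of shuffles and the seed are all hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def randomization_test(
    semantic: Sequence[float],
    phonetic: Sequence[float],
    n_shuffles: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return the observed correlation and its distance, in standard
    deviations, from the mean correlation of the shuffled data."""
    rng = random.Random(seed)
    observed = pearson(semantic, phonetic)
    shuffled: List[float] = list(phonetic)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any pairing between the two measures
        null_scores.append(pearson(semantic, shuffled))
    null_mean = statistics.fmean(null_scores)
    null_sd = statistics.stdev(null_scores)
    return observed, (observed - null_mean) / null_sd
```

The second returned value corresponds to the kind of figure reported above, i.e. how many standard deviations the observed correlation lies from the average randomized score.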
Figure 1. Semantic Distances related to Phonetic Distances based on a 28-phonetic feature paradigm. 1000 random data points. Slope of correlation = 18.85, coefficient of correlation = 0.1418
A second test shows that the relationship between form and meaning remains when the phonetic similarities are judged using less redundant feature specification data. The results in Figure 2 were obtained by testing the same semantic similarity data against the phonetic similarity data derived from the information in Boudelaa’s phonetic feature matrix. Again, the correlation coefficient is small but statistically significant.
4.2 Is systematicity specific to Arabic?
Thus, the hypothesis that a form-meaning correlation exists has been proven correct. Is this phenomenon specific to Arabic? In fact, similar results were found by researchers conducting a similar study in English. However, some may say that Arabic is predisposed to such systematicity on a morphological level. This study ruled out the most well-known kind of morphological similarities in Arabic, those being the similarities that arise in words derived from the same root. However, a hypothesis espoused by Georges Bohas (1997) states that every Arabic root is itself derived from a more abstract form, what he calls an étymon. An étymon is an unordered biconsonantal pair that comprises some semantic field, such that any three- or four-letter root derived from this étymon has meanings associated with that semantic field. If this is true, and Bohas makes a very convincing case, then the form-meaning correlation in Arabic is indeed built into its morphology.
Figure 2. Semantic Distances related to Phonetic Distances based on a 15-phonetic feature paradigm. 1000 random data points. Slope of correlation = 6.1768, coefficient of correlation = 0.1419
4.3 Psycholinguistic correlates
Aside from morphological interest, what does this correlation, or systematicity, mean? One main point of interest is whether this systematicity is actually encoded in the brain. If we could test for this systematicity in the mental lexicon, would we find that it is a productive aspect of natural language? If so, what else might we find?

Sound-meaning systematicity, if it exists in the mental lexicon, may aid in retrieval of two kinds. First, upon hearing a new or rarely used word, a listener may be able to utilize the general form of the word to relate it to other, more familiar, words that fall into the same form-meaning grouping, if one exists. This might let the hearer interpret the unknown word’s general connotation, if not its denotation. I hypothesize that this would be a fallback retrieval method for words which the hearer cannot analyze successfully using morphology, etymology, or context. Or, perhaps systematicity works along with, rather than subsequent to, these analyses.

Systematicity may also be an aid in the opposite type of retrieval; in the “tip-of-the-tongue” phenomenon, a speaker has a particular meaning in mind but cannot come up with the appropriate word right away, even if it is a word that the speaker knows and has previously used. In this case, the speaker might appeal to systematicity to retrieve words that are akin to the sought out word in meaning and also in form, thereby leading to a simultaneous phonological and semantic priming (e.g., Boudelaa 2004).

Along these lines, one might think about systematicity as an aid to first language learning. A child may appeal to form-meaning correspondences when learning names for new objects in her world or new concepts. If these hypotheses are correct, then systematicity must be a built-in characteristic of language, one that exists to aid the learner, the hearer, and the speaker. We would then expect to find a very similar form-meaning correlation existing cross-linguistically. Or, perhaps the amount of systematicity within a given language will be in direct correlation with learners’ perceived degree of ease of learning it as a second language; that is, languages with a high degree of form-meaning correlation should be easier to learn than those without this correlation. Also, if systematicity is an aid for retrieving the form or meaning of rarely used words, we might expect to find a higher degree of correlation among these words. Indeed, this was the effect found in the similar
systematicity study conducted using an English corpus and phonological information.
5. Further Work
Before testing this theory on other languages, I plan to pursue this line of research more closely in Arabic. My most immediate step will be to run this same study, searching for the least common words in the corpus. The lack of overlap in context words among this new set of target words is an empirical problem to be overcome; hence it has not been included in this initial study. I am also interested in seeing if a significant difference in systematicity exists between derived and non-derived nouns in Arabic, as they have a different distribution in the lexicon. I predict that non-derived nouns, which hypothetically occur less frequently overall (due to the fact that derived nouns also occur in verb form), will show more systematicity.

Some planned changes to the research itself include using a different morphological analysis in the semantic comparisons, such that there is less uncertainty in predicting the correct full form and analysis of each word presented in the newspaper text. This may be done through part of speech tagging to rank each of the choices delivered by the BMA, or by using a different tool entirely, such as Mona Diab’s Support Vector Machine analyzer (2004). Furthermore, I intend to experiment with other methods of phonetic analysis, including acoustic analyses of recorded data. Hopefully, these changes will refine the results so that they will be more reliable when compared with data from other languages.
REFERENCES
Arabic Gigaword. Linguistic Data Consortium, University of Pennsylvania, 2003. LDC Catalog No.: LDC2003T12.
Bergen, Benjamin K. 2004. “The Psychological Reality of Phonaesthemes”. Language 80:2.290-311.
Bohas, Georges. 1997. Matrices, étymons, racines, éléments d’une théorie lexicologique du vocabulaire arabe. Louvain: Paris.
Boudelaa, Sami & William D. Marslen-Wilson. 2004. “Abstract Morphemes and Lexical Representation: The CV-skeleton in Arabic”. Cognition 92.271-303.
Brustad, Kristen, Mahmoud Al-Batal & Abbas Al-Tonsi. 1995. Alif-Baa: Introduction to Arabic letters and sounds. Washington, D.C.: Georgetown University Press.
Buckwalter, Tim. 2002. Buckwalter Arabic Morphological Analyzer Version 1.0. Linguistic Data Consortium, University of Pennsylvania, 2002. LDC Catalog No.: LDC2002L49.
Cohen, Paul R. 1995. “Empirical Methods for Artificial Intelligence”. Cambridge, Mass.: MIT Press.
Cutting, Douglass, David Karger, Jan Pederson, & John W. Tukey. 1992. “Scatter/Gather: A cluster-based approach to browsing large document collections”. Proceedings of the 15th Annual ACM/SIGIR Conference: Copenhagen.
Diab, Mona. 2004. “Relieving the Data Acquisition Bottleneck for Word Sense Disambiguation”. Proceedings of ACL 2004.
El-Imam, Yousif A. 2001. “Synthesis of Arabic from Short Sound Clusters”. Computer Speech and Language 15.355-380.
Firth, John Rupert. 1957. “Modes of Meaning”. Papers in Linguistics 1934-1951. Oxford: Oxford University Press. 190-215.
Hayes, Bruce. 2004. FeaturePad. Downloaded from http://www.linguistics.ucla.edu/people/hayes/120a/FeaturePad.htm, Feb 3 2005.
Landauer, Thomas K. & Susan T. Dumais. 1997. “A Solution to Plato’s Problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge”. Psychological Review 104:2.211-240.
Magnus, Margaret. 2000. What’s in a Word? Evidence for phonosemantics. Ph.D. dissertation, University of Trondheim.
Manning, Christopher D. & Hinrich Schütze. 1999. “Foundations of Statistical Natural Language Processing”. Cambridge: Massachusetts Institute of Technology.
Nerbonne, John & Wilbert Heeringa. 1997. “Measuring Dialect Distance Phonetically”. Proceedings of the Third Meeting of the ACL Special Interest Group in Computational Phonology (SIGPHON-97).
Rohde, Douglas L. T., Laura M. Gonnerman, & David C. Plaut. Forthcoming. “An Improved Method for Deriving Word Meaning from Lexical Co-Occurrence”.
Sankoff, David & Joseph Kruskal. 1999. Time Warps, String Edits, and Macromolecules: The theory and practice of sequence comparison. CSLI Publications.
Saussure, Ferdinand de. 1916. Course in General Linguistics. Charles Bally & Albert Sechehaye, eds. Wade Baskin trans. New York: Philosophical Library.
Shillcock, Richard, Simon Kirby, Scott MacDonald & Chris Brew. Forthcoming. “Systematicity in the Mental Lexicon”.
ARABIC PAPPI: A PRINCIPLES AND PARAMETERS PARSER

Sandiway Fong1
University of Arizona
1. Introduction
PAPPI is a freely-available and extensible multilingual parsing engine in the Principles-and-Parameters (P&P) framework (Chomsky 1981).2 At the heart of PAPPI is a core engine written in Prolog, a Horn clause logic-based programming language originally designed for natural language processing (Colmerauer et al. 1973), containing both a module for the recovery of phrase structure and a set of linguistically-motivated primitives for the expression of structural constraints imposed by linguistic theory. The basic system described in Fong (1991) implements classic Government and Binding (GB) theory for English (Lasnik & Uriagereka 1988). Following the central working hypothesis in the P&P framework, i.e. that there is a core theory or set of basic principles broadly applicable across languages, the system is designed to operate with a single parsing engine implementing GB theory, while at the same time accommodating implementations of different languages under systematic parameterization. More concretely, in addition to English, PAPPI implementations exist for SVO languages like Chinese (Lin 1997) and French, V2 (verb-second) languages like Dutch, and SOV (head-final) languages such as Turkish (Birtürk 1998) and Japanese (Fong 2001). Although each language extends PAPPI in a different direction, all of the implementations share a common core theory.

In software engineering terms, this advantage of the P&P approach results in a reduction in the cost and time required for initial parser construction (provided, of course, that the theory being implemented for a particular language is broadly compatible with extant implementations). The reuse of core linguistic principles across languages is not only an efficient use of software resources, but also serves to reinforce or confirm the theoretical status of the principles as being broadly applicable across languages. The goal of this paper is to continue this theme of re-use, and show how PAPPI can be adapted to handle aspects of Arabic syntax unexplored in any other PAPPI language implementation to date. In particular, this paper highlights the implementation of VSO/SVO word order and concomitant verbal agreement phenomena in Arabic.

1 The author would like to thank Abdelkader Fassi Fehri of IERA (Mohammed V University, Rabat, Morocco) for supplying the original impetus for the Arabic-specific implementation described here.
2 PAPPI is available for the MacOS X and Linux software platforms at http://dingo.sbs.arizona.edu/~sandiway/pappi/.

2. PAPPI Architecture
The PAPPI parser is organized around a series of four software layers shown in Figure 1 below.

Figure 1. PAPPI Architecture (layers, top to bottom: Lexicon, Parameters, Periphery; PS Rules, Principles; Programming Language; Compilation Stage with LR(1), Type Inf., Chain, Tree)
2.1 The language-particular level
At the topmost level, a lexicon, parameter settings and a periphery file must be defined for each language. In this section, we will review the lexicon and parameter settings used for Arabic. The periphery file holds language-particular rules not directly derivable from core principles. The AGR criterion, to be discussed later in this paper, is an example of a rule or principle specific to the Arabic implementation and not already present in the core system.

2.1.1 The lexicon
The lexicon is a list of the ‘words’ recognized by the parser together with category and syntactic features that together drive phrase structure recovery and principle applicability. Depending on the size and complexity of the particular implementation, words may be morphologically decomposed via simple concatenative rules also stated in the lexicon, e.g. see the implementation of Turkish morphology in Birtürk (1998).3 For example, (1), adapted from Fassi Fehri (1993:2.5), has the PAPPI lexical entries given in (2).

(1) kataba r-rajul-u r-risaalat-a
    wrote the-man-NOM the-letter-ACC
    “the man wrote the letter”

(2) a. lex(kataba, v, [morph(write,past(+)),grid([agent],[theme])]).
    b. lex(r, mrkr, [right(n,[],[prefix(r),def(+)])]).
    c. lex(rajul, n, [a(-),p(-),agr([3,sg,m]),case(_),theta(_)]).
    d. lex(u, mrkr, [left(n,[],[suffix(nom),morphC(nom)])]).
    e. lex(risaalat, n, [a(-),p(-),agr([3,sg,n]),case(_),theta(_)]).
    f. lex(a, mrkr, [left(n,[],[suffix(acc),morphC(acc)])]).

Each entry representing a word form or morpheme is defined using the Prolog predicate lex of arity 3, i.e. lex takes 3 arguments (written as lex/3), with template given in (3).

(3) lex(W,C,L).
    W = word or morpheme
    C = category label
    L = list of features (comma-separated list delimited by square brackets)
3 Currently, there is no template-style morphology implementation, a feature that would be useful in describing Semitic languages in PAPPI. An external morphological analyzer can also be interfaced to PAPPI.
The verb form kataba in (2a) has the feature morph(write,past(+)), indicating that it is the past tense form of write in Arabic, and also the feature grid([agent],[theme]), indicating to the parser that it has a thematic grid with external role agent and an internal role theme that must be instantiated in syntactic structure. The thematic grid will also drive theta role assignment in the parser.

Nouns rajul and risaalat, in (2c) and (2e), respectively, each have Binding theory features a(-) and p(-) indicating that they are non-anaphoric and non-pronominal. The values of these two features inform the parser that (common) nouns like rajul and risaalat are not subject to Binding conditions A and B. (However, [a(-),p(-)] marks them as referential expressions, and hence they are subject to Binding condition C.) The feature agr([P,N,G]) encodes person (P), number (N) and gender (G) agreement features for the two nouns. Finally, the case(_) and theta(_) features are initially unvalued, as indicated by the Prolog anonymous variable underscore “_”. In the course of parsing, these slots will be instantiated by parser operations implementing Case and Theta principles.

The lexical entries for the morphemes r-, -u and -a are given in (2b), (2d) and (2f), respectively. These morphemes are encoded as “markers”, indicated by the distinguished category label mrkr. PAPPI markers are lexical entries that do not project syntactic structure; instead, they are resolved as syntactic features attached to adjacent lexical items. For example, the definite determiner prefix r- has feature right(n,[],[prefix(r),def(+)]). The feature right/3 has template (4).

(4) right(C,L1,L2)
    C = category to be matched to the right
    L1 = list of match features (Note: [] represents the empty list, i.e. nothing to match)
    L2 = list of add features
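To make the marker mechanism concrete, the following Python sketch mimics how right/3 and left/3 entries might be resolved onto a neighbouring item. It is a loose illustration rather than PAPPI's Prolog code: the classes Marker and Item, the function resolve_markers, and the choice to skip over other markers when looking for the host are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Marker:
    direction: str    # "right" or "left": where to look for the host
    category: str     # category the host must have, e.g. "n"
    match: List[str]  # features the host must already carry (may be empty)
    add: List[str]    # features handed over to the host


@dataclass
class Item:
    form: str
    category: str
    features: List[str] = field(default_factory=list)
    marker: Optional[Marker] = None  # set only for mrkr entries


def resolve_markers(items: List[Item]) -> List[Item]:
    """Attach each marker's add-features to its host, then drop the markers."""
    for index, item in enumerate(items):
        marker = item.marker
        if marker is None:
            continue
        step = 1 if marker.direction == "right" else -1
        pos = index + step
        # Assumption: other markers between a marker and its host are skipped.
        while 0 <= pos < len(items) and items[pos].marker is not None:
            pos += step
        if not 0 <= pos < len(items):
            raise ValueError(f"marker {item.form!r} has no host")
        host = items[pos]
        if host.category != marker.category or not all(
            feature in host.features for feature in marker.match
        ):
            raise ValueError(f"marker {item.form!r} cannot attach to {host.form!r}")
        host.features.extend(marker.add)
    return [item for item in items if item.marker is None]


# r-rajul-u from example (1), encoded after entries (2b-d).
noun_phrase = resolve_markers([
    Item("r", "mrkr", marker=Marker("right", "n", [], ["prefix(r)", "def(+)"])),
    Item("rajul", "n", ["agr([3,sg,m])"]),
    Item("u", "mrkr", marker=Marker("left", "n", [], ["suffix(nom)", "morphC(nom)"])),
])
```

After the call, noun_phrase holds a single item, rajul, which carries the definiteness features from r- and the morphological Case features from -u in addition to its own agreement features.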
Under template (4), the morpheme r- does not project in this implementation; it simply finds the noun to its right, adds features prefix(r) and def(+), and disappears. In the case of (1), rajul will be the recipient of these two features.

The Case markers -u and -a, given in (2d) and (2f), operate in similar fashion to r-, i.e. they do not project; instead they mark the noun to their left with syntactic features. In particular, the noun to the left will receive a morphC (morphological Case) feature instantiated to values nom (nominative) and acc (accusative), respectively. Case markers are already present in the PAPPI system for of-insertion in English, and the Japanese case particle system. For Arabic, we simply make use of the pre-existing code. Case theory will check the value of morphC against the value assigned to the abstract case(_) feature during parsing.

2.1.2 The parameters
The core theory includes modules dealing with phrase structure, head and phrasal movement, Case, Theta and Binding theory. With respect to word order, Arabic is implemented as a SVO language; in X-bar theoretic terms, phrases are specified as uniformly head-initial and specifier-initial, as with English. (We will describe how VSO word order is implemented in this framework in section 2.2.) For head movement in the extended verbal complex, Arabic is specified as having ‘strong’ agreement (unlike English) in the sense of Pollock (1989). Other parameters in the system govern the introduction and licensing of empty categories, e.g. whether empty expletives are allowed and the binary-valued pro-drop parameter; also, by means of a wh-in-syntax parameter, whether wh-movement is overt or covert.

2.2 The core theory level
As Figure 1 indicates, below the language-particular layer, i.e. the lexicon, parameter settings and periphery file, lies the core set of principles common to all implementations. This core level is logically divided into a phrase structure component, based on X-bar syntax and the effects of (overt) movement, and a set of principles or constraints implementing various modules of GB theory. In this section, we briefly outline these two components and their parameterization with respect to the Arabic implementation.
2.2.1 X-bar syntax
The parser recovers X-bar phrase structure, factoring in for the effects of overt phrasal and head movement plus word order. The X-bar rule template system shown in (5–6) is part of the core theory common to all implementations.

(5) a. rule XP -> [XB|spec(XB)] ordered specFinal st max(XP), proj(XB,XP).
    b. rule XB -> [X|compl(X)] ordered headInitial(X) st bar(XB), proj(X,XB), head(X).
(6) a. head(n). head(v). head(a). head(p). head(i). head(c). head(neg).
    b. bar(n1). bar(v1). bar(a1). bar(p1). bar(i1). bar(c1). bar(neg1). bar(v2).
    c. max(np). max(vp). max(ap). max(pp). max(ip). max(cp). max(negp).
    d. proj(n,n1). proj(v,v1). proj(a,a1). etc.
    e. proj(n1,np). proj(v1,vp). proj(a1,ap). etc.
(5a) specifies that XP, as instantiated through the category labels defined by max/1 in (6c), expands to the constituent ordering XB followed by spec(XB) provided specFinal is true, i.e. when the language is specifier-final. In the case of Arabic and English parameterization, specFinal is set so that it does not hold, and consequently, XP expands to the reverse or ‘flipped’ order, namely spec(XB) followed by XB. The single-bar category label XB is defined via proj(XB,XP) as instantiated in (6e). spec(XB) is parameterized according to category. For example, spec(i1) can be defined to be np (noun phrase), indicating that inflection has a specifier, i.e. the surface subject position. Similarly, spec(v1) can be specified as null (subjects are first merged at specifier-IP) or np, i.e. first merged at specifier-VP assuming the VP-internal subject hypothesis.

Similar considerations apply to the expansion of single-bar level XB to a head X followed by compl(X) in (5b). The complement relation compl(X), as given in (7), encodes categorical selection in standard PAPPI for the extended verbal projection VP–(NegP)–IP–CP. (This ordering will be revised in section 3.) Given the phrase structure definitions in (5)–(7) instantiated for Arabic, and factoring in head movement in the verbal complex (to be described below), we can recover the corresponding Arabic phrase structure for example (1) as shown in Figure 2 below.

2.2.2 Verbal head movement
Note that in Figure 2, the surface word order is VSO. This is achieved from an underlying SVO word order by a combination of the nominative Case-marked subject r-rajul-u being merged in specifier-VP, i.e. we assume the VP-internal subject hypothesis, and the inflected verb kataba raising to head-adjoin to inflection from its underlying position (marked by the trace Vt left by head movement).
(7) compl(i,vp). compl(i,negp). compl(neg,vp). compl(c,ip).

Figure 2. Phrase structure for example (1)

PAPPI implements the core Pollock-style verbal head movement system as defined by the rules in (8).4 In the case of Arabic, we can take advantage of the parameter setting agr(strong) already implemented for Romance verbs. Hence, verbs will obligatorily raise to inflection, thereby completing the VSO word order.

(8) a. rule v(V) moves_to i provided agr(strong), finite(V).
    b. rule v(V) moves_to i provided agr(weak), V has_feature aux.
    c. rule i(I) moves_to v(V) provided agr(weak), \+ V has_feature aux.
    d. rule i(I) moves_to v(V) provided agr(strong), \+ finite(V).
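Read as a decision table, the four rules in (8) pick out a direction of head movement from the agreement parameter and two properties of the verb. The Python sketch below is only a paraphrase of those conditions under that reading; the function name head_movement and the labels "V-to-I" and "I-to-V" are chosen for this illustration and do not appear in PAPPI.

```python
def head_movement(agr: str, finite: bool, aux: bool) -> str:
    """Return the direction of head movement licensed by rules (8a-d)."""
    if agr not in ("strong", "weak"):
        raise ValueError("agr must be 'strong' or 'weak'")
    if agr == "strong":
        # (8a): finite verbs move to inflection.
        # (8d): otherwise inflection moves to the non-finite verb.
        return "V-to-I" if finite else "I-to-V"
    # (8b): with weak agreement only auxiliaries move to inflection.
    # (8c): otherwise inflection moves to the verb.
    return "V-to-I" if aux else "I-to-V"
```

For a finite Arabic verb such as kataba, with agr set to strong, the function returns "V-to-I", matching the obligatory raising described above.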
4 Note: in (8c–d), \+ is the Prolog negation operator, e.g. \+ finite(V) means nonfinite V.

The verbal complex may also contain elements such as pronominal clitics. For example, in (9) (= Fassi Fehri 1993:2.6) the 1st person pronominal clitic -tu is attached to the matrix verb 'arad (wanted).

(9) 'arad-tu 'an y-uqaabil-a r-rajul-u l-mudiir-a
    wanted-I that 3-meet-subj the-man-nom the-director-acc

For Arabic, we co-opt the mechanism already defined for the Romance pronominal clitic system: in (10) -tu is specified as a PF clitic marker possessing pronominal Binding features [a(-),p(+)], and agreement features 1st person singular (masculine/feminine).
(10) a. lex(tu,pf(cl),[a(-),p(+),agr([1,sg,[m,f]])|Fs]) :- subjClFeatures(Fs).
     b. subjClFeatures([adjoin(v,right),morphC(nom)]).
The lexical entry for -tu also includes the general subject clitic properties given in (10b), i.e. it head-adjoins to the right of the verb, adjoin(v,right), and it is marked with the nominative morphological Case feature morphC(nom). The resulting parse for (9) including the verbal complex 'arad-tu is shown in Figure 3 below.5
Figure 3. Phrase Structure for Example (9)
5 Orthographical representation is limited to the ASCII character set in some of the trees shown. In Figure 3, capitalized Q is used to stand in for '.

2.2.3 Empty expletive EXPL
In both Figures 2 and 3, subjects remain in their first merge position, i.e. the specifier-VP position, and do not raise to the surface subject position, i.e. specifier-IP. The Extended Projection Principle (EPP) is responsible for licensing the specifier-IP position. Assuming this position is always present in syntactic structure, i.e. PAPPI will always generate an (initially underspecified) empty NP to fill it, core principles must interact in the course of derivation to license and determine its status as one of several possible empty NPs, e.g. PRO, pro or variable. In the Arabic PAPPI implementation, licensing of empty subject NPs is extended to an empty expletive EXPL for VSO word order only via the AGR criterion.

2.2.4 The principles
The basic PAPPI implementation includes a core set of GB principles broadly applicable across languages including Case and Theta theory, Binding conditions A, B and C, the ECP, LF-movement and QR, plus components of Full Interpretation, grouped into filters (operations that may rule out structures) and generators (operations that may generate additional structural possibilities and lead to “fan out”) as shown in Figure 4.
Figure 4. Filters and Generators
All phrase structure recovered by the generator Parse S-structure, the parser operation that implements X-bar syntax plus the effects of overt movement, must be validated across the other filters and generators listed.
2.3 The programming language level
The PAPPI system is designed with the goal in mind that the grammar and language implementor will create definitions for the language-particular and core theory software layers only. In particular, it is envisaged that principles will be written using a fixed set of linguistically-motivated primitive operations provided by the lower-level programming language layer shown in Figure 1. To illustrate how principles may be implemented using these primitives, consider the code fragments given in (11) and (12) for Case theory.

(11) a. Structural Case is assigned under government (plus Case adjacency)
     b. sCaseAssign IN_ALL_CONFIGURATIONS X WHERE
          sCaseConfig(X,Case,NP) THEN NP HAS_FEATURE case(Case).
     c. sCaseConfig(CF,Case,XP) :-
          governs(Head,XP,CF), caseAssigner(Head,Case),
          ADJACENT(Head,XP,CF) IF caseAdjacency.
     d. caseAssigner(INFL,nom) :-     % I(AGR) assigns nominative Case
          CAT(INFL,i), INFL HAS_FEATURE agr(_).
     e. caseAssigner(Verb,acc) :-     % Verbs assign (acc) Case to
          CAT(Verb,v).                % a direct object
     f. caseAssigner(ECM,obq) :-      % ECM complementizer assigns
          CAT(ECM,c),                 % oblique Case
          ECM HAS_FEATURE ecm.
(12) a. Case Filter: All lexical NPs must receive Case.
     b. caseFilter IN_ALL_CONFIGURATIONS X WHERE
          lexicalNP(X) THEN assignedCase(X).
     c. assignedCase(X) :- X HAS_FEATURE case(Case), ASSIGNED(Case).
     d. lexicalNP(X) :- CAT(X,np), \+ EC(X).
(11) defines the structural Case assignment operation sCaseAssign.6 The IN_ALL_CONFIGURATIONS primitive in (11)b) specifies that every configuration X in a phrase structure tree that happens to satisfy sCaseConfig/3, as defined in (11c), will have an NP whose feature slot case(_) will be instantiated by the value of Case determined by the
6
Note that sCaseAssign is parameterized through sCaseConfig/3 by the predicate caseAdjacency, which is set for languages like English.
ARABIC PAPPI
29
choice of case assigner as defined in (11)d–f).7 For example, Figure 5 below provides a more detailed look at the parse of example (1) (shown previously in Figure 2) illustrating the effect of the structural Case operation. As indicated by the valued feature structures case(nom) and case(acc), r-rajul-u and r-risaalat-a have been assigned abstract nominative and accusative Cases by virtue of their syntactic position, and both NPs will also satisfy the requirements of the Case Filter.8
Figure 5. Constituent Features for (1)
Similarly, (12) defines the operation caseFilter as a universally quantified condition over tree structures that checks to see whether all lexical NPs, as defined by (12d), have had their Case slots filled or not.9
7 The infix predicate HAS_FEATURE/2, and the relations ADJACENT/2 and CAT/2 are all linguistic primitives provided by the programming language layer. HAS_FEATURE and CAT/2 facilitate access to constituent features and category labels, respectively. ADJACENT/2 computes whether two constituents are adjacent in syntactic structure.
8 The parse in Figure 5 also shows the effects of Theta theory. The theta slots for both arguments have also been valued.
9 The primitives ASSIGNED/1 and EC/1 check to see whether a supplied argument is unvalued or contains an empty category, respectively.
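The interplay of (11) and (12) can be illustrated outside PAPPI with a minimal Python sketch. It is not PAPPI code: the Node class, the function names, and the decision to hand the assignment step ready-made (head, NP) pairs are assumptions made for illustration, and government and Case adjacency are taken as given rather than computed.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Node:
    cat: str                                   # category label, e.g. "np", "v", "i", "c"
    features: Dict[str, object] = field(default_factory=dict)
    empty: bool = False                        # True for an empty category


def case_assigner(head: Node) -> Optional[str]:
    """Return the Case a head assigns, following (11d-f), or None."""
    if head.cat == "i" and "agr" in head.features:
        return "nom"                           # I(AGR) assigns nominative
    if head.cat == "v":
        return "acc"                           # verbs assign accusative
    if head.cat == "c" and head.features.get("ecm"):
        return "obq"                           # ECM complementizer assigns oblique
    return None


def s_case_assign(governed: Iterable[Tuple[Node, Node]]) -> None:
    """Value case(_) on each NP paired with a Case-assigning governor.

    Assumption: the caller supplies (head, np) pairs for which government
    (and adjacency, where relevant) already holds; neither is computed here.
    """
    for head, np in governed:
        case = case_assigner(head)
        if case is not None and np.cat == "np":
            np.features["case"] = case


def case_filter(nodes: Iterable[Node]) -> bool:
    """Case Filter (12): every lexical (non-empty) NP must carry Case."""
    return all("case" in n.features
               for n in nodes
               if n.cat == "np" and not n.empty)


# Toy configuration for a transitive clause.
infl, verb = Node("i", {"agr": "3sg"}), Node("v")
subject, direct_object = Node("np"), Node("np")
s_case_assign([(infl, subject), (verb, direct_object)])
print(subject.features["case"], direct_object.features["case"])   # nom acc
print(case_filter([subject, direct_object]))                       # True
```

As in the PAPPI definitions, assignment and filtering are kept separate: the first only values feature slots, and the second only inspects them, so an empty NP without Case passes the filter while a lexical NP without Case does not.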
2.4 The computational mechanism level
The final layer in Figure 1 is the computational mechanism layer. Rules and principles written at the programming language level are automatically compiled down to and mapped onto a variety of different computational mechanisms. For example, the phrase structure component is mapped to a backtracking LR(1)-based parser in PAPPI. Universally and existentially quantified conditions on phrase structure are mapped onto a series of tree-scanning operations. In standard PAPPI, candidate structures are generated one at a time, and principles are evaluated serially, i.e. in generate-and-test fashion. Alternatives to this architectural model are also possible. For example, Fong (1999) describes an implementation of the lowest level for parallel execution across multiple machines. The advantage of software abstraction surfaces as transparency, in the sense that none of the principles had to be rewritten or otherwise redefined when retargeted for parallel execution.

3. NegP and VSO
In the (standard) extended verbal projection model implemented in PAPPI, i.e. VP-(NegP)-IP-CP, a verb may raise to inflection via negation and further on to the complementizer (C) position. For example, English allows both (13a) and (13b).
(13) a. Didn’t John leave?
     b. Did John not leave?
This model has to be revised for Arabic clause structure if we are to model the account in Fassi Fehri (1993:26), in which there is no verb movement to C even in the case of yes-no questions, and negation must appear between IP and CP.
(14) 'a-laa y-a'tii zayd-un
     Q not 3-comes Zayd-nom
     “Isn’t Zayd coming?”
This can be accomplished by re-defining (or effectively parameterizing) categorial selection for Arabic X-bar structures only as shown in (15), cf. (7). The extended verbal projection is now morphed into the sequence VP-IP-(NegP)-CP.
(15) compl(i,vp).
     compl(neg,i2).
     compl(c,i2).
     compl(c,negp).
As a result, (14) can now be parsed as shown in Figure 6.
Figure 6. Parse Structure for (14)
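To see that the four selection facts in (15) yield exactly the sequence VP-IP-(NegP)-CP, one can enumerate the chains of projections they license. The Python sketch below is only an illustration, not PAPPI code: the table PROJECTS, which maps each head to the maximal projection it builds, and the label c2 for CP are assumptions, while i2 and negp are the labels used in (15).

```python
# Complement selection facts from (15): (head category, complement category).
COMPL = [("i", "vp"), ("neg", "i2"), ("c", "i2"), ("c", "negp")]

# Assumed mapping from a head to the maximal projection it builds.
PROJECTS = {"i": "i2", "neg": "negp", "c": "c2"}


def extended_projections(bottom="vp", top="c2"):
    """Enumerate every chain of maximal projections from bottom up to top."""
    chains = []

    def grow(chain):
        if chain[-1] == top:
            chains.append(chain)
            return
        for head, complement in COMPL:
            if complement == chain[-1]:
                # This head selects the current projection; stack its own on top.
                grow(chain + [PROJECTS[head]])

    grow([bottom])
    return chains


for chain in extended_projections():
    print("-".join(chain))
```

Only two chains come out, one with and one without negp between i2 and c2, which corresponds to the optionality of NegP in the revised Arabic projection.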
4. VSO/SVO Word Order and the AGR Criterion
Fassi Fehri (1993:32) motivates the AGR Criterion through contrasts like those in (16) and (17).
(16) a. daxal-at n-nisaa'-u makaatib-a(-hunna)
        entered-F the-women-NOM office.PL-ACC(-their.F)
        “The women have entered (their) offices”
     b. *daxal-na n-nisaa'-u makaatib-a(-hunna)
        entered-F.PL the-women-NOM office.PL-ACC(-their.F)
(17) a. n-nisaa'-u daxal-na makaatib-a(-hunna)
        the-women-NOM entered-F.PL office.PL-ACC(-their.F)
        “The women have entered (their) offices”
     b. *n-nisaa'-u daxal-at makaatib-a(-hunna)
        the-women-NOM entered-F office.PL-ACC(-their.F)
In (16a) and (17b), the verb exhibits gender agreement only with the subject n-nisaa'-u “the women”, whereas in (16b) and (17a), number agreement is also present. One possible generalization is that rich AGR (gender + number agreement) can license a referential NP in specifier-IP position, whereas poor AGR (gender agreement only) licenses only the empty expletive EXPL (introduced in section 2.2).
The PAPPI specification of the AGR Criterion is given in (18).
(18) a. agrCriterion IN_ALL_CONFIGURATIONS X
          WHERE specIP(X,Spec)
          THEN poorAGRiffEXPL(Spec,X).
     b. specIP(IP,Subject) :- CAT(IP,i2), Subject SPECIFIER_OF IP.
     c. poorAGRiffEXPL(Spec,IP) :-
          IP HAS_FEATURE agr(AGR),
          poorAGR(AGR) IFF expletive(Spec).
     d. expletive(X) :- X HAS_FEATURE nonarg(+).
     e. poorAGR(X) :- unspecifiedForNumber(X).
Along similar lines to the operations defined in section 2.3, agrCriterion is a universally quantified tree predicate that checks all specifier-IP configurations for compliance with the AGR Criterion. In the case where poor AGR holds, only an expletive, defined as possessing the feature nonarg(+), may occupy the surface subject position. Should poorAGR/1 fail to hold, i.e. AGR is specified for number, the IFF operator in (18c) ensures that expletive/1 does not hold for the element occupying the subject position. The AGR Criterion permits examples (16a) and (17a), as shown in Figures 7 and 8, respectively, to pass.
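The biconditional in (18c) can be made concrete with a small Python sketch. This is an illustration under simplifying assumptions, not the PAPPI implementation: AGR is modelled as a dictionary of features, and the expletive status of the element in specifier-IP is passed in as an already resolved boolean, so the underspecification of null NPs is not modelled.

```python
def poor_agr(agr: dict) -> bool:
    """AGR counts as poor when it is unspecified for number, as in (18e)."""
    return agr.get("number") is None


def agr_criterion(agr: dict, spec_is_expletive: bool) -> bool:
    """(18c): poor AGR holds if and only if specifier-IP hosts an expletive."""
    return poor_agr(agr) == spec_is_expletive


gender_only = {"gender": "f"}                    # daxal-at
gender_number = {"gender": "f", "number": "pl"}  # daxal-na

print(agr_criterion(gender_only, True))     # (16a) poor AGR, expletive in spec-IP: True
print(agr_criterion(gender_number, False))  # (17a) rich AGR, referential NP: True
print(agr_criterion(gender_only, False))    # (17b) poor AGR, referential NP: False
```

Writing the condition as an equality of two booleans captures both directions of IFF at once: poor AGR with a non-expletive specifier fails, and so does rich AGR with an expletive specifier.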
Figure 7. Parse Structure for (16a)
Figure 8. Parse Structure for (17a)
It also correctly blocks (17b), i.e. the case of poor AGR with a nonexpletive in specifier-IP position, as shown in Figure 9.
Figure 9. Parse Structure for (17b)
However, it fails to immediately block (16b), i.e. the case of rich AGR without subject raising to specifier-IP, as shown in Figure 10.
Figure 10. Parse Structure for (16b)
The reason is that PAPPI inserts an (underspecified) null NP at the time of phrase structure recovery for (16b). In particular, the null NP is initially unspecified with respect to expletiveness or referential argumenthood. The AGR Criterion operation checks only for nonexpletiveness, so the structure is passed onwards to other principles with the null NP as a possible empty argument. However, the Theta Criterion, which forces available theta roles and arguments into a 1-to-1 pairing, ultimately blocks the empty argument version of (16b) since the empty NP occupies specifier-IP, which is a non-theta position.

5. Conclusions
We have described a computer implementation of Arabic clausal structure compatible with the P&P framework. By re-using grammar components already developed for core grammar, or by building on or re-configuring mechanisms developed specifically for existing languages in the PAPPI system, we have demonstrated how a sample Arabic parser can be quickly produced. We have also shown how language-particular constraints such as the AGR Criterion can be defined in a multilingual parsing system.
REFERENCES
Birtürk, Ayşenur. 1998. A Computational Analysis of Turkish using the Government-Binding Approach. Ph.D. dissertation, Middle East Technical University (METU), Ankara, Turkey.
Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Colmerauer, Alain, H. Kanoui, R. Pasero & P. Roussel. 1973. “Un Système de Communication Homme-Machine en Français”. Report, Artificial Intelligence Group, Université d’Aix-Marseille II.
Fassi Fehri, Abdelkader. 1993. Issues in the Structure of Arabic Clauses and Words. Amsterdam: Kluwer.
Fong, Sandiway. 2001. “Japanese PAPPI”. Researching and Verifying an Advanced Theory of Human Language, Report (5) ed. by Kazuko Inoue, 445–464. Chiba, Japan: Kanda University of International Studies (KUIS).
_____. 1999. “Parallel Principle-Based Parsing”. Proceedings of the Sixth International Workshop on Natural Language Understanding and Logic Programming (NLULP ’99), 45–57. International Conference on Logic Programming (ICLP), Las Cruces, New Mexico.
_____. 1991. Computational Properties of Principle-Based Grammatical Theories. Ph.D. dissertation, MIT Artificial Intelligence Laboratory, Cambridge.
Lasnik, Howard & Juan Uriagereka. 1988. A Course in GB Syntax: Lectures on binding and empty categories. Cambridge, MA: MIT Press.
Lin, Koong. 1997. Pappi-C: A Chinese Principles-and-Parameters. Ph.D. dissertation, Tsinghua University, Taiwan.
Pollock, Jean-Yves. 1989. “Verb Movement, Universal Grammar, and the Structure of IP”. Linguistic Inquiry 20.365–424.
CORPUS-BASED LINGUISTIC ANALYSES TESTING INTUITIONS ABOUT ARABIC STRUCTURE AND USE
1
Salem Ghazali Institut Supérieur des langues de Tunis
1. Introduction
Evidence from language corpora shows that a great deal of the information provided by dictionaries in general, and Arabic dictionaries in particular, on the meaning and use of words is scanty, sometimes obsolete, and hardly meets the needs of the learner. Following Sinclair (1996, 1998, 1999) and others for English, I will argue on the basis of evidence from Arabic corpora that the meaning or function of a lexical item is largely determined by other words with which it tends to co-occur to a varying degree. Dictionaries do not say much about typical meanings and uses of words as part of a pattern, whether in terms of word forms (collocations) or grammatical classes (colligations), nor do they indicate the semantic preference a word may have or the communicative intent the use of a specific word may imply. As an illustration of the valuable contribution of corpus linguistics to refining and correcting our understanding of lexis and grammar, this paper investigates the use of two Arabic words, one very common, the verb particle qad, and one much less frequent, the verb waʃuka/Ɂawʃaka.
1 I am very grateful to Hafedh Hlila and Ferid Chekili for reading this paper and for the pertinent comments and suggestions they made. Given their interest in syntactic theory, they both proposed that I include syntactic arguments and justifications for a number of points in the paper. I have not, however, gone that far since this paper is intended to be a description of how words and structures are actually used and not an account of theoretical issues underlying that usage.

1.1 The corpus
The corpus from which these words were extracted comprises approximately five million words representing twentieth-century texts such as the complete novel Ɂal-Ɂayyaam by Taha Ḥussein, passages from several other modern novels, essays on philosophy and literature, translations of foreign literary and philosophical texts, doctoral theses in philosophy, linguistics, and phonetics, secondary school books, the Bible, and newspaper articles, as well as texts from the Middle Ages such as kitaabu Ɂal-buxalaaɁ by Ɂal-jaaḥiẓ and other texts from Ɂat-tawḥiidi’s night chats. The available corpus includes around 18 million words, but more than half of them come from newspapers. It was therefore decided to work with a balanced selection and not include all the journalistic texts, in order to reduce bias by having more or less equal input from different types of texts.

2. The Verb Ɂawʃaka
The dictionary entry for the root “w-ʃ-k” includes, among other forms, two main verb forms, waʃuka and Ɂawʃaka, the second being the “muqaaraba” verb form. The Arabic dictionaries lisaan Ɂal-ʕarab Ɂal-muḥiiṭ and Ɂal-muʕjim Ɂal-wasiiṭ give both forms the same meanings: (a) “hurry, speed up”, and (b) “forthcoming, about to happen”, but only occurrences of the second meaning are found in the present corpus. Ɂal-muʕjim Ɂal-wasiiṭ explicitly notes that the muqaaraba verb Ɂawʃaka may also be used in addition to waʃuka, implying, I presume, that the typical form is waʃuka. However, all of the occurrences of the verb in the corpus are word-forms of Ɂawʃaka and not waʃuka. The dictionary also states that the verb is used more frequently in the imperfective, an observation confirmed in this corpus, as the verb occurred 32 times in the imperfective and only 14 times in the perfective. There were also four occurrences of the lexeme as a noun, waʃk.
2.1 Collocation and colligation patterns
The two major collocates of the verb Ɂawʃaka are the complementizer Ɂan (32 occurrences) and the preposition ʕala (17 occurrences). The most common pattern (21 occurrences) is one where the verb Ɂawʃaka is followed by a definite lexical NP, then by the complementizer Ɂan followed by a verb in the imperfective, as in (1v), (2i), (2ii), (2v) and (3i) below. The NP may of course include a modifier or may be clausal, but typically Ɂan immediately follows the NP. The structural subject NP may also precede the verb Ɂawʃaka (13 occurrences), in which case Ɂan immediately follows the verb, as in (1iii), (1vii), and (1x) below. The second most frequent pattern is when Ɂawʃaka is followed by the preposition ʕala. In this pattern (nine occurrences), the verb is typically followed by the preposition ʕala and then an NP, as in (1ii) and (1viii) (see also 2iii, 3iii). A lexical subject NP may also follow the verb and precede the preposition ʕala (2vi).
(1)
i. δakara raɁiisu-l-ḥukuumati Ɂanna Ɂal-ʒazaaɁira Ɂawʃakat fi sanat 1994 ʕala Ɂal-Ɂiflaas
   “The head of the government mentioned that in 1994 Algeria was on the brink of bankruptcy.”
ii. wa ḥiinan kuntu Ɂuuʃiku ʕala Ɂal-halaak
   “...and sometimes I was about to die”
iii. wa kaana…huʒuumun ʕala Ɂal-muɁallifiina yuuʃiku Ɂan yakuuna ʕudwaanan.
   “The attack on (criticism of) authors was almost an aggression”
iv. maʕa Ɂintiẓaarin liwaʃaki qiyaami Ɂal-saaʕati
   “…while waiting for doomsday”
v. wa kaan qabla kulli ʃayɁin muqtaṣidan yuuʃiku Ɂiqtiṣaadu-hu Ɂan yabluxa Ɂal-buxla
   “He was, first of all, thrifty, almost greedy.”
vi. wa Ɂawʃakat Ɂal-fawḑa Ɂan tasuuda
   “The situation was about to become chaotic.”
vii. ḥatta Ɂawʃakat Ɂan tatruka fi ḥayaati Ɂal-fata ɁaƟaaran munkirat
   “…it was about to have adverse effects on the life of the boy.”
viii. wa Ɂana raʒulun qad Ɂawʃaktu ʕala Ɂal-kibari
   “I am a man about to reach old age”
ix. wa Ɂawʃakat Ɂiʒaazati ʕala Ɂal-intihaaɁi
   “...and my vacation was about to end.”
x. ʕinda ḥaaffati Ɂal-sariiri, tadnu min δura Ɂal-ʃaʒani, tuuʃiku Ɂan tadmaʕa.
   “On the bedside, becoming extremely sad, about to burst into tears.”
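Collocate frequencies of the kind reported for Ɂan and ʕala come from counting the words that occur near each occurrence of the verb in the concordance lines. The Python sketch below shows one simple way to do this and is not the procedure actually used for this study: the function name, the whitespace tokenization, and the default window of four following words are assumptions, and the two sample lines are (1vi) and (1viii).

```python
from collections import Counter


def collocates_after(lines, node_forms, span=4):
    """Count the words found within `span` tokens after any form of the node word."""
    counts = Counter()
    for line in lines:
        tokens = line.split()                  # naive whitespace tokenization
        for i, token in enumerate(tokens):
            if token in node_forms:
                counts.update(tokens[i + 1 : i + 1 + span])
    return counts


SAMPLE_LINES = [
    "wa Ɂawʃakat Ɂal-fawḑa Ɂan tasuuda",            # (1vi)
    "wa Ɂana raʒulun qad Ɂawʃaktu ʕala Ɂal-kibari",  # (1viii)
]
NODE_FORMS = {"Ɂawʃakat", "Ɂawʃaktu"}                # word-forms of the verb in the sample

counts = collocates_after(SAMPLE_LINES, NODE_FORMS)
print(counts.most_common())
```

Each line is processed on its own so that a window never runs across two concordance lines. A real count would also need a fuller list of word-forms of the verb, or a lemmatizer, in place of the two forms listed here.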
These patterns thus show that the verb has a narrow range in terms of both collocation and colligation; only a limited set of words are co-selected, and these words occur in a specific structure. With regard to grammatical distribution (colligation), the verb is always followed, within a four-word span, by either a complementizer followed by a verb in the imperfective or a preposition followed by an NP. In terms of lexical co-selection (collocation), the complementizer is always Ɂan and the preposition is always ʕala. Having outlined these regularities in terms of both the syntactic patterning and the lexical co-selection of the verb Ɂawʃaka, let us turn to some pragmatic aspects of the verb; that is, what is expressed or implied by the verb when it is associated with other words.

2.2 Semantic preference and prosody
A closer look at the collocates of Ɂawʃaka reveals that 30 out of the 50 occurrences of the verb found in the corpus refer to a situation where something unpleasant or undesirable is about to happen or came close to taking place. In fact, the semantic preference of this verb, i.e., the terms it co-selects, is usually for words of an unpleasant nature such as bankruptcy (two occurrences), death (two occurrences), burning, aggression, doomsday, disorder, chaos, weeping, confusion, shut horizons, greed, infatuation, adverse consequences, pitfalls, abyss, aging, etc. The examples in (1) above, where the underlined words in italic are co-selected by the verb Ɂawʃaka, illustrate some of the semantic preferences mentioned above: a country on the verge of bankruptcy, a person about to die, waiting for doomsday, a situation about to become chaotic, a person about to become old, etc. The co-selected nouns of unpleasant nature may directly follow the verb Ɂawʃaka or may be separated from it by two, three, or four words (usually the subject NP that may be followed by a PP and possibly a complement clause containing the collocate).
The result is thus an expressive connotation of an undesirable state of affairs that is (likely) about to take place. The illocutionary intent of the writer/speaker can only be extracted over a wide span of words, i.e., through what Sinclair (1991) has termed semantic prosody. Lexico-semantic information of this nature is very useful for the learner of Arabic, but dictionaries of Arabic do not provide it. Both of the widely used Arabic dictionaries mentioned above are silent about such semantic prosody.
They only give two definitions for the verb Ɂawʃaka, as stated earlier. The fact that specific connotations are associated with the verb Ɂawʃaka in 60% of the cases encountered in the corpus data shows that this usage is not due to chance, but that this verb is typically used with that particular coloring in Arabic. Going through other collocates of the verb Ɂawʃaka, one will also note that there is another semantic prosody implying that some process or state of affairs is about to come to an end and, in most cases, this ending is disastrous, undesirable, unwelcome or regrettable. In these cases, the verb Ɂawʃaka has semantic preferences for words such as “end, completion, finish, exhaustion, termination, separation”, etc. In addition to what is implied in some of the examples in (1) above, where “life and the world are coming to an end” in (ii) and (iv) respectively, “the end of youth” in (viii) and “the end of vacation” in (ix), there are at least 12 other occurrences of Ɂawʃaka with similar connotations, as illustrated in the following examples:
(2)
i. wa kaana rubbama taʕarraḍa li-baʕḍi Ɂal-hammi ḥiina yuuʃiku Ɂal-ʃahru Ɂan yanqaḍii
   “He might be met with some grief when the month was about to end”
ii. wa yuuʃiku ma bayna Ɂaydii-himaa min Ɂal-maali Ɂan yanfaδa
   “and will be about to have spent all the money they have in their hands”
iii. fal-sanatu tuuʃiku ʕala nihaayati-ha wa la budda lii Ɂan Ɂastaʕidda lil Ɂimtiḥaani
   “The (academic) year is about to end and I must get ready for the exam.”
iv. wa lamma qaṣura Ɂal-ʕumru wa Ɂawʃaka ʕala Ɂan-nihaayati qaal lahaa…
   “When he started aging and his life was about to end, he said to her…”
v. wa qad Ɂawʃaka al-ṣayfu Ɂan yaḍalla-na wa sa-naftariqu
   “The summer is approaching and we will be separated.”
vi. wa Ɂiδa ḥadaƟa Ɂan Ɂawʃaka bankun ʕala Ɂal-Ɂiflaasi
   “If it happens that a bank is on the verge of bankruptcy”
In (2i), we have a situation where a person becomes anxious when the end of the month approaches, namely because he is going to be short of money, as shown in example (2ii), which I assume follows sentence (i) in the same text. The verb Ɂawʃaka in (2i) has two key collocates, hamm “grief” and yanqaḍii “ends”. In (2iii), a student is worried about the academic year coming to an end because he has to get ready for his exams. In (2iv) a person is aging and reaching the end of his life. In (2v) the summer is approaching, and some state of affairs, perhaps the school year, is coming to an end, which will lead to separation from friends or loved ones. There is thus an underlying regularity in the communicative use of this verb that is not inherent in the meaning provided in dictionaries but consistently emerges from the co-text. The semantic prosodies discussed above are present in the majority of the occurrences of the verb Ɂawʃaka. There are, however, a minority of cases where the verb is used in a neutral way, as shown in (3) below. No expressive connotation of the type found in the examples discussed above can be inferred from the words surrounding the verb Ɂawʃaka; only one of the inherent meanings defined in the dictionary seems to be conveyed. There are a few additional occurrences of the verb where the context provided by the concordance lines was not sufficient to allow for an adequate assessment of semantic prosody.
(3)
i. wa kunta taraa-ha ḥiina yartafiʕu Ɂal-ḍuḥaa wa yuuʃiku Ɂan-nahaaru Ɂan yantaṣifa…
   “You would see her during forenoon and as midday approaches”
ii. Ɂila Ɂal-ḥaddi Ɂallaδi tuuʃiku maʕa-hu Ɂan takuuna Ɂal-qaasima Ɂal-muʃtaraka Ɂal-Ɂaʕẓam
   “...to the point where it would almost be the largest common denominator”
iii. wa haakaδa yantahi ʕindamaa Ɂawʃaka ʕala Ɂal-xatmi Ɂila Ɂal-qawli …
   “...and so he ends by saying when he is about to conclude….”
Except for some words such as technical terms, it is a well-known fact by now that a word interacts with other words it co-selects to create meaning and in some cases to allow for the communicative intent to emerge. When a statistical tendency for a given expressive connotation emerges from corpus data, as in the case of the verb Ɂawʃaka, dictionary definitions should include that connotation. It is very useful for the learner/user of the language to come to grips with both the literal and the implied meaning of a lexical item, especially when one is dealing with a language that is not spoken as a mother tongue, such as Modern Standard Arabic. I assume that if some proficient users of Arabic were asked to provide the meaning of the verb Ɂawʃaka, it would be unlikely that their definitions would allude to the semantic prosodies discussed above; one would not expect them to have more insights than lexicographers. However, there is overwhelming evidence from corpus analysis that the implied meaning of the verb is part of their linguistic competence or intuition, a compelling reason to produce corpus-based dictionaries of Arabic.

3. The Particle qad
3.1 Preliminary remarks
I will now turn to qad, a word of a different nature. Unlike the verb Ɂawʃaka, qad is a verbal particle that is very frequent, as it accounts for 0.3 to 1% of all the words in the corpus depending on the type of text. Initially, the corpus contained 6,314 occurrences of qad, including cases when it is preceded by a coordination conjunction (wa-qad and fa-qad) and the assertive particle (la-qad). Only occurrences of bare qad and wa-qad (4,532 occurrences) were considered for this paper, which finally amounted to about 4,000 occurrences after leaving out duplicates, quotations, insufficient context, and errors. During the course of data analysis, additional texts in electronic form were made available. These include the classical texts from Ɂal-buxalaaɁ, which provided an additional 226 occurrences of bare qad, and large corpora from current newspapers. After going over a couple of million words from the newly acquired newspaper corpus, there did not seem to be any new patterns emerging, so I stopped searching for other occurrences in newspaper texts. I finally selected the two forms qad and wa-qad, which account for 74% of the occurrences of all the forms of qad. Within the (wa)-qad group, bare qad represents 54% of the occurrences. To my knowledge, scholars have addressed mainly the function of the verbal particle qad with respect to tense, aspect, or modality.
For a discussion of these uses of qad, the reader may refer to Bahloul (1994). No serious claims will be made in this paper as to the temporal, aspectual, or pragmatic functions of qad. The main focus will be on the collocation/colligation pattern of this verbal particle and its role in clausal structure. Thus, relatively frequent occurrences of (wa)-qad introducing canonical sentences will not be of much interest here.
Before examining the patterns of usage of (wa)-qad as they emerge from the corpus data, let us first see what the dictionaries have to say. Note that qad is part of the entry for the root qadd or qadad, which includes other lexemes that are synchronically unrelated to the verb particle qad, such as qadd “stature”, qadiid “jerked meat”, etc. For Ɂal-muʕjim Ɂal-wasiiṭ, qad: (a) is a particle which, when preceding a verb in the perfective, conveys emphasis, and when preceding a verb in the imperfective, the verb complex will convey possibility or doubt and in other cases diminutive or augmentative meaning; (b) may also be in the form of a verbal noun such as in qadni dirham, meaning “one Dirham is enough for me”. (a) and (b) are obviously separate lexemes. lisaan Ɂal-ʕarab Ɂal-muḥiiṭ lists the above forms and usages with more details and examples from classical poetry and adds others, namely, (c) qadi used in elliptical constructions where the following verb is omitted. This usage is reminiscent of the elliptical use in English of the auxiliary “do” as in “we didn’t go but it’s as if we did”. Qadi is used here like “did” in English where the VP is deleted. This dictionary also mentions that a verb in the perfective can be an adjunct (ḥaal) only if it is preceded by qad explicitly or implicitly, a grammatical structure that will be dealt with in some detail below. It is also worth noting at this stage that neither of the forms qadni and qadi in (b) and (c) above appeared in the corpus data, whether in classical or modern texts.

3.2 Structural patterns of (wa)-qad
I will first make some general observations on the major co-texts where (wa)-qad appeared in the corpus, focusing mainly on the relatively frequent patterns. The first general observation is that 75% of the occurrences of (wa)-qad are found before the perfective. The high frequency of the occurrence of (wa)-qad with the perfective is not stable throughout the different corpus samples.
In books such as Ɂal-Ɂayyaam, (wa)-qad+perfective accounts for most of the occurrences (98%). In doctoral theses, the compound (wa)-qad+imperfective is more common than (wa)-qad+perfective (54% and 48%, respectively). This may be due to the fact that reporting on the results of scholarly work generally calls for circumspection, thus the use of (wa)-qad with the imperfective to express probability or possibility when explaining some observed phenomena. Similarly, in newspapers qad+imperfective is used in editorials 50% of the time but only 8% of the time in reports. Having noted this, I will not have anything else to say about the use of (wa)-qad with the imperfective and will concentrate on its use with the perfective in the rest of the paper.

3.2.1 Bare qad
Going over all the concordance lines, one notes that the major collocate of qad is the complementizer Ɂanna. The two other complementizers Ɂinna and, to a much lesser extent, Ɂan are also possible in related constructions, as will be explained below. In fact, 39% (about 680 lines) of the occurrences of qad appeared in one of the following typical structures:
(a) (main clause)+ Ɂanna +NP+ qad+ verb in the perfective, that is, a complement clause with an SVO word order introduced by the complementizer Ɂanna. The NP subject of the complement clause may be just a noun as in (4ii, iv), a noun with modifiers as in (4i), or a clitic subject pronoun attached to Ɂanna (4iii). The NP subject is followed by qad, then the verb in the perfective. These constructions can be matched with English “that clauses” where the complementizer “that” corresponds to Ɂanna in the Arabic constructions and qad would occur in the bracketed empty slots in the English translations below. Note also that the complementizer Ɂanna may be preceded by a preposition that can be cliticized, such as bi- in bi-Ɂanna (see 4iv). These prepositions, when they occur, are required by the type of verb in the main clause and are not pertinent in the pattern as co-selections of qad.
(4)
i. …Ɂaslafna Ɂanna [mawḍuuʕa Ɂal-zamaani fi kutubi Ɂal-qudaaama]NP qad ɁuƟira ʕaraḍan
   “…we have previously stated that the question of time in classical books [….] was raised accidentally.”
ii. laakin yabdu Ɂanna [Ɂal-Ɂafʕaa]NP qad ḥassat bi-Ɂal-xaṭari Ɂal-waʃiiki….
   “But it seems that the viper [….] had sensed imminent danger.”
iii. … wa raɁaa Ɂanna -[hu]NP qad Ɂaḍaaʕa maa yakfii min waqtin wa ʒuhdin
   “…and he realized that he [….] had wasted enough time and effort.”
iv. wa Ɂaḥassa Ɂal-ʕamm maḥfuuẓ bi-Ɂanna [Ɂal-laḥẓata]NP qad Ɂiqtarabat…
   “…and uncle Mahfuuz felt that the moment [….] is approaching.”
As illustrated in (5) below, Ɂanna may also be preceded by li- (li-Ɂanna) “because”, ka- (ka-Ɂanna) “as if”, raʁma (raʁma Ɂanna) “despite”, ʁayra (ʁayra Ɂanna), Ɂilla (Ɂilla Ɂanna) “but, however, although”, etc. Ɂanna and whatever precedes it in these constructions also function as a complementizer introducing adverbial clauses that are fully formed sentences having the same pattern with respect to qad as the complement sentences above, that is, … Ɂanna + NP+ qad+ verb in the perfective.
(5)
iii. sa-taraa-ha ka-Ɂal-Ɂaalati waqafat bal sadaɁat li-Ɂanna [muḥarrika-ha]NP qad Ɂuntuziʕa min-ha.
   “You will see her like a machine that stopped working and rusted because its engine […] has been removed.”
iv. … Ɂilla Ɂanna [Ɂan-numuwwa Ɂad-diimuʁraafii]NP qad tamayyaza huwwa Ɂal-Ɂaaxar bi-quwwati-hi Ɂal-haaɁila
   “However, population growth [….] was also extremely important.”
v. raʁma Ɂanna[-humaa , Ɂal-ɁiƟnatayni,]DP qad subiqataa bi-ḥarfi Ɂal-ẓaaɁ Ɂal-mufaxxim
   “Although they [….] were both preceded by the emphatic consonant ẓaaɁ”
vi. wa tuṭʕimu Ɂal-ṣaadira wal-waarida ka-Ɂanna [Ɂallaaha]NP qad Ɂistaxlafa-ka ʕalaa rizqi-himaa
   “You are feeding everyone as if God [….] entrusted you with their livelihood.”
(b) Ɂinna +NP+(adverb, PP)+qad+verb in the perfective2 where Ɂinna introduces a root sentence and can be preceded by the preposition fa (fa-Ɂinna). There are also several occurrences of the same pattern with the conjunction lakinna “but” in the initial position of a root sentence. Note also that, like Ɂanna, Ɂinna introduces embedded sentences that are complements of the verb qaala (a verb of saying). The colligation pattern of qad in these sentences remains the same. The concordance lines in (6) are illustrations of these co-selections of qad. The bracketed empty slots in the English translations show the positions corresponding to Ɂinna and qad.
2
The verb following qad may also be in the imperfective, but as stated earlier, qad+imperfective expressions will not be dealt with in this paper.
(6)
i. Ɂinna [haaδayni Ɂal-qaanuunayni]NP qad Ɂabrazaa la-naa Ɂahammiyyata Ɂal-ʕaamili Ɂaz-zamani
   “[....] these two laws [….] have highlighted for us the importance of the time factor.”
ii. Ɂinna [Ɂal-fursa]NP qad Ɂaʒalluu muluuka-hum Ɂiʒlaala-hum li-ɁlɁaalihati
   “[....] the Persians [….] have venerated their kings the same way they venerated Gods”
iii. wa bi-Ɂal-fiʕli fa-Ɂinna [haaɁulaaɁi]NP qad Ɂistaṭaaʕuu Ɂan yataɁaamaruu ḍidda-hum
   “Indeed [….] these [….] have been able to conspire against them.”
iv. wa lakinna -[hu]NP maʕa δaalika qad ḍaraba li-Ɂal-fataa mawʕidan
   “but, anyhow, he [….] gave an appointment to the boy.”
v. wa yuqaalu Ɂinna -[hu]NP qad ʕammara Ɂarbaʕa miɁati ʕaamin
   “It is said that he [….] had lived four hundred years.”
Co-occurrences of qad with Ɂan are rare in this corpus and totally absent from newspapers, theses, and other modern essays and books. There are about half a dozen concordance lines, most of which come from novels. The typical structure is a verbal complement sentence (VSO order) introduced by Ɂan directly followed by qad, then the verb in the perfective: main clause+ Ɂan +qad+verb in the perfective, as illustrated by the examples in (7) below.
(7)
i. wa Ɂiδa huwa yunbiɁu-humaa bi-Ɂan qad Ɂaana la-humaa Ɂan yusaafiraa
   “and there he was informing them that [….] it was time for them to travel.”
ii. wa min-al-muḥqqaqi Ɂayḍan Ɂan qad kaana la-hum fi-Ɂal-rubʕi zumalaaɁun
   “It is also certain that [….] they have classmates in the area.”
iii. wa Ɂaʃʕara-hu bi-Ɂan qad Ɂutiiḥa la-hu Ɂan yaʒlisa
   “He let him know that [….] he was permitted to sit down.”
iv. wa xuyyila Ɂilaa haδihi Ɂal-Ɂumm Ɂal-taʕiisati Ɂan qad samiʕa Ɂallaahu la-haa wa li-zawʒi-ha
   “this miserable mother had the impression that [….] God has heard her prayers and those of her husband.”
The interesting observation to be made here is that while qad co-selects Ɂanna/Ɂinna 39% of the time in the corpus as a whole, this pattern accounts for 50% of the occurrences of qad+perfective in journalistic texts. In fact, as will be shown below, the remaining 50% of the occurrences of qad in journalistic style are in the verbal complex kaana…qad, which leaves practically no room for any other possibility. One may wonder whether qad in these constructions is intended for tense, aspect or modality, as has been claimed in the literature, or is rather part of a pattern where its use is almost automatically triggered by other words. Given the high frequency of these patterns, especially in Modern Standard Arabic as used in periodicals, I believe that collocations (and colligations) are being formed with qad, with limited choice for the writer as to the phrasal context in which it should occur. Being repeatedly used in similar patterns, qad may be losing its role as an aspectual or assertive particle to become a regularly co-selected item in a phrasal expression.
The second major collocate of qad is kaana in the verbal complex kaan(…)qad+perfective, which accounts for 32% of all the occurrences of qad (580 concordance lines) in the general corpus, but the rates of occurrence are not uniform across all types of texts. These constructions are relatively rare in classical texts such as those of Ɂal-jaaḥiẓ and Ɂat-tawḥiidi (13% and 8%, respectively) and very frequent in modern writing such as recent doctoral theses and newspapers (64% and 50%, respectively). This trend may very well indicate that kaan(…)qad+perfective is gaining the status of a collocation in Modern Standard Arabic, irrespective of whether qad has retained its aspectual or modal functions. In these collocations, qad may directly follow kaana, or the two words may be separated by the NP subject: kaana+NP+qad as in (8i), (8ii), and (8iii), or NP+kaana+qad as in (8iv) and (8v). The latter may be nested in a complement clause introduced by Ɂanna/Ɂinna. Note that the lexical subject NP may follow the verbal complex and other material (8vi) or be absent from the immediate co-text (8vii). The examples below illustrate these structures.
(8)
The examples below illustrate these structures.
(8)
i. wa kaana [Ɂal-kongres Ɂal-Ɂamriiki]NP qad haddada bi-waqfi Ɂal-musaaʕadaati Ɂallati tamnaḥu-ha Ɂal-wilaayaatu…
“The American Congress had threatened to stop the aid provided by the United States.”
ii. wa kaana [Ɂal-druuz]NP qad Ɵaaruu ʕala firansa sanata 1925
“The Druuz revolted against France in 1925.”
CORPUS-BASED LINGUISTIC ANALYSES
iii. wa kaana [lubnaan]NP qad ṭaalaba maʒlisa Ɂal-Ɂamn Ɂal-dawli Ɂal-yawm bi-Ɂtixaaδi ɁiʒraaɁaatin
“Lebanon asked the (UN) Security Council today to take measures”
iv. Ɂiδ Ɂanna [Ɂal-mufaawaḍaat]NP kaanat qad badaɁat Ɂawwalan maʕa raʒuli Ɂal-Ɂaʕmaali Ɂal-Ɂustraali
“Given that negotiations had initially started with the Australian businessman.”
v. fa-Ɂinna[hu]NP yakuunu qad ḥadaƟa fiʕlan fi ɁaƟnaaɁi δaalika Ɂal-waqti
“It must have then really happened during that time.”
vi. Ɂiδa kaana qad sabaqa-hu Ɂilaa Ɂal-kalaami fi-l-mawḍuuʕi [xuṭabaaɁun]NP.
“...if (other) speakers had preceded him in talking about the subject”
vii. wa kaana qad ʕaqada Ɂawwala Ɂiʒtimaaʕin la-hu fi Ɂayyaar (maayu) Ɂal-maaḍi
“It held its first meeting last May.”
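The kaana(…)qad frame illustrated in (8) lends itself to a simple concordance count. The following Python sketch is only an illustration under stated assumptions, not the procedure used for this study: the function names, the short list of kaana forms, and the four-token look-back window are all hypothetical, and the input is assumed to be whitespace-tokenised transliterated lines, each paired with a text-type label.

```python
from collections import defaultdict

# Hypothetical, incomplete list of inflected forms of kaana (an assumption).
KAANA_FORMS = {"kaana", "kaanat", "kaanuu", "yakuunu", "yakuuna"}
WINDOW = 4  # assumed maximum number of tokens allowed between kaana and qad


def has_kaana_qad(tokens, window=WINDOW):
    """Return True if some bare 'qad' is preceded by a kaana form,
    with at most `window` tokens (e.g. the subject NP) in between."""
    for i, tok in enumerate(tokens):
        if tok == "qad":
            left = tokens[max(0, i - window - 1):i]
            if any(t in KAANA_FORMS for t in left):
                return True
    return False


def kaana_qad_share(lines):
    """Map each text type to the share of its bare-qad lines
    that show the kaana(...)qad frame."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for text_type, line in lines:
        tokens = line.split()
        if "qad" not in tokens:  # lines with only wa-qad are skipped here
            continue
        totals[text_type] += 1
        if has_kaana_qad(tokens):
            hits[text_type] += 1
    return {t: hits[t] / totals[t] for t in totals}


# Toy input: shortened versions of examples from this section,
# with invented text-type labels (not corpus figures).
sample = [
    ("newspaper", "wa kaana lubnaan qad ṭaalaba maʒlisa Ɂal-Ɂamn"),
    ("newspaper", "Ɂiδ Ɂanna Ɂal-mufaawaḍaat kaanat qad badaɁat Ɂawwalan"),
    ("essay", "laʕalla Ɂal-qaariɁa qad fahima baʕdu"),
    ("essay", "wa kaana qad ʕaqada Ɂawwala Ɂiʒtimaaʕin"),
]
print(kaana_qad_share(sample))
```

A fixed token window is a crude stand-in for "separated by the NP subject"; a real study would need proper tokenisation and a fuller inventory of kaana forms.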
A much less frequent collocation, in which qad co-selects kaana, is one where kaana immediately follows qad. The typical patterns are either qad+kaana+verb (in the imperfective) such as: ḥatta Ɂintahaa Ɂilaa ʁadiiri maaɁin kaƟiiri Ɂal-ḍafaadiʕi qad kaana yaɁtii-hi … “until he reached a pond full of frogs where he used to go…”
or qad+kaana+adjective (a predicate complement in general) such as in: wa yabduu Ɂanna Ɂabaa tammaam qad kaana waaʕiyan bi-δaalika “It seems that Abaa Tammaam was aware of that.”
These constructions account for only 3.5% of the occurrences of qad, but nonetheless constitute clear cases of collocation. Note that these patterns would be relatively more frequent if one considered colligations; that is, not the exact lexical items themselves but their word class. In these patterns, qad is often followed by one of the “sisters” of kaana, such as Ɂaṣbaḥa, ṣaara, baata, etc., which are inchoative predicates meaning roughly “become”, as well as by other verbs. It is also interesting to note that qad kaana constructions are strikingly absent from recent newspaper texts and doctoral theses in
this corpus. These types of texts, however, exhibit very high frequencies of occurrences of collocations where qad co-selects kaana in the imperfective to indicate probability, but as stated above these patterns are not dealt with in this paper. The corpus also comprises a range of collocations involving qad, most of which come from literary sources, and some are rather obsolete constructions. The most frequent of these collocations (4% of the total occurrences of qad) is when qad starts (or is part of) a quotation following the verb qaal “say” as in: wa qaala: “qad qabiltu δaalika Ɂayyuha Ɂal-Ɂamiir.” “He said: ‘I accept, your Royal Highness’.”
The other collocations are much less frequent, but each of them appears at least ten times in the corpus, with (f) and (h) below found only in classical literature.
a. (fa)(wa)+Ɂiδa (bi)+NP+qad+verb “(and then) and all of a sudden NP”
b. wa law+qad+verb “if (only) NP”
c. kam min+NP+qad+verb “how many NP”
d. Ɂawa laysa+qad+verb “didn’t (had not, was it not the case that) NP”
e. rubba+NP+qad+verb “many a NP”
f. falaa yazaalu+qad+verb “NP still (yet, didn’t stop)”
g. ma daama(NP)+qad+verb “since (NP)”
h. qad+wallaahi+verb “qad+by God verb” (this is the only construction where qad may be separated from the following verb)
Before turning to some syntactic aspects pertaining to the distribution of qad, I would like to make one further comment on what seems to me to be a change in the use of this particle in modern writing. It is argued that one of the major functions of qad is to denote a completed action. The Hans Wehr Arabic-English dictionary gives the word “already” as a possible translation for “qad”. There is also the word baʕdu which means “already, yet” in Arabic. For “purists”, using qad and baʕdu together would be a pleonasm since one of them is redundant as it adds no information not already provided by the other. Surprisingly, the corpus contains 21 instances of the occurrences of these two words together in the same sentence. These patterns come
from modern texts, including scholarly essays. The examples in (9) below illustrate these would-be “pleonasms”.
(9)
i. laʕalla Ɂal-qaariɁa qad fahima baʕdu Ɂannana lam nudrik…
“The reader may have already understood that we have not realized”
ii. ḥatta yakuuna Ɂal-lisaanu qad tahayyaɁa baʕdu li-l-ḥarakati Ɂal-muwaaliyati.
“...until the tongue is already in a position for the following vowel.”
iii. wa lam yakun baʕdu qad balaʁa sinna Ɂar-ruʒuulati
“and he had not yet reached adulthood.”
iv. kaana taariixu Ɂal-miitaafiizika qad ḥarrara baʕdu Ɂat-tafkiira Ɂal-falsafii
“The history of metaphysics had already liberated philosophical thinking.”
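Co-occurrences such as those in (9) can be located with a very small search. The sketch below is a hypothetical illustration rather than the tool used for the study: it assumes one transliterated sentence per string and whitespace tokenisation, and it treats the two words as co-occurring whatever their order, since baʕdu precedes qad in (9iii).

```python
def cooccur(tokens, first="qad", second="baʕdu"):
    """True if both words occur among the tokens of one sentence, in any order."""
    return first in tokens and second in tokens


def count_cooccurrences(sentences):
    """Number of sentences in which qad and baʕdu occur together."""
    return sum(1 for s in sentences if cooccur(s.split()))


# Toy input adapted from (9i), (9iii) and (8vii); not corpus counts.
sentences = [
    "laʕalla Ɂal-qaariɁa qad fahima baʕdu",
    "wa lam yakun baʕdu qad balaʁa sinna Ɂar-ruʒuulati",
    "wa kaana qad ʕaqada Ɂawwala Ɂiʒtimaaʕin",
]
print(count_cooccurrences(sentences))
```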
When confronted with these examples, a specialist of Arabic retorted “people don’t know Arabic anymore”. That may not be the only reason, however. Factors affecting language change, such as transfer from English and French, are causing the emergence of a new usage where qad may be in the process of forming a collocation with baʕdu to achieve the function it used to achieve on its own. The remainder of the description of bare qad will be devoted to occurrences where I believe its presence is motivated by syntactic structure regardless of any additional underlying assertive, temporal or aspectual interpretations. Consider the examples in (10) below where the bracketed strings in English are the translations of what immediately follows qad in the original Arabic sentences. (10)
i. falaa yazaalu qad ʁaṣṣa…..
“He kept on [choking…]”
ii. haaδa Ɂal-ʃaaʕiru Ɂallaδi badaa fi naẓari Ɂal-Ɂaxbaariyyiina qad ẓafara bi-ma ɁaxṭaɁa-hu ʁayru-hu.
“This poet, who according to historians, seemed [to have succeeded where others have failed].”
iii. fa-lamma naẓara Ɂilaa Ɂar-raʒuli qad inƟanaa raaʒiʕan…
“When he saw the man [go back]. (when he noted [that the man was going back]).”
iv. fa-lamma raɁaa faraḥa-hu qad Ɂaḍʕafa qaal Ɂinna faraḥa-ka…
“When he saw (felt) that his joy had increased, (when he saw his joy [increase]) he said: ‘your joy …”
v. fa-waʒadtu-humaa qad faxuraa ʕalayya bi-maa ḥabaa-humaa bi-hi …
“I found them [boasting about the favor they obtained]… or I found that they were boasting…”
vi. yabdu lii Ɂal-baabu qad qudda min xaʃabin qadiimin
“The door seems to me [to have been carved out of old wood], or (It seems to me that the door….)”
vii. Ɂamma Ɂal-marɁatu fa-hiya ʒaalisatun ʕala-r-rimaali qad rafaʕat Ɂiḥda rukbatay-ha wa …
“As to the woman, she was sitting on the sand [bending up one of her knees and…]”
viii. kuntu kaƟiiran maa Ɂaḥussu bi-Ɂaxii Ɂal-Ɂaʁbar qad Ɂaxraʒa raɁsa-hu min-al-faʒwati
“I often sensed my dusty (or proper N) brother [stick his head out of the opening].”
ix. Ɂila daaɁirati-l-Ɂistiʃraafi Ɂal-firansii Ɂallati tabduu qad faqadat maa kaana la-haa…
“...to the French supervising body which seems [to have lost what it had]…”
x. fa-ḥasiba nafsa-hu qad Ɂaḍḥaa fi Ɂaḥqari qaryatin min quraa..
“He believed himself [to be in one of the most despicable villages]…”
xi. wa-qtaraba kandiid min-humaa faraɁaa δaalika Ɂal-muḥsina Ɂilay-hi qad ṭafaa li-burhatin Ɵumma ʁamara-hu Ɂal-yammu…
“Candide came closer to them and saw the one who was good to him [float for a short period then sink in the sea].”
xii. fa-taʒidu baʕḍa tilka Ɂal-masaaɁila Ɂal-dustuuriyya qad ʕaalaʒuu-ha min baabin yataʕallaqu bi-Ɂal-qaḍaaɁ
“You find [that they have treated some of those constitutional issues from a legal perspective].”
xiii. yuɁƟiruuna Ɂal-ʒuluusa ʕalaa haaδihi Ɂal-ḥuṣuri wa-l-Ɂabsiṭati qad Ɂulqiyat ʕala-l-Ɂarḍi…
“They like to sit on these mats and carpets [(which) were laid on the floor…]”
Sentence (10i), from classical literature, represents a pattern that is absent from modern writing in this corpus. The collocation laa yazaalu qad is present in texts from the same period (the Middle Ages). This sentence would be ill-formed if qad were removed. Sentence (10ii) is also ungrammatical, in my opinion, if qad is deleted. Without qad, sentence (10iii) is ambiguous: a) either the subject of the verb ɁinƟanaa “go back” is the NP Ɂar-raʒul “the man”, in which case ɁinƟanaa raaʒiʕan is a circumstantial clause (an adjunct), or b) the subject of the verb ɁinƟanaa is the same as that of the verb naẓara “saw”. The sentence
can be disambiguated by changing the inflection on the first verb naẓara, for example, naẓarna “we saw”; nonetheless, the sentence is considered to “read much better” with qad introducing the circumstantial clause.3 Syntactically then, qad seems to be similar to the complementizer “that” in English, and sentence (10iii) can equally be translated as “when he noted that the man was going back…” The same observations can be made with respect to the role of qad in many of the sentences in (10). In fact, in most of these cases qad is used to introduce a complement clause and is found in constructions where in English, for example, one finds:
a) “That” before a finite complement sentence,
b) A non-finite ordinary clause with a PRO subject, and
c) A non-finite exceptional clause with a cognitive verb. (Note the presence of verbs such as waʒada “find”, ḥasiba “consider”, raɁaa “see, feel”, etc.).
There are also frequent occurrences of qad in clauses following the verb badaa “seem”. Arguing for the assertive properties of qad, Bahloul (1994) suggested that it is less likely to appear in constructions following yabdu “it seems” because this verb is far from being assertive (p. 122). Data from this corpus, however, show that badaa/yabduu...qad constructions are very common, as illustrated in (10ii), (10vi) and (10ix). This constitutes further evidence, in my view, that qad is often needed for clausal syntax regardless of pragmatic considerations. When asked about the role of qad in these constructions, some teachers of Arabic say that it is there for linking. One argument for my intuition that the main role of qad in these patterns is to introduce clauses is the fact that it may often be easily replaced with the complementizer Ɂanna. In some of these constructions, Ɂanna may occupy the exact position of qad; in others a change is needed at the level of clausal structure. Examples (10ii), (10vi), (10v), (10ix) and (10xii) are repeated below as (11i), (11ii), (11iii), (11iv), and (11v), with Ɂanna instead of qad. Note that in (11i), (11iii), and (11iv) Ɂanna occupies the same position as qad in the sentence and that there is no need to change the translation.4
3 According to the judgments of some colleagues who teach Arabic language and literature.
(11)
i. haaδa Ɂal-ʃaaʕiru Ɂallaδi badaa fi naẓari Ɂal-Ɂaxbaariyyiina Ɂanna-hu ẓafara bi-ma ɁaxṭaɁa-hu ʁayru-hu.
“This poet, who according to historians, seemed to have succeeded where others have failed.”
ii. yabdu lii Ɂanna Ɂal-baaba qudda min xaʃabin qadiimin
“It seems to me that the door has been carved out of old wood.”
iii. fa-waʒadtu Ɂanna-humaa faxuraa ʕalayya bi-maa ḥabaa-humaa bi-hi …
“I found them boasting about the favor they obtained… or I found that they were boasting….”
iv. Ɂila daaɁirati Ɂal-Ɂistiʃraafi Ɂal-firansii Ɂallati tabduu Ɂanna-ha faqadat maa kaana la-haa…
“...to the French supervising body which seems to have lost what it had….”
v. fa-taʒidu Ɂanna-hum ʕaalaʒuu baʕḍa tilka Ɂal-masaaɁili Ɂal-dustuuriyya min baabin yataʕallaqu bi-l-qaḍaaɁ
“You find that they have treated some of those constitutional issues from a legal perspective”
The “logical connector” function of qad is not limited, in my view, to replacing the complementizer Ɂanna. It is often required by the syntax when a modifying construction is introduced or a fresh start is needed. In example (10xiii), qad is followed by an attribute and can perfectly be replaced by the relative pronoun Ɂallati. In example (10vii), what follows qad is an adverbial clause. In various other constructions qad, usually preceded by a comma, is found at the start of a new sentence that is related to a previous one as in the following example: kaana bilgar ʕimlaaqan tabluʁu qaamatu-hu Ɂal-sittata Ɂaqdaam, qad raɁaa-ni Ɂafqudu waʕyi ʕalaa haaδa-l-maʃhadi. “‘Bilgar,’ who was a six foot giant, saw me faint in front of this scene.”
Similar occurrences of qad are numerous in the corpus, and most of them seem to be there to facilitate transition from one sentence to the other. The great majority of these occurrences of qad are best translated either with a subject pronoun co-referential with an NP in the previous sentence or by a relative pronoun in a non-restrictive relative clause. Of course, qad is often intended for tense/aspect and modality purposes, and there are many instances of that usage in the corpus. A
clear example is (12i) below where the verb following qad expresses an activity that is already completed and where qad seems to be intended for both assertive and aspectual purposes. It is interesting to compare (12i) to (12ii) where the latter expresses some sort of a habitual activity or general truth and where the time of predication has no bearing on the proposition expressing the general truth. In (12ii), the speaker is trying to explain that the process of forming a word such as Ɵawb “garment” resembles the process of weaving it. Note that qad can be deleted in this sentence without affecting grammaticality, but note also that it is part of a Ɂanna +NP+qad+verb in the perfective collocation/colligation in which there is a co-selection of specific lexical items and grammatical classes. (12)
i. Ɂam yuʕaddu Ɂal-ḥafiifu min-al-ḥarfi wa-l-ḥarfu qad Ɂiktamala bi-tamaami-l-ḥabsi wal-Ɂinfiʒaari
“Or should aspiration be considered part of the consonant when the consonant has (already) been completed after closure and release?”
ii. Ɂamaa taʕrifu yaa Ɂabaa biʃr Ɂanna Ɂal-kalaama Ɂismun waaqiʕun ʕalaa ɁaʃyaaɁin qad ɁiɁtalafat bi-maraatiba, wa-taquulu bi-l-maƟali: haaδa Ɵawbun…
“Don’t you know, Abaa Bichr, that language (speech) is naming things that are gradually composed; you say, for example, ‘this is a garment…’”
In summary, there is no room in this paper to provide all the occurrences of bare qad, but the examples given are fairly representative and support the following observations:
a) A great many of the occurrences of qad are in collocations of the type kaan(…)qad+perfective, Ɂanna+NP+qad+verb in the perfective, or Ɂinna+NP+(adverb, PP)+qad+verb in the perfective. These collocations account for practically all the occurrences of qad in newspapers and scholarly essays in present-day usage. Other collocations involving qad exist as well but are less frequent.
b) The particle qad is not necessarily employed for tense, aspect or modality. It is often used as a complementizer or a subordinator to facilitate sentence embedding or for smooth transition between sentences.
3.2.2 wa-qad
The situation is a lot less complex with wa-qad, as it is used predominantly (84% of the cases) in sentence-initial position. Sentence-initial position does not necessarily mean after a comma or a period, even in texts where punctuation is provided. In newspaper articles this is practically the only context where wa-qad is used. Another context where we find wa-qad is when it functions as a subordinator, generally introducing an adverbial (circumstantial) clause (13% of the cases). The remaining few occurrences are either wa-qad kaana collocations or instances where the context provided by the concordance line does not permit elucidation of its exact function. Some examples of wa-qad in sentence-initial position are given in (13) below. In examples (13iii), (13iv), and (13v), where wa-qad does not follow a period, its position in the English translations corresponds to the underlined conjunction “and”. The rest of the examples (13vi to 13x) illustrate collocations with wa-qad. haaδa wa-qad “besides” and xaaṣṣatan (xuṣuuṣan) wa-qad “especially that” are frequent in present-day Arabic, especially in newspapers. Ɂammaa wa-qad “now that” is relatively less frequent. Ɂillaa wa-qad, which has no obvious translation out of context, is only attested in classical literature in this corpus.
(13)
i. wa-qad taɁassasat Ɂal-raabiṭatu munδu sittati Ɂaʕwaamin wa taḍummu fi ʕuḍwiyyati-haa…
“The league was founded six years ago and includes in its membership…”
ii. wa-qad ɁanʃaɁat tuunis fi-l-Ɂaʃhuri Ɂal-maaḍiyyati maʒlisan Ɂaʕlaa….
“Tunisia has set up, in the last few months, a supreme council…”
iii. humaa fi-l-ḥaqiiqati buʕdun waaḥidun, wa-qad ẓahara haδa Ɂal-mafhuum maʕa naẓariyyati Ɂan-nisbiyyati Ɂal-muḥaddada
“They are actually one dimension, and this concept appeared with special relativity theory.”
iv. Ɵumma ḥallalnaa-ha kaamilatan wa-qad makkanatna Ɂal-Ɂistintaaʒaatu min Ɂidraaki Ɵaʁaraatin naaʒimatin…
“We then analyzed all of them and the findings allowed us to note some shortcomings resulting from…”
v. wa ʃaaʕa ṣiitu-hu fi Ɂuruuba wa fi Ɂamariika wa-l-ṣiin wa-qad ʕurifa xaaṣṣatan bi-kitaabi-hi Ɂal-ʃahiir …
“He became famous in Europe, America and China and was especially known for his famous book…”
vi. haaδa wa-qad mazzaqa Ɂal-qaδδaafi nusxatan min-al-qaanuuni Ɂal-ʒadiidi
“Besides, Qadhdhaafi has torn out a copy of the new legislation.”
vii. Ɂammaa wa-qad Ɂaṣrartumaa ʕalaa Ɂal-raḥiili, fa-Ɂinni sa-Ɂaamuru…
“Now that you have insisted on leaving, I will order (instruct)….”
viii. wa natamanna la-haa naʃran qariiban, xaaṣṣatan wa-qad raɁayna Ɂal-baaḥiƟiina fi-l-ṣawtiyyaati Ɂal-ʕarabiyyati…
“We hope it will soon be published, especially that we saw researchers in Arabic phonetics…”
ix. … manaaṭiqa raʁma Ɂittissaaʕi-ha ẓallat maḥruumatan, xuṣuuṣan wa-qad faʃalat firqatu Ɂafaaquṣ Ɂal-qaarrati fi ɁadaaɁi haaδihi Ɂal-muhimmati.
“…areas which, despite the importance of their size, continued to be deprived, especially after the permanent Sfax (theatrical) group had failed in fulfilling this mission.”
x. wa maa wahaba Ɂallaahu Ɂal-ʕaqla li-Ɂaḥadin Ɂillaa wa-qad ʕaraḍa-hu li-Ɂal-naʒaati wa laa ḥalaa-hu bi-l-ʕilmi Ɂilla wa-qad daʕaa-hu Ɂila Ɂal-ʕamali bi-ʃaraaɁiṭi-hi
“When God grants reason to someone then he will certainly lead him to safety and when he graces him with science, he is necessarily requiring him to obey its methods.”
Let us now consider the occurrences of wa-qad in (14) below, where it is not in sentence-initial position and where the underlined words in the English sentences are possible translations for wa-qad. The first observation is that wa-qad cannot be deleted in any of these examples without reorganizing the sentences. Qad, but not wa, may perhaps be deleted in (14vii). Second, what follows wa-qad in these constructions, especially in (14i) to (14vii), is an adverbial clause (a circumstantial clause, or ḥaal in traditional Arabic grammar terminology), which is a fully formed sentence but has the characteristics of complement sentences in the sense that it can be neither interrogative nor imperative. Thus, here too wa-qad functions as some sort of complementizer that can be translated in English by “when, while, after, before” or possibly a relative pronoun ((14vi) and (14vii)). In (14viii) and (14ix), the expressions set off by commas are parenthetical, but wa-qad cannot be removed. In (14x), Ɂammaa wa-qad raɁaytu-ki fa-… includes the collocation Ɂammaa wa-qad+perfective mentioned above, in which wa-qad is an obligatory constituent.
(14)
i. maa ɁakƟara maa kaana yastamiʕu li-l-qaariɁati wa-qad ḥamala Ɂamiinata bayna δiraaʕay-hi
“He often used to listen to the reader (while) holding Amina in his arms.”
ii. Ɵumma qaal li-Ɂaxii-hi wa-qad waḍaʕa yada-hu ʕalaa katifay-hi
“...then he said to his brother while putting his hand on his shoulder…”
iii. wa Ɂummi fi rukni-ha tastamiʕu Ɂilaa Ɂal-Ɂaxbaari yarwuuna-haa wa-qad tahallalat Ɂasaariiru waʒhi-haa
“My mother, in her corner, is listening to the news being read with her face lines shining (bright facial expression).”
iv. wa kayfa tamʃi wa-qad ʒaʕalta fi baṭni-ka maa yaḥmilu-hu ʕiʃruuna raʒulin
“How can you walk when you had put in your stomach what needs twenty men to be carried.”
v. wa-Ɂittaʒaha Ɂal-ʒamiiʕu Ɂila “Ɂat-tribunaal” wa-qad Ɂaḥaaṭat bi-hi quwwaatu Ɂal-ʒandarma wa-Ɂal-buuliis
“They all headed for the court which (while it) was surrounded by national guard and police forces.”
vi. Ɂaktubu naṣṣan ʕalaa lisaani Ɂal-Ɵuʕbaani wa-qad Ɂadxala-hu Ɂal-ʕammu maḥfuuẓ ʒiraaba-hu
“I am writing an essay in the name of the snake which uncle Mahfuuz has put in his bag.”
vii. wa humaa Ɂaxawaaya Ɂal-Ɂakbaru minnii diib wa-qad Ɂaṣbaḥa fiimaa baʕd Ɂadiib wa haykal
“They are my two older brothers Diib, who (and he) later became Adiib, and Haykal.”
viii. Ɵumma taqaddamat Ɂilay-hi waḥda-ha, wa-qad ʒaaʕa, fa-ʁaḍiba wa qaama min makaani-hi naḥwa-ha fa-qaala la-ha
“Then she approached him by herself, and he was hungry, so he became angry and got up to walk towards her and said to her:”
ix. fa-Ɂal-laahu ʕinda-hum manaḥa Ɂal-Ɂinsaana, wa-qad xalaqa-hu, ʕaqaaran yaḍmanu la-hu ʕaafiyyatan daaɁimatan
“For them, God provided man, when (given that) he created him, with a drug that will guarantee him good health for life.”
x. laqad Ɂaḥbabtu Ɂal-Ɂaanisa koniikand Ɂammaa wa-qad raɁaytu-ki fa-Ɂinnii Ɂaxʃaa Ɂallaa Ɂuhibba-haa baʕdu
“I loved Miss Konikand but after seeing you (now that I have seen you) I am afraid I don’t love her anymore.”
These structures then illustrate the fact that wa-qad cannot be attributed the same function in sentence-initial position and in
embedding contexts. In my opinion, it is highly unlikely in Modern Standard Arabic that sentence-initial wa-qad+perfective serves any purpose other than being a filler that a writer resorts to almost automatically in order to start a sentence. Inside a sentence, however, the particle plays a major role, namely in making sentence embedding possible. One of my colleagues, who has been teaching Arabic lexicology and terminology at the university for many years, was asked to translate a book from French into Arabic, which he did to the great satisfaction of those who paid for the job, except for one small detail. He was kindly requested to go over his translation again with a view to removing some of the too many (wa)-qads.
There are, of course, various contexts where the function of wa-qad is mainly temporal, aspectual or assertive. In (15i) and (15ii) below, qad may be deleted without affecting grammaticality, provided wa is left. In that case, sentence (15i) loses the emphasis provided by qad (“did die”), and (15ii) also loses emphasis; at the same time, the message that the activity of emptying the plate had already been completed is no longer explicitly expressed.
(15)
i. li-δalika ʕazama ʕalaa taḥṭiimi qafaṣi-hi Ɂal-ṣaxriyyi Ɂaw yamuut wa-qad maata.
“For that reason, he decided to destroy his stone cage or die, and he did die.”
ii. wa Ɂinna-maa hiyya laḥaẓaatun laa tataʒaawazu rubʕa Ɂal-saaʕati wa-qad fariʁa ma kaana fi-Ɂal-ṭabaqi
“They were a few moments, no more than a quarter of an hour, before the plate was emptied.”
4. Conclusion
When I started this investigation, I had no preset ideas on the function and use of the lexical items I decided to examine other than my high school Arabic and whatever intuition may have resulted from that training a long time ago. I have learned, using the British National Corpus,4 and later from my own work on Arabic and that of others on English, that one’s intuition and the description of a language found in dictionaries and grammar books (which are also mainly based on introspection) do not provide the whole story. Extensive corpus research can help update language descriptions, since language is continually changing, and on many occasions corrects the intuition of the unbiased researcher by providing key information on the nature, function and structure of language. As such, corpus-based analysis is not a linguistic theory, but one of its major empirical tools.
4 “How to Use Corpora in Lexicography”, a workshop organized by John Sinclair and others in the Tuscan Word Center (Italy) in October 2000.
Having examined all the concordance lines for the two words Ɂawʃaka and (wa)-qad, I have learned, among other things, that:
(1) Both of them occur mainly as parts of typical constructions, whether collocations or colligations, where the choice of lexical items and grammatical class is constrained.
(2) Ɂawʃaka has a preference for words with undesirable connotations and a semantic prosody implying the coming about of a situation that is not welcome or the end of a process leading to an undesirable state of affairs.
(3) Bare qad occurs mainly in two major collocations, especially in present Modern Standard Arabic, and may also be used elsewhere as a complementizer introducing complement sentences.
(4) wa-qad, if not used to start a new sentence, may also function as a complementizer introducing adverbial clauses.
(5) Although (wa)-qad is described in the literature I am aware of as having temporal, aspectual or assertive functions, the corpus data show that it is often either confined automatically to some position in the sentence, in a collocation/colligation, or required by the syntax for clause embedding.
(6) Some of the meanings and functions of these words given by dictionaries are not found in this corpus.
Most of these observations are not available in dictionaries or, to my limited knowledge, in syntactic analyses.
REFERENCES
Bahloul, Maher. 1994. The Syntax and Semantics of Taxis, Aspect, Tense and Modality in Standard Arabic. Ithaca: Cornell University.
Ghazali, Salem & Abdelfattah Brahem. 2001. “Dictionary Definitions and Corpus-based Evidence in Modern Standard Arabic”. Arabic NLP Workshop, ACL/EACL, Toulouse.
Hanks, Patrick. 2000. “Immediate Context Analysis: Distinguishing meanings by studying usage”. The Tuscan Word Center Workshop on Lexicography.
Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
_____. 1996. “The Search for Units of Meaning”. TEXTUS 9.1, 75-106.
_____. 1998. “The Lexical Item”. Contrastive Lexical Semantics ed. by E. Weigand. Amsterdam: John Benjamins.
_____. 1999. “A Way with Common Words”. Out of Corpora ed. by H. Hasselgard & S. Oksefjell. Amsterdam and Atlanta: Rodopi.
Tognini Bonelli, Elena. 2000. “Functionally Complete Units of Meaning across English and Italian: Towards a corpus-driven approach”. Lexis in Contrast ed. by B. Granger & B. Altenberg. Amsterdam and Philadelphia: John Benjamins.
Wehr, Hans. 1976. A Dictionary of Modern Written Arabic. Ed. by J. Milton Cowan. Ithaca: Spoken Language Services.
LEARNING ARABIC MORPHOLOGY USING STATISTICAL CONSTRAINT-SATISFACTION MODELS1
Paul Rodrigues & Damir Ćavar
Indiana University
1. Introduction
The morphology of Arabic has been a difficult problem for unsupervised morphological analysis systems. Typical solutions to the analysis of Semitic morphology involve rules and grammar machines that, by necessity, are nearly as complicated as the morphology they are trying to discover. Furthermore, most of these systems incorporate fixed lexicons that would require many hours of labor, broad knowledge of Arabic, and lexicographic experience to replicate. Additionally, many of these solutions incorporate linguistic knowledge, and are of little theoretical interest.
Arabic words are constructed by a root and pattern-based morphological system, where the root represents a semantic field and the pattern represents syntactic information, such as voice, transitivity, or intensity. There are over 5,000 Arabic roots, which can be 3, 4, or 5 characters in length, with the shorter roots being the more common. The 3, 4, and 5 character roots each have different pattern systems. For example, McCarthy (1979) contains a table showing 72 patterns for triliteral roots, and 24 patterns for quadriliteral roots.
Root morphemes occur with varying degrees of regularity. Sound roots are the most perfect, with the three radicals of the root appearing in the surface form of the word. Doubly weak roots, in which only one radical can be found in the word, are the least perfect (Mace 1998:26103).
1 The authors would like to thank Stuart Davis for pointing us to the phonological literature mentioned in this paper, Robert F. Port and Katherine Tippetts for their comments on presenting the results, and an anonymous reviewer.
Arabic also has concatenative morphology. Particles such as the definite article (al), the conjunction (wa), or pronouns can be attached to the stem, as can a morpheme representing grammatical gender. Concatenative morphemes that represent natural gender, person, and number exist. Additionally, case endings such as nominative (u), accusative (a), and genitive (i) may appear attached to nouns in the formal language. Reduplication occurs, but is not reported to be a productive process. McCarthy (1979) discusses several examples, such as waswas “whisper” and mishmish “apricot”.
There has been a controversy over the past several years as to the status of the root as a morpheme. Aphasia studies (Prunet et al. 2000) and hypocoristic analyses (Davis & Zawaydeh 2001) have shown evidence for the root being considered a morpheme, while the work of others, such as McOmber (1995) and Ratcliffe (1997), has shown that words are derived from a stem morpheme. We would like to point out that root identification is still necessary for lexicography and information retrieval, regardless of its morphemic status in the speaker’s lexicon.
The model we propose is statistical and constraint-based. It takes a segmental approach to Arabic morphological parsing, in accordance with current linguistic theory. Learning occurs incrementally, and we adapt our grammar with each new word. We track the accuracy of our algorithm at each word, allowing us to see how well the algorithm learns. Though bootstrapping with a dictionary would most certainly aid the algorithm, we include none.
2. Prior Work
Most of the work in Arabic morphological parsing has used the finite-state approach. Beesley & Karttunen (2000) and Beesley (1996) describe some of the solutions spearheaded by their team at XEROX. The systems described are extremely complicated machines, requiring many hand-coded rules.
Though these papers are clear in theory, with clear explanation of architecture, we are never shown quantitative evaluations on Arabic data.
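As background for why analyzers of this kind grow complicated, the root-and-pattern system described in the introduction can be made concrete with a small sketch. It is purely illustrative: the root k-t-b and the three templates are standard textbook examples rather than data from this paper, and the `C` slot notation and the function name `interdigitate` are assumptions introduced here.

```python
def interdigitate(root, pattern, slot="C"):
    """Fill the consonant slots of `pattern` with the radicals of `root`,
    left to right. Both the slot symbol and this representation of a
    pattern are illustrative assumptions."""
    if pattern.count(slot) != len(root):
        raise ValueError("pattern must contain exactly one slot per radical")
    radicals = iter(root)
    return "".join(next(radicals) if ch == slot else ch for ch in pattern)


# Nonconcatenative step: one root, several patterns.
for pattern in ("CaCaCa", "CaaCiC", "maCCuuC"):
    print(pattern, "->", interdigitate("ktb", pattern))

# Concatenative step: a particle such as the definite article is then
# simply attached to the resulting stem.
print("al" + interdigitate("ktb", "CaaCiC"))
```

A morphological parser has to run this process in reverse: given only the surface string, it must decide which characters are radicals and which belong to the pattern or to attached particles.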
There have been several successful approaches using dictionaries. One such approach, which combines a dictionary with statistics, is Sebawai (Darwish 2002). Sebawai uses a training set of word-root pairs in order to bootstrap the learning of the root, and to construct a list of prefixes and suffixes. An additional dictionary, a small list of particles, was supplied to the parser. Sebawai reached 92.7% precision; no recall was reported.
One more recent approach is the Buckwalter Arabic Morphological Analyzer (Buckwalter 2002). This is an extremely accurate morphological parser, scoring as high as 99.25% precision (no recall was reported) (Maamouri et al. 2004). This system relies on a large lexicon of 548 prefixes, 906 suffixes, and 78839 stems, as well as thousands of rules stored in morpheme compatibility tables.
There has been a statistical approach designed for Hebrew root morphology with motivations similar to ours (Daya et al. 2004). Their best results came from using a Hidden Markov Model (HMM) for each radical, trained on a corpus of manually tagged roots, scoring 80.90% precision with 88.16% recall. Their system is also a constraint-based ranking system, but our systems diverge in that our approach is entirely based upon statistical co-occurrence of the root radicals, and not on any sort of machine (such as an HMM), and we do not manually mark the roots in the learning phase. Additionally, they included a database of suffixes in the experiment that produced the best results. After the root of the word has been discovered, the suffix is checked for validity; if it is valid, a higher ranking is awarded. Though their results are impressive, we view this suffix-checking approach as supervision. Without this suffix dictionary, their system only reaches 59.83% precision and 57.98% recall. These numbers are still the results of a supervised approach, however, as they still use a root dictionary for learning.
Though Buckwalter’s system has high precision, it requires a massive lexicon. Darwish’s and Daya et al.’s approaches pride themselves on the ability to bootstrap from small dictionaries, but these are not realistic models of natural linguistic acquisition. Unsupervised approaches have had much greater success in the domain of concatenative morphology acquisition.
PAUL RODRIGUES & DAMIR CAVAR
With Dictionary (Supervised)                         Without Dictionary (Unsupervised)
Approach                           Results           Approach             Results
Darwish                            P=93%             Rodrigues, Ćavar     P, R=75%
Daya et al. (Suffix+Root Dict.)    P=81%, R=88%      Elghamry             P, R=92%
Daya et al. (Root Dict Only)       P=60%, R=58%
Buckwalter                         P=99%
Table 1. Summary of different computational approaches to Arabic root parsing (P=Precision, R=Recall).
John Goldsmith’s Linguistica approach (2001) showed very good results using a Minimum Description Length (MDL) analysis, reaching 85.9% precision and 90.4% recall averaged across English and French corpora. Goldsmith quickly points out, however, that “some of the assumptions made in the implementation restrict the useful application of the algorithms to languages in which the average number of affixes per word is less than what is found in such languages as Finnish, Hungarian, and Swahili, and we restrict our testing in the present report to more widely studied European languages.” By restricting the test set to languages that are not only purely concatenative but also have a low average number of morphemes per word, the problem has been greatly simplified. The Linguistica approach postulates that there is only one morphological split point, and that all words have a stem and an affix. Essentially, each character offers a Boolean decision as to whether or not it is the terminal character in the stem morpheme. A word eight characters long, with purely concatenative morphology, offers only eight possible split points, so a random split has a 12.5% chance of being correct. A random selection of a three-character Semitic root from an eight-character string, however, has only about a 1.8% chance of being correct (1 in 56, the number of ways to choose three of the eight positions while keeping their left-to-right order). The experiments performed by Creutz & Lagus (2002) on Finnish text demonstrate how difficult it is for an unsupervised morphological system to make multiple splits within a word. Linguistica parsed only 43.1% of the words correctly and an additional 24.1% of the words partially. Creutz & Lagus’ own MDL algorithm performed only moderately better.
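The two chance baselines in this comparison can be checked with a few lines of arithmetic. The sketch below is only an illustration of that calculation; the function names are ours and do not belong to any of the systems discussed.

```python
from math import comb


def single_split_chance(word_length: int) -> float:
    """Chance of guessing the one stem-final character among `word_length` characters."""
    return 1 / word_length


def root_selection_chance(word_length: int, radicals: int = 3) -> float:
    """Chance of guessing which `radicals` positions, kept in order, hold the root."""
    return 1 / comb(word_length, radicals)


if __name__ == "__main__":
    print(f"single split, 8 characters: {single_split_chance(8):.1%}")
    print(f"triliteral root, 8 characters: {root_selection_chance(8):.1%}")
```

The gap between the two baselines widens quickly with word length, since the number of candidate roots grows with the cube of the length while the number of split points grows only linearly.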
3. A Statistical Approach
3.1 Root morphology
In the algorithm we present here, the root system is learned by comparing frequency statistics. Evidence is weighed for and against a hypothesis of three characters being declared as the root morpheme. Positive evidence includes: the summation of the ratios between a letter being a root and being an affix, the summation of the frequency that a letter has shown up as a possible root, and the summation of the probabilities that the letter belongs to the root. This is then divided by the negative evidence: the summation of the probabilities that the letter is an affix and the summation of the frequency that a letter has shown up as a possible affix. This ratio yields a “score” for the triliteral. The triliteral that has the highest score is determined to be the root of the word. If there is a tie, then the most frequent triliteral is chosen (Elghamry 2004). Elghamry’s 2004 paper described this as a two-pass algorithm, and not an online-learning algorithm.
Constraints were added to the triliteral root algorithm to reduce the search time. We considered only triliteral roots. By requiring a linear distance of five characters in between the first and last letter of the root, and a distance of no more than three characters between the radicals, numerous unlikely character combinations were rejected. While incorporating these rules adds supervised knowledge to the system, removal of these constraints does not significantly impact the results.
All characters within the first and last radicals of the root, except for the middle radical, are considered the template. All characters outside the end radicals are considered concatenative morphology. For example, in the word kitaabi “my book”, ktb would be labeled the root, the interior i_aa would be our template, and the final i would be labeled as possible concatenative morphology. The output of the program represents this as Xi.
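The candidate search and the root/template/affix split described above can be sketched as follows. This is a minimal illustration and not the implementation evaluated in this paper: the frequency-based scoring is omitted, the names are hypothetical, and the two distance limits encode one possible reading of the constraints (index distances of at most five and at most three).

```python
from itertools import combinations

MAX_SPAN = 5  # assumed reading: index distance between first and last radical
MAX_STEP = 3  # assumed reading: index distance between neighbouring radicals


def root_candidates(word):
    """Yield index triples (i, j, k) that satisfy both distance limits."""
    for i, j, k in combinations(range(len(word)), 3):
        if k - i <= MAX_SPAN and j - i <= MAX_STEP and k - j <= MAX_STEP:
            yield i, j, k


def decompose(word, triple):
    """Split a word into root, interior template, and X-placeholder form."""
    i, j, k = triple
    root = word[i] + word[j] + word[k]
    # Interior characters other than the middle radical form the template.
    template = word[i + 1:j] + "_" + word[j + 1:k]
    # Everything outside the end radicals is left for the concatenative stage.
    placeholder = word[:i] + "X" + word[k + 1:]
    return root, template, placeholder


if __name__ == "__main__":
    word = "kitaabi"
    print(len(list(root_candidates(word))), "candidate triples")
    print(decompose(word, (0, 2, 5)))
```

In a full system each candidate triple would be scored by the positive and negative evidence described above, and only the highest-scoring triple would be passed to `decompose`.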
This does not match perfectly the definition of template used by McCarthy (1979), as his analysis introduced several templates that contain phonemes outside the end root radicals. In our system, this would be learned as concatenative morphology.
3.2 Concatenative morphology
The concatenative morphology algorithm is broken up into two sub-systems, GEN and EVAL. GEN generates possible morphological
segmentations for a word. EVAL then takes these hypotheses and evaluates them according to several metrics and constraints (Ćavar et al. 2005). GEN uses Alignment-Based Learning to generate morpheme hypotheses. At every new word, we generate only the hypotheses based upon alignment with previously learned morphemes. EVAL uses a constraint-based voting architecture to determine the optimum segmentation over various memory and processing constraints. The metrics we include are:
MINIMUM DESCRIPTION LENGTH
MINIMIZE KULLBACK-LEIBLER DIVERGENCE
MINIMIZE RELATIVE ENTROPY
MAXIMIZE MUTUAL INFORMATION
MAXIMIZE FREQUENCY
MAXIMIZE MORPHEME LENGTH
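A weighted voting scheme over these metrics might look like the sketch below. It is an illustration under stated assumptions rather than the EVAL component itself: the dictionary keys, the interface, the default weight of 1.0, and the metric values in the usage example are all hypothetical.

```python
from collections import defaultdict

# Constraint name -> (preferred direction, weight). The names mirror the
# metric list; the weights and this interface are assumptions of the sketch.
CONSTRAINTS = {
    "description_length": ("min", 1.0),
    "kl_divergence": ("min", 1.0),
    "relative_entropy": ("min", 1.0),
    "mutual_information": ("max", 1.0),
    "frequency": ("max", 1.0),
    "morpheme_length": ("max", 1.0),
}


def vote(hypotheses, constraints=CONSTRAINTS):
    """Return the hypothesis with the largest weighted vote total.

    `hypotheses` maps a segmentation label to a dict of metric values.
    Each constraint casts its weight for the hypothesis it prefers.
    """
    tally = defaultdict(float)
    for name, (direction, weight) in constraints.items():
        pick = min if direction == "min" else max
        winner = pick(hypotheses, key=lambda h: hypotheses[h][name])
        tally[winner] += weight
    return max(tally, key=tally.get)


if __name__ == "__main__":
    # Invented metric values for two competing segmentations of one word.
    toy = {
        "al+kitAb": {"description_length": 10, "kl_divergence": 0.2,
                     "relative_entropy": 0.3, "mutual_information": 1.5,
                     "frequency": 40, "morpheme_length": 3.5},
        "alk+itAb": {"description_length": 12, "kl_divergence": 0.4,
                     "relative_entropy": 0.1, "mutual_information": 0.7,
                     "frequency": 5, "morpheme_length": 4.0},
    }
    print(vote(toy))
```

Ties within a single constraint go to whichever hypothesis is listed first, which is a simplification. Raising the weight of one constraint in a copy of `CONSTRAINTS` is how an adjustment of relative importance would be expressed in this sketch.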
The Minimum Description Length Principle (Grünwald 1996) allows us to constrain our grammar hypotheses to those that increase our grammar size the least. This allows for a smaller memory footprint and faster recall speed. For each word, we calculate the Kullback-Leibler Divergence (KLD) (MacKay 2003). KLD tells us how much our grammar will increase in size if we add our hypothesis; the result is measured in bits. The function q represents the probability mass function (pmf) of the original grammar, and the function p represents the pmf of the new grammar. Variable x represents the currently processed token.
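For orientation, the textbook definition of the divergence, D(p || q) = sum over x of p(x) log2(p(x) / q(x)), can be computed as in the sketch below. We assume this standard form with p as the new grammar and q as the original one; the exact variant used in the system may differ, and the function name and toy distributions are hypothetical.

```python
from math import log2


def kl_divergence_bits(p, q):
    """D(p || q) in bits for two pmfs given as dicts over the same tokens.

    Assumes q[x] > 0 wherever p[x] > 0; otherwise the divergence is undefined.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # a zero-probability token contributes nothing
        total += px * log2(px / q[x])
    return total


if __name__ == "__main__":
    original = {"a": 0.25, "b": 0.75}  # q: pmf before adding the hypothesis
    updated = {"a": 0.50, "b": 0.50}   # p: pmf after adding the hypothesis
    print(kl_divergence_bits(updated, original))
```

A value of zero means the hypothesis leaves the distribution unchanged; larger values indicate a costlier change to the grammar.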
We have also included a variant of KLD called Relative Entropy (RE), which calculates the conditional probability between uni- and bigram sequences. Variables y and x are our string tokens.
Mutual Information (MI) measures the dependence between one morpheme hypothesis and another. We weight the result by the probability of co-occurrence. The frequency-weighted MI of al+kitAb is computed in the following example.
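A frequency-weighted MI value of this kind can be sketched as pointwise mutual information multiplied by the joint probability, p(x, y) log2(p(x, y) / (p(x) p(y))). This form is our assumption about the weighting, and the counts below are invented purely for illustration; they are not corpus figures.

```python
from math import log2


def weighted_mi(count_xy, count_x, count_y, total):
    """p(x, y) * log2(p(x, y) / (p(x) * p(y))), estimated from raw counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return p_xy * log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    # Hypothetical counts: "al" seen 200 times, "kitAb" 10 times,
    # and the pair "al+kitAb" 8 times, out of 1000 observations.
    print(weighted_mi(count_xy=8, count_x=200, count_y=10, total=1000))
```

With these toy counts the pair occurs four times more often than independence would predict, so the logarithm contributes two bits before weighting.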
Each constraint has a vote, and the hypothesis with the most votes wins. Constraints can be adjusted in importance by weighting parameters. For integration with the Semitic root parser, these weights have been adjusted to account for the frequency of the X root placeholder. MAXIMIZE MORPHEME LENGTH was set to 1.5 and MAXIMIZE FREQUENCY was set to 2.5, while the remaining constraints were set to 1.0. These weights were set based on limited empirical testing; future work should include the ability to learn these weights.
This system for concatenative morphology has been shown to work well with languages with simpler morphology, such as English, but poorly for agglutinative languages, such as Uzbek. Languages such as Arabic that use interdigitation also perform poorly without the tiered approach discussed in this paper (Ćavar et al. 2005). This is due to the algorithm’s preference for a fixed substring root over which statistical dependence can be calculated. The tiered solution discussed in this paper actually reduces the unpredictable data, yielding excellent results on the concatenative analysis.
The reduplication module performs simple compression, looping through a string and searching for every substring match. It generates all ABC-pattern grammars of the word, and the shortest grammar string is chosen. For example, waswas “whisper” would return ABCABC, ABAB, and AA. AA would be chosen, as it represents the largest repeated pattern in the string. In the case of two reduplication grammar strings of equal length, the most frequent one is returned.
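The reduplication search can be illustrated with a brute-force sketch. It rests on one assumption of ours, namely that a grammar is kept only if every piece of the segmentation occurs at least twice, which reproduces the three grammars given for waswas; the function names are hypothetical and the frequency-based tie-break is left out.

```python
def segmentations(word):
    """Yield every way of cutting `word` into contiguous, non-empty pieces."""
    if not word:
        yield []
        return
    for cut in range(1, len(word) + 1):
        for rest in segmentations(word[cut:]):
            yield [word[:cut]] + rest


def pattern(pieces):
    """Name the pieces A, B, C, ... in order of first appearance."""
    names = {}
    for piece in pieces:
        names.setdefault(piece, chr(ord("A") + len(names)))
    return "".join(names[piece] for piece in pieces)


def reduplication_grammars(word):
    """Grammar strings of all segmentations whose pieces each occur at least twice."""
    found = set()
    for pieces in segmentations(word):
        if all(pieces.count(piece) >= 2 for piece in pieces):
            found.add(pattern(pieces))
    # Shortest grammar first, so the preferred analysis is at index 0.
    return sorted(found, key=lambda grammar: (len(grammar), grammar))


if __name__ == "__main__":
    print(reduplication_grammars("waswas"))
```

The enumeration is exponential in word length, which is acceptable for single words but would need pruning for longer strings.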
4. Learning
These experiments were performed on morphosyntactically correct words generated from the Buckwalter Arabic Morphological Analyzer database, a database of roots, prefixes, suffixes and combination rules (Buckwalter 2002). This database uses the Buckwalter transliteration system, a lossless Latin-based orthographic system for Arabic.
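The lossless character of the transliteration comes from its one-to-one mapping between Latin symbols and Arabic code points. The sketch below shows the idea with a mapping table written from the commonly published Buckwalter chart; it should be checked against the analyzer's own documentation before serious use, and the function names are ours.

```python
# One Latin symbol per Arabic code point (letters, then diacritics).
BUCKWALTER_TO_ARABIC = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624",
    "<": "\u0625", "}": "\u0626", "A": "\u0627", "b": "\u0628",
    "p": "\u0629", "t": "\u062A", "v": "\u062B", "j": "\u062C",
    "H": "\u062D", "x": "\u062E", "d": "\u062F", "*": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638",
    "E": "\u0639", "g": "\u063A", "_": "\u0640", "f": "\u0641",
    "q": "\u0642", "k": "\u0643", "l": "\u0644", "m": "\u0645",
    "n": "\u0646", "h": "\u0647", "w": "\u0648", "Y": "\u0649",
    "y": "\u064A", "F": "\u064B", "N": "\u064C", "K": "\u064D",
    "a": "\u064E", "u": "\u064F", "i": "\u0650", "~": "\u0651",
    "o": "\u0652", "`": "\u0670", "{": "\u0671",
}
ARABIC_TO_BUCKWALTER = {arabic: latin for latin, arabic in BUCKWALTER_TO_ARABIC.items()}


def to_arabic(text):
    """Map each Buckwalter symbol to its Arabic character; pass others through."""
    return "".join(BUCKWALTER_TO_ARABIC.get(char, char) for char in text)


def to_buckwalter(text):
    """Map each Arabic character back to its Buckwalter symbol."""
    return "".join(ARABIC_TO_BUCKWALTER.get(char, char) for char in text)


if __name__ == "__main__":
    word = "kitAb"
    print(to_arabic(word))
    print(to_buckwalter(to_arabic(word)) == word)
```

Because the table is a bijection, converting in either direction and back returns the original string, which is what allows the experiments to run entirely on the Latin-based form.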
Two random numbers between 0 and 1 were generated for each word. When the first number was above 0.15, prefixation was allowed to occur; when the second number was above 0.15, suffixation was allowed to occur. A random prefix, root, and suffix were selected. If the prefix was allowed to combine with the root, the root was allowed to combine with the suffix, and the root was triliteral, the vowelized word was returned. The generated words had an average length of 8.1319 characters, which is slightly higher than the average in the Arabic Treebank (Maamouri et al. 2004). This method, though it does not ensure the word category distribution of Arabic, is necessary both to guarantee that our root is triliteral and to perform the online evaluation of the 10,000 words. Verifying the roots by running them through another morphological analyzer would introduce undesired error into the evaluations. Preprocessing was done on the corpus to allow a better comparison with the other parsers. Alef maqsoura and ya were conflated, as were hamza, alef madda, alef with hamza above, and alef with hamza below. These two rules were used by Darwish (2002). Alef wasla and alef were conflated as well. The shadda, the symbol
representing gemination, was replaced by the letter immediately prior.
The learning charts displayed here track the progress of root learning over the 10,000 words. The lighter lines represent moving averages of precision over 50 words, and the darker lines represent a moving average of 50 data points on that curve. We find that this algorithm predicts fairly consistently over the course of the dataset, reaching approximately 75% precision after 10,000 words. Since we do not incorporate supervised knowledge of weak radicals, our score can never be perfect.
Our algorithm has a high preference towards negative evidence. Because of this, unvoweled text does not perform as well as fully pointed text. While requiring vowels is not desirable for information retrieval of text, the voweled words are a closer correlate to speech. The results for the corpus without short vowels appear below.
The concatenative morphology module was then fed the results of the root parse of the voweled corpus. Example output is shown below.
(#wa _@0) [39 (((#liX$) 6) ((#biX$) 7)((#biAX$) 2) ((#biAlX$) 2) ((#yaX$) 4)... )]
(#lilX$) [5 (((#AlX$) 1) ((#taX$) 2) ((#kaAlX$) 3) ((#Xi$) 1) ... )]
(_@0 al$) [10 ( ((#liX$) 2) ((#wakaX$) 1) ((#X$) 5) ((#saX$) 1) ((#Xm$) 1))]
(_@0 #taX$) [6 (((#fa) 1) ((#sa) 3) ((#wa) 2))]
(#waliX$ _@0 n$) [1 (((u) 1))]
(_@0 #AlX$) [10 (((#fa) 8) ((#ka) 1) ((#wa) 1))]
(_@0 #liyuX$) [3 (((#fa) 3))]
(_@0 #lituX$) [1 (((#fa) 1))]
(@0 #taX$) [6 (((#fa) 1) ((#sa) 3) ((#wa) 2))]
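Readers who wish to post-process listings of this kind could read one line at a time with a small parser such as the following. It is a sketch that assumes only what is visible in the listing (a parenthesized signature, a bracketed total, and parenthesized pairs of a morpheme string and a count); the function name and the regular expressions are hypothetical and not part of the system described here.

```python
import re

# "(signature) [total ...pairs...]"
LINE = re.compile(r"^\((?P<signature>[^()]*)\)\s*\[(?P<total>\d+)\s*(?P<body>.*)\]$")
# "((morpheme) count)"
PAIR = re.compile(r"\(\(([^()]*)\)\s*(\d+)\)")


def parse_signature_line(line):
    """Return (signature, total, [(morpheme, count), ...]) for one output line."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    partners = [(morpheme, int(count))
                for morpheme, count in PAIR.findall(match.group("body"))]
    return match.group("signature"), int(match.group("total")), partners


if __name__ == "__main__":
    example = "(_@0 #taX$) [6 (((#fa) 1) ((#sa) 3) ((#wa) 2))]"
    print(parse_signature_line(example))
```

Lines abbreviated with an ellipsis still parse, but their listed counts will then add up to less than the bracketed total.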
Discovered morpheme signatures are within the left set of parentheses. Within the bracketed set, we find the morphemes that allow combination with the signature on the left. The number of times that morpheme is observed is also listed. For example, wa was discovered 39 times, and this morpheme co-occurred with liX six times, biX seven times, etc. X represents the stem discovered during the root stage. Quantitative results and learning charts are difficult to produce for the concatenative morphology. Reduplication, for example, is not noted
in the Buckwalter database. Additional allomorphy occurs, which makes string verification of our splits inaccurate.
5. Conclusions
An unsupervised statistical approach towards Arabic morphological learning is a viable one. We have shown that without a dictionary, and using only dependency statistics, Semitic root morphology can be predicted with 75% precision. Additionally, we have introduced an algorithm for Arabic concatenative morphology that shows usable results.
Our use of negative evidence in the root identification algorithm throws out the vowel template as possible root characters. This allows natural separation of a triliteral root and a vowel template. This also correlates with one of the most fundamental ideas in computational linguistics, “Zipf’s Law.” Zipf’s laws state that longer words contain more semantic information, and shorter words are more frequent. Additionally, clustering by length and frequency reveals distinct categories of open- and closed-class words. This is essentially how our Arabic root identification algorithm works at a morphemic level. Promiscuity separates the word into two tiers, one being a root template and the other being a more promiscuous vowel template. This frequency effect supports strongly the notion that the root template, and not the stem, is analogous to an open-class morpheme, while the vowel template is analogous to a separate closed-class and functional morpheme.
Future work should include a notion of confidence. This will allow a separation of precision and recall scores, as well as the ability to extend the algorithm to four- and five-radical roots. A quantitative analysis of the concatenative morphology must be performed as well.
An open source (free to download, modify, and distribute) implementation of the root algorithm described in this paper is available online.2 The concatenative morphology system is available to researchers by contacting the authors.
2
http://jones.ling.indiana.edu/~prrodrig/
REFERENCES
Beesley, Kenneth R. 1996. “Arabic Finite State Morphological Analysis and Generation”. Proceedings of the 16th International Conference on Computational Linguistics 1.89-94. Copenhagen.
Beesley, Kenneth R. & Lauri Karttunen. 2000. “Finite-State Non-Concatenative Morphotactics”. Proceedings of the 5th Workshop of the ACL Special Interest Group in Computational Phonology, 1-12. Luxembourg.
Buckwalter, Tim. 2002. Buckwalter Arabic Morphological Analyzer Version 1.0. Linguistic Data Consortium. LDC2002L49. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49.
Ćavar, Damir, Paul Rodrigues & Giancarlo Schrementi. 2005. “Unsupervised Morphology Induction for Part-of-Speech Tagging”. Philadelphia: U. Penn Working Papers in Linguistics 10.1.
Creutz, Mathias & Krista Lagus. 2002. “Unsupervised Discovery of Morphemes”. Proceedings of the 6th Meeting of the ACL Special Interest Group in Computational Phonology, 21-30. Philadelphia.
Darwish, Kareem. 2002. “Building a Shallow Morphological Analyzer in One Day”. ACL Workshop on Computational Approaches to Semitic Languages, 47-54. Philadelphia.
Davis, Stuart & Bushra Adnan Zawaydeh. 2001. “Arabic Hypocoristics and the Status of the Consonantal Root”. Linguistic Inquiry 32:3.512-530.
Daya, Ezra, Dan Roth & Shuly Wintner. 2004. “Learning Hebrew Roots: Machine learning with linguistic constraints”. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Barcelona.
Elghamry, Khaled. 2004. “A Constraint-based Algorithm for the Identification of Arabic Roots”. Proceedings of the 1st Midwest Computational Linguistics Colloquium. Bloomington, IN.
Goldsmith, John. 2001. “Unsupervised Acquisition of the Morphology of a Natural Language”. Computational Linguistics 27:2.153-198.
Grünwald, Peter. 1996. “A Minimum Description Length Approach to Grammar Inference”. Symbolic, Connectionist and Statistical Approaches to Learning for Natural Language Processing (Lecture Notes in Artificial Intelligence 1040) ed. by S. Wermter, E. Riloff & G. Scheler, 203-216. Berlin: Springer Verlag.
Maamouri, Mohamed, Ann Bies, Tim Buckwalter & Wigdan Mekki. 2004. “The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus”. The NEMLAR International Conference on Arabic Language Resources and Tools, 102-109. Cairo.
Mace, John. 1998. Arabic Grammar: A reference guide. Edinburgh: Edinburgh University Press.
MacKay, David J. C. 2003. Information Theory, Inference, and Learning Algorithms, Version 6.0. Cambridge: Cambridge University Press.
McCarthy, John. 1979. Formal Problems in Semitic Phonology and Morphology. Ph.D. dissertation, MIT. Distributed 1981 by Indiana University Linguistics Club.
McOmber, Michael. 1995. “Morpheme Edges and Arabic Infixation”. Perspectives on Arabic Linguistics VII ed. by Mushira Eid, 173-189. Amsterdam & Philadelphia: John Benjamins.
Prunet, Jean-François, Renée Béland & Ali Idrissi. 2000. “The Mental Representation of Semitic Words”. Linguistic Inquiry 31.609-648.
Ratcliffe, Robert. 1997. “Prosodic Templates in a Word-Based Morphological Analysis of Arabic”. Perspectives on Arabic Linguistics X ed. by Mushira Eid & Robert Ratcliffe, 147-171. Amsterdam & Philadelphia: John Benjamins.
van Zaanen, Menno M. 2001. Bootstrapping Structure into Language: Alignment-based learning. Ph.D. dissertation, The University of Leeds.
LEARNING TO USE THE PRAGUE ARABIC DEPENDENCY TREEBANK
Otakar Smrž, Petr Pajas, Zdeněk Žabokrtský, Jan Hajič, Jiří Mírovský, Petr Němec
Charles University in Prague, Institute of Formal and Applied Linguistics
1. Introduction
Prague Arabic Dependency Treebank (PADT), recently published in its first version (Hajič et al. 2004a) by the Linguistic Data Consortium, is both a collection of multi-level linguistic annotations over Modern Standard Arabic, and a suite of unique software implementations designed for general use in Natural Language Processing. The underlying theory of this resource is reviewed in Hajič et al. (2004b). In the present paper, we focus rather on the practical aspects of using the PADT data and the computational tools in original research.
1.1 Data survey
The corpus of PADT 1.0 consists of morphologically and analytically annotated newswire texts of Modern Standard Arabic, which originate from Arabic Gigaword (Graff 2003) and partly overlap with the plain data of Penn Arabic Treebank, Part 1 (Maamouri et al. 2003) and Penn Arabic Treebank, Part 2 (Maamouri et al. 2004). The rough survey of the annotations is given in Table 1. Data sets AFP, UMH and XIN come from the earlier period of the project when morphological annotations were not based on the MorphoTrees technology (cf. subsection 2.1). Therefore, the files recording the process of
SMRŽ, PAJAS, ŽABOKRTSKÝ, HAJIČ, MÍROVSKÝ, NĚMEC
morphological disambiguation of these data could not be distributed. Still, the resulting morphological information is available in the analytical files, along with the analytical annotations. The other data sets, namely ALH, ANN and XIA, are complete already and provide files of three different types: non-annotated text, MorphoTrees annotations, and analytical annotations. Information from the morphological level is also, as a prerequisite, propagated into the analytical level. Not all the data are processed on both levels, though.
Data   Tokens [A]   Tokens [M]   Original Data Provider    News Period
AFP    13 000       -            Agence France Presse      2000 / VII
UMH    38 500       -            Ummah Press Service       2002 / I–III
XIN    13 500       -            Xinhua News Agency        2003 / V
ALH    10 000       73 500       Al-Hayat News Agency      2001 / IX
ANN    12 500       25 500       An-Nahar News Agency      2002 / XI
XIA    26 500       49 500       Xinhua News Agency        2003 / V
       113 500                   Analytical level
                    148 000      MorphoTrees
Software + documentation: TrEd, Netgraph, Oraculum, Encode::Arabic
Table 1. Survey of the contents of the Prague Arabic Dependency Treebank 1.0. Columns [A] and [M] represent the number of syntactic units, i.e. tokens, for analytical level and MorphoTrees, respectively.
1.2 Annotation environment
The indispensable annotation environment for this and various other treebanking projects is the TrEd tree editor (Hajič et al. 2001) written in Perl/Tk. It is not only a fully programmable and customizable graphical user interface, but also an excellent suite of utilities for automated, optionally parallel, processing of the data (consistency checks and revising, batch conversions, search, difference evaluation, etc.). TrEd is documented at http://ufal.mff.cuni.cz/~pajas/tred/. We will explore some of its features in 4.2.
1.3 Treebank search engines
Netgraph (Mírovský & Ondruška 2002) is a client–server application for efficient searching in treebanks. Unlike TrEd, it provides the user with an easy-to-learn graphical query language that does not
LEARNING TO USE THE PADT
presume any programming skills. The client application is implemented in Java and is available at http://quest.ms.mff.cuni.cz/netgraph/. Oraculum (Ljubopytnov et al. 2002) supports linguistically even more expressive queries, and operates through a sophisticated web browser interface, which is now being ported to Arabic.
1.4 Other tools
Next to several other linguistically significant solutions (cf. section 5), there is the Encode::Arabic module (Smrž 2003) for Perl that supports miscellaneous modes of processing of the non-trivial, yet ingenious ArabTeX encoding notation of the Arabic script and/or its phonetic transcription (Lagally 2004). Encode::Arabic covers the Buckwalter transliteration as well.
2. Data Structures
The PADT annotations are distributed as UTF-8 encoded files in the FS format, which is documented on TrEd’s website. TrEd and the array of associated tools and libraries provide options for converting these data into several XML-compliant formats, and vice versa. TrEd’s graphical renderings can be printed as PostScript, PDF, or image files. If independent data processing is desired, the files can best be accessed using the Fslib module for Perl, which is available in the distribution along with many other modules and scripts serving for data flow management, migration of annotations, updating and quality checking, difference evaluation or execution of systematic revisions. The non-annotated textual data are provided in the original XML format of the Arabic Gigaword corpus.
2.1 Functional morphology & MorphoTrees
The morphological annotations of PADT used to directly employ the information produced by Buckwalter Arabic Morphological Analyzer (Buckwalter 2002). With the introduction of Functional Arabic Morphology (Smrž in prep., Smrž et al. 2005), all morphological tags were mapped as closely as possible into the current positional notation representing individual grammatical categories in separate columns.
The new type of annotations required a different disambiguation tool. The flexibility of TrEd made it possible to design and implement MorphoTrees in it as a special annotation context (Smrž & Pajas 2004).
Figure 1. The hierarchy of MorphoTrees and their annotation using restrictions (cf. Smrž & Pajas 2004).
Figure 2. View of annotated paragraph. Note the levels of distinct information.
MorphoTrees is the idea of building effective and intuitive hierarchies over and among the input and output strings of morphological systems. It is especially interesting for Arabic and the functional morphology, but it is not limited to either of these. Figure 1 illustrates how MorphoTrees organize the morphological information/analyses into a multi-level hierarchy. The leaves of these trees are the imaginable tokens with their tags as the atomic units, and the root is the input string being analyzed, or generally an entity (some tree of discourse elements). Rising from the leaves up, there is the level of lemmas of the lexical units, the level of non-vocalized standard orthographic forms, and the level of decomposition of the entity into a sequence of such forms, implying the number of tokens and their spelling. As a convenient extension, the overall solutions of the annotations can also be viewed in a similar hierarchical structure. An example of such a paragraph tree is given in Figure 2.
2.2 Analytical dependency trees
Analytical annotations represent the surface syntax of the language in the dependency formalism outlined in Hajič et al. (2004b). They provide a link from morphology to tectogrammatics (the level of linguistic meaning) of the Functional Generative Description theory (cf. Sgall et al. 2004). The analytical level is modeled with dependency trees whose nodes map, one to one, to the tokens resulting from the morphological analysis and tokenization, and whose roots group the nodes according to the division of the discourse into sentences or paragraphs. Edges in the trees establish/reconstruct syntactic relations between the governor and the dependent, or rather, the whole subtree under and including the dependent. The nature of the government is expressed by the analytical functions of the nodes being linked. In addition to this strict dependency structure, information of other kinds and character can be captured in the trees, while computational procedures for inferring any complementary information can be implemented independently of data.
In TrEd, resolution of grammatical coreference is automated in this manner. Identifying resumptive pronouns and deverbal inner objects by themselves is enough for some algorithm to find their grammatical counterparts and render these pairs.
In Figure 3, the instances of such non-dependency relations are shown with dashed arcs. Nonetheless, one might begin with Figure 4 for a more elementary example of an analytical tree.
Figure 3. Analytical tree featuring advanced phenomena like ellipsis of another predicate, deverbal inner objects in adverbial function, or composite auxiliary elements. Note the labels [ExD] (on otherwise coordinative expression), [Adv_Msd], [AuxY] / [AuxP] (compound preposition) or [AuxY] / [ExD], respectively.
3. Installation and General Setup
PADT 1.0 is distributed by the Linguistic Data Consortium, University of Pennsylvania, http://www.ldc.upenn.edu/. The PADT project has its own website, http://ufal.mff.cuni.cz/padt/, where the data and the tools are documented in detail, and from where updates and extensions to the distribution are available.
A user’s installation should start with TrEd / Perl, and might proceed with downloading the Netgraph client / Java. The software applications are platform independent, and it is relatively easy to set things up. Installation of the data management scripts and modules or the CVS repositories for the FS annotation files is optional. In order to search PADT with Netgraph, the client application must connect to a server accessing the data. Users are welcome to register with our Netgraph server, even though servers can also be run locally.
Figure 4. Analytical representation of the sentence of Figure 2, with displayed morphological tags. Note the topology and functions of the predicate and its participants (subject, direct and indirect objects), and consider differences among the distinct attributive modifications.
4. The Quest for Improper Annexation
Let us have a look into the annotated data. Linguists need to search for a particular phenomenon in the language, evaluate it, contrast it with some other phenomena, consider the contexts of usage, etc. The example case that we will explore in this section is improper annexation in Arabic. A condensed definition of this phenomenon might not be precise, and we will not attempt it. Instead, we will pronounce and eventually refine our intuition that improper annexation is a genitive construction whose first term is an adjective, and whose second term is a [definite] noun (cf. for instance Schulz 2004:131–133, 140, 149). We will, of course, use the treebank in order to test and improve the description of this notion. More importantly, we will learn about the applicability of PADT and its tools, and about some limitations.
4.1 Querying PADT with Netgraph
A query in Netgraph is a generalized subtree having the properties of the desired treebank structures specified as attributes of its individual nodes or edges. Queries can be created interactively through a graphical interface, or equivalently, they can be linearized in a bracketing-style notation, which we will use here.
[tag=A?????????] ( [tag=N?????????,afun=Atr])
Figure 5. Netgraph query for the analytical level: a simple relation.
The example query in Figure 5 will return all occurrences of adjectives that have an attributive noun as one of their children. Such a relation is weaker than what improper annexation requires. In particular, the query ignores any constraints on word order, mutual distance, grammatical case and definiteness that we expect from a genitive construction. Anyway, it is just fine to ask Netgraph again and more specifically, adding some attributes to the nodes and listing the acceptable combinations of morphological categories in the tags. This gradual ruling out of irrelevant solutions is a helpful practice.
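To make the semantics of the Figure 5 query concrete outside Netgraph, the same parent/child relation can be tested on a toy tree. The sketch below is independent of Netgraph and of the PADT file formats: the `Node` class, the word forms, and the ten-character tag strings are invented for illustration, and only the wildcard patterns and the Atr function come from the query.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Node:
    form: str
    tag: str   # hypothetical ten-position morphological tag
    afun: str  # analytical function
    children: list = field(default_factory=list)


def walk(node):
    """Yield a node and all nodes of its subtree, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def adjectives_with_attributive_noun(root):
    """Return (adjective, noun) form pairs matching the Figure 5 relation."""
    hits = []
    for node in walk(root):
        if not fnmatchcase(node.tag, "A?????????"):
            continue
        for child in node.children:
            if fnmatchcase(child.tag, "N?????????") and child.afun == "Atr":
                hits.append((node.form, child.form))
    return hits


if __name__ == "__main__":
    noun = Node("NOUN_FORM", "N-------2D", "Atr")
    adjective = Node("ADJ_FORM", "A--------R", "Atr", [noun])
    tree = Node("VERB_FORM", "V---------", "Pred", [adjective])
    print(adjectives_with_attributive_noun(tree))
```

Narrowing the query then amounts to replacing wildcard positions in the two patterns with the admissible category values.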
Netgraph queries need not concern the analytical level only. The structures in MorphoTrees can be investigated as well. Consider the query of Figure 6, which says: look for the paragraph trees, i.e. those whose root (_depth=0) is of type ‘paragraph’, in which we are interested in two immediately succeeding token nodes on the lowest level (_depth=3) such that the first one is a non-indefinite adjective and the second one is a non-indefinite noun either certainly in genitive, or with the value for case unset. Recall Figure 2 for better visualization.
[type=paragraph,_depth=0] (
  [_transitive=true,_depth=3, _name=N1, type=token_node,
   tag=A????????C|A????????D|A????????R|A????????-],
  [_transitive=true,_depth=3, ord={N1.ord}+3, type=token_node,
   tag=N???????2C|N???????2D|N???????2R|N???????2-|
       N???????-C|N???????-D|N???????-R|N???????--])
Figure 6. Netgraph query for improper annexation in MorphoTrees.
Upon submitting this query to the server, we receive much more precise tips of what improper annexation could be. But when browsing through the results in Netgraph and trying to determine which of these are and which are not the appropriate cases, one may usually not see enough context of the surrounding paragraphs, and may not export the information in a very flexible way in order to process it further. Neither may the data be edited directly, if one is supposed to make corrections based on the search. How do we meet such requirements, then?
4.2 Searching and viewing in TrEd
TrEd, even in its graphical annotation mode, can work with filelists, by which we define the extent of the corpus where search operations are to take place. Besides the obligatory menu item ‘Node > Find ...’ by its attributes, there is the function ‘User-defined > PerlEval’ that executes a given Perl code in the current environment of TrEd’s data structures.
The program in Figure 7 keeps iterating over the MorphoTrees data until the configuration of nodes discussed with Figure 6 is encountered. Then, the control returns to TrEd, which sets the cursor to the newly found occurrence of the hypothesized improper annexation.
ChangingFile(0);
## $this represents the current node
do {
    if ($this->root()->{'type'} eq 'paragraph') {
        $prev = undef;
        while ($this = $this->following()) {
            if ($this->{'type'} eq 'token_node') {
                if (defined $prev
                    and $prev->{'tag'} =~ /^A........[CDR-]$/
                    and $this->{'tag'} =~ /^N.......[2-][CDR-]$/
                    and $this->{'ord'} == $prev->{'ord'} + 3) {
                    return;
                }
                $prev = $this;
            }
        }
    }
} while NextTree() || NextFile();
Figure 7. TrEd evaluation code in Perl, equivalent to the query of Figure 6.
The program in Figure 8 is designed for the analytical level, where the dependency information, rather than immediate adjacency, can be exploited. The algorithm carefully finds the head of the genitive construction even if its tail actually consists of multiple genitives in (hierarchical) coordination or apposition (cf. Figure 9, ex. E). Plus, there are constraints on the morphological tags of the nodes in question, relaxed a little with respect to the tagset of the former disambiguation. It might be clear by now that this powerful mechanism of computing with trees can be abstracted from, and that the return instruction can be replaced with, say, printing out the current node’s address and some significant attributes of its neighbors, or with code for complex restructuring, or with simple counting. In fact, there are two important modifications of TrEd, named btred and ntred, with which almost every automatic processing, including searching, is done very quickly and conveniently.
    ChangingFile(0);
    ## $this represents the current node
    do {
        while ($this = $this->following()) {
            if ($this->{'afun'} eq 'Atr'
                and $this->{'tag'} =~ /^N.......[23-][CDRX-]$/) {
                $head = $this;
                $head = $head->parent() while $head->{'parallel'} =~ /^(?:Co|Ap)$/;
                $head = $head->parent();
                return if $head->{'tag'} =~ /^A........[CDRX-]$/;
            }
        }
    } while NextTree() || NextFile();
Figure 8. TrEd evaluation code for finding improper annexation on the analytical level. Note how coordination/apposition nodes between the two parts of the genitive construction are treated. Values 3 and X in the tags reflect some systematic ambiguity present in the old data sets.
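The head-finding step of Figure 8 can also be tried outside TrEd. The following is a minimal sketch in Python rather than Perl, and it is not part of the PADT distribution: the Node class, its field names and the sample tags are invented for illustration, and only the two tag patterns and the rule of climbing over Co/Ap members are taken from Figure 8.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Tag patterns copied from Figure 8 (each matches a ten-character tag).
DEPENDENT = re.compile(r"N.......[23-][CDRX-]")
HEAD = re.compile(r"A........[CDRX-]")


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not the real TrEd data structure."""
    tag: str = ""
    afun: str = ""
    parallel: str = ""  # 'Co' or 'Ap' for members of coordination/apposition
    parent: Optional["Node"] = None


def effective_head(node: Node) -> Optional[Node]:
    """Climb while the node is a Co/Ap member, then take one more step up."""
    head = node
    while head.parent is not None and head.parallel in ("Co", "Ap"):
        head = head.parent
    return head.parent


def is_candidate(node: Node) -> bool:
    """True if node is a genitive attribute whose effective head is adjectival."""
    if node.afun != "Atr" or not DEPENDENT.fullmatch(node.tag):
        return False
    head = effective_head(node)
    return head is not None and bool(HEAD.fullmatch(head.tag))


# Invented example: an adjective governing two coordinated genitive nouns.
adj = Node(tag="A-----MS1R", afun="Pred")
coord = Node(tag="C---------", afun="Coord", parent=adj)
noun1 = Node(tag="N------S2D", afun="Atr", parallel="Co", parent=coord)
noun2 = Node(tag="N------S2D", afun="Atr", parallel="Co", parent=coord)
print(is_candidate(noun1), is_candidate(noun2))  # True True
```

Replacing the final print with a counter or a report of the node's neighbors corresponds to replacing the return instruction in the TrEd code.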
4.3 Improper annexation

Having applied the criteria of Figure 7 and Figure 8 to our treebank data, we certainly did not obtain only improper annexations! How can we tell? And why have we not come up with the right kind of queries? For the answer to the first question, we can refer to Schulz (2004) or Badawi et al. (2004). There are crucial semantic distinctions to make as to whether the adjectival head of the genitive construction logically qualifies the dependent noun, or whether this relation is reversed. Such information is present neither in morphology nor in analytical syntax. On the other hand, our queries do include some looseness. Ideally, the values of the relevant morphological categories should all be set. Then, the definiteness values for the head of a genitive construction could only be R (reduced) or C (complex), as we exemplify in Hajič et al. (2004b), and there would emerge other regularities that we could try to capture, or patterns that we could try to exclude. In Figure 9, we give several examples of true improper annexation that we have found, and compare them with another phenomenon that partly invades the set of search results due to the unset case information of the nominatives therein.
Figure 9: Contrasting improper annexation (examples A–F) with naʿt sababī (examples O–R). Note the patterns of definiteness or agreement in both of these phenomena (cf. e.g. Badawi et al. 2004:110–116).
Needless to say, preferring the recall of a query to its precision helps discover more inconsistencies or mistakes in annotation. The way we process the results in order to filter out false positives, like printing additional information, sorting and uniq-ing it, etc., is also important. In our current situation, roughly one out of six tips provided by the queries happened to be correctly classified as improper annexation.
Figure 10 summarizes the most interesting of these as observed in PADT, in its development version, which is still growing in size. Some of the phrases are rather idiomatic (cf. Wehr 1980), but what we notice is the actual freedom of expression and productivity of this linguistic construct. In the list, the heads of the annexations are lexicographically normalized, and the numbers in the rightmost column indicate the counts of occurrences within the treebank.

5. Applications and Prospects

The applicability of treebanks is very diverse. The annotated structures can be studied in the educational or purely linguistic framework, as we have just illustrated. The other prominent motivation is to use the data for machine-learning purposes, possibly aspiring to machine translation (cf. Čmejrek et al. 2004) or computational modeling of meaning. In the course of the PADT project, we have developed systems for automatic morphological and analytical disambiguation, a.k.a. tagging and parsing (cf. Hajič et al. 2004b, Hajič et al. 2005). This technology is going to be employed in the processing of the Arabic English Parallel News Part 1 (Ma 2004). Alternative automated annotation methods also come into question, like the parallel-corpus-based syntactic projection (Hwa et al. 2005) or the conversion of constituency annotations into dependencies (Žabokrtský & Smrž 2003; cf. Habash & Rambow 2004). We would also like to implement algorithms for detecting inconsistencies and errors in the annotations (cf. Dickinson & Meurers 2003). The PADT website will offer any future updates. The current distribution already includes scripts for safe and maximally efficient migration of annotations if some data need to be synchronized and the changes propagated across the levels of description.
Figure 10. Selected occurrences of improper annexation found on either level of the treebank.
6. Conclusion

We have tried to give a practical introduction to the Prague Arabic Dependency Treebank project, with emphasis on PADT 1.0, which is available to researchers worldwide. Having described the essential data structures in the treebank, we chose to search for and explore a particular linguistic phenomenon. We demonstrated the methodology for posing queries, and outlined how the information in the treebank might be processed in the general case. We have presented and discussed the most noteworthy instances of improper annexation in Arabic that we found in the treebank using this methodology. This is a significant result by itself, and would be extremely hard to achieve without the kind of annotations the treebank provides. We would like to invite others to try their own queries. Treebanking entails many challenging tasks, and we continue to approach them, as well as to improve the existing solutions.

7. Acknowledgements

The research described herein was supported by the Ministry of Education of the Czech Republic through projects LN00A063 and MSM113200006, and continues with the support from the Grant Agency of Charles University in Prague, project 207-10/203333. At the time of writing this paper, one of the authors was a grantee of the Fulbright-Masaryk Fellowship awarded by the Fulbright Commission in the Czech Republic. The ‘quest for improper annexation’ was first suggested by Tim Buckwalter, while Iveta Kouřilová helped us with understanding and presenting the topic. We would like to thank them very much as well.
REFERENCES

Badawi, Elsaid, Mike G. Carter & Adrian Gully. 2004. Modern Written Arabic: A comprehensive grammar. London: Routledge.
Buckwalter, Tim. 2002. Buckwalter Arabic Morphological Analyzer Version 1.0. LDC catalog number LDC2002L49, ISBN 1-58563-257-0. Linguistic Data Consortium, University of Pennsylvania.
Čmejrek, Martin, Jan Cuřín & Jiří Havelka. 2004. “Prague Czech-English Dependency Treebank: Any hopes for a common annotation scheme?”. HLT-NAACL 2004 Workshop: Frontiers in Corpus Annotation, 47–54. Boston.
Dickinson, Markus & W. Detmar Meurers. 2003. “Detecting Inconsistencies in Treebanks”. Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT 2003). Växjö.
Graff, David. 2003. Arabic Gigaword. LDC catalog number LDC2003T12, ISBN 1-58563-271-6. Linguistic Data Consortium, University of Pennsylvania.
Habash, Nizar & Owen Rambow. 2004. “Extracting a Tree Adjoining Grammar from the Penn Arabic Treebank”. Proceedings of Traitement Automatique du Langage Naturel (TALN-04). Fez.
Hajič, Jan, Barbora Hladká & Petr Pajas. 2001. “The Prague Dependency Treebank: Annotation structure and support”. Proceedings of the IRCS Workshop on Linguistic Databases, 105–114. University of Pennsylvania.
_____, Otakar Smrž, Petr Zemánek, Petr Pajas, Jan Šnaidauf, Emanuel Beška, Jakub Kráčmar & Kamila Hassanová. 2004a. Prague Arabic Dependency Treebank 1.0. LDC catalog number LDC2004T23, ISBN 1-58563-319-4. Linguistic Data Consortium, University of Pennsylvania.
_____, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška. 2004b. “Prague Arabic Dependency Treebank: Development in data and tools”. Proceedings of the NEMLAR International Conference on Arabic Language Resources and Tools, 110–117. Cairo.
_____, Otakar Smrž, Tim Buckwalter & Hubert Jin. 2005. “Feature-Based Tagger of Approximations of Functional Arabic Morphology”. Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), 53–64. Barcelona.
Hwa, Rebecca, Philip Resnik, Amy Weinberg, Clara Cabezas & Okan Kolak. 2005. “Bootstrapping Parsers via Syntactic Projection across Parallel Texts”. Natural Language Engineering, June 2005.
Lagally, Klaus. 2004. ArabTeX: Typesetting Arabic and Hebrew. User Manual Version 4.00, Fakultät Informatik, Universität Stuttgart.
Ljubopytnov, Vladimír, Petr Němec, Michaela Pilátová, Jakub Reschke & Jan Stuchl. 2002. “Oraculum, A System for Complex Linguistic Queries”. SOFSEM 2002 Student Research Forum, 27–34.
Ma, Xiaoyi. 2004. Arabic English Parallel News Part 1. LDC catalog number LDC2004T18, ISBN 1-58563-310-0. Linguistic Data Consortium, University of Pennsylvania.
Maamouri, Mohamed, Ann Bies, Hubert Jin & Tim Buckwalter. 2003. Arabic Treebank: Part 1 v 2.0. LDC catalog number LDC2003T06, ISBN 1-58563-261-9. Linguistic Data Consortium, University of Pennsylvania.
Maamouri, Mohamed, Ann Bies, Tim Buckwalter & Hubert Jin. 2004. Arabic Treebank: Part 2 v 2.0. LDC catalog number LDC2004T02, ISBN 1-58563-282-1. Linguistic Data Consortium, University of Pennsylvania.
Mírovský, Jiří & Roman Ondruška. 2002. “Netgraph System: Searching through the Prague Dependency Treebank”. Prague Bulletin of Mathematical Linguistics 77.101–104.
Schulz, Eckehard. 2004. A Student Grammar of Modern Standard Arabic. Cambridge: Cambridge University Press.
Sgall, Petr, Jarmila Panevová & Eva Hajičová. 2004. “Deep Syntactic Annotation: Tectogrammatical representation and beyond”. HLT-NAACL 2004 Workshop: Frontiers in Corpus Annotation, 32–38. Boston.
Smrž, Otakar. In prep. Functional Arabic Morphology. Formal System and Implementation. Ph.D. thesis, Charles University in Prague.
_____. 2003. Encode::Arabic. Programming module. Comprehensive Perl Archive Network, http://search.cpan.org/dist/Encode-Arabic/.
_____ & Petr Pajas. 2004. “MorphoTrees of Arabic and Their Annotation in the TrEd Environment”. Proceedings of the NEMLAR International Conference on Arabic Language Resources and Tools, 38–41. Cairo.
Wehr, Hans. 1980. A Dictionary of Modern Written Arabic. Arabic–English. New York: Spoken Language Service.
Žabokrtský, Zdeněk & Otakar Smrž. 2003. “Arabic Syntactic Trees: From constituency to dependency”. EACL 2003 Conference Companion, 183–186. Budapest.
Section II
Phonology, Morphology, and Syntax
INTONATIONAL AND RHYTHMIC PATTERNS ACROSS THE ARABIC DIALECT CONTINUUM

Salem Ghazali*, Rym Hamdi*§ and Khouloud Knis*
*Institut Supérieur des langues de Tunis
§Université Lumière Lyon 2
1. Introduction

If a non-native speaker of Arabic lived in Morocco for a period of time and, after learning MA,¹ tried to communicate in the newly acquired language in Egypt, he or she would be in for a major disappointment. What he or she had thought was Arabic would be just Jabberwocky for an Egyptian. In a comparable situation, illiterate Arabs, too, are very likely to face the same ordeal due to the various lexical, syntactic, morphological and sound pattern differences between the Arabic dialects. There are practically as many distinct words for the possessive pronoun “mine” or “yours”, for example, as there are national anthems in the various Arab countries, and probably not many fewer expressions meaning “there is” or “I want”. Syntactically, yes/no questions in MA are phrased like wh-questions in other dialects. The morphosyntax of negation is another case in point. A sentence is negated with a free morpheme maa before a verb in the East, a disjoint bound morpheme maa...ʃ or maa...ʃi surrounding the verb in Egypt, Libya and Tunisia, and this circumfix is even extended to adjectives further West. At the phonetic and phonological level, the variations cover all the aspects of speech manifestation from the realization of segments and their temporal organization to syllable structure and supra-segmental features such as stress, rhythm and intonation. Drawing upon several previous studies, some of which are unpublished, this paper aims at comparing different aspects of the supra-segmental or prosodic variations across various Arabic dialects in the light of phonetic and phonological factors underlying their structure.

For historical and other reasons, variations exist even within the same country, but speakers of other dialects will still manage to identify someone as roughly coming from a specific Arab country or region rather than another. A perceptual study (Barkat 2000) shows that subjects from both North Africa (hereafter NA) and the Middle East (hereafter ME) were able to correctly identify speech stimuli produced by speakers from Morocco, Algeria, Tunisia, Lebanon, Syria and Jordan as belonging to NA or the ME 97% of the time. When the subjects were asked to be more precise and identify the country to which the speaker belonged, correct identification rates for subjects from NA dropped to 78% when the speakers were from NA, i.e., from a neighboring country, and to 32% when they were from the ME. For subjects from the ME, correct identification rates were 90% for the same region (ME) and 59% for speakers from NA. Similar stimuli were presented to French-speaking subjects, who were asked to perform the same task as in the previous experiment. These subjects were successful in distinguishing stimuli as being from NA or the ME in only 56% of the cases. The results are statistically significant at the .05 level but only slightly different from chance. Thus, while speech differences within the Arabic dialect continuum are barely perceptible to the foreign ear, speakers of Arabic are sensitive to inter-regional, and to a lesser extent intra-regional, variations.

¹ The Arabic dialects will be referred to with the following abbreviations: MA=Moroccan Arabic; AA=Algerian Arabic; TA=Tunisian Arabic; EA=Egyptian Arabic; LA=Lebanese Arabic; SA=Syrian Arabic; JO=Jordanian Arabic; IA=Iraqi Arabic; MSA=Modern Standard Arabic.
According to the Arab subjects in the above-mentioned experiment, the determining factors in deciding whether a stimulus was from NA or the ME were the fast speaking rate of the NA dialects and their jerky nature, especially with respect to MA and AA. Linguistically, we can interpret this impression as relating to the suprasegmental feature of speech rhythm and perhaps to intonation structure. These prosodic features, however, are not independent of the nature and organization of segmental material. Languages have been classified as belonging to roughly three classes of rhythm: stress-timed, syllable-timed and mora-timed. Roach (1982), Dauer (1983, 1987), Laver (1994) and Ghazali et al. (2002),
among others, include descriptions of these types of rhythm and other pertinent matters, namely evidence against the isochrony hypothesis. With regard to the rhythm type to which Arabic belongs, all the investigations, regardless of the dialect, have classified Arabic as stress-timed (Abercrombie 1967, Miller 1984, Benguerel 1999, Tajima et al. 1999, Cheikhrouhou 2005, Ben Abada 2004). Now, if all Arabic dialects are stress-timed, as these investigations suggest, then either rhythm is not an important factor in discriminating between them, or rhythm is a cline, i.e., there are subclasses of rhythm within the stress-timed group. These sub-classes, should they exist, must be distinct enough to allow for discrimination but not so different as to fall into another rhythm category such as syllable timing. In a study that will be described below, Ghazali et al. (2002) attempted to investigate these speech rhythm variations within the Arabic dialect continuum. Before discussing the results of that investigation, it would be useful to present an overview of what might be causing these differences in speech rhythm.

2. Vowel Duration and Syllable Structure along the Continuum

2.1 Vowel duration

Segmental duration is phonemic in Arabic (for both vowels and consonants), but both short and long vowels are longer in the dialects of the ME than their counterparts in NA. In fact, as one moves towards the Western pole of the Arab region, not only do all vowels shorten but also the difference between long and short vowels decreases to the point where the opposition may no longer be based on duration, as has been suggested for Moroccan dialects. Table 1 is a summary of short/long vowel ratios in different Arabic dialects and MSA obtained from various studies. These results are based on measurements of vowel durations reported in Jomaa (1991), Barkat (2000) and other investigations.²

In comparing these percentages, one has to keep in mind that the measurements come from different investigators who did not necessarily carry out their analyses under similar experimental conditions. While the values in Table 1 generally represent the durations of the vowels [a]-[aa], other factors may vary from one data set to another. We do not always have information on the features of the consonant following the test vowel (voicing, manner and place of articulation), the characteristics of the syllable hosting the experimental vowel (open or closed, stressed or unstressed), the number of syllables forming the test word, the number of subjects from whom the measurements were obtained, etc. Similarly, not all the data summarized in Table 1 are very useful for comparison. For example, the duration percentages from the Iraqi dialect were obtained from measurements of short and long vowels produced in isolation and maintained artificially for 300 and 600 ms, respectively. Note also that for MSA, the ratio increases for some speakers when the test words are produced by subjects from NA. This may be in line with the higher ratios in the dialects of the Western region, where subjects may be extending the vowel system of their dialect to that of the Standard.

Dialect            Short/Long Vowel Ratio   Source
Standard Arabic
  a. Eastern       39%                      Port et al. 1980
  b. Western       39-50%                   Ghazali & Braham 1992
Syrian             46%                      Irikoussi 1981
Kuwaiti            48%                      Al-Dossari 1989
Jordanian          62%                      Mitalb 1984
                   37-50%                   Zawaydah & de Jong 1999
Iraqi              50%                      Al-Ani 1970
Lebanese           50%                      Obrecht 1968; Sayah 1979
Saudi              52%                      Al-ghamdi 1992
Egyptian           59%                      Norlin 1987
Tunisian           59%                      Jomaa 1991
                   63%                      Ghazali unpublished
Moroccan           77%                      Rhardisse et al. 1990

Table 1. Durations of Short Vowels Expressed as Percentages of Long Vowels

² We have sometimes computed the ratios ourselves following inspection of pertinent elements in the data available.
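The percentages in Table 1 express the duration of a short vowel relative to that of its long counterpart. As a minimal sketch of that arithmetic (the function name is ours, and the only figures used are the 300 ms and 600 ms of the artificially sustained Iraqi vowels mentioned above):

```python
def short_long_ratio(short_ms: float, long_ms: float) -> float:
    """Duration of the short vowel as a percentage of the long vowel."""
    if long_ms <= 0:
        raise ValueError("long vowel duration must be positive")
    return 100.0 * short_ms / long_ms


# Sustained Iraqi vowels: 300 ms (short) against 600 ms (long).
print(short_long_ratio(300, 600))  # 50.0
```

A value approaching 100% means that duration alone hardly separates the two vowel classes, which is the direction in which the Western dialects move in Table 1.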
On the whole then, the data in Table 1 show a trend towards a decrease in duration differences between short and long vowels as we move into NA. Concerning MA, there has been an ongoing debate on whether this dialect distinguishes between short and long vowels. While studies, such as the one on which the data in Table 1 are based (Rhardisse et al. 1990), argue for the presence of quantity, others (Benkirane 1982, 2002; Embarki 1997) provide a host of evidence
(phonological, phonetic and perceptual) against a short vs. long vowel opposition in MA. They propose instead a system with three full vowels [i, a, u] and one reduced schwa [ǝ] in closed syllables. The phenomenon of vowel reduction is not, of course, limited to MA. Short high vowels are produced as centralized [ɪ] and [ʊ] in closed syllables in all dialects (Ghazali 1979), and in the Western dialects, where short vowels cannot occur in open syllables, [ɪ] and [ʊ] are the only short high vowels available. In this case, quantity opposition is accompanied by quality distinction among the high vowels, and phonological contrast preserving vowel quality can only be obtained with the low vowel. Thus, while a long high vowel can occur in both closed and open syllables, CVV$(C)CV, its short counterpart can only occur in a closed one, CVCC…, as in the examples from TA below.

High vowels
Long                              Short
ʕiid     “religious holiday”      ʕɪdd     “count!”
ʒiibu    “his pocket”             ʒɪbnæ    “cheese”
ʒiibhæ   “her pocket”             ʒɪbbæ    “a man’s garment”
ʃuufu    “look at him”            ʃʊftu    “I saw him”
ʃuufhæ   “look at her”            mʊddæ    “a period of time”

The low vowel
mææʃi    “going”                  mæʃjæ    “pace, gait”
xamsæ    “five”                   xaamsæ   “the fifth” (fem. sing.)
kælbæ    “bitch”                  kæælbæ   “restless, turbulent”
Vowel reduction seems to be gaining more ground in other Western dialects, especially among younger generations (Hammi 2004). In TA, for example, both [i] and [æ] tend to approximate the formant structure of [ǝ] in final open syllables. In a study based on spontaneous speech, Barkat (2000) found that while vowels cluster more around the center of the vowel space in the dialects of NA, they remain more peripheral in the Eastern dialects. She observed that, in her corpus, the most frequent vowel in NA is the central vowel [ǝ], which represents 30% of the occurrences of all the short vowels. She also noted that TA is something of a transitional dialect, intermediate between NA and the ME, where vowel reduction is
more marked than in the Eastern dialects but less pronounced than in the rest of NA. With regard to vowel duration obtained from spontaneous speech, her results show the same trend as in the studies included in Table 1. Adding up the durations of all vowels produced by all the subjects regardless of vowel quality, she obtained a 41% short/long vowel ratio for the ME region and 51% for NA.

2.2 Syllable structure³

It is a well-known fact that Western dialects do not in general permit short vowels in open syllables. There are, however, exceptions at least in some southern dialects of Tunisia, where very short vowels are preserved [CǝCV]/[CICV]. Sometimes only high vowels are deleted but the low vowel is kept in that position, as in the dialect of Sfax in Tunisia where, for example, “tomato” is pronounced [ṭamaaṭim] and “waste” is [xaṣaara] (CVCVVCV(C)) as opposed to [ṭmaaṭim] and [xṣaara] (CCVVCV(C)) elsewhere in NA. There are also occurrences of short vowels in open syllables in words that have been borrowed from MSA or from foreign languages but have become part of the native lexical inventory. The vowels in the first syllable of the words [mudiir] “director”, [muʕællɪm] “teacher” from MSA, and [mækiinæ] “machine”, [biduun] “liquid container” and [babuur] “ship” (from French “machine, bidon and vapeur”) are phonetically short. They are, however, preserved, perhaps for one or both of the following reasons. The first is socio-linguistic: the vowel serves to distinguish the vernacular from the Standard. For example, the word for “teacher” is used in the dialect to denote “mason” or “foreman”, and in this meaning it is pronounced [mʕallɪm] with the first vowel being deleted. The second reason, which in my view accounts for all the cases, is that these vowels have been reanalyzed as long vowels, by analogy to other words. Let us illustrate this with examples from TA. These data from Ghazali (1979) are also valid for other Western dialects that have preserved vowel quantity.
The vowel in the monosyllabic word [ziit] “oil” is long and of course stressed. In the word [zituun], from underlying /ziituun/ “olives”, the vowel in the first syllable shortens because stress has shifted to the second syllable containing the long vowel, but does not drop. In the word [zitunææt] “olive trees”, stress shifts to the last syllable, which includes a long vowel, and, consequently, the vowels of the first two syllables shorten but do not drop. Thus, in a borrowed word such as [biduun], the first vowel is most likely reanalyzed as long by analogy to [zituun]. Note that an underlying short vowel will drop in a similar environment; the last vowel [ɪ] in the verb [jɪktɪb] “he writes” will drop when the last consonant becomes the onset of the next syllable as a result of affixation (or cliticization) that adds a morpheme beginning with a vowel. Thus /jɪktɪb+u/ is phonetically [jɪktbu] “they write, or he writes it”.

The deletion of high vowels in open syllables has resulted in a great many lexical items comprising syllables with complex onsets; thus, instead of the typical CVCV… and CVCCVCV… in the ME, one finds a predominance of CCV… and CCVCCCV… in NA. If we consider the large number of reduced vowels and the fact that, contrary to what happens in many other languages, consonants in Arabic lengthen in clusters instead of being compressed (Rejili & Ghazali 2003), then there is a great deal of closure and little space left for vowels in the speech chain of Western dialects. This is most likely what gave the impression of fast and jerky speech as reported by Barkat’s subjects.

³ Although they ultimately underlie rhythmic patterns, foot and moraic structure will not be dealt with in this paper. For a discussion of these phonological matters, the reader is referred to Benkirane (1982), Benhallam (1990), Imouzaz (2002) for MA, Kiparsky for a comparison of the different dialects, McCarthy & Prince (1990a) as well as many other scholars.

3. Speech Rhythm

3.1 Production tasks

3.1.1 Variations within the Arabic continuum

In an attempt to relate these vowel duration and syllable structure differences to speech rhythm across the bipolar Arabic dialect domain, data were obtained from six Arabic dialects (Ghazali et al. 2002, Hamdi 2001).
The dialects investigated were: Moroccan (four speakers), Algerian (two speakers) and Tunisian (two speakers) representing Western Arabic, then Jordanian (two speakers), Syrian (three speakers) and Egyptian (one speaker) representing the Middle East. The speech data were taken from the recordings used in Barkat (2000), where each subject listened sentence by sentence to the story “The North Wind and the Sun” in French and translated each sentence spontaneously into his or her dialect. The language corpus used consisted of 140 Arabic sentences (ten sentences per subject) with an average duration of 2.5 seconds for each
sentence. Following the experimental procedures proposed by Ramus et al. (1999), each segment was classified as vowel or consonant. The next step was to measure the duration of (i) each sentence, (ii) each string of consecutive vowels (vocalic intervals), and (iii) each string of consecutive consonants (consonantal intervals). For example, the following sentence from MA comprises ten vocalic and ten consonantal intervals.

mbʕdha bdtǝriḥ ṭṣuṭ bkul quwwǝtha
CCVCCCV CCVCVCVC CCVC CCVC CVCCVCCV
“And then the wind began to blow with all its force”
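The count of ten vocalic and ten consonantal intervals can be checked directly from the CV skeleton given above. The sketch below is only an illustration of the segmentation step: the function name is ours, word boundaries are ignored (as the reported counts imply), and run lengths in segments stand in for the measured durations.

```python
from itertools import groupby


def intervals(skeleton: str) -> list[tuple[str, int]]:
    """Group a CV skeleton into maximal runs of C or V, ignoring word boundaries."""
    symbols = [s for s in skeleton if s in "CV"]
    return [(kind, len(list(run))) for kind, run in groupby(symbols)]


skeleton = "CCVCCCV CCVCVCVC CCVC CCVC CVCCVCCV"
runs = intervals(skeleton)
vocalic = [n for kind, n in runs if kind == "V"]
consonantal = [n for kind, n in runs if kind == "C"]
print(len(vocalic), len(consonantal))  # 10 10
```

Note that consonants at the end of one word and the beginning of the next merge into a single consonantal interval, which is why the Moroccan sentence yields runs of up to three consonants.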
The following step consisted in computing the proportion of vocalic and consonantal intervals (%V and %C, respectively) in each sentence and the standard deviation of vocalic and consonantal intervals within each sentence (∆V and ∆C, respectively). Details of how these variables are computed are explained in Ramus & Melher (1999). Table 2 shows the average proportions of vocalic intervals (%V) and the average standard deviation of consonantal intervals (∆C) for the subjects in each of the six dialects investigated.

Western Area   %V      ∆C      Eastern Area   %V      ∆C
Morocco        32.38   8.52    Egypt          36.98   3.89
Algeria        33.84   6.75    Jordan         41.09   4.76
Tunisia        34.97   5.25    Syria          43.66   4.54

Table 2. Computed values for %V and ∆C
These results show that while the proportion of vocalic intervals represents less than 50% of the total duration of a sentence in all the dialects, it is larger in the Eastern dialects than in the Western ones. In fact, there is a gradual increase of %V as one moves from West to East. Conversely, ∆C decreases from West to East. Figure 1 illustrates the negative correlation between %V and ∆C across dialect locations (r = −0.75); as one moves from West to East, ∆C decreases and %V increases.
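The measures behind Table 2 can be sketched as follows for a single sentence. This is an illustration under stated assumptions, not the procedure actually used: the function name and the durations are invented, and the population standard deviation is assumed because the text does not say which estimator was applied.

```python
from statistics import pstdev


def rhythm_measures(intervals):
    """Return (%V, delta_V, delta_C) for one sentence.

    intervals: list of (kind, duration) pairs, kind being "V" or "C",
    with durations in any consistent unit.
    """
    v = [d for kind, d in intervals if kind == "V"]
    c = [d for kind, d in intervals if kind == "C"]
    total = sum(v) + sum(c)
    percent_v = 100.0 * sum(v) / total
    # Assumption: population standard deviation of the interval durations.
    return percent_v, pstdev(v), pstdev(c)


# Invented durations in milliseconds, for illustration only.
sentence = [("C", 120), ("V", 60), ("C", 200), ("V", 80), ("C", 90), ("V", 110)]
percent_v, delta_v, delta_c = rhythm_measures(sentence)
print(round(percent_v, 2), round(delta_v, 2), round(delta_c, 2))
```

Averaging these per-sentence values over the sentences of all speakers of a dialect would give figures of the kind reported in Table 2; the units of the published ∆C values are not stated in the text.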
Using a t-test for the difference between pairs of dialects, we found that the significance of the difference increases with the distance between the two countries. For example, while these results (for both %V and ∆C) are not significant when Syria is compared to Jordan or when Morocco is compared to Algeria, that is, when two dialects belong to the same region, they are highly significant for pairs of dialects located at opposite ends of the continuum such as Syrian and Moroccan (p < 0.001). Note, however, that both Tunisia and Egypt, which are located near the center of the continuum, show significant results only when compared to Morocco, a fact that we will try to account for later. Figure 2 is an illustration of the average values of the proportion of vocalic intervals and the standard deviation of the consonantal intervals when the three dialects of each region are grouped together. It clearly shows that %V is higher in the dialects of the Middle East than in the dialects of North Africa (p < 0.0001), while we obtain the opposite results for ∆C. Figure 3 is a three-way comparison: Morocco and Algeria representing the far end of the western pole, Syria and Jordan the eastern end, and Tunisia and Egypt an intermediate zone. This comparison confirms the gradual decrease of %V from East to West with Tunisia and Egypt exhibiting intermediate values, but shows that with respect to ∆C, Tunisia and Egypt are closer to the dialects of the Middle East than to North Africa.

3.1.2 Discussion

Languages with the highest ∆C and low %V such as English were those traditionally classified as stress-timed. In Ghazali et al. (2002), the dialects that exhibit these characteristics are those spoken in North Africa. Since Eastern dialects such as those of Iraq (Benguerel 1999) and Jordan (Tajima et al. 1999) have also been classified as stress-timed, we should perhaps allow for a great deal of variation within the class of stress-timing.
To maintain a discrete category ‘stress-timing’ as distinct from some other timing, there should exist one or more key factors, the presence of which consistently induces the perception of stress-timing. Such a conditioning factor could be the tendency in all Arabic dialects for long or heavy syllables to attract stress. Since syllabic weight in these dialects is a cline, we may get the impression of different subclasses of rhythm. Note also how dialects geographically located between the two poles are also intermediate with respect to phonetic facts (Figure 1). Barkat (2000) reported that most of the discrimination errors made by her subjects were the result of not being able to correctly classify Tunisian speakers. In fact, Tunisian speakers have a %V similar to North Africa but a ∆C closer to the Middle East. In other words, their vowels are slightly longer and less reduced than those of Moroccans and Algerians, but significantly shorter than those of Syrians and Jordanians. They do not, however, exhibit the same syllabic complexity as the other North African subjects.

Figure 1. Distribution of the dialects along the %V and ∆C dimensions based on Ghazali et al. (2002)

Figure 2. Comparison of %V and ∆C in NA and the ME based on Ghazali et al. (2002)

Figure 3. Comparison of %V and ∆C in three groups of dialects (Algeria + Morocco; Tunisia + Egypt; and Jordan + Syria) based on Ghazali et al. (2002)

3.1.3 Comparing Arabic to other languages

Using the same experimental procedures, Hamdi (in preparation) compared the six Arabic dialects to three other languages: English and French, categorized in the literature on speech rhythm as stress-timed and syllable-timed respectively, and Catalan, which is considered intermediate between stress-timing and syllable-timing. Regarding the Arabic dialects in this investigation, Lebanese replaced Syrian and the number of subjects who generated the speech data was increased to five for Egyptian and Algerian and to ten for the other dialects. For English, French and Catalan there were five subjects for each language. Figure 4 shows that there are three overlapping sub-classes of Arabic dialects: Moroccan and Algerian with high ∆C values and low %V, Jordanian and Lebanese with high %V and low ∆C, and a somewhat intermediate area occupied by Tunisian and Egyptian. These results confirm the previous findings from a smaller sample. Figure 5, which is a comparison of the various Arabic dialects with English, French and Catalan, shows that:
a) Moroccan and Algerian (Western dialects) are on one end and French on the other. They are most distinct, with Moroccan and Algerian having the highest ∆C and the lowest %V values and exactly the opposite for French.
b) English is somewhere in the middle although higher on the ∆C values than all other languages and dialects except for MA and AG. English, however is not very low in terms of %V. c) Lebanese and Jordanian (the Eastern dialects) are closer to French with respect to %V, but closer to English in terms of ∆C. d) Tunisian and EA are both closer to Catalan and have
lower values for both %V and ∆C than English. They also have a higher ∆C value than the ME dialects but lower %V values.

Remember that ∆C is a measure of the complexity of consonant sequencing, i.e., it is directly proportional to syllable complexity. The %V parameter is an indicator of vowel reduction. The value of %V is low when there is a predominance of short or reduced vowels. Thus, if these parameters are taken as indicators of rhythm types (Ramus et al. 1999), then MA and AG are more stress-timed than English, a language that is already considered strongly stress-timed. It may also be the case that a new category of rhythm is needed to account for these two dialects, but we do not know at this stage what that would be.

Before discussing the results of other investigations using different experimental techniques, note that Hamdi et al. (2004) attempted to compare results from ∆C and %V measurements with results obtained from Pair-wise Variability Indices (PVI). This technique, described in Grabe & Low (2002), is more sensitive to stress patterns as it measures the mean difference in duration between two successive vowels in an utterance. Results from this method, which is supposed to provide a global representation of speech rhythm, are basically the same as those obtained from ∆C and %V measurements (r² = 0.83).

3.2 Speech cycling tasks
3.2.1 Comparing Jordanian Arabic (ME) to English and Japanese
Tajima et al. (1999) used the speech cycling method (Tajima 1998) to compare English, Arabic (JA) and Japanese, knowing that the latter has been categorized as mora-timed. They reported that Arabic and English exhibited similar rhythmic patterns that were different from Japanese. They concluded that “Arabic and English speakers seem to pay close attention to the stressed syllables, producing them at simple harmonic phases” (288).
Comparing English and Arabic, they noted that “stressed syllables within a phrase deviated from a strictly isochronous sequence to a greater extent in Arabic than in English.” These remarks seem to be in line with the differences between English and JA observed in the ∆C and %V values above, which suggested that English was more stress-timed than JA.
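The duration-based measures discussed in this section can be made concrete with a small sketch. It assumes an utterance has already been segmented into alternating consonantal ("C") and vocalic ("V") intervals with durations in seconds; the sample durations are invented for illustration and are not taken from any of the studies cited. %V is computed as the vocalic share of total duration and ∆C as the standard deviation of consonantal intervals, following the usual reading of Ramus et al. (1999); the choice of the population standard deviation is an assumption. The two PVI functions follow the general idea in Grabe & Low (2002) of averaging differences between successive intervals.

```python
from statistics import mean, pstdev


def percent_v(intervals):
    """%V: percentage of total utterance duration taken up by vocalic intervals."""
    total = sum(duration for _, duration in intervals)
    vocalic = sum(duration for kind, duration in intervals if kind == "V")
    return 100.0 * vocalic / total


def delta_c(intervals):
    """Delta-C: standard deviation of consonantal interval durations.

    Assumption: population standard deviation (pstdev) is used here.
    """
    return pstdev([duration for kind, duration in intervals if kind == "C"])


def raw_pvi(durations):
    """Mean absolute difference between successive interval durations."""
    pairs = list(zip(durations, durations[1:]))
    return mean([abs(a - b) for a, b in pairs])


def normalized_pvi(durations):
    """Like raw_pvi, but each difference is divided by the mean of the pair."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * mean([abs(a - b) / ((a + b) / 2) for a, b in pairs])


# Hypothetical segmentation of one short utterance (durations in seconds).
utterance = [("C", 0.08), ("V", 0.10), ("C", 0.12),
             ("V", 0.06), ("C", 0.20), ("V", 0.14)]
vowels = [duration for kind, duration in utterance if kind == "V"]

print(f"%V = {percent_v(utterance):.1f}")
print(f"Delta-C = {delta_c(utterance):.4f} s")
print(f"vocalic rPVI = {raw_pvi(vowels):.3f} s")
print(f"vocalic nPVI = {normalized_pvi(vowels):.1f}")
```

On this reading, a variety with many reduced vowels and complex consonant clusters would yield a low %V and a high ∆C, which is the profile reported above for MA and AG.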
Figure 4. %V as a function of ∆C for all subjects in Arabic dialects, from Hamdi (in preparation)
Figure 5. Distribution of the Arabic dialects, English, French and Catalan with respect to %V and ∆C, from Hamdi (in preparation)
3.3 Perceptual tasks
3.3.1 Comparing Tunisian Arabic (Intermediate) to English and French
Ben Abda (2004) set up a perceptual experiment using reversed speech stimuli in an attempt to find out if subjects could discriminate
between spectrally inverted sentences from English, French and TA. The sentences came from the story “The Little Red Riding Hood” and included statements, different types of questions, exclamations, etc. With reversed speech, subjects only have available syllabic structure (although inverted) and supra-segmental information. The subjects who produced the speech samples were one native speaker of American English and four native speakers of TA. The latter produced both the TA and the French stimuli. The subjects for the listening task were ten native speakers of TA and ten native speakers of English. They were all university graduates and were trained by the experimenter to distinguish between stress timing and syllable timing.

The first task consisted in identifying the inverted stimuli as English or Arabic. The results from this first task, although statistically significant, show some degree of confusion (116 correct identifications and 84 misses). Looking at the distribution of these answers, one notes that most of the correct answers (82/116) came from the native speakers of English and most of the errors (66/84) from Tunisian listeners. When the task was to discriminate between French and English, the overwhelming majority of answers were correct (177) and only 23 were wrong, with Tunisian subjects doing better this time. In the third listening task, stimuli from the three languages were presented at the same time. In this case, French was clearly distinguished from Arabic and English, but English and Arabic were confused, with correct identification occurring only 53% of the time, again with Tunisians accounting for most of the incorrect answers.

What caused the difference between the behavior of English and Tunisian listeners is not clear. It might be that the English group took the experiment more seriously and were thus more attentive, or that they were particularly sensitive to some cues in the acoustic signal.
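The counts reported for the first identification task can be unpacked with a little arithmetic. The sketch below derives per-group accuracy from the figures given above and runs an exact one-sided binomial check against a chance level of 0.5. This is only a rough illustration: the source does not say which statistical test Ben Abda (2004) used, the chance level is an assumption based on the two-way choice, and responses from the same listener are not strictly independent.

```python
from math import comb

# Counts reported for the English vs. Arabic identification task.
CORRECT, MISSES = 116, 84
ENGLISH_CORRECT = 82    # correct answers given by English listeners
TUNISIAN_MISSES = 66    # errors made by Tunisian listeners

# Remaining cells follow by subtraction.
english_misses = MISSES - TUNISIAN_MISSES
tunisian_correct = CORRECT - ENGLISH_CORRECT

english_accuracy = ENGLISH_CORRECT / (ENGLISH_CORRECT + english_misses)
tunisian_accuracy = tunisian_correct / (tunisian_correct + TUNISIAN_MISSES)
overall_accuracy = CORRECT / (CORRECT + MISSES)


def binomial_upper_tail(successes, trials, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption: chance performance is 0.5 in a two-alternative task.
p_value = binomial_upper_tail(CORRECT, CORRECT + MISSES)

print(f"English listeners:  {english_accuracy:.0%}")
print(f"Tunisian listeners: {tunisian_accuracy:.0%}")
print(f"Overall:            {overall_accuracy:.0%}")
print(f"One-sided exact binomial p-value: {p_value:.4f}")
```

The subtraction shows that each listener group contributed 100 responses, so the overall score averages two very different group results.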
Note, though, that care was taken to use Arabic sentences that did not contain the back consonants that do not occur in English, such as uvulars and pharyngeals. Perceptual experiments using reversed speech thus confirm the findings from production experiments with regard to the similarity between English and, at least, Tunisian and JA. They also confirm the differences between French on the one hand, and English and Arabic on the other. In this experiment, stimuli were also used which consisted of just F0 and amplitude and no segmental material. F0 was extracted
from the same test sentences but before spectral inversion. Surprisingly, here, too, the subjects were able to discriminate between French on the one hand, and English and Arabic on the other, but not between Arabic and English in a significant manner. No attempt was made in this experiment to produce stimuli where segmental material is preserved but F0 is kept constant. Controlling those variables could be useful in determining the relative importance of each variable as a discrimination factor. In the next section we will examine pitch variations in the Arabic dialects.

4. Intonation
Investigating intonation patterns across Arabic dialects was motivated by the same reasons as those behind our work on rhythm, namely to understand the supra-segmental features underlying the perceived speech differences. Knis (2004) recorded data that consisted of both read sentences and spontaneous speech from six Arabic dialects. The sentences included statements, yes/no questions, wh-questions, and sentences in contexts expressing doubt or surprise. The subjects were also asked to tell the story of “The Little Red Riding Hood”. There were two subjects from each of the following countries: Morocco, Tunisia, Egypt, Syria and Iraq. The subjects were intended to be representative of the Western, the Eastern and the intermediate regions. In addition to speech data from the dialects, the subjects were also asked to produce the same speech material in MSA, the aim of the investigation being to find out if the dialects had an effect on the production of the standard. Only the results pertaining to the intonation patterns of statements in the dialect will be discussed in this paper. The analysis of pitch patterns was carried out using a version of the ToBI method (Silverman et al. 1992).

4.1 Results⁴
4.1.1 Egyptian Arabic
The statements produced by the two Egyptian speakers show clear cases of the declination phenomenon.
The intonation contour tends to go down from beginning to end with peaks corresponding to stressed syllables that are generally lower compared to the initial pitch accent.

⁴ Figures 6 to 17 are based on data from Knis (2004).
Figures 6 and 7 illustrate the intonation contours of the sentences Ɂinnaharda elgawwi baarid “it’s cold today” and Ɂibna ʕammak ʕajjaan “your cousin is ill”, respectively. In general the patterns comprise two pitch accents with two peaks, which may sometimes have the same level, occurring on the first and the last stressed syllables. The boundary tone in statements is almost always a low tone (L%). In summary, the most typical patterns for statements in EA are the following: either HL*L!H*LL% (57% of the sentences) or LH*LL% (30% of the sentences), where HL*=falling, !H*=down-stepped high, L%=low boundary tone.

4.1.2 Syrian Arabic
Statements in SA exhibit a rising-falling pattern with the high tone extending over a substantial stretch of segments. This results in plateau tunes or hat patterns where the maximum pitch does not form a peak but stretches over the whole stressed syllable, extending sometimes to the next one, as in the sentence maama bilbeet “my mother is at home” in Figure 8. There are also bi-tonal pitch contours where pitch rises to the same level over the two stressed syllables, and the last pitch accent falls on the final or penultimate syllable. The intonation of statements in this sample is also characterized by declinations. There is an initial high peak, then pitch starts to fall with rises corresponding to stressed syllables, as in the sentence ṭaɁṣ beerid eljoom “it’s cold today” illustrated in Figure 9. Another important observation one can make regarding SA is that pitch is sometimes falling on stressed syllables. The first syllable of the initial word kaanit in the sentence kaanit elɁasɁilǝ ṣaʕbǝ “the questions were difficult” (Figure 10) is stressed and exhibits an HL* pattern. This pattern is rare in Western dialects. On the whole, the major patterns observed are either LH*HLL% or LH*L!H*L%, which account for 53% and 30% of all the patterns, respectively.
Figure 6. Ɂinnaharda elgawwi baarid
Figure 7. Ɂibna ʕammak ʕajjaan
Figure 8. maama bilbeet
Figure 9. ṭaɁṣ beerid eljoom
Figure 10. kaanit elɁasɁilǝ ṣaʕbǝ
4.1.3 Iraqi Arabic
What characterizes statements in the Iraqi sample is the frequency of the HL* pattern, as illustrated in the sentence eldʒaw baarid eljoom “it’s cold today” (Figure 11). This recurrent falling pattern is very rare in the other dialects. One-pitch accent contours are seldom found here, and the typical patterns are intonation contours with continuous pitch variations on the syllables that bear lexical stress. Syllable prominence is achieved either through pitch rise or pitch fall. In general, the salient feature of this dialect is the predominance of peaks and valleys within the contour, which leads to a continually changing melody, as in the sentence ṣabaaḥ elxeer jaa ʒaddati “good morning grandmother”. It seems as if there were a pitch accent on each lexical stress even when
the highest pitch falls on one particular syllable, which in turn varies from one position to another in the sentence. The final boundary tones are mostly low; the non-low tone at the end of the sentence in Figure 12 could be explained by the fact that it was extracted from a story where the speaker had more to say in this particular situation. The most recurrent pitch patterns in statements for this dialect are: HL*H*L% or LH*L!H*L% and to a lesser extent LH*HL%. These three patterns account for 92% of all the sentences examined.
Figure 11. eldʒaw baarid eljoom
Figure 12. ṣabaaḥ elxeer jaa ʒaddati
4.1.4 Moroccan Arabic
The declination which characterizes Eastern dialects is not present in the intonation contours of MA. There is usually one rising pitch that corresponds to a stressed syllable, and in this sample the typical pattern is a rising-falling one, with the peak being on the penultimate stressed syllable, especially in a neutral context with no particular focus as in the sentence wǝld ʕammɪk mriδ “your cousin is ill” (Figure 13).
Sentences with more constituents in their phrase structure may have two pitch accents or more. The sentence in Figure 13 is made up of one NP (a genitive construction) and an AP and exhibits one pitch accent. The sentence lʒaw barid ljum “it’s cold today” (Figure 14) comprises an NP, an AP, and an ADV and is rendered with two pitch accents separated by a short pause. The first accent rises and falls, then slightly rises again because it is non-terminal, and the second is rising-falling, indicating the end of the sentence. These patterns are similar to those described by Benkirane (2002), who observed this same rising-falling contour for statements in MA. He noted that in the sentence amina mreḍa “Amina is ill”, the peak was on the penultimate stressed syllable, and that on the whole there was no evidence for declination in the intonation contour and no alternations of peaks and valleys. The most prominent intonation patterns for statements in MA are LH*LL% (40% of the cases) and LH*HLL% (30% of the cases).
Figure 13. wǝld ʕammɪk mriδ
Figure 14. lʒaw barid ljum
4.1.5 Tunisian Arabic
For statements, the sentences from the TA sample show a simple rising-falling pitch pattern with the peak being at the beginning of the contour on the first stressed syllable, as in the sentence baarda ljuum “it’s cold today” in Figure 15. This LH*LL% pattern is very common in short statements of no more than two or three words. In the sentence wɪld ʕammɪk mriiδ “your cousin is ill” (Figure 16), the pitch rise is on the second syllable, that is, the first syllable of the second word in the compound noun wɪld ʕammɪk, literally “son-uncle (uncle’s son)”, which is the typical stress pattern in these types of endocentric compounds, where stress falls on the non-head noun. The final boundary tone was falling in most of the cases, but both subjects produced the sentence ommi fɪddaar “my mother is at home” with a final rise (Figure 17). The major pitch patterns for statements in TA are then LH*LL% (46%), H*LL% (20%) and LH*HLL% (13%).
Figure 15. baarda ljuum
Figure 16. wɪld ʕammɪk mriiδ
Figure 17. ommi fɪddaar
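The pattern frequencies reported in sections 4.1.1 to 4.1.5 can be gathered into one small structure for comparison. The sketch below uses only the percentages stated in the text; Iraqi Arabic is left out because only a combined share (92% for three patterns) is given for it. The function names are illustrative and not part of any published analysis.

```python
# Reported shares (%) of the main statement patterns, per dialect.
reported_patterns = {
    "EA": {"HL*L!H*LL%": 57, "LH*LL%": 30},
    "SA": {"LH*HLL%": 53, "LH*L!H*L%": 30},
    "MA": {"LH*LL%": 40, "LH*HLL%": 30},
    "TA": {"LH*LL%": 46, "H*LL%": 20, "LH*HLL%": 13},
}


def coverage(dialect):
    """Share of statements accounted for by the patterns listed for a dialect."""
    return sum(reported_patterns[dialect].values())


def dominant_pattern(dialect):
    """Most frequent reported pattern for a dialect."""
    shares = reported_patterns[dialect]
    return max(shares, key=shares.get)


def dialects_with(pattern):
    """Dialects for which a given pattern is among those reported."""
    return sorted(d for d, shares in reported_patterns.items() if pattern in shares)


for dialect in reported_patterns:
    print(dialect, dominant_pattern(dialect), f"{coverage(dialect)}% covered")

print("LH*LL% reported for:", dialects_with("LH*LL%"))
```

Laid out this way, the simple rising-falling LH*LL% contour is the leading pattern in both Western samples (MA and TA) and the second most frequent one in EA.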
4.2 Summary
We may conclude from comparing the intonation patterns of statements in the samples representing the six dialects above that:
a. They generally end with a low boundary tone, as is the case for the great majority of the world’s languages. Most of the exceptions to this pattern can be explained by utterances that remain incomplete.
b. In Western dialects, and to a lesser extent in EA, nuclear accents and accompanying pitches are predominantly rising-falling (LH*LL%).
c. High pitches in SA are rarely represented by rapid changes in the contour, but are maintained over an extended period of time, forming a plateau. The pattern LH*HL% represents more than 50% of the statements in this dialect.
d. IA is characterized by the predominance of the HL* pattern, which is encountered to a limited extent in SA but is almost absent from the other dialects.
e. Declinations are frequent in the Eastern dialects, giving them a greater melodic variation compared to Western dialects.

5. Conclusion
When we set out to obtain the speech data that would allow us to compare the different Arabic dialects, we were hopeful our informants would produce equivalent if not identical utterances. Although we tried various techniques to elicit the desired responses, there was not much we could do about the lexical item a speaker chose to designate something in his/her dialect. In the intonation study, for example, the
expression “it’s cold today” came out as baarda ljuum in TA, lʒaw barid ljum in MA, ṭaɁṣ beerid eljoom in SA, and Ɂinnaharda elgawwi baarid in EA, with two different words for “weather” (if we ignore some phonetic variations). Note that we could have ended up with three separate words had the TA subject produced the usual expression iddinja baarda ljuum, literally, “the world is cold today”. Lexical differences in themselves may thus be sufficient cues for discriminating between the various dialects, especially those on the opposite ends of the Arab region.

Beyond lexical diversity, however, there is an array of inter-related and inter-dependent segmental and supra-segmental variables that contributes to the coloring of this dialectal panorama. In addition to specific intonational patterns distinguishing the Western from the Eastern varieties, there are salient features characterizing different dialects or sub-regions. Short and reduced vowels coupled with complex syllabic structure, as in the Western pole of the continuum, seem to be correlates of stress-timing. But some Eastern dialects were also shown to be closer to stress-timed languages such as English than to mora-timed languages like Japanese (Tajima et al. 1999, Benguerel 1999). However, when samples representing all the Arabic varieties were compared at the same time to other syllable-timed and stress-timed languages, we no longer obtained a homogeneous group of Arabic dialects with respect to rhythm types. Some Western dialects (MA and AA) with very high ∆C and very low %V seemed to be more stress-timed than English. The Eastern dialects (JO and LA), exhibiting high %V and very low ∆C, were closer to syllable-timed languages such as French than to MA or AA. A third intermediate group (TA and EA) appeared somewhere in between, although there was a slight difference between them with respect to ∆C.
Since rhythm categorization is mainly a matter of perception, controlled perceptual experiments could serve to indicate whether these differences in the acoustic signal really correspond to distinct rhythm types or to subclasses within a major rhythm category.
REFERENCES
Abercrombie, David. 1967. Elements of General Phonetics. Edinburgh: Edinburgh University Press.
Al-Ani, Salman. 1970. Arabic Phonology: An acoustic and physiological investigation. The Hague: Mouton.
Al-Dossari, A. 1989. Le phasage des gestes mandibulaires vocaliques et consonantiques en arabe koweïtien. Mémoire de DEA, Université de Grenoble III.
Barkat, Melissa. 2000. Détermination des indices acoustiques robustes pour l’identification automatique des parlers arabes. Thèse de Doctorat, Université Lumière Lyon.
Ben Abda, Imen. 2004. The Perception of Rhythm in English and Tunisian Arabic: A comparative study. M.A. thesis, Institut Supérieur des Langues de Tunis, Tunis.
Benguerel, A. 1999. “Stress-timing vs. Syllable-timing vs. Mora-timing: The perception of speech rhythm by native speakers of different languages”. VARIA, Etudes & Travaux 3.
Benhallam, A. 1990. “Moroccan Arabic Syllable Structure”. In Langues et Littératures VIII, Publications de la Faculté des Lettres et des Sciences Humaines, Rabat.
Benkirane, Thami. 1982. “Durée, prosodie et syllabation en arabe marocain”. Travaux de l’Institut de Phonétique d’Aix 8.49-83.
_____. 2002. Codage Prosodique de l’énoncé en arabe marocain. Thèse de doctorat d’état.
Cheikhrouhou, Maha. 2005. Tunisian Arabic and English Speech Rhythm: A comparative analysis. Doctoral thesis, University of Manouba, Tunis.
Cruttenden, Alan. 1986. Intonation. New York: Cambridge University Press.
Dauer, R. M. 1983. “Stress-timing and Syllable-timing Reanalyzed”. Journal of Phonetics 51-69.
_____. 1987. “Phonetic and Phonological Components of Language Rhythm”. Proceedings of the XIth ICPhS, Tallinn, Estonia 5.447-450.
Embarki, M. 1997. “La quantité vocalique en arabe marocain: entre l’apparentement historique et la réalité acoustique”. Actes des Journées d’Etude Linguistique: La Voyelle dans tous ses Etats, Nantes 44-49.
Ghazali, Salem. 1979. “Du Statut des voyelles en Arabe”. Etudes Arabes; Analyses Théorie 2/3:8.199-219.
Ghazali, Salem & Abdelfattah Braham. 1992. “Voyelles longues et voyelles brèves en arabe standard: Organisation temporelle”. 19ème JEP du GCP de la SFA. Bruxelles.
Ghazali, Salem, Rym Hamdi & Melissa Barkat. 2002. “Speech Rhythm Variation in Arabic Dialects”. Proceedings of the First International Conference on Speech Prosody. Aix-en-Provence, pp. 331-334.
Grabe, E. & E. L. Low. 2002. “Durational Variability in Speech and the Rhythm Class Hypothesis”. Papers in Laboratory Phonology 7. The Hague: Mouton.
Hamdi, Rym. 2001. ?al-?iqaaʕ fi-llahajaat ?al-ʕarabiyya: diraasasamaaiyya. M.A. thesis, Institut Supérieur des Langues de Tunis, Tunis.
Hamdi, Rym, Melissa Barkat-Defradas, E. Ferragne & François Pellegrino. 2004. “Speech Timing and Rhythmic Structure in Arabic Dialects: A comparison of two approaches”. ICSLP.
Hamdi, Rym. In preparation. Détermination d’indices prosodiques robustes en vue de l’identification automatique des parlers arabes. Ph.D. dissertation, Institut Supérieur des Langues de Tunis, Tunis.
Hammi, Rihab. 2004. English Vowel Reduction to Schwa by EFL Tunisian Students. M.A. thesis, Institut Supérieur des Langues de Tunis, Tunis.
Hirst, Daniel & Albert Di Cristo, eds. 1998. Intonation Systems: A survey of twenty languages. Cambridge: Cambridge University Press.
Imouzaz, Said. 2002. Intéraction des contraintes dans la morphologie non-gabaritique de l’arabe marocain de Casablanca: témoignage pour la théorie de l’optimalité. Thèse de doctorat, Université Hassan II, Mohammedia.
Jomaa, Mounir. 1991. Organisation temporelle acoustique et articulatoire de la quantité en Arabe tunisien. Thèse de doctorat, Université Stendhal, Grenoble III.
Kiparsky, Paul. “Syllables and Moras in Arabic”. Available from the Internet.
Knis, Khouloud. 2004. ?atharu ?allahajaat ?alʕarabiyya fi tanghiim ?al-fuSHaa. M.A. thesis, Institut Supérieur des Langues de Tunis, Tunis.
Laver, John. 1994. Principles of Phonetics. Cambridge: Cambridge University Press.
McCarthy, John & A. Prince. 1990a. “Foot and Word in Prosodic Morphology: The Arabic Broken Plural”. Natural Language and Linguistic Theory 8.209-283.
Miller, M. 1984. “On the Perception of Rhythm”. Journal of Phonetics 12.75-83.
Mitleb, F. M. 1984. “Vowel Length in Arabic and English: A spectrographic test”. Journal of Phonetics 12.229-235.
Norlin, K. 1987. “A Phonetic Study of Emphasis and Vowels in Egyptian Arabic”. Working Papers 30, Lund University, Department of Linguistics, 1-119.
Obrecht, Dean. 1968. Effects of the Second Formant on the Perception of Velarization Consonants in Arabic. The Hague: Mouton.
Ramus, F. & J. Mehler. 1999. “Language Identification with Supra-segmental Cues: A study based on speech re-synthesis”. Journal of the Acoustical Society of America 105.1: 512-521.
Ramus, F., M. Nespor & J. Mehler. 1999. “Correlates of Linguistic Rhythm in the Speech Signal”. Cognition 73:3.265-292.
Rejili, Choukri & Salem Ghazali. 2003. “Consonant-cluster Duration in Standard Arabic”. 15th International Congress of Phonetic Sciences, Barcelona, 1105-1108.
Rhardisse, N., R. Sock & C. Abry. 1990. “L’efficacité des cycles acoustiques dans la distinction des quantités vocaliques et consonantiques en arabe marocain”. 18ème JEP du GCP de la SFA, pp. 108-112.
Roach, Peter. 1982. “On the Distinction between ‘Stress-timed’ and ‘Syllable-timed’ Languages”. Linguistic Controversies, ed. by D. Crystal, pp. 73-79. London: Edward Arnold.
Silverman, Kim, Mary E. Beckman, John Pitrelli, Mari Ostendorf, Colin Wightman, Patti Price, Janet Pierrehumbert & Julia Hirschberg. 1992. “A Standard for Labeling English Prosody”. Proceedings, Second International Conference on Spoken Language Processing 2: 867-70. Banff, Canada.
Tajima, K. 1998. Speech Rhythm in English and Japanese: Experiments in speech cycling. Ph.D. dissertation, Indiana University, Bloomington.
Tajima, K., B. Zawaydeh & M. Kitahara. 1999. “A Comparative Study of Speech Rhythm in Arabic, English and Japanese”. Proceedings of the XIV ICPhS, San Francisco.
Roots and patterns in Arabic Lexical Processing*
Abdessatar Mahfoudhi King Saud University
* This research was partially funded by a grant from the Tunisian Ministry of Higher Education and a grant from the Faculty of Graduate Studies at the University of Ottawa. I would like to thank all the participants in this study, as well as all the people who helped in their recruitment. I am also grateful to Eta Schneiderman for valuable comments on an earlier version of this work and to Sami Boudelaa and Kenneth Forster for helpful comments on the design and suggestions on the DMDX software. I am, of course, the only person responsible for any possible errors of fact or interpretation.

1. Introduction
The status of roots and patterns remains controversial both in theories of Arabic morphology and in Arabic lexical processing. There are two major views on Arabic morphology. Some argue for roots and patterns as the basis of word formation (e.g., McCarthy 1981), but others defend a stem- or word-based approach (e.g., Benmamoun 1999). Lexical processing studies are also far from being homogeneous. The present study was designed to provide further external evidence for the cognitive validity of these two morphemes. It included three experiments that used a lexical decision task with masked visual priming to examine the priming effect of sound roots and patterns with a sound or a weak root in word recognition. The results will be discussed in light of theories of Arabic morphology and lexical processing.

The paper is structured as follows. Section 1 gives a brief description of the two major theories of Arabic morphology. Section 2 reviews previous psycholinguistic studies on roots and patterns in Arabic. Section 3 reports on the experiment on the role of roots in Arabic lexical processing. Section 4 reports on the
experiments on the role of patterns in Arabic lexical processing. The last section discusses the results of the two studies.

2. Theories of Arabic Morphology: Patterns and Roots or Stems and Affixes?
There are two major opposing theories of Arabic morphology. On one side, there is the morpheme-based theory, whose proponents (e.g., Cantineau 1950a, 1950b; McCarthy 1981) argue that derivations are based on the process of mapping roots onto patterns. For instance, the word rakib “ride” is made up of the root {r, k, b} and the pattern {CaCiC}. The root carries the core meaning of the word, “riding”, and the pattern has the syntactic meaning ‘perfective, active’. While the classical theory,¹ as adopted in Cantineau’s work, relies on roots and patterns, McCarthy’s Prosodic Morphology proposes that the pattern should be divided into three morphemes represented on separate tiers à la autosegmental phonology (Goldsmith 1976): (i) the skeleton made of vocalic and consonantal slots, (ii) affixal consonants, if any, and (iii) vowels. On the opposite side, there is the stem-based theory (e.g., Ratcliffe 1997, Benmamoun 1999), which maintains that derivations are stem-based.

This controversy has implications for lexical representation and lexical processing. The stem/word-based theory is in line with the tenets of the full-listing hypothesis of lexical processing (e.g., Butterworth 1983), which assumes that words are represented and accessed as whole units. The morpheme-based theory is congruent with both the decompositional hypothesis (e.g., Taft 1981) and the dual-access hypothesis (e.g., Caramazza et al. 1988) of lexical processing, both of which assume that (at least some) complex words are accessed and represented as separate morphemes.

¹ For an overview of the classical theory of Arabic morphology, the reader is referred to Bohas & Guillaume (1984). Bohas & Guillaume emphasize that, unlike the modern structuralist Semitists (e.g., Cantineau 1950a, 1950b) who suggest that all derivations are a mapping of a root to a template, the old Arab grammarians propose word-to-word derivations in many cases. It is, however, possible to propose root-to-template mapping while still proposing that words are derived from others with some additions and deletions of suffixes as well as a change in vowels, as done by Watson (2002).

3. Previous Studies
While there is evidence for the role of the root in lexical processing, evidence for the pattern is still inconclusive.² Boudelaa & Marslen-Wilson (2005), who used a visual lexical processing task, found a priming effect
of roots in Modern Standard Arabic deverbal nouns and verbs at prime display times of 32, 48, 64 and 80 ms. Abu-Rabia & Awwad (2004), on the other hand, did not find any priming effect at a display time of 50 ms using both masked priming and naming tasks. As for the pattern, Boudelaa & Marslen-Wilson (2005) found a priming effect of this construct at display times of 48 and 64 ms in deverbal nouns and only at an SOA (stimulus onset asynchrony) of 48 ms in verbs. Mimouni et al. (1998) tested both normal and aphasic speakers of Algerian Arabic at an SOA of 250 ms using a cross-modal priming lexical decision task and found no effect of the pattern in nouns. Abu-Rabia & Awwad (2004) also found no priming of patterns in derived Modern Standard Arabic nouns at an SOA of 50 ms using masked priming and naming tasks.

4. Study 1: Sound Roots
4.1 Objectives
The goal of this study/experiment was to validate previous findings related to the role of roots in Arabic lexical processing. The experiment tested whether a masked word prime would facilitate the recognition of a target word having the same root. Given some evidence for the importance of the root in lexical processing in Arabic (Boudelaa & Marslen-Wilson 2005) and Hebrew (e.g., Frost et al. 1997), I expected to find a priming effect of the root. The assumption behind priming is that a significant facilitation effect is evidence that the shared morpheme is being decomposed from a complex word and used to activate the target word. To ascertain that any potential facilitatory effect is morphological, a semantic condition and an orthographic/phonological condition, as well as an unrelated condition, were included.

The data were divided into two sets. The first set included 24 targets and was paired with primes that belonged to one of these three conditions: (i) +Root +Semantics, (ii) +Orthography/+Phonology, and (iii) Unrelated. In the first condition, primes and targets had the same root as well as a transparent semantic relationship.
² Other evidence for the cognitive relevance of the root morpheme in Arabic comes from speech errors (Abd-el-Jawad & Abu-Salim 1987 and Berg & Abd-el-Jawad 1996), speech of aphasic patients (Prunet et al. 2000), and well-formedness judgments (Frisch & Zawaydeh 2001).

In the second condition, primes shared with targets roughly the same number of letters/phonemes in the same order as the related conditions, but not the
same root. The third condition included primes that had no semantic or formal relation with their targets. The second set of targets was paired with primes that belonged to one of these conditions: (i) +Root -Semantics, (ii) +Orthography/+Phonology, and (iii) Unrelated (see Figure 1, below). In the first condition, primes and targets shared the same root but had an opaque semantic relationship. In the second condition, the primes shared the same number of letters/phonemes with the targets, as did the morphological primes. In the third condition, the primes shared no morphological or orthographic/phonological relationship with the targets. The unrelated condition in both target sets served as the baseline against which the two other conditions were measured.

4.2 Participants
The participants were 36 Arabic-speaking students from Tunisia, where all the experiments were conducted. They were aged between 22 and 27 and all had at least 12 years of formal education in Arabic. They had normal or corrected-to-normal vision. The participants in all experiments were volunteers.

4.3 Stimuli and design
The targets were 48 triliteral Arabic verbs in the third person singular perfective past, a rather neutral form. They had a mean length of 4.42 letters and 3.65 syllables. As indicated above, the targets were divided into two sets, each containing 24 words. Each target of the first set was paired with three primes, one from each of the first three experimental conditions mentioned above: (i) the morphologically and semantically related, (ii) the orthographically/phonologically related, and (iii) the unrelated conditions. The second set of targets was also paired with three types of primes: (i) the morphologically related (+Root -Semantics), (ii) the orthographically/phonologically related, and (iii) the unrelated. The letter and syllable lengths of the primes were kept very similar.
The primes in the morphologically and semantically related condition had an average of 3.83 letters and 3.38 syllables. The mean length of letters and syllables in the morphological condition with opaque semantic relationship was 4.17 and 3.55, respectively. In the orthographically/phonologically related condition, the mean length of letters and syllables was 4.19 and 3.56, respectively. The primes in the unrelated condition had a mean length of letters of 3.81 and a mean
length of syllables of 3.37. For the list of experimental items in the three conditions, see Appendix 1. The number, position, order, and continuity of the overlapping letters in the orthographic/phonological control condition mimicked as much as possible those in the related conditions. The average amount of prime-target overlap was 3.13 letters and 3.13 phonemes in the morphologically and semantically related condition; 3.13 letters and 3.13 phonemes in the morphologically related condition; and 2.87 letters and 3.02 phonemes in the orthographically/phonologically related condition. The semantic relatedness between the morphologically related primes and their targets in both sets was based on the judgments of twenty native speakers of Arabic on a seven-point relatedness scale, with 1 being ‘unrelated’ and 7 being ‘very much related’. The semantically related set included items whose mean rating was 4 or more, with an overall mean of 5.03. The semantically unrelated set included items whose mean rating was less than 3.5, with an overall mean of 2.47. The final selection of the 48 target words and all primes was based on a judgment of familiarity, for which 30 native speakers rated words on a seven-point familiarity scale, 1 being ‘unfamiliar’ and 7 ‘very familiar’. This procedure was followed in all subsequent experiments. Only words that had a mean familiarity score between 4 and 6 were finally included. The targets had an overall mean familiarity score of 5.16. The unrelated primes were given an average score of 4.87. The overall mean was 4.88 in the orthographically related condition and 5.09 in the morphologically related condition (5.20 in the [+Semantic +Root] condition and 4.97 in the [-Semantic +Root] condition). In addition to the 48 words and their corresponding primes in every condition, 48 unrelated word-word fillers were selected.
Another 96 word-nonword filler pairs were added, 48 of which were formally related, while the other 48 pairs were unrelated. To familiarize the participants with the task, 34 practice trials were also included. The nonwords in all experiments were created by mixing legal non-existing roots with existing word patterns. All the stimuli were divided into three lists, each containing a total of 226 pairs, half of which were words and half nonwords. The stimuli were rotated within the four conditions in a Latin-square design in such a way that each participant was assigned the same number and type of prime-target pairs. The stimuli in this and other experiments were presented in the unvowelled version of Arabic orthography, but care was taken to include only words that had only one reading.

Set 1
1. +Root+Sem. Prime: تقسّم [taqassama] “was divided”. Target: قاسم [qaasama] “shared”.
2. +Orthog/+Phono. Prime: تقاعس [taqaaʕasa] “was uninterested”. Target: قاسم [qaasama] “shared”.
3. Unrelated. Prime: تصدّر [taṣaddara] “occupied the leading position”. Target: قاسم [qaasama] “shared”.
Set 2
1. +Root-Sem. Prime: احترم [ʔiħtarama] “respected”. Target: حرّم [ħarrama] “forbade”.
2. +Orthog/+Phono. Prime: تكرّم [takarrama] “showed one’s generosity”. Target: حرّم [ħarrama] “forbade”.
3. Unrelated. Prime: توطّد [tawaṭṭada] “was strengthened”. Target: حرّم [ħarrama] “forbade”.
Figure 1. Examples of Prime-target Pairs Used in Study/Experiment 1, with Arabic Script, Phonetic Transcription, and Gloss
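The rotation of stimuli across lists described in Section 4.3 can be illustrated with a short sketch. It assumes one set of 24 targets and three prime conditions rotated over three lists, so that every list contains each target once and every target appears in each condition across the lists. The function name, the condition labels, and the cyclic-shift scheme are illustrative assumptions; the text does not specify how the rotation was actually implemented.

```python
from collections import Counter

# Hypothetical condition labels for one target set (not taken from the source).
CONDITIONS = ["related", "form", "unrelated"]


def latin_square_lists(n_targets, conditions):
    """Build one assignment per list, mapping target index -> prime condition.

    A cyclic shift is used: in list k, target t receives the condition at
    position (t + k) mod number_of_conditions.
    """
    k = len(conditions)
    lists = []
    for list_idx in range(k):
        assignment = {
            target: conditions[(target + list_idx) % k]
            for target in range(n_targets)
        }
        lists.append(assignment)
    return lists


lists = latin_square_lists(24, CONDITIONS)

# Each list holds the same number of items per condition (24 / 3 = 8).
for assignment in lists:
    print(Counter(assignment.values()))
```

With this scheme no participant sees the same target twice, while each target contributes to every condition once the three lists are pooled.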
4.4 Procedure and apparatus
One third of the 36 participants were arbitrarily assigned to each of the three lists. They were tested individually in a quiet room. The participants were instructed to respond as quickly and as accurately as possible by pressing the Yes key for a word response and the No key for a nonword response. The dominant hand was used for word (Yes) responses and the non-dominant hand for nonword (No) responses. The experiment lasted about 15 minutes. The experiment was conducted on an HP portable computer running the display system DMDX.³ Each trial consisted of three events. The first event was a mask of 28 vertical lines (following Boudelaa & Marslen-Wilson 2001) that was displayed for 500 ms. The second event, which immediately followed, was a prime word that appeared for 50.25 ms. The last event, which immediately followed the prime, was a target word, which remained on the screen for 2000 ms or until a response was provided. The mask was presented in 30-point Traditional Arabic font, the prime in 24-point font, and the target in 34-point font.
³ The DMDX software was developed by J. C. Forster at the University of Arizona.
4.5 Results
The averages of correct response times (RT) and mean error frequencies were obtained for both participants and items. Both types of data were analyzed using separate analyses of variance (ANOVAs). For correct responses, outliers that were more than two standard deviations above or below the mean were eliminated without being replaced. Participants who had an error rate above 20% on the experimental words were excluded and replaced. The effect of priming in the related conditions was evaluated against the orthographic condition. The means, standard deviations, and error rates for all conditions are presented in Table 1. Three sets of two-way ANOVAs were run for subjects (F1) and items (F2). The two independent variables were prime condition and list, each with three levels. However, the effect of list will not be reported because this between-subjects factor was introduced to reduce variance. To check whether the root had a special priming effect, I ran a set of ANOVAs on the first three conditions: (1) +Root (with either a transparent or an opaque semantic relationship), (2) +Orthography/+Phonology, and (3) Unrelated. Prime condition was significant only in the subject analysis, F1(2, 66)=5.82, p<.005. Planned comparisons revealed a significant difference between the morphological condition and the unrelated condition, F1(1, 33)=10.65, p<.005, and, more importantly, between the morphological condition and the form condition, F1(1, 33)=7.24, p<.05. The form condition, by contrast, was not different from the unrelated condition, F1(1, 33)=.50, p>.05.
Error analysis also showed a significant effect of priming condition in the subject analysis, F1(2, 66)=3.98, p<.05. Planned comparisons revealed a marginally significant difference only between the morphological condition and the unrelated condition, F1(1, 33)=3.62, p<.06.
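The two-standard-deviation trimming rule mentioned in the Results can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say whether the cutoffs were computed per participant, per condition, or over the whole data set, so the function simply trims a single list of correct-response RTs. The function name and the sample values are invented for the example.

```python
from statistics import mean, stdev


def trim_outliers(rts, n_sd=2.0):
    """Return the RTs lying within n_sd sample standard deviations of the mean.

    Trimmed values are dropped and not replaced, as described in the text.
    """
    if len(rts) < 2:
        # A standard deviation needs at least two observations.
        return list(rts)
    centre = mean(rts)
    spread = stdev(rts)
    lower = centre - n_sd * spread
    upper = centre + n_sd * spread
    return [rt for rt in rts if lower <= rt <= upper]


# Made-up correct-response RTs (ms) with one very slow response.
sample_rts = [650, 700, 720, 690, 710, 705, 1500]
trimmed = trim_outliers(sample_rts)
print(trimmed)
```

In this invented sample only the 1500 ms response falls outside the cutoffs, so the remaining six values are kept.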
Condition              RT(ms)   SD    % error
1. +Root+/-Sem         716      86    6.5
2. +Orthog/+Phono      735      94    10.9
3. Unrelated           744      92    9.7

1a. +Root+Sem          761      113   5.6
2a. +Orthog/+Phono     766      101   7.9
3a. Unrelated          765      95    6.5

1b. +Root-Sem          700      79    6.4
2b. +Orthog/+Phono     721      98    11.1
3b. Unrelated          743      110   8.5
Table 1. Lexical Decision Reaction Times (RTs), Standard Deviations (SD), and Percentage Error Rates (% error) in Study/Experiment 1
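The net priming effects implied by the overall means in Table 1 can be worked out as simple differences between a baseline RT and a related RT. The sketch below uses only the three overall means reported in the table; the dictionary keys and the function name are illustrative labels, not terms from the source.

```python
# Overall mean RTs (ms) from rows 1-3 of Table 1.
TABLE_1_RT = {"root": 716, "form": 735, "unrelated": 744}


def priming_effect(baseline_rt, related_rt):
    """Positive values mean faster responses after the related prime."""
    return baseline_rt - related_rt


root_vs_form = priming_effect(TABLE_1_RT["form"], TABLE_1_RT["root"])
root_vs_unrelated = priming_effect(TABLE_1_RT["unrelated"], TABLE_1_RT["root"])
form_vs_unrelated = priming_effect(TABLE_1_RT["unrelated"], TABLE_1_RT["form"])

print(root_vs_form, root_vs_unrelated, form_vs_unrelated)
```

These differences describe the means only; whether a difference is reliable is decided by the ANOVAs and planned comparisons reported in the text.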
To test whether semantics had a priming effect in the morphological condition, I ran two further sets of ANOVAs. The first set of these two-way ANOVAs included these three conditions: +Root, +Semantics; +Orthography/+Phonology; and Unrelated. Here I included half of the data: the items that were both semantically and morphologically related and their equivalents in the other conditions. The analysis did not yield a main effect of prime condition in either the subject or the item analysis, F1(2, 56)=.14, p>.05 and F2(2, 63)=.02, p>.05. The second set of ANOVAs included the +Root, -Semantics condition; the +Orthography/+Phonology condition; and the Unrelated condition. Interestingly enough, the prime condition variable was significant in the subject analysis, F1(2, 60)=6.73, p<.005. A deviation planned comparison test revealed that only the contribution of the morphological condition was significant, F1(2, 60)=8.62, p<.01.
4.6 Discussion
The results of this experiment support the hypothesis, based on previous work on Arabic (Boudelaa & Marslen-Wilson 2005), that the root has a priming effect and therefore has cognitive validity in the Arabic mental lexicon. The priming effect of the root is different from the formal effect of orthography/phonology since the amount of priming in the morphological condition was significantly different from that of
the orthographic condition (a difference of 19 ms). The results also show that morphological priming is not dependent on the semantic relationship that often accompanies a morphological relationship. In fact, the separate treatment of the semantically related prime-target pairs and the less semantically related pairs revealed a rather interesting tendency: the less semantically related pairs are the ones that show a significant priming effect.
5. Study 2: Sound and Disrupted Patterns
In this study, I examine whether patterns, sound or weak, have a priming effect. Weak patterns are patterns that are intertwined with weak roots to make words. Weak verbs are verbs that have lost an underlying glide, ending with only two root consonants in the surface form. The deleted glide, a /w/ or a /y/, could be the first, the second, or the third (final) consonant of the root. Only verbs with medial and final deleted glides were included in the stimuli because verbs with initial weakness are rare and result from assimilation, which is not the case with the other weak verbs. While glide deletion is more common in weak verbs with medial or final glides, it does not involve all paradigm forms. Verbs lose their medial glide in the following forms: Form I (faʕala), Form IV (ʔafʕala), Form VII (ʔinfaʕala), Form VIII (ʔiftaʕala), and Form X (ʔistafʕala). The forms in which verbs delete their final glide include: Form I (faʕala), Form II (faʕʕala), Form III (faaʕala), Form IV (ʔafʕala), Form V (tafaʕʕala), Form VI (tafaaʕala), Form VII (ʔinfaʕala), Form VIII (ʔiftaʕala), and Form X (ʔistafʕala).
5.1 Experiment 2A
5.1.1 Objectives
This experiment tested whether patterns conjugated with either sound or weak roots primed words with similar patterns that were conjugated with sound roots. These were compared to primes that shared the same number of letters with the targets but had different patterns and different roots.
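The way a sound triliteral root combines with the verb patterns named in this section can be sketched as a simple template fill. The templates below are transcriptions of the forms cited in the text, with the digits 1, 2, and 3 standing for the three root consonants; the dictionary, the function name, and the digit notation are illustrative assumptions. The sketch covers sound roots only: it does not model the glide deletion that produces weak patterns, nor assimilation.

```python
# Verb patterns from the text; digits mark the slots for root consonants.
FORMS = {
    "I": "1a2a3a",         # faʕala
    "II": "1a22a3a",       # faʕʕala
    "III": "1aa2a3a",      # faaʕala
    "IV": "ʔa12a3a",       # ʔafʕala
    "V": "ta1a22a3a",      # tafaʕʕala
    "VI": "ta1aa2a3a",     # tafaaʕala
    "VII": "ʔin1a2a3a",    # ʔinfaʕala
    "VIII": "ʔi1ta2a3a",   # ʔiftaʕala
    "X": "ʔista12a3a",     # ʔistafʕala
}


def interdigitate(root, template):
    """Fill a pattern template with the consonants of a sound triliteral root."""
    if len(root) != 3:
        raise ValueError("expected a triliteral root")
    # Digits are replaced by the matching root consonant; everything else is kept.
    return "".join(
        root[int(ch) - 1] if ch in "123" else ch
        for ch in template
    )


# Roots taken from the items in Figures 1 and 2.
print(interdigitate(("q", "s", "m"), FORMS["III"]))   # target in Figure 1, Set 1
print(interdigitate(("q", "s", "m"), FORMS["V"]))     # its +Root+Sem prime
print(interdigitate(("q", "b", "l"), FORMS["X"]))     # target in Figure 2
```

Two words built from the same template with different roots share a pattern but not a root, which is the relation tested in the +Pattern conditions.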
Three conditions were tested: (i) +Pattern with Sound Root, (ii) +Pattern with Weak Root, and (iii) +Phonology/Orthography (control condition). In the first condition, primes shared the same patterns with targets but had different sound roots and different meanings. In the second condition, the roots of the primes were weak, that is, they lacked one consonant. The root weakness affects the pattern by changing its syllabic
structure. The site of the weakness was either medial or final. The stimuli included an equal number of each type of weak verb to examine whether the position of weakness played any role. In the third condition, primes and targets shared the same number of letters as in the related conditions. The shared letters/phonemes could be either consonants or long vowels. Short vowels, represented by diacritics, were not tested, as the stimuli were presented in the unvowelled version of Arabic orthography.
5.1.2 Participants
The volunteer participants were another group of 36 Arabic-speaking students from the same population as in the previous experiment. None of them participated in the other experiments.
5.1.3 Stimuli and design
The targets were 48 verbs that were derived from the following patterns: faʕala (11 verbs), ʔafʕala (11), tafaʕʕala (5), ʔiftaʕala (8), ʔistafʕala (3), tafaaʕala (3), ʔinfaʕala (4), faaʕala (2), and faʕʕala (1). The targets were 4.19 letters long and 3.5 syllables long, on average. Each target word was paired with three primes, one from each of the three conditions mentioned above: (i) +Pattern with Sound Root, (ii) +Pattern with Weak Root, and (iii) +Orthography/Phonology. The mean length of the primes was 4.17 letters and 3.5 syllables in the pattern with sound root condition; 4.19 letters and 2.73 syllables in the pattern with weak root condition; and 4.17 letters and 3.40 syllables in the +Orthography/Phonology condition. The number, position, order, and continuity of the overlapping letters in the control condition mimicked as much as possible those in the related conditions. The average amount of prime-target overlap was 1.31 letters and 1.31 phonemes in the pattern with sound root condition; 1.34 letters and 1.34 phonemes in the pattern with weak root condition; and 1.34 letters and 1.34 phonemes in the +Orthography/Phonology condition.
Unlike in the related conditions, primes in the control condition had different patterns from those of the targets. The control primes, however, had roughly the same number of ‘prefixed’ patterns as the related primes and always mimicked the initial letter overlap between related primes and targets. A sample of the stimuli is given below in Figure 2 (also see Appendix 2). For an item to be included, it had to have a mean familiarity score between 4 and 6 on a seven-point scale. In addition to the 48 target words,
48 unrelated word-word fillers were selected. Another 96 word-nonword pairs were added, 48 of which were formally related and 48 of which were unrelated. The overlap in the related word-nonword pairs was morphophonological. Primes shared weak patterns, sound patterns, or only consonants of both roots and affixes with their nonword targets. The primes in the related word-nonword pairs mostly had sound roots. The stimuli were finally divided into three lists. One third of the 36 participants were tested on each of the three lists.

1. +Pattern with sound root. Prime: استعمر [ʔistaʕmara] “colonized”. Target: استقبل [ʔistaqbala] “welcomed”.
2. +Pattern with weak root. Prime: استغنى [ʔistaġnaa] “was able to dispense with”. Target: استقبل [ʔistaqbala] “welcomed”.
3. Control: +Orthog/Phono. Prime: أسقط [ʔasqaṭa] “made fall”. Target: استقبل [ʔistaqbala] “welcomed”.
Figure 2. Examples of Prime-target Pairs Used in Experiment 2A, with Arabic Script, Phonetic Transcription, and Gloss
5.1.4 Procedure and apparatus
This was the same as in the previous experiment.
5.1.5 Results
RT and error data were cleaned following the same procedure as in the previous experiment. A small part of the data (4.73%) fell outside the two-standard-deviation cutoffs. The effect of priming in the related conditions was compared to the orthographic/phonological condition. The means, standard deviations, and error rates for all experimental conditions are presented in Table 2. The prime condition variable was not significant in either the RT analysis (F1(2, 66)=.30, p>.05 and F2(2, 135)=.05, p>.05) or the error analysis (F1(2, 66)=1.36, p>.05 and F2(2, 135)=.64, p>.05).
Condition            RT(ms)   SD   % error
1. +Sound Pattern    727      90   7.2
2. +Weak Pattern     730      90   5.3
3. +Orthography      729      89   5.9
Table 2. Lexical Decision Reaction Times (RTs), Standard Deviations (SD), and Percentage Error Rates (% error) in Experiment 2A
5.1.6 Discussion
The results show that there is no difference at all between the two related conditions and the control condition. The lack of priming with a sound pattern (only 2 ms faster than the control) suggests that, unlike the root, the pattern does not play a role at this stage of lexical processing (50 ms). One possible explanation for the lack of priming with patterns could be the fact that the short vowels of the pattern are not orthographically represented. This result does not support the findings on Arabic reported by Boudelaa & Marslen-Wilson (2005). They found priming effects of the pattern at display times of 48 and 64 ms in deverbal nouns and only at an SOA of 48 ms for verbs. As Boudelaa & Marslen-Wilson’s stimuli were not published, I cannot compare their stimuli to mine. The results in Hebrew are equally intriguing. Studies on Hebrew showed that patterns had a priming effect at an SOA of 42-43 ms in verbs but not in nouns (Frost et al. 1997; Deutsch, Frost & Forster 1998). The lack of priming with Arabic patterns could be due to the fact that the overlap is minimal.
5.2 Experiment 2B
5.2.1 Objectives
Although I found no priming with sound patterns, I wanted to test whether priming would also be absent with identical weak patterns, where the overlap is maximal. That is, I tested the condition in which both the prime and the target were weak and shared orthographic vowels and consonants. I also included a condition where primes and targets shared a weak pattern with a different site of weakness. Two slightly different morphological conditions were, therefore, included in addition to the control condition (see Figure 3, below). In the first related condition, both primes and targets shared the same patterns with the same site of weakness and therefore had very similar prosodic templates. In the second condition, primes
and targets shared the same patterns but differed as to the site of the weakness. The dissimilarity in the site of the weakness also affected the orthography. The site of the weakness is a long/orthographic vowel, particularly a long aa, which is written in two different ways: as an ʔalif <ا> in the middle of the word and as a yaaʔ <ى> at the end. This discrepancy in orthography was controlled in the three conditions by selecting half of the primes with an ʔalif and the other half with a yaaʔ. In the +Orthography/Phonology condition, primes shared the same number of letters with targets. The overlap was both in consonants and long vowels (24 verbs overlapped in long vowels and consonants and 24 in consonants).
5.2.2 Participants
Another 36 Arabic-speaking students from the same population as in the previous experiments volunteered to take part in this experiment. None of them participated in the other experiments.
5.2.3 Stimuli and design
The target words were 48 verbs that were derived from the following patterns: faʕala (24), ʔafʕala (11), ʔiftaʕala (9), ʔistafʕala (2), and ʔinfaʕala (2). They had a mean letter length of 3.81 and a mean syllable length of 2.5. Each target word was paired with three primes, one from each of the three conditions: (i) Same Pattern, with a weak root, (ii) Slightly Different Pattern, with a weak root, and (iii) +Orthography/Phonology. The primes that shared the same patterns with targets were, on average, 3.81 letters long and 2.50 syllables long. The mean letter and syllable lengths of the primes that shared slightly different patterns with targets were 3.81 and 2.35, respectively. In the +Orthography/Phonology condition, primes had a mean letter length of 3.92 and a mean syllable length of 2.94. The number, position, order, and continuity of the overlapping letters in the control condition mimicked as much as possible those in the morphological conditions.
The average amount of prime-target overlap was 1.32 letters and 1.81 phonemes in the Same Pattern condition; 1.31 letters and 1.81 phonemes in the Slightly Different Pattern condition; and 1.35 letters and 1.82 phonemes in the Orthography/Phonology condition (see Figure 3, below). The familiarity scores of the selected items ranged between 3.75 and 5.75 on a seven-point scale. The overall means of the targets and the
primes in the different conditions were as follows: 4.16 for the targets; 4.59 for the orthographically/phonologically related primes; 4.50 for the primes that shared a slightly different pattern with the targets; and 4.36 for the primes that shared the same pattern with the targets. As in the previous experiments, 48 unrelated word-word fillers were selected. Another 96 word-nonword filler pairs were added, 48 of which were formally related while the other 48 pairs were unrelated. There were also 34 practice pairs. The overlap in the related word-nonword filler pairs was either morphological or phonological. As in the experimental word-word pairs, the morphological overlap in these word-nonword fillers was in the shared word patterns. The phonological overlap was in some of the root consonants and affix consonants. The stimuli were finally divided into three lists. Each list was presented to a different group of twelve participants.

1. Same pattern with weak root. Prime: ارتمى [ʔirtamaa] “threw oneself”. Target: انتقى [ʔintaqaa] “selected”.
2. Slightly different pattern with weak root. Prime: احتاط [ʔiħtaaṭa] “was cautious”. Target: انتقى [ʔintaqaa] “selected”.
3. Control: +orthog/phono. Prime: أتقن [ʔatqana] “mastered”. Target: انتقى [ʔintaqaa] “selected”.
Figure 3. Examples of Prime-target Pairs Used in Experiment 2B, with Arabic Script, Phonetic Transcription, and Gloss
5.2.4 Procedure and apparatus
The procedure was the same as in the previous experiments.
5.2.5 Results
The data cleaning led to the elimination of 4% of the data. The effect of priming in the related conditions was compared to the orthographic/phonological condition. The means, standard deviations, and error rates for all experimental conditions are presented in Table 3. The analysis of RT data showed a significant effect of the prime condition variable, F1(2, 66)=4.20, p<.05. This significance was not due to a priming difference between the +Same Pattern condition and the control condition, but rather to the inhibitory effect of the second condition (+Slightly Different Pattern). Planned comparisons showed a significant difference between the +Slightly Different Pattern condition and the control condition (+Orthography/Phonology), F1(1, 33)=8.78, p<.01. Error analysis did not reveal any difference between conditions. I also ran a t-test to see if the items that shared both orthographic vowels and consonants in the first condition were different from the control condition. The test revealed no significant difference between the two conditions, t(35)=.64, p>.05.
Condition                              RT(ms)   SD   % error
1. +Same Weak Pattern                  689      96   7.5
2. +Slightly different Weak Pattern    699      80   7.4
3. Unrelated                           680      73   5.4
Table 3. Lexical Decision Reaction Times (RTs), Standard Deviations (SD), and Percentage Error Rates (% error) in Experiment 2B
5.2.6 Discussion
The results of this experiment clearly show that weak patterns do not prime each other, even if they share the same orthographic representation of both the consonants and the long vowels of the pattern.
6. General Discussion
The findings of Experiment 1 corroborate the results of previous studies that examined the priming effect of the root in lexical processing (Mimouni et al. 1998 and Boudelaa & Marslen-Wilson 2005) and its manipulation in speech errors of normals (Berg & Abd-el-Jawad 1996) and aphasics (Prunet et al. 2000). The findings of all these studies attribute cognitive validity to the root in Arabic, thus providing external evidence for the morpheme-based theories. These include the classical root and pattern theory and the prosodic morphology hypothesis. The results of Experiments 2A and 2B showed that neither sound nor weak patterns primed targets sharing sound or weak patterns, respectively. Previous studies do not support a stable effect of the pattern in Arabic lexical processing. Boudelaa & Marslen-Wilson (2005), who used a masked priming paradigm, found priming with verbal patterns at an SOA of 48 ms but not at SOAs of 32, 64, or 80 ms. Using a similar method, Abu-Rabia & Awwad (2004) did not find any priming with patterns at an SOA of 50 ms. The lack of priming by patterns seemingly runs counter to all morpheme-based theories of Arabic morphology, which consider the pattern essential in word formation. But it could be that the pattern is extracted, if we assume that words are decomposed, and used as a separate unit at a later stage of the word recognition process. The fact that there was a priming effect of the root and not of the pattern suggests that word recognition does not depend on the exhaustive decomposition of the word into its two main morphemes, root and pattern. It seems that at this stage (50 ms) only the root is extracted and used to access/recognize words. However, further research with a larger number of stimuli is needed to determine whether and when patterns play a role in Arabic lexical processing. On the whole, the results of the present studies tend to support a morpheme-based theory of Arabic morphology and a morpheme-based theory of lexical processing. However, it is also possible to give a connectionist explanation of the priming obtained with roots. Connectionists (e.g., Seidenberg & McClelland 1989; Plaut & Gonnerman 2000) would claim that the priming by the root is due to the recurrent overlap of form and meaning shared by primes and targets and not to the fact that the root has a special status as a decomposable separate morpheme.
APPENDIX 1 The Stimuli from Study/Experiment 1 +Root, Sem
+Orthophono
اخــتــلــف [ʔixtalafa] “He/it differed” عــرف [ʕarifa] “knew” أقــبــل [ʔaqbala] “came”
أتــلــف [ʔatlafa] “damaged” عــارض [ʕaaraḍa] “opposed” ّ اســتــقــل [ʔistaqalla] “became independent” تــقــاعــس [taqaaʕasa] “was uninterested” تــمــلـّـك [tamallaka] “possessed” حــ ّدق [ħaddaqa] “looked fixedly” اســتــســمــح [ʔistasmaħa] “asked for permission” أذنــب [ʔaðnaba] “did wrong”
تــقــسـّـم [taqassama] “was divided”
Unrelated أســقــط [ʔasqaṭa] “made fall” هــ ّدم [haddama] “destroyed” أهــدر [ʔahdara] “thwarted”
Target تخــالــف [taxaalafa] “differed” تــعــارف [taʕaarafa] “became acquainted” تــقــبـّـل [taqabbala] “accepted”
تــصــ ّدر [taṣaddara] “occupied the leading position” تــفــرع ّ [tafarraʕa] “branched out” غــطــس [ġaṭasa] “dived” انــزعــج [ʔinzaʕaja] “felt uneasy”
احــتــمــل [ʔiħtamala] “bore” تــحــ ّدث [taħaddaθa] “talked” تــســلـّـم [tasallama] “received”
تــكــاســل [takaasala] “was lazy”
أذن [ʔaðana] “permitted”
ســتــر [satara] “covered”
وضح ّ [waḍḍaħa] “clarified”
اســتــبــشــر [ʔistabšara] “rejoiced”
الم [laama] “blamed”
جــرّ ب [jarraba] “experienced”
هــجــم [hajama] “attacked”
هــاجــر [haajara] “emigrated”
ذبــح [ðabaħa] “slaughtered”
ألــهــم [ʔalhama] “inspired”
اتـّهــم [ʔittahama] “accused”
تــصــنّع [taṣannaʕa] “faked”
تــالمــس [talaamasa] “touched one another” تــهــاجــم [tahaajama] “attacked one another” اســتــلــهــم [ʔistalhama] “was inspired”
تــحــمـّـل [taħammala] “endured” حــادث [ħaadaθa] “talked to s.o.” اســتــلــم [ʔistalama] “received” اســتــأذن [ʔistaʔðana] “asked for permission” بــشـّـر [baššara] “announced as good news” لــمــس [lamasa] “touched”
قــاســم [qaasama] “shared”
عــطــف [ʕaṭafa] “sympathized with” تــغــزّ ل [taġazzala] “flirted with”
رفــق [rafaqa] “was nice”
الطــف [laaṭafa] “treated with kindness” تــغــامــز [taġaamaza] “signaled to one another” راوض [raawaȡa] “tamed” تــجــبـّر [tajabbara] “showed one’s power” اســتــغــرق [ʔistaġraqa] “was absorbed in sth.” رفـّه [raffaha] “entertained”
عــبــد [ʕabada] “worshipped”
عــبـّر [ʕabbara] “expressed”
كــابــر [kaabara] “treated s.o. haughtily” اتـّفــق [ʔittafaqa] “agreed”
دبـّر [dabbara] “made arrangements” استـوقـف [ʔistawqafa] “brought to a stop” مــتّع [mattaʕa] “made enjoy” قــاتــل [qaatala] “fought” الــتــفــت [ʔiltafata] “turned around” تــرفّع [taraffaʕa] “was too proud”
ركــض [rakaȡa] “ran” أخــبــر [ʔaxbara] “informed” اغــتــرب [ʔiġtaraba] “emigrated”
مــنــع [manaʕa] “prevented” قــلــع [qalaʕa] “uprooted” انــفــتــح [ʔinfataħa] “unfolded” تــحــرّ ف [taħarrafa] “turned off”
كــرّ س [karrasa] “established”
تــعــاطــف [taʕaaṭafa] “sympathized”
تــنــصـّت [tanaṣṣata] “tried to hear”
غــازل [ġaazala] “flirted with”
خــلــع [xalaʕa] “took off” تــقــدّ م [taqaddama] “advanced”
تــراكــض [taraakaȡa] “ran fast” خــبـّر [xabbara] “informed”
أعــدم [ʔaʕdama] “executed”
تــغــرّ ب [taġarraba] “emigrated”
زهــد [zahida] “renounced worldly pleasures” صــرخ [ṣaraxa] “screamed”
تــرفـّق [taraffaqa] “was nice”
ســانــد [saanada] “supported”
تــعــبـّد [taʕabbada] “devoted oneself to God” تــكــبـّر [takabbara] “was haughty”
أعــجــب [ʔaʕjaba] “pleased”
وافــق [waafaqa] “agreed”
زوّ د [zawwada] “provided with” ســرد [sarada] “detailed” أربــك [ʔarbaka] “confused” تــوسل ّ [tawassala] “implored”
امــتــنــع [ʔimtanaʕa] “abstained” اقــتــلــع [ʔiqtalaʕa] “uprooted” تــفــتّح [tafattaħa] “opened up” احــتــرف [ʔiħtarafa] “practiced sth. as a profession”
أبــلــغ [ʔablaġa] “informed” انــعــقــد [ʔinʕaqada] “was knotted”
تــبــادل [tabaadala] “exchanged” اســتــقــدم [ʔistaqdama] “summoned”
تــفــقّد [tafaqqada] “examined” أشــرف [ʔašrafa] “supervised”
تــنــازع [tanaazaʕa] “carried on a dispute” أضــرب [ʔaȡraba] “forsook”
تــزعم ّ [tazaʕʕama] “was the leader”
تــحــصل ّ [taħaṣṣala] “obtained”
بــالــغ [baalaġa] “exaggerated” تــعــقّد [taʕaqqada] “became complicated” انــتــزع [ʔintazaʕa] “snatched”
انــتــاب [ʔintaaba] “befell”
تــحــفّظ [taħaffaȡa] “was wary” بــرز [baraza] “stood out” دافــع [daafaʕa] “defended” انــتــســب [ʔintasaba] “was related to” رابــط [raabaṭa] “was stationed” تــصــفّح [taṣaffaħa] “leafed” تــالزم [talaazama] “was constantly around s.o.” أظــهــر [ʔaȡhara] “showed” اعــتــصــر [ʔiʕtaṣara] “squeezed out”
تــلــفّظ [talaffaȡa] “enunciated” بــارك [baaraka] “blessed” نــفــع [nafaʕa] “was useful” تــنــاوب [tanaawaba] “took turns” رتّب [rattaba] “arranged” تــصــالــح [taṣaalaħa] “reconciled” تــأزّم [taʔazzama] “became critical”
انــشــغــل [ʔinšaġala] “was preoccupied” تــجــوّ ل [tajawwala] “moved around” جــمــد [jamada] “froze” ذكــر [δakara] “mentioned” أرهــق [ʔarhaqa] “exhausted” حــاول [ħaawala] “tried” اســتــجــوب [ʔistajwaba] “questioned” تــنــشق ّ [tanaššaqa] “inhaled”
تــضــارب [taȡaaraba] “conflicted with each other” احــتــفــظ [ʔiħtafaȡa] “maintained” تــبــارز [tabaaraza] “met in combat” انــدفــع [ʔindafaʕa] “rushed” نــاسب [naasaba] “fit” ارتــبــط [ʔirtabaṭa] “committed o.s.” صــافــح [ṣaafaħa] “shook hands” ألــزم [ʔalzama] “enforced”
انــتــظــر [ʔintaȡara] “waited” تــقــاصــر [taqaaṣara] “shrank”
أبــعــد [ʔabʕada] “took away” ألــحــق [ʔalħaqa] “joined”
تــعــقّل [taʕaqqala] “was reasonable”
تــقــلّب [taqallaba] “was altered”
تــوازن [tawaazana] “was balanced”
تــظــاهــر [taȡaahara] “demonstrated” عــاصــر [ʕaaṣara] “belonged to the same age” اعــتــقــل [ʔiʕtaqala] “arrested”
أنــزل [ʔanzala] “caused to descend” ّ تــقــطع [taqaṭṭaʕa] “was disrupted” أعــلــم [ʔaʕlama] “informed” تــراجــع [taraajaʕa] “retreated” أصــدر [ʔaṣdara] “published” نــقــش [naqaša] “carved out” تــوصل ّ [tawaṣṣala] “reached” ظــلــم [ȡalama] “treated unjustly” احــتــرم [ʔiħtarama] “respected”
انــزلــق [ʔinzalaqa] “slid”
أتــعــب [ʔatʕaba] “tired”
نــازل [naazala] “got into a fight”
تــقــاعــد [taqaaʕada] “retired” تــعــلّق [taʕallaqa] “was attached”
أغــضــب [ʔaġȡaba] “annoyed” أنــجــح [ʔanjaħa] “rendered successful” تــزين ّ [tazayyana] “dressed up” أخــفــت [ʔaxfata] “silenced”
قــاطــع [qaaṭaʕa] “cut off” عــلـّم [ʕallama] “taught” أرجــع [ʔarjaʕa] “returned” صــادر [ṣaadara] “confiscated”
بــرع [baraʕa] “excelled” تـهــكـّم [tahakkama] “mocked”
تــنــاقــش [tanaaqaša] “debated” واصــل [waaṣala] “continued”
طــرد [ṭarada] “expelled” ّ تــوطد [tawaṭṭada] “was strengthened”
انــظــلــم [ʔinȡalama] “suffered injustice” حــرّ م [ħarrama] “prohibited”
تــأرجــح [taʔarjaħa] “swung” تــصــاعــد [taṣaaʕada] “went up gradually” نــاشــد [naašada] “implored” تــواكــل [tawaakala] “reacted with indifference” ّ نــظم [naȡȡama] “organized” تــكــرّ م [takarrama] “showed one’s generous side”
APPENDIX 2
The Stimuli from Experiment 2A
+Sound Pattern    +Weak Pattern    +Orthophono    Target
أنــقــذ [ʔanqaδa] “saved” تــســلّح [tasallaħa] “armed o.s.” امــتــنــع [ʔimtanaʕa] “refused”
أرضــى [ʔarḍaa] “satisfied” تــخــلّى [taxallaa] “relinquished” اجــتــاز [ʔijtaaza] “passed”
افــتــرض [ʔiftaraḍa] “supposed” تــســاءل [tasaaʔala] “wondered” أورد [ʔawrada] “mentioned”
أكــمــل [ʔakmala] “finished” تــجــمّل [tajammala] “made o.s. pretty” افــتــقــد [ʔiftaqada] “missed”
roots and patterns in arabic lexical processing
انــتــفــع [ʔintafaʕa] “benefited” أنــبــت [ʔanbata] “grew” اســتــرجــع [ʔistarjaʕa] “regained” أنــكــر [ʔankara] “denied” انــتــســب [ʔintasaba] “was related to” نــهــب [nahaba] “plundered” اقــتــرب [ʔiqtaraba] “approached” وضــع [waḍaʕa] “put” الــتــزم [ʔiltazama] “took upon o.s.” ألــحــق [ʔalħaqa] “joined” تــضــافــر [taḍaafara] “interwove” عــبــر [ʕabara] “crossed”
انــتــاب [ʔintaaba] “befell” أســال [ʔasaala] “made flow” اســتــثــنــى [ʔistaθnaa] “excluded” أغــرى [ʔaġraa] “tempted” احــتــاط [ʔiħtaaṭa] “was cautious” خــان [xaana] “was disloyal” اخــتــار [ʔixtaara] “chose” بــاع [baaʕa] “sold” اعــتــدى [ʔiʕtadaa] “aggressed” أدمــى [ʔadmaa] “caused to bleed” تــعــافــى [taʕaafaa] “recuperated” عــاد [ʕaada] “returned”
اســتــعــمــر [ʔistaʕmara] “colonized”
اســتــغــنــى [ʔistaġnaa] “was able to dispense with” فــات [faata] “passed (away)” أفــاق [ʔafaaqa] “woke up” راعــى [raaʕaa] “observed”
فــرح [fariħa] “was glad” أتــقــن [ʔatqana] “perfected” راقــب [raaqaba] “supervised”
أنــتــج [ʔantaja] “produced” اجــتــرّ [ʔijtarra] “ruminated” اســتــنــد [ʔistanada] “was based” انــكــســر [ʔinkasara] “broke” أتــلــف [ʔatlafa] “damaged” عــقـّد [ʕaqqada] “complicated” أمــســك [ʔamsaka] “held” بــادل [baadala] “exchanged” أفــســد [ʔafsada] “spoiled” اجــتــهــد [ʔijtahada] “worked hard” تــوصّل [tawaṣṣala] “reached” عــاشــر [ʕaašara] “was on intimate terms” أســقــط [ʔasqaṭa] “let fall”
انــتــشــر [ʔintašara] “spread” أحــدث [ʔaħdaθa] “brought about” اســتــوقــف [ʔistawqafa] “brought to a stop” أحــضــر [ʔaħḍara] “brought” اعــتــمــد [ʔiʕtamada] “relied on” لــمــس [lamasa] “touched” اجــتــمــع [ʔijtamaʕa] “met” غــفــر [ġafara] “forgave” اكــتـــســب [ʔiktasaba] “acquired” أعــرض [ʔaʕraḍa] “turned away” تــكــاســل [takaasala] “was lazy” طــرد [ṭarada] “expelled”
فــارق [faaraqa] “left” انــهــزم [ʔinhazama] “was defeated” فــرش [faraša] “spread out”
فــهــم [fahima] “understood” أخـــلــص [ʔaxlaṣa] “was faithful” شــارك [šaaraka] “participated”
اســتــقــبــل [ʔistaqbala] “welcomed”
بــارز [baaraza] “met in combat” انــدمــج [ʔindamaja] “was incorporated” لــعــب [laʕiba] “played” خــلــط [xalaṭa] “mixed” أفــرز [ʔafraza] “excreted” أنــفــق [ʔanfaqa] “spent” تــثــبّت [taθabbata] “ascertained” ابــتــلــع [ʔibtalaʕa] “swallowed” انــعــطــف [ʔinʕaṭafa] “was bent” ألــبــس [ʔalbasa] “dressed” اجــتــنــب [ʔijtanaba] “avoided” لــجــأ [lajaʔa] “took refuge” اســتــعــبــد [ʔistaʕbada] “enslaved” ســرق [saraqa] “stole” رســم [rasama] “drew” نــهــض [nahaḍa] “rose”
قــاســى [qaasaa] “suffered”
غــفــل [ġafala] “neglected”
دافــع [daafaʕa] “defended”
انــقــضــى [ʔinqaḍaa] “was finished”
أنــجــب [ʔanjaba] “gave birth”
انــفــتــح [ʔinfataħa] “opened up”
بــات [baata] “spent the night” بــنــى [banaa] “built” أفــرج [ʔafraja] “liberated” أزال [ʔazaala] “eliminated” تــمــنّى [tamannaa] “wished” ارتــمــى [ʔirtamaa] “threw oneself” انــحــنــى [ʔinħanaa] “bent” أتــاح [ʔataaħa] “made possible” انــتــمــى [ʔintamaa] “belonged” بــان [baana] “became clear” اســتــفــاد [ʔistafaada] “benefited” مــال [maala] “bent” جــاب [jaaba] “wandered” هــدى [hadaa] “guided”
غــالــط [ġaalaṭa] “deceived” حــرّك [ħarraka] “moved” افــتــقــر [ʔiftaqara] “lacked” اقــتــرف [ʔiqtarafa] “committed” تــقــاتــل [taqaatala] “killed one another” أنــشــد [ʔanšada] “sang” أنــذر [ʔanδara] “warned” اخــتــرع [ʔixtaraʕa] “invented” أســعــد [ʔasʕada] “made happy” صـــافــح [ṣaafaħa] “shook hands” أســكــر [ʔaskara] “made drunk” فــكّر [fakkara] “thought” عــذّب [ʕaδδaba] “tortured” خــاطــر [xaaṭara] “risked”
مــزج [mazaja] “mixed” شــتــم [šatama] “insulted” أوكــل [ʔawkala] “entrusted” أغــضــب [ʔaġḍaba] “annoyed” تــذكّر [taδakkara] “remembered” اقــتــصــد [ʔiqtaṣada] “became thrifty” انــقـــلــب [ʔinqalaba] “was turned” أغــمــض [ʔaġmaḍa] “closed one’s eyes” اعــتــكــف [ʔiʕtakafa] “devoted oneself” خــدم [xadama] “served” اســتــغــرق [ʔistaġraqa] “was immersed” جــذب [jaδaba] “withdrew” ســبــق [sabaqa] “preceded” بــلــغ [balaġa] “reached”
انــعــدم [ʔinʕadama] “was missing” غــلــق [ġalaqa] “closed” أظهر [ʔaḍhara] “showed” تــطــاول [taṭaawala] “stretched up (in defiance)” أشرف [ʔašrafa] “supervised” انــســدل [ʔinsadala] “descended (on)” تــنــفّض [tanaffaḍa] “was shaken” تــنــهّد [tanahhada] “sighed” تــمــرّد [tamarrada] “rebelled” جــرّب [jarraba] “experienced” تــرابــط [taraabaṭa] “was closely tied”
انــزوى [ʔinzawaa] “secluded o.s.” طــوى [ṭawaa] “folded” أرســى [ʔarsaa] “established” تــبــاهــى [tabaahaa] “was proud”
أنــجــز [ʔanjaza] “carried out” مــهّد [mahhada] “paved” انــشــغــل [ʔinšaġala] “was occupied” تــعــلّق [taʕallaqa] “was attached”
انــبــهــر [ʔinbahara] “was dazzled” زرع [zaraʕa] “sowed” أخــفــت [ʔaxfata] “silenced” تــنــاقــش [tanaaqaša] “discussed”
أضــاف [ʔaḍaafa] “added” انــســاق [ʔinsaaqa] “was led” تــســلّى [tasallaa] “took pleasure” تــصــدّى [taṣaddaa] “resisted” تــغــطّى [taġaṭṭaa] “was covered” ســمّى [sammaa] “named” تــراخــى [taraaxaa] “slackened”
انــتــقــم [ʔintaqama] “avenged” أنــعــم [ʔanʕama] “bestowed” تــســارع [tasaaraʕa] “hurried” تــقــاعــد [taqaaʕada] “retired” تــراكــم [taraakama] “piled up” نــدم [nadima] “regretted” تــعــقّل [taʕaqqala] “was reasonable”
أدخــل [ʔadxala] “made/let enter” اقــتــســم [ʔiqtasama] “shared”
أبــاد [ʔabaada] “exterminated” احــتــار [ʔiħtaara] “was bewildered”
ابــتــعــد [ʔibtaʕada] “moved away” أســكــن [ʔaskana] “lodged”
أبــطــل [ʔabṭala] “thwarted” انــفــجــر [ʔinfajara] “exploded” تــكــتّم [takattama] “held one’s tongue” تــســمّر [tasammara] “was nailed down” تــشــبّه [tašabbaha] “compared o.s. with” كــلّف [kallafa] “charged with” تــصــالــح [taṣaalaħa] “became reconciled with one another” أوضــح [ʔawḍaħa] “explained” اكــتــأب [ʔiktaʔaba] “was depressed”
APPENDIX 3
The Stimuli from Experiment 2B
+Pattern
نــعـــى [naʕaa] “announced the death” وفــى [wafaa] “was loyal” ثــنــى [θanaa] “bent” أعــار [ʔaʕaara] “lent”
+Slightly different pattern
مــال [maala] “bent”
احــتــمــى [ʔiħtamaa] “protected o.s.” أمــضــى [ʔamḍaa] “spent”
نـــال [naala] “obtained” ثــاب [θaaba] “returned” أهــدى [ʔahdaa] “gave as a present” احــتــال [ʔiħtaala] “resorted to tricks” اجــتــاح [ʔijtaaħa] “struck” أبــاد [ʔabaada] “exterminated”
رحــى [raħaa] “ground” رضــى [raḍaa] “accepted” ارتــمــى [ʔirtamaa] “threw oneself” انــبــنــى [ʔinbanaa] “was built”
زال [zaala] “disappeared” زار [zaara] “visited” احــتــاط [ʔiħtaaṭa] “was cautious” انــحــاد [ʔinħaada] “deviated”
أجــال [ʔajaala] “passed around” حــوى [ħawaa] “comprised”
أخــفــى [ʔaxfaa] “hid” خــاف [xaafa] “was afraid”
اكــتــفــى [ʔiktafaa] “was satisfied”
+Orthophono
Target
ســهــا [sahaa] “was absent-minded” والــى [waalaa] “was close (to s.o. or a party)”
شــدا [šadaa] “chanted” مـنّى [mannaa] “made s.o. hope”
ســطــا [saṭaa] “assailed” طــوّر [ṭawwara] “developed”
أجــاب [ʔajaaba] “answered” احــتــذى [ʔiħtaδaa] “imitated”
أنــار [ʔanaara] “illuminated”
اعــتــنــى [ʔiʕtanaa] “took care (of)”
أقــام [ʔaqaama] “set up” امــتــاز [ʔimtaaza] “was distinguished” ســلّى [sallaa] “entertained” هــدّم [haddama] “demolished” أتــقــن [ʔatqana] “mastered” أنــزل [ʔanzala] “caused to descend” اعــتــزّ [ʔiʕtazza] “was proud” دمـّر [dammara] “destroyed”
ارتــقــى [ʔirtaqaa] “ascended” أغــرى [ʔaġraa] “tempted” دعــا [daʕaa] “invited” لــهــا [lahaa] “amused o.s.” انــتــقــى [ʔintaqaa] “selected” انــجــلــى [ʔinjalaa] “was removed” أذاع [ʔaδaaʕa] “disseminated” بــدا [badaa] “appeared”
بــكــى [bakaa] “wept” اغــتــال [ʔiġtaala] “assassinated” أوحــى [ʔawħaa] “inspired”
بــاح [baaħa] “revealed” احــتــفــى [ʔiħtafaa] “received kindly” أضــاع [ʔaḍaaʕa] “lost”
اكــتــرى [ʔiktaraa] “rented”
ارتــاح [ʔirtaaħa] “rested”
أجــاد [ʔajaada] “mastered” أخــلــى [ʔaxlaa] “emptied”
لــوى [lawaa] “cursed” جــاع [jaaʕa] “became hungry”
كــوى [kawaa] “burnt” طــار [ṭaara] “flew”
أشــاع [ʔašaaʕa] “brought to public notice” أرســى [ʔarsaa] “fixed firmly”
أذاب [ʔaδaaba] “caused to melt” أنــجــى [ʔanjaa] “rescued”
بــقــى [baqaa] “remained” صــان [ṣaana] “preserved”
ارتــوى [ʔirtawaa] “quenched one’s thirst” اجــتــاز [ʔijtaaza] “passed”
اســتــوى [ʔistawaa] “was the same level” احــتــاج [ʔiħtaaja] “needed”
نــوى [nawaa] “intended” طــاف [ṭaafa] “went about”
ابــتــغــى [ʔibtaġaa] “desired” اغــتــاظ [ʔiġtaaḍa] “became angry”
طــلــى [ṭalaa] “painted” بــات [baata] “spent the night”
غــذّى [ġaδδaa] “nourished” أشــقــى [ʔašqaa] “made unhappy” ارتــشــى [ʔirtašaa] “accepted a bribe” اســتــبــاح [ʔistabaaħa] “regarded as public” اســتــحــى [ʔistaħaa] “was ashamed” فــسـّر [fassara] “explained” كــشّر [kaššara] “grinned” انــزوى [ʔinzawaa] “lived in seclusion” اشــتــرى [ʔištaraa] “bought” هــنّأ [hannaʔa] “congratulated” أهــان [ʔahaana] “humiliated” اســتــغــاث [ʔistaġaaθa] “appealed for help” جــمـّـع [jammaʕa] “grouped” اســتــنـتــج [ʔistantaja] “concluded” ســخـّـن [saxxana] “heated”
دنــا [danaa] “came near” اقــتــاد [ʔiqtaada] “led” أقــصــى [ʔaqṣaa] “removed” انــتــهــى [ʔintahaa] “finished” أعــان [ʔaʕaana] “helped” طــفــا [ṭafaa] “floated” حــشــا [ħašaa] “stuffed” أطــاح [ʔaṭaaħa] “caused to fall” أفــاد [ʔafaada] “was useful” طــهــا [ṭahaa] “cooked” اشــتــهــى [ʔištahaa] “desired” ابــتــلــى [ʔibtalaa] “put to the test” رجــا [rajaa] “aspired” اقــتــنــى [ʔiqtanaa] “acquired” قــســا [qasaa] “was cruel”
عــمــى [ʕamaa] “became blind” زنــى [zanaa] “committed adultery” انــصــاع [ʔinṣaaʕa] “gave in” مــشــى [mašaa] “walked” عــوى [ʕawaa] “howled” أضــاف [ʔaḍaafa] “added”
غــاص [ġaaṣa] “dived” ســال [saala] “flowed”
عــطّر [ʕaṭṭara] “perfumed” كــلّف [kallafa] “charged (with)”
خــطــا [xaṭaa] “stepped” عــفــا [ʕafaa] “forgave”
انــطــوى [ʔinṭawaa] “was folded” صــاح [ṣaaħa] “screamed” نــام [naama] “slept” أضــنــى [ʔaḍnaa] “weakened”
انــهــار [ʔinhaara] “collapsed” زهــا [zahaa] “was radiant” صــفــا [ṣafaa] “became pure” أشــاد [ʔašaada] “praised”
عــلا [ʕalaa] “rose” كــســا [kasaa] “clothed” احــتــســى [ʔiħtasaa] “sipped” مــحــا [maħaa] “wiped off” ســمــا [samaa] “towered up” اســتــغــنــى [ʔistaġnaa] “was able to dispense with” غــفــا [ġafaa] “dozed off” نــمــا [namaa] “grew”
اســتــشــار [ʔistašaara] “asked for advice” حــكــى [ħakaa] “told” شــفــى [šafaa] “cured” ارتــدى [ʔirtadaa] “wore” ســقــى [saqaa] “irrigated” رثــى [raθaa] “elegized” اســتــجــدى [ʔistajdaa] “begged”
اســتــعــصــى [ʔistaʕṣaa] “was difficult”
أنــســى [ʔansaa] “made forget” صــلّى [ṣallaa] “prayed” غــنّى [ġannaa] “sang” امــتــطــى [ʔimtaṭaa] “took a means of transport” اســتــرق [ʔistaraqa] “stole”
قــاس [qaasa] “measured” بــان [baana] “appeared” ارتــاد [ʔirtaada] “frequented” فــاض [faaḍa] “overflowed” جــاء [jaaʔa] “came” اســتــقــال [ʔistaqaala] “resigned”
حــوّل [ħawwala] “transformed” ســلّح [sallaħa] “armed” انــحــبــس [ʔinħabasa] “was held” ضــمّد [ḍammada] “bandaged” عــلّم [ʕallama] “taught” اســتــنــد [ʔistanada] “was based”
خــشــى [xašaa] “feared” شــوى [šawaa] “grilled”
تــاه [taaha] “got lost” داس [daasa] “stepped on”
فــجـّر [fajjara] “exploded” كــلـّم [kallama] “talked to”
اســتــفــاق [ʔistafaaqa] “woke up”
أصــاب [ʔaṣaaba] “hit the target” وقــى [waqaa] “protected” هــذى [haδaa] “was delirious”
ألــقــى [ʔalqaa] “threw” حــام [ħaama] “hovered” خــان [xaana] “was disloyal”
اشــتــكــى [ʔištakaa] “complained” دوّى [dawwaa] “sounded” مــوّل [mawwala] “financed”
أطــاع [ʔaṭaaʕa] “obeyed” أتــاح [ʔataaħa] “made possible”
أعــفــى [ʔaʕfaa] “relieved” أوصــى [ʔawṣaa] “entrusted”
ارتــخــى [ʔirtaxaa] “slackened” انــحــنــى [ʔinħanaa] “bent”
أغــار [ʔaġaara] “raided” رنــا [ranaa] “looked intently” غــلا [ġalaa] “became excessive (of prices, ideas)” أدام [ʔadaama] “made last” أذاق [ʔaδaaqa] “had s.o. taste s.th.”
References
Abu-Rabia, Salim & Yasmin Awwad. 2004. “Morphological Structures in Visual Word Recognition: The case of Arabic”. Journal of Research in Reading 27:321-336.
Benmamoun, Elabbas. 1999. “Arabic Morphology: The central role of the imperfective”. Lingua 108:175-201.
Berg, Thomas & Hassan Abd-El-Jawad. 1996. “The Unfolding of the Suprasegmental Representations: A crosslinguistic perspective”. Journal of Linguistics 32:291-324.
Bohas, Georges & Jean-Patrick Guillaume. 1984. Étude des théories des grammairiens arabes: morphologie et phonologie [Study of the theories of Arab grammarians: Morphology and Phonology]. Damas: Institut Français de Damas.
Boudelaa, Sami & William D. Marslen-Wilson. 2001. “Morphological Units in the Arabic Mental Lexicon”. Cognition 81:65-92.
_____. 2005. “Discontinuous Morphology in Time: Incremental masked priming in Arabic”. Language and Cognitive Processes 20.1-2:207-260.
Butterworth, Brian. 1983. “Lexical Representation”. Language Production. Vol. 2 ed. by Brian Butterworth, 257-294. London: Academic Press.
Cantineau, Jean. 1950a. “Racines et schèmes [Roots and patterns]”. Mélanges ed. by W. Marcais, 119-124. Paris: Maisonneuve et Cie.
_____. 1950b. “La notion de schème et son altération dans diverses langues sémitiques [The notion of pattern and its modification in different Semitic languages]”. Semitica III 73-83.
Caramazza, Alfonso, Alessandro Laudanna & Cristina Romani. 1988. “Lexical Access and Inflectional Morphology”. Cognition 28:297-332.
Deutsch, Avital, Ram Frost & Kenneth I. Forster. 1998. “Verbs and Nouns Are Organized and Accessed Differently in the Mental Lexicon: Evidence from Hebrew”. Journal of Experimental Psychology: Learning, Memory, and Cognition 24.5:1238-1255.
Frisch, Stefan & Bushra Zawaydeh. 2001. “The Psychological Reality of the OCP Place in Arabic”. Language 77:91-106.
Frost, Ram, Kenneth Forster & Avital Deutsch. 1997. “What Can We Learn from the Morphology of Hebrew? A masked-priming investigation of morphological representations”. Journal of Experimental Psychology: Learning, Memory, and Cognition 23:829-856.
Goldsmith, John. 1976. “An Overview of Autosegmental Phonology”. Linguistic Analysis 2.1:23-68.
McCarthy, John J. 1981. “A Prosodic Theory of Non-concatenative Morphology”. Linguistic Inquiry 12:373-418.
Mimouni, Zohra, Eva Kehayia & Gonia Jarema. 1998. “The Mental Representation of Singular and Plural Nouns in Algerian Arabic as Revealed through Auditory Priming in Agrammatic Aphasic Patients”. Brain and Language 61:63-87.
Plaut, David & Laura M. Gonnerman. 2000. “Are Non-semantic Morphological Effects Incompatible with a Distributed Connectionist Approach to Lexical Processing?” Language and Cognitive Processes 15.4-5:445-485.
Prunet, Jean-François, Renée Béland & Ali Idrissi. 2000. “The Mental Representation of Semitic Words”. Linguistic Inquiry 31.4:609-648.
Ratcliffe, Robert R. 1997. “Prosodic Templates in a Word-based Morphological Analysis of Arabic”. Perspectives on Arabic Linguistics X ed. by Mushira Eid & Robert R. Ratcliffe, 147-171. Amsterdam: John Benjamins.
Seidenberg, Mark S. & James L. McClelland. 1989. “A Distributed, Developmental Model of Word Recognition and Naming”. Psychological Review 96.4:523-568.
Taft, Marcus. 1981. “Prefix Stripping Revisited”. Journal of Verbal Learning and Verbal Behavior 20:289-297.
Watson, Janet. 2002. The Phonology and Morphology of Arabic. Oxford, UK: Oxford University Press.
AFFRICATION IN NORTH ARABIC REVISITED∗
Eiman Mustafawi Qatar University
1. Introduction
One of the characteristics of North Arabic1 varieties is the affrication of the voiced velar stop [ɡ] to [ǰ]2 (Johnstone 1967:2), a process that is generally assumed to be triggered in the contiguity of front vowels (Cantineau 1936, 1937; Johnstone 1967, 1978; Matar 1969, 1985; Al-amadidhi 1985; among others). This requirement is exemplified in (1).
(1) [ǰ] occurs adjacent to front vowels, [ɡ] occurs elsewhere:
    a. riiǰ ‘saliva’
    b. ryuuɡ ‘breakfast’
    c. booɡ ‘stealing’
Nevertheless, exceptions to this generalization are not uncommon (Johnstone 1967, 1978; Matar 1969, 1985; Al-amadidhi 1985). That is, these authors recognize the existence of cases in which the condition of affrication is satisfied yet the process does not apply, as exemplified in (2), and cases in which affrication seems to apply although its condition is not met, as shown in (3).
(2) a. ɡees ‘hopscotch’
    b. ɡidar ‘he could’
    c. saɡf ‘ceiling’
    d. diɡiiɡɐ ‘a minute’
    e. buxnaɡ ‘a traditional outfit for girls’
(3) a. rifǰɑɑn ‘friends’
    b. ʕtɑɑǰ ‘old (pl.)’
    c. ǰɑɑsim ‘divider (p.n.)’
* This paper is a shorter version of chapter 3 of my Ph.D. dissertation, which is partly funded by a scholarship from the Faculty of Graduate and Postdoctoral Studies at the University of Ottawa and another scholarship from the Council of Supreme Education in Qatar. I would like to thank my supervisor Marie-Hélène Côté for her lengthy discussions of the material and for her valuable comments on earlier versions of this work. I am the only person responsible for any possible errors in fact or interpretation.
1 A dialect group, which according to Johnstone (1967), includes the Arabic varieties spoken along the west coast of the Persian Gulf and the middle territory of Saudi Arabia, in addition to the Bedouin-origin varieties in the Levant and Iraq.
2 In these varieties /k/ also affricates to [č]. However, since both alternations are governed by the same constraints (see Mustafawi 2006) and due to space limitations, the latter alternation is not discussed here.
Indeed, “it should be noted that these affricates do not occur in every word in which it is theoretically possible for them to occur, and that occasionally they do occur in the contiguity of back vowels” (Johnstone 1967:6). These exceptions constitute a problem that has not been adequately dealt with so far.
The current paper provides an analysis of the affrication of [ɡ] to [ǰ] in one of the varieties that exhibit this alternation, namely, Qatari Arabic3 (QA). The reason for restricting the analysis to one variety is my observation that the context triggering the process differs to some extent interdialectally. This is apparent in the data cited in Johnstone (1967), who, nevertheless, ignores these differences when stating a single rule for the application of the process in the varieties he investigated.
Previous analyses of affrication in QA suggest that the process is triggered in the contiguity of front vowels, as in the other varieties of Arabic that exhibit this alternation (Johnstone 1967; Matar 1985; Al-amadidhi 1985).4 Further, Al-amadidhi (1985) finds that, in QA, the process is optional, which is also found to be the case in one of the Bahraini varieties investigated by Matar (1985) and in the varieties of both Kuwait and Bahrain, according to Johnstone (1967). However, in the current study, although the variability of the process is confirmed, the context triggering it is argued to be adjacency to only the high front vowels [i] and [ii]. Other segments, including front vowels that are [-high], block affrication. The domain of affrication is found to be the stem; therefore, segments that are adjacent to /ɡ/ but not part of the stem do not affect the process. Further, due to paradigmatic effects, affrication does not apply to broken plurals, verbs, participles and verbal nouns.5 I also discuss apparent counterexamples to my analysis. These are argued to be instances of doublets, in which case the relevant items would be represented as /ǰ/ at the lexical level of the grammar of QA.
The analysis adopted here employs the constraint-based model of Optimality Theory (Prince & Smolensky 2004), which holds that linguistic units are the outcome of the interaction among violable universal constraints that are ranked on a language-specific basis. The main contribution of this study is that the cases that were previously considered to be exceptions to the process become completely transparent.
The paper is organized as follows: in section 2, the phonetic context triggering the process is discussed and the basic constraints employed to account for the alternation are introduced. The domain of the process is established in section 3, and the interaction between /ɡ/ affrication and paradigm uniformity requirements is discussed in section 4. In section 5 apparent counterexamples are analyzed. The conclusion is given in section 6, with a summary of the constraints employed to account for this phenomenon.
3 QA is spoken in the state of Qatar. Johnstone (1967) recognizes QA as one of the dialects of the Eastern Arabian coast, which represent one group of the North Arabian dialects. The variety investigated here is the one spoken in the city of Doha, as it varies to some extent from that spoken in some towns and villages in the north and south of the country. In addition to being the affricate variant of /ɡ/ in QA, [ǰ] is a distinct phoneme, that is, /ǰ/.
4 Matar investigated affrication in Kuwaiti Arabic (1969) and Bahraini Arabic (1985); nevertheless, he suggests that his findings apply to the varieties of Qatar and the UAE as well.
5 Other factors contributing to affrication but not discussed here are the OCP and emphasis spread (Mustafawi 2006).
2. The Phonetic Context of Affrication
Contrary to the traditional assumption, I find that in QA, affrication of the voiced velar stop /ɡ/ applies only when adjacent to the high front vowel [i] or its long counterpart [ii], which is indicated by the examples given in (4) and (5). The items in (4) also show that the process is variable. Further, in order to undergo the process the velar stop needs to be adjacent to only [i(i)]. That is, word-medial /ɡ/ must be both immediately preceded and followed by a high front vowel for affrication to apply. Any other segment that is adjacent to /ɡ/ at the same time, including a front vowel other than [i(i)], blocks the process. This restriction is apparent in the examples given in (5d-e).
(4) a. ɡiriib    ǰiriib    ‘close by’
    b. diɡiiɡ    diǰiiǰ    ‘thin/small’
    c. riɡiiɡ    riǰiiǰ    ‘thin/transparent’
    d. ɡiliil    ǰiliil    ‘little quantity/lacking’
    e. riiɡ      riiǰ      ‘saliva’
    f. ɡiimɜ     ǰiimɜ     ‘price’
(5) a. ʔablaɡ     ‘s.o. with a defective eye (m.)’
    b. lɡeemɑɑt   ‘dumplings’
    c. fooɡ       ‘up/above’
    d. liɡan      ‘large dish’
    e. saɡii      ‘high tide’
    f. ʕinɡeeš    ‘fruit seed’
    g. ʕanɡaz     ‘chicken pox’
    h. ɡubɡub     ‘a crab’
The situation described is the result of the interaction of four constraints that allow the affricate [ǰ] in the contexts exemplified in (4) and enforce the velar stop in other contexts. The first constraint is one of the faithfulness family (McCarthy & Prince 1995) and it militates against changes in the place of articulation. The second constraint belongs to the markedness family and it requires [ɡ] to occur adjacent to segments other than [i(i)]. The definitions of these two constraints are given in (6) and (7).
(6) Faithfulness: MAX-IO (dorsal)6
    Every [dorsal] specification in the input is present in the output.
(7) Markedness constraint: [ɡ] <--> ¬[-back, +high]7
    [ɡ] occurs adjacent to segments other than [i(i)].
    <--> means ‘adjacent’. ¬ means ‘not’. [-back, +high] represents the segments that are characterized by these features, which are [i] and [ii].
6 I consider the affricate to be a [-cont] coronal segment. For a discussion of the place feature of [ǰ] in QA see Mustafawi (2006).
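As an illustration only, the adjacency requirement in (7) can be sketched as a violation counter. The sketch assumes that every character of a transcription is one segment (so [ii] is two adjacent `i` symbols); the function name `markedness_violations` is hypothetical and not part of the analysis.

```python
HIGH_FRONT = {"i"}  # [ii] is transcribed as two adjacent "i" symbols


def markedness_violations(form: str) -> int:
    """Count violations of (7): a [ɡ] whose neighbours are all [i(i)].

    Assumption: one character of the transcription equals one segment.
    """
    segments = list(form)
    violations = 0
    for index, segment in enumerate(segments):
        if segment != "ɡ":
            continue
        neighbours = []
        if index > 0:
            neighbours.append(segments[index - 1])
        if index + 1 < len(segments):
            neighbours.append(segments[index + 1])
        # (7) is satisfied as soon as one neighbour is not [i(i)].
        if neighbours and all(n in HIGH_FRONT for n in neighbours):
            violations += 1
    return violations


if __name__ == "__main__":
    for word in ("riiɡ", "diɡiiɡ", "liɡan", "saɡii"):
        print(word, markedness_violations(word))
```

On this reading, the items in (4) each incur at least one violation, whereas the items in (5d-e) incur none because [ɡ] has a neighbour other than [i(i)].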
To account for the variability of the process, constraints (6) and (7) need to be crucially unranked with respect to each other as suggested by Anttila (1997), Anttila & Cho (1998) and Auger (2001), among others. In order to ensure that the non-faithful variant of /ɡ/ is [ǰ], not any other segment, a generic faithfulness constraint is necessary. Constraint (8) needs to outrank constraints (6) and (7), as shown in Tableau 1.
(8) FAITH-F
    Output correspondents of an input [αF] segment are also [αF]. (Correspondent segments in input and output have identical values for [voice], [high] and [cont].)
Tableau 1 shows the output of the interaction among the three constraints in contexts other than that triggering affrication. The faithful candidate (a) violates none of the constraints and is necessarily optimal. All the other candidates are ruled out by being unfaithful to the input in at least one of the features [high], [voice] or [cont].
Tableau 1
Constraint ranking: FAITH-F » MAX-IO (dorsal), [ɡ] <--> ¬[-back, +high]
Input: /liɡan/ ‘large plate’
Constraint columns: FAITH-F; MAX-IO (dorsal); [ɡ] <--> ¬[-back, +high]
a. → liɡan
b. liǰan   *!
c. likan   *!
d. ličan   *! *
e. liqan   *!*
f. liɣan   *!* *
g. lixan   *!** *
h. lišan   *!* *
7 The format of this constraint enforcing adjacency is borrowed from Côté (2000).
Tableaux 2 and 3 show how to obtain variation between [ɡ] and [ǰ] in the context triggering affrication, as a result of crucial non-ranking between the markedness constraint [ɡ]<-->¬[-back, +high] and the faithfulness constraint MAX-IO(dorsal). If MAX-IO(dorsal) is ranked higher than [ɡ]<-->¬[-back, +high], as in Tableau 2, then the output is candidate (a), which contains a segment [ɡ] that is not adjacent to a segment other than [i(i)], in violation of [ɡ]<-->¬[-back, +high]. If, on the other hand, the markedness constraint outranks MAX-IO(dorsal), then the output is candidate (b), as is the case in Tableau 3.
Tableau 2
Constraint ranking: FAITH-F » MAX-IO (dorsal) » [ɡ] <--> ¬[-back, +high]
Input: /riiɡ/ ‘saliva’
Constraint columns: FAITH-F; MAX-IO (dorsal); [ɡ] <--> ¬[-back, +high]
a. → riiɡ  *
b. riiǰ    *!
c. riiš    *!* *
d. riiq    *!*
e. riiɣ    *!* *
Tableau 3
Constraint ranking: [ɡ] <--> ¬[-back, +high] » MAX-IO (dorsal) 8
/riiɡ/ ‘saliva’ | [ɡ] <--> ¬[-back, +high] | MAX-IO (dorsal)
a. riiɡ | *! |
b. → riiǰ | | *
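The effect of crucial non-ranking can be sketched computationally: a stratum of unranked constraints is expanded into every total order it allows, and the winners under all of those orders are collected as the surface variants. This is a minimal sketch under stated assumptions. The labels `MARK` and `MAX-IO` and the function names are hypothetical, and the violation profiles include only the two candidates of Tableau 3 plus candidates (a) and (b) of Tableau 1 (with the mark of (b) read as a MAX-IO violation).

```python
from itertools import permutations, product


def total_rankings(strata):
    """Yield every total order compatible with a stratified hierarchy.

    `strata` lists groups of constraints from highest to lowest;
    constraints inside one group are crucially unranked.
    """
    for combo in product(*(permutations(stratum) for stratum in strata)):
        yield [constraint for stratum in combo for constraint in stratum]


def optimal(candidates, ranking):
    """Return the candidates whose violation vector is lexicographically smallest."""
    def vector(candidate):
        return tuple(candidates[candidate].get(c, 0) for c in ranking)

    best = min(vector(c) for c in candidates)
    return {c for c in candidates if vector(c) == best}


def variants(candidates, strata):
    """Collect the winners under every total order allowed by the strata."""
    winners = set()
    for ranking in total_rankings(strata):
        winners |= optimal(candidates, ranking)
    return winners


if __name__ == "__main__":
    strata = [["MARK", "MAX-IO"]]  # the two constraints are unranked
    riig = {"riiɡ": {"MARK": 1}, "riiǰ": {"MAX-IO": 1}}
    ligan = {"liɡan": {}, "liǰan": {"MAX-IO": 1}}
    print(variants(riig, strata))
    print(variants(ligan, strata))
```

Under these assumptions the triggering context yields both variants, while /liɡan/ surfaces only with the velar stop, since the faithful candidate incurs no violation under either order.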
3. The Stem as the Domain of Affrication
3.1 Suffixation and cliticization
The domain of affrication is the stem; therefore, neither suffixation nor cliticization affects the process, as shown in (9).
(9) a. diɡiiɡ-ɐ     diǰiiǰ-ɜ     ‘thin/small (f.)’
    b. riɡiiɡ-ɐ     riǰiiǰ-ɜ     ‘delicate (pl. non-human)’
    c. riiɡ-ha      riiǰ-ha      ‘her saliva’
    d. ṭɘriiɡ-na    ṭɘriiǰ-na    ‘our way’
    e. ṭɘriiɡ-kum   ṭɘriiǰ-kum   ‘your way’
The feminine morpheme [-ɜ]/[-ɐ]9 is suffixed to adjectives that modify singular feminine nouns, as in (9a). It also assigns the feminine gender to adjectives that modify plural non-human nouns, as exemplified in (9b).10 In both cases, this morpheme does not affect the context of affrication, which applies even though [ɡ] is followed by a segment other than a high front vowel. The nouns in (9c-e) are suffixed by the possessive pronouns [-ha], [-na] and [-kum], which are assumed to be clitics (McCarthy 2005). Like the feminine morpheme, these clitics do not block affrication. That is because the feminine morpheme in (9a-b) and the clitics in (9c-e) are outside the domain of the process, which is the stem.
8 Constraint (8) is not shown in this and subsequent tableaux.
3.2 Internal modifications to the stem
Since the domain of affrication is the stem, any modification to the stem potentially affects the context in which the process may apply. This is evident in the broken plurals (BP) of singular forms that may undergo affrication. That is, when broken plural formation involves internal modifications to the stem (Wright 1967 vol. 1; McCarthy & Prince 1990:211) in ways that remove the phonetic context in which /ɡ/ occurs, affrication gets blocked in these plural forms. This is illustrated in (10), where variable affrication in singular forms (first column) is blocked in the corresponding broken plurals (second column).
(10)   Singular           BP
    a. ɡiriib/ǰiriib      ɡrɑɑb/*ǰrɑɑb     ‘close by’
    b. diɡiiɡ/diǰiiǰ      dɡɑɑɡ/*dǰɑɑǰ     ‘thin/small’
    c. riɡiiɡ/riǰiiǰ      rɡɑɑɡ/*rǰɑɑǰ     ‘transparent/delicate’
    d. ɡiliil/ǰiliil      ɡlɑɑl/*ǰlɑɑl     ‘small quantity’
    e. ṭɘriiɡ/ṭɘriiǰ      ṭuruɡ/*ṭuruǰ     ‘way’
9 This morpheme has two surface representations as indicated in the examples above; [-ɐ] surfaces when preceded by a segment that is [+back], otherwise [-ɜ] surfaces.
10 It is possible that the feminine morpheme in (9a) is different from that in (9b).
The definition of the constraint given in (7) needs to be modified in order to account for the fact that the phonetic context of /ɡ/ affrication needs to be met within the stem. Therefore, a new definition is given to this constraint in (11).
(11) Markedness: [ɡ] <--> ¬[-back, +high]stem
     [ɡ] occurs adjacent to segments other than [i(i)] within its stem.
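The difference between the word-level definition in (7) and the stem-bounded definition in (11) can be sketched as follows. This is a hedged illustration that assumes one character per segment and assumes that a hyphen in the transcription separates the stem from a following suffix or clitic, as in (9); the function names are hypothetical.

```python
HIGH_FRONT = {"i"}  # [ii] is transcribed as two adjacent "i" symbols


def violations(segments: str) -> int:
    """Count [ɡ] tokens whose neighbours in `segments` are all [i(i)]."""
    count = 0
    for index, segment in enumerate(segments):
        if segment != "ɡ":
            continue
        neighbours = list(segments[max(index - 1, 0):index])
        neighbours += list(segments[index + 1:index + 2])
        if neighbours and all(n in HIGH_FRONT for n in neighbours):
            count += 1
    return count


def stem_violations(form: str) -> int:
    """Evaluate (11): only material before the first hyphen counts.

    Assumption: the form is written stem-suffix, as in example (9).
    """
    stem = form.split("-")[0]
    return violations(stem)


if __name__ == "__main__":
    print(stem_violations("riiɡ-ha"))  # stem-bounded evaluation, as in (11)
    print(violations("riiɡha"))        # whole-word evaluation, as in (7)
```

With the stem as the domain, the clitic [-ha] is invisible to the constraint, so [riiɡ-ha] is evaluated exactly like [riiɡ]; evaluated over the whole word, the following [h] would wrongly satisfy the constraint.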
In a form like (9c), that is, [riiɡ-ha], this constraint is violated since [ɡ] is only adjacent to [ii] in the stem, and affrication optionally applies, as in Tableaux 2 and 3.
4. Paradigm Uniformity Effects
4.1 Broken plurals
The phonetic conditioning of affrication is met in a number of broken plural forms as in (12); however, the process does not apply to these forms.
(12)   BP              Singular
    a. ɡišrɑɑn         ʔaɡšar      ‘aggressive’
    b. baxɑɑniɡ        buxnaɡ      ‘a traditional outfit’
    c. ɡibɑɑɡib 11     ɡubɡub      ‘a crab’
Notice that the phonetic conditioning of affrication is not met in the singular forms of these broken plurals, which I find to be motivating affrication blockage in the broken plural forms. That is, affrication under-applies in the broken plural forms that meet the phonetic conditioning of the process, if their singular forms do not undergo the process. These examples indicate that there is a paradigmatic effect that is forcing the broken plural forms to copy their singular bases with respect to the variant they adopt for underlying /ɡ/. Constraints forcing unity in derivational or inflectional paradigms (McCarthy & Prince 1995) are proposed in many studies to account for similar cases in different languages. Among these are Benua (1997), Hayes (1998), Steriade (2000), Burzio (1994), Kenstowicz (1996, 1998), McCarthy (2005) and Gafos (2003). The idea is based on the observation that words that are related either derivationally or inflectionally (or even in both ways) resist certain phonological processes or over-apply them, in order to keep identity with the rest of the paradigm that they belong to, with respect to a certain feature. The constraint responsible for this effect in the current situation is given in (13).
(13) MAX-OO (dorsal)
     Every [dorsal] specification in a base form is present in derived forms.
11 Since this item is a case of reduplication, Base-Reduplicant-faithfulness could equally account for the blockage of affrication in the first /ɡ/ in the plural form in (12c), see McCarthy & Prince (1995).
The base in nominal/adjectival paradigms is usually the singular form, and in the current case the inflected form is the broken plural. In Tableau 4, the singular of (12a) is evaluated, and constraint (13) is irrelevant to the base since the inflected form is required to mimic the base, not vice versa. Therefore, constraint (13) is not included in Tableau 4. In this Tableau, candidate (b) violates MAX-IO, and as a result, candidate (a) becomes optimal.
Tableau 4
Constraint ranking: [ɡ] <--> ¬[-back, +high]stem, MAX-IO (dorsal)
/ʔaɡšar/ ‘aggressive’ | [ɡ] <--> ¬[-back, +high]stem | MAX-IO (dorsal)
a. → ʔaɡšar | |
b. ʔaǰšar | | *!
To evaluate the broken plural of (12a), constraint (13) is included in Tableau 5. Since candidate (b) violates MAX-OO, as well as MAX-IO, candidate (a) becomes optimal. Notice that the constraint in (13) needs to outrank the basic constraints of affrication.
Tableau 5
Constraint ranking: MAX-OO (dorsal) » [ɡ] <--> ¬[-back, +high]stem, MAX-IO (dorsal)
/ɡišrɑɑn/ ‘aggressive (pl.)’ | MAX-OO (dorsal) | [ɡ] <--> ¬[-back, +high]stem | MAX-IO (dorsal)
a. → ɡišrɑɑn | | * |
b. ǰišrɑɑn | *! | | *
In the case of the items given in (10), a representative of which is repeated here as (14), there is variation in the base forms, but not in the broken plural forms.
(14) ɡiriib/ǰiriib    ɡrɑɑb/*ǰrɑɑb    ‘close by’
In Tableau 6, candidates for the singular of (14) are evaluated. Since the two relevant candidates incur the same number of violations of the equally ranked constraints, both of them are optimal.
EIMAN MUSTAFAWI
Tableau 6
Constraint ranking: [ɡ] <--> ¬[-back, +high]stem, MAX-IO (dorsal)
/ɡiriib/ ‘close by’ | [ɡ] <--> ¬[-back, +high]stem | MAX-IO (dorsal)
a. → ɡiriib | * |
b. → ǰiriib | | *
However, the broken plural of this form surfaces invariably with the velar stop, as shown in Tableau 7. Here, each candidate is faithful to one of the variants of the base, so none of them violates MAX-OO. However, since candidate (b) violates MAX-IO, the faithful candidate (a) becomes optimal.
Tableau 7
Constraint ranking: MAX-OO (dorsal) » [ɡ] <--> ¬[-back, +high]stem, MAX-IO (dorsal)
/ɡrɑɑb/ ‘close by (pl.)’ | MAX-OO (dorsal) | [ɡ] <--> ¬[-back, +high]stem | MAX-IO (dorsal)
a. → ɡrɑɑb | | |
b. ǰrɑɑb | | | *!
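The evaluations in Tableaux 5 through 7 can be sketched computationally. The following Python fragment is only an illustration under stated assumptions: the violation counts are copied from the tableaux rather than computed from the forms, the constraint labels and the ASCII stand-ins (g for the velar stop, J for the affricate) are invented for the sketch, and "equally ranked" is read as pooling the violations of all constraints in a stratum, which is one possible interpretation and not necessarily the one intended in the analysis.

```python
# Hypothetical labels: "*g/i" stands for [g] <--> not [-back, +high]stem.
# Violation counts are transcribed from Tableaux 5-7, not derived here.

def profile(violations, strata):
    """Pool violations within each stratum; strata are ordered highest first."""
    return tuple(sum(violations.get(c, 0) for c in stratum) for stratum in strata)

def optimal(candidates, strata):
    """Return the set of candidates with the lexicographically smallest profile."""
    best = min(profile(v, strata) for v in candidates.values())
    return {name for name, v in candidates.items() if profile(v, strata) == best}

# Singular forms: the two constraints share a single stratum.
STRATA_SG = [("*g/i", "MAX-IO")]
# Broken plurals: MAX-OO dominates that stratum.
STRATA_PL = [("MAX-OO",), ("*g/i", "MAX-IO")]

TABLEAU_5 = {"gishraan": {"*g/i": 1}, "Jishraan": {"MAX-OO": 1, "MAX-IO": 1}}
TABLEAU_6 = {"giriib": {"*g/i": 1}, "Jiriib": {"MAX-IO": 1}}
TABLEAU_7 = {"graab": {}, "Jraab": {"MAX-IO": 1}}

if __name__ == "__main__":
    print(optimal(TABLEAU_5, STRATA_PL))
    print(optimal(TABLEAU_6, STRATA_SG))
    print(optimal(TABLEAU_7, STRATA_PL))
```

Under these assumptions, the singular in Tableau 6 has two winners because the pooled counts tie, whereas the plurals in Tableaux 5 and 7 each have a single winner.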
4.2 Verbs
None of the items undergoing affrication in (4) above is a verb, a fact that indicates that /ɡ/ affrication is blocked in verbs. Although many verbs meet the phonetic conditioning of affrication, they do not undergo the process, as illustrated in (15).
(15) a. ta-zliɡ-iin ‘you slip (f.)’
b. y-ʕarriɡ ‘he sweats’
c. ɡidar ‘he could’
d. y-ħadiɡ ‘he fishes’
e. ɡimt ‘I got up’
f. ɡilt ‘I said’
This pattern is also observed in verbs that share their consonantal roots and semantic fields with nouns/adjectives that may undergo affrication. This is exemplified in (16) in which an example of one verbal form is given for each of the items undergoing affrication in (4) above, except for (4f) which does not have a verbal correspondent in QA.
(16) a. y-ɡarrib ‘he becomes close by’ (ɡiriib/ǰiriib)
b. y-daɡɡiɡ ‘he makes small’ (diɡiiɡ/diǰiiǰ)
c. y-raɡɡiɡ ‘he makes thin’ (riɡiiɡ/riǰiiǰ)
d. y-ɡɘḷḷ ‘it becomes little’ (ɡiliil/ǰiliil)
e. y-itrayyaɡ ‘he has breakfast’ (riiɡ/riiǰ)
The motivation for this pattern seems to be the need for paradigm uniformity in inflectionally related verbs. Any verbal inflectional paradigm in QA may include as many as nineteen members that represent different combinations of inflection for tense, aspect, person, gender, and number. It is not possible to host the condition of affrication in all of these members in any paradigm. That is because in addition to affixation, which I suggest does not affect the context of affrication, the stem adopts different templates in each verbal paradigm. Hence, if the condition of affrication is met in some members of a certain paradigm, it is not met in other members of that paradigm, as illustrated in (17).
(17) a. zilaɡ/ta-zliɡ-iin ‘he slipped/you slip (f.)’
b. ʕarraɡ/y-ʕarriɡ ‘he sweated/he sweats’
c. ɡidar/ʔa-ɡdar ‘he could/I can’
d. ħadaɡ/y-ħadiɡ ‘he fished/he fishes’
e. ɡuum/ɡimt ‘you get up!/I got up’
f. ɡɑɑl/ɡilt ‘he said/I said’
Therefore, instead of including members that surface variably with [ɡ] and [ǰ], and others that surface only with [ɡ], the grammar chooses to restrict the form that is adopted for underlying /ɡ/ in these paradigms to one surface representation. This effect can be reached by employing an undominated output-output faithfulness constraint, in the sense of the Optimal Paradigms Model proposed by McCarthy (2005). This constraint is given in (18).
(18) Faithfulness: MAX-OP (dorsal)
A [dorsal] specification in a member of an inflectional paradigm is present in every other member of that paradigm.
This constraint requires that the stem in each member of verbal inflectional paradigms be faithful to the stem in the other members in that paradigm with respect to the variant adopted for underlying /ɡ/
(McCarthy 2005). Unlike the case with nouns and adjectives, in verbal paradigms, there is no specific base to which the inflected forms need to conform. All the members of verbal inflectional paradigms are equal according to the Optimal Paradigms model proposed by McCarthy (2005). This motivates the distinction between MAX-OO in (13) and MAX-OP in (18). However, OP cannot rule out a candidate whose members uniformly surface including [ǰ], instead of [ɡ]. Therefore, an additional constraint is needed to rule out this candidate as given in (19).
(19) Markedness: ǰ <--> [i(i)]
ǰ is adjacent to [i(i)].
This constraint requires that [ǰ] be adjacent to either [i] or [ii]. In Tableau 8, each candidate consists of a whole inflectional paradigm, and the members of each candidate are evaluated simultaneously. In addition, the violations incurred by the members of a given paradigm are added up together (McCarthy 2005). Candidate (a) consists of a paradigm whose members surface invariably with [ɡ], though some of them host the phonetic conditioning of affrication. Candidate (b) consists of a paradigm that surfaces invariably with the affricate. And candidate (c) consists of a paradigm whose members vary with respect to the variant they adopt for /ɡ/. Candidate (c) is ruled out by violating the undominated MAX-OP (dorsal), since the members that surface with [ɡ] in this paradigm are unfaithful to the members that surface with [ǰ], and vice versa. In Candidate (b), all of the members that surface with [ǰ] that is not adjacent to [i(i)] violate the markedness constraint (ǰ <--> [i(i)]), therefore, this candidate is ruled out. In Candidate (a), the members in which [ɡ] occurs adjacent to [i] violate the markedness constraint ([ɡ] <--> ¬ [-back, +high]stem), but since this constraint is ranked lower than both MAX-OP (dorsal) and ǰ <--> [i(i)], candidate (a) becomes optimal.
Tableau 8
Constraint ranking: MAX-OP (dorsal), ǰ <--> [i(i)] » [ɡ] <--> ¬[-back, +high]stem, MAX-IO (dorsal)
/ɡidar, ʔaɡdar../ ‘to be able to’ | MAX-OP (dorsal) | ǰ <--> [i(i)] | [ɡ] <--> ¬[-back, +high]stem | MAX-IO (dorsal)
a. → ɡidar, ʔaɡdar.. | | | * |
b. ǰidar, ʔaǰdar.. | | *! | | **
c. ǰidar, ʔaɡdar.. | *!* | | | *
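The paradigm-level evaluation in Tableau 8 can be illustrated with a short Python sketch. Everything in it is a simplifying assumption made for illustration: forms are written with ASCII stand-ins (g for the velar stop, J for the affricate, ? for the glottal stop), each paradigm is reduced to the two members shown in the tableau, adjacency to [i(i)] is checked on the plain string, MAX-OP is counted once per ordered pair of members that disagree on the variant, and violations of constraints in the same stratum are pooled.

```python
from itertools import permutations

def next_to_i(form, k):
    """True if the segment at index k is immediately preceded or followed by i."""
    before = k > 0 and form[k - 1] == "i"
    after = k + 1 < len(form) and form[k + 1] == "i"
    return before or after

def star_g_i(form):
    """[g] <--> not [-back, +high]: one mark per g adjacent to i."""
    return sum(1 for k, s in enumerate(form) if s == "g" and next_to_i(form, k))

def j_needs_i(form):
    """J <--> [i(i)]: one mark per J that is not adjacent to i."""
    return sum(1 for k, s in enumerate(form) if s == "J" and not next_to_i(form, k))

def max_io(form):
    """MAX-IO (dorsal): one mark per J, since the input has /g/."""
    return form.count("J")

def max_op(paradigm):
    """MAX-OP (dorsal): one mark per ordered pair of members with different variants."""
    variant = lambda form: "J" if "J" in form else "g"
    return sum(1 for a, b in permutations(paradigm, 2) if variant(a) != variant(b))

def violations(paradigm):
    """Sum member violations over the whole paradigm, as in Optimal Paradigms."""
    return {
        "MAX-OP": max_op(paradigm),
        "J<->i": sum(j_needs_i(f) for f in paradigm),
        "*g/i": sum(star_g_i(f) for f in paradigm),
        "MAX-IO": sum(max_io(f) for f in paradigm),
    }

STRATA = [("MAX-OP", "J<->i"), ("*g/i", "MAX-IO")]  # highest stratum first

def profile(paradigm):
    v = violations(paradigm)
    return tuple(sum(v[c] for c in stratum) for stratum in STRATA)

def optimal(paradigms):
    best = min(profile(p) for p in paradigms)
    return [p for p in paradigms if profile(p) == best]

CANDIDATES = {
    "a": ("gidar", "?agdar"),
    "b": ("Jidar", "?aJdar"),
    "c": ("Jidar", "?agdar"),
}

if __name__ == "__main__":
    for label, paradigm in CANDIDATES.items():
        print(label, violations(paradigm))
    print(optimal(list(CANDIDATES.values())))
```

With these assumptions the counts reproduce the marks in Tableau 8, and the uniformly velar paradigm is the only one with no violation in the top stratum.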
4.3 Participles and Verbal Nouns
The phonetic conditioning of /ɡ/ affrication is met in a number of active participle forms and verbal nouns (a type of infinitive); however, the process does not apply to them, as illustrated in (20).
(20)
Verb | Active participle | Verbal noun | Gloss
a. ʕišaɡ | ʕɑɑšiɡ | ʕišiɡ/ʕišɡ | ‘to love’
b. ʕallaɡ | m-ʕalliɡ | taʕliiɡ | ‘to hang (s.th.)’
c. wɑɑfaɡ | m-wɑɑfiɡ | mwɑɑfiɡa | ‘to agree’
d. ħilaɡ | ħɑɑliɡ | ħilɑɑɡa | ‘to shave’
e. farraɡ | m-farriɡ | tafriiɡ | ‘to cause to separate’
Generally, the active participle in QA as in Classical Arabic (CA) indicates “a temporary, transitory or accidental action or state of being” (Wright 1967 vol. 1:131-132), and it “signifies the doer of an action” (Azami 1988:245). However, in QA, the active participle may be inflected only for gender and number, as illustrated in (21), whereas in CA, it is also inflected for case.
(21) a. ħɑɑliɡ ‘to shave (sg. m.)’
b. ħɑɑlɡ-a ‘to shave (sg. f.)’
c. ħɑɑlɡ-iin ‘to shave (pl.)’
The verbal nouns, on the other hand, “are abstract substantives, which express the action, passion, or state indicated by the corresponding verbs, without any reference to object, subject, or time” (Wright 1967 vol. 1:110). Although participles and verbal nouns are generally classified with nouns/adjectives and nouns, respectively, they are treated as verbs in certain contexts, as illustrated in (22) and (23) below. In fact, the active
participle is mostly used as a verb in QA. Passive participles are also treated as verbs in certain contexts, but since these may only adopt the template [ma-CCuuC] or [m-CaCCaC] which do not include the context triggering affrication according to the analysis proposed here, the passive participles are not discussed. In CA, however, the class of nouns/adjectives that may behave like verbs is larger (Wright 1967 vol. 2; Al-hashemi 2000:239-65).
(22) Active participles:
I-perfective:
a. Ali ħɑɑliɡ šaʕr-a.
Ali shaved hair-3sg.m.
“Ali has shaved his hair”
II-imperfective:
a. il-walad ɡɑɑʕid f il-sayyɑɑra.
the-boy sitting in the-car
“The boy is sitting in the car”
b. mɑɑn-ii ħɑɑliɡ šaʕr-ii.
not-1sg. shaving hair-1sg.
“I am not going to shave my hair”
(23) Verbal nouns:
a. ʔashal šay taʕliiɡ il-satɑɑyir.
easier thing hanging the-curtains
“the easiest thing (to do is) hanging curtains”
b. ɡil-t l-ah: il-tafriiɡ been il-nɑɑs muu zeen.
told-1sg. to-3sg.m. the-separation between the-people not good
“I told him: to cause people to separate (from each other) is not good”
It is worth mentioning that the verbal nouns in contexts such as those exemplified above can be replaced by [ʔan + present], which is one of the conditions for verbal nouns to function as verbs in CA (Al-hashemi 2000:242). Moreover, [ʔan + present] is one of the constructions of the subjunctive mood in Arabic (Wright 1967 vol. 2:24), and if the verbal noun can be used in the same context, it follows that the verbal noun may be one of the constructions of the subjunctive mood as well. At least, this seems to be the case in QA. Therefore, I conclude that the reason for the blockage of affrication in these two classes of words is that in this variety, the active participles and the verbal nouns may function as verbs, which entails
that they should be subject to the same OP (Optimal Paradigms faithfulness) constraint as verbs. Accordingly, I suggest including the participles and the verbal nouns to the inflectional paradigms of their corresponding verbs. This prevents affrication from applying in these forms, as in verbal forms in Tableau 8. The motivation for treating verbal inflectional paradigms differently from other lexical classes in QA could be found in the field of acquisition. Any verbal inflectional paradigm may include as many as nineteen members, in addition to the active and passive participles and verbal nouns, whereas the members of nominal and adjectival paradigms do not exceed four for nouns and five for adjectives (see also McCarthy 2005). It could be more difficult for the learner to acquire larger paradigms if they contain members that exhibit alternations and others that do not. Hence, to facilitate the acquisition process, the grammar of QA chooses to block affrication, in order to keep unity among verbal paradigms. The distinction that the grammar of QA makes between verbs (including participles and verbal nouns) and other lexical classes with respect to affrication is not unattested. Many phonological processes distinguish between different lexical classes in different languages. For example, Smith (1997:2) argues that the category of nouns exhibits a “privileged phonological behavior” and shows phonological contrast more than any other category, such as verbs and adjectives, due to domain-specific faithfulness constraints. She bases her argument on the patterns observed in the distribution of accent among different word classes in different dialects of Japanese. Smith (1997:11) also mentions that nouns display richer phonological contrast than verbs in other languages such as English and Spanish (stress patterns), Arabic (verbs must be templatic but not nouns), and many Bantu languages (only nouns may start with NC clusters). 
Also, according to Benua (1997:25), morpheme-specific or class-specific phonological behavior may be the outcome of different OO-correspondence constraints governing these classes. She draws examples from English affixal morphemes, diminutive and distributive reduplication in Lushootseed, and imperative truncation and jussive/2fs truncation in Tiberian Hebrew. In Moroccan Arabic, McCarthy (2005) shows that monosyllabic triconsonantal nouns may either take the form CəCC or CCəC, but similar verbs invariably take the form CCəC due to
a highly ranked OO-correspondence constraint in verbs. In the current study, however, nouns and adjectives form a category that exhibits more alternation than verbs due to a faithfulness constraint that is operative only in verbal paradigms (Optimal Paradigm-Faithfulness).
5. Doublets
There are a number of verbs in which both the velar stop [ɡ] and the affricate [ǰ] may occur, as given in (24). Besides, the affricate in most of these items occurs adjacent to the low front vowel [a], which is not predicted, according to the analysis proposed in this paper. These verbs look like counterexamples to my proposal with respect to affrication being under-applied in verbs, and being blocked by adjacency to segments other than [i(i)].
(24) a. ɡaddam/ǰaddam ‘he fronted’
b. ɡassam/ǰassam ‘he distributed’
c. ɡisam/ǰisam ‘he divided’
d. rɑɑfaɡ/rɑɑfaǰ ‘he accompanied’
e. fɑɑraɡ/fɑɑraǰ ‘he got off (s.o.’s) back’
f. ɣimaɡ/ɣimaǰ ‘it became dark’
g. ʕitaɡ/ʕitaǰ ‘it became old’
A close examination of the data, however, reveals that these items do not exemplify cases of affrication in the category of verbs; they are only doublets, which according to Al-yasoui (1996:10) are “synonyms that share the same template, differ only in one sound, in the same position”. Doublets have come to exist because of “diachronic phonological processes” or “dialect borrowing” (Mahadin 1989:1-2). Al-yasoui (1996:10) finds that the total number of doublets in CA “exceeds 1800 pairs”. These pairs include many items that involve the alternation between <ǰ> and <q>, the latter being the reflex of QA /ɡ/. Matar (1985:154-155) also cites examples from CA traditional dictionaries that can be pronounced as either [q] (the counterpart of QA [ɡ]) or [ǰ] without the involvement of any SYNCHRONIC phonological process. He refers to borrowing among the different varieties of Old Arabic as responsible for this situation. Classifying the verbs given in (24) as doublets entails that the members of each pair of these are represented distinctly at the lexical
level of the grammar of QA. That is, the words surfacing with [ǰ] belong to the phoneme /ǰ/, and those surfacing with [ɡ], belong to the phoneme /ɡ/. This is not an ad hoc stipulation, rather, it is motivated by the way these items are treated in the grammar of QA,12 as will be explained below. Notice that [ɡ] and [ǰ] in the pairs in (24) appear adjacent to [i] (24c) or [a] (24a-b, d-g). It may be suggested that adjacency to front [a] may also trigger affrication in some words. An example such as (24e), however, provides evidence against such a view when emphasis is considered. In this example, the consonant that precedes the vowel adjacent to [ɡ] and [ǰ] is /r/. This segment is reported to surface as emphatic in QA in certain contexts (Bukshaisha 1985; Al-Sulaiti 1993). These contexts include adjacency to back vowels (Hussain 1985) as well as in the neighborhood of underlyingly emphatic segments (Bukshaisha 1985; Hussain 1985), as in some other varieties of Arabic (Cantineau 1936, 1937). In (24e) /r/ is preceded by the low back vowel [ɑɑ] which provides the context for its emphatic variant to surface. The emphatic variant of /r/, as that of the other segments that are contextually emphatic retract the place of articulation of only the ADJACENT vowels (Cantineau 1936) and only tautosyllabic segments, according to Ghazeli (1977:169).13 In this case, /a/ that is adjacent to /r/ in (24e), surfaces retracted, yielding [ɐ]. This central vowel is not expected to trigger affrication, even if we assume that front [a] does, and consequently, [fɑɑrɐǰ] should be ruled out. But if each item in this pair is a distinct lexical item, the verb with [ǰ] occurs freely, regardless of the adjacent segments. Further, forms that are derivationally/inflectionally related to the verbs given in (24) are treated differently from those related to items that display unambiguous cases of affrication. 
For example, the verbal nouns (not all of these have verbal nouns in QA, though) and the active and passive participles related to these verbs exhibit [ǰ] ~ [ɡ] alternation, which is not the observed pattern for the verbal nouns and participles in QA (section 4.3), including those related to forms undergoing affrication. As an illustration, the active participles of the verbs given in (24) are given in (25a-f). These are followed by the active participles related to nouns/adjectives undergoing affrication (25g-k).
(25) a. m-ɡaddim | m-ǰaddim | ‘to front’
b. m-ɡassim | m-ǰassim | ‘to divide’
c. *ɡɑɑsim | ǰɑɑsim | ‘to divide (p.n.)’
d. m-rɑɑfiɡ | m-rɑɑfiǰ | ‘to accompany’
e. m-fɑɑriɡ | m-fɑɑriǰ | ‘to get off s.o. back’
f. m-ɣammiɡ | m-ɣammiǰ | ‘to darken (s.th.)’
g. m-ʕattiɡ | m-ʕattiǰ | ‘to make old’
COMPARE:
g. m-daɡɡiɡ ‘to make small/thin’
h. m-ɡɑḷḷil ‘to make little’
i. m-ɡarrib ‘to make close/near’
j. m-raɡɡiɡ ‘to make thin/transparent’
k. mi-trayyiɡ ‘to have breakfast’
12 In agreement with my intuition, a number of QA speakers pointed out to me that the variants with [ɡ] in (24) are associated with Bedouins, which is not the case with those involving unambiguous affrication, as in (4).
13 Ghazeli attributes this ability only to /r/ because he does not discuss other segments.
Beside the fact that [ǰ] occurs in contexts not triggering affrication in (25a-c), in QA, (25c) surfaces invariably with the affricate. If this item were a case of affrication, it would be expected to surface variably with [ɡ] and [ǰ], as in unambiguous cases of affrication. Another example is the nouns/adjectives/adverbs that are related to the verbs given in (24), which are treated differently from those that display unambiguous cases of affrication.
(26) a. ɡiddɑɑm | ǰiddɑɑm | ‘front’
b. ɡismɜ | ǰismɜ | ‘division/distribution’
c. *rifiiɡ | rifiiǰ | ‘friend’
d. *firiiɡ | firiiǰ14 | ‘neighborhood’
e. ɣɑɑmiɡ | ɣɑɑmiǰ | ‘dark’
f. ʕatiiɡ | ʕatiiǰ | ‘old’
(26a) surfaces sometimes with an initial high glide, that is [y], which is generally a free variant of underlying /ǰ/, yielding [yiddɑɑm]. This alternation is never exhibited in cases of unambiguous affrication, which indicates that the affricate in (26a) is considered by native speakers to be derived from underlying /ǰ/, not /ɡ/. In (26c) and (26d), only the forms with the affricate are used by QA speakers. If these forms were true cases of affrication, they would be expected to surface variably with both segments.15 Further, the broken plural forms of (26c) and (26d) surface invariably with the affricate, as shown in (27), regardless of the context, which does not trigger affrication.
(27) a. rifǰɑɑn16 ‘friends’
b. firǰɑɑn ‘neighborhoods’
14 It could be argued, however, that this pair is not related to (24e) in the synchronic grammar of QA.
The broken plural forms of (24e) and (24f), respectively, and which are the only other items that have broken plural forms, show the same alternation as their singular forms, as illustrated in (28a) and (28b). The broken plural forms of unambiguous cases of affrication, on the other hand, surface invariably with [ɡ], as shown in (28c-g).
(28) Doublets
a. ʕittaɡ/ʕittaǰ/ʕtɑɑɡ/ʕtɑɑǰ | ʕatiiɡ/ʕatiiǰ | ‘old’
b. ɣimmaɡ/ɣimmaǰ | ɣɑɑmiɡ/ɣɑɑmiǰ | ‘dark’
COMPARE: Cases of affrication
c. ɡrɑɑb | ɡiriib/ǰiriib | ‘near’
d. dɡɑɑɡ | diɡiiɡ/diǰiiǰ | ‘thin/tiny’
e. ɡlɑɑl | ɡiliil/ǰiliil | ‘small quantity’
f. rɡɑɑɡ | riɡiiɡ/riǰiiǰ | ‘transparent’
g. ryuuɡ | riiɡ/riiǰ17 | ‘saliva’
In addition, verbs that are derivationally related to nouns/adjectives that constitute cases of unambiguous affrication, exemplified in (29), surface invariably with the velar stop [ɡ], unlike the verbs that are related to the adjectives/nouns that I propose to be doublets, which are given in (24).
(29) a. ɡarrab ‘he became close’
b. daɡɡaɡ ‘he made small’
c. ɡɑḷḷ ‘it became less’
d. raɡɡaɡ ‘he made thin’
e. raɡɡ ‘it became thin/soft’
f. tirayyaɡ ‘he had breakfast’
15 Forms that do not display alternation are considered to be lexically represented as their surface representation (see Lexicon Optimization, Prince & Smolensky 2004).
16 The feminine plural of this item surfaces invariably with [ǰ]: [rifǰ-ɑɑt] ~ [rifiiǰ-ɑɑt].
17 The plural of this form means ‘breakfast’, which is etymologically, but maybe not synchronically, related to the singular (Johnstone 1978: 293).
Another difference between unambiguous cases of affrication and what I consider to be doublets is the comparative adjective, which may surface with either [ɡ] or [ǰ] in items that I consider to be doublets, as in (30a-b), but which surface invariably with the velar stop in adjectives that undergo affrication in the base form, as shown in (30c-f), which correspond to (4a-d).
(30) Doublets
a. ʔaʕtaɡ/ʔaʕtaǰ ‘older’
b. ʔaɣmaɡ/ʔaɣmaǰ ‘darker’
COMPARE: Cases of affrication
c. ʔaɡrɐb ‘closer’
d. ʔadaɡɡ ‘thinner/smaller’
e. ʔaraɡɡ ‘softer/more transparent’
f. ʔaɡɑḷḷ ‘lesser’
Since the general pattern of the items that may undergo affrication is different from that of the items discussed in this section, I argue for treating them differently, and for considering the latter to be doublets. Even if some of these doublets were the outcome of affrication at some stage in the history of the variety, they are not treated as such in the synchronic grammar of QA, as shown above. To investigate the process in its synchronic state, it is best to observe and investigate the current output of the grammar of this variety, and not to rely solely on history (etymology) or comparison with other varieties.
6. Conclusion
The final constraint ranking of /ɡ/ affrication in QA is given in (31).
(31) Constraint ranking summary FAITH-F
ǰ <--> [i(i)] MAX-OO (dorsal) MAX-OP (dorsal)
MAX-IO (dorsal) [ɡ]<--> ¬[-back, +high] stem
In this paper I propose an analysis of the affrication of /ɡ/ in one of the varieties of Arabic that exhibit this alternation, namely, QA, within the framework of Optimality Theory. Contrary to previous analyses of affrication in Arabic varieties, the process is found to be triggered only by adjacency to [i(i)], not any other front vowel. Further, other segments that occur adjacent to an underlying /ɡ/, including front vowels other than [i(i)], block affrication. The domain of affrication is found to be restricted to the stem, and the variable nature of the process is accounted for by having the markedness constraint [ɡ]<--> ¬ [-back, +high]stem and the faithfulness constraint MAX-IO(dorsal) crucially unranked with respect to each other. It is shown that the process does not apply to certain lexical classes due to paradigmatic effects that are active in these classes. That is, due to a dominant output-output faithfulness constraint affrication does not apply to broken plurals. Due to a dominant Optimal Paradigm-faithfulness constraint, the process does not apply to verbs, active participles or verbal nouns. Participles and verbal nouns are suggested to be added to verbal inflectional paradigms since they behave like verbs in QA, as well as in CA. The data support the predictions of the Optimal Paradigm Model with respect to the existence of inflectional paradigms that lack bases, such as the Arabic verbal paradigms. Apparent counterexamples are analyzed as doublets, in which case [ǰ] in the relevant forms is proposed to be the output of underlying /ǰ/ not /ɡ/.
APPENDIX
The phonemic inventory of QA
a. Consonants
Stops: b, t, ṭ, d, k, ɡ, q, ʔ
Affricates: č, ǰ
Fricatives: f, θ, ð, ð̣, s, z, ṣ, š, x, ɣ,19 ħ, h
Nasals: m, n
Liquids: l, ḷ
Trills: r
Approximants: j, w, ʕ
b. Vowels
Front vowels: i, ii, ee, a
Back vowels: u, uu, oo, ɑɑ20
19 In QA, the back fricatives [x, ɣ] are uvular, not velar, as may be indicated by the transcription.
20 Bukshaisha (1985) suggests that the long low vowel is underlyingly /aa/. However, since this vowel never surfaces as [aa], and it always surfaces as a back vowel (Al-Sulaiti, 1993), I consider it to be phonemically /ɑɑ/ (see Lexicon Optimization, Prince & Smolensky 2004).
REFERENCES
Al-amadidhi, Darwish. 1985. Lexical and Sociolinguistic Variation in Qatari Arabic. Ph.D. dissertation, University of Edinburgh, Edinburgh.
Al-Sulaiti, Latifa. 1993. Some Aspects of Qatari Arabic Phonology and Morphology. Ph.D. dissertation, University of Essex, Colchester.
Al-yasoui, Rufail. 1996. Gharaebu Al-lughati Al-arabiyya. [The mysteries of the Arabic language]. Beirut: Dar Al-mashriq.
Anttila, Arto. 1997. “Deriving Variation from Grammar”. Variation, Change and Phonological Theory ed. by Frans Hinskens, Roeland van Hout & W. Leo Wetzels, 35-68. (=Current Issues in Linguistic Theory 146). Amsterdam & Philadelphia: John Benjamins.
_____ & Young-mee Yu Cho. 1998. “Variation and Change in Optimality Theory”. Lingua 104: 31-56.
Auger, Julie. 2001. “Phonological Variation and Optimality Theory: Word-initial vowel epenthesis in Vimeu Picard”. Language Variation and Change 13:3.253-303.
Benua, Laura. 1997. Transderivational Identity: Phonological relations between Words. Ph.D. dissertation, University of Massachusetts, Amherst.
Bukshaisha, Fouzia. 1985. An Experimental Phonetic Study of Some Aspects of Qatari Arabic. Ph.D. dissertation, University of Edinburgh, Edinburgh.
Burzio, Luigi. 1994. Principles of English Stress. Cambridge: Cambridge University Press.
Cantineau, Jean. 1936. “Études sur quelques parlers de nomades arabes d’Orient”. A. I. E. O. II: 1-118.
_____. 1937. “Études sur quelques parlers de nomades arabes d’Orient”. A. I. E. O. III: 119-236.
Côté, Marie-Hélène. 2000. Consonant Cluster Phonotactics: A perceptual Approach. Ph.D. dissertation, MIT, Cambridge.
Gafos, Adamantios. 2003. “Greenberg’s Asymmetry in Arabic: A consequence of stems in paradigms”. Language 79:2.317-357.
Ghazeli, Salem. 1977. Back Consonants and Backing Coarticulation in Arabic. Ph.D. dissertation, University of Texas, Austin.
Hayes, Bruce. 1998. “Gradient Well-formedness in Optimality Theory”. Available: http://www.linguistics.ucla.edu/people/hayes/index.htm#phonetics.
Hussain, Abdulla. 1985. An Experimental Investigation of Some Aspects of the Sound System of the Gulf Arabic Dialect with Special Reference to Duration. Ph.D. dissertation, University of Essex, Colchester.
Johnstone, T. M. 1967. Eastern Arabian Dialect Studies. London: Oxford University Press.
_____. 1978. “The Affrication of ‘Kaf’ and ‘Qaf’ in the Arabic Dialects of the Arabian Peninsula”. Readings in Arabic Linguistics ed. by Salman Al-Ani, 285-303. Bloomington: Indiana University.
Kenstowicz, Michael. 1997. “Base Identity and Uniform Exponence: Alternative to cyclicity”. Current Trends in Phonology: Models and methods ed. by J. Durand & B. Laks, 363-393. Salford: University of Salford. Also available online: http://roa.rutgers.edu/view.php3?id=31.
_____. 1998. “Uniform Exponence: Exemplification and extension”. Available online: http://roa.rutgers.edu/view.php3?id=230
Mahadin, Radwan. 1989. “Doublets in Arabic: Notes towards a diachronic phonological study”. Language Sciences 11:1.1-25.
Matar, Abdulaziz. 1969. Khasaaisu Al-lahjati Al-kuwaitiyya. [The peculiarities of the Kuwaiti dialect]. Kuwait: Matab Al-risaala.
_____. 1985. Al-saala Al-arabiyya fi Lahjaati Al-xaliij. [The Arabic authenticity in the dialects of the Gulf]. Riyadh: Dar Aalam Al-kutub Lil-nashr w Al-tawzi.
McCarthy, John. 1992. “On Affricates”. Available online: courses.umass.edu/ling730/on_affricates_1992.pdf
_____. 2005. “Optimal Paradigms”. Paradigms in Phonological Theory ed. by Laura J. Downing, T. Alan Hall & Renate Raffelsiefen, 170-210. Oxford: Oxford University Press.
McCarthy, John & Alan Prince. 1990. “Foot and Word in Prosodic Morphology: The Arabic broken plural”. Natural Language and Linguistic Theory 8:209-283.
_____. 1995. “Faithfulness and Reduplicative Identity”. University of Massachusetts Occasional Papers in Linguistics 18: Papers in Optimality Theory ed. by Jill Beckman, Laura Walsh Dickey & Suzanne Urbanczyk, 249-384.
Mustafawi, Eiman. 2006. An Optimality Theoretic Analysis of Variable Phonological Alternations in Qatari Arabic. Ph.D. dissertation, University of Ottawa, Ottawa.
Prince, Alan & Paul Smolensky. 2004. Optimality Theory: Constraint interaction in Generative Grammar. Malden, MA: Blackwell.
Smith, Jennifer. 1997. “Noun Faithfulness: On the privileged behavior of nouns in phonology”. Available online: http://roa.rutgers.edu/files/242-0198/roa-242smith-5.pdf.
Steriade, Donca. 2000. “Paradigm Uniformity and the Phonetics-Phonology Boundary”. Papers in Laboratory Phonology V: Acquisition and the lexicon ed. by Michael B. Broe & Janet B. Pierrehumbert, 313-334. Cambridge & New York: Cambridge University Press.
Wright, W. 1967. A Grammar of the Arabic Language. Vol. 1. Cambridge: Cambridge University Press.
Wright, W. 1967. A Grammar of the Arabic Language. Vol. 2. Cambridge: Cambridge University Press.
THE SYNTAX OF COMPLEX TENSE IN MOROCCAN ARABIC*
Hamid Ouali & Catherine Fortin University of Michigan
In this paper, we provide a Minimalist analysis of the syntax of complex tense in Moroccan Arabic (MA). We argue that in the clause structure of MA, tense and aspect are separate syntactic heads and cannot be conflated into a single, multi-purpose head (contra e.g., Fassi Fehri 1993). As evidence, we demonstrate that selectional restrictions exist between tense and imperfective/perfective verb stems; we then show that these restrictions also exist for passive, causative and reflexive stems. We further argue that MA complex tense clauses, which consist of an auxiliary (KAN) and a verb stem, are biclausal; both auxiliary and lexical verb are fully inflected for tense and aspect. However, when compared to regular embedded clauses, the matrix domain in complex tense sentences is shown to be defective, as it lacks a vP. Complex tense sentences are further contrasted with ECM (exceptional case-marking) sentences, which are shown to contain two vPs, but are likewise defective as they lack a TP in the embedded domain.
1. Tense and Aspect in Moroccan Arabic
MA expresses aspect through the phonological realization of agreement markers and their position with respect to the verb stem, and it expresses tense through a prefix. We argue that Tense (T) and Aspect (Asp) correspond to different projections in the syntactic structure of MA (following in this respect a number of scholars like Benmamoun 2000, and contra other scholars including Fassi Fehri 1993), as shown in (1).
* We thank Acrisio Pires, Sam Epstein and audiences at NACAL 33, ACAL 36 and ALS 19 for much useful discussion.
(1)
[TP [T’ T [AspP [Asp’ Asp … ]]]]
Unlike Standard Arabic (SA), MA expresses no mood distinctions morphologically; for this reason, we set aside the question of whether mood is syntactically represented in MA clause structure.
(2) ya-drus-u (SA)
3M-study.IMP-IND
“He studies/is studying.”
(3) y-drәs (MA)
3M-study.IMP
“He studies/is studying.”
1.1 Selectional Restrictions
As support for the clause structure we propose in (1), we will show that the distribution of imperfective and perfective stems in MA is governed by selectional restrictions with respect to tense. In MA, tense is represented by a prefix, while aspect is morphologically encoded by the position and phonological realization of the agreement marking on the verb. Agreement on imperfective verbs is realized as both a prefix and a suffix (4b), while agreement on perfective verbs is realized as a suffix only (4a).
(4) a. Ø /*ka /*ɣa le'b-u
PAST /*PRES /*FUT play.PERF-3PL
“They played”
b. *Ø /ka /ɣa y-le'b-u
*PAST /PRES /FUT 3-play.IMP-PL
“They are playing/will play”
As illustrated in (4), perfective stems are only compatible with past tense, while imperfective stems are only compatible with present and future tense morphology. This is due to a selectional restriction between null past and perfective stem. Imperfective stems are ‘default’ and appear in all other environments. The following table illustrates the restricted selection between past and perfective and the default nature of imperfective with present and future.1
VERBAL ASPECTUAL FORMS    TENSE AFFIXES
                          ka/ta (PRESENT)                 ɣa (FUTURE)    Ø (PAST)
PERFECTIVE                *                               *              past
IMPERFECTIVE              progressive/habitual present    future         *

Table 1. Selectional Restrictions between Past Tense and Perfective Aspect
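The pattern summarized in Table 1 can be restated as a small lookup. The following Python sketch is only an illustration of the generalization, not part of the analysis itself: the function names are hypothetical, the future prefix is written ɣa (as in examples (24), (25) and (42)), only the ka variant of the present prefix is used, and agreement is restricted to the third person plural forms of (4).

```python
# Illustrative encoding of Table 1; all names here are hypothetical.
TENSE_PREFIX = {"PAST": "Ø", "PRES": "ka", "FUT": "ɣa"}


def selected_aspect(tense: str) -> str:
    """Past T selects the perfective stem; imperfective is the default."""
    if tense not in TENSE_PREFIX:
        raise ValueError(f"unknown tense: {tense}")
    return "PERF" if tense == "PAST" else "IMP"


def third_plural(stem: str, aspect: str) -> str:
    """3PL agreement as in (4): suffix only (PERF), prefix plus suffix (IMP)."""
    if aspect == "PERF":
        return f"{stem}-u"
    if aspect == "IMP":
        return f"y-{stem}-u"
    raise ValueError(f"unknown aspect: {aspect}")


def simple_clause(tense: str, aspect: str, stem: str):
    """Return the surface string, or None for a starred combination."""
    if selected_aspect(tense) != aspect:
        return None
    return f"{TENSE_PREFIX[tense]} {third_plural(stem, aspect)}"
```

Under these assumptions, `simple_clause("FUT", "IMP", "le'b")` yields the future form in (4b), while a present or future prefix combined with a perfective stem returns `None`, mirroring the starred cells of the table.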
1.2 Previous Analysis of MA Tense and Aspect: Benmamoun 2000

We next briefly consider a previous analysis of MA tense and aspect, that of Benmamoun 2000. Benmamoun proposes a unified analysis of SA, MA and Egyptian Arabic clause structure. We follow him in claiming that Asp0 and T0 are syntactically separate. However, he argues that MA ka/ta are aspectual (imperfective) clitics, located in Asp0, which carry no tense information, as shown in (5).

(5)
Asp Asp
VP
ka/ta
…
…
[irrelevant details omitted] Benmamoun (2003: 33)
These morphemes have a habitual and progressive interpretation, and cliticize to the imperfective verb. Given this line of argument, one would predict that the future tense marker ɣa, which Benmamoun 2000 argues is merged under T0 and (in chapter 2) is used with verbs in the imperfective form, should be able to co-occur with the aspectual morphemes ka/ta to generate a future habitual/progressive reading. Notice that, according to his arguments, ɣa is merged in T0 and ka/ta is merged in Asp0, as shown in (5), and both are only compatible with an imperfective verb. However, the prediction that ɣa and ka/ta may co-occur is not borne out, as shown in (6).

(6) ɣa (*ka) y-lә'b-u
    FUT      3-play.IMP-PL
    “They will play”

1 An anonymous reviewer raises the question of whether it is a coincidence that past T is the only T to select the perfective aspect and the only marker that is null, and suggests that perfective verb forms do indicate tense and may raise to T. If the perfective were marked for tense, as the reviewer claims, and had to move to T to check its Tense feature, it is not clear why the verb does not move to T to check tense in the present. Unless this point is clarified, the reviewer's objection does not hold.
We argue instead that ka/ta are present tense morphemes that are merged in T0. We also argue, agreeing with Benmamoun, that the future marker ɣa is a tense marker and is merged in T0. The fact that ka/ta and ɣa cannot co-occur is accounted for in the same way as the fact that English will and –ed cannot co-occur (that is, both will and –ed are merged in T0).2 We follow Benmamoun in arguing that the vowel melody in MA plays no role in realizing tense or aspect (and, in effect, that MA need not be analyzed as a templatic language).3 We also follow Benmamoun in arguing that the vocalic melody does not represent voice (active vs. passive) either. As shown in (7), the vocalic melody in MA stems does not change whether voice is active or passive; MA uses a prefix (t-) to express passive/reflexive/middle voice.
2 The only context where ka/ta can co-occur with another tense marker is in complex-tense constructions, which involve using the copula kan “be” with the main verb and which we argue are bi-clausal, as discussed in detail in section 2.

3 In contrast, Fassi Fehri’s (1993) analysis of SA clause structure, shown in (i), holds that Voice/Aspect/Tense are merged together into a single morpheme within the IP domain. The IP domain also contains Agreement and Mood. Note that, for Fassi Fehri, tense markers are vowels that merge in the stem of the verb, or prefixal/suffixal consonants which are strictly internal to the word.

(i) [CP [ModP [IP (Voice/Aspect/Tense; Agr; Mood) [VP
(7) a. Ø klә-Ø
       PAST eat.PERF-3M
       “He ate.”
    b. Ø t-klә-Ø
       PAST PASS-eat.PERF-3M
       “It was eaten.”
1.3 Passives, Causatives & Reflexives

Passive, reflexive and causative stems provide further evidence for our claim that tense and aspect are syntactically distinct in MA. The selectional restrictions described above for perfective and imperfective stems are also observed by causative, passive, or reflexive stems. Past tense selects for perfective aspect, while present/future tense selects for imperfective aspect. As in the case of imperfective and perfective stems, with causative, passive and reflexive stems, the aspect of the verbal stem is signaled by agreement only.

1.3.1 Reflexives

The reflexive stem (t-'ang “hug each other”) is phonologically invariant, whether its aspect is perfective, as in (8a), or imperfective, as in (8b); aspect is identifiable only by the position and phonological shape of the agreement marking (see Section 1.1).4

(8) a. Ø /*ka /*ɣa t-'ang-u
       PAST /*PRES /*FUT REFL-hug-3PL
       “They hugged.”
    b. *Ø /ka /ɣa y-t-'ang-u
       *PAST /PRES /FUT 3-REFL-hug-PL
       “They are hugging/will hug.”
4 An anonymous reviewer objects to our use of 3rd person plural throughout. We use it for consistency only; our claim that the aspect of the verbal stem is signaled by agreement holds regardless of which person and number are involved:

(i) ka n-t-'ang-u
    PRES 1-REFL-hug-PL
    “we hug each other”
(ii) ka t-t-'ang-u
    PRES 2-REFL-hug-PL
    “you hug each other”
1.3.2 Passives

The passive stem (t-'ǝng “was hugged”) is phonologically invariant, whether its aspect is perfective, as in (9a) or imperfective, as in (9b); once again, aspect is identifiable only by the agreement markers. Note that reflexives and passives are formed with the same prefix, t-. However, reflexives can be distinguished from passives according to the shape of the stem vowel; the reflexive stems contain a full vowel, the passive stems a reduced vowel.

(9) a. Ø /*ka /*ɣa t-'әng-u
       PAST /*PRES /*FUT PASS-hug-3PL
       “They were hugged.”
    b. *Ø /ka /ɣa y-t-'әng-u
       *PAST /PRES /FUT 3-PASS-hug-PL
       “They are being/will be hugged.”
1.3.3 Causatives

The causative stem (w-kkәl “cause to eat”) is also phonologically invariant, whether its aspect is perfective, as in (10a), or imperfective, as in (10b); aspect is identifiable only according to the phonological realization of the agreement markers.

(10) a. Ø /*ka /*ɣa wkkәl-Ø-ha
        PAST /*PRES /*FUT cause.eat-3SM-her
        “He made her eat.”
     b. *Ø /ka /ɣa y-wkkәl-ha
        *PAST /PRES /FUT 3SM-cause.eat-her
        “He makes/will make her eat.”
In section 1, we have shown that the vocalic melody in MA carries no information about tense and aspect. Aspect is signaled by the position and shape of the agreement markers on the verb stem. Tense is a prefix, which selects for a certain aspectual form: past selects perfective aspect, while present and future both select imperfective aspect. We have therefore argued that aspect and tense must be represented separately in MA clause structure. In section 2, we turn to our analysis of complex tense constructions in MA.
2. Complex Tense vs. ECM Constructions

Having established that tense is morphologically marked in MA and that there is a selectional restriction between the tense marker and the aspectual form of the verb, we now examine how complex tense is expressed in this language and analyze the syntax of sentences such as (11).

(11) ɣa y-kun-u ka y-le'b-u
     FUT 3-be.IMP-PL PRES 3-play.IMP-PL
     “They will be playing”
In such sentences the verb co-occurs with the copula kan “be”. Both the copula and the main verb are preceded by a tense marker, contrary to what we find in ECM (exceptional case-marking) constructions such as (12), where the embedded verb cannot be preceded by a tense marker.

(12) Ø bghi-t-hum (*ka) y-le'b-u
     PAST want.PERF-1SG-them (*PRES) 3-play.IMP-PL
     “I wanted them to play”
Both complex tense constructions and ECM constructions can be contrasted with regular embedded clauses, as in (13). Unlike complex tense constructions, regular embedded clauses license their own subject, different from the subject licensed in the matrix clause; unlike ECM constructions, this subject must bear nominative case.

(13) Ø gal-t Fatima bәlli huma ka y-lə'b-u
     PAST say.PERF-1SG Fatima that they PRES 3-play.IMP-PL
     “Fatima said that they are playing/they play”
We propose here that the structure of complex tense clauses is biclausal, as illustrated in (14).

(14) Complex tense clauses:
     [TP [AspP [VP BE [TP [AspP [vP [VP main verb
     no vP in matrix domain
This explains the fact that both the main verb and the copula are inflected for aspect and agreement and preceded by a tense marker. Complex tense BE selects a ϕ-complete TP. In ECM structures, however, the matrix WANT selects for imperfective aspect as shown in (15), and therefore the embedded clause does not contain its own tense.

(15) WANT-type (ECM) clauses:
     [TP [AspP [vP [VP WANT [AspP [vP [VP
     no TP in embedded clause
Consequently, perfective aspect is not licensed in the embedded clause of ECM constructions. We only find perfective aspect in the embedded clause when it is selected by T0, and specifically past T0, as discussed in the previous section. Imperfective aspect must be selected by future or present T0 or by ECM verbs, which makes it a default form, as discussed in section 1. The clause structure of complex tense clauses involves two TPs and a single vP (in the embedded domain), as illustrated in (14). Conversely, the clause structure of an ECM sentence involves only a single TP (in the matrix domain) but two vPs, as shown in (15), unlike regular embedded clauses, which involve two TPs and two vPs, as shown in (16).

(16) Regular embedded clauses:
     [TP [AspP [vP [VP SAY [CP [TP [AspP [vP [VP
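The three-way contrast among (14), (15) and (16) comes down to counting TPs and vPs. As a rough illustration only, the sketch below flattens each labelled bracketing into a list of projections (outermost first) and reads the two predictions off the counts: an embedded tense marker is possible only if there is a second TP, and each vP licenses one external argument. The dictionary keys and function names are hypothetical labels introduced for this sketch.

```python
# Bracketings (14)-(16) flattened to projection labels, outermost first.
# Keys and function names are illustrative assumptions.
STRUCTURES = {
    "complex_tense": ["TP", "AspP", "VP", "TP", "AspP", "vP", "VP"],
    "ecm": ["TP", "AspP", "vP", "VP", "AspP", "vP", "VP"],
    "regular_embedded": ["TP", "AspP", "vP", "VP", "CP", "TP", "AspP", "vP", "VP"],
}


def profile(kind: str) -> dict:
    """Count the TP and vP projections in a clause type."""
    spine = STRUCTURES[kind]
    return {"TP": spine.count("TP"), "vP": spine.count("vP")}


def embedded_tense_possible(kind: str) -> bool:
    """A tense marker on the lower verb requires a second (embedded) TP."""
    return profile(kind)["TP"] == 2


def external_arguments(kind: str) -> int:
    """Each vP licenses one external argument."""
    return profile(kind)["vP"]


for kind in STRUCTURES:
    print(kind, profile(kind), embedded_tense_possible(kind), external_arguments(kind))
```

On this encoding, complex tense clauses come out with two TPs and one vP, ECM clauses with one TP and two vPs, and regular embedded clauses with two of each, which is the distribution described in the text.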
2.1 Complex Tense Constructions

As previously mentioned, complex tense in MA is expressed by using a copula with the main verb. The copula is preceded by a tense marker, and so is the main verb. In the examples in (17), the copula is preceded by a past tense morpheme. The main verb is preceded by a (null) past tense marker in (17a), and the sentence has a past perfect interpretation; by a present tense marker in (17b), and the sentence has a past progressive interpretation; and by a future marker in (17c), and the sentence has a future in the past interpretation. The interpretation of the embedded tense head is dependent on matrix (deictic) tense (i.e. anaphoric to matrix tense; see, e.g., Stowell 1996 and Fassi Fehri 2004 for Standard Arabic).

(17) a. Past Perfective
        Ø kan-u Ø lә'b-u
        PAST be.PERF-3P PAST play.PERF-3PL
        “They had played”
     b. Past (Perfective) Progressive
        Ø kan-u ka y-lә'b-u
        PAST be.PERF-3P PRES 3-play.IMP-PL
        “They were playing”
     c. Future in the Past
        Ø kan-u ɣa y-lә'b-u
        PAST be.PERF-3P FUT 3-play.IMP-PL
        “They were going to play”
In (18a-c), the copula is preceded by a future tense marker, and the main verb combines with each of the different tense markers (namely past, present and future), resulting in future perfective, future progressive, and future in the future interpretations, respectively.5

(18) a. Future Perfective
        ɣa y-kun-u Ø lә'b-u
        FUT 3-be.IMP-PL PAST play.PERF-3PL
        “They will have played”
     b. Future Progressive
        ɣa y-kun-u ka y-lә'b-u
        FUT 3-be.IMP-PL PRES 3-play.IMP-PL
        “They will be playing”
     c. Future in the future
        ɣa y-kun-u ɣa y-lә'b-u
        FUT 3-be.IMP-PL FUT 3-play.IMP-PL
        “They will be about to play”
As expected, complex tense constructions are also permitted with embedded causatives, as shown in (19a-c) and (20a-c), as well as with reflexives and passives, which are not shown.

(19) a. Past Perfective Causative
        Ø kan-Ø Ø wkkәl-Ø-ha
        PAST be.PERF-3SM PAST Cause.eat-3SM-her
        “He made her eat.”
     b. Past Perfective (Progressive) Causative
        Ø kan-Ø ka y-wkkәl-ha
        PAST be.PERF-3SM PRES 3SM-Cause.eat-her
        “He was/has been making her eat.”
     c. Future in the Past Causative
        Ø kan-Ø ɣa y-wkkәl-ha
        PAST be.PERF-3SM FUT 3SM-Cause.eat-her
        “He was about to/going to make her eat.”

5 Although complex tenses formed with a present tense auxiliary are possible, in practice they appear to be blocked by simple tense expressions.
(20) a. Future Perfective Causative
        ɣa y-kun Ø wkkәl-Ø-ha
        FUT 3SM-be.IMP PAST Cause.eat-3SM-her
        “He will have made her eat.”
     b. Future Progressive Causative
        ɣa y-kun ka y-wkkәl-ha
        FUT 3SM-be.IMP PRES 3SM-Cause.eat-her
        “He will be making her eat.”
     c. Future in the Future Causative
        ɣa y-kun ɣa y-wkkәl-ha
        FUT 3SM-be.IMP FUT 3SM-Cause.eat-her
        “He will be about to make her eat.”
In the next section, we will propose a detailed analysis of the syntax of these complex tense constructions.

2.1.1 The structure of complex tense clauses6

The clause structure in (21) repeats the structure we offered for complex tense structures in (14). We propose that the matrix copula BE selects TP. This embedded T, like the matrix T, is ϕ-complete, i.e. it is marked for ϕ-features and tense.7 This is demonstrated by the full tense marking on both embedded and matrix verbs. The embedded domain contains a vP, but the matrix domain does not. This is unlike WANT-type clauses, which do contain a vP in the matrix domain and which we will revisit in section 2.2.

(21) Complex tense clauses:
     [TP [AspP [VP BE [TP [AspP [vP [VP main verb
     no vP in matrix domain
The matrix domain clearly does not contain a vP, since complex tense clauses do not project an independent external argument. The copula kan in BE clauses can only license one subject as shown in (22) vs. (23), although as (23) shows the subject can occur in a variety of positions.
6 See Fassi Fehri 2004 for an approach to complex tense in Standard Arabic which is, in some respects, analogous to the one proposed here, as he also appeals to multiple TPs.

7 We recognize that there is a discrepancy between our use of φ-completeness and that of Chomsky (2000), for whom a φ-complete T entails that it is selected by C.
(22) * l-bnat… ɣa y-kun-u l-wlaad ka y-lә'b-u
       the-girls FUT 3-be.IMP-PL the-boys PRES 3-play.IMP-PL

(23) (l-bnat) ɣa.. y-kun-u.. (l-bnat) ka y-l'b-u (l-bnat)
     (the-girls) FUT 3-be.IMP-PL (the-girls) PRES 3-play.IMP-PL (the-girls)
     “The girls will be playing”
No matter what position in the sentence the subject ends up in, it is always marked for nominative case. The copula kan cannot assign accusative Case to the embedded subject, as shown in (24), where the subject is a pronoun in the accusative form. For the sentence to be grammatical the pronoun (subject) has to be in the nominative form, as illustrated in (25).

(24) * ɣa y-kun-u-hum ka y-lә'b-u
       FUT 3-be.IMP-PL-them PRES 3-play.IMP-PL

(25) ɣa y-kun-u huma ka y-lә'b-u
     FUT 3-be.IMP-PL they PRES 3-play.IMP-PL
     “They will be playing”
Since complex tense clauses contain two TPs, it is predicted that such clauses would allow negation to surface in two different positions. This prediction is borne out. There is no semantic difference between the two, i.e. the scope of negation does not change, regardless of whether negation dominates both TPs as in (26a), or just the lower TP as in (26b).8

(26) a. ma ɣa (*ma) y-kun-u-ʃ Ø mʃa-w daba
        NEG FUT NEG 3-be.IMP-P-NEG PAST leave.PERF-3P now
        “They will not have left now/by now”
     b. ɣa y-kun-u ma Ø mʃa-w-ʃ daba
        FUT 3-be.IMP-P NEG PAST leave.PERF-3P-NEG now
        “They will have not left now/by now”

8 This might suggest that vP is the event domain: as there is only one vP in complex tense clauses, there is no difference in scope of negation. As will be shown below, the situation is different in ECM constructions, which contain two vPs, hence two event domains. Whether negation is higher than the embedded vP only or higher than the matrix vP results in a difference in the scope of negation. A full analysis of event structure of MA is beyond the scope of this paper, but see Travis (2000) for a proposal of the syntactic representation of event structure that is compatible with our analysis of MA clause structure.
Let us now analyze the syntax of ECM constructions, which are normally biclausal, and see what sets them apart from the complex tense structures.

2.2 Want-type (ECM) Constructions

2.2.1 The Structure of ECM Constructions

As we proposed above, and as represented in (27), matrix WANT selects for imperfective aspect, regardless of matrix tense. Matrix WANT cannot select for perfective aspect.

(27) WANT-type (ECM) clauses:
     [TP [AspP [vP [VP WANT [AspP [vP [VP
     no TP in embedded clause
We argue that perfective aspect on the embedded verb cannot be licensed because perfective must be selected by embedded past T, regardless of main clause tense. Since the embedded clause contains no TP, perfective aspect is impossible as shown in (28) through (30).

(28) Ø bɣa-Ø-ha t-akul / * kl-at
     PAST want.PERF-3SM-her 3SF-eat.IMP / eat.PERF-3SF
     “He wanted her to eat.”

(29) ka y-bɣiha-ha t-akul / * kl-at
     PRES 3SM-want.IMP-her 3SF-eat.IMP / eat.PERF-3SF
     “He wants her to eat.”

(30) ɣa y-bɣiha-ha t-akul / * kl-at
     FUT 3SM-want.IMP-her 3SF-eat.IMP / eat.PERF-3SF
     “He will want her to eat.”
We argue that WANT-type verbs, unlike complex tense constructions, both license embedded subjects and assign/value their (accusative) Case. Consequently, both the embedded domain and the matrix domain contain vPs, each of which licenses an external argument. The examples in (31) and (32) illustrate that the subject of the embedded clause must bear accusative case.

(31) Ø bɣi-t-hum y-lә'b-u
     PAST want.PERF-1SG-them 3-play.IMP-PL
     “I wanted them to play”

(32) * Ø bɣi-t huma y-lә'b-u
       PAST want.PERF-1SG they 3-play.IMP-PL
As in complex tense clauses, there are two positions available for negation: surrounding the matrix verb, as in (33), and surrounding the embedded verb, as in (34). However, each position corresponds to a different interpretation; that is, in ECM sentences, scope of negation does depend upon its surface position. Negation takes wide scope in (33), and narrow scope in (34).

(33) Ø ma-bɣi-t-hum-ʃ y-le'b-u
     PAST NEG-want.PERF-1SG-them-NEG 3-play.IMP-PL
     “I didn’t want them to play”

(34) Ø bɣi-t-hum ma-y-lә'b-u-ʃ
     PAST want.PERF-1SG-them NEG-3-play.IMP-PL-NEG
     “I wanted them not to play”
2.2.2 A puzzle: A complementizer in ECM clauses?

Embedded clauses in WANT-type sentences are optionally headed by (what appears to be) a complementizer, baʃ, as shown in (35).

(35) Ø bɣa-Ø-ha (baʃ) t-akul
     PAST want.PERF-3SM-her (that) 3SF-eat.IMP
     “He wanted for her to eat.”
This poses a puzzle for our claim that WANT-type verbs in MA select AspP, as these embedded clauses do appear to be (optionally) headed by a C0. However, we will show that baʃ is not a true complementizer, and hence does not pose a difficulty for our analysis. First, the optional C0 in WANT-type clauses (baʃ) is not the same C0 observed in regular embedded clauses (bəlli).

(36) Ø gal-Ø Ali bәlli (Meriam) mʃ-at (Meriam)
     PAST say.PERF-3SM Ali that (Meriam) leave.PERF-3SF (Meriam)
     “Ali said that Meriam left.”

(37) * gal Ali baʃ Meriam mʃat.

Additionally, WANT-type verbs cannot select bәlli.
(38) Ø bɣa-Ø-ha Ali (*bәlli) t-әmʃi Meriam.
     PAST want.PERF-3SM-her Ali (that) 3SF-leave.IMP Meriam
     “Ali wanted Meriam to leave.”
Furthermore, baʃ does not block movement of the embedded subject clitic into the matrix clause, as would be expected if it were C0, as shown in (39). Nor does baʃ block accusative Case assignment from the matrix verb to the embedded subject, also shown in (39), and it does not appear to have any Case-assigning properties of its own, unlike for in English, as in (40).

(39) Ø bɣa-Ø-ha baʃ t-akul
     PAST want.PERF-3SM-her that 3SF-eat.IMP
     “He wanted for her to eat.”

(40) I want very much *(for) John to go.
Finally, baʃ cannot co-occur with tense.

(41) Ø bɣa-Ø-ha baʃ (*ka/*ɣa) t-emʃi
     PAST want.PERF-3SM-her that (PRES/FUT) 3SF-leave.IMP
     “He wanted her to go.”
As baʃ cannot co-occur with tense, baʃ cannot intervene in complex tense (BE) contexts.

(42) ɣa y-kun-u (*baʃ) ka y-lә'b-u
     FUT 3-be.IMP-PL (that) PRES 3-play.IMP-PL
     “They will be playing”
While the true nature of baʃ remains an open question, we have established that it is not a complementizer and hence does not present a problem for our claim that ECM verbs select AspP.

3. Conclusion

To summarize the main theoretical contributions of our paper, we have argued that tense and aspect are distinct in the clause structure of MA and are projected separately. We have shown that the properties of MA complex tense clauses, in which both the auxiliary and lexical verb are fully inflected for tense, aspect, and agreement, are accounted for with a biclausal structure. Complex tense clauses contain two complete TPs and, as they license a single external argument, a single vP, in the embedded domain. The properties of MA ECM clauses can similarly be accounted for with a biclausal structure that differs from that of the complex tense clauses in two ways. The embedded verb in ECM clauses is not marked for tense because the embedded domain lacks a TP; the ECM verb selects AspP. Given that two external arguments are licensed in ECM clauses, two vPs are present, one in the matrix domain and another in the embedded domain. Both complex tense clauses and ECM clauses have been further contrasted with regular embedded clauses, which are maximally a CP and which are not structurally defective in any way.
REFERENCES

Benmamoun, Elabbas. 2000. The Feature Structure of Functional Categories. New York: Oxford.
Chomsky, Noam. 2001b. “Derivation by Phase”. Ken Hale: A life in language, ed. by Michael Kenstowicz. Cambridge: MIT Press.
_____. 2004. “Beyond Explanatory Adequacy”. Structures and Beyond: Current issues in the theory of language, ed. by Adriana Belletti. Oxford: Oxford University Press.
Fassi Fehri, Abdelkader. 1993. Issues in the Structure of Arabic Clauses and Words. Dordrecht: Kluwer.
_____. 2004. “Temporal/aspectual Interaction and Variation across Arabic Heights”. The Syntax of Time, ed. by Jacqueline Guéron & Jacqueline LeCarme. Cambridge: MIT Press.
Ouali, Hamid & Acrisio Pires. To appear. “Complex Tense, Agreement and wh-extraction”. Proceedings of the 31st Annual Meeting of the Berkeley Linguistics Society (2005).
Stowell, Tim. 1996. “The Phrase Structure of Tense”. Phrase Structure and the Lexicon, ed. by Johan Rooryck & Laurie Zaring. Dordrecht: Kluwer.
Travis, Lisa. 2000. “Event Structure in Syntax”. Events as Grammatical Objects, ed. by Carol Tenny & James Pustejovsky. Stanford: CSLI Publications.
ON AGREE AND POSTCYCLIC MERGE IN SYNTACTIC DERIVATIONS FIRST CONJUNCT AGREEMENT IN STANDARD ARABIC REVISITED*
Usama Soltan University of Maryland, College Park
1. Introduction

The investigation of agreement phenomena has been at the heart of syntactic theorizing within the generative tradition during the past two decades or so. Central to this research project has always been the question of what built-in mechanisms in the grammar are needed to account for agreement in natural languages. In the GOVERNMENT-BINDING (GB) literature (see, for example, Chomsky 1981), two main mechanisms were typically invoked: the Spec-head configuration and the notion of government, a duality of devices that became theoretically undesirable under the assumptions of the post-GB MINIMALIST PROGRAM (MP) for linguistic theory (Chomsky 1993, 1995), where all agreement/Case-assignment is accounted for in terms of the Spec-head configuration, with the notion of government being entirely eliminated from the theory of grammar. A more recent approach (Chomsky 2000, 2001a, 2001b), however, treats agreement not as a reflex of a phrase structure theoretic relationship, but as the result of a primitive built-in operation of the grammar, call it AGREE, whereby an agreement relation between two elements within the structural hierarchy of a sentence can be established at a distance, though still subject to certain locality considerations (cf. section 5 below for a more articulated formulation of how the operation AGREE works in syntactic derivations).1

In this paper, I revisit the phenomenon of FIRST CONJUNCT AGREEMENT (FCA, henceforward) with data from Standard Arabic (SA), showing that FCA provides further evidence for the operation AGREE in the grammar and against the Spec-head approach to agreement phenomena. The paper is organized as follows. Section 2 presents the facts of FCA in SA and how they relate to the general phenomenon of the subject-verb agreement asymmetry in the language. An earlier analysis of FCA in terms of Spec-head agreement is then discussed in section 3, where empirical arguments are presented against such an account of FCA. In section 4, I articulate the analysis of the agreement asymmetry between conjoined subjects in pre- and postverbal position in terms of interface conditions governing the occurrence of pro in null subject languages, along the lines suggested in Soltan (2006). In section 5, I present a minimalist analysis of FCA in terms of the interaction between AGREE and postcyclic Merge of adjuncts (the latter operation independently argued for in the literature to account for classical LF effects), whereby FCA is accounted for as a PF effect of postcyclic Merge of conjunction phrases. Section 6 sums up the conclusions of the paper.

* For their valuable questions, comments, and suggestions, I would like to extend my thanks to Mark Baker, Elabbas Benmamoun, Cedric Boeckx, Tomohiro Fujii, Norbert Hornstein, Anthony Kroch, Howard Lasnik, Andrew Nevins, Hamid Ouali, Milan Rezac, Norvin Richards, Juan Uriagereka, and the audience at the Second ECO5 Syntax Workshop held at University of Maryland, College Park in the spring of 2004, the participants at the Workshop on Minimalist Theorizing held at Indiana University, Bloomington, in the summer of 2004, as well as the audience at the 19th Arabic Linguistics Symposium, held at the University of Illinois, Urbana-Champaign, in the Spring of 2005. Special thanks are due to Elabbas Benmamoun for his encouragement and patience during my writing of this paper. It goes without saying that all mistakes or shortcomings in this paper are entirely my responsibility.
1
For a more elaborate discussion of how each of these two distinct approaches to agreement fares conceptually and empirically, see Soltan (2006, 2007). See also Hornstein (2005), where arguments are made in favor of the account of agreement in terms of phrase structure theoretic relations resulting from the primitive operations of Concatenate and Merge. A discussion of this latter approach is presented in Soltan (2007).
2. First Conjunct Agreement in Standard Arabic2

In SA, FCA is obligatory in VS orders where the subject is a conjoined DP, as shown by the contrast between (1a) and (1b) in the gender inflection on the verb, overtly manifest in the case of feminine gender.

(1) a. ʒaa;a Zayd-un wa Hind-u
       came-3sgmas Zayd-nom and Hind-nom
    b. ʒaa;a-t Hind-u wa Zayd-un
       came-3sgfem Hind-nom and Zayd-nom
Full agreement with the whole conjoined DP is not possible in this context, as the ungrammaticality of the dual morpheme in (2) illustrates:

(2) *ʒaa;-aa Zayd-un wa Hind-u
    came-3dumas Zayd-nom and Hind-nom
But SA also allows a conjoined DP to precede the verb, in which case full agreement, not FCA, is the only possibility, as shown by the grammaticality contrast between (3a) and (3b) below:

(3) a. Zayd-un wa Hind-u ʒaa;-aa
       Zayd-nom and Hind-nom came-3dumas
    b. *Zayd-un wa Hind-u ʒaa;a/ʒaa;a-t
       Zayd-nom and Hind-nom came-3sgmas/came-3sgfem
As is well known, this full-versus-partial agreement pattern associated with word order alternation is not confined to cases where the subject is a conjoined DP. Rather, SA exhibits this SUBJECT-VERB AGREEMENT ASYMMETRY (SVAA, henceforth) with lexical DPs as well: SV orders show full agreement between the preverbal DP and verb in all φ-features (4a), while VS orders show only partial (i.e., gender) agreement (4b).3 No other mix-and-match of agreement pattern and word order is permissible (4c,d):4

(4) a. ;al-;awlaad-u qara;-u ;al-dars-a              SV+full agreement
       the-boys-nom read 3plmas the-lesson-acc
    b. qara;a ;al-;awlaad-u ;al-dars-a               VS+partial agreement
       read 3sgmas the-boys-nom the-lesson-acc
    c. *;al-;awlaad-u qara;a ;al-dars-a              *SV+partial agreement
       the-boys-nom read 3sgmas the-lesson-acc
    d. *qara;-uu ;al-;awlaad-u ;al-dars-a            *VS+full agreement
       read 3plmas the-boys-nom the-lesson-acc

The occurrence of the SVAA with conjoined DPs, as illustrated in (1) and (2), is thus expected to follow from the analysis of the SVAA in general. In Soltan (2006), I propose that the preverbal DP in SV orders does not arrive at its surface position via movement, but is instead base-generated there and linked to a null element pro in the VP-internal subject position (cf. Fassi Fehri (1993) and Demirdache (to appear) for a similar base-generation analysis of SV orders where agreement morphology is treated as an incorporated pronominal). I provide empirical evidence for the correctness of this analysis from the facts of agreement with pronominal subjects as well as from the contrast between VS and SV orders with regard to the semantics of each structure, interaction with wh-extraction, as well as the Case properties of postverbal and preverbal DPs. I present these below.

One relevant fact about subject-verb agreement in SA that has been occasionally mentioned in the relevant literature is the lack of asymmetry in agreement with pronominal subjects, whether these pronominals are null (which is the unmarked case) or overt, and whether these pronominals precede (5a) or follow (5b) the verb. Partial agreement in these contexts is impossible (5c) (EV=epenthetic vowel):5

(5) a. (hum) qara;-uu ;al-dars-a                     SV+full agreement
       they read 3plmas the-lesson-acc
    b. qara;-uu (hum-u) ;al-dars-a                   VS+full agreement
       read 3plmas they-EV the-lesson-acc
    c. *qara;a hum-u ;al-dars-a                      *VS+partial agreement
       read 3sgmas they-EV the-lesson-acc

2 The following abbreviations are used in the glosses of Arabic data. 1, 2, and 3=first, second, and third person, mas=masculine, fem=feminine; sg=singular; du=dual; pl=plural; nom=nominative; acc=accusative; gen=genitive (with genitive used loosely for all nonnominative and nonaccusative cases).

3 Throughout the paper I will use the abbreviations “VS” for constructions with a postverbal DP, and “SV” for constructions with a preverbal DP. As the reader will notice shortly, while the use of “S” for “subject” is uncontroversial for VS orders, this is not necessarily the case with SV orders, where the initial DP has been argued to be a topic, rather than a grammatical subject. I will present evidence below that this is indeed the case.

4 Agreement is “partial” in VS orders because even though the number feature surfacing on the verb is always singular in this context, the verb still shows gender agreement with the postverbal DP. In (4b) such gender agreement is not morphologically manifest, since the masculine agreement morpheme is null in this language. If the postverbal DP is feminine, a gender suffix (the traditionally called femininity marker –t) obligatorily appears on the verb, as the paradigm of data in (i) below illustrates:

(i) a. ;al-fatayat-u qara;-na ;al-dars-a
       the-girls-nom read-3plfem the-lesson-acc
    b. qara;a-t ;al-fatayat-u ;al-dars-a
       read-3sgfem the-girls-nom the-lesson-acc
    c. *qara;a ;al-fatayat-u ;al-dars-a
       read-3sgmas the-girls-nom the-lesson-acc
The same agreement pattern holds with conjoined subjects where the first conjunct is a pronominal: As (6) shows, full agreement in person, number, and gender between the verb and the first conjunct pronominal is obligatory:6 (6) a. b.
ʒi;-tu ;anaa wa came-1sg I and ʒi;-na hunna came-3plfem theyFEM
Hind-u Hind-nom wa ;abaa;-u-hunna and fathers-nom-theirFEM
5 Notice here that since SA is a null subject language, overtness of the pronominal subject is a marked option and is always associated with emphasis/contrastive focus effects. In Soltan (2006), I argue that this overtness of a pronominal be treated as the result of a lexicalization requirement at the interface prohibiting focus/emphasis features from being associated with null elements.
6 Unlike the case with non-conjoined pronominal subjects (cf. fn. 5), overtness of the pronominal conjunct here is obligatory and does not correlate with any emphasis/contrastive focus effects:
(i) a. *ʒi;-tu pro wa Hind-u
       came-1sg and Hind-nom
       “Hind and I came.”
    b. *ʒi;-na pro wa ;abaa;-u-hun
       came-3plfem and fathers-nom-theirFEM
       “They(fem) and their fathers came.”
In Soltan (2006), I propose that overtness of a pronominal conjunct is enforced by an interface condition requiring phonological parallelism of coordinate structures.

Such facts on agreement with pronominal subjects or conjoined subjects whose first conjunct is a pronominal seem to point to the descriptive generalization in (7):
USAMA SOLTAN
(7) Full agreement is always required when the subject is (or includes as a first conjunct) a pronominal, whether that pronominal is overt or null, and whether it occurs in pre- or postverbal position.
On the other hand, there is good empirical evidence that SV orders differ in several ways from their corresponding VS orders in their semantic, syntactic as well as Case properties. Semantically, SV orders have always been traditionally taken to represent TOPIC-COMMENT structures, involving what is sometimes called a “categorical” interpretation, whereby the preverbal DP is interpreted as topic of the discourse against which the event is presented, whereas their corresponding VS orders are assumed to denote the (default/unmarked) “thetic” interpretation, whereby an event is neutrally reported with the participants involved.7 As it turns out, this is supported by the fact that indefinite nonspecific NPs cannot occur preverbally in SA, as the ungrammaticality of (8a) below indicates (cf. Fassi Fehri 1993, Mohammad 2000, Demirdache (to appear)):
(8) a. *walad-un kasara ;al-baab-a
       boy-nom broke 3sgmas the-door-acc
    b. kasara walad-un ;al-baab-a
       broke 3sgmas boy-nom the-door-acc
This topic-like property of preverbal DPs in SV structures suggests that such DPs occupy a left-peripheral position in the sentence, in a way similar to LEFT-DISLOCATED (LD-ed, henceforward) elements, which also function as topics in syntactic structures.8

7 The thetic-categorical distinction is a traditional grammar notion that was first revived within generative grammar in Kuroda (1972). Other research in generative syntax that has made use of this distinction includes Raposo & Uriagereka (1995), Basilico (1998), among others.
8 As already noted above, the analysis presented here has a lot in common with the so-called incorporation analysis of the SVAA, proposed independently by both Fassi Fehri (1993) and Demirdache (to appear), which is also in essence the classical analysis offered by Arabic traditional grammarians. For two convincing arguments against the incorporation analysis, see Benmamoun (2000). For how the analysis presented here differs from the incorporation analysis while escaping Benmamoun’s objections, see Soltan (2007). An alternative analysis of the SVAA in terms of postsyntactic merger between the subject (number feature) and the verb is argued for in Benmamoun (2000), a full discussion of which is beyond the scope of this paper, but see Soltan (2007) for a discussion.
ON AGREE AND POSTCYCLIC MERGE IN SYTACTIC DERIVATIONS
In addition to semantic differences, VS and SV orders differ with regard to their interaction with wh-movement: As Fassi Fehri (1993) points out, while extraction across a postverbal DP is nonproblematic, extraction across preverbal DPs is typically disallowed:9
(9) a. man Daraba Zayd-un
       who hit 3sgmas Zayd-nom
    b. *man Zayd-un Daraba
       who Zayd-nom hit 3sgmas
       “Who did Zayd hit?”
The contrast in wh-extraction between (9a) and (9b) could be explained if the preverbal DP in this language is actually sitting in an A'-position, unlike preverbal DPs in English-like languages, thus blocking wh-movement under standard minimality assumptions.10 Wh-extraction facts thus indicate that the preverbal DP in SV orders is base-generated in its surface position in the sentence, rather than arriving there via movement from within the thematic domain.11
A third piece of empirical evidence for the A'-status of the position of the preverbal DP in SV structures in SA comes from the Case properties of post- and preverbal DPs. Postverbal DPs uniformly appear with nominative case, whereas preverbal DPs appear with nominative case only in the absence of an available Case assigner (e.g., an overt C of the ;inna-type or an Exceptional Case Marking (ECM) verb of the want-type). Consider the following data:
(10) a. qara;a ;al-;awlaad-u ;al-dars-a
        read 3sgmas the-boys-nom the-lesson-acc
     b. ;al-;awlaad-u qara;-uu ;al-dars-a
        the-boys-nom read 3plmas the-lesson-acc
     c. ;inna ;al-;awlaad-a qara;-uu ;al-dars-a
        C the-boys-acc read 3plmas the-lesson-acc
        “(I affirm that) The boys read the lesson.”
(11) a. ;araad-a Zayd-un ;an ya-9hab-a ;al-;awlaad-u
        wanted-3sgmas Zayd-nom C leave-3sgmas the-boys-nom
     b. ;araad-a Zayd-un ;al-;awlaad-a ;an ya-9hab-uu
        wanted-3sgmas Zayd-nom the-boys-acc C leave-3plmas
        “Zayd wanted the boys to leave.”

9 As Elabbas Benmamoun (personal communication) points out, (9b) is acceptable in some of today’s Arabic dialects. Notice, however, that in most of today’s Arabic dialects, including those pointed out by Benmamoun, SV is the unmarked order. In addition, some of these dialects, e.g., Moroccan and Lebanese Arabic, do not exhibit the SVAA, as noted by Aoun, Benmamoun, & Sportiche (1994). If that is the case, an explanation for the absence of intervention effects in wh-questions in these dialects could be the result of SpecTP being an A- rather than A'-position. The parametric difference between SA and those dialects under this proposal lies then in a diachronic change of the status of SpecTP. For an elaborate discussion, see Soltan (2007).
10 That SpecIP may parametrically be an A'-position has been independently argued for by Mahajan (1990) for Hindi and Borer (1996) for Modern Hebrew. See also fn. 9 above.
11 Interestingly, if a resumptive pronoun occurs in object position, hence presumably signaling absence of a movement operation in the structure, the order “Wh DP V” becomes possible, assuming minimality is a condition on movement operations:
(i) man Zayd-un Daraba-hu
    who Zayd-nom hit 3sgmas-him
    “Who did Zayd hit?”
The two sentences in (10a,b) show that both postverbal and preverbal DPs appear with nominative case. What (10c) shows, however, is that this is not always the case with preverbal DPs, since that DP obligatorily surfaces with (what is morphologically identical to) accusative case when preceded by a C of the ;inna-type. Similarly, in ECM constructions of the want-type, the embedded subject will appear with nominative case if it stays in situ (11a). By contrast, if the ECM subject appears preverbally, it will surface with accusative case assigned by the ECM verb (11b).12 These Case facts suggest that the nominative appearing on both preverbal and postverbal DPs is not the same: nominative case assigned to postverbal DPs is structural, whereas nominative case appearing on preverbal DPs is actually the default case typically assigned to topics in this language in the absence of any available lexical or structural Case assigner. That nominative is a default case in SA gains support from the Case properties of copular topic-comment constructions, where no overt verb occurs. In such structures, the so-called topic (and also the predicate if nominal or adjectival) will appear with nominative case:
(12) a. Zayd-un fii ;al-daar-i
        Zayd-nom in the-house-gen
     b. Zayd-un mu'allim-un
        Zayd-nom teacher-nom
     c. Zayd-un sa'iid-un
        Zayd-nom happy-nom

12 The assumption here, as will be clear from the analysis presented in the following sections, is that preverbal ECM subjects like those in (11b) are base-generated in their surface position (perhaps Spec of embedded CP), where they get assigned accusative case. A movement analysis of ECM subjects will face the problem of explaining why the ECM subject needs to move if it can get Case-assigned in situ, as shown by (11a). For that movement analysis to work, a mechanism of Case overriding is needed, such that the nominative case assigned earlier to the ECM subject is then overridden by the accusative case assigned later by the ECM verb. For a discussion of the theoretical and empirical problems encountering a movement analysis of ECM in Standard Arabic, see Soltan (2007).
Summarizing the discussion on the status of preverbal and postverbal DPs in SA, there is good empirical evidence in favor of the following descriptive generalization:
(13) While postverbal DPs are noncontroversially subjects, preverbal DPs exhibit the semantic, syntactic and Case properties typically associated with topics/LD-ed elements.
Given the two descriptive generalizations in (7) and (13), I argue in Soltan (2006) that the asymmetry in agreement properties between preverbal and postverbal DPs is due to a structural difference between the two word orders, such that VS and SV sentences are assigned the following structures, respectively:
(14) VS: [TP T+[v*+V] [v*P DP tv* [VP tV YP]]]
(15) SV: [TP DP T+[v*+V] [v*P pro tv* [VP tV YP]]]
In VS structures the postverbal DP remains inside the VP, where it is still accessible for agreement with T in a manner yet to be made precise. In the SV orders, the preverbal DP is base-generated in SpecTP, arguably an A'-position in this language, whereas the VP-internal subject position is occupied by a null element pro that is associated with the preverbal DP, in the same fashion LD-ed elements are linked to a resumptive pronoun in the thematic domain. The same analysis should straightforwardly extend to cases where the subject is a conjoined DP, thereby accounting for the agreement asymmetries noted earlier with regard to the sentences in (1-3).
To conclude this section, lack of asymmetry of subject-verb agreement with (typically null) pronominal subjects as well as the A'-properties associated with preverbal DPs, whether conjoined or nonconjoined, point in the direction of an analysis of the SVAA not in terms of movement and Spec-head agreement as some of the earlier analyses have proposed (see, for example, Mohammad 1990, 2000; Aoun et al. 1994), but rather in terms of base-generation of preverbal DPs in their surface position.13 Before I present the base-generation analysis in detail, however, in the next section I discuss an analysis of FCA in terms of Spec-head agreement, showing how it is empirically inadequate, hence the need for an alternative approach to FCA.

3. A Spec-Head Agreement Approach to FCA
Aoun, Benmamoun & Sportiche (1994) propose an analysis of FCA in terms of Spec-head agreement. According to them, FCA is only “superficial”: cases of FCA, they argue, are actually derived through applying COORDINATION REDUCTION (CR) to an underlying clausal coordination structure, such that the Moroccan Arabic sentence in (16) is derived as in (17):
(16) n'as Kariim w Marwan fǝ-l-biit
     slept.3sg Kareem and Marwan in-the-room
(17) Derivation: Across-the-board verb raising + Right Node Raising
     [n'asj [IP Kariim … ti …]] w [ej [IP Marwan … ti …]] [fǝ-l-biit]i
If conjunction is in fact clausal in FCA contexts, then we should expect the [DP and DP] string to fail semantic plurality tests, which, Aoun et al. argue, is true in both Lebanese Arabic (LA) and Moroccan Arabic (MA). I illustrate here by citing their LA examples:
(18) a. Kariim w Marwan Raa˙o sawa (LA)
        Kareem and Marwan left.pl together
     b. *raa˙ Kariim w Marwan sawa
        left.3sg Kareem and Marwan together
     c. raa˙o Kariim w Marwan sawa
        left.pl Kareem and Marwan together
(19) a. Kariim w Marwan bi˙ibbo ˙aalun/ba'dun (LA)
        Kareem and Marwan love.pl themselves/each other
     b. *bi˙ibb Kariim w Marwan ˙aalun/ba'dun
        love.sg Kareem and Marwan themselves/each other
     c. bi˙ibbo Kariim w Marwan ˙aalun/ba'dun
        love.pl Kareem and Marwan themselves/each other
(20) a. *lta;a Kariim w Marwan (LA)
        met.3sg Kareem and Marwan
     b. lta;o Kariim w Marwan
        met.3pl Kareem and Marwan

13 As mentioned earlier, this is precisely the analysis of SV structures in Arabic traditional grammar. In the generative literature, the same analysis was proposed in Demirdache (to appear) as well as in Fassi Fehri (1993). The analysis that I will offer in the next section will share the underlying idea of the analyses in these works, but it will differ in details. See Soltan (2007) for an elaborate discussion.
As the data in (18-20) show, occurrence of FCA is incompatible with the presence of an element that inherently denotes semantic plurality: the adverbial sawa (=together) in (18), plural reflexives and reciprocals in (19), as well as functioning as subject of intransitive “meet” (20). Under Aoun et al.’s analysis, the explanation is simple: semantic plurality items cannot be licensed in FCA contexts for the simple reason that the surface string [DP and DP] is never a phrasal constituent at any point during the derivation; rather, it is the result of applying CR to a clausal coordination structure.14
Assuming that Aoun et al.’s tests of semantic plurality are reliable diagnostics for the plurality of a string of the form [DP and DP] (but see fn. 14), their analysis still cannot be maintained for FCA in other languages where conjoined subjects in VS structures pass all these tests of semantic plurality. One such language is the closely related language of SA, where the adverbial ma'an (=together), the reciprocal ba'D-ahum ;al-ba'D (=each other), as well as the occurrence as subject of intransitive ;iltaqa (=meet), are all possible in FCA contexts (cf. Harbert & Bahloul 2002):
(21) a. ʒaa;a-t Hind-u wa Zayd-un ma'an
        came-3sgfem Hind-nom and Zayd-nom together
        “Hind and Zayd came.”
     b. tu˙ibbu Hind-u wa ;axaw-aa-haa ba'D-a-hum ;al-ba'D
        love.sgfem Hind-nom and brothers-nom-her some-acc-them the-some
        “Hind and her two brothers love each other.”
     c. ;iltaqa-t Hind-u wa ;axaw-aa-haa fii ;al-˙afl-i
        met.3sgfem Hind-nom and brothers-nom-her at the-party-gen
        “Hind and her two brothers met at the party.”

14 Munn (1999) raises serious doubts on the adequacy of the tests that Aoun et al. use in support of their analysis, to which Aoun, Benmamoun & Sportiche (1999) reply. For considerations of space, I will not discuss these here, referring the reader to these sources for an extensive discussion.
Harbert & Bahloul (2002:60) point out that the same is also true of Welsh, where occurrence of reciprocals (22a), functioning as subject of intransitive “meet” (22b), as well as the use of the inherently dual preposition “between” (23a,b), are all compatible with FCA:
(22) a. Es i a’m brawd gyda ein gilydd
        went.1sg I and-my-brother with each other
     b. Cwrddais i a’m brawd ym Mharis
        met.1sg I and-my-brother in Paris
(23) a. cynnen rhyngof fi a thi
        strife between.1sg me and you
     b. cwlwm o gariad sydd rhyngoch chwi a hi
        bond of love which-is between.2pl you and her
Similarly, Johannessen (1996) provides examples from Czech where FCA does occur in the presence of semantic plurality items such as the so-called “strong and” i (=both), and distributive “each”, as illustrated by the examples in (24a,b), respectively:
(24) a. Půjdu tam já i ty
        will-go.1sg there I.nom and you.nom.2SG
        “Both of you and I will go there.”
     b. Po jednom jablku snědl Jan a Petr
        at-the-rate-of one.loc apple-loc ate.3sg John and Peter
        “John and Peter ate an apple each.”
To conclude, even if a CR analysis of FCA constructions in MA and LA were feasible, there is overwhelming evidence that FCA constructions in SA, Welsh, and Czech cannot be derived from an underlying clausal conjunction structure, therefore casting doubts on the adequacy of the Spec-head approach to FCA, hence the need for an alternative analysis that follows from general mechanisms that are independently needed in the theory of grammar. This is the topic of the next section.

4. The SVAA Revisited: A Base-Generation Analysis
Recall from section 2 that there are two main agreement facts for sentences with conjoined DPs that we are trying to account for: First, that agreement on the verb is full (i.e., in all φ-features) if the conjoined DP is in preverbal position, but partial (i.e., restricted to gender only) if the conjoined DP is in postverbal position. Second, agreement with a postverbal conjoined subject is with the first conjunct only, not with the whole DP or with the last conjunct. As argued in the previous section, to account for the asymmetry in agreement between SV and VS orders, I will assume that VS and SV orders differ structurally along the lines in (25) and (26):
(25) VS: [TP T+[v*+V] [v*P DP tv* [VP tV YP]]]
(26) SV: [TP DP T+[v*+V] [v*P pro tv* [VP tV YP]]]
In Soltan (2006, 2007), I argue that, given the structural distinction between (25) and (26), a natural solution for the SVAA arises: full agreement obtains in the SV orders because of the presence of a pronominal subject, which is in essence the generalization in (7). Partial agreement in the VS order could be viewed then as the result of a default agreement morpheme on T(ense) in this language.15 Still, this does not explain why full agreement is obligatory when the subject is pronominal, but not so when the subject is a lexical DP. An answer to this question is readily available from one of the standard assumptions of pro theory: the so-called PRO IDENTIFICATION REQUIREMENT (cf. Rizzi 1982, McCloskey 1986), which can be reformulated as an interface condition (perhaps holding at PF):16
(27) A null element pro has to be identified at the interface, where identification is established by a head with a complete φ-complex associated with pro.17

15 In Soltan (2006), I assume that gender agreement is due to the presence of a CLASS feature on T that is not part of the φ-complex. See also Ouhalla (2005) for a similar proposal. I will get back to this later on in this paper.
16 The occurrence of pro should also be subject to another interface condition of interpretability such that pro has to be interpretable, reasonably enough an LF condition. Interpretation of pro is achieved through coreference with an antecedent in the sentence or in the discourse.
Given (27), the presence of full agreement in SV orders comes down to an interface requirement on the structure in (26): agreement has to be full or pro will not be identified. Since lexical DPs are not subject to an identification requirement, full agreement is not required for interface convergence; default agreement is therefore allowed.18
In sum, SV orders in SA differ from VS orders in that the former contain a pro subject in the VP-internal subject position, associated with a preverbal DP, in the same way an LD-ed DP is related to a resumptive pronoun. Since pro is subject to an identification requirement, full agreement is always manifest to allow the derivation to converge at the interface. Lexical DPs, by contrast, need not be identified; hence, the occurrence of either default agreement (as in SA) or full agreement (as in MA/LA) is possible in VS orders. If this analysis is correct, then the surface SVAA in SA can be explained in terms of the conditions imposed by the interface systems on structural representations, a result that seems in conformity with the strong minimalist thesis that language design is such that it satisfies bare output conditions. It remains, however, to see if this informal analysis can be cast within a minimalist framework. I turn to this next.

17 I’m ignoring here pro-drop languages of the Chinese-type, where agreement morphology is null, hence cannot serve as an identifier for pro. In such languages, pro identification has to proceed in a different fashion. I do not have anything to contribute to the discussion of pro licensing in such languages at the moment.
18 As noted earlier (see fns. 5 and 6), overtness of a pronominal subject will be forced by interface conditions, such as the requirement that emphasis/focus features be represented on a phonologically overt element, and the requirement that coordinate structures be parallel in their phonological content.

5. Deriving FCA: AGREE and Postcyclic Merge
5.1 Theoretical assumptions
To provide an account of FCA, I will assume, following Chomsky (2000, 2001a, 2001b), that agreement is induced in syntactic structures through the application of a primitive grammatical operation AGREE, specifically designed for that purpose. More precisely, AGREE is a head-head relation that takes place at a distance (rather than in a Spec-head configuration) within a local search domain:
(28) [Diagram: AGREE between a PROBE α and a GOAL β in the domain of α]
As diagrammed in (28), AGREE is an operation that establishes a relationship between an element α (call it a PROBE) with uninterpretable features and an element β (call it a GOAL) with matching interpretable features in the domain of α, whereby the uninterpretable features on the PROBE are valued by the matching interpretable features on the GOAL. Typical examples of uninterpretable features are φ-features or wh-features on functional heads, or Case on nominals. Long distance agreement is attested in natural language grammar, as in English expletive constructions, for example:
(29) [There T seem [to be two men in the room]]
     (AGREE: T … two men)
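The probe-goal logic just described can be made concrete with a small sketch. This is only an illustration under simplifying assumptions that are not part of the paper: the `Node` class, the function names, and the use of breadth-first search as a stand-in for "closest goal in the search domain" are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A syntactic object: a label, interpretable features, and daughters."""
    label: str
    features: dict = field(default_factory=dict)  # e.g. {"number": "pl"}
    children: list = field(default_factory=list)


def find_goal(domain: Node, wanted: set) -> Optional[Node]:
    """Return the first node, searching breadth-first, that bears all wanted features.

    Breadth-first order is an assumed proxy for structural closeness.
    """
    queue = [domain]
    while queue:
        node = queue.pop(0)
        if wanted <= set(node.features):
            return node
        queue.extend(node.children)
    return None


def agree(probe_unvalued: set, domain: Node) -> dict:
    """Value the probe's uninterpretable features from the closest matching goal."""
    goal = find_goal(domain, probe_unvalued)
    if goal is None:
        return {}  # no goal found: the features stay unvalued
    return {name: goal.features[name] for name in probe_unvalued}


# Rough analogue of (29): T probes into its complement and finds "two men".
two_men = Node("DP two men", {"person": 3, "number": "pl"})
complement_of_T = Node("VP seem", children=[Node("TP to be", children=[two_men])])
valued_on_T = agree({"person", "number"}, complement_of_T)
```

The point of the sketch is only that valuation happens at a distance: the goal is never moved to the specifier of the probe.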
In addition to AGREE, I will adopt the following assumptions with regard to the structural and morphological properties of conjoined DPs (notated as #DP#, henceforward). First, conjoined phrases are hierarchically organized (Munn 1992, Kayne 1994), though I choose here to follow Munn (1993, 1999) in assuming that the hierarchical organization within a conjoined phrase is actually the result of adjunction. More precisely, the conjunction head plus its DP2 complement form an adjunct of DP1, as shown in (30) below:
(30) [#DP# DP1 [ConjP Conj DP2]]
Second, adjuncts can be introduced into the derivation “noncyclically”, via an operation of late-Merge, an idea first suggested in Lebeaux (1988), and implemented in different ways in Chomsky (1993), Fox & Nissenbaum (1999), and Uriagereka (2002). Postcyclic Merge has typically been proposed to account for certain LF effects (e.g., binding) that cannot be accounted for under a strictly cyclic derivation. Consider the examples below, for instance:
(31) Which picture [COMPLEMENT of Bill_i] [ADJUNCT that John_j liked] did he_*i/j buy?
(32) a. Which claim [COMPLEMENT that John_i was asleep] was he_*i willing to discuss?
     b. Which claim [ADJUNCT that John_i made] was he_i willing to discuss?
In (31), while coreference between Bill and he is disallowed, coreference between John and he is possible, even though both DPs c-command the pronominal, in violation of BINDING CONDITION C. A postcyclic approach to adjuncts is able to solve that problem, however, if at the point where binding conditions are evaluated the adjunct relative clause has not been Merged yet. The same proposal can also account for the asymmetry in binding possibilities between (32a) and (32b): Binding of he by John in (32a) violates Condition C; binding of he by John in (32b) is possible since the binder DP is contained within an adjunct clause that can be inserted postcyclically, thereby allowing the apparent violation of Condition C. In this paper I would like to argue that postcyclic Merge may also have comparable effects at the PF level. In particular, FCA is argued to be the result of postcyclic Merge interacting with the operation AGREE in the course of the derivation.
A third assumption with regard to conjoined phrases is that the φ-features of the root node #DP# are determined via the application of the so-called FEATURE RESOLUTION RULES (FRRs), e.g., first person+second person=first person; singular+dual=plural; masculine+feminine=masculine; etc. (cf. Corbett 1983, 2000 for an extensive discussion).
Finally, consider the inventory of uninterpretable features on T. These should include φ-features for the traditional Person and Number features, which may also happen to have DEFAULT values. Assume a separate CLASS feature, familiar from languages with rich classifier systems, which is morphologically manifest as a Gender feature in many languages. If Gender is not part of the φ-complex on T, then it should be able to probe separately for the purposes of AGREE (see Ouhalla 2005). Furthermore, T may appear with an EPP feature, understood here as the requirement to be “an occurrence of something,” where an occurrence of α is a sister of α (Chomsky 2001b). In principle, then, T can appear with φ, CLASS, EPP, or any combination of these three, subject to lexical parameterization.

5.2 Deriving full agreement with preverbal conjoined DPs in SV structures19
For simplicity of presentation, suppose that our target Arabic SV structure is “John and Mary read the book” with full agreement surfacing on the verb “read.” Given the theoretical assumptions made earlier in this section and the empirical evidence discussed in section 3, the structure of this sentence is as in (33) below, where #DP# is the conjoined phrase “John and Mary”:
(33) [CP C [TP #DP# Tφ/CLASS/EPP [v*P pro v* [VP …]]]]
     (AGREE: T … pro)
At the interface, since pro is identified by the agreement features on T, the derivation converges. The impossibility of partial/default agreement is ruled out by the interface condition on pro identification in (27), whereas the impossibility of FCA follows simply from the fact that the first conjunct (or the whole conjoined phrase for that matter), being base-generated in SpecTP, is never in the search domain of T.
19 Assume verb raising to v* and T throughout, perhaps an operation of the phonological component driven by the affixal properties of functional heads (cf. the structures in (25) and (26) in section 4). For simplicity of presentation, I will not show this in the structural representations in this section.
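The role of the identification condition in (27) in excluding partial agreement in SV orders can be sketched as a simple convergence check. This is a hedged illustration only: the names `PHI_COMPLEX` and `converges`, the encoding of subjects as strings, and the decision to treat the φ-complex as exactly person and number (with CLASS kept separate, as assumed in section 5.1) are choices made for the example.

```python
# Assumed content of the φ-complex on T; CLASS (gender) is kept outside it.
PHI_COMPLEX = frozenset({"person", "number"})


def converges(subject: str, valued_on_t: set) -> bool:
    """Interface check modeled on (27).

    A pro subject is identified only if T carries a complete φ-complex;
    a lexical DP is not subject to identification.
    """
    if subject == "pro":
        return PHI_COMPLEX <= set(valued_on_t)
    return True


# SV order: the thematic subject is pro, so full agreement is required.
sv_full = converges("pro", {"person", "number", "class"})
sv_partial = converges("pro", {"class"})
# VS order: the thematic subject is a lexical DP, so default agreement is tolerated.
vs_partial = converges("lexical DP", {"class"})
```

Under these assumptions only the second configuration fails, which mirrors the claim that partial agreement is excluded exactly where pro is present.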
Notice that under this analysis we can now account for the set of semantic, syntactic, and Case properties associated with SV orders (cf. section 2). First, indefinite nonspecific NPs cannot be associated with pro, which is inherently a D head, hence their incompatibility with occurrence in preverbal position. Second, if SpecTP is parametrically an A'-position, wh-extraction across a DP in SpecTP is then blocked by familiar minimality considerations. Wh-extraction across a DP in Specv*P is permissible, though. Third, a DP in postverbal position will always be assigned nominative case under AGREE with T. By contrast, a DP in preverbal position will be assigned default nominative case, unless a lexical or structural Case-assigner is available in the structure, e.g., an overt C or an ECM verb, as schematically shown in (34) below:
(34) a. [CP ;inna [TP #DP# Tφ/CLASS/EPP [v*P pro v* [VP … ]]]]
        (Case: ;inna … #DP#)
     b. [VP VECM [CP #DP# [TP Tφ/CLASS/EPP [v*P pro v* [VP … ]]]]]
        (Case: VECM … #DP#)
5.3 Deriving FCA: The option of AGREE prior to late adjunction
Consider now FCA in the VS order. Here our target structure is “Read Mary and John the book,” with the verb showing feminine gender agreement with the first conjunct Mary. If we followed the same assumptions as in the derivation of structures with preverbal conjoined subjects in the previous section, we should predict full, not first conjunct, agreement to obtain between T and postverbal #DP# subject, as shown in (35) below:
(35) [TP TDEFAULT/CLASS [v*P [#DP# DP1 [ConjP Conj DP2]] v* [VP V…]]]
     (AGREE: T … #DP#; *AGREE: T … DP1)
What (35) shows is that at the point when T probes, it is the conjoined #DP# that is available as a GOAL for feature valuation. The first conjunct DP1 is now “buried” within a substructure whose internal elements are, by assumption, inaccessible for further syntactic operations. While the derivation in (35) is still needed since occurrence of full agreement in such contexts is attested in natural languages, as Aoun et al. argue is the case in Lebanese and Moroccan Arabic (data will be provided shortly), still a problem arises with regard to languages such as SA, where FCA is the only option in such contexts. I would like to argue here that it is in languages like SA that the option of postcyclic Merge of adjuncts is available for adjunct ConjPs. Specifically, FCA may now be understood as the result of allowing AGREE to take place with the VP-internal subject prior to the late adjunction of ConjP to that subject. More concretely, in the derivation of the sentence “Read Mary and John the book,” there is a point at which we construct the following v*P:
(36) [v*P Mary v* [VP V …]]
Suppose we then Merge T, thereby inducing a subsequent AGREE relationship between T and the DP Mary in the v*P-internal subject position:
(37) [TP T [v*P Mary v* [VP V …]]]
     (AGREE: T … Mary)
Postcyclically, we can then late-Merge the adjunct ConjP “and John” to the DP Mary, at which point FRRs apply to compute the φ and CLASS features of the conjoined DP, thereby licensing elements denoting semantic plurality (e.g., plural reflexives, reciprocals, “both,” “each,” etc.):
(38) [TP T [v*P [#DP# Mary [ConjP and John]] v* [VP V …]]]
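The ordering of operations in (36)-(38) can be summarized in a short sketch. It is a toy model under stated assumptions, not an implementation of the paper's system: the functions `resolve` and `derive_vs` and the dictionary encoding of features are hypothetical, only the three resolution rules quoted in section 5.1 come from the text, and the treatment of singular+singular as dual is an added assumption. The sketch also abstracts away from the default number value on T in SA.

```python
def resolve(a: dict, b: dict) -> dict:
    """Toy feature resolution rules (FRRs) for a conjoined #DP#."""
    person = min(a["person"], b["person"])  # 1st + 2nd = 1st
    # singular + dual = plural (from the text); sg + sg = dual is an assumption
    number = "dual" if (a["number"], b["number"]) == ("sg", "sg") else "pl"
    # masculine + feminine = masculine
    gender = "fem" if a["gender"] == b["gender"] == "fem" else "masc"
    return {"person": person, "number": number, "gender": gender}


def derive_vs(dp1: dict, dp2: dict, late_merge: bool):
    """Return (agreement valued on T, features of the conjoined #DP#).

    With late_merge=True, T probes while only DP1 sits in Spec,v*P, as in
    (36)-(37), and ConjP is adjoined afterwards, as in (38).
    With late_merge=False, ConjP is already present when T probes, as in (35).
    """
    goal = dict(dp1) if late_merge else resolve(dp1, dp2)
    agreement_on_t = dict(goal)       # AGREE copies the goal's values onto T
    conjoined = resolve(dp1, dp2)     # FRRs apply once ConjP has been merged
    return agreement_on_t, conjoined


mary = {"person": 3, "number": "sg", "gender": "fem"}
john = {"person": 3, "number": "sg", "gender": "masc"}
fca_agreement, fca_subject = derive_vs(mary, john, late_merge=True)
full_agreement, full_subject = derive_vs(mary, john, late_merge=False)
```

In the late-Merge derivation the agreement on T matches the first conjunct while the conjoined subject still ends up non-singular, which is the combination needed to license plurality-denoting elements alongside FCA.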
FCA is thus the result of agreement taking place prior to the introduction of the adjunct ConjP by late-Merge. While the above analysis can account for FCA in SA and similar languages, questions arise as to how to make sure that using the option of postcyclic Merge will not lead to overgeneration of ungrammatical structures in natural languages. In this respect, I discuss three such cases of potential overgeneration below, arguing that they are either ruled out by independently needed principles of the grammar, or are actually attested in natural languages, hence providing further support to the analysis presented here.
First, consider the case where we Merge the first conjunct in Specv*P, allow T to AGREE with it, then late-Merge ConjP, and then move the whole conjoined subject #DP# to SpecTP to license EPP, thereby deriving the bad sentence in (39) where FCA obtains in an SV structure, claimed to be unattested in human languages (Corbett 2000):
(39) *John and I loves each other.
Notice, however, that this derivation is ruled out by a basic assumption of AGREE-based syntax: “Move is dependent on AGREE” (Chomsky 2001a, 2001b). Since T never Agrees with #DP#, movement of that #DP# is not permitted.
A second case of potential overgeneration may occur if we Merge the first conjunct in Specv*P, allow T to AGREE with it, then late-Merge ConjP, and then move the AGREED-with first conjunct to SpecTP to license EPP, thereby deriving the ill-formed structure in (40) below in a language like English:
(40) *John has [t and I] met each other.
But this derivation is obviously ruled out by the COORDINATE STRUCTURE CONSTRAINT (CSC). Notice, however, that the analysis presented here makes an interesting prediction in cases such as (40): Suppose that the EPP feature on T in this case can be satisfied in some other way than moving the AGREED-with DP, say by an expletive in existential constructions; then we should predict that FCA becomes possible, since no potential violation of the CSC arises in this case, a prediction that is borne out by the grammaticality of FCA structures such as (41) below from English:
(41) There is a man and two women in the room.
Finally, notice that if late-Merge of adjuncts is an option, we should be able to “early-Merge” ConjP as well, thereby predicting full agreement rather than FCA to obtain in some languages. As noted earlier, this is true in some of today’s dialects of Arabic, as reported by Aoun et al. (1994) for LA and MA:
(42) a. raa˙o Kariim w Marwan sawa (LA)
        left.pl Kareem and Marwan together
        “Kariim and Marwan left together.”
     b. bi˙ibbo Kariim w Marwan ˙aalun/ba'dun
        love.pl Kareem and Marwan themselves/each other
        “Kariim and Marwan love themselves/each other.”
Complementizer agreement in Dutch and German also shows a similar range of possibilities: FCA only in Tegelen Dutch (43); full agreement only in Lapscheure Dutch (44); both options in Bavarian German (45) (data from von Koppen 2005):
(43) de-s doow en ich ôs treff-e
     that-2sg you and I each other meet pl
     “… that you and I could meet.”
(44) Kpeinzen da-n [Valère en Pol] morgen goa-n
     I.think that-3pl Valère and Pol tomorrow go-pl
     “I think that Valère and Pol will go tomorrow.”
(45) a. daß-sd du und d’Maria an Hauptpreis gwunna hab-ds
        that-2sg yousg and the-Maria the first-prize won have-2pl
     b. daß-ds du und d’Maria an Hauptpreis gwunna hab-ds
        that-2pl yousg and the-Maria the first-prize won have-2pl
        “… that Maria and you have won the first prize.”
To summarize the analysis presented here, FCA arises from the interaction between two independently needed mechanisms of the grammar: AGREE and late-Merge of adjuncts. Since AGREE, by definition, is a “downward” operation, it follows that FCA can only obtain with arguments in postverbal position, a robust fact across human languages (cf. Corbett 2000). A Spec-head approach to agreement, however, cannot provide an analysis for these “downward” and “postverbal” properties of FCA without extra stipulations.
6. Conclusions
The goal of this paper has been to revisit the classical phenomenon of FCA in SA from a minimalist perspective. I have argued that full agreement with preverbal conjoined subjects is in fact the result of T AGREEING with a null subject pro in the VP-internal subject position, necessarily required to be full by the interface condition on pro identification. By contrast, FCA is argued to be the result of AGREE
between T and the first conjunct in the thematic domain prior to the application of postcyclic Merge which adjoins ConjP to that first conjunct to form a conjoined subject. If correct, the analysis presented in this paper lends further support to a theory of agreement in terms of an AGREE relation rather than in a Spec-head configuration. In addition, it also provides evidence that late-Merge of adjuncts not only has consequences at the LF interface, but at the PF interface as well.
REFERENCES
Aoun, Joseph, Elabbas Benmamoun & Dominique Sportiche. 1994. “Agreement, Word Order, and Conjunction in Some Varieties of Arabic.” Linguistic Inquiry 25.195-220.
Benmamoun, Elabbas. 2000. The Feature Structure of Functional Categories: A comparative study of Arabic dialects. Oxford: Oxford University Press.
Basilico, David. 1998. “Object Position and Predication Forms.” Natural Language and Linguistic Theory 16:3.541-595.
Chomsky, Noam. 1993. “A Minimalist Program for Linguistic Theory.” The View from Building 20: Essays in honor of Sylvain Bromberger ed. by Kenneth Hale & Samuel Jay Keyser, 1-52. Cambridge, Mass.: MIT Press.
_____. 1995. The Minimalist Program. Cambridge, Mass.: MIT Press.
_____. 2000. “Minimalist Inquiries: The framework.” Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik ed. by Roger Martin, David Michaels & Juan Uriagereka, 89-156. Cambridge, Mass.: MIT Press.
_____. 2001a. “Derivation by Phase.” Ken Hale: A life in language ed. by Michael Kenstowicz, 1-52. Cambridge, Mass.: MIT Press.
_____. 2001b. “Beyond Explanatory Adequacy.” MIT Occasional Papers in Linguistics 20, 1-28. MIT.
Corbett, Greville G. 1983. “Resolution Rules: Agreement in person, number and gender.” Order, Concord and Constituency ed. by Gerald Gazdar & Geoffrey K. Pullum, 175-214. Dordrecht: Foris.
_____. 2000. Number. Cambridge: Cambridge University Press.
Demirdache, Hamida. To appear. “Nominative Subjects in Arabic.” International Journal of Basque Linguistics and Philology. Bilbao/Donostia-San Sebastian.
Fassi Fehri, Abdelkader. 1993. Issues in the Structure of Arabic Clauses and Words. Dordrecht: Kluwer.
Fox, Danny & Jon Nissenbaum. 1999. “Extraposition and Scope: A case for overt QR.” Proceedings of the 18th West Coast Conference on Formal Linguistics ed. by Sonya Bird, Andrew Carnie, Jason D. Haugen & Peter Norquest, 132-144.
Harbert, Wayne & Maher Bahloul. 2002. “Postverbal Subjects in Arabic and the Theory of Agreement.” Themes in Arabic and Hebrew Syntax ed. by Jamal Ouhalla & Ur Shlonsky, 45-70. Dordrecht: Kluwer Academic Publishers.
Hornstein, Norbert. 2005. “What Do Labels Do? Some thoughts on the endocentric roots of recursion and movement.” Ms., University of Maryland, College Park.
Johannessen, Janne Bondi. 1996. “Partial Agreement and Coordination.” Linguistic Inquiry 27.661-676.
Kayne, Richard. 1994. The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Kuroda, Yuki. 1972. “The Categorical and Thetic Judgment: Evidence from Japanese syntax.” Foundations of Language 9.153-85.
Lebeaux, David. 1988. Language Acquisition and the Form of the Grammar. Ph.D. dissertation, University of Massachusetts, Amherst.
McCloskey, James. 1986. “Inflection and Conjunction in Modern Irish.” Natural Language and Linguistic Theory 4.245-281.
Mohammad, Mohammad. 1990. “The Problem of Subject-Verb Agreement in Arabic: Towards a solution.” Perspectives in Arabic Linguistics I ed. by Mushira Eid, 95-125. Amsterdam: John Benjamins.
_____. 2000. Word Order, Agreement and Pronominalization in Standard and Palestinian Arabic. Amsterdam: John Benjamins.
Munn, Alan. 1992. “A Null Operator Analysis of ATB Gaps.” The Linguistic Review 9.1-26.
_____. 1993. Topics in the Syntax and Semantics of Coordinate Structures. Ph.D. dissertation, University of Maryland, College Park.
_____. 1999. “First Conjunct Agreement: Against a clausal analysis.” Linguistic Inquiry 30.643-686.
Ouhalla, Jamal. 2005. “Agreement Features, Agreement and Antiagreement.” Natural Language and Linguistic Theory 23.655-686.
Raposo, Eduardo & Juan Uriagereka. 1995. “Two Types of Small Clauses: Toward a syntax of theme/rheme relations.” Syntax and Semantics 28: Small Clauses ed. by A. Cardinaletti & M. T. Guasti, 179-206. New York: Academic Press.
Rizzi, Luigi. 1982. Issues in Italian Syntax. Dordrecht: Foris.
Soltan, Usama. 2006. In press. “Standard Arabic Agreement Asymmetry Revisited in an Agree-based Minimalist Syntax.” Agreement Systems ed. by Cedric Boeckx. Amsterdam: John Benjamins.
_____. 2007. On Formal Feature Licensing in Minimalism: Aspects of Standard Arabic morphosyntax. Ph.D. dissertation, University of Maryland, College Park.
Uriagereka, Juan. 2002. “Pure Adjuncts.” Ms., University of Maryland, College Park.
von Koppen, Marjo. 2005. One Probe—Two Goals: Aspects of agreement in Dutch dialects. Ph.D. dissertation, Leiden University. LOT-Publications Number 105.
Section III
Sociolinguistics and Second Language Acquisition
NULL SUBJECTS USE BY ENGLISH AND SPANISH LEARNERS OF ARABIC AS AN L2¹
Mohammad T. Alhawary The University of Oklahoma
1. Introduction
Within the Principles and Parameters framework, the Pro-drop or Null Subject Parameter (NSP) has been among the most extensively investigated parameters in both first (L1) and second (L2) language acquisition, with respect both to parameter setting and to the cluster of properties associated with the NSP. Most accounts have been based on the contingent relationship held between rich (overt) verbal inflections and the null subject phenomenon (as in null subject languages such as Spanish, as opposed to English). The phenomenon is exemplified in (1)-(3) below, where (1), (2) and (3a) are grammatical but (3b) is not.
(1) ʔakal-naa (Arabic)
ate-1P
“We ate.”
(2) Com-ímos (Spanish)
ate-1P
“We ate.”
(3) a. We ate (English)
b. *Ate
¹ I would like to thank all the students who participated in the study and Ignacio Gutiérrez de Terán and Waleed Saleh Alkhalifa for their help in recruiting the Spanish participants. The study was supported by funding from the College of Arts and Sciences and the School of International Studies, University of Oklahoma.
Various analyses posit that the null subject position (in finite clauses) is occupied by a non-phonetically realized pronominal, small pro. The standard pre-minimalist analysis hinges on two basic assumptions, following Rizzi (1986): licensing and identification of pro, as illustrated in (4) below. Licensing of pro refers to the grammatical property that allows null subjects to occur in null subject languages as opposed to non-null subject languages. Presence of rich agreement morphology is generally but loosely assumed to be the licensing condition. Thus, in sentence (2) Spanish INFL is assumed to be morphologically rich enough (vs. English) to formally license pro (by assigning case to it) within a Spec-Head relation. Identification refers to recoverability of the content of pro, usually from the inflectional features attached to the verb. Pro is formally identified by being coindexed with its head (INFL) and consequently sharing the AGR feature values of INFL.
(4) [IP [Spec pro_i [1PL]] [I' [I_i [1PL]] [VP [Spec] [V' [V com-ímos]]]]]
To account for the distribution of null subjects in languages (such as Chinese and Japanese) that exhibit no morphological agreement and yet allow null subjects, pro is assumed to be identified (pragmatically) in the discourse within a topic chain, via availability of the closest antecedent in the discourse.²
² Cf. Jaeggli & Hyams (1988) and Jaeggli & Safir (1989), who propose the Morphological Uniformity Principle: null subjects are permissible only in languages that exhibit a morphologically uniform inflectional Paradigm P, where P in a Language L is morphologically uniform iff P has either only underived inflectional forms or only derived inflectional forms (1989:29-30). Accordingly, languages that exhibit an impoverished or mixed paradigm do not permit null subjects. It has been suggested that some languages (Persian and Wichita) have an impoverished agreement paradigm but allow null subjects (O’Grady 1997). There are also languages that have rich agreement (German and Icelandic) but do not allow null subjects.
In what follows, I summarize the main cross-linguistic evidence and proposals with respect to the NSP in L1 and L2 acquisition and report on a study of the interlanguage (IL) use of null subjects by adult English and Spanish speakers learning Arabic as an L2.
2. Null Subjects in L1 Acquisition
The aspect that has probably received the most attention in the literature is the setting of the NSP. Chomsky’s (1981) earlier (Government and Binding) assumption is that the parameter is initially unmarked and that the child will eventually set the parameter to [+drop]/[+null] or [-drop]/[-null] depending on the grammar of the L1. Additionally, the parameter is assumed to be associated with a cluster of properties (including null subjects, subject-verb inversion and the that-trace effect); the setting of any of these is assumed to trigger the automatic setting of the rest of the structures. The initial parameter setting and the clustering issue generated much debate in both the L1 and L2 acquisition literature, as we shall see below. Rather than being unmarked à la Chomsky (1981), on Rizzi’s (1986) account the parameter is posited to be initially set at a [+drop] value in L1, and the child would have to reset the value of the parameter according to that of the adult L1 grammar (see Hyams 1986, 1987). L1 acquisition data from both null subject and non-null subject languages showing prevalent production of null subject contexts would seem to be accounted for accordingly. Thus, in the L1 grammar of children acquiring null subject languages, pro would be initially licensed through INFL and identified through rich inflectional features.
On the other hand, in the L1 grammar of children acquiring non-null subject languages, such as English, pro would be initially licensed through INFL and identified through a discourse topic chain until the child figures out the impoverished inflectional paradigm, at which point pro would be blocked (e.g., Hyams 1994, Jaeggli & Hyams 1988). However, such a proposal encounters two problems: 1) the finding that null subjects are more prevalent in the L1 acquisition of Italian (a null subject language) than in the L1 acquisition of English and 2) the finding that more null subjects occur in child English with non-finite verbs than with finite verbs. Wexler (1994) proposes that instances of null subjects in non-null subject languages may be instances of big PRO rather than small pro.
The occurrence of PRO, according to Wexler, is due to the feature TENSE being underspecified in the child’s grammar. Wexler suggests that, since adult English grammar allows PRO in infinitival embedded clauses, as illustrated in (5) below, this may also explain the asymmetry in frequency between null subjects with non-finite verbs (higher) and null subjects with finite verbs (lower).
(5) Sally wanted [PRO to go home]
By contrast, in null subject languages such as Italian, pro is licensed by rich verbal inflection from early on, hence the relatively higher frequency of null subjects in Italian child acquisition than in English child acquisition (see also Bromberg & Wexler 1995). Similarly, the early presence of null subjects in child speech is claimed to be related to syntactic development in L1 under what became known as the Truncation Hypothesis, where children are initially assumed to have access only to the lower part of the tree (below TP), triggering the production of root infinitives (Radford 1988, 1990; Rizzi 1993/1994). According to this proposal, finite inflection is claimed to correlate with development of the CP projection (and the disappearance of null subjects in a non-null subject language such as English) and non-finite inflection with the IP projection. Other proposals claimed that the null subject phenomenon is not related to the NSP but is rather related to a performance or processing deficit in child speech. For example, Bloom (1990) found a correlation between null subjects and VP length: the longer the VP, the more likely the subject is to be dropped. Bloom (1990) also found that children’s sentences with subjects were significantly shorter than their sentences without subjects. Aronoff (2003) found a correlation between null subjects and null verbs in both children and adults, suggesting that subject dropping in child language is due to processing limitations (see also Valian 1991). The proposal that has received perhaps the most attention is related to the general trend in the field of assuming a contingent relationship between development of inflectional morphology and syntactic development. Notwithstanding the various proposals and the mixed evidence, there is a considerable body of evidence that suggests the presence of a correlation between development of the inflectional paradigm
and the null subject phenomenon. Many studies reported findings that children set the NSP at the time they acquire agreement inflection features (e.g., Deprez & Pierce 1993, Lebeaux 1988, Rizzi 1998). Similarly, it has also been reported that English-speaking children produce null subjects in the absence of agreement features (Roeper & Rohrbacher 2000). Based on these observations, Speas (1994/2006) proposes one of the most widely cited analyses of null subjects for L1 within a Minimalist framework, following Chomsky (1993). The proposal is also based on the Economy of Projection Principle that Speas introduced to motivate her notion that a functional projection can be generated only when needed. Speas proposes that languages that exhibit agreement features, no matter how “residual” or impoverished (such as English), would project an AGRP for agreement features to be checked, while in languages that lack agreement features altogether (such as Chinese and Japanese) there is no need for an AGRP within which the features can be checked. The latter have functional heads, such as TENSE and ASPECT, but no AGR head. Thus, a projection must have content (before Spellout), and an AGRP is needed even for a language with impoverished agreement so that AGR features can be checked in a Spec-Head relation at LF (1994/2006:11). Accordingly, structures (6a)-(6c) are proposed (Speas 1994/2006:12). As illustrated, an additional parameter is introduced: in strong (or rich) AGR languages (such as Italian-type languages), the agreement affix is stored in the lexical representation and can be separated from the verb to be inserted in the head position of AGRP, whereas in weak (or impoverished) agreement languages (such as English-type languages), the morphemes do not have independent lexical entries and are base-generated on the verbal stem (i.e., as part of the inflectional paradigm).
Accordingly, a language permits null subjects if the agreement affix is base-generated under AGR, but it does not if the agreement affix is base-generated on the verb. Additionally, in a non-null subject language (such as an English-type language), Spec-AGRP must be filled (by non-null content) prior to Spellout. (We shall return to this analysis in section 4.2.)
(6) a. Italian: [AGRP DP [AGR' [AGR -af] [VP pro [V' V DP]]]]
b. English: [AGRP DP [AGR' AGR [VP *pro [V' V+af DP]]]]
c. Japanese: [TP DP [T' T [VP pro [V' DP V]]]]
3. Null Subjects in L2 Acquisition
Three main questions have preoccupied second language researchers with respect to the null subject phenomenon: setting the NSP, its cluster properties, and access to UG.³ White (1985, 1986) investigated adult French [-drop], Spanish [+drop] and Italian [+drop] speakers learning English [-drop] as an L2 cross-sectionally. On grammaticality judgment and production tasks, a marked difference in performance was found between the Spanish and Italian speakers on the one hand and the French speakers on the other. The former showed a tendency to accept (ungrammatical) null subject contexts more often than the latter did, suggesting a transfer effect from L1. White claimed that the value [-null] is the unmarked setting and would require negative evidence to detect, hence the Spanish participants did not acquire the form as easily as their French counterparts did. The results of the study also showed a proficiency effect, suggesting that the Spanish learners could reset the parameter of (their English) L2. However, the data produced mixed evidence with respect to a parameter clustering effect. Both groups (the Italian/Spanish and the French groups) rejected sentences with subject-verb inversion errors but accepted sentences with that-trace sequence errors. Hilles (1986) examined longitudinal data of a twelve-year-old speaker of (Colombian) Spanish learning English in a naturalistic setting. Hilles found the subject’s L2 grammar already set at a [+null] value (with a prevalence of null subject contexts), transferring the value from L1. The subject later reset the parameter to the [-null] value. A correlation was found between the gradual adjustment to overt subject contexts and both the production of (non-referential) expletive subjects and an increase in the production of modals, hence providing evidence for a clustering effect. Phinney (1987) investigated use of subject pronouns and subject-verb agreement. Phinney relied on production data from free compositions of (beginning and intermediate) adult Spanish learners of English and adult English learners of Spanish as an L2. Subject-verb agreement production was found to be accurate in both groups, although most instances involved 1st person singular. The findings showed evidence of L1 transfer in both groups and that it was easier for the English learners to drop subject pronouns than for their Spanish counterparts to insert pronouns. Accordingly, the data were interpreted to support the claim that the [+drop] value is the unmarked setting. Liceras (1989) investigated the NSP and the NSP clustering effect by examining the responses of adult English and French speakers learning Spanish as an L2 on grammaticality judgment tests. Liceras reported that both groups (including beginner level participants) accepted null subject tokens in Spanish as grammatical but had more difficulty with subject-verb inversion and that-trace. No L1 transfer effect was claimed. However, contrary to standard assumptions, Liceras claimed that the results do not argue against a clustering effect. Rather, the other structures investigated are not as simple as null subjects; the latter seem to be a prerequisite to the other structures in the cluster or, alternatively, structures within the NSP cluster require individual triggers (see also Liceras 1988).
³ There is disagreement among researchers with respect to investigating parameter (re-)setting in L2 acquisition if the phenomenon in question turns out not to apply altogether in L1 acquisition, as claimed by Beck (1998). Thus, it is possible to do so for some researchers (e.g., Beck 1998:29) but “implausible” for others (e.g., Sprouse 1998:62).
Tsimpli & Roussou (1991) investigated the NSP and NSP clustering effects in 13 adult Greek [+drop] learners of English who were at the intermediate (proficiency) level. The data, which comprised the participants’ responses on grammaticality judgment tasks, revealed that the participants had far more problems with that-trace and subject-verb inversion than they did with null subjects (suggesting no clustering effect). Tsimpli & Roussou claimed that their data suggest that L2ers transfer the value of their L1 to L2 but cannot reset the parameter, because UG access in L2 is permanently impaired. L2ers are claimed rather to have access to non-parameterized UG principles. Hilles (1991) examined naturalistic acquisition data of six Colombian Spanish speakers learning English as an L2. The subjects were two children, two adolescents, and two adults. The study examined the prediction made by the Morphological Uniformity Principle (see footnote 2 above). The study produced mixed evidence: out of the six subjects, the data of only three (the two children and one of the two adolescents) showed a significant correlation between emergence of overt pronominal subjects and development of verbal inflection. Notwithstanding the nature of this mixed evidence (of a clustering effect), Hilles claimed that the L2ers of the study had access to UG not through L1 but through a default [+null] value (set in L2) as that of a uniformly uninflected language (e.g., Chinese and Japanese). Lakshmanan (1991) examined longitudinal data of three children learning English as an L2 (with French, Colombian Spanish [+null], and Japanese [+null] as their respective L1s). Lakshmanan, however, found no clustering effect and no correlation between development of (English non-uniform/impoverished) verbal inflection and use of overt subjects, and concluded that the data do not support the Morphological Uniformity Hypothesis or direct UG access in child L2 acquisition. The Colombian child exhibited inconsistent use of verbal inflection but omitted subjects (including thematic referential subjects/pronouns). The French child exhibited inconsistent use of verbal inflection but omitted mainly expletive (non-referential) subjects. The Japanese child exhibited low verbal inflection use and produced no null subjects.
Davies (1996) also attempted to test the predictions made by the Morphological Uniformity Hypothesis on 48 adult speakers of null subject languages (Chinese, Japanese, Korean, Italian and Spanish) learning English as an L2 who were at three proficiency levels. The study produced mixed evidence. A number of the participants exhibited knowledge that English is a morphologically non-uniform language, yet they accepted (as grammatical) subjectless English sentences. Vainikka & Young-Scholten (1994, 1996) examined cross-sectional data from 11 adult Turkish [+null] and six Korean [+null] learners of German [-null] as an L2. The findings revealed a correlation between the development of verbal inflection and overt subjects. Three patterns corresponding to three stages were identified: L2ers who supplied
pronominal subjects and verbal inflection at 50% (stage-1), L2ers who supplied pronominal subjects at 70% with inconsistent use of verbal inflection (stage-2), and L2ers who supplied subjects at more than 80% (stage-3) and supplied verbal agreement inflection at comparably high rates. Accordingly, Vainikka & Young-Scholten claimed that no transfer of the L1 value [+null] takes place in L2. Rather, L2ers set their [-null] value at stage three when they have acquired verbal inflection. Clahsen & Hong (1995) attempted to challenge Vainikka & Young-Scholten’s (1994) findings by analyzing grammaticality reaction time responses of 33 adult Korean learners of German as an L2. At least two general patterns emerged in the study: a group of 18 participants acquired only one of the two forms, while another 15 acquired both forms or none. Accordingly, the developmental paths of null subjects and subject-verb agreement were interpreted not to be related, suggesting that subject-verb agreement is not part of the cluster properties of the NSP. Al-Kasey & Pérez-Leroux (1998) examined whether L2ers can reset the NSP in L2 and whether there is a clustering effect of null (non-referential) expletives and null thematic (referential) subjects (or optional subject pronouns). They analyzed data generated via comprehension and production (written) tasks from 88 English learners of Spanish as an L2 at different proficiency levels. The main findings revealed a proficiency effect for the acquisition of both types of subjects, as the number of both types of null subjects increased with proficiency at about the same pace, suggesting an acquisition correlation, hence also a clustering effect. Accordingly, Al-Kasey & Pérez-Leroux claimed that their findings support UG access in L2 and that setting of parameters in L2 is also possible.
Liceras & Díaz (1998) examined production data of 18 adult English, French, English/French bilingual, Danish [-null], Swedish [-null] and Japanese speakers learning Spanish as an L2. The participants belonged to two cross-sectional levels: beginning and advanced. The data revealed ample production of null subject contexts and that the Japanese advanced participants did not exhibit a substantial increase in null subject production as the other advanced participants did. Liceras & Díaz claimed that their data show that advanced subjects whose L1 incorporates abstract inflectional features tended to treat their production of null subjects in Spanish structurally while those whose
L1 does not do so tended to treat them pragmatically. Liceras & Díaz concluded that their study provides some evidence of L1 transfer. Liceras et al. (1999) examined spontaneous production data of 18 adult Chinese, Korean, Japanese, English, French and German learners of Spanish as an L2 (at the “intermediate advanced” level). The findings revealed that all participants produced null subjects in matrix, embedded and conjoined clauses, with the Japanese participants producing slightly fewer subjectless clauses. Liceras et al. claimed that the different L1 speakers resort to different UG-related, non-parametrized identification procedures through L1, hence allowing for L1 transfer. Null subjects are assumed to be licensed in Spec-VP by default in the absence of abstract features determining the value of the null subjects. Thus, following Tsimpli & Roussou (1991), Liceras et al. claimed that re-setting the parameter is not possible in L2, but that L2ers can still have access to UG through L1. To sum up the SLA research conducted with respect to the null subject phenomenon, the following general observations can be noted. There does not seem to be general agreement as to which value ([+null] versus [-null]) represents the unmarked setting. Tsimpli & Roussou (1991) argue that associating markedness with null subjects is “ad-hoc”. Studies produced mixed evidence with respect to clustering effects. This leads to at least three conclusions. First, evidence against clustering effects suggests that no resetting of the NSP takes place, hence no access to UG in L2. Second, the notion of clustering is applicable, but some structures within the cluster may be prerequisite to other structures, or structures within the cluster may have individual triggers. On this account, no resetting of the NSP takes place in L2. Third, evidence for clustering effects suggests that resetting of the NSP is possible in L2, hence access to UG in L2 is also possible.
There is no agreement as to whether L2ers can reset the parameter in L2. Studies produced mixed evidence with respect to the association between development of verbal inflection and null subjects. Studies used different assessment measures, many of which relied on grammaticality judgment tasks. This methodology has been criticized for its many limitations (e.g., Ellis 1990, 1991; Lantolf 1990; Goss et al. 1994). In addition, different types of data are relied upon
(including longitudinal, cross-sectional, naturalistic, child L2 and adult L2 data) that may not allow for a straightforward comparison. Studies generally show evidence that adult speakers of null subject and non-null subject languages learning a null subject language produce subjectless clauses from early on. Conversely, speakers of null subject and non-null subject languages learning a non-null subject language do supply overt subjects from early on. Studies generally seem to provide evidence in support of L1 transfer. This comes from at least three main directions. First, speakers of null subject languages learning null subject languages produce null subject clauses and seem to adjust to the L2 system from early on. Second, speakers of null subject languages learning non-null subject languages (e.g., English and German) produce subjectless clauses even though they seem to adjust to the system of L2 from early on. Third, speakers of non-null subject languages learning non-null subject languages reject noticeably more subjectless clauses than speakers of null subject languages learning non-null subject languages. Researchers who deny L1 transfer and attribute L2ers’ early adjustment to other factors (such as the notion of non-parametrized UG-related procedures) do not, in fact, provide clear empirical evidence ruling out L1 transfer (see also Sauter 2002). Despite the fact that many studies have been conducted to investigate the null subject phenomenon in L2, using participants of many different L1s (including both null subject and non-null subject languages), an additional surprising observation is that the research has mainly focused on three languages as L2s: English [-null], German [-null] and Spanish [+null]. The study reported on below contributes to the debate by investigating a fourth language (Arabic) as an L2. In addition, the study relies on production data rather than grammaticality judgment tasks in assessing the participants’ use of null subjects.
The study focuses on three main issues: 1) use and distribution of null subjects in adult English and Spanish learners of Arabic as an L2, 2) L1 transfer and 3) the relationship between development of verbal inflection and null subjects. Investigating clustering effects is beyond the scope of this study.
4. Methods
4.1 Participants
Fifty-four Arabic L2ers, belonging to two different native language backgrounds, Spanish and American English, were invited to participate in the study in their home institutions. The participants were grouped according to their placement by their home institutions and according to length of exposure to Arabic as part of their academic programs. Table 1 summarizes the details of the participants.

Group            Length of Exposure   Credit Hours Enrolled in   M/F   Age Range   Age Mean
English L1
  Group1 (n=9)   Year 1               6                          4/5   18-21       19.22
  Group2 (n=9)   Year 2               5                          5/4   20-29       22.22
  Group3 (n=9)   Year 3               4                          6/3   22-34       29.11
Spanish L1
  Group1 (n=9)   Year 1               6                          3/6   19-23       20.22
  Group2 (n=9)   Year 2               6                          2/7   19-26       21.55
  Group3 (n=9)   Year 3               6                          3/6   20-33       25.33
M/F = Total Males/Total Females
Table 1. Participants
The participants selected had little or no exposure to Arabic prior to joining their academic institutions (in Spain and the U.S.) and are not heritage speakers who would speak Arabic occasionally or often at home. In particular, first-year students of both language groups had zero exposure and had made no trips to Arabic-speaking countries. A few participants from both language groups at other levels had traveled to Arabic-speaking countries for brief visits and did not stay for a significant period of time. The participants of both L1 language groups received formal instruction in Arabic with focus on all grammatical forms from early on, although they used different textbooks. The L1 English group used Abboud et al. (1983, 1997) and the L1 Spanish group mainly used, though not exclusively, Alkhalifa (1999, 2002). In addition, six (educated) native speakers of Arabic were invited to participate as a control group. The native speakers were from different Arabic-speaking countries (including Egypt, Jordan, Palestine, Syria,
NULL SUBJECTS USE IN ARABIC L2 ACQUISITION
and Tunisia) with an age range of 25-37 and a mean age of 32. They were all graduate students pursuing different programs at a U.S. university, including biochemistry, computer science, education, geology, industrial engineering and mathematics.
4.2 Target forms
The focus of the present study with respect to the NSP is on the canonical feature of the parameter, empty/null subjects. The investigation is restricted to analyzing use and acquisition of null subjects in matrix sentences/clauses. In addition, to make the comparison across all three proficiency levels possible (and the forms accessible to all participants, especially those in the beginner groups), the investigation is restricted to 3rd person singular masculine and feminine contexts. Sentences (7)-(9) below illustrate null subject contexts in the target L2 (Arabic) and null subject and non-null subject contexts in the participants’ L1s (Spanish and English).
(7) a. ʔakal-at                                  (Arabic)
       ate-3SF
       “She ate.”
    b. hiya ʔakal-at
       she  ate-3SF
       “She ate.”
    c. ʔakala
       ate.3SM
       “He ate.”
    d. huwa ʔakala
       he   ate.3SM
       “He ate.”
(8) a. Com-ió                                    (Spanish)
       ate-3S
       “S/he ate.”
    b. Ella com-ió
       she  ate-3S
       “She ate.”
    c. Él com-ió
       he ate-3S
       “He ate.”
(9) a. *Ate                                      (English)
    b. He ate.
    c. She ate.
The examples above illustrate that while Arabic and Spanish are null subject languages, English is a non-null subject language. In this paper, I follow the standard Minimalist assumption of attributing parametric variation to strength of functional features. On this account, due to their rich verbal agreement features, both Arabic and Spanish are analyzed with the functional feature strength set to [+strong] while the functional feature strength in English is set to [-strong]. Accordingly, the typological constellation of the target and source languages of the participants reveals the pairings in (10)-(11):
(10) English participants who are speakers of a [-null] and [-strong] L1, learning a [+null] and [+strong] L2.
(11) Spanish participants who are speakers of a [+null] and [+strong] L1, learning a [+null] and [+strong] L2.
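The pairings in (10)-(11) can be restated as feature bundles. The following is only a minimal sketch: the class, field and function names are hypothetical conveniences, not part of the analysis adopted here, and the Boolean encoding simply mirrors the [±null] and [±strong] values stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Feature bundle for one language (illustrative names only)."""
    name: str
    null_subject: bool  # True for [+null], False for [-null]
    strong_agr: bool    # True for [+strong], False for [-strong]


ARABIC = Grammar("Arabic", null_subject=True, strong_agr=True)
SPANISH = Grammar("Spanish", null_subject=True, strong_agr=True)
ENGLISH = Grammar("English", null_subject=False, strong_agr=False)


def features_to_reset(l1: Grammar, l2: Grammar) -> list:
    """Return the features whose L1 value differs from the L2 value."""
    diffs = []
    if l1.null_subject != l2.null_subject:
        diffs.append("null")
    if l1.strong_agr != l2.strong_agr:
        diffs.append("strong")
    return diffs


for l1 in (ENGLISH, SPANISH):
    print(f"{l1.name} -> {ARABIC.name}: reset {features_to_reset(l1, ARABIC)}")
```

On this encoding the L1 English participants face two mismatching values, while the L1 Spanish participants face none, which is the asymmetry that a transfer-based account would be expected to exploit.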
For an analysis, I adopt here, rather loosely, Speas’ (1994/2006) formulation of null subject and non-null subject clause structure illustrated in (6) (section 2) above, with Arabic and Spanish belonging to the Italian-type languages allowing null subjects. Although Chomsky (1995) abandoned Pollock’s (1989) Split-INFL hypothesis in favor of a single TP projection, it may be more plausible to maintain an independent AGRP projection as in Speas (1994/2006). This seems to have support on independent grounds from L1 acquisition data where in German L1 (e.g., Poeppel & Wexler 1993, Clahsen et al. 1994, Meisel 1994) and French L1 (e.g., Pierce 1992, Déprez & Pierce 1993) children acquire the Tense properties (finite and nonfinite) before age 2 whereas they acquire subject-verb agreement between the ages of 2 and 3, indicating that these children initially project a “more basic” TP phrase before they do the “more complex” AGRP projection (Griffin 2003:20).4
4 Griffin (2003) claims further that adopting Speas’ analysis additionally allows simplification in the theory and provides independent support for Hornstein’s (1999) claim that PRO subjects and the theory of control may be eliminated and subsumed under trace theory.
However, I depart from Speas’ distinction between the two types of verbal affixes, according to which verbal affixes in null subject languages are base generated in the head position of AGRP (as each agreement morpheme has its own independent lexical entry), whereas verbal affixes in non-null subject languages do not have independent lexical entries and are base generated on the verbal stem (i.e., as part of the inflectional paradigm). Verbal affixes in Arabic are not quite separable from the stem due to their circumfixal nature. Therefore, I assume here that the verbal affixes are base generated on the verb stem (see also Benmamoun 2000, Shlonsky 1997; cf. Fassi Fehri 1993) but that the verb and pro raise within an AGRP projection for feature checking. I assume, however, following Speas, that in English (as a non-null subject language) Spec-AGRP must be filled (by non-null content) prior to Spellout. I leave aside the exact licensing condition for null subjects (responsible for the distinction between null subject and non-null subject languages), as no syntactic analysis to date seems to reconcile the counterexamples (see footnote 2) unless we propose that such a condition may lie outside the realm of syntax (see Cole 2000 for such a proposal).
4.3 Research questions
The present study attempts to address the following questions. Do Arabic L2ers who are speakers of null subject and non-null subject languages (such as Spanish and English, respectively) manage to re/set the value of the NSP in their L2 correctly, and at what stage? Do such L2ers exhibit L1 transfer in their IL systems? Is there any relationship between the development of the null subject parameter and the development of verbal agreement inflection?
4.4 Data collection and coding
Data collection aimed at eliciting semi-spontaneous production data of the target forms from the L1 English, L1 Spanish and the control participants. 
Elicitation took place in one-on-one interview sessions, one interview per participant (30-45 minutes). Elicitation consisted of four narrative tasks (see below). The data were transcribed and coded. Certain items were not coded. These included hesitations and self-corrections, except for the last attempt. In addition, a small number of null verb contexts were produced by both L1 English and L1 Spanish
participants where a pronominal subject was produced (without a verb) followed by a long pause or a verbal noun. These tokens were too few and were not included in the analysis. In coding subject-verb agreement tokens, agreement was determined by considering the verbal form and whether it was inflected properly, not by first identifying the subject and then the verb it agrees with. This is significant, since the verb may agree with a discourse referent subject and the L2 participants may inadvertently produce the wrong subject, especially when the subjects used are the pronouns hiya “she” and huwa “he”, which are close in pronunciation (see also Poeppel & Wexler 1993, Prévost & White 2000; cf. Meisel 1991).
4.4.1 Narrative tasks
Four narrative tasks were used for the purpose of elicitation: two in the past tense and two in the present tense, divided equally between a female and a male character. Of the two narratives in the past tense, one was about a female character and one about a male character. The participants were requested to narrate, day by day, the planned vacation activities (shown on a calendar) carried out by each character during their vacation (which each took the previous month for a period of 10 days). As a distracter, the participants were asked to figure out and to comment on whether or not the male and female characters made a compatible couple based on what they did on their vacation. The two narratives in the present tense were likewise about a female and a male character. The participants were requested to describe the daily routines of each of the characters at different times of the day. As a distracter, the participants were asked to figure out where the character was from based on his/her routine activities. The two sets of narratives (the past vs. the present) were not presented sequentially. 
Rather, the past set was presented towards the middle of the interview and the present set towards the end, with tasks of other (unrelated) structures used at the beginning of the first and second halves of the interview, serving as distracters. The purpose of each of the four tasks was to observe how the participants described the first event as well as the subsequent events in each narrative. In Arabic and Spanish, one would normally expect only the first event to be described with a verb with an optional overt pronominal or lexical subject (depending on the participants’ assumption of shared knowledge with the interviewer about the
character in question), whereas all subsequent events need not be described with an overtly expressed subject. This is not what one would expect in English, where the first verb as well as all subsequent verbs must occur with an overt subject. Thus, in English a past narrative would proceed as follows: “On Thursday, the man/he traveled to Alaska; on Friday he drank coffee; on Saturday he went to the zoo,” and so on.
4.5 Results
4.5.1 Null subjects production
Given the description of the four elicitation tasks above, one would expect each of the six native (control) speakers to produce between 0 and 4 contexts with overt subjects, i.e., at most one token per task. The control participants produced tokens within the predicted range. Together, the six native Arabic speakers produced 11 contexts with overt subjects (4 pronominal and 7 lexical), and all except one occurred as a description of the first event of a narrative. Apart from one token, all contexts occurred with the overt subject in preverbal position. Only one control participant produced no overt subject contexts. In addition, the control group occasionally produced (obligatory) overt pronominal subjects in lower or embedded clauses with verbs such as yabduu “it seems”, kaʔanna “looks like”, yumkin “it is possible”, and the complementizer ʔanna, as illustrated in (12).
(12) ya-bduu  ʔanna-hu šariba    qahwa
     3SM-seem that-he  drank.3SM coffee
     “It seems that he drank coffee.”
The control participants produced a total of 24 such tokens in the middle of the narratives. In comparison, the L1 English and L1 Spanish participants produced mainly null subject and overt subject contexts in matrix clauses. They also produced null subjects and overt lexical and pronominal subjects in conjoined contexts. Tokens of both contexts were collapsed together here. All overt subject contexts were produced with the subjects in a preverbal position. The vast majority of overt subjects produced were pronominal subjects. Of all non-native participants, only two Spanish participants (in Group 3) produced a
total of four contexts (with the complementizer liʔanna) of overt pronominal subjects in embedded clauses. All four tokens were correctly used with the obligatorily expressed pronominal subject, on a par with those produced by the control group. The data reveal that the L1 English participants generally produced noticeably more subjectless clauses than did their L1 Spanish counterparts. In particular, the L1 Spanish beginning participants (Group 1) produced a total of 55 null subject tokens, whereas their English counterparts (Group 1) produced as many as 280 tokens, a little over 5 times the number of their Spanish counterparts. Table 2 lists the distribution of null subjects use in the participants’ IL systems.

                          Null/All               Lexical/Pronominal
                          Subjects       %       Subjects
Controls (n=6)            264/275        96      7/4
Arabic L2 (English L1)
  Group 1 (n=9)           280/314        89      9/25
  Group 2 (n=9)           353/388        91      11/24
  Group 3 (n=9)           303/385        79      18/64
Arabic L2 (Spanish L1)
  Group 1 (n=9)           55/200         28      18/127
  Group 2 (n=9)           156/256        61      10/90
  Group 3 (n=9)           232/371        63      9/130
Table 2. Distribution of null subjects in the participants’ IL systems
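The percentages in Table 2 follow directly from the token counts. As a rough check, the sketch below recomputes them and the Group 1 English-to-Spanish ratio mentioned in the text; the counts are copied from Table 2, while the dictionary layout and the helper name `pct` are illustrative assumptions.

```python
import math

# (null-subject tokens, all subject contexts, reported %), copied from Table 2.
TABLE2 = {
    "Controls": (264, 275, 96),
    "English L1, Group 1": (280, 314, 89),
    "English L1, Group 2": (353, 388, 91),
    "English L1, Group 3": (303, 385, 79),
    "Spanish L1, Group 1": (55, 200, 28),
    "Spanish L1, Group 2": (156, 256, 61),
    "Spanish L1, Group 3": (232, 371, 63),
}


def pct(part: int, whole: int) -> int:
    """Percentage rounded half up to the nearest integer."""
    return math.floor(100 * part / whole + 0.5)


for group, (null, total, reported) in TABLE2.items():
    overt = total - null  # lexical + pronominal subjects
    print(f"{group}: {null}/{total} = {pct(null, total)}% "
          f"(reported {reported}%), {overt} overt")

# Group 1 comparison: English null-subject tokens relative to Spanish ones.
ratio = TABLE2["English L1, Group 1"][0] / TABLE2["Spanish L1, Group 1"][0]
print(f"English/Spanish Group 1 null-subject ratio: {ratio:.2f}")
```

The overt counts obtained by subtraction also match the sums of the lexical and pronominal columns of Table 2 (e.g., 314 - 280 = 34 = 9 + 25).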
The distribution of the data presented in Table 2 is illustrated in the boxplot in Figure 1 (below). In addition to producing fewer null subjects than their L1 English counterparts, the L1 Spanish participants seem to exhibit a greater degree of variability in their production of null subjects, as is evident in the wide spread of the middle (2nd and 3rd) quartiles. By contrast, the null subjects produced by the L1 English participants exhibit much less variability, as is evident in the clustering of their middle quartiles within the 80% range. The performance of the L1 English participants seems to resemble closely that of the control group, whose middle quartiles cluster a little more consistently and higher, within the 90% range.
Figure 1.
One-way and two-way MANOVA tests revealed a significant effect for L1 background.5 In particular, ANOVA follow-up tests to the MANOVA revealed a significant difference between the control group and the L1 English participants, on the one hand, and the L1 Spanish participants, on the other (F(2,53) = 15.309, p<.001), and no significant difference between the control group and the L1 English participants. A marginally significant effect for the interaction between L1 and proficiency was also found (F(2,53) = 2.674, p = .078). In other words, the L1 English participants dropped significantly more subjects than their Spanish counterparts did. Additionally, although the L1 Spanish participants seemed to start dropping subjects conservatively at the early stage of acquisition, in contrast to their English counterparts, they dropped more subjects as they progressed in their Arabic L2 proficiency.
5 Conducted with verbal agreement (see below) scores as another independent variable.
4.5.2 Null subjects and verbal agreement
When we examine development of verbal inflection with respect to production of null subjects, the data become more revealing of the participants’ use of null subjects. Table 3 displays the distribution of correct rule application of verbal agreement in null and overt subject contexts.

                          Correct Agreement   Ratios   Correct Agreement   Ratios
                          (Null Subjects)     %        (Overt Subjects)    %
Controls (n=6)            264/264             100      11/11               100
Arabic L2 (English L1)
  Group 1 (n=9)           246/280             88       31/34               91
  Group 2 (n=9)           283/353             80       24/35               68
  Group 3 (n=9)           276/303             91       73/82               89
Arabic L2 (Spanish L1)
  Group 1 (n=9)           23/55               43       102/145             70
  Group 2 (n=9)           111/156             71       68/100              67
  Group 3 (n=9)           160/232             69       107/139             77
Table 3. Distribution of verbal agreement in the participants’ IL systems
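For comparison with the ratios in Table 3, the sketch below pools the tokens of each group and divides correct by total. This is only one possible way of aggregating, and the source does not state how the reported ratios were computed, so small differences (of about one percentage point in a few cells) need not indicate an error; they could reflect rounding or per-participant averaging. The counts are copied from Table 3; the data layout and function name are illustrative.

```python
# ((correct, total, reported %) for null contexts, same for overt contexts),
# copied from Table 3.
TABLE3 = {
    "Controls": ((264, 264, 100), (11, 11, 100)),
    "English L1, Group 1": ((246, 280, 88), (31, 34, 91)),
    "English L1, Group 2": ((283, 353, 80), (24, 35, 68)),
    "English L1, Group 3": ((276, 303, 91), (73, 82, 89)),
    "Spanish L1, Group 1": ((23, 55, 43), (102, 145, 70)),
    "Spanish L1, Group 2": ((111, 156, 71), (68, 100, 67)),
    "Spanish L1, Group 3": ((160, 232, 69), (107, 139, 77)),
}


def pooled_pct(correct: int, total: int) -> float:
    """Correct tokens as a percentage of all tokens pooled over a group."""
    return 100 * correct / total


max_gap = 0.0  # largest distance between a pooled and a reported ratio
for group, contexts in TABLE3.items():
    cells = []
    for label, (correct, total, reported) in zip(("null", "overt"), contexts):
        pooled = pooled_pct(correct, total)
        max_gap = max(max_gap, abs(pooled - reported))
        cells.append(f"{label} {pooled:.1f}% (reported {reported}%)")
    print(f"{group}: " + ", ".join(cells))

print(f"largest pooled-vs-reported gap: {max_gap:.2f} points")
```

Pooling weights every token equally, so a participant who produced many tokens influences the group ratio more than one who produced few; averaging per-participant ratios would weight every participant equally instead.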
The distribution of the data presented in Table 3 with respect to correct rule application of verbal agreement is illustrated in Figure 2. The L1 English groups overall produced far more subjectless sentences than sentences with overt (lexical and pronominal) subjects, with high ratios of correct subject-verb agreement in null subject contexts (88%, 80% and 91%). The L1 Spanish groups exhibit a contrasting pattern. The beginning group (Group 1) produced far fewer sentences with null subjects than sentences with overt subjects and exhibited a low ratio of subject-verb agreement in null subject contexts (43%). Unlike Group 1, the intermediate group (Group 2) produced about three times the number of null subject contexts (156) with a noticeably higher correct verbal agreement ratio (71%). The advanced group (Group 3) produced far more null subject tokens than Groups 1-2 did (232, alongside 139 overt subject tokens) and
exhibited a higher correct subject verb agreement ratio than Group 1 did (69% and 77%, respectively).
Figure 2.
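The p-values accompanying the F statistics in sections 4.5.1 and 4.5.2 can be sanity-checked without a statistics package: when the numerator degrees of freedom equal 2, the upper tail of the F distribution has the closed form (1 + 2F/df2)^(-df2/2). The sketch below applies that identity to the four reported F(2,53) values; the function name and the labels are illustrative, and the check says nothing about how the original analysis was run.

```python
def f_upper_tail_df1_2(f_value: float, df2: int) -> float:
    """P(F > f_value) for an F distribution with (2, df2) degrees of freedom.

    With numerator df = 2 the tail probability reduces to
    (1 + 2 * f / df2) ** (-df2 / 2).
    """
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# F(2,53) values as reported in the text (labels are informal glosses).
REPORTED = [
    ("L1 background, null subjects", 15.309),
    ("L1 x proficiency, null subjects", 2.674),
    ("L1 background, agreement in null contexts", 10.524),
    ("L1 x proficiency, agreement in null contexts", 3.212),
]

for label, f_value in REPORTED:
    p = f_upper_tail_df1_2(f_value, 53)
    print(f"{label}: F(2,53) = {f_value}, p = {p:.4f}")
```

Under this identity the two interaction statistics come out at approximately .078 and .048, in line with the reported values, and the two main-effect statistics fall well below .001.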
As indicated above, one-way and two-way MANOVA tests revealed a significant effect for L1 background. In addition, ANOVA follow-up tests to the MANOVA revealed a significant difference between the control group and the L1 English participants, on the one hand, and the L1 Spanish participants, on the other, with respect to correct verbal agreement in null subject contexts (F(2,53) = 10.524, p<.001). A significant effect (at the .05 level) for the interaction between L1 and proficiency with respect to correct verbal agreement in null subject contexts was also found (F(2,53) = 3.212, p = .048). No significant effect was found for verbal agreement in overt subject contexts. To summarize the findings of the present study, the data reveal a close correlation between the production of subjectless clauses and subject-verb agreement. Whereas the L1 English groups produced a
good number of subjectless sentences (280, 353 and 303) and exhibited a high rate of correct verbal agreement from the beginning stage of L2 acquisition (88% in Group 1), the L1 Spanish groups produced far fewer sentences with null subjects (55, 156 and 232) with much lower ratios of correct subject-verb agreement, especially in Group 1 (43%). However, the L1 Spanish Groups 2-3 exhibit an increase in both correct verbal agreement ratios and null subject production.
5. Discussion and Conclusion
To answer the research questions, perhaps the clearest evidence offered in the present study pertains to the existence of a contingent relationship between the development of null subjects and the development of verbal agreement morphology. On the one hand, we find that the L1 English participants produced a good number of null subjects from early on, with generally high correct subject-verb agreement ratios. The L1 Spanish participants, on the other hand, produced far fewer null subjects than their English counterparts, and with low ratios of correct subject-verb agreement. This is most evident in the case of Group 1. By contrast, we find that the participants at higher proficiency levels (Groups 2-3) exhibit a noticeable increase in null subject production, but only as their accuracy in subject-verb agreement increases at the same time. Thus, it may be safe to conclude, given the data of the present study, that there is a correlation between the development of null subjects and verbal agreement morphology in Arabic. As for the question of whether or not the participants of the study can re/set the parameter of their Arabic L2, the findings show that participants of both L1 backgrounds seem to have managed to set the parameter value to [+null] and to adjust to the grammar of null subjects in Arabic. 
Surprisingly, the L1 English participants, who are speakers of a non-null subject language, seem to have adjusted to the L2 system more readily and more easily than their L1 Spanish counterparts, who are speakers of a null subject language. The third question, which concerns the notion of L1 transfer, should be addressed with caution here, in part because the data of the present study are cross-sectional and not longitudinal in nature. On the one hand, no strong case of L1 transfer can be made with respect to the L1
English participants even though they did produce quite a number of clauses with subjects, since their production is characterized by prevalent use of null subject contexts from early on. The nature of their L1 [-null] is quite unlike the nature of their L2 production characterized as [+null]. Similarly no strong case for L1 transfer can be made for the L1 Spanish participants. The L1 Spanish participants produced subjectless sentences conservatively. Notwithstanding the conclusions reached thus far, the question remains as to why the L1 Spanish participants took longer than their L1 English counterparts to set or adjust the parameter of the L2 value at [+null] even though their L1 value is [+null] and even though the value of the NSP of the L1 English participants is [-null]. Answering this intriguing question can perhaps at the same time address the question as to why the L1 Spanish participants exhibited L1 transfer rather conservatively. From the data, it is evident that the L1 English participants were able from early on to figure out from input that Arabic allows null subjects. Indeed, null subject use is prevalent in the input (Abboud et al. 1983, 1997). The participants also seem to have established an association between verbal agreement and null subjects as evident in the correlation between their high scores on both null subjects and subject-verb agreement from an early stage of their Arabic L2 acquisition. Accordingly, all they have to do is pay attention to verbal inflection and simply drop the subject. Two additional factors may make this task easy for them. First, Arabic exhibits a functional projection similar to their own, albeit verb raising in Arabic occurs overtly rather than covertly (as in English) at LF. Second, subjects occur in a perceptually salient position within word order (see also Slobin 1973, Corder 1978). 
A similar observation related to the notion of salience and word order is reported in Alhawary (2005) where L1 English speakers learning Arabic as an L2 achieved a near perfect mastery of Arabic noun-adjective word order from the earliest stage of acquisition even though the order in their L1 is, in fact, the opposite: adjective-noun. As for the L1 Spanish participants, we can at least speculate on two possibilities to help explain the conservative production of null subjects, particularly those at the beginner level (Group 1). One possibility is that the L1 Spanish participants may have used
(pronominal) subjects as a processing strategy to gain time to retrieve the verb form with the proper agreement inflections (see Liceras et al. 1997 for a similar explanation), hence their conservative dropping of subjects. However, a more plausible scenario is that the L1 Spanish participants figured out the null subject feature of Arabic but preferred not to drop subjects because they had not yet mastered Arabic verbal agreement and therefore wanted to ensure the recoverability of the content of subjects by simply not dropping them. This explanation remains sketchy, since the null subjects investigated in this study include only third person singular masculine and feminine, which are not marked differently in Spanish. Future research should include other forms in the verbal agreement paradigm. Future research should also examine the use of overt pronominal subjects in embedded clauses. Additionally, since a period of three years of formal instruction, such as that of Group 3, is hardly sufficient to attain near-native status, future research should include participants at more advanced levels to examine the near-native status of the NSP in such L2ers (see Sorace 2003). To conclude, the findings of the present study reveal that the production of null subjects had already emerged in the L1 English participants’ IL systems at the earliest stage of acquisition. These participants seem to have already adjusted their IL systems to the grammar of the L2 with null subjects and seem to have little difficulty with the structure even though their L1 does not allow null subjects. What is additionally noticeable is that their high ratios of null subject production correlate with their high accuracy rates of subject-verb agreement. By contrast, and surprisingly, null subjects did not emerge in all of the L1 Spanish participants’ IL systems at the beginning level, and these participants produced a low number of subjectless sentences even though their L1 allows null subjects much like Arabic. 
One would expect that they would produce a higher number of clauses with null subjects, allowing for the role of L1 transfer in L2 acquisition, given that both L1 and L2 exhibit the same phenomenon. However, as they progress in their Arabic L2 proficiency, they start (at the intermediate and advanced levels) to produce far more instances of subjectless sentences, and only as their accuracy rates of subject-verb agreement in null subject contexts increase. Accordingly, the data of both L1 background groups seem to indicate a close correlation between development of null subjects and development of
verbal agreement inflection. Both groups seem to be aware that when dropping subjects they must ensure the recoverability of the content of the subjects; otherwise, they opt not to drop the subjects. This finding of a contingent relationship between the development of null subjects and verbal inflection is similar to the observation reported in L1 acquisition (e.g., Déprez & Pierce 1993, Lebeaux 1988, Rizzi 1998, Roeper & Rohrbacher 2000) and in L2 acquisition where the NSP can be reset (e.g., Vainikka & Young-Scholten 1994, 1996; Al-Kasey & Pérez-Leroux 1998).
REFERENCES Abboud, Peter, Zaki N. Abdel-Malek, Najm A. Bezirgan, Wallace M. Erwin, Mounah A. Khouri, Ernest N. McCarus, Raji M. Rammuny & George N. Saad. 1983. Elementary Modern Standard Arabic vol. 1. Cambridge: Cambridge University Press. Abboud, Peter, Aman Attieh, Ernest N. McCarus & Raji M. Rammuny. 1997. Intermediate Modern Standard Arabic. Ann Arbor, MI: Center for Middle Eastern & North African Studies, University of Michigan. Alhawary, Mohammad T. 2005. “L2 Acquisition of Arabic Morphosyntactic Features: Temporary or permanent impairment”. Perspectives on Arabic Linguistics 17-18 ed. by Mohammad T. Alhawary & Elabbas Benmamoun, 273-312. Amsterdam & Philadelphia: John Benjamins. Al-Kasey, Tamara & Ana Teresa Pérez-Leroux. 1998. “Second Language Acquisition of Spanish Null Subjects”. The Generative Study of Second Language Acquisition ed. by Suzanne Flynn, Gita Martohardjono & Wayne O’Neil, 161-185. Mahwah, NJ: Erlbaum. Alkhalifa, Waleed Saleh. 2002. Curso Práctico de Lengua Árabe II. Madrid: Editorial Ibersaf. ____. 1999. Curso Práctico de Lengua Árabe I. Madrid: Dar Alwah. Aronoff, Justin M. 2003. “Null Subjects in Child Language: Evidence for a performance account”. Proceedings of the 22nd West Coast Conference on Formal Linguistics ed. by Gina Garding & Mimu Tsujimura, 43-55. Somerville, MA: Cascadilla Press. Beck, Maria-Luise. 1998. Morphology and Its Interfaces in Second Language Knowledge. Amsterdam & Philadelphia: John Benjamins. Benmamoun, Elabbas. 2000. The Feature Structure of Functional Categories. New York & Oxford: Oxford University Press. Bloom, Paul. 1990. “Subjectless Sentences in Child Language”. Linguistic Inquiry 21.491-504. Bromberg, Hilary Sara & Kenneth Wexler. 1995. “Null Subjects in WhQuestions”. Papers on Language Processing and Acquisition ed. by Carson T. Shütze, Jennifer B. Ganger & Kevin Broihier. MIT Working Papers in Linguistics 26.221-247.
242
MOHAMMAD T. ALHAWARY
Burmeister, Hartmut & Patricia L. Rounds. 1990. Variability in Second Language Acquisition: Proceedings of the Tenth Meeting of the Second Language Research Forum. Eugene: Department of Linguistics, University of Oregon. Chomsky, Noam. 1995. “Categories and Transformations”. The Minimalist Program, 219-394. Cambridge, MA: MIT Press. ____. 1993. “A Minimalist Program for Linguistic Theory”. The View from Building 20 ed. by Ken Hale & Samuel J. Keyser, 1-52. Cambridge, MA: MIT Press. ____. 1981. Lectures on Government and Binding. Dordrecht: Foris. Clahsen, Harald & Upyong Hong. 1995. “Agreement and Null Subjects in German L2 Development: New evidence from reaction-time experiments”. Second Language Research 11.5 7-87. Clahsen, Harald, Martina Penke & Teresa Parodi 1994. “Functional Categories in Early Child Grammar”. Language Acquisition 3.395-429. Cole, Melvyn. 2000. The Syntax, Morphology, and Semantic of Null Subjects. Ph.D. dissertation, University of Manchester. Corder, S. Pit. 1978. “Simple Codes and the Source of the Second Language Learner’s Initial Heuristic Hypothesis”. Studies in Second Language Acquisition 1:1.1-10. Davies, William D. 1996. “Morphological Uniformity and the Null Subject Parameter in Adult SLA”. Studies in Second Language Acquisition 18.475493. Deprez Viviane & Amy Pierece. 1993. “Negation and Functional Projections in Early Grammar”. Linguistic Inquiry 24.25-67. Ellis, Rod. 1991. “Grammaticality Judgments and Second Language Acquisition”. Studies in Second Language Acquisition 13.161-186. ____. 1990. “Grammaticality and Judgments and Learner Variability”. Burmeister & Rounds 1990. 25-60. Eubank, Lynn. 1991. Point–Counterpoint: Universal grammar in the second language. Amsterdam & Philadelphia: John Benjamins. Fassi Fehri, Abdelkader. 1993. Issues in the Structure of Arabic Clauses and Words. Dordrecht: Kluwer. Goss, Nancy, Zhang Ying-Hua & James Lantolf. 1994. 
“Two Heads Better than One: Assessing mental activities in L2 grammatical judgments”. Research Methodology in Second Language Acquisition ed. by Elaine E. Tarone, Susan M. Gass & Andrew D. Cohen, 263-286. Mahwah: Erlbaum. Griffin, William Earl. 2003. “The Split-INFL Hypothesis and AgrsP in Universal Grammar”. The Role of Agreement in Natural Language/ Proceedings of the Fifth Annual Texas Linguistics Society ed. by William Earl Griffin, 1324. Texas Linguistics Forum 53 [http:uts.cc.utexas.edu~tls/2001tls/2001 proceeds.html] Hilles, Sharon. 1991. “Access to Universal Grammar in Second Language Acquisition”. Eubank 1991. 305-338. ____. 1986. “Interlanguage and the Pro-drop Parameter”. Second Language Research 2.33-57. Hoekstra, Teun & Bonnie D. Schwartz. 1994. Language Acquisition Studies in Generative Grammar: Papers in Honor of Kenneth Wexler from the 1991 GLOW Workshops. Amsterdam & Philadelphia: John Benjamins. Hornstein, Norbert. 1999. “Movement and Control”. Linguistic Inquiry 30.69-96. Hyams, Nina. 1994. “VP Null Arguments and COMP Projections”. Hoekstra &
NULL SUBJECTS USE IN ARABIC L2 ACQUISITION
243
Schwartz 1994. 21-55. ____. 1987. “The Theory of Parameters and Syntactic Development”. Roeper & Williams. 1-22. _____. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel. Jaeggli, Osvaldo & Kenneth Safir. 1989. “The Null Subject Parameter and Parametric Theory”. The Null Subject Parameter ed. by Osvaldo Jaeggli & Nina Hyams, 1-44. Dordrecht: Kluwer. Jaeggli, Osvaldo & Nina Hyams. 1988. “Morphological Uniformity and the Setting of the Null Subject Parameter”. Northeastern Linguistic Society 18.238-252. Lakshmanan, Usha. 1991. “Morphological Uniformity and Null Subjects in Child Second Language Acquisition”. Eubank 1991. 389-410. Lantolf, J. 1990. “Resetting the Null Subject Parameter in Second Language Acquisition”. Burmeister & Rounds 1990. 429-452. Lebeaux, David. 1988. Language Acquisition and the Form of the Grammar. Ph.D. dissertation, University of Massachusetts, Amherst. Liceras, Juana M. 1989. “On Some Properties of the ‘Pro-drop’ Parameter: Looking for missing subjects in non-native Spanish”. Linguistic Perspectives on Second Language Acquisition ed. by Susan M. Gass & Jacquelyn Schachter, 109-133. Cambridge: Cambridge University Press. ____. 1988. “Syntax and Stylistics: More on the pro-drop parameter”. Learnability and Second Languages ed. by James Pankhurst, Michael Sharwood Smith & Paul Van Buren, 71-93. Dordrecht: Foris. ____, Denyse Maxwell, Biana Laguardia, Zara Fernández & Raquel Fernández. 1997. “A Longitudinal Study of Spanish Non-native Grammars: Beyond parameters”. Contemporary Perspectives on the Acquisition of Spanish, Vol. 1: Developing grammars ed. by Ana Teresa Pérez-Leroux & William R. Glass, 99-132. Somerville: Cascadilla Press. ____, Lourdes Díaz & Denyse Maxwell. 1999. “Null Subjects in Non-native Grammars: The Spanish L2 of Chinese, English, French, German, Japanese and Korean speakers ”. The Development of Second Language Grammars ed. by Elaine C. Klein & Gita Martohardjono, 109-146. 
Amsterdam & Philadelphia: John Benjamins. ____ & Lourdes Díaz. 1998. “On the Nature of the Relationship between Morphology and Syntax: Inflectional typology, f-features and null/overt pronouns in Spanish interlanguage”. Maria-Luise Beck 1998. 307-338. Meisel, Jurgen M. 1994. “Getting FAT: Finiteness, agreement and tense in early grammars”. Bilingual First Language Acquisition: French and German grammatical development ed. by Jurgen M. Meisel, 89-129. Amsterdam & Philadelphia: John Benjamins. ____. 1991. “Principles of Universal Grammar and Strategies of Language Use: On some similarities and differences between first and second language acquisition”. Eubank 1991. 231-271. O’Grady, William. 1997. Syntactic Development. Chicago, IL: University of Chicago Press. Pierce, Amy E. 1992. Language Acquisition and Syntactic Theory: A comparative analysis of French and English child grammars. Dordrecht: Kluwer. Phinney, Marianne. 1987. “The Pro-drop Parameter in Second Language Acquisition”. Roeper & Williams 1987. 221-238. Poeppel, David & Kenneth Wexler. 1993. “The Full Competence Hypothesis of
Clause Structure in Early German”. Language 69.1-33. Pollock, Jean-Yves. 1989. “Verb Movement, Universal Grammar, and the Structure of IP”. Linguistic Inquiry 20.365-424. Prévost, Philippe & Lydia White. 2000. “Missing Surface Inflection or Impairment in Second Language Acquisition? Evidence from tense and agreement”. Second Language Research 16:2.103-133. Radford, Andrew. 1990. Syntactic Theory and the Acquisition of English Syntax. Oxford: Blackwell. ____. 1988. “Small Children’s Small Clauses”. Transactions of the Philological Society 86.1-43. Rizzi, Luigi. 1998. “Remarks on Early Null Subjects”. Proceedings of the 22nd Annual Boston University Conference on Language Development ed. by Annabel Greenhill, Mary Hughes, Heather Littlefield & Hugh Walsh, 14-38. Somerville, MA: Cascadilla Press. ____. 1993/1994. “Some Notes on Linguistic Theory and Language Development: The case of Root Infinitives”. Language Acquisition 3:4.371-393. ____. 1986. “Null Objects in Italian and the Theory of Pro”. Linguistic Inquiry 17.501-557. Roeper, Thomas & Bernard Rohrbacher. 2000. “Null Subjects in Early Child Language and the Economy of Projection”. The Acquisition of Scrambling and Cliticization ed. by Susan M. Powers & Cornelia Hamann, 345-396. Dordrecht: Kluwer. ____ & Edwin Williams. 1987. Parameter Setting. Dordrecht: Reidel. Sauter, Kim. 2002. Transfer and Access to Universal Grammar in Adult Second Language Acquisition. Ph.D. dissertation, University of Groningen. Shlonsky, Ur. 1997. Clause Structure and Word Order in Hebrew and Arabic. New York & Oxford: Oxford University Press. Slobin, Dan I. 1973. “Cognitive Prerequisites for the Development of Grammar”. Studies of Child Language Development ed. by Charles A. Ferguson & Dan I. Slobin, 175-208. New York: Holt. Sorace, Antonella. 2003. “Near-Nativeness”. The Handbook of Second Language Acquisition ed. by Catherine J. Doughty & Michael H. Long, 130-151. Oxford: Blackwell. Speas, Margaret. 1994/2006. 
“Economy, Agreement, and the Representation of Null Arguments”. Ms. University of Massachusetts, Amherst. [Available at: http://www.umass.edu/linguist/people/faculty/speas/prodrop.pdf] Agreement and Argument Structure ed. by Peter Ackema. To appear. Sprouse, Rex A. 1998. “Some Notes on the Relationship Between Inflectional Morphology and Parameter Setting in First and Second Language Acquisition”. Morphology and Its Interfaces in Second Language Knowledge ed. by Maria-Luise Beck, 41-67. Amsterdam & Philadelphia: John Benjamins. Tsimpli, Ianthi-Maria & Anna Roussou. 1991. “Parameter-resetting in L2?”. UCL Working Papers in Linguistics 3.149-170. Vainikka, Anne, & Martha Young-Scholten. 1996. “Gradual Development of L2 Phrase Structure”. Second Language Research 12.7-39. ____. 1994. “Direct Access to X’-theory: Evidence from Korean and Turkish Adults Learning German”. Hoekstra & Schwartz 1994. 265-316. Wexler, Kenneth. 1994. “Optional Infinitives, Head Movement and the Economy of Derivations”. Verb Movement ed. by David Lightfoot & Norbert
Hornstein, 305-350. Cambridge, MA: Cambridge University Press. White, Lydia. 1986. “Implications of Parametric Variation for Adult Second Language Acquisition: An investigation of the ‘pro-drop’ parameter”. Experimental Approaches to Second Language Acquisition ed. by Vivian Cook, 55-72. Oxford: Pergamon. ____. 1985. “The Pro-drop Parameter in Adult Second Language Acquisition”. Language Learning 35.47-62.
LINGUISTIC DIVERSITY: THE QAAF ACROSS ARABIC DIALECTS1
Maher Bahloul American University of Sharjah
1. Introduction
Previous analyses have examined the dialectal variation of the Arab world within a particular regional dialect or a few closely related or distinct dialects (e.g., Holes 1990 on Gulf Arabic; Holes 2001, 2005, 2006 on Bahraini Arabic; Abu-haidar 2006 on Baghdad Arabic; Rice & Sa’id 2005 on Eastern Arabic; Al-Mutlabī 1978 on Tamīmī Arabic2; Darwish 2005 on Syrian Arabic; Heath 2002 on Moroccan Arabic; Qafisheh 1996 on Gulf Arabic; Talmoudi 1981 on Tunisian Arabic; Marçais 1977 on Maghrebi Arabic; Ayoub 1968 on Egyptian and Iraqi Arabic; Fleisch 1974 on Lebanese Arabic; Naïm 2006 on Beirut Arabic; Abdel-Jawad 1981, 1986 on Jordanian Arabic; Ingham 1995 on Najdi Arabic; Haeri 1996 on Cairene Arabic; Brustad 2000 on Moroccan, Egyptian, Syrian, and Kuwaiti Arabic; Hamid 1984 on Sudanese Arabic; Murtaađ 1981 on Algerian Arabic, among many others3). Studies that examine dialectal variation across all or the majority of Arabic dialects, however, are rare.4 The current study contributes to filling this gap and examines dialectal variation in conjunction with the geographic structure of eighteen Arabic dialects from Morocco to Yemen on a West-East axis.5 On the basis of a survey and the current Arabic dialectal literature, the paper investigates the geographical distribution of the reflexes of the voiceless uvular phoneme /q/ (qaaf) within and across dialects, attempting to trace its geo-dialectal map. The results of the survey show the phoneme to be a salient isogloss with a large number of reflexes, counting those of core and peripheral dialects. Variants of core dialects suggest a dissection of the Arabic-speaking communities into five major regions cutting across the boundaries of current national entities. Each region is investigated, and similarities and differences between all five regions are also highlighted.

1.1 Overview of the problem
Classical Arabic, Quranic Arabic, and Modern Standard Arabic share a number of linguistic features, among which figures the voiceless uvular stop qaaf as part of the language's alphabet and phonology in written and oral media.6 In other words, pre-Islamic, post-Islamic, and current writings exhibit the use of the letter qaaf. In day-to-day speech, on the other hand, while a number of Arabic dialects have preserved the phoneme qaaf, others make use of a number of substitutes,7 such as the glottal stop [ʔ] and the voiced velar stop [g]. These substitutes are observed within a particular dialect and across dialects. The examples in (1) and (2) illustrate such variation.

(1) a. qaal (قال) ‘said’        (2) a. qalb (قلب) ‘heart’
    b. gεεl (ڤال) ‘said’            b. galb (ڤلب) ‘heart’
    c. ʔεεl (آل) ‘said’             c. ʔalb (ألب) ‘heart’
    d. kεεl (کال) ‘said’            d. kalb (کلب) ‘heart’
    e. čεεl (تشال) ‘said’           e. čalb (تشلب) ‘heart’
    f. ġaal (غال) ‘said’            f. ġalb (غلب) ‘heart’

1 I wish to thank all those who participated in the survey and the colleagues who gave constructive feedback, especially Clive Holes at “The Third International Conference on Middle Eastern & North African popular Culture” held at the American University of Sharjah, January 2004, and others at the Nineteenth Arabic Linguistic Symposium, held at the University of Illinois, Urbana-Champaign in April 2005. Special thanks and gratitude to the colleagues who agreed to testify and have their testimonies recorded, Drs. Ahmad Al-Issa, American University of Sharjah; Kamal Abdel-Malek, American University of Sharjah; Monther Younes, Cornell University; Fatima Sadiqi, University of Fez; and Salem Ghazali, Institut Supérieur des Langues de Tunis.
2 An Arabic variety spoken in the eastern areas of Saudi Arabia, especially within the Najdi region.
3 For brief overviews of late nineteenth- and early twentieth-century works on Arabic dialects, see Ingham (1982:4-5) and Kāmil (1968).
4 Kaye & Rosenhouse (1997) examine differences and similarities between more than thirty Arabic dialects including what they consider ‘peripheral’ dialects such as Uzbeki, Maltese, Juba, Afghan, and the extinct Andalusian, among others. Despite its brevity, their treatment of the qaaf is quite informative. In addition, their findings on the major variants of this sound are confirmed by this study.
5 Arabic dialects of Mauritania, Western Sahara, Somalia, Djibouti, and Comoros are not included in this study for lack of informants.
6 Modern Standard Arabic is primarily a written medium. However, educated Arab speakers make use of its oral version in a number of official contexts such as educational, political, media, journalistic, and religious settings.
7 It is worth noting here that the terms ‘substitute’, ‘variant’, and ‘reflex’ are used in this study interchangeably with no theoretical implications. The literature includes different meta-terms depending on one’s framework. Thus Trubetzkoy (1969) uses the term ‘merger’ for the variants while Kaye (1997) considers them ‘reflexes’ of the Old Arabic qaaf.
The examples in (1) and (2) show the verb qaal (قال) ‘say’ and the noun qalb (قلب) ‘heart’ pronounced in six different ways. The main difference involves the uvular stop qaaf, pronounced as a voiceless uvular stop [q] in (1a) and (2a), a voiced velar stop [g] in (1b) and (2b), a voiceless glottal stop [ʔ] in (1c) and (2c), a voiceless velar stop [k] in (1d) and (2d), a voiceless alveopalatal affricate [č] in (1e) and (2e), and a voiced uvular fricative [ġ] in (1f) and (2f).

Arabic dialects appear to differ according to the variant or variants utilized in day-to-day speech. In Qatari Arabic, for example, the voiced velar stop variant [g] is used as in (1b) and (2b) above, and consistently so in words exhibiting the phoneme qaaf in Standard Arabic. As such, it replaces the qaaf in most if not all lexical items independently of its phonological environment (see Abdel-Jawad 1981:172). The examples in (3a)-(3f) illustrate such pervasive use.

(3) a. giddaam  ‘front’ (Mustafawi 2005)
    b. graab    ‘near’
    c. rgaag    ‘transparent/delicate’
    d. ʔagrab   ‘closer’
    e. rifiag   ‘a friend’
    f. ʔadagg   ‘thinner/smaller’

Although the items in (3a-f) cannot be a representative sample of the spread of the voiced velar stop [g] in Qatari Arabic, they nevertheless illustrate such unconditioned, widespread usage. (3a) and (3b), for instance, show the use of [g] in word-initial position; (3c) and (3d) illustrate its use in medial position; and (3e) and (3f) clearly show the use of [g] in word-final position.8 The words in (3) and those in (1) and (2) all appear with the uvular qaaf in Standard Arabic; it is therefore safe to conclude that Qatari Arabic simply replaced the uvular [q] with the velar [g].

Cairene Arabic, on the other hand, replaces the qaaf with the voiceless glottal stop [ʔ], the hamza. This is illustrated by the set of examples in (4a-f):

(4) a. ʔalb    ‘heart’ (Kaye & Rosenhouse 2005:269)
    b. baʔara  ‘cow’
    c. waʔt    ‘time’
    d. ʔaal    ‘said’
    e. ʔamar   ‘moon’
    f. ʔahwa   ‘coffee’

8 See Mustafawi (2005) for a discussion on an optional affrication process whereby the [g] changes to [dž].
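The six pronunciations described for (1) and (2), together with the single-variant patterns reported for Qatari and Cairene Arabic in (3) and (4), can be restated as a small lookup table. The sketch below is illustrative only: the names `Reflex`, `REFLEXES`, `SINGLE_VARIANT_DIALECTS`, `describe`, and `stops` are hypothetical, the feature labels are copied from the prose description above, and the glottal stop (the hamza) is written ʔ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reflex:
    """One pronunciation of the qaaf, with the features named in the text."""
    symbol: str
    voicing: str
    place: str
    manner: str


# Keys (a)-(f) mirror the labels of examples (1) and (2).
REFLEXES = {
    "a": Reflex("q", "voiceless", "uvular", "stop"),
    "b": Reflex("g", "voiced", "velar", "stop"),
    "c": Reflex("ʔ", "voiceless", "glottal", "stop"),
    "d": Reflex("k", "voiceless", "velar", "stop"),
    "e": Reflex("č", "voiceless", "alveopalatal", "affricate"),
    "f": Reflex("ġ", "voiced", "uvular", "fricative"),
}

# Dialects presented above as using a single reflex (illustrative structure).
SINGLE_VARIANT_DIALECTS = {"Qatari": "g", "Cairene": "ʔ"}


def describe(symbol: str) -> str:
    """Return the phonetic description of a reflex given its symbol."""
    for reflex in REFLEXES.values():
        if reflex.symbol == symbol:
            return f"[{reflex.symbol}] {reflex.voicing} {reflex.place} {reflex.manner}"
    raise KeyError(symbol)


def stops() -> list[str]:
    """List the reflexes that remain stops, in the order (a)-(f)."""
    return [r.symbol for r in REFLEXES.values() if r.manner == "stop"]


if __name__ == "__main__":
    for dialect, symbol in SINGLE_VARIANT_DIALECTS.items():
        print(dialect, "->", describe(symbol))
```

Keeping the features as separate fields makes it easy to ask simple questions of the inventory, for instance which reflexes keep the stop manner of the original qaaf and which do not.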
While a number of dialects exhibit one variant of the qaaf, such as the [g] in Qatari Arabic and the [ʔ] in Cairene Arabic above, others show two or more variants. Bahraini Arabic, for example, shows the use of the voiced and voiceless velar stops [g] and [k], respectively. This variation is illustrated in the following examples:

(5) a. gaber     a’. kabr     ‘tomb’ (Holes 2001)
    b. gabil     b’. kabil    ‘before’
    c. gatt      c’. katt     ‘lucerne grass’
    d. gatal     d’. katal    ‘kill’
    e. gidar     e’. kadar    ‘be able’
    f. giddaam   f’. kiddaam  ‘in front (of)’

The examples in (5) clearly show two commonly used variants of the qaaf in Bahraini Arabic, namely the voiced velar stop [g] as shown in (5a-f), and the voiceless velar stop [k] as shown in (5a’-f’). The example in (6) from a glossary entry further illustrates this dichotomy.

(6) Q-W-L gāl, and kāl (B villages9) (u)/vt vn gōl, kōl (B villages) / 1a say, tell. šgit liha? What did you say to her? ana kāyil lēhum ams… I told them yesterday… (Holes 2001:440)

The lexemes gāl and kāl in (6) above appear quite similar to such English examples as ‘schedule’ pronounced with an initial cluster /sk/ and ‘schedule’ pronounced with the voiceless palatal fricative /š/; both have the same denotation but differ in pronunciation according to geographical location. While the latter is commonly used in England, the former is used in the United States. Thus, similar to the sk/š variation, gāl and kāl cater to two distinct communities. However, unlike the English case, the Bahraini variants reflect differences in ethnic origin and religious background (Holes 2001:XXXIX-XL).

9 In order to account for this variation in the Bahraini dialect ‘glossary’, Holes classifies the variants on the basis of the speech communities, referring to them alphabetically (i.e., A community and B community, referring to Arab settlers originally from Saudi Arabia, and Baħārna, the original inhabitants of Bahrain, respectively). For more details see the Glossary Guidance Notes, xlix; for phonological and morphological differences involving A and B speech communities, see Holes (2005:XXVIII-LIII).

1.2 Corpus and Methodology
The data in this study were collected mainly through the elicitation technique. A questionnaire was distributed to more than two hundred native Arabic speakers of eighteen Arabic dialects, some through e-mail and some through personal contact, and the responses were collected in the same way. However, only 120 questionnaires were collected, eleven of which were discarded due to either incompleteness or lack of understanding.10 Thus, 109 questionnaires were retained, which I followed up with individual phone conversations, e-mail correspondence, and/or face-to-face interviews whenever clarification and/or further information was deemed necessary. The majority of respondents are undergraduate and graduate college students enrolled at the American University of Sharjah (AUS) in the United Arab Emirates and the University of Sfax in Tunisia, and college teachers at AUS and in the United States, each a native speaker of one of the eighteen Arabic dialects. Students’ ages varied between 19 and 35 (mean 27), while the teachers were between their early forties and late fifties (mean 50). Genderwise, the majority of teachers were male (23/25), while students were mixed with a slightly higher female ratio (54% vs. 46%). As for the time line of data collection, while most questionnaires were collected during the months of November 2004, December 2004, and January 2005, the final compilation and interviewing processes lasted until April of 2005, a total of five months. During this period, a minimum of five thoroughly filled questionnaires per Arabic dialect, from different locales and representing different linguistic communities, were checked.
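The questionnaire figures reported above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the stated numbers; the variable names are hypothetical, and "more than two hundred" is treated as a lower bound of 200, which is an assumption made for illustration.

```python
# Figures as stated in the methodology paragraph.
DISTRIBUTED_AT_LEAST = 200   # "more than two hundred": a lower bound, not an exact count
COLLECTED = 120
DISCARDED = 11
DIALECTS = 18
MIN_PER_DIALECT = 5          # minimum of thoroughly filled questionnaires per dialect
MALE_TEACHERS, TEACHERS = 23, 25

retained = COLLECTED - DISCARDED
# Because more than 200 questionnaires went out, the true return rate is below this value.
return_rate_upper_bound = COLLECTED / DISTRIBUTED_AT_LEAST
# Questionnaires needed if every dialect contributes exactly the stated minimum.
minimum_needed = DIALECTS * MIN_PER_DIALECT
male_teacher_share = MALE_TEACHERS / TEACHERS

print(retained)                              # 109
print(f"{return_rate_upper_bound:.0%}")      # 60%
print(minimum_needed, minimum_needed <= retained)  # 90 True
print(f"{male_teacher_share:.0%}")           # 92%
```

The 109 retained questionnaires are thus enough to allow the stated minimum of five per dialect across all eighteen dialects, with 19 to spare.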
In addition to the survey, I video-recorded five interviews with five college professors, each a native speaker of a different Arabic dialect: Jordanian, Egyptian, Palestinian, Moroccan, and Tunisian.11 The questionnaire and the interviews sought to identify the four major aspects of the research project: (i) the variant or different variants of the qaaf in each particular dialect, (ii) the geographical distribution of each variant if more than one, (iii) possible correlation between each variant and socio-economic attributes, and (iv) attitudes (favorable, neutral, or unfavorable) toward the use of one or more variants.12

10 A few respondents did not limit their answers to their respective dialects; they included different Arab countries, which invalidated their responses. Others had little knowledge of the linguistic situation in their respective countries because of the limited time they had resided in them; they could not therefore answer a number of specific questions.
11 For lack of space, we are unable to include the transcripts of the interviews; they may, however, appear in a different publication.
12 For space limitations, the current paper reflects on the first two aspects.
In addition to the survey and the interviews, the current literature has been instrumental in confirming our findings and filling the various information gaps that our informants could not address.13

1.3 The phonetics of the qaaf and its variants
Classified within “the difficult group…which takes some time to master” (Holes 1994:8), the phoneme /q/ is a voiceless sound produced “from further back in the mouth – from the uvula, to be exact” (Holes 1994:10). As such it is a voiceless uvular stop.14 For reasons including “dialect mixing and processes of koineization” (Behnstedt 2006:596), among several others, the uvular stop has undergone a number of changes within and across various Arabic speech communities. These changes involve a number of phonological processes such as raising, lowering, and affrication. In Bahrain, for example, Holes (2006) mentions four different reflexes of the qaaf: the voiced velar stop [g], the voiceless velar stop [k], the voiced uvular fricative [ġ], and the voiced palatal affricate [ž]. The same four variants are observed in Iraqi Arabic in addition to the voiceless uvular fricative [x] (Abu-haidar 2006). According to our survey, Palestinian Arabic exhibits five variants of the uvular qaaf: [q], [g], [k], the glottal stop [ʔ], and the voiceless palatal affricate [č]. Table 1 summarizes the eight attested variants of the uvular qaaf. Apart from the voiceless uvular fricative [x], the remaining seven variants have been identified by our informants.

/qaaf/ (قاف): Voiceless Uvular Stop

Variant   Description                   Arabic spelling
[q]       voiceless uvular stop         ق
[g]       voiced velar stop             ڤ
[ʔ]       voiceless glottal stop        أ
[k]       voiceless velar stop          ک
[č]       voiceless palatal affricate   تش
[ž]       voiced palatal affricate      دج
[ġ]       voiced uvular fricative       غ
[x]       voiceless uvular fricative    خ

Table 1. Variants of the Uvular qaaf

2. Variants of the qaaf in Northwest and Northeast Africa
According to our survey, while all three northwest African countries15 (Morocco, Algeria, and Tunisia) exhibit the use of more than one variant of the qaaf, namely the uvular [q] and the voiced velar [g], the northeast Arabic communities, including those in Libya, Sudan, and Upper Egypt, constitute a harmonized block, for they exhibit the use of a single variant of the uvular /q/, that is the voiced velar [g], hence the West/East division. Our northwest and northeast African informants were unanimous on the use of the two variants [q] and [g] in Morocco, Algeria, and Tunisia and on the sole use of the voiced velar variant [g] in Libya, Sudan, and Upper Egypt, respectively.

In Morocco, the situation of the qaaf and its variants appears to be quite complex.16 While the voiceless uvular stop [q] and the voiced velar stop [g] appear to be the two major reflexes of the qaaf within the majority of Moroccan Arabic speech communities, the glottal stop variant, the hamza [ʔ], and the voiceless velar stop variant [k] are quite frequently used by a number of speech communities of Moroccan Arabic. Our respondents are undivided on the use of all three variants, that is [q], [g], and [ʔ], in Morocco, with a relatively mixed distribution within major Moroccan urban centers. Thus, while the uvular variant [q] continues to mark the speech of residents of major cities such as Tetouan, Tangier, Kenitra, Sale, the capital city Rabat, Casablanca, Fes, and Taza, the voiced velar stop [g] is observed in the southern, central, and eastern parts of the country. The southern city Agadir, the central city Marrakech, and the northeastern city Oujda, for instance, are quite often cited as predominantly [g]-dialect cities. A third reflex of the qaaf is the glottal stop [ʔ]. While the majority of respondents acknowledge its presence, they appear to differ as to its geographic distribution. Some maintain that it is limited to the Fes and Taza Arabic speech communities, while others observe a much wider distribution, especially within such large cities as Rabat and Casablanca in addition to Tetouan and Tangier. This is probably due to the fact that such variation is found within certain communities not confined to a particular location. Heath (2002), for example, reports that in the cities of Rabat, Meknes, and Fes all Jewish dialects “regularly pronounce Classical Arabic qaaf as glottal stop [ʔ]” (2002:22). However, all respondents agree that the city of Fes is representative of the [ʔ]-dialect. Another variant of the qaaf in Moroccan Arabic is the voiceless velar stop [k]. A respondent from Casablanca confirms its use within some elderly and some Jewish communities. Other respondents are simply not aware of such a variant. Heath (2002:142) notes the use of this variant within Jewish dialects, but cautions that “closer phonetic studies are needed to verify that *q and *k have actually merged in these dialects….”

13 The age range and the socioeconomic status of our informants were limited. We could not, for example, cover the speech of older generations, different social classes, and/or various ethnic groups. Such information was detailed in Holes (2001, 2005, 2006) for Bahraini Arabic, Heath (2002) for Moroccan Arabic, Kaye & Rosenhouse (1996) for a number of other Arabic dialects, among others.
14 Al-Mutlabī (1978:103) mentions a voiced and a voiceless version of the uvular qaaf, suggesting that the former is the older form. Similarly, Ingham (1982:XX-XXI) describes the qaaf of the North Arabian dialects as voiced or voiceless. However, Suleiman (1992:130) describes the uvular stop as voiced (mažhūr), and so does Ingham (1995:15) for the qaaf of Najdi Arabic. Heath (2002:141) observes that the voicing of the uvular stop depends on context. The current literature lacks consensus, and the status of voicing might therefore necessitate further investigation.
15 The geographical distribution of the qaaf variants suggests a few adjustments to the common West/East division of Arabic-speaking countries/communities. According to our survey results, the geo-dialectal map of the qaaf suggests the treatment of Morocco, Algeria, and Tunisia as one block, and Libya, Sudan, and Upper Egypt as a second block. For convenience, we shall refer to the former as Northwest Africa and the latter as Northeast Africa, excluding therefore such Northeast African countries as Somalia, Eritrea, Ethiopia, and Djibouti. Kaye (1997:265) observes a number of vocalic changes along the line of this division. See also Ingham (1982:1) for a creative phono-morphologically based classification of Northeast Arabian dialects. In addition, and from a historical perspective, the northern parts of northwest African countries, namely Morocco, Algeria, and Tunisia, were known as the Jaziret el-Maghrib or “Island of the West” (Clark 1982:377).
16 In order to identify the variants of the uvular qaaf, our questionnaire makes use of the verb qaal “say” as an illustrative sample. While the majority of respondents found it helpful, our Moroccan informants sought further clarification because of the widespread use of the voiced velar sound [g] with this particular lexical item across various regional Moroccan dialects. In other words, while an urban dialect and a rural dialect would both use the verb ‘qaal’ with the voiced velar stop [g], they would greatly differ with respect to most other lexical items having the uvular qaaf. This reaction prompted us to provide specific clarification to our Moroccan respondents and interviewees. Heath (2002:144) confirms such widespread use of the velar [g] with this lexical item, stating that “Both –qul and –gul are common in the koiné, but my impression is that –gul is gradually spreading.”

In Algeria, three variants of the qaaf, namely the uvular [q], the voiced velar [g], and the voiceless glottal stop [ʔ], are observed.
The results of our survey show a clear distinction: the uvular stop variant is used in such major Algerian cities as Constantine in the northeast, the capital Algiers in the north central region, and Oran in the northwest, while the voiced velar stop [g] is omnipresent within Arabic speech communities residing in the central, southern, eastern, and western parts of the country. According to our Algerian respondents, central cities such as Ghardaïa and El-Golea; eastern cities such as Touggourt, El-Oued, and Tebessa; western cities such as Tindouf, Beni-Abbas, and Bechar; and southern cities such as Tamanrasset all make use of the voiced velar stop variant [g]. The glottal stop variant was mentioned by one informant who appears to have lived in the city of Tlemcen, where it is observed. The presence of such a variant is confirmed by a number of studies. Marçais (1977:11), for example, observes that in Algeria, “the glottal stop is used in Tlemcen and amongst the Jewish-Algier communities.” Likewise, Murtādh (1981) notes that residents of the northwestern city Tlemcen make use of the glottal stop variant “similar to the Fassi and the Egyptians” (1981:12).
In addition to the three variants [q], [g], and [ʔ], the current literature mentions a much more marked variant, the voiceless velar stop [k]. Marçais (1977:11) observes that in areas such as Djidjelli and other Jewish communities, the variant [k] is common. Murtādh (1981) also notes the use of the voiceless velar [k] “in some speakers from Msirda and the very far North West Coast of Algeria” (1981:12). Similarly, Kaye & Rosenhouse (1996) note the use of the variant [k] within the Algerian Jewish communities. In sum, while the variants [q] and [g] appear to be quite widespread, the other reflexes [k] and [ʔ] are observed in Algeria, but seem to be limited to particular communities and locations respectively.

In Tunisia, the voiceless uvular stop variant stretches from the coastal central city of Sfax all the way to the north central city of Bizerte, covering such other major coastal cities as Mahdia, Monastir, Sousse, Hammamet, Nabeul, and the capital Tunis. The voiced velar variant [g], on the other hand, is observed elsewhere in the southern, central, and northwestern areas of the country. Southern towns and cities such as Remada, Tataouine, Medenine, Kebili, Tozeur, Gabes, and Gafsa, and central ones such as Sidi Bou Zid, Kasserine, and Kairouan, use the variant [g]. Similarly, northwestern cities such as Dougga, El-Kef, Siliana, Jendouba, and Beja make use of the same variant [g] according to our respondents.17 None of our respondents mentioned other variants, such as [k] or [ʔ], of the kind found in Morocco and Algeria, nor did we come across any literature confirming the existence of such variants.18

In the northeast African regions, mainly Libya, Sudan, and Upper Egypt, Arabic-speaking communities exhibit much less variation of the uvular stop qaaf. In fact, the responses of our informants were quite straightforward in highlighting the sole use of the voiced velar stop [g] in all parts of each region. The Libyan Arabic informants all selected [g] as the qaaf variant, selected “all parts of the country” for its geographic distribution, and cited such cities and towns as Mizda, Zawiya, Zuwara, Mizrata, Harat-Zuwaya, the capital city Tripoli, and the northeastern city Bengazi, where the voiced velar stop variant [g] is attested. Sudanese Arabic respondents reacted in much the same way. They unanimously selected the velar stop [g] as the lone variant in Sudan, and noted its wide geographical distribution all over the regions where Sudanese Arabic is the medium of communication. Central cities and towns such as Halfa, Atbarah, Kassala, Oumdurman, Al-Ubayyid, Al-Fashir, and the capital city Khartoum were highlighted by most informants. When asked about other possible variants, especially the voiced uvular fricative [ġ], my Sudanese colleague insisted that it may occur in some words with some speakers, but that it is not a variant adopted by Sudanese Arabic speakers. Interestingly, while asserting that the uvular qaaf is fronted in Sudanese Arabic to a “dorso-velar position”, resulting in a sound similar to the English [g], Kaye (1976:7) questions Amery’s sources for considering the [ġ] a variant of the qaaf: “I do not know the source for Amery’s information concerning his claim that g is not only pronounced as the English hard g in ‘go’, but it is also pronounced as /ġ/, particularly in the provinces north of Khartoom and on the Blue Nile.” Kaye adds: “This may have been true then for certain lexemes or for certain idiolects, yet I never heard this myself and none of the later writers mention it.” In a much earlier work, Trimingham (1946:3) asserts that “In Sudan Arabic this sound (qaaf) is pronounced like the English hard g in ‘go’”. Finally, in his descriptive analysis of Sudanese Arabic, Hamid states that the uvular stop /q/ “has completely disappeared from SCA and is replaced by either /g/ or /k/”19 (1984:16). Thus, current research on Sudanese Arabic appears to support to a large extent the results of our survey, for it highlights the primacy of the voiced velar stop variant as the major reflex of the uvular qaaf.

As for the situation in Upper Egypt, it does not appear to differ from either of the first two, Libya and Sudan. Inhabitants of cities, towns, and oases such as Minya, Assyut, Sohag, Luxor, Aswan, Qara, Siwa, Paris, Kharga, Dakhla, Buhariya, and Farafra all make use of the voiced velar stop [g]. Our Egyptian informants were unified in this respect. “The voiced velar stop [g] is a hallmark of the Saiid (Upper Egypt)”, remarks one of our interviewees, Dr. Abdel-Malek. Gadalla states that “in Upper Egypt and some rural areas in Lower Egypt, the voiced velar stop /g/ is used in the place of /q/” (2004:23). The omnipresence of the voiced velar stop [g] as a major reflex of the Standard Arabic qaaf in Upper Egypt is therefore indisputable.

17 An exception to this coastal divide is the city of Kairouan, located about 100 miles away from the East coast. According to Dr. Ghazali, one of our interviewees, the use of the uvular variant [q] is limited to the original inhabitants of the city, who constitute a minority given the late expansion with migrants from other communities whose dialects exhibit the [g] variant.
18 Skik’s (2000) investigation confirms our findings. In addition, it identifies three small areas where the two variants coexist and provides a historical explanation for such contexts.
19 The use of ‘or’ is quite misleading, for it suggests an even distribution. In fact Hamid’s observation does not seem to be supported by his own data, as summed up at the end of his thesis, which includes 49 words containing the Standard Arabic qaaf. In their corresponding Sudanese Arabic forms, 43 words appear with the velar /g/, four words retain the uvular /q/, and only two words appear with the voiceless velar /k/, namely ‘katil > qatil’ (p. 235) and ‘wakt > waqt’ (p. 239). This low frequency of the voiceless velar stop [k] suggests that it is highly marked and should not have been considered an alternant of its voiced counterpart [g].
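The argument in footnote 19 rests on simple proportions, which can be made explicit. The sketch below uses only the counts quoted in that footnote (49 words in total: 43 with /g/, four with /q/, two with /k/); the names `hamid_counts` and `shares` are hypothetical.

```python
# Counts of Sudanese Arabic reflexes in Hamid's word list, as quoted in footnote 19.
hamid_counts = {"g": 43, "q": 4, "k": 2}

total = sum(hamid_counts.values())
shares = {reflex: count / total for reflex, count in hamid_counts.items()}

print(total)  # 49
for reflex, share in shares.items():
    print(f"/{reflex}/ {share:.1%}")
# /g/ 87.8%
# /q/ 8.2%
# /k/ 4.1%

# How many /g/ forms occur for each /k/ form in the list.
print(hamid_counts["g"] / hamid_counts["k"])  # 21.5
```

On these figures /k/ accounts for roughly one word in twenty-five, which is the sense in which the footnote calls an "either /g/ or /k/" description misleading.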
In sum, while the two major variants of the qaaf, namely the uvular [q] and the voiced velar [g], divide urban from non-urban Arabic-speaking communities in Morocco, Algeria, and Tunisia,20 the lack of such variation in Libya, Sudan, and Upper Egypt gives prominence to the velar stop [g], which appears to be a speech feature of all Arabic-speaking communities there, whether they reside in cities, towns, or villages. The variants [k] and [;], while unattested in Tunisia, Libya, Sudan, and Upper Egypt, appear within certain communities in Algeria and Morocco. As such they are quite marked, which explains their absence in the responses of a number of our respondents.
3. Variants of the qaaf in the Eastern Mediterranean Region (EMR)
According to the results of our survey, Arabic-speaking communities residing along the east coast of the Mediterranean Sea, from Lower Egypt to Aleppo in northern Syria by way of Palestine, western Jordan, and Lebanon, all share common linguistic features relative to the use of the qaaf and its variants. Thus, major urban centers such as Cairo, Jerusalem, Amman, and Damascus exhibit the use of the voiceless glottal stop variant [;] of the qaaf. While all questionnaire respondents from Egypt, Palestine, Jordan, Lebanon, and Syria selected the glottal stop variant as the most common, a number of other reflexes of the qaaf were also mentioned. At the forefront of such reflexes is the voiced velar stop [g], followed by the voiceless uvular stop [q], the voiceless velar stop [k], and the voiceless palatal affricate [č], along a descending frequency scale.
In Lower Egypt, for example, while the speech communities living in cities and towns stretching from Beni Suef and Fayoum in the south to Alexandria, Rosetta, Damietta and Port Said in the north, by way of Helwan, Giza, the capital city Cairo, Tanta, and Ismailia, make use of the glottal stop variant [;], the speech of a few communities exhibits the use of the voiced velar [g] and the voiceless uvular stop [q]. While only two out of the eight Egyptian respondents confirmed the use of the velar stop in some rural communities during the interviews,21,22 none mentioned the uvular stop variant. The latter is reported in Behnstedt (2006:588), who observes that the uvular stop /q/ is used in two locations in the Delta area, B. Migizil (Manzala) and Baltim, two small coastal communities residing between Damietta and Alexandria.
The situation of the qaaf variants within the Palestinian communities residing in the Palestinian, Israeli, Jordanian, and Lebanese territories is by far the richest and the most complex. On the one hand, the number of variants exceeds that of any other place within the boundaries of the eighteen countries under investigation; on the other hand, all Palestinian respondents have similar reactions to the status of each variant within the Palestinian communities. On the basis of our survey and interviews, five variants are attested: the voiceless glottal stop [;], the voiced velar stop [g], the voiceless uvular stop [q], the voiceless velar stop [k], and the voiceless palatal affricate [č]. While [;] and [g] are observed within cities in the West Bank, such as Nablus, Ramallah, Jerusalem, and Hebron, and in the Gaza Strip, such as Gaza City, Dayru l-Balaħ and Khan Yunis, the remaining variants [k], [q], and [č] are found amongst rural Palestinian communities. In the village of Barta’a, located 25 kilometers south of the West Bank town of Jenin, the variant [k] is commonly used. According to one of our informants, the variant [č] is commonly used in some villages south of the West Bank, by the Beersheba area inside Israel. To all our informants, however, the geodemographic distribution of the different variants is quite straightforward. The glottal stop is found within urban centers; the voiced velar stop is found among Palestinian Bedouins residing in urban centers or elsewhere; the voiceless velar stop, the uvular stop, and to a lesser degree the palatal affricate are found within the rural (Fallāhīn) communities.
20 In support of this salient isogloss, Heikki (2006) provides a historical argument stating that “in the eleventh century the originally Najdī tribes of Banu Sulaym and Banu Hilāl and the southern Arabian tribe of the Maaqil moved westward and occupied the North African plains and steppes.” He adds: “At present, Sulaymī Bedouin dialects are spoken in Libya, southern Tunisia, and northeastern Algeria; eastern Hilālī dialects in central Tunisia and eastern Algeria; central Hilālī in central and southern Algeria; northern Hilālī in the northern part of central Algeria; and Maaqilī dialects in northwestern Algeria and Morocco” (2006:609).
21 Unlike the other six respondents, both informants appear to have traveled to different places within both Lower and Upper Egypt; however, they could not identify a particular place.
The situation is similar in northwestern Jordan, where the majority of the population is of Palestinian descent, especially in the major urban centers such as the capital city Amman, Zarqa, and Irbid.23 Elsewhere in Jordan, that is, in the northeastern regions (e.g., As-Safawi, Ar-Ramayshid), the central towns (e.g., Karak, Tefila) and the southern areas (e.g., Maan, Aqaba), Jordanian Arabic exhibits the voiced velar stop [g]. It is a feature of “Jordanians of Jordanian origin,” as one of our informants asserts.
In Lebanon, apart from the areas where the Palestinian communities had settled, the glottal stop variant of the qaaf remains the most common one in the urban centers such as Tyre and Sayda in the south, the capital city Beirut in the center, and Tripoli in the north. Other variants such as the voiced velar stop [g] and the voiceless uvular stop [q] are noted by most of our Lebanese informants in the country’s rural areas. While the velar stop marks the speech of Lebanese rural communities, the [q] variant is mainly noted as a feature of the Lebanese Druze speakers, another religious group living side by side with Sunnis and Shiites, and residing mainly in mountainous districts.24
In neighboring Syria, the situation of the qaaf variants is quite similar. Thus, according to our informants, the major urban centers such as the capital Damascus in the south, the cities of Homs and Hama in the center, and Aleppo in the north have all adopted the glottal stop variant [;].25 The voiced velar stop variant [g] is observed within the speech of rural communities, while the uvular stop variant [q] appears within the Druze and the Alawite communities residing in the southwest (i.e., Druze Mountains by the Suwaida area) and on the northwest coast (e.g., Tartus, Beniyas, Latakia), respectively.
In sum, the urban centers along the East Mediterranean region stretching from the city of Beni Suef in Lower Egypt to the city of Aleppo in northwestern Syria all share the glottal stop variant [;] as a major isogloss which identifies urban inhabitants. Bedouin communities residing in the towns and villages make use of the voiced velar stop variant [g], while rural and a number of ethnic communities make use of the voiceless velar variant [k], the palatal [č], and the uvular stop [q], respectively.26
22 In his book Comparative Morphology of Standard and Egyptian Arabic, Gadalla (2004:2-3) confirms such use: “…In Upper Egypt and some rural areas in Lower Egypt, the voiced velar /g/ is used in the place of /q/.”
23 Abdel-jawad (1981:72) identifies three linguistic groups in Amman, which he refers to as (i) the /;/ dialect, (ii) the /k/ dialect, and (iii) the /g/ dialect, corresponding to the urban, rural, and bedouin/semi-bedouin communities, respectively.
24 Naim (2006:276) makes a similar observation: “Druze speech is characterized by a relative conservation of /q/ which alternates with /;/ in ordinary vocabulary.”
25 Darwish (2005:10) supports this conclusion by stating that “the letter qaaf is replaced in most (Syrian) cities by the glottal stop except for the pronunciation of the word ‘Koran kariim’.”
26 Major cities such as Gaza in Palestine and Latakia in Syria appear to have drifted somewhat, since the former adopts the voiced velar variant and the latter the voiceless uvular variant, while all other major cities in the whole region have adopted the glottal stop variant.
4. Variants of the qaaf in the Northern and Central Arabian Peninsula
According to our data, Arabic speech communities stretching from eastern Jordan and eastern Syria to northwestern Yemen,27 including those residing in Iraq, Kuwait, Saudi Arabia, Bahrain, Qatar, and the United Arab Emirates, share the use of the voiced velar stop [g], a reflex of the Arabic qaaf.28 As such, the voiced velar stop not only covers a large geographical space within the Arabian Peninsula and beyond, extending to Mesopotamia and the Levant and cutting across the national boundaries of a number of states within the region; it also extends invariably to speech communities residing in urban and non-urban centers alike. The capital cities Baghdad, Kuwait, Manama, Doha, Riyadh, and Sanaa of Iraq, Kuwait, Bahrain, Qatar, Saudi Arabia, and Yemen, respectively, all exhibit the use of the voiced velar stop [g]. In addition, towns and villages in eastern Jordan, eastern Syria and all aforementioned regions equally exhibit the use of the voiced velar. In southeastern and northeastern Jordan, for example, inhabitants of towns such as Al-Mudawwara and Ar-Ramayshid, along the Saudi Arabian and Iraqi borders respectively, use the voiced velar stop [g], as some of our respondents stated. Similarly, in eastern Syria, Arabic speech communities such as those residing in Abu Kamala and Al-Hol by the Iraqi border make use of the voiced velar stop variant. In Iraq and Saudi Arabia, the overwhelming majority of Arabic-speaking communities residing in northern, eastern, western, and southern towns and villages, in addition to those residing in cities, make similar use of the voiced velar stop. The situation is much the same in Kuwait, Qatar, the United Arab Emirates, northwestern Yemen, and to a certain degree Bahrain. Our 57 questionnaire respondents coming from the above nine countries were in agreement as to the unmarked status and the widespread use of the voiced velar variant in those regions.29 However, a number of other reflexes used alongside the voiced velar stop [g] were highlighted. In Iraq, for example, all six informants mentioned the use of the voiceless uvular stop [q] in the northwest region, especially in the city of Mosul.30 A similar use of the uvular was mentioned by some Emirati respondents in the northeast cities (e.g., Dibba Al Fujairah, Dibba Al Hisn) and villages (e.g., Sharm, Dhadna, Al Bedia) within the Fujairah area, an emirate bordering Oman.31 In addition to the uvular variant, the voiceless velar stop [k] was mentioned by two out of six Bahraini respondents, with a note restricting its occurrence to certain remote areas and to older generations.32
27 Unlike the rest of Yemeni speech communities, residents of the Northwest use the same [g] variant as those in central and northern Arabia. In his editorial note to Prochazka’s (1988) book Saudi Arabian Dialects, Ingham comments on the resemblance between the Saudi and northern Yemeni dialects by quoting Rabin, who suggests that “there was a continuous chain of dialects from south to north without any clear dividing line between Yemen and Hijaz.” This observation supports the inclusion of this part of Yemen within the boundaries of northern and central Arabia.
28 Holes examines Gulf Arabic and includes southern Iraq, Kuwait, Bahrain, Qatar, Saudi Arabia, and the United Arab Emirates. He similarly excludes Oman, but makes no comments about Yemen. The ethnic and topographic homogeneity of the region, in addition to some dialectal leveling, leads Holes to describe a “mixed, uncodified Gulf-wide Koine: a kind of linguistic common denominator” (1990:XII).
29 A number of comments from previous research appear to echo our conclusion. Ingham (1995:14), for example, observes that Najdi Arabic differs from Classical Arabic in its sound inventory. Amongst the new Najdi Arabic sounds is the voiced velar /g/, a result of fronting of the uvular /q/. Similarly, Qafisheh (1999:ix) observes that the uvular qaaf “occurs only in few words and classicisms of Educated Gulf Arabs.” Wagoner, Satterthwait, & Rice’s (1977) book on Saudi Spoken Arabic did not include the uvular [q] even as part of the key to pronunciation section; instead they included the variant [g].
30 Al-Bakry (1972:30) observes that “in Egypt and the Levant the letter qaaf is pronounced as a glottal stop (;) (i.e., qaal is pronounced ;aal), while in Hijaaz, Yemen, and Kuwait, and even in some areas in Iraq (except Mosul), they change it to a (g); thus, they would say ‘gaal’ instead of ‘qaal’.” (Emphasis mine; my translation.) In addition, Mansour (2006:232) observes that the uvular variant is also used within the Baghdadi Jewish communities. Another variant of the qaaf, the voiced pharyngeal fricative, is mentioned in Abu-Haidar (2006:272), but as a feature of the speech of elderly people of Bedouin origin in Baghdad and central Iraq.
31 Interestingly, according to Wikipedia’s website, “Fujairah is the only Emirate of the U.A.E that is almost totally mountainous.” Its geography, and mainly its vulnerability to isolation, might thus hint at its linguistic uniqueness. On the other hand, one might argue that the Emirate of Fujairah, being on the same coast as Musqat, the capital of Oman, exhibits similar linguistic variation in the same way that coastal North African cities, despite national boundaries, show similar linguistic patterns.
32 In fact, Holes (2001, 2005, 2006) shows that the [k] variant is a salient feature of the Baħārna speech communities in Bahrain; however, his data is mainly collected from illiterate elderly people, which seems to support our respondents’ remarks.
5. Variants of the qaaf in the Southern Arabian Peninsula
Along the coast of the Arabian Sea, which marks the southern border of the Arabian Peninsula, lie the last two countries under investigation, Yemen and Oman; the former lies on the western side and the latter on the eastern side. According to our data, all major cities in both Yemen and Oman along the Arabian Sea (Aden and Mukalla in Yemen and Musqat in Oman) exhibit the same linguistic variant, i.e., the voiceless uvular stop [q] as a reflex of the qaaf. As for the voiced velar variant [g], previously shown to distinguish the speech communities in central and northern Arabia, it appears in rather rural speech communities in Oman, especially those residing in the western part of the country by the Saudi Arabian border (but not exclusively, given the current mixture of both communities in the capital city Musqat, as most Omani informants indicated), and in the northwest of Yemen, where the capital city Sana’a is located.33 Thus the two reflexes of the qaaf, the uvular [q] and the velar [g], are both present in the speech of Arabic communities in southern Arabia, unlike its central and northern regions, where the uvular variant [q] is no longer attested as a basic phoneme of the respective Arabic dialects.34
33 The bidialectal situation in Yemen may easily be seen in Watson (1993), where she describes San’ani Arabic, in which all occurrences of the qaaf appear with the voiced velar reflex [g], and in Feghali (1990), where he describes Adeni Arabic, in which all occurrences of the qaaf appear with the uvular reflex [q].
34 Some of our Kuwaiti informants mentioned during the interviews that they would use the uvular [q] in two cases: (i) words borrowed from Standard Arabic (e.g., qanat, ‘channel’), and (ii) some words that start with the voiced uvular fricative [γ] in Standard Arabic, in which [γ] is replaced with [q] (e.g., γabi > qabi ‘stupid’). (See also Holes [1990:262-263] for a similar comment.)
6. Conclusion
Thus far, the analysis of the geographical distribution of the qaaf across the eighteen different political entities reveals a good deal of dialectal diversity, which in turn suggests a division of the entire area into five different regions, namely (i) the northeast African, (ii) the northwest African, (iii) the East Mediterranean, (iv) the northern and central Arabian, and (v) the southern Arabian region. Having examined the dialectal dynamics of each region, we have shown the extent to which nearby and distant regions share dialectal features. We can see, for example, the similarity between the northwest African region and the southern Arabian region in the current survival of the uvular stop [q] in both, not as a feature of a peripheral dialect but as a feature of a core dialect. Another similarity, between the northeast African region and the northern and central Arabian region, is quite revealing: the two regions share the use of the voiced velar reflex [g] of the qaaf. The eastern Mediterranean region, however, stands alone in its use of the voiceless glottal stop variant. We may therefore conclude that the variants [q], [g], and [;] are representative of three major dialects which cut across the boundaries of many present-day countries. As for the voiceless velar [k], it is best treated as a feature of peripheral dialects belonging to certain Arabic-speaking communities, and so is the voiceless affricate [č]. The following table summarizes the distribution of the major qaaf reflexes in the eighteen Arab countries. As the table shows, the voiced velar stop variant [g] is omnipresent; it is followed by the voiceless uvular stop [q], which appears in twelve of the eighteen dialects. The third variant is the glottal stop; it appears in seven of the eighteen Arabic dialects.
                [g]   [q]   [;]
Morocco          +     +     +
Algeria          +     +     +
Tunisia          +     +
Libya            +
Egypt            +     +     +
Sudan            +
Palestine        +     +     +
Lebanon          +     +     +
Jordan           +     +     +
Syria            +     +     +
Iraq             +     +
Kuwait           +
Saudi Arabia     +
Bahrain          +
Qatar            +
UAE              +     +
Oman             +     +
Yemen            +     +
Table 2. Distribution of Major Reflexes of the qaaf across Arabic Dialects
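The tallies quoted in the conclusion can be cross-checked against Table 2 with a small script. This is a sketch under stated assumptions: the dictionary below is a hypothetical re-encoding of the table, and for rows with only two marks the second mark is assumed to fall under [q] rather than [;], an assignment inferred from the regional discussion (for instance, [;] is reported as unattested in Tunisia) and not stated in the table itself.

```python
from collections import Counter

# Hypothetical re-encoding of Table 2: each dialect maps to the reflexes marked "+".
# Assumption: in two-mark rows the second mark belongs to the [q] column.
TABLE_2 = {
    "Morocco": ("g", "q", ";"),
    "Algeria": ("g", "q", ";"),
    "Tunisia": ("g", "q"),
    "Libya": ("g",),
    "Egypt": ("g", "q", ";"),
    "Sudan": ("g",),
    "Palestine": ("g", "q", ";"),
    "Lebanon": ("g", "q", ";"),
    "Jordan": ("g", "q", ";"),
    "Syria": ("g", "q", ";"),
    "Iraq": ("g", "q"),
    "Kuwait": ("g",),
    "Saudi Arabia": ("g",),
    "Bahrain": ("g",),
    "Qatar": ("g",),
    "UAE": ("g", "q"),
    "Oman": ("g", "q"),
    "Yemen": ("g", "q"),
}


def reflex_counts(table):
    """Count in how many dialects each reflex is marked as attested."""
    return Counter(reflex for reflexes in table.values() for reflex in reflexes)


counts = reflex_counts(TABLE_2)
for reflex in ("g", "q", ";"):
    print(f"[{reflex}] attested in {counts[reflex]} of {len(TABLE_2)} dialects")
```

Under this encoding the counts agree with the figures given in the text: [g] in all eighteen dialects, [q] in twelve, and [;] in seven.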
REFERENCES
Abdel-jawad, Hassan. 1981. Lexical and Phonological Variation in Spoken Arabic of Amman. Ph.D. dissertation, University of Pennsylvania.
_____. 1986. “The Emergence of an Urban Dialect in the Jordanian Urban Centers”. International Journal of the Sociology of Language 61:5.53-63.
Abu-haidar, Farida. 2006. “Baghdad Arabic”. Encyclopedia of Arabic Language and Linguistics 1.222-231. Leiden: E. J. Brill.
_____. 2006. “Bedouinization”. Encyclopedia of Arabic Language and Linguistics 1.269-274. Leiden: E. J. Brill.
Al-Bakry, Hazem. 1972. Fii Al-alfaaDi Al-'aammiyyati al-muuTaliyyati wa muqaaranatuhaa ma'a al-alfaaDi al-'aammiyyati fii al-;aqaaliimi al-'arabiyyati. Baghdad: MaTba'atu As'ad.
Al-Mutlabii, Ghalib. 1978. lahžat Tamiim wa ;aθaruhaa fii al-'arabiyya al-muwaħħada. Baghdad: Manšuuraat Wizaarat aθθaqaafa wa-l-funuun.
Ayoub, Abdurrahman. 1968. Al'arabiyya wa lahažaatuhaa. Cairo: MaTaaba' sižil al'arab.
Behnstedt, Peter. 2006. “Dialect Geography”. Encyclopedia of Arabic Language and Linguistics 1.583-593. Leiden: E. J. Brill.
Brustad, Kristen E. 2000. The Syntax of Spoken Arabic: A comparative study of Moroccan, Egyptian, Syrian, and Kuwaiti dialects. Washington, D.C.: Georgetown University Press.
Clark, Desmond. 1982. The Cambridge History of Africa. New York: Cambridge University Press.
Darwish, Ahmad. 2005. Al-Alfaađ Al'aamiyya As-Suuriyya: Diraasa wa Mu'žam wa žuðuur. Damascus: MaTaaba' Alif Baa;- Al;adiib.
Gadalla, Hassan. 2004. Comparative Morphology of Standard and Egyptian Arabic. Munich: Lincom Europa.
Feghali, Habaka. 1990. Arabic Adeni Reader. Wheaton, Madison: Dunwoody Press.
Fleisch, Henri. 1974. Etudes d’Arabe Dialectal. Beirut: Dar el-Mashreq.
Haeri, Niloofar. 1997. The Sociolinguistic Market of Cairo: Gender, class, and education. London: Kegan Paul International.
Hamid, Abdel Halim. 1984. A Descriptive Analysis of Sudanese Colloquial Arabic. Ph.D. dissertation, University of Illinois, Urbana-Champaign.
Heath, Jeffrey. 2002. Jewish and Muslim Dialects of Moroccan Arabic. London, New York: Routledge Curzon.
Heikki, Palva. 2006. “Dialects: Classification”. Encyclopedia of Arabic Language and Linguistics 1.604-613. Leiden: E. J. Brill.
Holes, Clive. 1990. Gulf Arabic. London and New York: Routledge.
_____. 1994. Modern Arabic: Structures, functions, and varieties. London, New York: Longman.
_____. 2001. Dialect, Culture, and Society in Eastern Arabia 1. Leiden: E. J. Brill.
_____. 2005. Dialect, Culture, and Society in Eastern Arabia 2. Leiden: E. J. Brill.
_____. 2006. “Baħraini Arabic”. Encyclopedia of Arabic Language and Linguistics 1.241-255. Leiden, Boston: E. J. Brill.
Ingham, Bruce. 1982. North East Arabian Dialects. London and Boston: Kegan Paul International.
_____. 1995. Najdi Arabic: Central Arabian 1. Amsterdam/Philadelphia: John Benjamins.
Kāmil, Murād. 1968. Al-lahažāt al-'arabiyya al-ħadīθa fii al-yeman. Al-maTba'a alfanniyya al-ħadiiθa.
Kaye, Alan. 1976. Chadian and Sudanese Arabic in the Light of Comparative Arabic Dialectology. The Hague, Paris: Mouton.
Kaye, Alan & Judith Rosenhouse. 1996. “Arabic Dialects and Maltese.” The Semitic Languages ed. by Robert Hetzron, 263-311. London: Routledge.
Mansour, Jacob. 2006. “Baghdad Arabic Jewish”. Encyclopedia of Arabic Language and Linguistics 1.231-241. Leiden, Boston: E. J. Brill.
Marçais, Ph. 1977. Esquisse Grammaticale de l’Arabe Maghrébin. Paris: Librairie d’Amerique et d’Orient.
Mustafawi, Eiman. 2005. “Affrication in Qatari Arabic.” Paper presented at the 19th Annual Arabic Linguistic Symposium, University of Illinois, Urbana-Champaign, April 1-3.
Murtadh, Abdelmalik. 1981. Al'aammiyya Al-žazaa;iriyya wa Silatu-haa bi-l-fuSħaa. Algiers: Aš-šarika Al-waTaniyya li-nnašr wattawzii'.
Naïm, Samia. 2006. “Beirut Arabic”. Encyclopedia of Arabic Language and Linguistics 1.274-286. Leiden: E. J. Brill.
Qafisheh, Hamdi. 1996. A Glossary of Gulf Arabic. Beirut: Librairie du Liban.
Prochazka, Theodore. 1988. Saudi Arabian Dialects. London & New York: Kegan Paul International.
Rosenhouse, Judith. 2006. “Bedouin Arabic”. Encyclopedia of Arabic Language and Linguistics 1.259-269. Leiden: E. J. Brill.
Rice, Frank & Majed Sa’id. 2005. Eastern Dialects with MP3 Files: An introduction to Palestinian Arabic. Washington D.C.: Georgetown University Press.
Skik, Hichem. 2000. “La prononciation de qâf arabe en Tunisie”. In Proceedings of the Third International Conference of AIDA ed. by Manwel Mifsud, 131-136. Malta: University of Malta Press.
Talmoudi, Fathi. 1981. Texts in the Arabic Dialect of Sūsa (Tunisia): Transcription, translation, notes and glossary. Göteborg: Acta Universitatis Gothoburgensis.
Trimingham, Spencer. 1946. Sudan Colloquial Arabic. London: Oxford University Press.
Trubetzkoy, Nikolai. 1969. Principles of Phonology. Berkeley: University of California Press. [1939.]
Wagoner, Merrill, Arnold Satterthwait, & Frank Rice. 1977. Spoken Arabic (Saudi). Ithaca, New York: Spoken Languages Services.
Watson, Janet. 1993. A Syntax of San'ānī Arabic. Wiesbaden: Harrassowitz Verlag.
World Wide Websites:
http://en.wikipedia.org/wiki/
http://lexicorient.com/e.o/atlas/index.htm
ARABIC SOCIOLINGUISTICS AND CULTURAL DIVERSITY IN MOROCCO∗
Moha Ennaji Rutgers University
1. Introduction
Morocco is characterized by multilingualism in the sense that many languages and varieties are used in different domains, viz., Classical Arabic, Standard Arabic, Moroccan Arabic, Berber, French, Spanish, and, recently, English. The multilingual dimension of Morocco has a direct impact on Arabic sociolinguistics, which is characterized by many paradoxes and contrasts. The language policy adopted is partial Arabization and Arabic-French bilingualism in education. Arabization as a policy has, however, its own limitations because French still dominates as a vector in socio-economic development. In the view of decision-makers, Standard Arabic alone cannot seriously challenge French in domains like education, administration, media, business, science and technology, for it is not yet entirely modernized and standardized (Wagner 1993:22). Today, given that Arabization is almost complete in primary and secondary public schools, all scientific subjects are taught in Arabic at these levels. However, Arabization is still an issue because Arabic does not open wide horizons, as there exist few job openings for Arabized graduates, compared to their French-educated peers (Ennaji 2005, ch. 9).
On another level, tension exists not only between French-Western and Arab-Islamic values and beliefs, but also within the Moroccan context between Berber and Arabic languages and cultures. In fact, Islam imposed the unity of religion and language, a concept based on the principle of the unity between the sacred text and Classical Arabic (see Khatibi 1983). To alleviate this tension, Berber has recently been officially recognized as part and parcel of the national cultural identity with the creation of the Royal Institute of Berber Culture. As a result, the authorities have decided to introduce Berber into schools and to increase the number of hours allotted to it in radio programs. Berber has also been introduced on television, particularly for news broadcasting. Thus, Morocco today experiences two sorts of revival: the revival of the Arabic language and Arab-Islamic culture, and the attempts to promote mother tongues, especially Berber.
* I would like to thank an anonymous reviewer, Linda Stump Rashidi, Fatima Sadiqi, and the audience at the ALS 19 for their valuable comments and remarks. Also, many thanks to Elabbas Benmamoun for his comments, generous help and support during and after the conference.
2. Diglossia, Triglossia or Quadriglossia?
Among the important features of multilingualism in Morocco, the phenomenon of diglossia is worth mentioning. This notion was first discussed by Marçais (1930-1931) and then by Ferguson (1959). Briefly, it specifies that in the Arab world there are two varieties of Arabic, a high variety (Classical Arabic) and a low one (Colloquial Arabic). Other researchers claim that today there are at least three varieties of Arabic (triglossia): Classical and Standard Arabic, which are high and intermediate respectively, and colloquial Arabic (the low variety). Being the language of Islam, Classical Arabic (CA) is the high variety; the Qur’an was revealed in Classical Arabic, which enjoys a great literary and religious tradition. Classical Arabic is a written language that is learned at school. Standard Arabic is the middle variety, which is codified and standardized; it is used in education, media, and administration.
The main distinction between Classical and Standard Arabic comes from the fact that Standard Arabic is more flexible in its phonology, morphology, and syntax; for instance, it lacks the case marking affixes (e.g., CA duruusun (lessons) > SA duruus (lessons)); unlike Classical Arabic, Standard Arabic exhibits a new alternative word order (Subject Verb Object) in addition to the Verb Subject Object word order. Standard Arabic has also borrowed a host of words and phrases from French (e.g., French colonel > SA al-kolonel; French surréalisme > surjalija). For more such examples, see Ennaji (2005; 1988). Standard Arabic also conveys modern mass culture, as it is usually considered the outcome of the Arabization process, which resulted in consolidating the place of Standard Arabic in sectors like education, administration, and media (Grandguillaume 1983).
The ‘low’ status of Moroccan Arabic can be ascribed to the fact that it is neither codified nor standardized; however, it is the variety spoken by the vast majority of the population. Berbers generally speak it as their second language. It is viewed by the masses and the elite alike as a corrupt form of Classical/Standard Arabic. Linguistically, it is characterized by vowel drop and the overuse of the schwa (e.g., Standard Arabic waqafa (stop) > wqəf in Moroccan Arabic; Standard Arabic haarib (has escaped) > harb or harəb); lexically, Moroccan Arabic differs from Standard Arabic (e.g., Standard Arabic nisaa; (women) vs. 'jalat, sariqa (theft) vs. XeTfa). Moroccan Arabic has borrowed immensely from French (e.g., French pompe > bumba, French cuisine > kuzina), as well as from Berber (e.g., the following Berber loans are used in Moroccan Arabic: tamara (hardship), tanZZart (carpentry)).
Moroccan Arabic can be divided into urban and rural varieties. In the north, we have the shamali (northern) dialect, which is spoken in the areas of Tangiers, Chefchaoun, Tetouan and Larache. In central Morocco, there is the Fassi variety spoken in the areas of Fès and Sefrou. There is also the Moroccan dialect of Rabat and Casablanca. In the south, we have the Marrakeshi and Agadiri dialect that is much influenced by Tashelhit Berber; it is spoken in Marrakech, Essaouira and Agadir. In the Sahara, there is the dialect of Hassaniya.
Apart from a few lexical and phonological idiosyncrasies, these regional dialects are mutually intelligible to most Moroccans. Ferguson’s (1959) classification of Arabic varieties into high and low does not actually correspond to the linguistic situation in Morocco and the Maghreb at large, for we have three Arabic varieties which are in a triglossic relation: Classical Arabic, Standard Arabic, and Moroccan Arabic. Classical Arabic is used in the mosque, in the Ministries of Justice and of Islamic Affairs, in official speeches, and in classical poetry and literature. Contrary to what Ferguson claims, it is Standard Arabic rather than Classical Arabic that is employed in writing personal letters, in political or scientific discourse, and in the media and administration. Moroccan Arabic is used in informal settings, at home, in the street, with friends, etc. Thus, three distinct varieties co-exist so that we have today triglossia, as mentioned in Ennaji (1991, 2001):
Classical Arabic
Standard Arabic
Moroccan Arabic
Figure 1. Triglossia in Morocco
Following Ennaji & Sadiqi (1994:86) and Ennaji (2001), one may argue for the existence of ‘quadriglossia’ in Morocco and the Arab world, in the sense that, in addition to the three varieties above, a fourth variety, Educated Spoken Arabic (or Modern Moroccan Arabic), is used in the everyday colloquial style of learned people. It may be used as a lingua franca by Arabic speakers from different Arab countries or to address foreign speakers of Arabic. Educated Spoken Arabic is an elevated form of colloquial Arabic that is much influenced by the vocabulary and expressions of Standard Arabic. Here are a few examples of Educated Spoken Arabic:
(1) a. stamarrat d-dirasa ila TTamina. (Educated Spoken Arabic)
    b. bqina kanqraw ətta l tmənya. (Moroccan Arabic)
       continued the-study till eight
       “School went on until eight.”
(2) a. 'T)i-ni t-təqriir lli rsəl-ti l mudir. (Educated Spoken Arabic)
    b. žib li rrapport lli Siftti l ddirektur. (Moroccan Arabic)
       give-me the-report that sent-you to director
       “Give me the report that you sent to the director.”
However, Educated Spoken Arabic is, like Moroccan Arabic, essentially spoken and not used in writing; Educated Spoken Arabic is generally used on radio and television debates and interviews (see Ennaji 1995).
Classical
Standard
Educated Spoken
Moroccan
Figure 2. Quadriglossia in Morocco
Like Moroccan Arabic, Educated Spoken Arabic is neither codified nor standardized; in addition, it is not widely used by the Moroccan speech community. This fourth variety, which is used by educated people in their everyday speech, is not yet fully developed and widespread. It is a ‘polished’ form of Moroccan Arabic, whose lexicon is affected by that of Standard Arabic. Youssi (1983) refers to it as “Modern Moroccan Arabic”. Educated Spoken Arabic is usually heard on radio and television, and in academic circles. At times, lectures, talks, plays, and discussions are given in this variety. Thus, Educated Spoken Arabic adds a fourth dimension to yield a form of ‘quadriglossia’; that is, four varieties of Arabic are actually in use, with each variety having a set of functions and situations which it fulfills. However, given the high illiteracy rate (48% according to the official statistics of 2002), it can be stated that Educated Spoken Arabic is not widespread, as it is reserved to learned people. This form of quadriglossia constitutes a continuum where the four varieties of Arabic are in complementary distribution, with each serving specific domains of use and social functions. Furthermore, the long use of Modern Moroccan Arabic among the educated elite in the Moroccan linguistic scene results, as Ferguson argued, from the “communicative tensions” between two varieties, namely Moroccan Arabic and Standard Arabic, that are trying to serve the same functions. As to Classical and Standard Arabic, their heavy use in political and media discourses and the overlap of their domains of use allocates to them points in the continuum. Apart from Classical and Moroccan Arabic, which are both at the two extremes of the continuum, one
might see the middle varieties (Standard and Modern Moroccan Arabic) more in terms of a continuum than as two dichotomous varieties in these functional domains.

3. Standard Arabic-French Bilingualism

Standard Arabic-French bilingualism is intimately linked to education; the two languages are not the mother tongues of Moroccans, as they are learned only at school. A limited number of Moroccans are actually Standard Arabic-French bilinguals (25%), essentially those who hold a high school certificate or a higher degree, and who speak both languages fluently (see Santucci 1986, Poindexter 1991, and Elbiad 1991). However, the degree of competence in both languages depends on the level of education of each bilingual; in general, the higher the level of schooling, the more proficient the bilingual. In Moroccan public schools, the number of hours allocated to Standard Arabic at the primary and secondary levels is far greater than the number allocated to French. Overall, one third of the weekly classes are devoted to French. By the end of high school, students are expected to achieve a good mastery of both Standard Arabic and French. But in reality, many teachers, professionals, and decision-makers complain about the low standards of French in public schools. This low level is often ascribed to the impact of Arabization, which began in the early 1960s, and which emphasized Standard Arabic over French. Arabization is one among many reasons why many people prefer to send their children to private schools, where French instruction occupies an important place in the curriculum, beginning in kindergarten. In the following section, I look at the various forms, functions and domains of the use of French. I will focus on the opposition between Arabic-educated and French-educated segments of the population.

4. Language Tension

The Arabic varieties mentioned above, which are in a diglossic or a quadriglossic relation, are in a conflict situation.
In fact, one of the consequences of this relation is the difficulty in Arabic language teaching and learning; many students and intellectuals suffer from some kind of linguistic insecurity, for they have to make sure that they use the appropriate Arabic variety in the right context. When they are in a
formal setting, they use Standard Arabic; in a semi-formal setting they use Modern Moroccan Arabic; and in an informal situation they use Moroccan Arabic. For writing purposes, Standard Arabic is used, and for prayers, the use of Classical Arabic is compulsory. At times, there is interference from the colloquial Moroccan variety in formal contexts and in written texts. The difficulty increases when speakers want to move from the colloquial variety to the written standard variety. As a result, many educated Moroccan people resort to some sort of code-switching, where Moroccan Arabic and Standard Arabic are mixed. At times, speakers code-switch between Arabic and French for lack of the exact idioms (see Ennaji 2005, ch. 8; Sadiqi 2003, ch. 5). On another level, there is a strong tension between Classical Arabic and French, for they are used interchangeably to fulfill broadly similar functions. However, while French prevails as a language of science, Classical Arabic is regarded, by officials and non-officials alike, as a language of religion and ancient literature. Arabization in Morocco is still a controversial issue, as there are disagreements among all groups of protagonists. The traditionalists advocate full Arabization in education and administration and a return to a traditional lifestyle and Arab-Muslim roots. As for the modernists (Françisants), they reject systematic Arabization and favor the consolidation of French to help modernize the educational system. The nationalists broadly agree with the modernists because they believe that Arabic-French bilingualism can contribute to the modernization of the country and prepare it for a better future (cf. Ennaji 2002a). All in all, Arabization is intimately linked to nationalism, cultural identity and politics, and its advocates ignore the attitudes of the modernists or of the Berber activists who disfavor systematic Arabization.
Since its implementation in the 1960s, Arabization has rekindled the Berbers’ pride in their cultural identity, and subsequently led to their endeavor to secure national status for the Berber language (see Faik 1999). Along these lines, and as a result of Berber civil society’s struggle for the promotion of Berber, the Berber language has been taught in primary schools since September 2004.
5. Language Attitudes

The following discussion is based on the sociolinguistic surveys of language attitudes I carried out and published in Ennaji (2002a, 2005). The attitudes were elicited through interviews and questionnaires submitted to 124 participants (students, teachers, professionals, etc.). The findings show that 73% of respondents think that Standard Arabic is a school language, against 2% for Moroccan Arabic and 51% for French. Only 24% of students consider that they have a good or a very good mastery of spoken Standard Arabic, and 27% of them think they can write it well. Less than half of the students and about half of the teachers think they have average proficiency in Standard Arabic. For most Moroccans, Moroccan Arabic is a corrupt form of Arabic which is associated with everyday life and which is useless in formal settings or domains. As a result, Moroccan Arabic is excluded from schools and has no institutional support. Even literacy is acquired by learning to read and write in Standard, not Moroccan, Arabic. Students’ attitudes toward Moroccan Arabic are quite significant in that they reflect the general attitude held by officials and ordinary people alike. For instance, 87% of students disagree with the idea that Classical Arabic should be replaced by Moroccan Arabic. This attitude is due to the prestigious status of Classical Arabic, the language of the Qur’an, and to the fact that Moroccan Arabic is associated with illiteracy. According to the findings in Ennaji (1997), the introduction of Berber in schools is favored by almost all Berberophones and by 72% of Arabophones. However, while 75% of Berberophones consider their language to be rich and beautiful, only a small number of non-native speakers of Berber share this idea. On the other hand, many respondents state that Berber can be taught at all levels of education (see Ennaji 1997).
Arabophones view Berber as a ‘dialect’ which is not worth introducing in schools because it is neither standardized nor codified and is neither a language of wider communication, nor a language with a rich written literature. As to attitudes to French, they are for the most part favorable because French represents a ‘window on the world’, as it is associated with science and technology. However, Arabic-educated Moroccans favor Standard or Classical Arabic because it is the vehicle of a great Arabic
literary tradition and is a symbol of Moroccan cultural identity. French for them symbolizes Western culture, which conflicts with Arab-Muslim traditions, and could be a source of alienation and dependence.

6. Conclusion

Since independence, the debate has centered on two major oppositions. The first opposition is between Arabophones and Francophones. The former think that Standard Arabic represents Moroccan identity and cultural authenticity, while French is the language of the colonizer and the expression of Western culture. For the Francophones, Standard Arabic is inadequate for modern needs, whereas French is more adequate in science and technology. This opposition reflects a controversy that has often characterized two different visions of the world (Arab-Muslim and Western). The second opposition is between Arabophones and Berberophones. The Berberist discourse treats Standard Arabic as an archaic language which does not satisfy the exigencies of modernity, and as a reminder of the Arab conquest. For Berberophones, Berber and Moroccan Arabic are the mother tongues which express authenticity and people’s daily life, while French is associated with modernity and forward thinking. On another level, the Arabization policy has in a way failed to attain its objectives because it has not been well planned and has not taken into consideration the multilingual and multicultural context of Morocco and the necessity to modernize its system of education. A positive approach to Arabization would mean the use, alongside Standard Arabic, of foreign languages such as French, Spanish, and English, as well as the promotion of mother tongues, viz. Moroccan Arabic and Berber.
REFERENCES

Aljabri, M.A. 1995. The Question of Identity (in Arabic). Beirut: Publications of the Center for Arab Unity Studies.
El Biad. 1991. “The Role of Some Population Sectors in the Progress of Arabization in Morocco”. International Journal of the Sociology of Language 87.27-44.
Ennaji, M. 2005. Multilingualism, Cultural Identity and Education in Morocco. New York: Springer.
_____. 2002a. “Language Contact, Arabization Policy and Education in Morocco”. Language Contact and Language Conflict in Arabic ed. by Aleya Rouchdy, 323. London: Routledge/Curzon.
_____. 2001. “De la Diglossie à la Quadriglossie”. Languages and Linguistics 8:.964.
_____, ed. 1997. “Berber Sociolinguistics”. International Journal of the Sociology of Language 123.
_____, ed. 1995. “Sociolinguistics in Morocco”. International Journal of the Sociology of Language 112.
_____. 1991. “Aspects of Multilingualism in the Maghreb”. Sociolinguistics of the Maghreb. Special Issue. International Journal of the Sociology of Language 87.7-25.
_____. 1988. “Language Planning in Morocco and Changes in Arabic”. International Journal of the Sociology of Language 74.9-39.
Ennaji, M. & Sadiqi, F. 1994. Applications of Modern Linguistics. Casablanca: Afrique-Orient.
Faik, S. 1999. “The Status of Berber: A permanent challenge to language policy in Morocco”. In Language and Society in the Middle East and North Africa ed. by Yasir Suleiman, 137-153. London: Curzon.
Ferguson, C. 1972[1959]. “Diglossia”. In Language and Social Context ed. by Pier Paolo Giglioli (1972), 232-251. Harmondsworth: Penguin.
Ghallab, A. 1999. The Challenges of Francophony (in Arabic). Nationalist Texts Series. Casablanca: Afrique-Orient.
Grandguillaume, G. 1983. Arabisation et Politique Linguistique au Maghreb. Paris: Maisonneuve et Larose.
Khatibi, A. 1983. Maghreb Pluriel. Paris: Denoël.
Marçais, W. 1930-1931. “La Diglossie: Un pélérinage aux sources”. Bulletin de la Société Linguistique de Paris 76.1:61-98.
Poindexter, M. 1991. “Subscription Television in the Third World: The Moroccan experience”. Journal of Communication 41.3:26-39.
Sadiqi, F. 2003. Women, Gender and Language in Morocco. Leiden: Brill.
Santucci, J. C. 1986. “Le Français au Maghreb: Situation Générale et Perspectives d’Avenir”. In Nouveaux Enjeux Culturels au Maghreb ed. by J. Henry, 137-157. Paris: CNRS.
Wagner, D. A. 1993. Literacy, Culture, and Development: Becoming literate in Morocco. New York: Cambridge University Press.
Youssi, A. 1995. “The Moroccan Triglossia: Facts and implications”. International Journal of the Sociology of Language 112.29-43.
_____. 1983. “La triglossie dans la typologie linguistique”. La Linguistique 19.2:71-83.
THE GENDERED USE OF ARABIC AND OTHER LANGUAGES IN MOROCCO*
Fatima Sadiqi Sidi Mohamed Ben Abdellah University
* I acknowledge with gratitude the help of Elabbas Benmamoun, who invited me to present this paper at the 2005 ALS Conference.

1. Introduction

This paper considers language and gender in present-day Morocco with a focus on the gendered use of Standard Arabic. It highlights the fact that women in this country are ethnically, socio-economically, and educationally differentiated, and that this differentiation is reflected in their everyday language use. The prevalent Western view of Moroccan and Arab-Muslim women in general misses such distinctions and so overlooks the disparities between women (Sadiqi 2003). The argument made in this paper is that Moroccan women use the rich linguistic resources that are available to them either to perpetuate or to subvert the conventional gender roles assigned to them within Moroccan culture. Illiterate women use Moroccan Arabic and/or Berber for their personal and social expressions, and educated women use Standard Arabic and French in addition to one or both mother tongues. Moroccan women may also “switch” from one language to another. Regardless of their socio-economic status and educational level, Moroccan women are never linguistically passive; they assert themselves and negotiate power in a linguistically complex environment. This paper focuses on the agency of Moroccan women as demonstrated through language. Moroccan women’s linguistic agency is part and parcel of their struggle for self-assertion. The nature of women’s linguistic agency depends upon their socio-economic status and educational level. Women’s linguistic agency consists of strategies of communication that women use to maximize their chances of achieving gains in real-life contexts.

This paper is divided into four sections. The first section is a general presentation of the linguistic situation in postcolonial Morocco. The second section deals with the ways in which languages in Morocco interact with gender in everyday contexts. The third section focuses on monolingual (illiterate) women’s linguistic strategies of communication, and the fourth section concerns the communicative modes of multilingual (literate) women.

2. The Linguistic Situation in Morocco

Language has always been affected by history and culture. It is a strong component of one’s historical and cultural identity, as well as of the deepest layers of the personality (Lacan 1966, Foucault 1980). A particularly important phase in Morocco’s modern history is the era of French colonization (1912-1956). Language played a role in this era, as well as in postcolonial Morocco. The powerful, written languages were utilized by both the colonizers and the rulers of newly independent Morocco. The former did so to maintain control over the colonized and the latter to strengthen state-building. As for the oral and less prestigious languages, they are, up to the present time, still largely confined to the private sphere of the home and intimate settings. The French colonizers justified their occupation of Morocco by referring to it as a “civilizing mission” (mission civilisatrice). Consequently, French was introduced to Morocco as a “civilized” and “superior” language. It was used in most spheres of political power such as the government, the administration, and education.
This calculated strategy was part of the broader ideology of colonialism whereby the military supremacy of the French and their self-adopted role as enlighteners and decision-makers led to a construction and reading of the Moroccans as the inferior and backward other. This reading also established Western style modernity as the sole remedy for Morocco’s backwardness. In the name of civilizing the Moroccans while respecting the indigenous culture, the French elevated their own language and marginalized the Moroccan languages (and the men and women who used them).
The colonizers were aware of the deep cultural differences among the Moroccans and tried to exploit them. For example, in supporting the 1933 Dahir Berbère (Berber Decree), according to which Berbers did not need to abide by the Islamic Law and could instead use their local tribal laws, as well as creating the famous “Collège berbère” (Berber High School) in the town of Azrou and numerous Franco-Berber schools in the Atlas and the southern plains of Morocco, the French colonizers adopted the “divide to rule” strategy. The “divide to rule” strategy was also used in education. According to a “brochure” edited in 1994 by Patrick Cavaglieri, a teacher in the Casablanca Lycée Lyautey, when the French colonizers first entered Morocco in 1912, the country, like most Muslim countries, had a network of primary and secondary traditional education: 150,000 Moroccan pupils were in msids (Qur’anic schools) and 2,500 in medersas (advanced Qur’anic schools). Most of these pupils were male. Msids and medersas were better organized in cities than in rural areas. At the age of 12 or 13, the most brilliant pupils had access to yet more advanced learning in mosques or zaouias, where they mastered the fundamental principles of Arabic grammar and Islamic law as students of the prestigious university of Al-Qarwiyyin. In their conception of education in Morocco, the French wanted to form an intellectual elite to negotiate and collaborate with; hence they created what is referred to as “Ecoles de fils de notables” (schools of the sons of noblemen), where education was delivered to boys in French. It was only at the end of World War II that Arabic was used in these schools. These schools accommodated 1,468 pupils in 1913, 21,400 on the eve of World War II, and 314,800 by 1955. The successful pupils then continued their education in the second cycle of Muslim lycées which offered the “Baccalauréat marocain.” The number of Moroccan pupils in these lycées was 608 in 1938 and 6,712 in 1955.
As for French lycées, they were created in 1944 and started by admitting only European pupils. Gradually, small numbers of Moroccan students were accepted in these lycées, and in 1951, the Moroccan pupils constituted 12% of all students. Urban schools were created with lesser means for children of middle classes, such as the Franco-Muslim rural schools, which offered vocational training. However, such schools received only a limited number of students:
1,300 in 1938 and 7,500 in 1955. Other schools, such as the Universal Israeli Alliance, were also created. In parallel to these schools, the traditional Moroccan system of education was supported by the emergence of private Muslim schools, mostly conceived as a symbol of the nationalist movement. All in all, during the French Protectorate, both the indigenous Moroccans and the French colonizers focused on the education of boys, the former with the aim of creating strong male nationalists and the latter with the aim of creating adequate male interlocutors. After Morocco’s independence in 1956, the newly autonomous citizens sought to construct their own authenticity in the face of a disillusioning modernity that excluded them as real agents. The postcolonial era has witnessed the dilemma caused by this deep disillusionment. In this overall transition, women benefited far less than men. In fact, although women participated in the struggle for independence and although both boys and girls have had access to education in urban areas since right after independence, women did not accede to political power and their rate of illiteracy continued to be much higher than that of men. Women realized bitterly that the struggle for independence promoted the culture of the elite, strengthened Arabic as the language of the nation and Islam as the state religion, relegating women and the mother tongues they spoke to the private sphere. The postcolonial linguistic situation in Morocco is thus complex. It has resulted in establishing multilingualism as an important component of Moroccan culture (Sadiqi 2003). Multilingualism interacts significantly with ethnicity, gender, class and educational opportunities. Four major languages started to be used by men and women in Morocco: Standard Arabic, French, Berber, and Moroccan Arabic. 
Whereas the first two languages have written forms, are taught at school, and are perceived as “literate,” the latter two do not have written forms, are not taught at school, and are perceived as “oral” and “illiterate.” In spite of genuine efforts to teach Berber, the language is still largely excluded from school and from print. Being largely illiterate, Moroccan women, especially in Berberophone areas like the Souss, the Atlas and the Rif, are more closely associated with the oral languages, especially Berber. The geographical position of Morocco at the crossroads of Africa and Europe, its deep historical roots in Africa, its proximity to the
Western world (only 14 kilometers separate it from Spain), as well as its Mediterranean heritage, are factors that explain Morocco’s multilingualism and multiculturalism. The acquisition of literate languages (Standard Arabic and French) can be achieved only through schooling, and as job opportunities are possible only with degrees and mastery of literate languages, multilingualism is perceived in the Moroccan culture as a positive social promoter: it increases the individual’s potential for communication and opens up horizons so far as jobs and social ascension are concerned. As a component of Moroccan culture, multilingualism interacts significantly with other strong components of the culture (Sadiqi 2003). For example, the state religion, Islam, is closely related to Classical Arabic, the official language of Morocco, but not to Berber. Both Islam and Standard Arabic have been established as sacred in written history. These facts are translated politically in the 1962 Constitution: Article 1 of the Constitution stipulates: “Morocco is an Arab and a Muslim country,” Article 2 that “Islam is the official religion of the State,” and Article 3 that “The Arabic language is the official and national language of the State.” History, Islam and Standard Arabic have been gradually constructed by the media and other forms of public power as typical male domains in Moroccan culture (Sadiqi 2003). Being a literate type of high knowledge, the written history of Morocco has been largely recorded by men; the voices of women in this history are still barely perceptible. For example, the roles given to men in this history are often glorified and women are presented as supporters of men rather than independent actors in history-making. The history written by men is different from the one written by women, as shown in recent social history accounts of the struggle for independence, such as the book The Year of the Elephant by Leila Abouzeid (Sadiqi 2003).
As a result, the images of women in Moroccan history have, up to recent times, been exclusively presented from a male perspective. It is only after independence that women started to have access to education and to assert themselves in the public spaces. Similarly, although Moroccan women are overwhelmingly Muslim, they do not relate to religion in the same way as men. The fact that they have for centuries been excluded from the public sphere in
which mosques are located has distanced women from publicly practicing religion. As written history and Islam are closely related to Standard Arabic, a non-mother tongue that is learnt at school, history, Islam and Standard Arabic were withheld from women, the majority of whom are still illiterate up to the present time. The fact that multilingualism is power-laden means that it has significant social meanings and implications for gender dynamics in everyday interactions. The language and gender interaction in Morocco is also closely related to the social status of women, namely their geographical origin (urban vs. rural), their class (rich vs. poor), their age, and their level of education. For example, urban middle- and upper-class women have more access to education than rural (usually poor) women (Sadiqi 2003). The illiteracy rate among these women is 60% (48% in urban areas and 95.5 % in rural areas). Beginning in the mid-1980s, growing demands for human rights crystallized into demands for women’s rights and cultural (language) rights. Feminist projects in Morocco have been initiated and led by both women and men. These feminist projects may be broadly categorized as being liberal or Islamist. Each of these two trends may be further subdivided. Both trends defend Islam and denounce patriarchal customary practices. Further, modernity is claimed by both liberals and Islamists. Islamists adopt modernity to gain credibility inside and outside their country, and secularists adopt Islam for the same reason and to obtain an aura of authenticity. Both groups utilize European languages, especially French and English. Specific customs at home and tastes in clothing, furniture, and cuisine also represent both trends. Islamists oppose sexual independence and freedom but they adopt other modern views. Even the veil has undergone changes in meanings from a political symbol to a style of dressing. Ideals are appropriated and constantly negotiated according to specific aims. 
The media and unemployment make women in both trends polyvocal, multilingual, and complex, a fact that calls for constant historicization of the debates around women. Within Moroccan feminist projects and as a reaction to marginalization and exclusion, women began to rewrite their history. Most of these writings express women’s bitterness in the postcolonial era (Abou-Zeid 1983). Women felt betrayed as they had not benefited
from the struggle for independence despite their participation in it (Abou-Zeid 1983). This literature emerged from the peripheral, feminist, essentially oral-oriented testimonies that challenged the hegemonic narratives of linear historicism that has, up to now, legitimized the national elites and perpetuated gender and class discriminations. Women’s studies and gender studies programs have been established in Moroccan universities. (The first women’s studies program was established in 1999 at the University of Rabat and the first gender studies program was launched in 2001 at the University of Fes.) Efforts have been made to recover women’s voices of the past, to create more awareness of women’s present needs, and to prepare new sources on women in North Africa, including oral and written texts by women. The study of gender and language is therefore relevant to women’s goals in the postcolonial period. In this as in other searches for women’s voices, one should bear in mind the complex factors set into motion by colonialism, modernity, the search for authenticity and the relationship between the East and the West.

3. The Interaction of Language and Gender in Morocco

In principle, Moroccans may use one, two, and sometimes three or more languages in their everyday life. Gender influences language use: women often do not have the same choices as men. According to a study by the author (Sadiqi 2003), Berber is not only used more by women than men but it has always been associated with women. Berber may thus be termed a typically female language, while Standard Arabic is a typically male language, and Moroccan Arabic is both a female and male language although it is used more by men in rural areas. French is a typically urban language and is used more by women than men. This linguistic distribution has its roots in the social functions of the four languages in Morocco and the way these functions interact with gender perception and gender negotiation in everyday life.

3.1 Standard Arabic and gender

Standard Arabic has a special social function in Moroccan society and culture. It has always been a language of power, as well as a high and status-marked language of important social, religious, legal, and political rituals (Kaplan 1938). Dominant groups in a society achieve
3.1 Standard Arabic and gender Standard Arabic has a special social function in Moroccan society and culture. It has always been a language of power, as well as a high and status-marked language of important social, religious, legal, and political rituals (Kaplan 1938). Dominant groups in a society achieve
power mainly through control of high languages. As Mary Kaplan rightly puts it, “refusal of access to public language is one of the major forms of the oppression of women within a social class as well as in trans-class situations” (Kaplan 1938: 6). The strongest cultural aspect of Standard Arabic is the fact that it is perceived as the voice of Islam and the symbol of a glorious past. Since the arrival of Islam in Morocco around A.D. 700, Standard Arabic has remained the language of Arab identity, Arab literature and poetry, as well as religious scholarship and practice. Just after independence, Morocco joined the Arab League in which Standard Arabic is the lingua franca. The gender aspect of Standard Arabic resides in the fact that being the medium of the public expression of religion and politics, it is more accessible to and significant for men, as they are more closely defined in connection with public spaces such as the mosque, the government, etc. whereas women are considered to inhabit or rightfully occupy the private sphere—the home. Moroccan men have always identified with the public domain, which in turn has always defined the concept of maleness in Morocco (Rashidi 2000). As a result, although Moroccan women strongly feel that they belong to the official religion of the country, they do not really participate in public religious practices. This is reflected in the fact that their linguistic space in Standard Arabic (through which religion is expressed) is rather limited. For example, women in Morocco, and in the Arab-Muslim countries in general, do not publicly announce prayers, pray aloud, or pronounce religious formulae that accompany important religious rites. This explains the non-use of words like imama (female leader of prayers), fqiha (female religious consultant), muftiya (female religious legislator), musaliya (female leader of prayers), muqri’ah (female reader of the Qur’an), and mujewwida (female reciter of the Qur’an). 
While men attend the mosque and participate in the daily ritual of public prayers, women generally pray at home and seek religious baraka (blessing) in the holy sanctuaries of deceased religious saints. Being illiterate, the majority of Moroccan women are excluded from the spheres of public power as Standard Arabic is accessible only through schooling. Even when women are proficient in Standard
Arabic, they tend to use it less frequently than men because of men’s more positive attitude toward women’s proficiency in French (see section below on French). As a consequence of Moroccan women’s exclusion from the domains where Standard Arabic is publicly used, a general tendency to disqualify women as competent public speakers in the Moroccan society has developed. This state of affairs created an apparent paradox in Moroccan society: women are perceived by society at large as conservative in the sense that they preserve oral culture by speaking Berber and transmitting cultural values and non-conservative because they do not use the conservative means of public linguistic expression: Standard Arabic. The paradox makes sense politically in that it highlights the political status of oral and written mediums of language. It is true that both Standard Arabic and Berber are socially defined as conservative, but they are so in very different ways: whereas Berber is perceived as conservative because it expresses traditional oral literature and folklore, Standard Arabic is perceived as conservative because it perpetuates traditional written literature, history, and poetry in addition to the fact that it is the language of the Qur’an, the holy book of all Muslims. Consequently, women relate more closely to Berber and less to Standard Arabic. However, from the early 1980s on, educated women of the feminist movement (academics and politicians) have started to use Standard Arabic in the media. For example, writer Leila Abouzeid first wrote Year of the Elephant and later novels such as The Last Chapter in Arabic. Year of the Elephant was the first novel by a Moroccan woman to be translated from Arabic to English. The novel received critical acclaim in the West and eventually gained praise in Morocco. In addition to using Standard Arabic in writing, educated women of the feminist movement also started to use this language on TV from the 1990s onward. 
This particular use of Standard Arabic is a reaction against the Islamists’ attempts to discredit these women by accusing them of blindly following the West. This strategic use has the double effect of both creating a “rapprochement” between them and the growing Arabic-using Islamist feminists and creating space for them to engage in ijtihad “Qur’an interpretation” and, thus, resist exclusion from the powerful religious public domain. This use of Standard Arabic
FATIMA SADIQI
by women is also a means of self-empowerment in the public space where Arabic has prestige.
3.2 Berber and gender
Berber is the oldest language in Morocco and North Africa (Ennaji 1997). Although this language has never been associated with a divine written text, it has survived for over 5,000 years (Sadiqi 1997). There are three major dialects of Berber in Morocco: Tashelhit (used in the south of Morocco), Tamazight (used in central Morocco), and Tarifit (used in the north of the country). Many factors have contributed to the maintenance of Berber in Morocco: the mother tongue status of the language, female illiteracy, male migration from rural to urban areas or European countries, and French. Being a native language, Berber possesses the historicity, dynamism and vitality of mother tongues. Berber has primarily been maintained in rural and semi-urban areas, and in urban areas, it is still used primarily in homes and intimate gatherings. In the latter context, it is generally perceived as a token of solidarity. Berber is also the language of communication between the (male) migrants to the cities or Europe and their families left behind. Paradoxically, the presence of French in Morocco helped to maintain Berber. Through the dissemination of education in the French language, language itself has gradually become less associated with its religious base in the minds of Moroccans, a fact that tacitly legitimized the use of Berber in everyday life and improved attitudes toward it. The factors that have ensured the maintenance of Berber are linked to women in a significant way: women are the ones who have perpetuated the language as a mother tongue and they are the ones who gave it its deep and emotional tie to the self. Women are also the ones who have suffered more from illiteracy and stayed home to take care of the children when the men migrated.
The factors enhancing Berber associate the language with the private sphere and the language of ancestors, and explain the relatively inferior social status that Berber possesses in comparison with the other Moroccan languages. To the extent that Berber is the language of cultural identity, home, the family, village affiliation, intimacy, traditions, orality, and nostalgia for a remote past, it perpetuates attributes that are considered female in Moroccan culture. The absence of Berber from the powerful key
institutional areas reinforces these attributes. Indeed, the fate of Berber has always paralleled the fate of women in Morocco. For example, the recent demands for more official recognition of Berber have been accompanied by demands for more women’s civil rights. Demands for human rights go hand in hand with demands for cultural rights.
3.3 Moroccan Arabic and gender
Moroccan Arabic is the lingua franca in Morocco. The need for a lingua franca is motivated by the presence of three major Berber dialects and many sub-dialects. The speakers of the three Berber dialects often have recourse to Moroccan Arabic when communicating among themselves. Although Moroccan Arabic is used by both women and men, preference for this language sometimes varies on the basis of gender. For example, in rural areas, Berber women use Moroccan Arabic less than Berber men because they are more confined to their homes. However, outside the home, Moroccan Arabic is used by Berbers and Arabs of both sexes except in remote areas where Berber is used by both sexes. In urban centers, educated women shift from Berber to Moroccan Arabic and from the latter to French more than men (Sadiqi 2003). A reason may be that women aspire more to social prestige as they need it more than men given the heavy patriarchy to which both sexes are subject. As Moroccan Arabic is not restricted to strictly private contexts, it is less of a “female” language than Berber. However, in lacking a written form, Moroccan Arabic is generally perceived in society as a debased form of Standard Arabic.
3.4 French and gender
French is an urban superordinate second language that is closely linked to education. Over the years, it has become very useful in the private sector. French is also necessary for obtaining employment and is thus positively perceived as a symbol of modernity, enlightenment, and openness to the Western world. The general attitude toward French is positive.
Like Moroccan Arabic, French is used by both men and women, but it interacts significantly with gender: whereas men use French in the higher administrative and military positions, thus exploiting the emasculating aspect that usually accompanies colonial
languages, women benefit from the social prestige aspect of this language. They derive social power from being considered civilized and modern. Even in conservative families, a woman speaking French to her children is perceived positively. Moroccan women use the French language both in everyday life and in more formal settings, i.e. creative writing, journalism and university study. Since the end of France’s influence in the country in 1956, French has continued to be used by authors, journalists, professors and academics both in and outside of the country, as is the case with Tahar Ben Jelloun, Fatema Mernissi, Driss Chraibi, and others. Women tend to display proficiency in French more than proficiency in Standard Arabic. This behavior may be linked to the fact that men are generally more favorable to women’s proficiency in French than to their proficiency in Standard Arabic. The reason for this is that French is less related to cultural identity than Standard Arabic, and, thus, less threatening to the male status quo. Men are more favorable to women speaking French than they are to women behaving in a French (Westernized) way because women’s use of French is a guarantee that they will speak it (and teach it) to their children. Behaving in a French way is generally perceived as stripping women of their authenticity as members of their own community. It is also regarded as a sign of “too much emancipation” that clashes with Moroccan cultural values. This makes sense in Moroccan patriarchal and sexist culture. Women are aware of this and use French to gain, use, and maintain social power. Overall, French is more of a female language than Moroccan Arabic.
When compared to Standard Arabic, French displays a different aspect: both languages have social power, but each power carries a specific symbolic meaning in the Moroccan context: French is crucial in Moroccan postcolonial administration and politics, and Standard Arabic is a symbol of a glorious past and cultural identity. The two symbolic powers serve men more than women; men appropriate the symbolic powers of French and Standard Arabic (they hold the highest positions in politics, administration and business) and women are more associated with the modern (but alien) aspect of the two languages. Their use of French is socially perceived positively only in relation to fostering good citizens.
The above overview reveals that languages interact with gender in Morocco. Women are closer to Berber and Moroccan Arabic than men because Moroccan society clings to its indigenous traditions but assigns the responsibility to guard those traditions to women. On the other hand, the majority of women are distanced from written languages because of the high illiteracy rates alluded to above. Of the two written languages, educated women are closer to French.
4. Moroccan Women’s Strategies of Communication
Moroccan women’s communicative strategies are primarily dictated by their geographical origin and level of education. Being predominantly illiterate, rural women use oral literature to empower themselves, and being educated, urban women use their language skills (code-switching) for the same purpose. Women’s communicative strategies are highly structured; they show that Moroccan women assert themselves in a rigidly patriarchal society although they are not generally associated with the country’s more powerful languages.
4.1 Illiterate women’s strategies of communication
Illiterate women in Morocco use oral literature genres to mark their presence in the community and to sometimes subvert the roles that patriarchy assigns them. The most important oral genres used by women are gossip, folktales, folk songs, and halqa “marketplace oratory”. Gossip is an important female genre in Moroccan culture. The term “gossip” is used in the literature as a cultural trivialization of an authentic female means of expression and mode of speech (Jones 1980). Gossip is often celebrated as a typically female verbal culture that has played a significant universal role, historically and in the present (Jones 1980). It is a means of negotiating reputations and redefining values. Gossip is of two main types: negative (or malicious) and positive (or complimentary) (Besnier 1990).
At the level of discourse, gossip is characterized by first-person narratives of past experiences, as well as of personal accounts and memories. The discourse of gossip is also characterized by a mixture of truth, lies, and legend.
A characteristic of Moroccan women’s gossip is that it depends greatly on the complicity of the participants in gatherings and flourishes in private settings like public baths, tea visits, the hairdresser’s shop, and family celebrations. Gossip also relies on an exclusive audience and the absence of the person(s) who is/are the subject of gossip. Gossip is also characterized by emotional involvement; it “publicizes” private matters and problematizes the dichotomy of public/private. Although gossip is an oral folkloric event that is practiced and appreciated by both women and men, society does not regard men’s gossip as negative. Moroccan female gossip is determined by both setting and content. So far as setting is concerned, gossip takes place in small groups of two or more; it is structured in both conversational turn-taking and monologues which occur in various oral genres: narratives, jokes, proverbs, etc. It takes place in all-women groups. As far as content is concerned, gossip topics revolve around social themes, mainly divorce, marriage, magic, spirits, etc. The topic and length of gossip depend on the immediate interests of particular women. Moroccan women perceive the activity of gossiping as an opportunity to renegotiate the values and relations of dominance in their immediate environment. For example, upper and middle class urban women, the majority of whom live in nuclear families, reconstruct the traditional mother-in-law/daughter-in-law power tension through a power relationship between them and their maids. Maids are described as subordinate, debased, threatening the family cohesion, etc. This type of gossip may be regarded as an attempt to negate the centrality of maids to the maintenance of social and family order. On the other hand, maids often construct their female employers as ugly, old, bitter, and snobbish (for examples, see Sadiqi 2003). In addition to gossip, folktales are another type of female oral genre in Morocco.
Storytelling is a typical female occupation, especially in rural areas. As with gossip, storytelling takes place in private and rather intimate settings and contrasts sharply with male urban storytelling which usually takes place in public marketplaces such as Jamaa lefna in Marrakech. Folktales are usually told by older women and are characterized by narrative discourse whereby gender and class are often
constructed. The languages of female Moroccan folktales are Berber and Moroccan Arabic. Moroccan women perceive storytelling as a highly worthwhile enterprise. They take the activity of telling stories very seriously; they dramatize events and overemphasize actions in order to make their stories sound important. A way in which women highlight the significance of a tale is by generously giving information about themselves. When telling stories, Moroccan women involve themselves by attributing vision to their opinions and presenting themselves as anticipators of events and actions, without, however, overtly committing themselves. They also make use of moral judgments and critical evaluation, especially of other women. Storytelling is a strong means of maintaining and perpetuating power inside the family, especially in larger rural households. Grandmothers reinforce their status in the household by establishing strong links with their (usually young) audiences through unfinished stories and suspense. This is understandable in settings where older women feel that younger daughters-in-law are gaining power through having children. Through storytelling to these children, women seek to recuperate the children and make themselves indispensable at home. On a more abstract level, Moroccan women manage to empower themselves by expressing women’s intelligence and victory over men in stories. In this way, storytelling is a reaction to marginalization. Older women telling long tales are far from being simple-minded entertainers; they are perceived in the family as almost mystical female figures. They exhibit powerful thinking, memory, and skillful use of psychological knowledge of human nature. They also make the possibility of transforming the world easier to grasp. These attributes are very much associated with the image of the grandmother in Moroccan culture. Another female oral genre in Morocco is folk songs. Moroccan folk songs are sung in Berber or Moroccan Arabic. 
These songs sharply contrast with “high” songs that are sung in Standard Arabic. Folk songs are usually delivered by illiterate people. Women folk singers have always played an important part in oral literature and culture in Morocco. Female folk singers are usually referred to as shikhats (feminine of shiwukhs). However, whereas the term shiwukhs is neutral, the term shikhats is pejorative and is often used as a synonym
for “prostitutes.” This appellation greatly marginalizes and damages the reputation of these folk singers. Shikhats are professional groups of women singers of all ages that appeared in Morocco in the 1950s. These women are poor and their singing is perceived as a reaction to marginalization by family and society. Female folk singers sing in all-female or mixed-sex groups. The themes that are treated in their lyrics vary from love to rejection of colonialism, support of political authorities, etc. Another, more sophisticated type of women’s songs is lmalhun. There are three types of lmalhun that are sung by women: la’rubiyat, salamat (love letter exchanging), and tadukan (lullabies). Female lmalhun songs are different from men’s corresponding songs. First, these songs are performed as part of play or action (e.g. putting babies to sleep by rocking their cradle). These songs focus more closely on the action being performed than on the lyrics of the song. This is a female means of transmitting secret messages to an addressee. Secondly, female songs are usually brief in comparison to men’s. Short songs presuppose more effort in condensing meanings, intelligence, and skill in transmitting messages. Thirdly, women’s songs are usually anonymous, whereas men tend to sign their songs. Women’s preference to remain unknown is concordant with the indirectness and the subtlety of their songs, as well as with the “unauthorized” aspect of Moroccan oral literature in general. In sum, Moroccan female popular songs constitute a marginalized female oral genre. As in storytelling, women often include sections that would empower them and their art. For example, many female songs make fun of men, especially those sung in marriage ceremonies and in all-women gatherings. Finally, the typically Moroccan oral genre of halqa “public oratory” has started to be appropriated by women. The setting of halqa or public oratory is usually the public marketplace.
The discourse of halqa is hybrid: it is both religious and obscene. This discourse is also characterized by curses, oaths, monologues, blessings, and usually aims at involving the audience by making it participate in the halqa rituals. Another characteristic of the halqa discourse is that it is loaded with misogynistic ideology: women are usually portrayed as social agitators and promoters of social chaos. This discourse is also characterized by the use of taboo words and expressions that are legitimized by reference to religious sanctioning expressions such as la
haya’a fi din “there is no shame in religion”. The women who speak in halqas are poor, illiterate and old. These women address an audience of men and engage in the same misogynistic discourse as men. Although this practice is not feminist, the very presence of women orators in Moroccan marketplaces certainly is. Deborah Kapchan writes, “This [halqa’s] feminized discourse, although full of patriarchal traces, nonetheless spins out from itself aetiolating its own boundaries, feeding on its own excess and metamorphosing into other forms” (Kapchan 1996: 165). Marketplace female orators may sometimes include genuine poets. Mririda is one such poet; she is Berber and became very famous after her death; her poems were gathered, translated into French, and published in a book (Euloge 1959). However, female public orators, like female singers, are perceived as debased and low; they are doubly marginalized, first as women and again as lower class. At the end of this section, it is worth pointing out that although illiterate Moroccan women have been associated with oral skills and oral literature, their modes of expression are not perceived positively on the social level, even though the artistic value of this style may be appreciated by society at large. The reason for this resides in the deeply ingrained stereotype that women’s language does not have public authority. Moroccan women’s skills and oral literature may be considered as sub-cultural varieties which characterize specific, rural, all-women peer groups. These skills and oral genres constitute a linguistic reaction to social marginalization. Moroccan women’s communicative styles are a reaction to exclusion from powerful means of expression; they are also a reaction to a male-dominated culture. It is only by taking into account the heavily patriarchal environment in which Moroccan illiterate women live that one may appreciate the extent of their agency and the extreme resourcefulness of their creativity.
Moroccan female skills and oral genres prove that Moroccan women are far from being inarticulate or passive consumers of daily knowledge. The female genres in Morocco serve as strategies of resistance to prejudice and linguistic restriction; they are a means by which rural and illiterate women differentiate themselves from men and from other (urban, literate) women.
4.2 Literate women’s strategies of communication
Moroccan educated women use Berber and/or Moroccan Arabic, Standard Arabic, and French. Some may even use English and/or Spanish. Their strategies of communication are different from the ones used by illiterate women. The most important such strategy is code-switching. Code-switching is defined in sociolinguistics as the use of more than one language simultaneously in conversation. Code-switching is a characterizing feature of multilingual settings like Morocco. Linguists have underlined that code-switching is a linguistically self-sufficient style of speech and that code-switchers master the languages they mix and are perfectly competent in them (Gumperz 1982). In implying choice on the part of the code-switcher, code-switching is a linguistically healthy practice. It is a rule-governed phenomenon where the grammar of the mother tongue prevails in the structure of sentences and is completed by the lexicon and some minor functional words from the second language. Code-switching presupposes bi- or multilingualism and, thus, indicates positive social attributes in Moroccan society. It also indicates composite identities that are aware of the social value of each of the languages used. Code-switching presupposes competence not only in two linguistic codes, but also in appropriately manipulating the two codes in real life contexts. As four major languages are used in Morocco, code-switching often takes place between a more and a less powerful language and is bound to be sensitive to gender. A prevalent type of code-switching involves Moroccan Arabic and French. Code-switching involving Moroccan Arabic and Berber is often present in the speech of Berber bilinguals, but this type of code-switching is not gender-sensitive since it involves both women and men. Switching between Berber and French is rather rare, although switching from Rifian Berber to and from Spanish occurs in the north of Morocco.
Code-switching between Moroccan Arabic and French is by far the most widespread and the most revealing. It is common only in urban areas and involves educated bilingual women. Studies have shown that this type of code-switching is more prevalent in the speech of women than in that of men (Nortier 1989, Lahlou 1991). Frequently, women insert whole sentences in French into their Moroccan Arabic or Berber conversations.
In urban settings, code-switching is a female type of communicative style. This skill is encouraged from childhood, as little girls are more strongly encouraged to use French in their Arabic than are little boys. This is more so the case in upper and middle class families, who are very much in favor of modernity and openness to Western values. This practice is continued into adolescence, when female teenagers include French more frequently than male teenagers. This code-switching is often perceived by young females as a means of group solidarity and a means of showing difference from boys. The use of French in childhood and adolescence is naturally carried into adult life. In fact, Moroccan adult women use code-switching as a means of controlling conversation and keeping the floor for the necessary time without being interrupted. The use of code-switching by women in mixed groups is a means of self-empowerment. Many males are put off by this way of communication and prefer to step back or remain silent. Given the overall sociolinguistic status of Moroccan Arabic and French, Moroccan women use code-switching in order to score personal gains in everyday conversations. They are aware that French is prestigious in Moroccan society, and as they are not easily given the opportunity to use French at the higher levels of decision-making, they overuse it in conversation. Through code-switching, women easily succeed in getting and maintaining attention. In general, when borrowing words from French, a man will mold the loans into the general morphosyntactic structure of Moroccan Arabic, whereas a woman will pronounce the loans as they are pronounced in French. Thus, whereas a woman would say le frigidaire “fridge”, a man would pronounce the word as lfrijidir, where the sound l is prefixed to the word to make it sound more like Moroccan Arabic.
Further, whereas a woman would pronounce the ‘r’ sound in the French way (a uvular trill), a man would readily use the Arabic rolled ‘r’. The following are more such examples:
(1) French words    Female version    Male version
    Garage          Garage            lgaraj
    France          La France         fransa
    Journal         Le journal        jjernan
    Train           Le train          tran
    Veste           La veste          lfista
    Jardin          Le jardin         jjerda
    Vitesse         La vitesse        lfitas
Women tend to prefix the French words with the French article (le or la), whereas men would readily use the Moroccan Arabic l, which originates from the Standard Arabic definite article al but which is used not as an article but as part of the word in Moroccan Arabic. These phonological adaptations are more evident in words that have been borrowed relatively recently from French into Moroccan Arabic. Earlier borrowed words that have become part of Moroccan Arabic are pronounced in the same way by women and men. Examples of the latter are shanty for sentier “small road”, ttomobil for automobile “car”, and lkartab for cartable “school bag”. In general, women differ from men in Morocco so far as the morphological molding of borrowed words is concerned: women use less of it than men. Thus, whereas a Moroccan man would easily say rkebt f tran lyum “I have taken the train today”, a woman would use the French counterpart as it is used in French and say rkebt f le train lyum. The following sentences were produced by women in the city of Fez (Ouali 2000). The French strings are pour retirer son passeport in (2a) and ce n’est pas la peine de crier in (2b):
(2) a. mshat pour retirer son passeport, u matji htta lRedda.
       “She went to withdraw her passport and will not be back until tomorrow.”
    b. qul lha ce n’est pas la peine de crier, ila mabRash lweld iqra ma’endha matdir.
       “I told her there was no need to scream, if her son would not study; there is nothing she can do.”
In the above examples, whole sentences in French are inserted in the Moroccan Arabic ones. These sentences are spoken in their French version. Moroccan women’s use of code-switching may also be considered as a way of stripping everyday Moroccan language of the religious aura that surrounds Standard Arabic and automatically excludes women. Finally, code-switching means identity-switching. It is a way
of demarcating oneself as different not only in relation to men but also in relation to other (rural and often illiterate) women.
5. Conclusion
This paper has dealt with some aspects of language and gender in Morocco. More specifically, it has described settings where women’s linguistic agency is most perceived. Moroccan women use specific linguistic strategies to assert themselves according to the choices they have and the situations they find themselves in. These strategies depend greatly on whether these women are literate or illiterate. Women’s choice and use of language helps them negotiate power. Up to now, linguistic issues have been largely subordinated to broad historical and cultural discussions. It is high time the language and gender relationship in Morocco was given serious attention as a promising field of research.
REFERENCES
Abou-Zeid, Leila. 1983. The Year of the Elephant. (In Arabic.) Trans. Barbara Parmenter. Austin, Texas: University of Texas at Austin.
Abu-Risha, Zulikha. 1996. The Absent Language. Amman: Center for Studies on Women.
Ait Sabbah, Fatna. 1986. La Femme dans l’Inconscient Musulman. Paris: Albin Michel.
Badran, Margot, Fatima Sadiqi & Linda Rashidi. 2002. “Language and Gender in the Arab World”. Languages and Linguistics: International Journal of Linguistics 9.
Berger, Anne-Emmanuelle, ed. 2002. Algeria in Others’ Languages. New York: Cornell University Press.
Besnier, John. 1990. “Conflict Management, Gossip and Affective Meanings on Nukulaelae”. Disentangling: Conflict discourse in Pacific societies ed. by Karen Watson-Gegeo & Geoffrey White. Stanford, CA: Stanford University Press.
Bourdieu, Pierre. 1966. “The Sentiment of Honor in Kabyle Society”. Honor and Shame: The values of Mediterranean societies ed. by John Peristiany. London: Weidenfeld & Nicolson.
Dalmiya, Vrinda & Linda Alcoff. 1993. “Are Old Wives’ Tales Justified?” Feminist Epistemologies ed. by Elizabeth Potter & Linda Alcoff. New York: Routledge.
Eickelman, Dale. 1976. Moroccan Islam: Tradition and Society in Pilgrimage Center. Austin: University of Texas Press.
El-Khayat, Ghita. 1987. Le Monde Arabe au Féminin. Casablanca: Eddif.
El-Saadawi, Nawal. 1997. The Hidden Face of Eve: Women in the Arab world. London: Zed Press.
_____. 2001. “Women and Development in North Africa”. Paper presented at the Second Mediterranean Meeting, Florence, Italy.
Ennaji, Moha, ed. 1997. International Journal of the Sociology of Language 112. New York: Mouton de Gruyter.
Euloge, René. 1959. Les Chants de la Tassaout. Casablanca: Maroc Editions.
Fishman, Joshua. 1991. Handbook of Language and Ethnic Identity. London: Oxford University Press.
Foucault, Michel. 1980. Power Knowledge. New York: Pantheon.
Geertz, Clifford. 1968. Islam Observed: Religious development in Morocco and Indonesia. Chicago: Chicago University Press.
Gellner, Ernest. 1981. Muslim Society. Cambridge: Cambridge University Press.
Gluckman, Max. 1963. “Gossip and Scandal”. Current Anthropology 4.
Gumperz, John. 1982. Discourse Strategies. Cambridge: Cambridge University Press.
Haviland, John. 1977. Gossip and Knowledge in Zinacantan. Chicago: Chicago University Press.
Jones, Deborah. 1980. “Gossip: Notes on women’s oral culture”. The Voices and Words of Women and Men ed. by Chris Kramarae. Oxford: Pergamon Press.
Kapchan, Deborah. 1996. Gender on the Market. Philadelphia: University of Pennsylvania Press.
Kaplan, Mary. 1938. The Jewish Feminist Movement in Germany: The campaigns of the Jüdischer Frauenbund. Westport, CT: Greenwood Press.
Khatibi, Abdelkbir. 1983. Maghreb Pluriel. Paris: Denoel.
Labov, William. 1991. “The Intersection of Sex and Social Class in the Course of Linguistic Change”. Language Variation and Change 2.205-251.
Lacan, Jean. 1966. Ecrits. Paris: Editions du Seuil.
Lahlou, Moncef. 1991. A Morpho-Syntactic Study of Code-Switching between Moroccan Arabic and French. Ph.D. dissertation, University of Texas, Austin.
Laroui, Abdellah. 1977. Les Origines Culturelles du Nationalisme Marocain. Paris: F. Maspero.
Nortier, Jacomine. 1989. Dutch and Moroccan Arabic in Contact: Code-Switching among the Moroccans in the Netherlands. Unpublished Ph.D. dissertation, University of Amsterdam.
_____. 1995. “Code-switching in Moroccan Arabic/Dutch vs. Moroccan Arabic/French Language Contact”. International Journal of the Sociology of Language 112.
Ouali, Soumia. 2000. “A Study of Code-Switching in the Language of Females at the University”. BA Research Monograph.
Rashidi, Linda. 2000. “The Interface of Language and Gender in Morocco”. Feminist Movements: Origins and orientations ed. by Fatima Sadiqi. Fez: Dhar El Mehraz.
Sadiqi, Fatima. 1995. “The Language of Women in the City of Fez”. International Journal of the Sociology of Language 112.63-80.
_____. 1997. Grammaire du Berbère. Paris: L’Harmattan.
_____. 1997. “The Image of Moroccan Women in Public Spheres”. The Idea of the University ed. by Tayeb Belghazi. Rabat: Publications of the Faculty of Letters.
_____. 1998. “A Feminist View of the Medina of Fez”. The British Moroccan Comparative Studies Newsletter no. 2.
_____, ed. 2000. Feminist Movements: Origins and orientations. Fez: Dhar El Mehraz.
_____. 2003. Women, Gender and Language in Morocco. Leiden & Boston: Brill.
Spacks, Patricia. 1985. Gossip. New York: Alfred Knopf.
Walters, Keith. 1999. “‘Opening the Door of Paradise a Cubit’: Educated Tunisian women, embodied linguistic practice, and theories of language and gender”. Reinventing Identities: The gendered self in discourse ed. by Mary Bucholtz et al. New York: Oxford University Press.
INDEX OF SUBJECTS
A affrication, 151-153, 155-173, 250, 253 AGR Criterion, 30-32 Alignment-Based Learning, 68 analyses of variance, 129 Ɂanna, 39, 45-50, 55-57 annexation, improper, 85-90, 92, 93 ANOVAs, 129-130 aphasia, 64 Arabic Algerian, 97, 104, 107-108, 125, 157, 248, 255, 256 Baghdadi, 247, 265-266 Bahraini, 152, 247, 251, 253 Beiruti, 247, 266 Cairene, 247, 250, 251 Classical, 164, 248, 255, 262, 268270, 274, 276-277, 283 Colloquial Arabic, 266-267, 269 Educated Spoken, 271-272 Egyptian 97, 101, 104, 107-108, 112, 122, 180, 247, 252, 258-259, 266 Gulf, 175, 247, 261, 266-267 Iraqi, 97, 115, 247, 253 Jordanian, 97, 100, 104, 108-109, 247, 252, 259, 265 Kuwaiti, 152, 248 Lebanese, 97, 247 Maghrebi, 247
Modern Standard, 5, 43, 48-49, 61, 62-63, 77, 95, 97, 99-100, 103, 112, 125, 242, 248 Moroccan, 34, 97-99, 101-102, 104-105, 108, 116, 119-120, 167, 176-182, 184-185, 187, 192, 197, 199, 243, 245-247, 252-255, 266, 268-274, 276-302 Najdi, 247, 253, 262, 266 North, 151 Qatari Arabic, 152-154, 162, 164-166, 168-170, 172-176, 250-251, 266 Quranic, 248 Standard, 63, 100, 122, 125, 139, 178, 187, 190, 250, 257-258, 263, 268-274, 276-277, 279, 282-284, 286-288, 290-291, 294, 297, 299-300 Sudanese, 248, 257, 266 Syrian, 97, 100, 104-105, 107, 112, 247, 260, 266 Tamīmī, 247 Tunisian, 97, 101, 104, 107-108, 110-111, 117, 120-121, 123, 139, 247, 252 Western, 104 Arabization, 268-269, 273-274, 277-278 aspect, 63, 178, 180-181, 184, 199 imperfective, 182, 184, 186, 194 perfective, 179, 182, 184, 186, 194
B Bantu, 166 Berber, 268, 270, 274, 276-279, 281-283, 286-291, 294, 296-298 bilingualism, 268, 273, 274 binding, 22, 23, 26, 27, 34 Buckwalter Arabic Morphological Analyzer, 17, 65, 70, 75, 80, 94
C case markers, 22 case theory, 22-23, 25-29, 191, 195, 198 causative stems, 182-183 Chinese, 20, 34, 218, 221, 224-226, 245 clitics, 157, 180 pronominal, 25 subject, 26, 198 code-switching, 274, 292, 297-298, 300 colligation, 39, 40, 44, 47, 56, 62 collocation, 40-41, 44-45, 49, 50, 52, 54, 56, 59, 62 complementizer, 28-29, 39-40, 45-46, 54-57, 59, 62, 196-197, 199, 234-235 complex tense constructions, 185, 188, 190, 195 complex tense sentences, 177 concatenative morphology, 64, 66-68, 70, 73-75 copula, 181, 185, 186-188, 190-191 D declinations, 119 deletion of high vowels, 103 diglossia, 269 discrimination errors, 107 doublets, 153, 167-168, 171, 173 doubly weak roots, 64 Dutch, 20, 302
E empty/null subjects, 230 English, 3-4, 9, 14, 16, 19-20, 23-24, 28, 30, 37, 44-45, 47, 51-52, 54, 58-59, 62-63, 66, 70, 91, 94-95, 106-111, 119-122, 166-167, 175, 181, 198, 217-226, 228-241, 245, 251, 257, 268, 277, 285, 288, 297 error analysis, 130, 137 étymon, 14 exceptional case-marking, 28, 177, 185-187 Extended Projection Principle, 26
F faithfulness, 154-156, 159, 162, 166-167, 173 feminist projects, 284-285 folk songs, 292, 294 folktales, 292-294 French, 20, 52-53, 55, 61, 66, 98, 103-104, 107-108, 110-111, 120, 222, 224, 226, 231, 245, 268-270, 273-274, 276-277, 279-283, 285-291, 296-302 Functional Arabic Morphology, 79, 94-95 Functional Generative Description theory, 82 G gender, 22, 31, 64, 157, 162, 164, 279-280, 282, 284-286, 288-291, 293, 297, 300, 302 German, 219, 225, 226, 228, 231, 243, 245, 246 glide deletion, 131 gossip, 292, 293 Government and Binding, 19, 23, 34, 219, 243
H ḥaal, 44, 59 head movement, 23-25 Hebrew, 65, 75, 94, 125, 135, 157, 167, 246 hypocoristic, 64 I illiteracy, 272, 276, 282, 284, 288-289, 291 intonation patterns, 111, 117-118 isochrony, 99 Italian, 63, 220, 222-223, 225, 231, 245 J Japanese, 20, 23, 34, 109, 119, 122, 166, 218, 221-222, 224-226, 245 K Korean, 225-226, 245-246 Kullback-Leibler Divergence, 69 L L1 acquisition, 219-220, 222, 231 L1 transfer effect, 224 Levenshtein distance, 8, 10, 11, 12 lexical processing, 123, 124, 125, 134, 138, 139 lexicon, 4, 15-16, 20-21, 23, 64-66, 131, 176, 272, 297 licensing, 23, 26, 218, 232 M Manhattan distances, 9 Markedness constraint, 155 marker, future tense, 188 see also case Middle Ages, 38, 54 Minimum Description Length, 66, 69, 75
Morphological Uniformity Hypothesis, 225 morphology, 5, 14-15, 21, 63, 65, 67-68, 70, 74, 80-82, 89, 123-124, 138, 139, 179, 218, 221, 239, 269 multilingualism, 268, 282-284 N narrative tasks, 233 non-null subject language, 220, 222, 228, 231, 232, 240 null subject languages, 217-220, 225, 227-228, 231-232 Null Subject Parameter, 217, 219-227, 230, 232, 240-244 O onomatopoetic, 3 Optimal Paradigms, 162-163, 166, 176 Optimality Theory, 153, 173-174, 176 P PADT, 77, 79-80, 83-85, 91, 93 PAPPI, 19-27, 29, 31-33 paradigm uniformity, 153, 162 paradigmatic effect, 159 parameter settings, 20, 23 parser, 19-22, 24, 27, 29, 33, 65, 70 passive stem, 183 pharyngealization, 8 phonaesthemes, 3, 4 phonemes, 8-12, 68, 126-127, 132-133, 136 phonetic comparison, 8 pitch contours, 113 patterns, 112, 115, 117-118 broken plurals, 153, 157, 159-161, 170, 173, 176 Prague Arabic Dependency Treebank, 77, 78, 93, 94
priming condition, 130 Principles and Parameters, 19, 34, 217 pro-drop, 23, 217, 244-246 Q qaaf, 248, 250-263, 265 qad, 38, 40, 42-62 quadriglossia, 271-272 R reduplication, 64, 74 relative entropy, 69 root and pattern, 63, 138 morphemes, 64 sound, 64, 132-133 weak, 132-133 rhythm mora-timed, 99, 109, 119 stress-timed, 99, 106-109, 119 syllable-timed, 99, 107, 120 S Saussure, 3, 4, 18 semantic distance, 8, 13 prosody, 41-42, 62 relatedness, 6, 127 similarity, 4-6, 14 vector, 7 Semitic, 21, 63, 67, 70, 74-76, 157, 266 Spanish, 166, 217-219, 222-226, 228-229, 230-245, 268, 277, 297 storytelling, 293-295 systematic parameterization, 20 systematicity, 14-16
T Tense, 17-18, 63, 175-181, 184-185, 187, 199-200, 217, 222, 231, 243, 271 Theta theory, 27, 29 treebank, 79, 85, 88, 91-93 triglossia, 269-270 triliteral, 63, 67-68, 72, 74, 126 Truncation Hypothesis, 220 Turkish, 20, 21, 34, 225, 246 U Uzbek, 70 V vocalic intervals, 104-105 X X-bar syntax, 23-24
CURRENT ISSUES IN LINGUISTIC THEORY
E. F. K. Koerner, Editor
Zentrum für Allgemeine Sprachwissenschaft, Typologie und Universalienforschung, Berlin Current Issues in Linguistic Theory (CILT) is a theory-oriented series which welcomes contributions from scholars who have significant proposals to make towards the advancement of our understanding of language, its structure, functioning and development. CILT has been established in order to provide a forum for the presentation and discussion of linguistic opinions of scholars who do not necessarily accept the prevailing mode of thought in linguistic science. It offers an outlet for meaningful contributions to the current linguistic debate, and furnishes the diversity of opinion which a healthy discipline must have. A complete list of titles in this series can be found on the publishers’ website, www.benjamins.com 293 Detges, Ulrich and Richard Waltereit (eds.): The Paradox of Grammatical Change. Perspectives from Romance. v, 254 pp. Expected February 2008 292 Nicolov, Nicolas, Kalina Bontcheva, Galia Angelova and Ruslan Mitkov (eds.): Recent Advances in Natural Language Processing IV. Selected papers from RANLP 2005. 2007. xii, 307 pp. 291 Baauw, Sergio, Frank Drijkoningen and Manuela Pinto (eds.): Romance Languages and Linguistic Theory 2005. Selected papers from ‘Going Romance’, Utrecht, 8–10 December 2005. 2007. viii, 338 pp. 290 Mughazy, Mustafa A. (ed.): Perspectives on Arabic Linguistics XX. Papers from the twentieth annual symposium on Arabic linguistics, Kalamazoo, Michigan, March 2006. xii, 247 pp. Expected December 2007 289 Benmamoun, Elabbas (ed.): Perspectives on Arabic Linguistics XIX. Papers from the nineteenth annual symposium on Arabic Linguistics, Urbana, Illinois, April 2005. xiv, 274 pp. + index. Expected December 2007 288 Toivonen, Ida and Diane Nelson (eds.): Saami Linguistics. 2007. viii, 321 pp. 287 Camacho, José, Nydia Flores-Ferrán, Liliana Sánchez, Viviane Déprez and María José Cabrera (eds.): Romance Linguistics 2006.
Selected papers from the 36th Linguistic Symposium on Romance Languages (LSRL), New Brunswick, March-April 2006. 2007. viii, 340 pp. 286 Weijer, Jeroen van de and Erik Jan van der Torre (eds.): Voicing in Dutch. (De)voicing – phonology, phonetics, and psycholinguistics. 2007. x, 186 pp. 285 Sackmann, Robin (ed.): Explorations in Integrational Linguistics. Four essays on German, French, and Guaraní. ix, 217 pp. Expected January 2008 284 Salmons, Joseph C. and Shannon Dubenion-Smith (eds.): Historical Linguistics 2005. Selected papers from the 17th International Conference on Historical Linguistics, Madison, Wisconsin, 31 July - 5 August 2005. 2007. viii, 413 pp. 283 Lenker, Ursula and Anneli Meurman-Solin (eds.): Connectives in the History of English. 2007. viii, 318 pp. 282 Prieto, Pilar, Joan Mascaró and Maria-Josep Solé (eds.): Segmental and prosodic issues in Romance phonology. 2007. xvi, 262 pp. 281 Vermeerbergen, Myriam, Lorraine Leeson and Onno Crasborn (eds.): Simultaneity in Signed Languages. Form and function. 2007. viii, 360 pp. (incl. CD-Rom). 280 Hewson, John and Vit Bubenik: From Case to Adposition. The development of configurational syntax in Indo-European languages. 2006. xxx, 420 pp. 279 Nedergaard Thomsen, Ole (ed.): Competing Models of Linguistic Change. Evolution and beyond. 2006. vi, 344 pp. 278 Doetjes, Jenny and Paz González (eds.): Romance Languages and Linguistic Theory 2004. Selected papers from ‘Going Romance’, Leiden, 9–11 December 2004. 2006. viii, 320 pp. 277 Helasvuo, Marja-Liisa and Lyle Campbell (eds.): Grammar from the Human Perspective. Case, space and person in Finnish. 2006. x, 280 pp. 276 Montreuil, Jean-Pierre Y. (ed.): New Perspectives on Romance Linguistics. Vol. II: Phonetics, Phonology and Dialectology. Selected papers from the 35th Linguistic Symposium on Romance Languages (LSRL), Austin, Texas, February 2005. 2006. x, 213 pp. 275 Nishida, Chiyo and Jean-Pierre Y. Montreuil (eds.): New Perspectives on Romance Linguistics. 
Vol. I: Morphology, Syntax, Semantics, and Pragmatics. Selected papers from the 35th Linguistic Symposium on Romance Languages (LSRL), Austin, Texas, February 2005. 2006. xiv, 288 pp. 274 Gess, Randall S. and Deborah Arteaga (eds.): Historical Romance Linguistics. Retrospective and perspectives. 2006. viii, 393 pp.
273 Filppula, Markku, Juhani Klemola, Marjatta Palander and Esa Penttilä (eds.): Dialects Across Borders. Selected papers from the 11th International Conference on Methods in Dialectology (Methods XI), Joensuu, August 2002. 2005. xii, 291 pp. 272 Gess, Randall S. and Edward J. Rubin (eds.): Theoretical and Experimental Approaches to Romance Linguistics. Selected papers from the 34th Linguistic Symposium on Romance Languages (LSRL), Salt Lake City, March 2004. 2005. viii, 367 pp. 271 Branner, David Prager (ed.): The Chinese Rime Tables. Linguistic philosophy and historical-comparative phonology. 2006. viii, 358 pp. 270 Geerts, Twan, Ivo van Ginneken and Haike Jacobs (eds.): Romance Languages and Linguistic Theory 2003. Selected papers from ‘Going Romance’ 2003, Nijmegen, 20–22 November. 2005. viii, 369 pp. 269 Hargus, Sharon and Keren Rice (eds.): Athabaskan Prosody. 2005. xii, 432 pp. 268 Cravens, Thomas D. (ed.): Variation and Reconstruction. 2006. viii, 223 pp. 267 Alhawary, Mohammad T. and Elabbas Benmamoun (eds.): Perspectives on Arabic Linguistics XVII–XVIII. Papers from the seventeenth and eighteenth annual symposia on Arabic linguistics. Volume XVII–XVIII: Alexandria, 2003 and Norman, Oklahoma 2004. 2005. xvi, 315 pp. 266 Boudelaa, Sami (ed.): Perspectives on Arabic Linguistics XVI. Papers from the sixteenth annual symposium on Arabic linguistics, Cambridge, March 2002. 2006. xii, 181 pp. 265 Cornips, Leonie and Karen P. Corrigan (eds.): Syntax and Variation. Reconciling the Biological and the Social. 2005. vi, 312 pp. 264 Dressler, Wolfgang U., Dieter Kastovsky, Oskar E. Pfeiffer and Franz Rainer (eds.): Morphology and its demarcations. Selected papers from the 11th Morphology meeting, Vienna, February 2004. With the assistance of Francesco Gardani and Markus A. Pöchtrager. 2005. xiv, 320 pp. 263 Branco, António, Tony McEnery and Ruslan Mitkov (eds.): Anaphora Processing. Linguistic, cognitive and computational modelling. 2005. x, 449 pp.
262 Vajda, Edward J. (ed.): Languages and Prehistory of Central Siberia. 2004. x, 275 pp. 261 Kay, Christian J. and Jeremy J. Smith (eds.): Categorization in the History of English. 2004. viii, 268 pp. 260 Nicolov, Nicolas, Kalina Bontcheva, Galia Angelova and Ruslan Mitkov (eds.): Recent Advances in Natural Language Processing III. Selected papers from RANLP 2003. 2004. xii, 402 pp. 259 Carr, Philip, Jacques Durand and Colin J. Ewen (eds.): Headhood, Elements, Specification and Contrastivity. Phonological papers in honour of John Anderson. 2005. xxviii, 405 pp. 258 Auger, Julie, J. Clancy Clements and Barbara Vance (eds.): Contemporary Approaches to Romance Linguistics. Selected Papers from the 33rd Linguistic Symposium on Romance Languages (LSRL), Bloomington, Indiana, April 2003. With the assistance of Rachel T. Anderson. 2004. viii, 404 pp. 257 Fortescue, Michael, Eva Skafte Jensen, Jens Erik Mogensen and Lene Schøsler (eds.): Historical Linguistics 2003. Selected papers from the 16th International Conference on Historical Linguistics, Copenhagen, 11–15 August 2003. 2005. x, 312 pp. 256 Bok-Bennema, Reineke, Bart Hollebrandse, Brigitte Kampers-Manhe and Petra Sleeman (eds.): Romance Languages and Linguistic Theory 2002. Selected papers from ‘Going Romance’, Groningen, 28–30 November 2002. 2004. viii, 273 pp. 255 Meulen, Alice ter and Werner Abraham (eds.): The Composition of Meaning. From lexeme to discourse. 2004. vi, 232 pp. 254 Baldi, Philip and Pietro U. Dini (eds.): Studies in Baltic and Indo-European Linguistics. In honor of William R. Schmalstieg. 2004. xlvi, 302 pp. 253 Caffarel, Alice, J.R. Martin and Christian M.I.M. Matthiessen (eds.): Language Typology. A functional perspective. 2004. xiv, 702 pp. 252 Kay, Christian J., Carole Hough and Irené Wotherspoon (eds.): New Perspectives on English Historical Linguistics. Selected papers from 12 ICEHL, Glasgow, 21–26 August 2002. Volume II: Lexis and Transmission. 2004. xii, 273 pp. 
251 Kay, Christian J., Simon Horobin and Jeremy J. Smith (eds.): New Perspectives on English Historical Linguistics. Selected papers from 12 ICEHL, Glasgow, 21–26 August 2002. Volume I: Syntax and Morphology. 2004. x, 264 pp. 250 Jensen, John T.: Principles of Generative Phonology. An introduction. 2004. xii, 324 pp. 249 Bowern, Claire and Harold Koch (eds.): Australian Languages. Classification and the comparative method. 2004. xii, 377 pp. (incl. CD-Rom). 248 Weigand, Edda (ed.): Emotion in Dialogic Interaction. Advances in the complex. 2004. xii, 284 pp. 247 Parkinson, Dilworth B. and Samira Farwaneh (eds.): Perspectives on Arabic Linguistics XV. Papers from the Fifteenth Annual Symposium on Arabic Linguistics, Salt Lake City 2001. 2003. x, 214 pp. 246 Holisky, Dee Ann and Kevin Tuite (eds.): Current Trends in Caucasian, East European and Inner Asian Linguistics. Papers in honor of Howard I. Aronson. 2003. xxviii, 426 pp.