Page i
ASSESSMENT OF LANGUAGE DISORDERS IN CHILDREN
Page ii
Page iii ASSESSMENT OF LANGUAGE DISORDERS IN CHILDREN Rebecca J. McCauley University of Vermont
Page iv Copyright © 2001 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microform, retrieval system, or any other means, without the prior written permission of the publisher. Lawrence Erlbaum Associates, Inc., Publishers, 10 Industrial Avenue, Mahwah, New Jersey 07430. Cover design by Kathryn Houghtaling Lacey.

Library of Congress Cataloging-in-Publication Data
McCauley, Rebecca Joan, 1952–
Assessment of language disorders in children / Rebecca J. McCauley.
p. cm.
ISBN 0-8058-2561-4 (cloth : alk. paper) / 0-8058-2562-2 (pbk. : alk. paper)
1. Language disorders in children—Diagnosis. 2. Communicative disorders in children—Diagnosis. 3. Learning disabled children—Language—Evaluation. I. Title.
RJ496.L35 M375 2001
618.92’855075—dc21 00-050403 CIP

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Page v To my parents, Fred and Priscilla McCauley
Page vi
Page vii
Contents

Preface xi
Why I Wrote This Book
How This Book Is Organized
Acknowledgments

1 Introduction 1
Purposes of This Text 1
Why Do We Make Measurements in the Assessment and Management of Childhood Language Disorders? 2
What Problems Accompany Measurement? 4
A Model of Clinical Decision Making 7
Summary 11
Key Concepts and Terms 11
Study Questions and Questions to Expand Your Thinking 12
Recommended Readings 12
References 12

PART I: BASIC CONCEPTS IN ASSESSMENT

2 Measurement of Children’s Communication and Related Skills 17
Theoretical Building Blocks of Measurement 17
Basic Statistical Concepts 24
Characterizing the Performance of Individuals 30
Case Example 38
Summary 43
Key Concepts and Terms 44
Study Questions and Questions to Expand Your Thinking 46
Recommended Readings 47
References 47

3 Validity and Reliability 49
Historical Background 49
Validity 51
Reliability 66
Summary 72
Key Concepts and Terms 73
Study Questions and Questions to Expand Your Thinking 75
Recommended Readings 76
References 76

4 Evaluating Measures of Children’s Communication and Related Skills 78
Contextual Considerations in Assessment: The Bigger Picture in Which Assessments Take Place 79
Evaluating Individual Measures 88
Summary 105
Key Concepts and Terms 106
Study Questions and Questions to Expand Your Thinking 106
Recommended Readings 107
References 107

PART II: AN OVERVIEW OF CHILDHOOD LANGUAGE DISORDERS

5 Children with Specific Language Impairment 113
Defining the Problem 113
Suspected Causes 116
Special Challenges in Assessment 127
Expected Patterns of Language Performance 130
Related Problems 132
Summary 137
Key Concepts and Terms 138
Study Questions and Questions to Expand Your Thinking 139
Recommended Readings 140
References 140

6 Children with Mental Retardation 146
Defining the Problem 147
Suspected Causes 149
Special Challenges in Assessment 156
Expected Pattern of Strengths and Weaknesses 158
Related Problems 161
Summary 161
Key Concepts and Terms 162
Study Questions and Questions to Expand Your Thinking 163
Recommended Readings 164
References 164

7 Children with Autistic Spectrum Disorder 168
Defining the Problem 169
Suspected Causes 173
Special Challenges in Assessment 174
Expected Patterns of Language Performance 176
Related Problems 178
Summary 181
Key Concepts and Terms 182
Study Questions and Questions to Expand Your Thinking 183
Recommended Readings 184
References 184

8 Children with Hearing Impairment 187
Defining the Problem 188
Suspected Causes 196
Special Challenges in Assessment 198
Expected Patterns of Oral Language Performance 203
Related Problems 204
Summary 205
Key Concepts and Terms 205
Study Questions and Questions to Expand Your Thinking 206
Recommended Readings 207
References 207

PART III: CLINICAL QUESTIONS DRIVING ASSESSMENT

9 Screening and Identification: Does This Child Have a Language Impairment? 213
The Nature of Screening and Identification 214
Special Considerations When Asking This Clinical Question 216
Available Tools 236
Practical Considerations 240
Summary 242
Key Concepts and Terms 243
Study Questions and Questions to Expand Your Thinking 244
Recommended Readings 244
References 244

10 Description: What Is the Nature of This Child’s Language? 250
The Nature of Description 251
Special Considerations for Asking This Clinical Question 252
Available Tools 255
Practical Considerations 280
Summary 283
Key Concepts and Terms 284
Study Questions and Questions to Expand Your Thinking 286
Recommended Readings 286
References 287

11 Examining Change: Is This Child’s Language Changing? 293
The Nature of Examining Change 294
Special Considerations for Asking This Clinical Question 296
Available Tools 311
Practical Considerations 317
Summary 321
Key Concepts and Terms 322
Study Questions and Questions to Expand Your Thinking 323
Recommended Readings 324
References 324

Appendix A 328
Appendix B 334
Author Index 339
Subject Index 353
Page xi
Preface

Why I Wrote This Book
How This Book Is Organized
Acknowledgments

Why I Wrote This Book

“You can’t kill anyone with speech-language pathology.”

I came to speech-language pathology by what was then an unconventional route—a Ph.D. in a nonclinical specialty within the behavioral sciences, followed by postdoctoral study, clinical practicum, and a clinical fellowship year. Thus, I was unschooled in the humorous wisdom that is passed along with more standard fare to speech-language pathology doctoral students through the years. I was able to glean only one or two such aphorisms from my contacts with a more conventionally trained and clinically savvy colleague. “You can’t kill anyone with speech-language pathology,” she said. A balm to the anxieties of a beginning clinician who knows that there is so much she does not know. A bit of humor to help you while you learn. However, the more clients I worked with, the more I was haunted by this aphorism. Certainly, killing was exceedingly rare to nonexistent, but looming large were the specters of unfulfilled hopes and wasted time. The possibility for improving children’s lives became ever clearer, but so did the possibility of less desirable outcomes.
Page xii Initially my clients were preschoolers whose parents were baffled by their children’s failure to express themselves clearly, or they were school-aged children who were diagnosed with both language-learning disabilities and serious emotional problems. More recently, my clients have included unintelligible children whose problems were largely limited to their phonology as well as children whose problems encompassed not only that one aspect of language, but almost all other areas one might examine. All of these clients—like those with whom you currently work or will soon work—present us with puzzles to be solved and responsibilities to be met if we are to help them. The puzzle presented by children with language disorders is the array of abilities and difficulties that they bring to language learning and use. I use the word “puzzle” because, like puzzles, their problems at first suggest many alternative modes of solution—some better, some worse, and some probably of no value at all. Thus, “responsibilities” follow from our professional obligation to help children maximize their skills and minimize their problems in the process of deciphering the particular pattern of intricacies they present. In short, the reason I wrote this book was to help identify better ways of dealing with the puzzles and responsibilities that are so frustratingly linked in our interactions with our clients. By finding the best ways of dealing with these puzzles and responsibilities, we can avoid the harm implied by the aphorism quoted earlier and can instead enrich our clients’ lives by helping them improve their communication with others.

How This Book Is Organized

Overall Organization of the Book
This book is divided into three major sections. In Part I, concepts in measurement are explained as they apply to children’s communication. Although some of these concepts are quantitative in nature, others relate to the social context in which measurements are made and used. Special emphasis is placed on the concepts of validity and reliability because all other measurement characteristics are ultimately of interest by virtue of their effects on reliability and, more importantly, on validity. This part of the book concludes with a chapter providing direct advice regarding the examination of materials associated with measurement tools for purposes of determining their usefulness for a particular child or group of children. In Part II, four major categories of childhood language disorders are discussed: specific language impairment (chap. 5), language problems associated with mental retardation (chap. 6), autism spectrum disorders (chap. 7), and language problems associated with hearing impairment (chap. 8). These four categories were selected because they are the most frequently occurring childhood language disorders. Although children across these disorder categories share many problems, each group also presents unique challenges to assessment and management. Some of these challenges relate to the heterogeneity of language and other abilities shown by children in the category, the relative amount of information available due to the rarity of the problem, and the often diverse theoretical orientations of researchers. Each of these chapters provides a bare-bones introduction to the disorder category: its suspected causes,
Page xiii special challenges to language assessment, expected patterns of language performance, and accompanying problems that are unrelated to language. A full description of any one of these disorders would require several books as long as this one. Consequently, readers are directed to more comprehensive sources for further learning but are given sufficient information to anticipate how language assessment will need to be focused in order to begin to respond to the special needs of each group of children. In Part III, three major types of questions that serve as the starting points for assessment are introduced and then pursued in detail—from theoretical underpinnings to currently available measures. The major questions correspond to steps in the clinical interaction. First, the clinician must determine whether a language problem exists; second, he or she must determine the nature of the problem—both in terms of specific patterns of impairment across language domains and modalities and in terms of specific problem areas within each domain and modality. Finally, he or she must track change, determining how the client’s behaviors are changing and whether treatment seems to be the cause of identified improvements. In the course of addressing each of these questions, the reader is taken through the steps required to move from the question to the tools available to answer it for any given client.

Organization within Chapters
Each chapter contains several features designed to assist readers in mastering new content and in searching the text for specific information. Chapter outlines and enumerated summaries of major points aid readers interested in obtaining an overview of chapter content. To help readers with new or unfamiliar vocabulary, key terms are highlighted in the text, defined when of particular importance, and listed at the end of each chapter. Finally, a list of study questions and recommended readings is designed to allow readers to pursue topics further.

Acknowledgments

Whereas the flaws of this book are certainly of my own doing, its virtues owe much to the help I have received from colleagues and friends. Numerous colleagues in Vermont and elsewhere read sections of the book, contributed greatly to my understanding of the diverse group of children described in it, and deserve my considerable thanks. Among them are Melissa Bruce, Kristeen Elaison, Laura Engelhart, Julie Hanson, and Julie Roberts. In addition, I owe special appreciation to Barry Guitar, whose experience with his own books helped him provide the most meaningful encouragement and advice on all aspects of the project. I am particularly grateful for his ability to temper constructive criticism with ego-boosting praise. My longtime colleague and friend Martha Demetras took on a heroic and most helpful reading of a near-final form of the book. She, along with Frances Billeaud, Bernard Grela, and Elena Plante, read some of the most challenging sections and tried to help keep me on track. At Lawrence Erlbaum Associates, Susan Milmoe, Kate Graetzer, Jenny Wiseman, and Eileen Engel have helped me countless times through their expertise and patience. Irene Farrar took my graphics and made them both clearer and more interesting, and Kathryn Houghtaling made the cover all I could have hoped for. She did this with the help of the photographer Holly Favro and her most graceful niece Sara Faust. Although they were not involved with this project directly, several mentors have shaped my interest in the topics discussed here and contributed substantially to my ability to tackle those topics as well as I have. They have my respect and gratitude always: Ralph Shelton, Linda Swisher, Betty Stark, Dick Curlee, and Dale Terbeek. Finally, I owe great thanks to my parents, who each read and commented on some portion of the book and who provided encouragement along the way, not to mention the foundation that led me to want to pursue this project.
Page 1

CHAPTER
1 Introduction

Purposes of This Text
Why Do We Make Measurements in the Assessment and Management of Childhood Language Disorders?
What Problems Accompany Measurement?
A Model of Clinical Decision Making

Purposes of This Text

The distraught parents of a 3-year-old with delayed communication arrive at the office of a speech-language pathologist, youngster in tow and anxiety emanating almost palpably with every word: “Does our child have a serious problem?” “What can be done to correct it?” “How effective will treatment be?” Although the children and the specific questions change, the scene remains the same: A child’s parents or teacher turn to a speech-language clinician for help that will include answers to specific questions about whether a language problem exists, its nature, and how to intervene to minimize or remove its effects. This book focuses on basic elements of measurement of childhood language disorders as the means of providing valid clinical answers to these questions because only with valid clinical answers can effective clinical action be taken. Specifically, this book is designed to prepare readers to select, create, and use behavioral measures as they assess, manage, and evaluate treatment efficacy for children with language disorders. Although it is designed to provide guidance for those working with children with any language disorder, the greatest attention is paid to specific language impairment, autism, and language disorders related to mental retardation and hearing impairment. This book is intended primarily for graduate and undergraduate students who expect to enter the field of communication disorders. It may also serve as a refresher for professionals, such as practicing speech-language pathologists or teachers, who have never been formally introduced to some of the basic concepts behind the wide range of measures used in the assessment of childhood language disorders or who would like an introduction to the latest developments in this area. Unfortunately, the topic of measurement in childhood language disorders has the reputation of threatening complexity. Indeed, measurement of language, or communication more generally, is complex both because of the wealth of abilities and behaviors underlying language use and because of the variety of measurement orientations on which speech-language pathology and audiology draw. Although direct roots in educational and psychological testing traditions are particularly robust, there are also connections to measurement traditions in linguistics, personnel management, medicine, public health, and even acoustics. The approach taken here attempts to blend the best of these traditions and alert readers to the elements they share. For all readers, the text is intended to achieve three goals. First, readers will learn to recognize the bond that ties the quality of clinical actions to the quality of measurement used in the process of clinical decision making for children with suspected language disorders. Second, they will learn how to frame clinical questions in measurement terms by considering the information needed and the specific methods available to answer them.
Third, they will learn to recognize that all measurement opportunities present alternatives—at times alternatives of comparable merit, but more often alternatives that vary in their ability to answer the clinical question at hand. This last goal will enable readers to act as critical consumers and discriminating developers of clinical tools for language measurement. Case examples are used frequently in the text to help readers apply new concepts and methods to specific problems like those they currently face or will soon encounter.

Why Do We Make Measurements in the Assessment and Management of Childhood Language Disorders?

The following three cases illustrate a variety of occasions in which measurement serves as the basis for clinical actions involving children with various language difficulties. Two-year-old Cameron has been scheduled for a communication evaluation because of parental concerns that he uses only two words and does not appear to understand as well as his older sister did at a much younger age. Additionally, he generally avoids eye contact, which his parents find particularly alarming because of recent exposure to a television show on autism. Thus, they have specific questions about whether their child has autism and what they can do to improve his ability to communicate with other members of the family.
Page 3 Alejandro, a diminutive 9-year-old who hardly seems imposing enough for such a distinguished name, moved from Mexico to the United States a year ago and has just moved into a new school district. Although he has been diagnosed with a language disorder, no information concerning the relation of that language disorder to his bilingualism has accompanied him to his new school. Decisions regarding his school placement and access to special services will hinge on that information. Four-year-old Mary Beth has been referred by her pediatrician to your private practice for a complete evaluation of her communication skills. Although she has been receiving speech-language treatment since she was 2 years of age because of Down syndrome, Mary Beth has not made progress at the rate expected by her regular speech-language pathologist or desired by her parents. In fact, she appears to have made almost no progress in the past year and may be losing skills in some areas. These three cases illustrate the varied problems facing children and families who turn to speech-language pathologists for solutions. They also illustrate the speech-language pathologist’s role as part of a larger team of professionals. First, Cameron’s parents are faced with a child who appears quite delayed in his expressive and receptive language and who may also evidence difficulties in the nonverbal underpinnings of communication. Addressing their chief concern will require an interdisciplinary effort involving several professionals (including possibly a psychologist, a neurologist, a developmental pediatrician, and a social worker) designed to yield a differential diagnosis. If autism is diagnosed, the need for interdisciplinary efforts will continue because of the array of problems often associated with autism—ranging from mental retardation to sleep disorders.
The family’s needs, as well as the child’s, may be intense, with the result that the speech-language pathologist’s focus on the child’s communication may broaden to encompass the family communication context as well as the coordination of efforts aimed at the child’s overall needs. Alejandro presents the speech-language pathologist with the difficult task of determining to what extent his language difficulties are differences, not unlike those facing anyone with undeveloped skills in a new language, and to what extent they reflect an underlying disorder in language learning affecting both his native and second languages. In addition to decisions regarding the nature of direct therapy that he should receive (including whether it should be conducted in Spanish or English), critical decisions regarding his classroom placements are pressing. Not only will the speech-language pathologist need to work closely with his family and teachers to reach these decisions, he or she may also need to work with a translator or cultural informant to arrive at the best decisions for Alejandro’s academic and social future. Finally, Mary Beth’s parents and pediatrician are interested in receiving information that will shed some light on her lack of progress in speech-language treatment. Such information could help guide her subsequent treatment by providing her parents, pediatrician, and regular speech-language pathologist with a better understanding of her current strengths and weaknesses and, consequently, a better understanding of reasonable next steps. It should be noted, however, that Mary Beth’s parents might also use this information as they consider suing the speech-language pathologist responsible for her care. Although this prospect is remote, it is nonetheless an increasing possibility (Rowland, 1988). These three cases reveal that speech-language pathologists are asked to obtain and use information to help children from a variety of cultural backgrounds and a range
Page 4 of communication problems. Although they obtain much of that information directly, they must often work with families and other professionals to stand a chance of getting the “facts.” Speech-language pathologists use some of this information themselves, such as when they identify and describe a language disorder or plan their role in treatment. They also share information with others, including doctors, teachers, and other individuals who work with persons experiencing a communication disorder. In brief, then, speech-language pathologists generate, use, and share information having potentially vital medical, educational, social, and even legal significance. So how does measurement enter into the strategies used to address children’s needs? Put simply and in terms specific to its use in communication disorders, measurement can be seen as the methods used to describe and understand characteristics of persons and their communication as part of clinical decision making, the process by which the clinician devises a plan for clinical action. Thus, it is the connection between clinical decisions and clinical action that makes measurement matter (Messick, 1989). Clinicians make numerous, almost countless decisions about a child in the course of a clinical relationship—from determining that a communication disorder exists, to selecting a general course of treatment, to examining the efficacy of a very specific treatment task. Because the clinician bases her actions at least in part on measurement data obtained from the client, the quality of the action will be closely related to the quality of the data used to plan it. The section that follows considers several decision points that offer opportunities for successes—or failures—in clinical decision making.

What Problems Accompany Measurement?
Table 1.1 lists five different kinds of decisions occurring in the course of a clinical relationship as well as some of the measures that might be used to provide input to each decision. This listing is intended to illustrate the variety of decisions to be made rather than to list them exhaustively.

Page 5

Table 1.1
Clinical Decisions in Speech-Language Pathology

Screening for a language disorder
Related clinical actions: refer for complete evaluation; counsel client and family; inform and confer with relevant professionals
Types of measures used: client and family interview; standardized screening measure; informal clinician-designed measure

Diagnosis of a language disorder
Related clinical actions: refer for related evaluations; recommend treatment, monitoring, or no treatment; counsel client and family; inform and confer with relevant professionals
Types of measures used: client and family interview; standardized norm-referenced tests; parent report instruments

Planning for management of a language disorder
Related clinical actions: recommend type and frequency of treatment; identify strengths and weaknesses in communicative functioning; consult with professionals serving client needs (e.g., educators, psychologists, physicians)
Types of measures used: standardized norm-referenced or criterion-referenced tests; informal measures related to specific treatment goals or used to describe domains for which measures are unavailable or that require a realistic setting (e.g., functional performance in the classroom)

Assessment of change in communication over time
Related clinical actions: infer developmental trends; modify treatment plan; document treatment efficacy; dismiss from treatment
Types of measures used: standardized norm-referenced or criterion-referenced tests; informal measures related to specific treatment goals; single-subject experimental designs

Identification of need for additional information in a related area
Related clinical actions: refer to a related professional for additional information
Types of measures used: client and family interview; standardized norm-referenced tests; informal clinician-devised measure

As illustrated in the table, decision making begins even prior to the initiation of an ongoing clinical relationship, as the speech-language pathologist screens communication skills to determine whether additional attention is warranted. Subsequently, the clinician will require more information to understand the nature of the problem presented and to arrive at decisions about how best to manage it. Once a program of management is in place, ongoing measurement is required to respond to the client’s changing needs and accomplishments. Even the end of the clinical relationship is based on the clinician’s use of measurement, with dismissal from treatment usually occurring when communication skills are normalized, maximum gains have been effected, or treatment has been found to be unsuccessful.

At each of the points of decision making, the potential for harm enters hand in hand with the potential for benefit. A brief reconsideration of the case of Mary Beth can be used to illustrate the potential for clinical harm as well as to introduce a method for evaluating the effects of different kinds of errors in decision making. Recall that Mary Beth has received speech-language treatment for 2 years because of an early diagnosis of Down syndrome. Her lack of any progress in speech and language over the past year, or worse yet, her loss of skills, may represent a poor fit between the assessment tools used to measure progress and the areas in which Mary Beth has in fact advanced, or it may represent some unsatisfactory clinical practice of her regular speech-language pathologist. On the other hand, this lack of progress may reflect a change in Mary Beth’s neurological status that requires medical attention. Therefore, one of the most immediate decisions to be made from a speech-language perspective is whether to refer Mary Beth to a neurologist. Figure 1.1, a decision matrix, illustrates a method for thinking about the possible outcomes associated with this particular decision. This type of decision matrix has
Page 6
been used to assess the implications of alternative choices in a variety of fields (Berk, 1984; Thorner & Remein, 1962; Turner & Nielsen, 1984).

[Fig. 1.1. A decision matrix for the decision of whether to refer Mary Beth for neurologic evaluation.]

To construct such a matrix as a means of considering repercussions for a single case, one pretends that one has access to the ultimate “truth” about what is best for Mary Beth. From that perspective, a referral either should or should not be made—no doubts. With such perfect knowledge, therefore, suppose that a referral should be made. In that case, the clinician will have made a correct judgment if he or she has referred and an incorrect one if he or she has not. If the clinician errs by not referring, Mary Beth may become involved in the expense and frustration of continuing speech-language treatment that is doomed to failure. Further, she may be delayed in or prevented from receiving attention for an incipient neurologic condition, which, in turn, could have serious, even life-threatening consequences. Although this error might be corrected over time, its effects are likely to be relatively long lasting and potentially costly in terms of time and money. On the other hand, suppose that the “truth” is that a referral is not needed and therefore should not be made. In that case the clinician will have made a correct judgment if she has not referred and an error if she has. Plausibly, this type of error may result in a needless expenditure of time and money and in undue concern on the part of Mary Beth’s family. A bit more positively, however, the effects of this error would probably be relatively short-lived: Once the neurologic evaluation took place, the concern would probably end. A decision matrix makes it clear that different errors in clinical decision making are associated with different effects. Errors vary in terms of the likelihood that they
Page 7 will be detected, the time course for that detection, and the nature of costs they will exact from the client and clinician. The decision matrix, therefore, is a particularly powerful tool because it allows one to examine both the frequency and type of errors made. I return to this type of matrix frequently because of its helpfulness in thinking about tools used to reach clinical decisions. In the next section of this chapter, I introduce methods used to understand (and therefore potentially to improve) clinical decision making. Their description is followed by the introduction of a model that is intended to serve as a framework in which to think about the steps involved in formulating and answering clinical questions.

A Model of Clinical Decision Making

The processes by which individuals make decisions about complex problems—such as those involved in a variety of clinical settings—have been the focus of several lines of research (Shanteau & Stewart, 1992; Tracey & Rounds, 1999). Each differs from the others somewhat in intent, but all have something to offer anyone interested in clinical decision making. First, decision making has been of interest to psychologists who want to understand how complicated problems are solved and to what extent those who are acknowledged “expert” problem solvers in a given area (e.g., chess, medicine, accounting) differ from naive problem solvers (Barsalou, 1992). Second, skilled decision making has been studied by researchers from a variety of disciplines who wish to develop computer programs called expert systems, which seek to mimic expert performance (Shanteau & Stewart, 1992). Such researchers have focused on the creation of computer programs yielding optimal clinical judgments. Because they focus on successful decision making, these researchers have been uninterested in understanding expert errors in decision making.
Finally, there has been a much smaller group of researchers who study the nature and process of decision making in specific fields for the benefit of the field itself. In speech-language pathology and audiology, such research has increased dramatically over the last decade (e.g., McCauley & Baker, 1994; Records & Tomblin, 1994; Records & Weiss, 1991). Researchers in this third category tend to be interested in both errors and successful performance, often as a means of improving professional training. You may be asking, "How does research on decision making relate to measurement in speech-language pathology?" and, more specifically, "How can it help me be a better professional?" To begin with, a detailed understanding of expert clinical decision making may help beginning clinicians reach the ranks of "expert" more quickly. For example, such an understanding may identify which sources of information and which methods experts use—as well as which ones they avoid. Another potential benefit of research in clinical decision making is that it may identify problems that beset even experienced clinicians, thereby helping decision makers at all levels be vigilant in avoiding them (e.g., Faust, 1986; Tracey & Rounds, 1999). A relatively brief description of two such problems may help illustrate the potential value of this type of research. In a review of research on human judgment in clinical psychology and related fields, Faust (1986) described clinicians' overreliance on confirmatory strategies.

Page 8

Essentially, the use of a confirmatory strategy means that after forming a hypothesis early in the course of decision making (e.g., regarding a diagnosis, etiology, or some other clinical question), the clinician proceeds to search out and emphasize information tending to confirm the hypothesis. At the same time, she or he may fail to search out discrepant evidence. The tendency for very able clinicians to adopt such a strategy has been demonstrated repeatedly in studies in which clinicians are asked to make decisions on hypothetical clinical data (Chapman & Chapman, 1967, 1969; Dawes, Faust, & Meehl, 1993). For an example of how a confirmatory strategy might operate in a case of decision making in speech-language pathology, I return to the case of Alejandro. Suppose that Alejandro's clinician initially develops the hypothesis that Alejandro responds most consistently when communicating in English. The clinician would be using a confirmatory strategy if she or he failed to evaluate Alejandro's performance in Spanish and informally sought teachers' impressions of how well Alejandro was responding to the English-only approach she had recommended, but did so in such a way as to invite only positive reactions. A second example of a problem in clinical decision making has been described as the failure to "realize the extent to which sampling error increases as sample size decreases" (Faust, 1986, p. 421). Tversky and Kahneman (1993) described this practice as evidence of "the belief in the law of small numbers," by which they mean the tendency to assume that even a very small sample is likely to be representative of the larger population from which it is drawn. Returning to one of the hypothetical cases presented earlier, imagine this sort of problem as menacing the clinician who is to evaluate Mary Beth, the youngster with Down syndrome. Suppose that clinician were to have seen only two or three children with Down syndrome during her clinical career—each of whom had made exceptionally poor progress.
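The instability that makes small samples treacherous is easy to demonstrate with a short simulation (a sketch using invented numbers, not clinical data): draw repeated samples of different sizes from a single population and compare how much the samples' means wander.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: a trait score distributed around a mean of 100
# with a standard deviation of 15 (values chosen only for illustration).
population = [random.gauss(100, 15) for _ in range(10_000)]

def spread_of_sample_means(sample_size: int, n_samples: int = 1_000) -> float:
    """Standard deviation of the means of repeated random samples."""
    means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

# Means based on 3 cases vary far more than means based on 100 cases,
# so experience with 2 or 3 children is a poor stand-in for
# "children with this diagnosis" as a whole.
small = spread_of_sample_means(3)
large = spread_of_sample_means(100)
print(f"spread of means, n = 3:   {small:.1f}")
print(f"spread of means, n = 100: {large:.1f}")
assert small > large
```

With only a handful of cases, the sample mean routinely lands far from the population's—which is exactly the trap of believing in the law of small numbers.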
The danger would be that the clinician would consider those few children she had seen as representative of all children with that diagnosis, thereby causing her to downplay the stated concerns about Mary Beth’s lack of progress. Neither of these problems in clinical decision making has been seen as evidence of gross incompetence. Although poor clinicians may succumb more frequently to these practices, the practices themselves should be of considerable concern to scientifically oriented clinicians precisely because they seem to be related to tendencies in human problem solving, and they must actively be worked against for the good of clients and of the profession. Once aware that bad habits such as those described above may creep into clinical decision making, the wary clinician can seek remedies. Among the remedies recommended for the tendency to use a confirmatory strategy is the adoption of a disconfirmatory strategy, in which evidence both for and against one’s pet hypothesis is sought after and valued. Similarly, a belief in the law of small numbers can be undermined by reminders that when one has only limited experience with individuals with a particular type of communication disorder, the characteristics of people from that sample are quite likely to be unrepresentative of that population as a whole. Although the process by which speechlanguage pathologists and audiologists reach clinical decisions is far from well understood at this point (Kamhi, 1994; Yoder & Kent, 1988), the model shown in Fig. 1.2 is intended to serve as a working model that can be
Page 9
Fig. 1.2. A model illustrating the ways in which measurements are used to reach clinical decisions leading to the initiation or modification of clinical actions.

elaborated on as understanding increases. Such a graphic model can help emphasize the varied nature of the processes involved in reaching complex clinical decisions, including both those that are very deliberate and readily available for inspection as well as those that are almost automatic and less available for observation. The process of clinical decision making is initiated as the speech-language pathologist formulates one or more clinical questions. Although such questions may often coincide with those actually expressed by the client, they may not always do so. Thus, for example, the parents of 3-year-old Mary Beth may not have expressed interest in having her hearing status evaluated. On the other hand, her speech-language pathologist would see that as a critically important question, given both the susceptibility to middle ear infection with associated hearing loss among children with Down syndrome and the pivotal role of hearing in speech-language acquisition.

Page 10

This example points out that clinical questions arise both from clients' expressions of need and from the expert knowledge possessed by the clinician. The formulation of clinical questions is of central importance to the quality of clinical decision making because it drives all that follows. First, the clinical question determines what range of information should be sought. Second, it guides the clinician in the selection or creation of appropriate measurement tools. In fact, it is widely held that any measurement tool can be evaluated only in relation to its adequacy in addressing a specific clinical question (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1985; Messick, 1988). No measure is intrinsically "good" or "valid." Rather, the quality of a measure varies depending on the specific question it is used to address. Thus, for example, a given language test may be an excellent tool for answering a question about the adequacy of 4-year-old Mary Beth's expressive language skills, yet it may be a perfectly awful tool if used to examine such skills for 9-year-old bilingual Alejandro. Optimally, specific measurement tools will be selected so as to address the full scope of each clinical question being posed, using the best measures available (Vetter, 1988b). For some questions, however, the wealth of commercially available standardized tests and published procedures will fail to yield any acceptable measure, or even any measure at all. At such times, clinicians may decide to develop an informal measure of their own (Vetter, 1988a), or they may simply have to admit that not all clinical questions for all clients are answerable (Pedhazur & Schmelkin, 1991). The administration or collection of selected clinical measures is certainly the most obvious portion of the clinical decision-making process.
Its importance can be emphasized by reference to the data-processing adage "garbage in, garbage out." Put more decorously, the act of skillful administration is crucial to the quality of information obtained. Haphazard compliance with standard administration guidelines may render the information obtained spurious and misleading, thereby undermining all later efforts of the clinician to use it to arrive at a reasonable clinical decision. Following data collection, the clinician examines information obtained across a variety of sources and integrates that information to address specific clinical questions. For example, in order to comment on the reasonableness of progress made by Mary Beth during the past 2 years, her speech-language pathologist will need to perform a Herculean task—integrating, across time and content area, measures related to speech, language, hearing, and nonverbal cognition. Components of the clinical decision-making process outlined in Fig. 1.2 have received differing amounts of attention from speech-language pathology and audiology professionals. Thus, for example, considerable attention has been paid to the formulation of relevant clinical questions for specific categories of communication disorders (e.g., Creaghead, Newman, & Secord, 1989; Guitar, 1998; Lahey, 1988). On the other hand, little has been written about how clinicians can use such information to arrive at effective clinical decisions (Records & Tomblin, 1994; Turner & Nielsen, 1984). Therefore, in the remainder of this text, both venerable concepts and emerging hypotheses will be shared to help readers improve the quality of their clinical decision making and, consequently, of their clinical actions toward children with developmental language disorders.

Page 11

Summary

1. Measurement of developmental language disorders draws on methods used in a wide variety of disciplines.
2. The purposes of this text are to help readers learn to frame effective clinical questions that will guide the decision-making process, to recognize that all measurement opportunities present alternatives, and to recognize the connection between the quality of clinical actions and the quality of measurement used in the clinical decision-making process.
3. Speech-language pathologists obtain and use information obtained through measurement to arrive at diagnoses that affect medical, educational, social, and even legal outcomes. They derive this information cooperatively with others (e.g., families and other professionals) and share it with others as a means of achieving the child's greatest good.
4. Measurement is important because it helps drive clinical decision making, which in turn affects clinical actions.
5. Measurement is used to address clinical questions related to screening, diagnosis, planning for treatment, determining severity, evaluating treatment efficacy, and evaluating change in communication over time.
6. The cognitive processes involved in clinical decision making are not well understood but have begun to be studied in research addressing complex problem solving, computer expert systems, and specific issues within a variety of fields (e.g., medicine, special education).
7. Examples of problematic tendencies that have been identified as possible barriers to effective clinical decision making include the use of confirmatory strategies and the belief in the law of small numbers.
Key Concepts and Terms

belief in the law of small numbers: the tendency to overvalue information obtained from a relatively small sample of individuals, for example, those few individuals with an uncommon disorder with whom one has had direct contact.

clinical decision making: the processes by which clinicians pose and answer clinical questions as a basis for clinical actions such as diagnosing a communication disorder, developing a treatment plan, or referring a client for medical evaluation.

confirmatory strategy: the tendency to seek and pay special attention to information that is consistent with a clinical hypothesis while failing to seek, or undervaluing, information that is not consistent with the hypothesis.

decision matrix: a method used to consider the outcomes associated with correct and incorrect decisions.
Page 12

differential diagnosis: the identification of a specific disorder when several diagnoses are possible because of shared symptoms (self-reported problems) and signs (observed problems).

measurement: methods used to describe and understand characteristics of a person.

Study Questions and Questions to Expand Your Thinking

1. Taking each of the three cases described earlier in the chapter, use Table 1.1 to determine what types of clinical decisions and related clinical actions are likely to be required for each.
2. For each of the cases used in Question 1, identify a binary clinical decision and consider the implications of the two kinds of errors that can result.
3. On the basis of your current knowledge of children with language disorders, develop a hierarchy of outcomes that might result from clinical errors in the following cases:
- screening of hearing in a 4-month-old infant;
- collection of treatment data in English for a child whose first language is Vietnamese;
- collection of trial treatment data for purposes of selecting treatment goals for a child exhibiting significant semantic delays;
- evaluation of language skills in a child who exhibits severe delays in speech development.
4. Think about decisions—big and small—that you may have made during the last week. Try to remember the process by which you reached your decision. Did any of your decision making involve the use of a confirmatory strategy? Describe the specific example and how your thinking might have differed if you had avoided such a strategy.

Recommended Readings

Barsalou, L. W. (1992). Cognitive psychology: An overview for cognitive scientists. Hillsdale, NJ: Lawrence Erlbaum Associates.
McCauley, R. J. (1988, Spring). Measurement as a dangerous activity. Hearsay: Journal of the Ohio Speech and Hearing Association, 6–9.
Tracey, T. J., & Rounds, J. (1999). Inference and attribution errors in test interpretation. In J. W. Lichtenberg & R. K. Goodyear (Eds.), Scientist-practitioner perspectives on test interpretation (pp. 113–131). Boston: Allyn & Bacon.

References

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (1985). Standards for educational and psychological testing. Washington, DC: APA.
Barsalou, L. W. (1992). Thinking. Cognitive psychology: An overview for cognitive scientists. Hillsdale, NJ: Lawrence Erlbaum Associates.
Page 13

Berk, R. A. (1984). Screening and diagnosis of children with learning disabilities. Springfield, IL: C. C. Thomas.
Chapman, L. J., & Chapman, J. P. (1967). Genesis of popular but erroneous psychodiagnostic observations. Journal of Abnormal Psychology, 72, 193–204.
Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 74, 271–280.
Creaghead, N. A., Newman, P. W., & Secord, W. A. (1989). Assessment and remediation of articulatory and phonological disorders. Columbus, OH: Merrill.
Dawes, R. M., Faust, D., & Meehl, P. E. (1993). Statistical prediction versus clinical prediction: Improving what works. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 351–367). Hillsdale, NJ: Lawrence Erlbaum Associates.
Faust, D. (1986). Research on human judgment and its application to clinical practice. Professional Psychology, 17, 420–430.
Guitar, B. (1998). Stuttering: An integrated approach to the nature and treatment (3rd ed.). Baltimore, MD: Williams & Wilkins.
Kamhi, A. G. (1994). Toward a theory of clinical expertise in speech-language pathology. Language, Speech, and Hearing Services in Schools, 25, 115–118.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
McCauley, R. J. (1988, Spring). Measurement as a dangerous activity. Hearsay: Journal of the Ohio Speech and Hearing Association, 6–9.
McCauley, R. J., & Baker, N. E. (1994). Clinical decision making in specific language impairment: Actual cases. Journal of the National Student Speech-Language Hearing Association, 21, 50–58.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–104). New York: American Council on Education and Macmillan.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates.
Records, N. L., & Tomblin, J. B. (1994). Clinical decision making: Describing the decision rules of practicing speech-language pathologists. Journal of Speech and Hearing Research, 37, 144–156.
Records, N. L., & Weiss, A. (1990). Clinical judgment: An overview. Journal of Childhood Communication Disorders, 13, 153–165.
Rowland, R. C. (1988). Malpractice in audiology and speech-language pathology. Asha, 45–48.
Shanteau, J., & Stewart, T. R. (1992). Why study expert decision making? Some historical perspectives and comments. Organizational Behavior and Human Decision Processes, 53, 95–106.
Thorner, R. M., & Remein, Q. R. (1962). Principles and procedures in the evaluation of screening for disease. Public Health Service Monograph No. 67, 408–421.
Tracey, T. J., & Rounds, J. (1999). Inference and attribution errors in test interpretation. In J. W. Lichtenberg & R. K. Goodyear (Eds.), Scientist-practitioner perspectives on test interpretation (pp. 113–131). Boston: Allyn & Bacon.
Turner, R. G., & Nielsen, D. W. (1984). Application of clinical decision analysis to audiological tests. Ear and Hearing, 5, 125–133.
Tversky, A., & Kahneman, D. (1993). Belief in the law of small numbers. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 341–350). Hillsdale, NJ: Lawrence Erlbaum Associates.
Vetter, D. K. (1988a). Designing informal assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making in speech-language pathology (pp. 192–193). Toronto: BC Decker Inc.
Vetter, D. K. (1988b). Evaluation of tests and assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making in speech-language pathology (pp. 190–191). Toronto: BC Decker Inc.
Yoder, D. E., & Kent, R. D. (Eds.). (1988). Decision making in speech-language pathology. Toronto: BC Decker Inc.
Page 14
Page 15
PART I
BASIC CONCEPTS IN ASSESSMENT
Page 16
Page 17

CHAPTER 2
Measurement of Children's Communication and Related Skills

Theoretical Building Blocks of Measurement
Basic Statistical Concepts
Characterizing the Performance of Individuals
Case Example

Theoretical Building Blocks of Measurement

What Is Measured by Measurements?
Measurements are usually indirect, that is, they involve the description of a characteristic taken to be closely related to but different from the characteristic of interest. As an illustration of this notion, Pedhazur and Schmelkin (1991) considered temperature. Conceptually, temperature is most closely related to the rate of molecular movement within a material, yet it is almost always measured using a column of mercury. In this way, the measurement is made indirectly using the height of the column of mercury as the indicator, or indirect focus of measurement. Although it would be possible to determine the rate of molecular movement more directly, this is not done because of the considerable expense and effort involved. Similarly, measurements of behavior or other characteristics of people are almost always indirect. Consider, for example, a characteristic that might be of interest to a
Page 18

speech-language pathologist, such as a child's ability to understand language. Clearly, as in the case of temperature, one cannot easily measure this characteristic in a direct fashion. In fact, the ability to understand language cannot ever be directly measured but instead must be inferred from a variety of indicators. This is because that ability is a theoretical construct,¹ a concept used in a specific way within a particular system of related concepts, or theory. Thus, the theoretical construct referred to here as "the ability to understand language" represents shorthand for the carefully weighed observations one has made about people as they respond to the vocalizations of others, as well as for the information one has read or been told about this construct by others. Figure 2.1 attempts to capture the complex relationship between what one wants to measure, the theoretical construct, and the indicators used to measure it. Looking at Fig. 2.1, you can see that there are many possible indicators for a single construct. This premise is important to clinicians and researchers, who need to recognize that any test or measure they use represents a choice from the set of all possible indicators. As will become clearer in later sections of this book, the wealth of indicators available for a construct provides flexibility for those interested in measuring the construct, but it also presents potential problems. For example, a diverse range of indicators for a single construct (e.g., intelligence) can lead to confusion when clinicians or researchers use different indicators in relation to the same construct and reach different conclusions about both the construct and how the characteristic being studied functions in the world.
As an example, if one were to use an "intelligence" test that heavily emphasizes knowledge of a particular culture, then use of that measure with children who come from a different culture would lead to very different conclusions regarding how intelligent the children are. Alternatively, focusing on a single indicator and ignoring the broader range of possible indicators for a given construct can lead to its impoverishment. This type of problem has recently received attention in the literature on learning disabilities, where intelligence has sometimes been treated as synonymous with performance on one particular test—the Wechsler Intelligence Scale for Children-Revised (Wechsler, 1974). Critics complain that the use of this single measure means that the knowledge gained by such research may be far more limited in its appropriate application than has been appreciated. In summary, the choice of which indicator—and how many indicators—to use in order to gain information about a particular construct, be it intelligence, receptive language, or narrative production, has important implications for the quality of the information to be gained. Pedhazur and Schmelkin (1991) described two kinds of indicators: reflective and formative indicators. Reflective indicators represent effects of the construct, and formative indicators represent causes of it. An example of a reflective indicator of one's ability to understand a language would be the proportion of a set of simple commands in that language that one can correctly follow. An example of a formative indicator of one's ability to understand a language would be the number of years one has

¹Within the literature on psychological testing, there is a tendency to refer to such constructs as latent variables.
Page 19
Fig. 2.1. The relationship between a theoretical construct—single-word comprehension—and several indicators that could be used to measure it.

been exposed to it. Almost all indicators are reflective; however, formative indicators are sometimes used. By this point, you may be scratching your head, wondering whether the term indicator is synonymous with the somewhat more familiar term variable. In fact, those terms are quite closely related and, at times, may be used synonymously. I introduced the term indicator first because variable is so closely associated with research that its application to clinical measures might have seemed confusing. Consequently, I believe that an initial discussion of indicators may help readers see how similar clinical and research measures are to one another while averting that confusion. For the purposes of this book, indicator and variable will be used almost interchangeably to refer to a measurable characteristic associated with a theoretical construct. However, variable is frequently used in a more restricted way than indicator, to refer to a property that takes on specific values (Kerlinger, 1973). One more term that commonly functions as a building block for measurement in descriptions of human behavior and abilities is the operational definition. This term was originally introduced in physics by Bridgman (1927) to suggest that in a given application (e.g., a specific research design or a particular clinical measure) a construct can be considered identical to the procedures used to measure it. Operational definitions have been influential in communication disorders because they have given rise to the clinical use of behavioral objectives, specific statements defining desired outcomes of treatment for clients in terms that explain exactly how one will know whether the desired outcome has been achieved.
Operational definitions are probably most useful as a means of encouraging us to think carefully about the specific indicators we use to gain information about a given theoretical construct.
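As a sketch of how an operational definition pins a construct to a procedure, the hypothetical function below defines "follows simple commands" operationally as the proportion of a fixed set of scripted commands a child carries out correctly—a reflective indicator of comprehension, in Pedhazur and Schmelkin's terms. The command set, the scoring, and the sample record are all invented for illustration.

```python
# Hypothetical operational definition: for this measure, "single-word
# comprehension" IS the proportion of 10 scripted commands ("point to the
# ball," "touch your nose," ...) that the child follows correctly.
def comprehension_score(responses: list[bool]) -> float:
    """Proportion of commands followed correctly (a reflective indicator)."""
    if not responses:
        raise ValueError("no responses scored")
    return sum(responses) / len(responses)

# One child's invented record: True = command followed correctly.
observed = [True, True, False, True, True, True, False, True, True, True]
print(f"Comprehension score: {comprehension_score(observed):.2f}")  # 0.80
```

The value of writing the definition down this explicitly is exactly the point made above: anyone reading it knows precisely which indicator was chosen and how the desired outcome would be recognized.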
Page 20

TESTING AND MEASUREMENT CLOSE-UP: ALFRED BINET AND THE POTENTIAL EVILS OF REIFICATION

In his 1981 book The Mismeasure of Man, Stephen Jay Gould, a noted biologist and popularizer of science, described the work of Alfred Binet, the Frenchman who developed one of the first well-known intelligence tests. Gould noted that Binet began to develop the test in 1904, when he was commissioned by the minister of education to devise a practical technique for "identifying those children whose lack of success in normal classrooms suggested the need for some form of special education" (p. 149). Almost as soon as the test came into use, Binet expressed hopes that its results not be taken as ironclad predictions of what a child could achieve, but that they be used as a basis for providing help rather than as a justification for limiting opportunities. Gould went on to describe the regrettable dismantling of Binet's fond hope. Gould's book describes the process of the reification of intelligence, a process in which an abstract, complex theoretical construct (such as "intelligence") comes to have a life of its own—to be seen as real rather than as the abstract approximation that its originators may have had in mind. To illustrate this process, Gould described events in the United States that occurred within a mere 20 years of Binet's initial test development. Intelligence had been reified to the point that it was used—or rather misused—as a basis for decisions having major effects on military service, emigration policies, penal systems, and the treatment of individuals suspected of "mental defectiveness."

Levels of Measurement
There are numerous ways to categorize measurements, but the notion of levels, or scales, of measurement introduced by S. S. Stevens (1951) is one of the most influential and continues to inspire both defenders and attackers. Stevens's levels describe the mathematical properties of different kinds of indicators, or variables. The concept of levels is usually defined operationally, with each level of measurement described in terms of the methods used to assign values to variables—for example, whether the values are assigned using categories (normal vs. disordered) versus numbers (percentage correct). Typically, a hierarchical system of four ordered levels is discussed, in which the higher levels preserve greater amounts of information about the characteristic being measured. Table 2.1 summarizes the defining properties of each level of measurement and lists examples of each that relate to the assessment of childhood language disorders. These levels have implications not only for our interpretation of specific measures, but also for which statistics will be appropriate for their further investigation.

Page 21

Table 2.1
Three Levels of Measurement, Their Defining Characteristics, and Examples From Developmental Language Disorders

Nominal
Characteristics:
• Mutually exclusive categories
Examples:
• Describing a child as having word-finding difficulties
• Labeling a child's problem as specific language impairment
• Describing a child's use and nonuse for each of 14 grammatical morphemes

Ordinal
Characteristics:
• Mutually exclusive categories
• Categories reflect a rank ordering of the characteristic being measured
Examples:
• Describing the severity of a child's expressive language difficulties as severe
• Characterizing a child's intelligibility along a rating scale, such as "intelligible with careful listening," where no effort has been made to assure that the scale has equal intervals
• Describing a child's language in a conversational sample as productive at a particular phase (Lahey, 1988)

Interval
Characteristics:
• Mutually exclusive categories
• Categories reflect a rank ordering of the characteristic being measured
• Units of equal size are used, making the comparison of differences in numbers of units meaningful
Examples:
• Summarizing a child's standardized test performance using a raw or standard score
• Describing a child's spontaneous use of personal pronouns using the number of correct responses
• Rating intelligibility using an equal-interval scale

The nominal level of measurement refers to measures in which mutually exclusive categories are used. Diagnostic labels and category systems for describing errors are frequently used examples of nominal measures. Although numerals may sometimes be used as labels for nominal categories (e.g., serial numbers or numbers on baseball jerseys), nominal measurements are not quantitative and simply involve the assignment of an individual or behavior to a particular category. Measurement at this level is quite crude in that all people or behaviors assigned to a specific category are treated as if they are identical. Ideally, categories used in nominal-level measures are mutually exclusive: Each person or characteristic to be measured can be assigned to only one category. Diagnostic labels used in childhood language disorders can ideally be thought of as nominal; however, they are not always mutually exclusive. For example, a child may have language problems associated with both mental retardation and hearing impairment. Similarly, a child with mental retardation may show a pattern of greater difficulties with linguistic than nonlinguistic cognitive functions, leading one to want to entertain a designation of the child as both language impaired and mentally retarded (Francis, Fletcher, Shaywitz, Shaywitz, & Rourke, 1996). The ordinal level of measurement refers to measures using mutually exclusive categories in which the categories reflect an underlying ranking of the characteristic
Page 22 to be measured. Put differently, at this level, categories bear an ordered relationship to one another so that objects or persons placed in one category have less or more of the characteristic being measured than those assigned to another category. Despite the greater information provided at this level of measurement compared with the nominal level, it lacks the assumption that categories differ from one another by equal amounts. Severity ratings are probably the most commonly used ordinal measures in childhood language disorders. Although ordinal measures reflect relative amounts of a characteristic, they are still not quantitative in the sense of reflecting precise numerical relationships between categories. For example, although a profound expressive language impairment may be regarded as representing “more” of an impairment than a severe expressive language impairment, it is not clear how much more of the impairment is present. One result of the absence of equal distances between categories (also called equal intervals) in an ordinal measure is that when rankings are based on an individual judgment, they are likely to be quite inconsistent across individuals. Imagine the case of a clinician who only serves children with devastatingly severe language impairments. When that clinician uses the label mild to describe a child’s problems, it may mean something very different from the level of impairment meant to be conveyed by the same label when it is used by clinicians serving a less involved population. Because of this, it has been recommended that ordinal measures be used when the ratings made by a single individual will be compared with one another, but not when ratings of several people will be compared (Allen & Yen, 1979; Pedhazur & Schmelkin, 1991). The interval level of measurement refers to measures using mutually exclusive categories, ordered rankings of categories, and units of equal size. 
It is the highest level of measurement usually encountered in measurements of human abilities and behavior. Unlike measurements at the first two levels, measurements at this level can be considered quantitative because numerical differences between scores are meaningful, as was not the case for numerals used at the nominal or ordinal levels. Test scores are usually identified as the most frequent examples of this level of measurement in childhood language disorders. The use of equal-size units in interval-level measurements allows more precise comparisons of measured characteristics to take place. For example, someone who receives a score of 100 on a vocabulary test can be said to have received 10 more points than someone who received a score of 90, and the same can be said for the person who scored 40 points when compared with someone who scored 30 on the same test. What cannot be said, however, is that someone who received a score of 80 knew twice as much as someone who received a score of 40—that comparison entails a ratio (80:40), and the ability to describe ratios precisely is not reached until the final level of measurement. However, for most measurement purposes, the interval level of measurement allows sufficient precision. The ratio level of measurement refers to measures using mutually exclusive categories, ordered rankings of categories, equal-size units, and a real zero. Achievement of this level of measurement is considered rare in the behavioral sciences, but occurs when a measure demonstrates all of the traits associated with interval measures along
with a sensitivity to the absence of the characteristic being measured—the “real zero” mentioned above. The term ratio is used to describe such measures because ratio comparisons of two different measurements along this scale hold true regardless of the unit of measurement that is used. It should also be noted that when ratios are formed from other measures, they achieve this level of measurement. For example, the ratio of a person’s height to weight falls at the ratio level of measurement. Measures involving time (such as age or duration) are probably the most common of the relatively few measures in childhood language disorders that reach the ratio level. At this point, readers may wonder why score data are not described as falling at the ratio level of measurement given that a score of 0 on a test or other scored clinical measures is an unpleasant but real possibility. For score data, however, the zero point is considered an arbitrary zero rather than a real zero because a score of 0 does not reflect a real absence of the characteristic being studied (Pedhazur & Schmelkin, 1991). Thus, for example, a score of zero on a 15-item task concerning phonological awareness is not considered indicative of a complete absence of phonological awareness on the part of the person taking the test. In order to demonstrate that a person has no phonological awareness, the test would need to include items addressing all possible demonstrations of phonological awareness and would therefore be too long to administer (or devise, for that matter). Information concerning levels of measurement may be a review to many readers who remember it from past statistics or research methods courses.
Levels of measurement are introduced in those contexts because each level is associated with specific mathematical transformations that can be applied to measurements at that level without changing the relationship between the characteristic to be measured and the value or category assigned to it. Those mathematical properties, in turn, determine the types of statistics considered appropriate to the measure. In general, the lower the level of measurement, the less information contained in the measure and the less flexibility one will have in its statistical treatment. Recall that a given construct may be associated with indicators at various levels of measurement. Consequently, the level of measurement of an indicator may be one consideration when choosing a particular measure. Thus, for example, imagine that you are interested in characterizing a child’s skill at structuring an oral narrative. At the crudest level, one might choose to label a child’s performance in the production of such a narrative as impaired or not impaired—measuring it at a nominal level. For greater precision, however, a spontaneous narrative produced by the child might be rated using a 5-point scale, with 1 indicating a very poorly organized narrative and 5 a narrative with adult-like structure. Yet probably the most satisfactory type of measure for describing this child’s difficulties is one at the interval level of measurement. An example of such a measure for narrative production is one devised by Culatta, Page, and Ellis (1983), in which the child receives a score for the number of propositions correctly recalled in a story-retelling task. With such a measure (as opposed to measures at the nominal or ordinal levels), you can obtain greater insight into the nature of the difficulties facing the child and can more readily compare the severity of the child’s problems with that of other children with difficulties in narrative production.
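For readers who find a computational illustration helpful, here is a minimal sketch (in Python, with entirely hypothetical data and labels) representing the same construct—narrative skill—at three levels of measurement, applying only the descriptive statistic that each level licenses:

```python
from statistics import mean, median, mode

# Hypothetical narrative-skill data at three levels of measurement
nominal = ["impaired", "not impaired", "impaired"]  # categories only
ordinal = [2, 4, 3, 2, 5]    # 1-5 organization ratings: order, not distance
interval = [14, 22, 17, 9]   # propositions recalled: equal-size units

print(mode(nominal))    # the mode is the only central tendency for nominal data
print(median(ordinal))  # the median respects order without assuming equal gaps
print(mean(interval))   # the mean requires interval (or ratio) measurement
```

The point is not the particular numbers, which are invented, but that the set of admissible operations shrinks as the level of measurement drops.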
Basic Statistical Concepts

As a branch of applied mathematics, the field of statistics has two general uses: describing groups of measurements made to gain information about one or more variables and testing hypotheses about the relationships of variables to one another. For many students in an elementary statistics class, each of these uses represents a vast, awe-inspiring, and sometimes fear-provoking landscape. In this section of the chapter, only the highest peaks and lowest valleys of these landscapes will be surveyed. Specifically, selected statistical concepts are introduced in terms of their meaning and the practical uses to which they are applied by those of us interested in measuring children’s behaviors and abilities. Although statistical calculations are described, only rarely are specific formulas given so that the connection between meaning and application can remain particularly close. More elaborate and mathematically specific discussions can be found in sources such as Pedhazur and Schmelkin (1991).

Statistical Concepts Used to Describe Groups of Measurements
One of the most common uses of statistics is to summarize groups of measurements, typically referred to as distributions. Distributions can consist of a set of measurements based on actual observations (often called a sample) or a set of values hypothesized for a set of possible observations (often called a population). An example of a distribution based on a sample would be all of the test scores obtained by children in a single preschool class on a screening test of language. In contrast, an example of a distribution based on a population would be all of the scores on that same test obtained by any child who has ever taken it. Except when population distributions are discussed from a purely mathematical point of view, they are almost always inferred from a specific sample distribution because of the impracticality or even impossibility of measuring the population. Two types of statistics used to summarize distributions of measurements are measures of central tendency and variability. Measures of central tendency are designed to convey a typical or representative value, whereas measures of variability are used to convey the degree of variation from the central tendency. Measures of central tendency have been described as indicating “how scores tend to cluster in a particular distribution” (Williams, 1979, p. 30). The three most common measures of central tendency are (in order of decreasing use) the mean, median, and mode. The mean is the most common measure of central tendency. It is used to refer to the value in a distribution that is the arithmetic average, that is, the result when the sum of all scores in a distribution is divided by the number of scores in the distribution. Unlike the two other measures of central tendency, the mean is appropriate only for measurements that fall at interval or ratio levels. 
Although it is considered the richest measure of central tendency, the mean has the negative feature of being particularly sensitive to outliers—extreme scores that differ greatly from most scores in the distribution. Because of this, the mean will sometimes not be used even if the level of measurement allows it; instead, the median, which is the next most sensitive measure of central tendency, will be used.
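A small sketch (in Python, using hypothetical test scores) makes the mean’s sensitivity to outliers, and the median’s relative stability, concrete:

```python
from statistics import mean, median

scores = [88, 90, 92, 94, 96]          # hypothetical, roughly symmetric scores
print(mean(scores), median(scores))    # both are 92 for this set

with_outlier = scores + [20]           # add one extreme low score
print(mean(with_outlier))              # 80 -- dragged well below the cluster
print(median(with_outlier))            # 91.0 -- barely moves
```

A single extreme score shifted the mean by 12 points but the median by only 1, which is why the median is preferred when outliers are present.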
The median is the score or category that lies at the midpoint of a distribution. It is the middle score in the case of ungrouped distributions of interval or ratio data and the middle category in the case of ordinal data. The median is considered an appropriate measure of central tendency for either ordinal or interval measures and is even superior to the mean in terms of its relative stability in the face of outliers. On the other hand, it is considered inappropriate for nominal measures because the categories used at that level of measurement cannot, by definition, be ordered logically. Because of this lack of “order” in nominal data, finding a middle score or category is nonsensical. The third and final measure of central tendency, the mode, has relatively few uses. The mode simply refers to the most frequently occurring score (for interval or ratio data) or category (for nominal data). Because of the way the mode is defined, it is possible for there to be more than one mode in a given distribution, in which case the distribution from which it comes can be referred to as bimodal, trimodal, and so forth. For nominal level data, the mode is the only suitable measure of central tendency. Because measurements within a distribution vary, a measure of variability is also required to characterize it effectively. Three measures of variability, two of which are very closely related, are most frequently used in descriptions of children’s abilities and behaviors. As was done in the description of measures of central tendency, these measures will be described in order of decreasing use. Although considered somewhat daunting by beginning statistics students because of its relatively involved calculations, the most frequently used measure of variability is the standard deviation. The standard deviation was developed for interval and ratio measures as an improvement on the seemingly good idea of describing the average (or mean) difference (or deviation) from the mean.
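Why this “seemingly good idea” fails, and how squaring rescues it, can be shown in a few lines (Python; the ages are hypothetical and chosen to give a mean of 36 months):

```python
from statistics import pstdev, pvariance

ages = [33, 33, 33, 39, 39, 39]   # hypothetical ages in months; mean = 36
m = sum(ages) / len(ages)
deviations = [a - m for a in ages]
print(sum(deviations))            # 0.0 -- positives cancel negatives exactly

# Squaring the deviations before averaging avoids the cancellation:
sd = (sum(d ** 2 for d in deviations) / len(ages)) ** 0.5
print(sd, pstdev(ages))           # 3.0 3.0 -- population standard deviation
print(pvariance(ages))            # 9 -- the variance, i.e., sd squared
```

Any set of ages would give a deviation sum of zero, yet the standard deviation of 3 months here conveys the typical distance of a child’s age from the mean.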
The problem with an average deviation was that because of the way the mean is defined, all of the deviations above the mean are positive in sign and would therefore balance all of the negative deviations falling below the mean, leading to an average deviation of zero for all distributions—regardless of obvious differences in variability from one distribution to another. In order to avoid this problem, the standard deviation is calculated in a manner that makes all deviations positive. Nonetheless, the intent behind the standard deviation is to convey the size of the typical difference from the mean score. As I expand on in an upcoming section of this chapter, the standard deviation has special significance because of its relationship to the normal curve. Specifically, standard deviation units become critical to comparisons of one person’s score against a distribution of scores, such as occurs when test norms are used. The concept of variance is closely related to the standard deviation. In fact, the standard deviation of a distribution is the square root of its variance. Despite this very close relationship to standard deviation, variance is less frequently used because, unlike the standard deviation, it cannot be expressed in the same units as the measure it is being used to characterize. For example, you can describe the age of a group of children in months by saying that the mean age for the group is 36 months, and the standard deviation is 3 months. This results in a much clearer description than saying that the mean age for the group is 36 months, and the variance is 9. No, not 9 months—simply 9. Because of this “unitlessness,” variance is rarely used when the
intent is simply to describe the characteristics of a group. It does play a role in some statistical operations, however, and so is an important statistic to be aware of. The least complicated measure of variability, the range, is also the least frequently used of the three measures. It represents the difference between the highest and lowest scores in a distribution. The utility of the range lies in its ease of calculation and its applicability to distributions at any level of measurement other than the nominal level. For interval or ratio data, it is calculated by subtracting the lowest from the highest score and adding 1. Thus, for example, if the highest and lowest scores in a distribution of test scores were 85 and 25, respectively, the range would be 61. At the ordinal level, the range is usually reported by indicating the lowest to highest value used. For example, one might report that listener ratings of a child’s intelligibility in conversation ranged from usually unintelligible to intelligible with careful listening, or from 2 to 4 if a 5-point numeric scale were used. Because the range is based on only two numbers (or two levels in the case of an ordinal measure), its weaknesses are its lack of sensitivity and its susceptibility to the effects of outliers. In summary, measures of central tendency and variability are useful for describing groups of measurements related to a single variable and are selected on the basis of the variable’s level of measurement.

Statistical Concepts Used to Describe Relationships between Variables
A number of statistical concepts are available to describe relationships between and among two or more groups of measurements and to test hypotheses about the nature of those relationships. Because the intent here is to focus only on those concepts most basic to understanding measurement applications in developmental language disorders, only one of those concepts will be discussed in some detail—the correlation. The correlation between two variables describes the degree of relationship existing between them as well as information about the direction of that relationship and its strength. Correlation coefficients typically range in degree from 0 (indicating no relationship) to positive or negative 1 (indicating a perfect relationship in which knowing one measure for an individual would allow you to predict that person’s performance on the second measure with perfect accuracy). The sign of the correlation refers to its direction: A positive correlation indicates that as one measure increases, the second measure increases as well. Relationships associated with a positive correlation are said to be direct. A vivid example of a direct relationship would be the relationship some see between money and happiness. In contrast, a negative correlation indicates that as one measure increases, the second measure decreases. Relationships associated with a negative correlation are said to be inverse. A vivid example of an inverse relationship would be the relationship between unpaid bills and peace of mind. Figure 2.2 contains examples of graphic representations of correlations that differ in magnitude and direction. Notice that two of the correlations are described as being associated with a correlation coefficient of 0. The second of those demonstrates a curvilinear relationship, which cannot be captured by the simple methods described here.
Fig. 2.2. Illustrations showing the variety of relationships that can exist between variables and can potentially be described using correlation coefficients. These include no relationship (i.e., the value of one variable is independent of the value of the other), a curvilinear relationship (i.e., in which the nature of the relationship between variables changes in a curvilinear fashion depending on the value of one of the variables), and linear relationships of lower and greater magnitudes.

As a more detailed (and relevant) example involving correlation, let’s consider two hypothetical sets of test scores obtained for a class of third graders—one on reading comprehension and the other on phonological awareness (explicit knowledge of the sound structure of words). If this group of children were like many others, then one would expect their performances on these two measures to be positively correlated (e.g., Badian, 1993; Bradley & Bryant, 1983)—that is, one would expect that children who receive higher scores on the reading comprehension test would receive higher scores on the phonological awareness test. However, because many factors affect each of the abilities targeted by the measures, it would be unlikely that the magnitude of the correlation, which reflects the strength of the association, would be very large. In
fact, a low correlation might be expected in this context. Table 2.2 contains labels that are frequently used to describe correlations of various magnitudes (Williams, 1979). The correlation coefficient most frequently used in describing human behavior is the Pearson product–moment correlation coefficient (r), the specific type of correlation that would have been appropriate for the example given above. Unfortunately, that correlation coefficient is only considered appropriate for measurements at the interval or ratio level of measurement. For measurements at the ordinal level, Spearman’s rank-order correlation coefficient (ρ) can be calculated. At the nominal level, the contingency coefficient (C) is used to describe the relationship between the frequencies of pairs of nominal categories. In addition to these correlation coefficients, however, there are several other correlation coefficients (e.g., phi, point biserial, biserial, tetrachoric) that are used during the development of standardized tests. The choice of these less familiar correlation coefficients is dictated by the characteristics of the measurements to be correlated, such as whether either or both of the measurements are dichotomous (e.g., yes–no, correct–incorrect), multivalued (e.g., number correct), or continuous (e.g., response times). It is easy to be intimidated by an unfamiliar correlation coefficient. However, this danger can be countered with the knowledge that the concept of correlation remains the same, regardless of how exotic the name of the specific coefficient. Thus, whether one is using phi or Pearson’s product–moment correlation, a correlation coefficient always is intended to describe the extent to which two measures tend to vary with one another. In fact, even when one examines the relationships between the distributions of more than two variables using multiple correlations, the interpretation of correlations remains essentially unchanged.
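As a rough illustration (in Python, with invented score sets not drawn from any actual test), Pearson’s r can be computed directly from interval data, and Spearman’s rho is simply Pearson’s r applied to the ranks of the data:

```python
from statistics import mean

def pearson_r(xs, ys):
    # Sum of cross-products of deviations, scaled by the two spreads
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def ranks(xs):
    # Rank each value 1..n (this simple version assumes no tied scores)
    order = sorted(xs)
    return [order.index(x) + 1 for x in xs]

def spearman_rho(xs, ys):
    return pearson_r(ranks(xs), ranks(ys))

reading = [85, 90, 70, 60, 95]  # hypothetical reading-comprehension scores
phono = [80, 88, 65, 70, 92]    # hypothetical phonological-awareness scores
print(round(pearson_r(reading, phono), 2))   # 0.91 -- strong direct relationship
print(round(spearman_rho(reading, phono), 2))
```

With these invented scores the two coefficients are similar; with real clinical data, the choice between them should still follow the level of measurement, as described above.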
Correlation coefficients are usually reported along with a statement of statistical significance, which describes the extent to which the correlation coefficient is likely to differ from zero by chance, given the size of the sample on which it is based. In general, statements of statistical significance always carry the implication that although a particular sample of behavior was observed, it is being used to draw conclusions for the larger population. Statements of statistical significance are used to test hypotheses—conjectural statements about a relation between two or more variables (Pedhazur & Schmelkin, 1991). In this case, the hypothesis is that the obtained correlation coefficient differs from zero. Statistical significance indicates that the obtained value was unlikely to have occurred by chance.

Table 2.2
Descriptive Labels Applied to Correlations of Varying Magnitudes

Correlation    Label                    Degree of Relationship
< .20          Slight correlation       Almost negligible relationship
.20–.40        Low correlation          Definite, but small relationship
.40–.70        Moderate correlation     Substantial relationship
.70–.90        High correlation         Marked relationship
> .90          Very high correlation    Very dependable relationship
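A short sketch (Python) applies the descriptive labels of Table 2.2 and also squares the coefficient, anticipating the coefficient of determination discussed next. How the exact boundary values (.20, .40, .70, .90) should be assigned is an arbitrary choice here, since the table leaves it unspecified:

```python
def describe(r):
    # Map |r| onto the descriptive labels of Table 2.2; boundary handling
    # is this sketch's own convention, not the table's.
    magnitude = abs(r)
    if magnitude < 0.20:
        return "slight correlation"
    if magnitude < 0.40:
        return "low correlation"
    if magnitude < 0.70:
        return "moderate correlation"
    if magnitude <= 0.90:
        return "high correlation"
    return "very high correlation"

r = 0.60
print(describe(r))           # moderate correlation
print(round(r ** 2 * 100))   # 36 -- percent of variance accounted for
```

Note that the sign is ignored when labeling magnitude: a correlation of -.60 is just as “moderate” as one of +.60.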
Unfortunately, a correlation’s statistical significance is sometimes mistakenly taken as the most important indication of its value. However, a very low correlation coefficient is unlikely to be important even if it is statistically significant because it does not explain much of the variability of the correlated measures. In addition, the larger the sample size, the easier it is for a correlation coefficient to attain statistical significance. Therefore, although a statistically significant correlation coefficient is always desirable, the magnitude as well as the significance of the correlation must be considered. An additional concern surrounding the interpretation of correlation coefficients, such as the Pearson product–moment correlation coefficient, is that its magnitude does not itself reflect the extent to which two variables explain one another. Instead, that information is provided by a closely related statistic, the coefficient of determination, which can also be referred to as “variance accounted for,” or r², for the Pearson product–moment correlation. It is calculated by squaring the correlation coefficient and multiplying the result by 100. As an example, assume that the correlation between two sets of test scores was .60 (a moderate correlation according to Table 2.2). The corresponding coefficient of determination would be 36%, meaning that 36% of the variation observed in the two sets of test scores was accounted for by their relationship—leaving a substantial 64% unexplained. Awareness of this concept becomes important in evaluating correlational evidence provided by test developers to support the quality of their test. In their book on phonologic disorders, Bernthal and Bankson (1998) made a general point concerning the limitations of statistical significance as an indication of the importance of a research finding.
Although they were not talking specifically about correlational data, they warned clinicians against the assumption that any statistically significant finding reported in the research literature was necessarily worthy of influencing clinical practice. They used the term clinical significance to suggest that only relatively large effects (i.e., those that would be associated with a relatively large proportion of variance accounted for) would likely be of importance in the clinical environment. They encouraged readers to look for evidence of the size of relationships in the form of “variation accounted for,” which is reported as omega-squared for many analyses (Young, 1993). For the purposes of this book, Bernthal and Bankson’s caution should be considered as it applies to both correlation coefficients and any statistical finding that might be used in discussions of children’s language abilities. A final cautionary statement concerning the interpretation of correlations is the fundamental idea that the existence of a correlation between two measures does not constitute evidence of a causal relationship between them. Thus, returning to the example initially used to introduce the concept of correlation, remember that children’s scores on two tests, one of reading comprehension and one of phonological awareness, were found to be correlated. Although it would be very tempting to conclude that children’s phonological awareness “caused” their comprehension performance, that would be an incomplete, even incorrect interpretation of the correlation. Theoretically such an interpretation would be quite inviting because it would be easy to imagine that a greater familiarity with the sound structure of a written language would make its processing easier, thus resulting in improved comprehension. In fact, however, it is equally plausible that children’s comprehension caused their performance on the phonological tasks. That is, their level of comprehension may have allowed them to process the
sound information of the language to a greater degree because they were not as overwhelmed with the other memory and processing demands associated with understanding text. Thus, they would perform better on the phonological awareness test because of their comprehension skills. Finally, it would also be plausible to imagine that children’s performances on both tasks were in fact caused by some third variable or by multiple variables. The oft-repeated warning not to confuse correlation with causation is probably one of the most important lessons in this or any book because of its impact on critical thinking in nonscientific as well as scientific realms. In addition to simple correlations, a wide range of other statistics are available for examining hypotheses about the relationship between variables. Frequently, hypotheses relate to the relationship of one or more classification variables (e.g., age and gender) to an outcome or response variable (e.g., performance on a particular test). Alternatively, statistics are used to determine whether one or more variables have a causal effect on a response variable. When that is the case, variables hypothesized to be causes are termed independent variables and those hypothesized to be effects are termed dependent variables. Selection of specific statistical techniques for testing a hypothesis depends quite heavily on the level of measurement of the outcome or dependent variable. Variables measured at the interval or ratio level of measurement are generally studied using parametric statistics (e.g., t tests, analyses of variance, or ANOVAs), whereas variables measured at nominal or ordinal levels are examined using nonparametric statistics (e.g., chi-square analyses and Cochran’s Q).
Nonparametric statistics are also used when the dependent variable seems to be distributed in a manner that either departs significantly from a normal distribution or seems likely to violate assumptions underlying the use of normal distributions. A concise introduction to the decision making behind the selection of an appropriate statistical technique can be found in Chial (1988). Longer discussions can be found in Freedman, Pisani, and Purves (1998) or McClave (1995) for parametric statistics, and Conover (1998) or Gibbons (1993) for nonparametric statistics. Statistical techniques for testing hypotheses are not explored further here because of their relatively limited use in assessing children’s language disorders. They primarily come into play in the documentation provided by test developers to support the value of standardized measures, and they will be discussed further in that context in the next chapter.

Characterizing the Performance of Individuals

Methods for summarizing an individual’s performance vary depending on the nature of the measurement being made. Numerous schemes for categorizing measurements of human behavior have been proposed. These categorizations often assume that the measurements of interest are formal tests because tests are the most studied form of measurement related to human abilities and behaviors. One frequently discussed categorization separates achievement testing from ability testing; the former seeks to measure actual learning, and the latter seeks to measure learning potential. Within achievement testing, distinctions are made between placement testing, which takes place prior to instruction; formative and diagnostic testing, which take place during
instruction; and summative testing, which takes place at the end of instruction (Gronlund, 1982). Formative testing is designed to measure the learner’s progress as learning is underway, whereas diagnostic testing identifies the source of difficulties impeding the learner’s progress. Summative testing is designed to evaluate learning progress at some ending point, for example, at the end of a school term. Other categories applied to tests have included paper-and-pencil tests, the most studied medium for test execution; performance tests, which typically involve the test taker’s manipulation of objects or performance of some activity that usually does not involve the use of paper and pencil; and computerized tests, which involve the use of computer displays or both computer display and keyboarded responses. Although performance tests predominate as a method of testing in developmental language disorders, paper-and-pencil tests are typically used in cases when written language skills are assessed. Computerized testing is a growing topic of interest (e.g., Wiig, Jones, & Wiig, 1996) because of the possibilities it presents for providing more interesting, even animated stimuli and for greater tailoring of test items to a client’s needs by choosing later items based on earlier performance (Bunderson, Inouye, & Olsen, 1989). Each of these types of tests alters aspects of the test administration and scoring process and thus indirectly affects the interpretation of individual scores. Although tests and other measures can be categorized along many different dimensions, the categorization of measures as norm-referenced versus criterion-referenced has the greatest impact on how individual performances are interpreted. In fact, at times, these two categories are referred to as modes of score interpretation rather than types of tests (e.g., APA, AERA, & NCME, 1985).

Norm-Referenced versus Criterion-Referenced Measures
Overall, norm-referenced measures are those for which an individual’s performance is interpreted in relation to the performance of others, and criterion-referenced measures are those for which an individual’s performance is interpreted in relation to an established behavioral criterion. Table 2.3 lists some norm-referenced and criterion-referenced measures with which readers may have had personal experience as well as some that are commonly used in developmental language disorders. Although not every author would agree that some of the more informal of these measures should be categorized as norm- or criterion-referenced, each of the measures fits within the definitions appearing at the beginning of this paragraph. The dependence of this categorization on the method used to interpret an individual’s score can be illustrated using the brief example in Table 2.4, which I call the Amazing University of Vermont Test. Imagine first that this would-be test is to be given to determine which incoming students to the University will receive a scholarship being granted by the University’s Alumni Association. If that were the test’s purpose, appropriate score interpretation would involve comparing all of the incoming first-year students to see which ones had the most knowledge and thus would receive the scholarship. That method of score interpretation, therefore, would depend not only on knowledge of a single test taker’s score, but also on knowledge of the performance of the entire group against which the individual’s performance was to be compared.
Table 2.3
Examples of Criterion- and Norm-Referenced Measures Associated With Readers’ Personal Experiences and Clinical Practice in Developmental Language Disorders

Norm-referenced
  Personal experience: IQ tests; GREs; SATs; classroom tests (with grading on the curve)
  Developmental language disorders: IQ tests; most language tests

Criterion-referenced
  Personal experience: driver’s test; eye examination; classroom examination (without grading on the curve)
  Developmental language disorders: most articulation or phonology tests; treatment probes in which a set criterion (e.g., 80%) is used

Note. GRE = Graduate Record Examination; SAT = Scholastic Aptitude Test.

Table 2.4
The Amazing University of Vermont Test
1. The University of Vermont is located in
(a) Burlington, Vermont (b) Montpelier, Vermont (c) Manchester, New Hampshire (d) St. Albans, Vermont (e) Enosburg Falls, Vermont
2. The official acronym for the University is
(a) U of V (b) VU (c) UVM (d) MUV (e) none of the above
3. The number of students attending the University is
(a) 500–1500 (b) 1500–3000 (c) 3000–4500 (d) 4500–6000 (e) > 10,000
4. The school colors are
(a) grey and white (b) green and white (c) grey and green (d) green and gold (e) grey and gold
5. The mascot of the University is
(a) snowy owl (b) raccoon (c) barn owl (d) catamount (e) Jersey cow
6. The most popular spectator sport at the University is
(a) cow tipping (b) ice hockey (c) football (d) downhill skiing (e) snowboarding
7. The most famous philosopher graduating from UVM was
(a) Ethan Allen (b) Ira Allen (c) Woody Allen (d) Woody Jackson (e) John Dewey
8. Translated from the Latin, the school motto means
(a) Scholarship and hard work (b) Stay warm (c) Live free and stay out of New Hampshire (d) Suspect flatlanders (e) Independence and dignity
Such a comparison group is called a normative group, hence the designation norm-referenced to refer to the method of score interpretation and sometimes to refer to the specific type of measure being used. Norms, then, refer to the specific information about the distribution of scores associated with the normative group. Two types of norms merit special attention: national norms and local norms. National norms are data concerning a group that has been recruited so as to be representative of a national cross section of individuals who might be tested. Norms for tests involving children are typically organized so that information
based on subgroups of children is reported by age (usually in 2- to 6-month intervals), by grade, or both. It is often recommended that when norms are collected, the normative groups be matched against national data (usually census data) for socioeconomic status, race, ethnicity, education, and geographic region (Salvia & Ysseldyke, 1995). National norms are collected almost solely for standardized measures that will be used with very large numbers of individuals each year. For example, intelligence tests, educational tests, and many language tests typically provide national norms. Local norms are prepared when national norms for a measure are unavailable or inappropriate for a group of test takers. They represent normative data collected on a group of test takers like those on whom the measure will be used. Local norms are especially useful when national norms are likely to be inappropriate for a group of test takers whose language is unlike that in which the test is written. Most frequently, this would involve individuals who speak one of many regional or social dialects that are significantly different from the idealized "standard" American English dialect, for example, speakers of Black American English or Spanish-influenced English. Alternatively, a clinician may want to collect local norms for specific client populations for whom normative data are lacking (e.g., individuals with hearing impairment, mental retardation, or cerebral palsy). Rather than using the Amazing University of Vermont Test to compare the performances of a number of test takers, you might use it to determine whether a group of incoming students has adequately learned the information included in their orientation materials. In that case, the outcome of the test could lead to a student's becoming exempt from an additional orientation session or being required to complete it.
For that testing purpose, scores would be interpreted in relation to a behavioral criterion, for example, 6 of 10 correct. When interpreted in that way, the test could be described as a criterion-referenced measure. The level of performance would then be considered a cutoff or, less frequently, a cutting score. Often the term master is used to refer to a test taker whose score exceeds the cutoff score, and nonmaster is used to refer to a test taker whose score falls below the cutoff. Briefly then, in contrast to a norm-referenced interpretation, score interpretation for a criterion-referenced measure hinges on knowledge of the person's raw score and the cutoff score. Information about a reference or normative group is not necessary. It is often useful, however, for developers of criterion-referenced measures to study group performances as a means of determining a reasonable cutoff score, one that is empirically derived rather than based on an arbitrary cutoff, for example, at 80% correct. In addition to differences in the mechanics of score interpretation, norm-referenced and criterion-referenced measures tend to differ in the scope of knowledge being assessed and the specific method used to choose items. Specifically, norm-referenced measures tend to address a large content area that is sampled broadly, whereas criterion-referenced measures tend to address a quite narrowly defined concept that is sampled as exhaustively as possible. For norm-referenced measures, items are selected so that the greatest amount of variability in test scores is achieved among test takers, whereas for criterion-referenced measures, items are selected primarily because of how well they address the targeted construct. Figure 2.3 shows the steps involved in the development of standardized norm-referenced and criterion-referenced instruments.
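The mechanics of criterion-referenced interpretation are simple enough to sketch in a few lines of code. The scores, names, and 6-of-10 cutoff below are hypothetical, echoing the orientation-test example; whether a score exactly at the cutoff counts as mastery is a convention set by the test developer (here, it does):

```python
# Criterion-referenced interpretation: compare each raw score to a fixed
# cutoff; no normative group is needed. All values here are hypothetical.
CUTOFF = 6  # e.g., 6 of 10 items correct

def classify(raw_score, cutoff=CUTOFF):
    """Label a test taker as 'master' or 'nonmaster' relative to the cutoff.
    By this (arbitrary) convention, meeting the cutoff counts as mastery."""
    return "master" if raw_score >= cutoff else "nonmaster"

scores = {"Ann": 9, "Ben": 6, "Cal": 4}
results = {name: classify(s) for name, s in scores.items()}
print(results)  # {'Ann': 'master', 'Ben': 'master', 'Cal': 'nonmaster'}
```

Note that the group of scores appears here only for convenience; each classification depends solely on the individual's raw score and the cutoff.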
At the beginning of this section, only a single measure, the Amazing University of Vermont Test, was used to introduce the concepts of criterion- and norm-referencing. This was done in order to emphasize that method of interpretation is the most crucial feature distinguishing norm- from criterion-referenced measures. Practically, however, because of differences in how items are selected for each type of measure, it is very difficult to develop a single measure that can equally support these two different approaches to score interpretation.

Types of Scores

Norm-Referenced Measures
For norm-referenced measures, a variety of test scores is useful. Because of the centrality of the comparison between the test taker's and the normative group's performances, however, the raw score is of little value
Fig. 2.3. Steps involved in the development of norm-referenced and criterion-referenced standardized measures.
except as the starting point for other scores. These other scores are termed derived scores because of their dependent relationship to the raw score. Three types of derived scores deserve attention: developmental scores, percentile ranks, and standard scores. These are listed in increasing order of both their value as a means of representing a test taker's performance and their complexity of calculation. Developmental scores are the least valuable derived scores but are still ubiquitous in clinical and research contexts, a paradox that I address shortly. The two most commonly used developmental scores are age-equivalent scores and grade-equivalent scores. A test taker's age-equivalent score is derived by identifying the age group that has a mean score closest to the score received by the individual test taker. For example, if a test taker's raw score of 85 corresponds to the mean raw score of a group of 3-year-olds, the age-equivalent score assigned to the test taker would be 3 years. If there is no age group whose mean exactly matches the score of a test taker, then an estimation is made of how many months should be added to the age of the group whose mean falls just below the test taker's score, resulting in age-equivalent scores such as 2 years, 6 months or 5 years, 11 months. Typically, test users do not have to examine the group data directly, but are given tables listing raw scores and the age-equivalent scores to which they correspond. Grade-equivalent scores are similar in many respects to age-equivalent scores but are, as one would guess from their name, derived from data concerning the mean performance of groups of test takers in different grades. When estimation is required, grade-equivalent scores are reported in tenths of a grade. Thus, for a 12-year-old who achieves a score just slightly above that of a group of 4th graders, a grade equivalent of 4.1 or 4.2 might be assigned. In psychometric circles, almost never is a kind word spoken about scores of this type.
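The lookup that produces an age-equivalent score can be sketched as follows. The normative table of mean raw scores by age group is invented for illustration, and the linear interpolation between adjacent group means is only one plausible way tests carry out the "estimation" step described above:

```python
# Hypothetical normative table: (age in months, mean raw score for that group).
age_means = [(36, 85), (42, 90), (48, 96), (54, 101), (60, 105)]

def age_equivalent(raw):
    """Return an age-equivalent in months: find the age group whose mean raw
    score falls just at or below the raw score, then interpolate linearly
    toward the next group's mean (one possible estimation scheme)."""
    if raw <= age_means[0][1]:
        return age_means[0][0]
    for (a0, m0), (a1, m1) in zip(age_means, age_means[1:]):
        if m0 <= raw <= m1:
            return round(a0 + (a1 - a0) * (raw - m0) / (m1 - m0))
    return age_means[-1][0]

print(age_equivalent(85))  # 36 months, i.e., an age equivalent of 3 years
print(age_equivalent(93))  # 45 months, i.e., 3 years, 9 months
```

In practice, test users never run this computation themselves; the published conversion tables mentioned above embody exactly this kind of lookup.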
Long, derogatory lists of the problems with developmental scores abound (e.g., McCauley & Swisher, 1984; Salvia & Ysseldyke, 1995), but the lists invariably center around concerns that such scores are easily misunderstood and likely to be unreliable. Table 2.5 provides an elaborated version of these lists as well as a pointed commentary on developmental scores. The appeal of developmental scores is twofold. First, the apparent uniformity of meaning of such scores across different tests makes it seem that they allow for a comparison of skills in different areas and permit a sensitive quantification of degree of impairment. Thus, when a 9-year-old child is said to have skills falling at the 7-year level in math and the 8-year level in receptive language, it can be misinterpreted as indicating significant problems in both areas, with a more severe impairment in mathematics. Although many individuals are quite aware of the low esteem in which developmental scores are held, they nonetheless fall into misinterpretations like this. Given that age-equivalent scores only crudely compare two scores as their means of norm-referencing, neither individual developmental scores nor comparisons between them necessarily convey degrees of impairment. Depending on the tests used, for example, it may be that a great many normally developing children would exhibit the same "impaired" scores. The second appeal of developmental scores lies outside the interests of individual test users. Numerous state and insurance regulations demand that developmental scores be used to describe test performances, presumably on the basis of the misconception cited earlier that meaningful comparisons between skill areas can be based
Table 2.5
Five Drawbacks to Developmental Scores, Such as Age-Equivalent and Grade-Equivalent Scores (Anastasi, 1982; Salvia & Ysseldyke, 1995)

1. Developmental scores lead to frequent misunderstandings concerning the meaning of scores falling below a child's age or grade. For example, a parent may interpret an age equivalent of 5 years, 10 months as evidence of a delay in a 6-year-old. In fact, by definition, half of the children in a given age group (or grade level) would receive age-equivalent scores below the child's age. This problem arises because developmental scores contain no information about normal group variability.
2. There is a tendency to interpret developmental scores as indicating that performance was similar to that of an individual of corresponding age, for example, that a score of 3 years, 6 months would be associated with performance that was qualitatively like that of a 3½-year-old. In fact, however, it is unlikely that the nature and consistency of errors would be similar for two individuals with similar developmental scores but differing ages or grade levels.
3. Developmental scores promote comparisons of children with other children of different ages or grades rather than with their same-age peers.
4. Developmental scores tend to be ordinal in their level of measurement. Therefore, they lack flexibility in how they may be treated mathematically and are prone to being misunderstood. For example, a "delay" of 1 year in a fifth grader who receives a grade-equivalent score of 4 is not necessarily comparable to a "delay" of 1 year in a ninth grader who receives a grade-equivalent score of 8.
5. Developmental scores are less reliable than other types of scores.
on them. As I discuss in the next section of this chapter, such regulation of test users provides a vivid example of the numerous cases in which assessment must respond to a variety of forces outside of the direct clinical interaction between clinician and client. Typically, test users faced with the dilemma of having to report developmental scores are advised by psychometricians to report them along with more useful derived scores in a manner that minimizes the likelihood of misunderstanding. Percentile ranks are actually one variety of a class of derived scores that includes quartiles and deciles. Percentile ranks represent the percentage of people receiving scores at or below a given raw score. Thus, a percentile rank of 98, or 98th percentile, indicates that a test taker received a score better than or equal to those of 98% of persons taking the test (usually the normative sample). This type of score has the distinct advantage of being readily understood by a wide range of persons, including parents and some older children. Percentile ranks have two disadvantages. The first is that they are sometimes misunderstood as meaning the percentage of correct responses on the test. Readers can avoid this false step if they remember that on a very difficult test, one could perform better than almost anyone (and therefore have a high percentile rank) but in fact have obtained a low percentage correct. The second disadvantage of percentile ranks is that, like developmental scores, they represent an ordinal measure and thus cannot be combined or averaged. Standard scores represent the pinnacle of scoring approaches used in norm-referenced testing. They preserve information about the comparison between an individual and the appropriate age group and information about the variability of the normative group.
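A percentile rank is just a count taken over the normative distribution. The sketch below uses a tiny invented normative sample (real norms involve far larger groups); note that the result reflects standing within the group, not percentage of items answered correctly:

```python
# Percentile rank: percentage of normative scores at or below a raw score.
# The normative sample here is hypothetical and far smaller than a real one.
norm_sample = [52, 55, 58, 60, 61, 63, 64, 66, 70, 75]

def percentile_rank(raw):
    """Percentage of the normative sample scoring at or below `raw`."""
    at_or_below = sum(1 for s in norm_sample if s <= raw)
    return 100 * at_or_below / len(norm_sample)

print(percentile_rank(63))  # 60.0 -> the 60th percentile
print(percentile_rank(75))  # 100.0 -> at or above everyone in the sample
```

A raw score of 63 here might correspond to only 63% of items correct on an easy test or 30% on a hard one; the percentile rank of 60 is unaffected either way.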
In addition, they are at the interval level of measurement and thus can be combined and averaged in ways not possible with the other types of scores discussed earlier. Standard scores are "standard" because the original distribution of raw scores on which they are based has been transformed to produce a standard distribution having a specific mean and standard deviation. Because standard scores are normally distributed, they can be interpreted in terms of known properties of the normal distribution, especially expectations concerning how expected or unexpected a particular score is. This makes standard scores a favored method of communicating test results among professionals. Figure 2.4 illustrates the relationship between the normal curve and several of the most frequently used scores: the z score, deviation IQ score, and T score. The most basic standard score is the z score, which has a mean of 0 and a standard deviation of 1. It is calculated by taking the difference between a particular raw score and the mean of the distribution and dividing the result by the standard deviation of the distribution. Each score is represented by the number of standard deviations it falls from the mean, with positive values representing scores above the mean and negative values representing those below the mean. Because of the relationship between this type of score and the normal curve, it is possible to know that a z score
Fig. 2.4. The relationship between the normal curve and several of the most frequently used standard scores, including the z score, deviation IQ score, and T scores. From Assessment of Children (p. 17), by J. M. Sattler, 1988, San Diego, CA: Author. Copyright 1988 by J. M. Sattler. Reprinted with permission.
of –2 falls 2 standard deviations below the mean and that fewer than 3% of the normative population had a score that low or lower. Other widely used standard scores in developmental language disorders are the deviation IQ and the T score. These scores share the virtue of z scores in their known relationships to the normal curve: The deviation IQ has a mean of 100 and a standard deviation of 15, and the T score has a mean of 50 and a standard deviation of 10. As an additional benefit, such scores are somewhat less open to the confusion associated with the negative numbers used in z scores. However, their interpretation remains quite challenging for people who are unfamiliar with the use of the normal curve in score interpretation. Still, because of their strengths, standard scores such as these are frequently used among professionals, with percentiles favored for use with other audiences.

Criterion-Referenced Measures
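These transformations can be written out directly. The raw-score mean and standard deviation below are hypothetical; the conversion constants are the conventional ones (z: mean 0, SD 1; T: mean 50, SD 10; deviation IQ: mean 100, SD 15):

```python
from statistics import NormalDist

# Hypothetical normative distribution for some raw-score scale.
MEAN, SD = 60.0, 8.0

def z_score(raw):
    """Number of standard deviations the raw score falls from the mean."""
    return (raw - MEAN) / SD

def t_score(raw):
    return 50 + 10 * z_score(raw)   # T score: mean 50, SD 10

def deviation_iq(raw):
    return 100 + 15 * z_score(raw)  # deviation IQ: mean 100, SD 15

raw = 44.0                # 2 SDs below the hypothetical mean
print(z_score(raw))       # -2.0
print(t_score(raw))       # 30.0
print(deviation_iq(raw))  # 70.0
# Proportion of the normal curve at or below z = -2 is about 2.3%,
# consistent with the "fewer than 3%" figure cited in the text.
print(round(NormalDist().cdf(-2.0), 3))  # 0.023
```

Because all three scores are linear transformations of the same z score, they carry identical information; only the scale differs.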
For criterion-referenced measures, raw scores are the major type of score because by definition such measures involve the comparison of a raw score against a given criterion or cutting score. As mentioned previously, it is possible for the cutoff score to be based on empirical study or for it to be established arbitrarily on the basis of hypotheses about the level of performance, or performance standard, required for satisfactory advancement to later levels of skill acquisition (McCauley, 1996).

Case Example

Case 2.1 illustrates most of the concepts discussed in this chapter as they relate to Austin, a 5-year-old boy with specific language impairment. This hypothetical report is annotated to highlight instances where a measurement has been made by the clinician. Specifically, both formal and informal measures are bolded in this case.

Case 2.1
Speech-Language-Hearing Center
353 Luse Street
Burlington, VT 05405-0010

Client's name: Austin G.
Address: (child's home with mother and stepfather) 284 Willow Creek Road, Burlington, VT 05401
Date of Birth: 1/8/92
Education Status: Kindergarten
School: Woodward Elementary School, 2 Station Street, Burlington, VT 05401
Date of report: 2/14/97
Date of Evaluation: 2/12/97
Parents' names: Leslie G. (mother), Warren G. (stepfather); George C. (father), 33 Elm Street, Savannah, GA 31411, h: (912) 999-9393
Referral Source: Dr. A. B. Park
Student clinician: E. Miller, B.A.
Supervisor: R. J. Turner, M.S., CCC-SLP
BACKGROUND INFORMATION

Austin, a 5-year, 1-month-old boy, was seen today for a speech and language evaluation following referral by his primary care physician, Dr. A. B. Park. Background information was obtained using a case history form, an in-depth parent interview conducted with Mr. and Mrs. G., who accompanied Austin today, and a phone conversation with Mr. C., Austin's biological father. The reasons given by Mr. and Mrs. G. for today's evaluation were growing concerns regarding Austin's articulation, overall intelligibility, and expressive language skills. Mr. and Mrs. G. report that strangers and even other children in Austin's class find him difficult to understand and frequently ask him to repeat what he has said. He is also becoming increasingly frustrated with family members when they fail to understand him, resulting in increasingly frequent and escalating arguments with his older sister, Elizabeth (age 10). In contrast, they report that he understands everything that is said to him and is recognized as a very bright child even by adults who fail to understand him. Austin and his sister Elizabeth live with Mr. and Mrs. G. and see their biological father, Mr. C., only at holidays and for 6 weeks in the summer. The parents divorced when Austin was 1 year old, and he calls his stepfather as well as his biological father "Daddy." Austin currently attends a kindergarten class at Woodward Elementary School in Burlington, where he has three or four especially close friends. According to his teacher Mrs. Smith's reports to his parents, Austin is a happy child who is popular at least in part because of his enthusiastic manner and skill at playground athletics. Because he is small for his age (in the 5th percentile for height and weight) and because of his immature-sounding speech, he is sometimes teased by children from older classes about being a "baby," but he is readily defended by his classmates and appears unaffected by such taunts, according to Mrs. Smith.
She referred Austin for a speech-language evaluation by the school speech-language pathologist in January because of concerns about his language production and articulation, but otherwise she states that he is performing well in the kindergarten classroom. Because circumstances prevented that evaluation from taking place, Mr. and Mrs. G. decided to seek an evaluation at the Luse Center. Austin's birth and early health and developmental history are unremarkable except for delays in the onset of speech, with only about 10 words by age 2 and no word combinations until age 3. Although he has shown a dramatic increase in the length of his utterances over the past 2 years, his parents reported that he still speaks in incomplete sentences and produces many words incorrectly. Both biological parents reported a significant history of family members with speech and language problems, including Mr. C., who received speech therapy until 5th grade for what appeared to have been language-related concerns; two of Austin's paternal uncles; one maternal aunt in the preceding generation; and two maternal cousins.
LIST OF ASSESSMENT TOOLS

The assessment procedures that were conducted during this evaluation are listed and reported in the paragraphs that follow.

- Hearing screening test
- Test of Language Development-2 Primary (Newcomer & Hammill, 1991)
- Expressive One-Word Picture Vocabulary Test-Revised (Gardner, 1990)
- Peabody Picture Vocabulary Test-3 (Dunn & Dunn, 1997)
- Bankson-Bernthal Test of Phonology (Bankson & Bernthal, 1990)
- Oral Speech Mechanism Examination-Revised (St. Louis & Ruscello, 1987)

In addition, informal procedures were used to screen pragmatics, voice, and fluency. Overall results of these tests and procedures are described in the following sections, with more detailed information about subtest performance and specific errors available on summary test forms (see file).

Hearing

Austin's hearing was screened using pure tones presented under headphones at 20 dB bilaterally at 500, 1000, 2000, and 4000 Hz. He passed the screening in both ears.

Receptive Language

Austin's ability to understand what is said to him was assessed using the receptive portion of the Test of Language Development-2 Primary (TOLD-P:2) and the Peabody Picture Vocabulary Test-3 (PPVT-3). On the receptive language subtests of the TOLD-P:2, Austin received a listening quotient of 96, which approximates a percentile rank of 50. On the PPVT-3, his performance was even better. The raw score he obtained was 78, which corresponds to a percentile rank of 75 and a standard score of 110.

Expressive Language

Austin's ability to express himself was assessed using the TOLD-P:2 expressive portions and the Expressive One-Word Picture Vocabulary Test-Revised, as well as informal measures obtained from a transcription of a conversational sample taken as Austin played with his mother. Austin's formal test scores were considerably lower on these measures, in part because of the difficulties associated with his speech intelligibility. On the EOWPVT-R, Austin received a raw score
of 20, which corresponds to the 5th percentile and a standard score of 76. Of his 10 errors on that test, approximately 4 were unambiguous with respect to the possible impact of his speech production difficulties; for example, they involved the use of a more general or associated word than the target, or they consisted of instances when Austin said that he did not know the name. On the TOLD-P:2 expressive subtests, Austin received an overall speaking quotient of 61, which falls below the first percentile. An examination of his utterances during a conversation with his mother revealed frequent omission of grammatical morphemes, an absence of complex sentences, and a tendency to overuse the word "thingy" to refer to numerous elements of a Lego construction that they built cooperatively.

Phonology and Oral-Motor Performance

The Oral Speech Mechanism Examination-Revised (OSME-R) was used to examine the adequacy of Austin's oral structures for speech production. His performance on that measure was well within the normal range, with no signs of incoordination or weakness and no observable abnormalities of the structures used in speech. Errors noted in the production of repeated syllables mirrored those in his conversational speech. On the Bankson-Bernthal Test of Phonology, Austin received a word inventory score, which reflects the number of words produced correctly, of 39, which corresponds to a standard score of 71 and a percentile rank of 3. Errors occurred primarily on medial or final consonants. Patterns of errors that occurred most frequently were final consonant deletion (omission of the final consonant in the word; e.g., "bat" becomes "ba"), cluster simplification (replacement or loss of one or more elements of a consonant cluster; e.g., "clown" becomes "cown"), and fronting (replacement of a velar consonant by a more forward consonant; e.g., "gun" becomes "dun").
Efforts to elicit correct production of two consonants that had not been produced correctly up to that point (viz., k, g) were undertaken using phonetic placement instructions and touch cues, resulting in velar fricative approximations. Other sounds consistently in error were [s, z, r] and [l]. When the language sample discussed in the previous section was examined with regard to speech errors and intelligibility, very similar error patterns were observed, and the percentage of understandable words out of all words spoken was determined to be 70%.

Screening for Other Language and Speech Problems

The conversational sample between Austin and his mother was also examined to screen for problems in pragmatics, voice, and fluency. Austin's use of language and his ability to describe the plot of a movie he had recently seen without his mother appeared appropriate for his age. His voice quality and pitch were normal. Fluency also appeared normal, although frequent repetitions and rewordings of sentences occurred in response to his mother's verbal and nonverbal indications of having difficulty in understanding some of his utterances. Although Austin's awareness of his communication difficulties is quite sophisticated for a child of his age, his facial expression and movements at times suggested significant frustration.

Summary

Austin appears to be a bright and sensitive 5-year-old with no significant medical history, but a family history of communication difficulties. Today's evaluation reveals normal hearing and language comprehension, as well as good conversational skills and normal voice and fluency. Austin's difficulties in being understood are moderate to severe at this time and appear to reflect his difficulties in using sounds as expected for his age and in selecting and combining words to create grammatically acceptable sentences. His strong skills in other areas, support from family and school personnel, and clear motivation to improve his communication suggest a very positive prognosis for change.

Recommendations

Austin is likely to benefit from speech-language intervention conducted in individual and group settings at his school, including in-class work conducted by his teacher in consultation with the school speech-language pathologist. Areas to be targeted include phonology, expressive vocabulary, and syntax. Specific goals should address (a) the phonological processes of final consonant deletion and fronting, (b) expressive vocabulary related to school activities, (c) the use of grammatical morphemes that are not currently used but should be pronounceable given his current phonological system, and (d) the development of strategies for dealing in a more relaxed way with listeners' difficulties in understanding Austin's speech.
It was a pleasure to meet Austin and his family today and to have talked previously with others involved in his education and upbringing. We urge you to call with any questions you might have about this report or Austin's ongoing development.

Sincerely,

E. Miller, B.A., Student clinician
R. J. Turner, M.S., CCC-SLP, Supervisor
Summary

1. Measurement is usually indirect, meaning that it involves the measurement of characteristics, sometimes called indicators, that are closely related to but different from the characteristic being described by the process of measurement.
2. The use of theoretical constructs, which are examined using various indicators, underlies clinical as well as research measurement.
3. Four levels of measurement, first proposed by S. S. Stevens (1951), are nominal, ordinal, interval, and ratio. These levels correspond to different methods of assigning measurements to characteristics, which have implications for the measurement's appropriate interpretation and statistical study.
4. Measures at the nominal level, such as diagnostic labels or labels of error type, involve the assignment of measured individual performances or behaviors to mutually exclusive categories. Measurements at the ordinal level, such as severity labels, also use mutually exclusive categories, but ones that can be ordered as demonstrating more or less of the measured characteristic.
5. Measures at the interval level, such as test scores reported in raw or standard scores, involve the assignment of numbered values to characteristics. This is the highest level of measurement usually attained in the behavioral sciences.
6. Often a theoretical construct can be measured using indicators falling at various levels of measurement.
7. Statistics are useful for gaining and summarizing information about groups of measurements, called distributions, as well as for testing hypotheses about the relationships between distributions or between an individual score and a distribution.
8. Two types of statistics used in summarizing distributions of measurements are measures of central tendency (e.g., mean, median, mode) and measures of variability (e.g., standard deviation, variance, range).
Central tendency refers to the most typical values in a distribution, whereas variability refers to the tendency of values in the distribution to differ from one another.
9. In the measurement literature, correlation coefficients are the statistics most used to describe the relationship between groups of measures, with the Pearson product-moment correlation coefficient achieving the greatest use. Correlation refers to the tendency for values of one distribution to be systematically related to values of another distribution.
10. Causal inferences cannot be made directly from observations of correlations: If variables A and B are related, it may be because A caused B, because B caused A, or because both are caused by a third variable or set of variables.
11. Norm-referenced measures are interpreted through the comparison of a person's performance to those of a relevant normative group, usually using a derived score that incorporates information relevant to the comparison. Criterion-referenced measures are interpreted through the comparison of a person's performance to a performance standard, usually using a raw score.
12. Derived scores consist of developmental scores (age- and grade-equivalent scores), percentile ranks, and standard scores (e.g., z scores, T scores, deviation IQ scores). Percentile ranks are probably the most widely used among lay persons, whereas standard scores are preferred by most professionals. Although widely used, developmental scores are the least respected type of score among professionals because they encourage misunderstanding and are less reliable than other derived scores.

Key Concepts and Terms

ability testing: a systematic procedure for exploring learning potential.
achievement testing: a systematic procedure for examining past learning.
age-equivalent score: a derived score corresponding to the age group with the mean score that is closest to the raw score received by the individual test taker.
behavioral objectives: a description of treatment goals in terms of client behaviors.
clinical significance: the likely value of a particular research finding on the basis of the reliability of the finding (i.e., its statistical significance) and its magnitude.
computerized tests: tests that involve the use of a computer display, keyboarded responses, or both.
correlation: the degree of relationship existing between two or more variables.
criterion-referenced measure: a measure in which scores are interpreted in relation to a particular behavioral criterion; contrasts with norm-referenced measure.
developmental scores: a type of derived score in which development is taken into account, for example, age-equivalent and grade-equivalent scores.
distribution: a group of scores, either theoretical or observed.
formative indicators: indicators that are associated with a cause of a construct that is of interest.
grade-equivalent scores: a derived score corresponding to the grade-specified group with the mean score that is closest to the raw score received by the individual test taker.
indicator: an indirect object of measurement; something one measures in place of the characteristic one is really interested in because it is both related to the actual focus of interest and more accessible to measurement. interval level of measurement: a level of measurement using mutually exclusive categories in which scores reflect a rank ordering of the characteristic being measured and the difference between adjacent scores is equal in size; for example, scores on a behavioral probe. local norms: summaries of the performance of a relevant group of individuals that are obtained, often when national norms are unavailable, for purposes of making a specific comparison between an individual test taker’s performance and those of that group.
Page 45 mean: a distribution’s arithmetic average. median: the middle score of a distribution. mode: the most frequently occurring score(s) of a distribution. national norms: summaries of the test performances of a large group of individuals against which a person’s performance can be compared; usually consisting of individuals with known demographic characteristics. nominal level of measurement: a level of measurement in which characteristics of an individual are assigned to mutually exclusive categories (e.g., boys and girls). nonparametric statistics: statistics that do not require assumptions about the nature of the underlying distribution from which observations are drawn. normal distribution: a theoretical distribution of scores or set of scores with known mathematical properties. normative group: a group whose performance is used in the comparison and interpretation of an individual’s score in norm-referenced score interpretation. norm-referenced measure: a measure in which scores are interpreted in relation to the performance of a normative group; contrasts with criterion-referenced measure. norms: data concerning the distribution of scores achieved by a normative group. operational definition: defining a variable through the operations used to measure it. ordinal level of measurement: a level of measurement using mutually exclusive categories that reflect a rank ordering of the characteristic being measured. paper-and-pencil tests: conventional testing in which printed test materials are completed independently by literate test takers. parametric statistics: statistics that require certain assumptions about the nature of the underlying distribution from which observations are drawn. percentile ranks (percentiles): derived scores representing the percentage of individuals performing at or below a given raw score. performance standard: a criterion against which an individual’s performance can be compared in criterion-referenced score interpretation.
performance tests: tests to assess skills that involve the manipulation of objects or that otherwise are difficult or impossible to assess using paper-and-pencil tests. range: one method of describing the variability of a distribution; the difference between the highest and lowest scores in the distribution. ratio level of measurement: a level of measurement using mutually exclusive categories (scores) in which the scores reflect a rank ordering of the characteristic being measured, the difference between adjacent scores is equal in size, and there is a real zero along the scale; for example, the time elapsing between presentation of a picture and name production. reflective indicators: indicators that are associated with the effects of a construct that is of interest.
Page 46 standard deviation: a method of describing the variability of a distribution of scores; the square root of the variance. standard scores: derived scores in which a transformation has been used to assure a predetermined mean and standard deviation, for example, a mean of 100 and a standard deviation of 15. statistical significance: statistical evidence that an obtained value was unlikely to have occurred by chance. theoretical construct: a concept used in a specific way within a particular system of related concepts. theory: a system of related concepts, usually used to explain a variety of related data concerning a phenomenon of interest. variables: measurable characteristics that differ under different circumstances. variance: a method of describing the variability of a distribution; it consists of the mean of the squared distances of scores from the distribution mean. Study Questions and Questions to Expand Your Thinking 1. Imagine that you are interested in measuring the ability of a child to understand the names of colors usually known by children of his or her age. Think of four different indicators for the construct of color name comprehension—two that are reflective and two that are formative. 2. Propose an indicator of spelling proficiency falling at each of the first three levels of measurement: nominal, ordinal, and interval. 3. Suppose that a measurement tool offers you two different normative groups against which to compare the performance of a child who speaks Korean as a first language and English as a second—one group consisting of children of similar ages with similar language histories to the child to be tested and one of children of similar ages with English as their only language. What would each comparison tell you about the child? 4. For each of the following measurement purposes, explain which type of score interpretation would be most suitable—norm- or criterion-referencing:
a. identifying the poorest performance on a classroom test
b. competency testing for graduation from high school
c. testing for licensure in a profession, such as speech-language pathology
d. national testing for scholastic aptitude (e.g., SATs or GREs)
e. determining success of treatment aimed at improving a student’s correct use of selected verb forms
5. Find a newspaper article in which a behavioral measure is described. What construct appears to be measured? At what level is that measurement conducted? What measures of central tendency and variability would be appropriate for this measure?
Page 47 6. Find a newspaper article in which the relationship between two variables is described. Is a causal relationship between these variables implied? Does that interpretation seem warranted, or can you imagine a different causal relationship between the variables? Describe it. 7. On the basis of your personal observation, describe two variables you believe have a positive correlation with one another, then two that have a negative correlation. 8. A 3-year-old child receives a test score on a norm-referenced test that falls at the 35th percentile and yields an age-equivalent score of 2 years, 8 months. Explain the meaning of those scores as if you were talking to a very worried parent. 9. The parent of a high-achieving 10-year-old girl tells you that her daughter has been tested by a neighbor who is studying psychology and achieved a standard score of 100 on an intelligence test. She wonders if that doesn’t mean that her child’s perfect score suggests that she is a genius who should skip several grades. What would you tell her about her child’s performance? (This is tricky. Consider both the fact that you didn’t obtain this information directly as well as the meaning of standard scores.) 10. Pretend that you have devised a test to determine students’ mastery of the content covered in this chapter. How might you determine an appropriate cutting score? (No, the answer to this is not in the book up to this point. Think creatively.) Recommended Readings Gould, S. J. (1981). The mismeasure of man. New York: Norton. Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego, CA: Author. References Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole Publishing. American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association. Anastasi, A. (1982). 
Psychological testing (5th ed.). New York: Macmillan. Badian, N. (1993). Phonemic awareness, naming and visual symbol processing and reading. Reading and Writing, 5, 87–100. Bankson, N. W., & Bernthal, J. E. (1990). Bankson–Bernthal Test of Phonology. Chicago: Riverside. Bernthal, J. E., & Bankson, N. W. (1998). Articulation and phonological disorders (4th ed.). Englewood Cliffs, NJ: Prentice-Hall. Bradley, L., & Bryant, P. (1983). Categorizing sounds and learning to read: A causal connection. Nature, 301, 419–421. Bridgman, P. W. (1927). The logic of modern physics. New York: Macmillan. Bunderson, C. V., Inouye, D. K., & Olsen, J. B. (1989). The four generations of computerized educational measurement. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 409–429). New York: National Council on Measurement in Education and American Council on Education. Chial, M. R. (1988). Utility of inferential statistics. In D. Yoder & R. D. Kent (Eds.), Decision making in speech-language pathology (pp. 198–201). Toronto: B. C. Decker. Conover, W. M. (1998). Practical nonparametric statistics (3rd ed.). New York: Wiley.
Page 48 Culatta, B., Page, J. L., & Ellis, J. (1983). Story retelling as a communicative performance screening tool. Language, Speech, and Hearing Services in Schools, 14, 66–74. Francis, D. J., Fletcher, J. M., Shaywitz, B. A., Shaywitz, S. E., & Rourke, B. P. (1996). Defining learning and language disabilities: Conceptual and psychometric issues with the use of IQ tests. Language, Speech, and Hearing Services in Schools, 27, 132–143. Freedman, D., Pisani, R., & Purves, R. (1998). Statistics (3rd ed.). New York: Norton. Gardner, M. F. (1990). Expressive One-Word Picture Vocabulary Test—Revised. Novato, CA: Academic Therapy. Gibbons, J. D. (1993). Nonparametric statistics: An introduction. Newbury Park, CA: Sage. Gould, S. J. (1981). The mismeasure of man. New York: Norton. Gronlund, N. (1982). Constructing achievement tests (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall. Kerlinger, F. N. (1973). Foundations of behavioral research (2nd ed.). New York: Holt, Rinehart & Winston. Lahey, M. (1988). Language disorders and language development. New York: Macmillan. McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language, Speech, and Hearing Services in Schools, 27, 122–131. McCauley, R. J., & Swisher, L. (1984). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical case. Journal of Speech and Hearing Disorders, 49, 338–348. McClave, J. T. (1995). A first course in statistics (5th ed.). Englewood Cliffs, NJ: Prentice-Hall. Newcomer, P. L., & Hammill, D. D. (1991). Test of Language Development—2 Primary. Austin, TX: Pro-Ed. Pedhazur, R. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates. Salvia, J., & Ysseldyke, J. E. (1995). Assessment (6th ed.). Boston: Houghton Mifflin. Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego, CA: Author. St. Louis, K. O., & Ruscello, D. (1987). 
Oral Speech Mechanism Screening Examination—Revised. Baltimore: University Park Press. Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology (pp. 1–49). New York: Wiley. Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children—Revised. San Antonio: The Psychological Corporation. Wiig, E. S., Jones, S. S., & Wiig, E. D. (1996). Computer-based assessment of word knowledge in teens with learning disabilities. Language, Speech, and Hearing Services in Schools, 27, 21–28. Williams, F. (1979). Reasoning with statistics (2nd ed.). New York: Holt, Rinehart & Winston. Young, M. A. (1993). Supplementing tests of statistical significance: Variation accounted for. Journal of Speech and Hearing Research, 36, 644–656.
Page 49

CHAPTER 3

Validity and Reliability

Historical Background
Validity
Reliability

Historical Background The historic roots of behavioral measurement can be traced to tests used in the third century B.C. by the Chinese military for the purpose of identifying officers worthy of promotion (Nitko, 1983). Despite such early beginnings, however, widespread interest in measurement for purposes such as helping children has far more recent origins, beginning at the close of the 19th century. Not surprisingly, therefore, there are many threads of thought leading to the diversity of instruments and procedures now being used to describe and make decisions about people. During the 20th century, perspectives on how to develop and use measures such as those used to help children with developmental language disorders have come from education, psychology, and—most recently—speech-language pathology. Over this relatively brief period of time, professional and academic organizations in these fields have taken on the responsibility of developing standards of test development and use. These efforts have primarily focused on tests, where test is defined as a behavioral measure in which a structured sample of behavior is obtained under conditions in which the tested individual is expected (or at least has been instructed) to do his or
Page 50 her best1 (APA, AERA, & NCME, 1985). Despite a focus on tests in this narrow sense, such standards have always been meant to apply to all behavioral measures—although they apply to a greater or lesser extent depending on the specific characteristics of the measure. Most notable among efforts to provide guidance to test developers and users have been those of the APA, AERA, and NCME. In 1966, after two earlier sets of testing standards (APA, 1954; National Education Association, 1955), the three organizations worked together to create a single document, Standards for Educational and Psychological Tests and Manuals, which has gone through two revisions. The most recent revision was renamed Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985). The frequent revision of these standards reflects the brisk pace of research and ongoing discussion about behavioral measurement. One particularly important transition occurring within the past two decades is reflected in the change of title from Standards for… Tests to Standards for… Testing. This change emphasizes the centrality of the test user in measurement quality. Earlier editions focused on ways in which test developers could demonstrate the quality of their instruments. Far less attention was paid to issues related to actual test administration and interpretation. In fact, whereas 75% of the 1974 version related to test standards, only 25% of it related to standards of test use. In the most recent version, there has been almost a reversal in those percentages: about 60% relates to test use versus 40% to test standards. This shift is consistent with the most influential work conducted in the last decade in which test users are asked to consider not simply the technical adequacy of methods used to derive specific test scores, but also the impact their decisions will have (Messick, 1989). Not surprisingly, the term ethics has cropped up frequently in the course of these discussions. 
It will surface frequently in this text as well. Beginning with this chapter, I hope that readers will adopt a perspective similar to that set by the APA, AERA, and NCME (1985). Specifically, I hope that you will consider measurement quality in developmental language disorders as an arena in which many elements come into play, but in which you are the lion tamer, the person who remains expertly in charge of a potentially dangerous situation. In this chapter and the one that follows it, I focus on how best to select appropriate measures once you have a fairly specific application in mind. Chapters in Part II focus on those specific applications commonly faced by clinicians who work with children who have developmental language disorders. Those chapters will figure prominently in helping you learn to tailor your measurements to the specific purposes you have in mind—a key lesson for those interested in providing their clients with the best possible care. The remainder of this chapter is intended to introduce you to validity and reliability, two concepts that invariably dominate discussions of measurement quality. Validity is by far the more central of the two terms. It might even be said that any discussion of measurement quality is automatically a discussion of validity. Reliability is of 1 This assumption is probably not well founded for many children with language disorders, who may be unable to understand what it means to “do one’s best” or who may be unwilling to do it. I return to this issue at numerous points throughout this book.
Page 51 lesser importance but is still vital. Its secondary place derives from its role as prerequisite for, but not sole determinant of, validity. Validity Validity can be defined as the extent to which a measure measures what it is being used to measure. So, you might ask, what’s all of the fuss about? Despite its seeming simplicity, however, the concept of validity has a number of subtle nuances that can be difficult to grasp for even the most seasoned users of behavioral measures. Several misconceptions are evident when a test user or developer says sweepingly that a given test is a valid test. First, this kind of statement about a measure suggests that it somehow possesses validity, independent of its use for a particular purpose. Second, it suggests that validity is an all-or-nothing proposition. Both of those suggestions are untrue, however. What can safely be said about a given measure is that it seems to have a certain level of validity to answer a specific question regarding a specific individual. However, even reaching that less-than-definitive-sounding conclusion requires considerable work on the part of the clinician. To explore the general concept of validity a little more fully, consider a specific, widely used measure—the Peabody Picture Vocabulary Test–III (PPVT–III; Dunn & Dunn, 1997). That measure was developed for the purpose of examining receptive vocabulary in a wide variety of individuals using a task in which a single word is spoken by the test giver and the test taker points to one picture (from a set of four) to which the word corresponds. Despite the exceptionally detailed development undergone by the PPVT–III, it is nonetheless quite easy to imagine situations in which its use could lead to highly invalid conclusions and, thus, for which its validity could be questioned. 
For example, using the PPVT–III to reach conclusions about a test taker’s artistic talent or about the vocabulary of someone who does not speak English represents a gross example of how misapplication undermines validity. One can also imagine—or simply observe—less obvious yet similarly problematic applications of the PPVT–III. For example, the PPVT–III might be used to draw conclusions about overall receptive language, rather than receptive vocabulary only. It might also be used to examine the receptive vocabulary skills of an individual or group lacking much previous exposure to many vocabulary items pictured in the exam. In each of these cases, the validity of the test’s use would be adversely affected, although probably not to the degree of the first, extreme examples. Thus, these latter examples illustrate the continuous nature of validity by showing that a measure can be less valid than if it were used appropriately, but more valid than if wildly misused. These last two examples are also poignant because they aren’t just hypothetical examples, but actual ones that readily occur if a clinician is careless or naive about the concept of validity. As another way of thinking about these problems in validity, consider two questions: (a) Is something other than the intended construct actually being measured by the indicator (the test)? and (b) Does the indicator reflect its target construct in such a limited way that much of the meaning of the construct is lost? Affirmative answers to either or both of those questions chip away at the value of the indicator as a means
Page 52 of measuring the intended construct and, by definition, chip away at the measure’s validity. Thus, when the PPVT–III is used as a measure of receptive language as a whole, the construct of receptive language is greatly impoverished; hence one can conclude that reduced validity is a strong risk. On the other hand, it may be used to measure vocabulary skills in individuals who have not had much exposure to the vocabulary. Then it may become a measure of exposure to the vocabulary rather than learning of the vocabulary, thus reducing the measure’s validity because the test would not be measuring what it was supposed to measure. Given the continuous nature of validity and the considerable specificity with which it must be demonstrated, how does one ascertain that a measure is valid enough to warrant use for a particular purpose? In the next section I outline methods that are used by test developers and other researchers to provide support of a general nature—that is, support suggesting broad parameters associated with a measure’s useful application. Methods used by test users to evaluate that support in terms of a specific application are described in the next chapter. Ways of Examining Validity
The methods used to demonstrate that a measure is likely to prove valid for a general purpose (such as identifying a problem area or monitoring learning) have grown in number and sophistication over the years. Although the methods are highly interrelated, they are nonetheless characterized as falling into three categories: construct validation, content validation, and criterion-related validation. These three categories are ordered beginning with the most important. Construct Validation
Construct validation refers to the accumulation of evidence showing that a measure relates in predicted ways to the construct it is being used to measure—that is, to show that it is an effective indicator of that construct. A wide variety of evidence falls into this category, including evidence that is described as content- or criterion-related in the sections that follow. If that seems confusing to you at first, you are not alone; the theoretical centrality of construct validity has only recently been recognized. Until that time, validity was usually conveyed as composed of three parts rather than as a unity. Figure 3.1 portrays the relationship between the three types of validity evidence. It also conveys the two meanings of construct validity—(a) as a cover term for all types of validity evidence and (b) as a term used to refer to several methods of validation that are not seen as fitting under either content or criterion-related validation techniques. The underlying similarity of methods uniquely defined as demonstrating construct validity can perhaps best be seen through a discussion of the earliest stages in measurement development. When approaching the development of a behavioral measure, the developer considers how the construct to be measured (such as receptive vocabulary) is related to other behavioral constructs and events in the world (such as age, gender, other abilities). Also considered at this stage are possible indicators (such as pointing at named pictures or acting out named actions) that might reasonably be used
Page 53
Fig. 3.1. A graphic analogy illustrating the different kinds of evidence of validity. to obtain information about the construct and thereby serve as the basis for the measure. For example, in the case of receptive vocabulary as a possible construct, the test developer begins with a scientific knowledge base that supports expectations about how receptive vocabulary is affected by phenomena such as age and gender. That knowledge base also generates expectations about how the construct is related to other behavioral constructs such as expressive language development and hearing ability. From this knowledge base, the developer formulates predictions about how a valid indicator, or measure, will be affected by such phenomena and how such a valid indicator will be related to other constructs. Evidence suggesting that the measure acts as predicted supports claims of construct validity. Four specific methods of construct validation are discussed in upcoming paragraphs—developmental studies, contrasting group studies, factor analytic studies, and convergent–discriminant validation studies. For many measures used with children, two kinds of studies are frequently used to provide evidence of construct validity—developmental studies (sometimes called age differentiation studies) and studies in which groups who are believed to differ in relation to the construct are contrasted with one another (sometimes called group differentiation studies). Table 3.1 provides an example of the description provided for each of these types of study. The specific examples used here are considered to be neither the most thorough nor the most sophisticated possible examples. Instead they are meant to help you anticipate the way such studies are described in test manuals. The developmental method of construct validation is based on the general expectation that language and many related skills of interest increase with age. The
Page 54 Table 3.1 Examples of Test Manual Descriptions of Two Types of Construct Validation Studies Type of study
Description
Developmental studies
“Correlational methods were used to determine if performance on the TWF [Test of Word Finding] changes with age. Using the Pearson product-moment correlation procedure, TWF accuracy scores (scale scores generated from the Rasch analyses) were correlated with the chronological age of the 1,200 normal subjects in the standardization sample…. All coefficients were statistically significant and of a sufficient magnitude to support the construct validity of the TWF as a measure of expressive language for both boys and girls and of children of different ethnic and racial background. Comparison of accuracy scores at each grade level also reflected developmental trends as the accuracy scores of the normal subjects in the standardization sample increased across grades…. These findings, which support grade differentiation by the TWF for all but one grade, are a further indication of developmental trends in test performances on the TWF.” (German, 1986, p. 5)
Contrasting group studies
“In order to test the capacity of the TELD [Test of Early Language Development] to distinguish between groups known to differ in communication ability, we administered the TELD to seventeen children who were diagnosed as ‘communication disordered’ cases. No children with apparent hearing losses were included in the group. Eighty percent of the children were white males; they ranged in age from three to six and a half. In socioeconomic status, sixty-four percent were middle class or above. All of the children attended school in Dallas, Texas. The mean Language Quotient (LQ) derived from the TELD for this group was 76. Since the TELD is built to conform to a distribution that has a mean of 100 and a standard deviation of 15, it is apparent that the observed 76 LQ represents a considerable departure from expectancy. It is a discrepancy that approaches two standard deviations from normal. These findings were taken as evidence supporting the TELD’s construct validity.” (Hresko, Reid, & Hammill, 1981, p. 15)
hypothesis tested in this type of validation study is that performance on the measure being studied will improve with age. As you probably recall from previous course work, developmental studies of this kind can take a couple of different forms—one (called a longitudinal study) compares the performances of a single group of children across time, and a second (called a cross-sectional design) compares the performances of several groups of children, each group falling at a different age. Cross-sectional studies are particularly popular among test developers, undoubtedly because the data needed to test the hypothesis are the same as those needed to provide norms. A second major type of construct validation study, which can be called the contrasting groups method of construct validation, tests the hypothesis that two or more groups of children will differ significantly in their performance on the targeted measure. Again, consider receptive vocabulary as the example. Obviously developing a test of receptive vocabulary for use with children only makes sense if you believe that there are some children whose performance falls so far below that of peers as to
Page 55 have significant negative consequences. For this type of measure, one might evaluate construct validity by finding groups of children who are thought to differ in their receptive vocabulary knowledge (e.g., children with a developmental language disorder vs. children without such a disorder). In this type of study, if the measure is a valid reflection of the construct, children who have been identified as differing in relation to the construct should also differ in their performance on the measure. See Table 3.1 for an example of a validation study of this type. A third category of construct validity study is identified through the use of a specific statistical technique—factor analysis. Factor analysis is less frequently used in speech-language pathology than it is in some other disciplines. For example, it has been used most extensively to study intelligence tests. Besides its value as a means of studying an already developed measure, factor analysis is frequently used in early stages of test development as an aid in selecting items from a pool of possible items. The term factor analysis describes a number of techniques used to examine the interrelationships of a set of variables and to explain those interrelationships through a smaller number of factors (Allen & Yen, 1979). Factor analysis assists researchers in the very difficult process of making sense of a large number of correlations, the most basic method for describing interrelationships (as described in chap. 2). In factor analytic studies, the original set of variables to be studied typically consists of a group’s performance on the target measure as well as a set of other measures—some of which tap a construct similar to that of the target measure. Although the concept of the factor does not exactly relate to a specific underlying construct, all measures related to a single construct are expected to be associated with a single factor. 
Therefore, construct validity would be demonstrated in this type of study when the target measure shares, or “loads on,” the same factor as measures for which validity with respect to a particular construct has already been demonstrated (Pedhazur & Schmelkin, 1991). A particularly sophisticated method proposed for studying construct validity exists in principle and has been applied to measures developed for a variety of behavioral constructs, but it is rarely applied to speech and language measures. That is the method known as convergent and discriminant validation (Campbell & Fiske, 1959), which is associated with a type of experimental design its authors called a multitrait–multimethod matrix. Because of the relative rarity of this approach for measures used with children who have language disorders, I do not discuss it in detail. However, because this method is sometimes used for measures you will be interested in, it is important to know that convergent validation refers to demonstrations that a measure correlates significantly and highly with measures aimed at the same construct but using different methods; discriminant validation refers to demonstrations that it does not correlate significantly and highly with measures targeting different constructs (Pedhazur & Schmelkin, 1991). An example from Anastasi (1988) may help make the ideas behind convergent and discriminant validation clearer: Correlation of a quantitative reasoning test with subsequent grades in a math course would be an example of convergent validation. For the same test, discriminant validity would be evidenced by a low and insignificant correlation with scores on a reading comprehension test, since reading ability is an irrelevant variable in a test designed to measure quantitative reasoning. (p. 156)
In short, validity is supported in this approach through evidence that the measure under study is measuring what it is supposed to measure in a manner uncontaminated by its relationship to something else that it was not supposed to measure. In the context of their discussion of convergent and discriminant validation, Pedhazur and Schmelkin (1991) discussed a pair of fallacies that threaten researchers' understanding of the evidence they obtain using this method, but that apply equally to thoughts about test selection. Cleverly, they have been termed the "jingle and jangle fallacies." Jingle fallacies arise when one assumes that measures with similar names must tap similar constructs, whereas jangle fallacies arise when one assumes that measures with dissimilar names must tap dissimilar constructs. Obviously, close examination of actual content can help ward off the deluding effects of such thinking. Although I only discussed four methods of construct validation, many more methods are actually used, including those that have conventionally been identified in association with content and criterion-related validation. Methods fitting under content and criterion-related validation techniques are discussed next. These are typically viewed as more easily understood than construct validation.

Content Validation
Content validation involves the demonstration that a measure's content is consistent with the construct or constructs it is being used to measure. As with construct validity, the developer addresses concerns about content validity from the earliest stages of the measure's development. Such concerns necessitate the use of a plan to guide the construction of the components of the measure (test items, in the case of standardized tests). The plan ensures that the components of the measure will provide sufficient coverage of various aspects of a construct (often called content coverage) while avoiding extraneous content unrelated to the construct (thus assuring content relevance). Later, content validity is evaluated directly, usually through the use of a panel of experts who evaluate the original plan and the extent to which it was effectively executed. Table 3.2 lists the basic steps involved in the development of standardized measures. Despite underlying similarities, the specific ways in which concerns regarding content validity affect the development process differ for norm-referenced and criterion-referenced measures. Before attempting a comparison of these differences, recall

Table 3.2
Steps Involved in the Development of a Standardized Measure (Allen & Yen, 1979; Berk, 1984)

Step  Test Development Activity
1     Plan the test
2     Write possible items
3     Conduct an item tryout
4     Conduct an item analysis
5     Develop interpretive base (norms or performance standards)
6     Collect additional validity and reliability data
these two ideas from chapter 2: (a) content tends to be broadly sampled for norm-referenced measures and extensively, almost exhaustively, sampled for criterion-referenced measures; and (b) a person's performance is interpreted in relation to the performance of a normative group for norm-referenced measures and to a specific performance level for criterion-referenced measures. In the following sections, I describe the effect of these differences on content validation within the context of an explanation of procedures used in the development of norm-referenced as well as criterion-referenced measures. The Development of Norm-Referenced Measures and Content Validity. For norm-referenced tests, the development of the plan involves decisions about the number and complexity of constructs to be examined as well as the numbers and kinds of items to be used. Some tests attempt to take on only one construct (e.g., the PPVT-III; Dunn & Dunn, 1997), whereas others address many or complex constructs and consequently are composed of numerous subtests (e.g., the Test of Language Development-Intermediate: 3 [Hammill & Newcomer, 1997], in which the complex construct of language is viewed as composed of numerous simpler constructs involving various aspects of receptive and expressive language). Next, as many as 1.5 to 3 times as many items are written as are expected to be used in the final version of the test (Allen & Yen, 1979). Items are written with the goal of sampling evenly across the range of all possible items and providing a large enough pool of items that their effectiveness can be studied in the next steps of the test's development: item tryout and item analysis. An item tryout is conducted using a large sample of individuals chosen to be as similar as possible to those for whom the test will ultimately be used. After the test is given to the sample, the performance of each item is studied using item analysis.
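As a preview of the item statistics examined in such an analysis—item difficulty and item discrimination, both explained next—here is a minimal sketch. The 0/1 response matrix is entirely hypothetical (10 test takers, 4 items), and the .30–.70 screening range follows the convention discussed in this chapter.

```python
import numpy as np

# Hypothetical 0/1 response matrix: 10 test takers (rows) x 4 items (columns).
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
])

total = responses.sum(axis=1)  # each person's total score

for item in range(responses.shape[1]):
    p = responses[:, item].mean()                        # item difficulty
    r_pb = np.corrcoef(responses[:, item], total)[0, 1]  # item-total point-biserial
    in_range = 0.30 <= p <= 0.70                         # difficulty screen
    print(f"item {item + 1}: p = {p:.2f}, r_pb = {r_pb:.2f}, in range: {in_range}")
```

One simplification worth noting: correlating an item with a total score that includes that item slightly inflates the point-biserial; a corrected item-total correlation would exclude the item from the total.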
This analysis tends to rely most heavily on information about the item's difficulty and discrimination but can involve a variety of techniques (including factor analysis) intended to help the test developer arrive at a subset of the most valid items by throwing out or modifying unsatisfactory items. Item difficulty (p) is the number of persons answering the item correctly divided by the number of persons who took the item. It can be used to gauge whether an item is appropriate to the range of abilities characteristic of the target population. Obviously, if an item is passed by everyone (p = 1.0) or failed by everyone (p = .00), it will not help you rank individuals relative to one another—the goal of a norm-referenced measure. In fact, it is generally held that an item has the maximum ability to discriminate among test takers when it has a p value of .50. Norm-referenced test developers are often encouraged to strive for items with difficulties falling between .30 and .70 as an acceptable range around .50 (Allen & Yen, 1979; Carver, 1974). Items that fall outside of this range are discarded or rewritten (because a difficult item may be difficult only because its wording is confusing). Item discrimination can be measured in several different ways, with item discrimination indexes and item–total point-biserial correlations as the most popular methods. Item discrimination reflects the extent to which people perform on the item as they perform on the test as a whole (Allen & Yen, 1979). It is generally thought that better items will be those for which more positive performance on the item tends to be associated with more positive performance on the test as a whole. Again, items that fail to perform in a desirable fashion are candidates for rewriting or exclusion. Once items are rewritten and a subsequent item analysis yields a satisfactory result for the final body of items, the last step of the test construction process involves the collection of initial information about the instrument's overall validity and reliability and the preparation of documentation concerning the instrument. Content validity comes in at this point in two ways. First, by reporting on the specific methods used in the steps I described, the test author provides a potential test user with some evidence that the initial intended content of the test has been well translated into the final measure. Second, one type of data collected during the final step of test construction consists of expert evaluation of the development process and of the final fit between intended and actual content of the test. Table 3.3 provides two examples, showing how different test manuals describe this information. The Development of Criterion-Referenced Measures and Content Validity. Criterion-referenced measures are constructed using steps similar to those previously described. However, numerous differences in methods and rationales distinguish the development of such measures from the development of norm-referenced measures. To begin with, the initial plan used for a criterion-referenced measure tends to be more elaborate and detailed than that used in norm-referenced measure construction (Glaser, 1963; Glaser & Klaus, 1962).
Also, behavioral objectives, often hierarchically arranged, may be used as part of the plan, particularly when the measure is being developed to examine progress in the acquisition of a particular body of information or a particular skill (Allen & Yen, 1979). Nitko (1983) offered a detailed accounting of the sometimes very intricate plans used for criterion-referenced measures. The Testing and Measurement Close-Up in this chapter provides a very personal example from the life of one of the authors quoted most frequently on the topic of validity, Anne Anastasi, which reminds us of the difference between norm-referenced and criterion-referenced tests. Once the plan has been finalized, items are written so that they address all aspects of the intended content. Although exhaustive is too strong a word (an exhaustive test of any construct worth knowing about would undoubtedly require several lifetimes), the extensiveness of item coverage is definitely in the direction of exhaustive when compared with that of norm-referenced measures. Item tryouts and analyses offer another point at which major differences separate norm-referenced from criterion-referenced instruments. For norm-referenced measures, items are selected for their ability to discriminate across a range of abilities; for criterion-referenced measures, however, items are selected for their ability to discriminate between performance levels. Most commonly, dichotomous performance levels are used, such that items are selected for their ability to discriminate between performance showing mastery of particular content versus that showing nonmastery. For that purpose, an ideal item's difficulty would approximate 0 for nonmasters and 1.0 for masters. One method used to tentatively identify masters and nonmasters
Table 3.3
Examples of Two Types of Criterion-Related Validity Studies

Concurrent validity

Test of Phonological Awareness (TOPA): "When the TOPA (Test of Phonological Awareness) was given to a sample of 100 children at the end of kindergarten, it was found to be significantly correlated with two other, relatively different measures of phonological awareness. The TOPA-Kindergarten scores were correlated with scores from a measure called sound isolation (a 15-item test requiring pronunciation of the first phoneme in words) at .66 and with a segmentation task (requiring children to produce all the phonemes in a three- to five-phoneme word) at .47. Both of these other measures assessed analytical phonological awareness, although they required a more explicit level of awareness than did the TOPA." (Torgesen & Bryant, 1994, p. 24)

Preschool Language Scale-3 (PLS-3): "A study of the relationship between PLS-3 and CELF-R [Clinical Evaluation of Language Fundamentals-Revised (Semel, Wiig, & Secord, 1987)] was conducted with 58 children. The sample consisted of 25 males and 33 females ranging in age from 5 years to 6 years, 11 months (mean = 6 years, 0 months). The two tests were administered in counterbalanced order. The between-test interval ranged from two days to two weeks, with an average of 4.5 days. Both tests were administered by the same examiner. Reported correlations were as follows: PLS-3 Auditory Comprehension with CELF-R Receptive Composite (r = .69); PLS-3 Expressive Communication with CELF-R Expressive Composite (r = .75); PLS-3 Total Language score with CELF-R Total Score (r = .82)." (Zimmerman, Steiner, & Pond, 1992, p. 95)

Predictive validity

Test of Phonological Awareness (TOPA): "When the TOPA-Kindergarten was given to 90 kindergarten children sampled from two elementary schools serving primarily low socioeconomic status and racial minority children, its correlation with a measure of alphabetic reading skill (the Word Analysis subtest from the Woodcock Reading Mastery Test) at the end of first grade was .62. Thus, between 30% to 40% of the variance in word-level reading skills in first grade was accounted for by the TOPA administered in kindergarten." (Torgesen & Bryant, 1994, p. 24)

Receptive-Expressive Emergent Language Scale (REEL-2): "In the first study investigating predictive validity, researchers at the University of Florida's Emergent Language laboratory conducted a longitudinal study of 50 'normal' infants from linguistically enriched environments. After repeated monthly testing over a 2- to 3-year period, all infants were found to achieve mean average scores for Receptive Language Age (RLA), Expressive Language Age (ELA), and Combined Language Age (CLA) at or about their chronological ages." (Bzoch & League, 1992, p. 10)
has been to examine performances of an item tryout sample before and after instruction designed to produce mastery (Allen & Yen, 1979). In that context, better items are those whose p values show the greatest upward change. As was the case with norm-referenced measures, the last step of test construction for a criterion-referenced measure involves the collection of initial information about the instrument's overall validity and reliability and the preparation of documentation concerning the instrument. Here, content validation is achieved using means similar to those used for norm-referenced measures. In addition to providing descriptive evidence of the procedures used to develop the test's content, test authors look to the results of expert evaluations of construction methods and final test content as a further source of content validation.

TESTING AND MEASUREMENT CLOSE-UP

Anne Anastasi has been called one of "psychology's leading women." She was one of only five women (of a total of 96 psychologists) to be considered during the first eight decades of this century in a prominent series of books recording the history of psychology through autobiography (Stevens & Gardner, 1982). Although Anastasi has made contributions in a variety of areas in psychology, she is included here because of her authorship of a classic text on psychological testing (Anastasi, 1954). That text has gone through seven editions, with the latest edition published in 1997. It has undoubtedly served as the source of more information on testing for psychologists and others than perhaps any other work, and in its latest edition, Anastasi (1997) again provided one of the clearest sources for essential information on validity and reliability. In the early 1980s, when she was in her 70s, I had the pleasure of hearing Anne Anastasi present a lecture at the University of Arizona. Her black patent leather pocketbook was propped up in front of her on the podium as she spoke, its stiff handle almost obscuring the audience's view of her white hair, thick horn-rimmed glasses, and the bright eyes that lay behind them. I do not actually remember much about the details of her presentation, except that her speech was as clear as her writing and was delivered without a single note. She was as impressive in person as she had been on the page.
The following passage from her autobiography breathes life into two very different ideas from this chapter. First, it shows the possibly traumatizing effect that the process of assessment can have—even on a child whose biggest problem appears to have been her exceptional intelligence. Second, it revisits the distinction between norm-referenced and criterion-referenced (or, as she calls it here, content-referenced) score interpretation. "Throughout my schooling, I retained a deep-rooted notion that any grade short of 100 percent was unsatisfactory. At one time I actually believed that a single error meant a failing score. I recall a spelling test in 4B, in which we wrote ten words from dictation. I was unable to hear one of the words properly, because the subway had just roared past the window (it was elevated in that area). The word was 'friend,' but I heard it as 'brand.' As a result, the item was marked wrong and my grade was only 90%. That evening when I told my mother about it, she consoled me and advised me to raise my hand at the time and tell the teacher, if anything like that should happen again. But she did not disabuse me of the notion that anything short of a perfect score was a failure. I eventually discovered for myself that one could pass despite a few errors; but I always felt personally uncomfortable with the idea. There seemed to be some logical fallacy in calling a performance satisfactory when it contained errors. I was apparently following a content-referenced rather than a norm-referenced approach to performance evaluation." (Anastasi, 1980, pp. 7–8)

Face Validity. One further topic regarding content validity that demands attention is not really a matter of true validity at all, despite its being termed face validity. Face validity is the superficial appearance of validity to a casual observer. It is considered a potentially dangerous notion if a test user mistakenly assumes that a cursory evaluation of a measure for its face validity constitutes sufficient evidence to warrant its adoption. Nonetheless, face validity can play a role in a test's actual validity; for example, poor face validity may cause a test taker to discount the importance of a measure and thereby undermine its ability to function as intended. In summary, the kind of evidence provided for norm-referenced versus criterion-referenced measures differs. However, content validation for both types of measures is achieved through the author's careful planning, execution, and reporting of the measure's development and through the positive evaluation of this process by experts in the content being addressed.

Criterion-Related Validation
Criterion-related validation refers to the accumulation of evidence that the measure being validated is related to another measure—a criterion—where the criterion is assumed to have been shown to be a valid indicator of the targeted construct. Putting this in primitive terms, criterion-related validation involves looking to see whether your "duck" acts like a "duck." This explanation derives from the famous streetwise logic in which anything that looks like a duck, walks like a duck, and quacks like a duck is determined to be a duck. Thus, as you set out to validate your measure (Duck 1), you search around for a duck (Duck 2, a.k.a. Criterion Duck) that everyone acknowledges is indeed a true duck (i.e., a valid indicator of the underlying construct). Then you put your ducks through their paces to see to what extent they act similarly. The greater their similarities, the better the evidence that they share a common "duckness." And then, voilà: You have evidence of criterion-related validity! In case I lost you there: for a behavioral measure, criterion-related validation works by finding a strong, usually positive, correlation between the target measure and a criterion. The choice of the criterion is crucial because of the assumption that the criterion itself has high validity. It can also be problematic because, for many constructs, it may be difficult to find a criterion that can claim such an exalted status. Two types of criterion-related validity studies are typically described: concurrent and predictive. Predictive validity is most relevant when the measure under study will be used to predict future performance in some area. For example, the Predictive
Screening Test of Articulation (PSTA; Van Riper & Erickson, 1969) was intended to predict whether a child tested at the beginning of first grade would still be considered impaired in phonologic performance 2 years later. Consequently, this type of evidence was important in demonstrating that the test would measure what it was supposed to measure. In that particular case, the test developers used as the criterion measure the researchers' judgments of normal articulation versus continued articulatory errors, based on a simple phonetic inventory and on samples of spontaneous connected speech obtained 2 years after initial testing with the PSTA. A study of concurrent validity is performed when the criterion and target measures are studied simultaneously in a group of individuals like those for whom the test will generally be used. It is by far the more common type of criterion-related validity study. See Table 3.3 for an example of this type of validity study. For both predictive and concurrent studies of criterion-related validity, the resulting correlation coefficient is often termed a validity coefficient or, more specifically, a predictive or concurrent validity coefficient, respectively. Interpretation of such coefficients is essentially the same as that described for correlations in chapter 2. However, one factor in the interpretation of validity coefficients that was not addressed previously concerns how high a correlation has to be for one to consider it credible support of a measure's valid use for a particular purpose. The Standards for Educational and Psychological Testing (APA, AERA, & NCME, 1985) does not provide direct guidance on this question. However, several experts recommend that when a measure is going to be used to make decisions about an individual (rather than as a way to summarize a group's performance), a standard of .90 should be used.
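With paired scores in hand, computing a concurrent validity coefficient is simply a matter of correlating the target measure with the criterion. The scores below are fabricated for illustration; the .90 benchmark applies the individual-decision standard just mentioned.

```python
import numpy as np

# Hypothetical paired scores on a target measure and an established criterion,
# obtained from the same ten examinees at (roughly) the same time.
target = np.array([85, 92, 78, 95, 88, 70, 91, 83, 76, 94])
criterion = np.array([82, 95, 75, 97, 85, 72, 93, 80, 74, 96])

validity_coef = np.corrcoef(target, criterion)[0, 1]

# Apply the suggested .90 standard for individual decision making.
usable_for_individual_decisions = validity_coef >= 0.90
print(f"concurrent validity coefficient r = {validity_coef:.2f}")
print(f"meets .90 standard: {usable_for_individual_decisions}")
```

A real study would, of course, also report sample characteristics and the statistical significance of the coefficient, as the examples in Table 3.3 do.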
As an additional proviso, the correlation coefficient should also be found to be statistically significant (Anastasi, 1988).

Factors Affecting Validity
Anything that causes a measure to be sensitive to factors other than the targeted construct will diminish the measure's validity. For example, a bathroom scale that becomes sensitive to room temperature or humidity is likely to be less valid as an indicator of how much damage one has done after a series of holiday meals. In this section of the chapter, I consider factors affecting the validity of behavioral measures such as those used with children—first considering two factors over which the clinician has considerable direct control, then two factors over which the clinician's control is far less direct.

Selection of an Appropriate Measure
As mentioned at the beginning of this chapter, probably the biggest factor affecting the validity of decisions made using a particular measure is the suitability of the match between the specific testing purpose and the demonstrated qualities of the measure to be used. The majority of information described thus far relates to activities performed by the developer of a standardized measure. Still to be discussed is how test users make use of that information to do their rather large part in assuring the validity of their own test use. For the moment, it is sufficient that you be aware that your role is critical in assuring testing validity and that it begins with a thorough evaluation of information provided by the
test developer, test reviewers, and the clinical literature in light of your client's needs. Specific steps leading to such an evaluation are described in the next chapter.

Administration of the Measure
After successful selection of a measure, the clinician plays a critical role in assuring validity through its skilled and accurate administration. Unless a measure is administered in a manner consistent with the methods used in developing the measure's norms and testing its reliability and validity, any comparison of the resulting performance against either norms or performance standards becomes distorted, even nonsensical. Thus, for example, the directions supplied with a test may indicate that orally presented items are to be read aloud only once. In that case, the difficulty of that test will probably be lessened if the test user decides that it's only "fair" to the child to give a second chance to hear the information included in the item. In reality, however, it is decidedly "unfair" to the child if the test is being used to provide information about how that child's performance compares with a standard that was determined under different conditions. Skilled administration of standardized measures, however, goes well beyond the preservation of idealized conditions. It also facilitates a crucial but sometimes overlooked function of a testing situation—that is, the establishment of a trusting, potentially helpful relationship between the clinician and the child being tested. If test administration goes well, the child comes away from the experience with a sense that the test giver likes the child and is a rewarding person with whom to interact. If it does not, not only will the test data be compromised, but the child may develop expectations of the test giver that will be difficult to overcome. Indeed, some researchers (Maynard & Marlaire, 1999; Stillman, Snow, & Warren, 1999) who have examined the testing process in detail note that far too little attention is paid to the collaborative nature of testing, in which the examiner is not a passive conduit of items but an integral participant in the testing outcome.
Table 3.4 lists some suggestions gleaned from several years of clinical experience (my own and others') concerning how to facilitate testing.

Client Factors
Client factors are so central to the valid testing of children that they seem worth discussing under a separate heading. Of particular interest are motivation and what Salvia and Ysseldyke (1995) called enabling behaviors and knowledge. Motivation affects the performance of adults and children in often dramatic ways. Although the topic of motivation has been the impetus for extensive research in several disciplines, you can quickly appreciate the devastating impact of low motivation by looking back over your own experiences and remembering an occasion when a classroom quiz or test fell at a time when you were preoccupied by other things happening in your life, or perhaps a time when you "psyched yourself out," thereby seemingly necessitating the fulfillment of a prophecy of failure. For me, the experience that comes to mind is a midterm examination I took in college. I had found an unconscious but still breathing mockingbird on my way to the exam. Consequently, during the examination, I spent much more of my time wondering whether the bird would still
Table 3.4
Testing Recommendations: Things to Consider When Testing Children

1. Remember that children rarely have much sophistication in test-taking skills. They expect your relationship with them to be based on the same rules that apply to interactions in other situations. Therefore it is your responsibility to honor their expectations and find ways to achieve your goals within that context.

2. Children's efforts to achieve their best for you will be built on the expectation that you and they are out to please each other in the interaction. You want to be accepted by the child as a rewarding, appreciative adult who is generally fun to be with.

3. For older children, you need to strive for a balance in which you are in control as much as you need to be to have your questions answered and the child is in control as much as possible otherwise. For example, it is important that you maintain control over your test materials, are relatively firm when you make a request that is a necessary part of the testing process, and only offer choices where they are truly available (e.g., avoid asking questions such as the following if they are not true offers: "Do you want to look at some pictures with me now?").

4. Help children cooperate by informing them about the content, order, and time frames associated with various assessment tasks. Toward this end, consider doing the following: (a) Whenever possible, allow the child to make choices in ordering activities, and (b) devise a method to let children know how much more is required of them. For older children, you can use a list where each completed item is checked off or rewarded with a sticker. For younger or less sophisticated children, you can use tokens equaling the number of activities, which are removed from sight or moved to a different location as each activity is finished.
be alive when I finished it, and where I could get help for it if it were still alive, than I spent actively focused on the outcome of the examination. With predictable results. (Sadly, the bird fared no better than my exam grade.) Motivation is particularly critical for measures that are intended to elicit one's best effort. One variety of such measures comprises those in which clients are assumed to be doing their best under conditions stressing accuracy, speed of execution, or both. These are sometimes called maximal performance measures. Common examples of maximal performance measures in childhood language disorders include measures of language functions in which responses are timed, as well as a variety of speech production measures, including diadochokinetic rate. In a discussion of such measures used to study speech production, Kent, Kent, and Rosenbek (1987) cautioned that extreme care should be taken before concluding that a test taker is fully aware and motivated and therefore likely to produce a performance that can reasonably be compared with norms or behavioral standards. The need for caution is particularly great for younger children and for children with either Down syndrome or autism, but it should always be a concern for any child. Whereas the level of concern should be greatest for maximal performance testing, any testing of a child will be subject to reduced validity if the child is uninterested or overly anxious. Enabling behaviors and knowledge are defined by Salvia and Ysseldyke (1995) as "skills and facts that a person must rely on to demonstrate a target behavior or knowledge." If an assumed enabling behavior is absent or diminished, performance on the
measure may no longer be associated with the behavior under study; hence its validity is threatened dramatically. Enabling behaviors that are frequently assumed in children's language tests include adequate vision, hearing, motor skill, and understanding of the dialect in which the test is constructed. In fact, although I discussed it earlier as a separate category, positive motivation to participate in assessment is itself a frequently assumed enabling behavior.

Reliability
Reliability, or consistency in measurement, is invariably listed as a major factor affecting validity because it is a necessary condition for validity: a measure can be valid only if it is also reliable. Reliability does not assure validity, however. Figure 3.2 illustrates this relationship between reliability and validity using archery as an analogy. Target number 1 demonstrates the handiwork of an archer whose aim is both reliable and valid; number 2, an archer whose aim is reliable but not valid; and number 3, an archer whose aim is neither reliable nor valid. In behavioral measurement, the use of measures with degrees of reliability and validity similar to those shown in targets 2 and 3 will have similarly negative outcomes, although unfortunately the outcomes may not be as obvious and, therefore, will be harder to detect—and, possibly, to rectify.
Fig. 3.2. A graphic analogy illustrating the relationship between reliability and validity.
One point (no pun intended) made by Fig. 3.2 is that reliability limits how valid a measure can be; any loss of reliability represents a loss of validity. Thus, information about reliability can provide very important insight into the quality of a measure. To illustrate this problem in a livelier way, imagine the problems associated with an elastic and therefore unreliable ruler. Over repeated measurements of a single piece of wood with such a ruler, its user on each attempt might try desperately, even comically, to apply exactly the same outward pressure to the ruler—almost certainly in vain, with measurements of 5 inches one time, 6 inches the next, and so on. With such immediate feedback, the user would surely recognize the hopeless lack of validity in these measurements and would undoubtedly go looking for a better ruler. Unfortunately, when human behavior is being measured, even unreliability equivalent to that of an elastic ruler would not be so easily recognized. Thus, because of the importance of reliability, the next section of this chapter is devoted to a more detailed explanation of reliability—what it is and how it is studied.

Reliability

Reliability can be defined as the consistency of a measure across various conditions—such as conditions associated with changes in time, in the individual administering or scoring the measure, and even changes in the specific items it contains. If a measure is shown to be consistent in its results across these conditions, then its user can make inferences from performance under observed conditions to behaviors and skills shown in other, unobserved conditions. In short, acceptable reliability allows for generalization of findings obtained in the assessment situation to a broader array of real-life situations—those in which test users are really more interested.
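One way to picture consistency across conditions is to simulate two administrations of the same measure to the same group: each observed score is a stable characteristic plus fresh nonsystematic error. All numbers here are invented for illustration (a trait with a standard deviation of 15, error with a standard deviation of 5).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

true_ability = rng.normal(100, 15, size=n)  # stable characteristic (hypothetical)
error_sd = 5                                # nonsystematic measurement error

# Two administrations: same true scores, independent errors each time.
time1 = true_ability + rng.normal(0, error_sd, size=n)
time2 = true_ability + rng.normal(0, error_sd, size=n)

reliability_coef = np.corrcoef(time1, time2)[0, 1]
# Theoretical expectation: var(true) / (var(true) + var(error)) = 225/250 = .90
print(f"test-retest reliability coefficient = {reliability_coef:.2f}")
```

Shrinking the error term drives the coefficient toward 1.0; inflating it (the elastic ruler) drives the coefficient toward 0.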
When the reliability of a measure is examined during the course of its construction, that information is frequently represented using another type of correlation coefficient called a reliability coefficient. Alternatively, more sophisticated statistical methods have been developed to examine the reliability of measures on the basis of an influential perspective called Generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972), which attempts to examine several sources of inconsistency simultaneously. These methods, however, are relatively recent and only infrequently applied in speech and language measures (Cordes, 1994). Another way of thinking about reliability is in terms of how it affects an individual score. The most popular framework guiding this perspective on reliability is sometimes described as the "classical psychometric theory" or the "classical true-score theory." Although recent developments, including Generalizability theory, have eclipsed classical theory as the cutting edge of psychometrics (Fredericksen, Mislevy, & Bejar, 1993), classical theory nonetheless pervades many of the practical methods used by test developers and hence test users. Further, its continuing utility is praised even by those actively working along other lines (e.g., Mislevy, 1993). The most important assumption associated with classical true-score theory (Allen & Yen, 1979) is that an observed score (a score someone actually obtains) is the sum of the test taker's true score plus some nonsystematic error. Thus, the true score is an
idealization. It has alternatively been described as the score you would find if you had access to a crystal ball or as the mean score a test taker would achieve if tested infinitely. Notice that error and reliability are related in this perspective. Specifically, they are inversely related: The larger the reliability, the smaller the error. Besides its historical value, this perspective on reliability is useful because it foreshadows our ability to apply reliability information obtained on a group to possible error in the observed score of an individual test taker, such as our client. When the reliability of a measure is expressed in relation to individual scores, that information is represented using a measure known as the standard error of measurement (SEM). Its mention here is meant to whet your appetite for further information, which is provided later under the heading Internal Consistency.

Ways of Examining Reliability
Three types of reliability are of most frequent interest—test–retest reliability, internal consistency reliability, and interexaminer reliability. A fourth type of reliability, alternate-forms reliability, is relatively infrequently used. The methods used to demonstrate such reliability with a particular group of test takers depend to some extent on whether it will be interpreted using a criterion-referenced or norm-referenced approach. Whereas there is widespread agreement concerning the methods to be used to study the reliability of norm-referenced measures, debate continues concerning the best methods to be used with criterion-referenced measures and whether methods traditionally developed for norm-referenced measures can also be used with criterion-referenced measures (Gronlund, 1993; Nitko, 1983). I discuss reliability primarily from the traditional, or norm-referenced, perspective, but note those points at which methods recommended for criterion-referenced measures depart from that perspective.

Test–Retest Reliability
Test–retest reliability is studied in order to address concerns about a measure's consistency over time. It is particularly important where the characteristic being measured is thought to remain relatively constant for at least shorter periods of time (such as 2 weeks to a month). Sometimes a distinction is made between examinations of reliability over periods of time under 2 months and those of reliability over longer periods of time, which is then termed stability (e.g., Watson, 1983). However, more common is a tendency for the terms test–retest reliability and stability to be used interchangeably. For norm-referenced measures used with children with language impairments, test–retest reliability is typically studied by testing a group of children similar to those for whom the measure is intended on two occasions, usually no more than a month apart. A correlation coefficient, called a test–retest reliability coefficient, is calculated to describe the relationship between the two sets of scores and is interpreted in a manner identical to that used for previous correlation coefficients, with increasing correlation size showing a greater degree of relatedness between the two sets of scores. For measures used with children, the test–retest interval is particularly crucial because rapid developmental changes are likely to affect whatever characteristic is being measured if the test–retest interval is too large. Thus it is imperative that test developers report the size of the interval over which test–retest reliability is calculated. Only rarely will a measure be examined for test–retest reliability over an interval longer than a month. One limitation of test–retest reliability coefficients is their susceptibility to carryover effects, where the first testing affects the second. Depending on the nature of the carryover, the apparent reliability of a measure for use in a one-time testing situation (the most typical application) might be either inflated or deflated (Allen & Yen, 1979). For example, practice effects might make the test easier on the second testing, causing answers to change from the first to the second testing in a way that would result in a reliability coefficient that is smaller than it would be if carryover had not occurred. On the other hand, test takers may remember their answers from the first testing and simply repeat them on the second, resulting in a reliability coefficient that is larger than it would be if carryover had not occurred. Because of this, test developers will sometimes adopt methods other than the straightforward test–retest method, choosing to use alternate-forms retesting methods to supplement or sometimes even replace test–retest data. Many measures of considerable utility to speech-language pathologists working with children who have language impairments are not standardized tests for which reliability data are provided. Instead, they are informal measures devised for a limited purpose. For informal measures it is more common to discuss the concept of consistency under the heading of agreement. Thus, for example, it is possible to calculate test–retest agreement for an informal measure used by a single clinician. Figure 3.3 provides an example of an informal probe measure for which an agreement measure is calculated.
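Figure 3.3 itself cannot be reproduced here, but the kind of calculation it illustrates can be sketched. In the invented data below (an illustrative assumption, not the figure's actual data), two judges each score the same 20-word probe. Both arrive at the same overall percentage correct, yet their point-to-point agreement is only at about the level expected by chance, exactly the pitfall the chapter warns about:

```python
# Invented scores (1 = correct, 0 = incorrect) from two judges
# rating the same child's 20-word probe.
judge_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
judge_b = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1]

n = len(judge_a)
print(sum(judge_a) / n, sum(judge_b) / n)  # 0.6 0.6 -- identical percent correct

# Point-to-point agreement: proportion of items the judges scored identically
agreements = sum(a == b for a, b in zip(judge_a, judge_b))
p_observed = agreements / n
print(p_observed)  # 0.5 -- yet they agree on only half the individual words

# Cohen's kappa corrects observed agreement for agreement expected by chance
p_a = sum(judge_a) / n
p_b = sum(judge_b) / n
p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)  # 0.52 here
kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(kappa, 2))  # -0.04 -- agreement no better than chance
```

The identical totals (60% correct from each judge) would look reassuring on their own; only the item-by-item comparison, and especially the chance-corrected kappa, exposes how little the judges actually agreed.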
Although this example uses two judges, analogous methods can be used to examine consistency for a single judge over time. In this example, the importance of agreement measures in giving you a sense of the consistency of measurement is highlighted when you notice that the two judges arrived at exactly the same percentage correct for the client. However, they did so while agreeing about which words were correctly produced at a percentage almost equal to that predicted if their judgments were due to chance (50%)! A particularly popular alternative to the simple procedure I described is the Kappa coefficient (Fleiss, 1981; Hsu & Hsu, 1996), which addresses this problem of chance agreement. McReynolds and Kearns (1983) are an especially helpful resource for those interested in a more thorough description of agreement measures. Yet another detailed discussion of the meaning and relative merit of such measures can be found in Cordes (1994).

Internal Consistency
Fig. 3.3. An example showing how to calculate a point-to-point measure of agreement.

Internal consistency is studied in order to address concerns about a measure's consistency of content. It is primarily of interest in cases where a test or subtest has items that are assumed to function similarly. Obtaining information about internal consistency for norm-referenced measures presents few practical difficulties: The same information used to provide norms is used to study internal consistency. Thus, information about internal consistency is often provided, even if little else is. The most basic method for examining internal consistency involves the calculation of a split-half reliability coefficient, where performances of a group of test takers like those for whom the measure is designed are compared for two halves of the measure. Although the measure may be split in half using a variety of strategies, most often even items are compared with odd items through the calculation of a correlation coefficient. Higher correlations are taken as evidence of internal consistency. A major problem with the split-half method is that because you compare only one half of the test items with the other half, the amount of data used in the correlation coefficient is half what it should be. This has the effect of making the correlation coefficient smaller than it would otherwise be. Alternative methods have been developed to cope with this limitation. The two most important alternative measures of internal consistency encountered in tests for children are the Kuder–Richardson formula (KR20) and Coefficient alpha (α). KR20 (which, in case you're curious about the name, was the twentieth formula used by Kuder and Richardson in a famous 1937 article—Kuder & Richardson, 1937) is used only for dichotomously scored items (e.g., those scored as 1 = right and 0 = wrong only). It cannot be used for items that are not scored dichotomously (e.g., those using a rating system from 1 to 4). This limitation led to the development of α. Coefficient alpha is a more general formula than KR20 and can handle both dichotomously and nondichotomously scored measures. KR20 and α are thought to be more sensitive than split-half methods to homogeneity of item content, meaning the extent to which items are aimed at the same specific construct. Thus they are sometimes described as measures of test homogeneity.
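A small sketch makes these formulas concrete. The data are invented (six dichotomous items by five test takers). The split-half coefficient is stepped up with the Spearman–Brown correction, the standard remedy, not named in the chapter, for the halving problem just described; coefficient alpha is computed from the same data, and for dichotomous items such as these, alpha is equivalent to KR20:

```python
import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

# Invented data: rows are items, columns are five test takers (1 = pass)
items = [
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
]

# Split-half: correlate odd-item and even-item half-scores, then apply the
# Spearman-Brown correction to estimate reliability at full test length
odd_totals = [sum(col) for col in zip(*items[0::2])]
even_totals = [sum(col) for col in zip(*items[1::2])]
r_half = pearson_r(odd_totals, even_totals)
split_half = 2 * r_half / (1 + r_half)

# Coefficient alpha (equals KR20 for dichotomous items like these)
k = len(items)
item_vars = sum(statistics.pvariance(item) for item in items)
total_var = statistics.pvariance([sum(col) for col in zip(*items)])
alpha = k / (k - 1) * (1 - item_vars / total_var)

print(round(split_half, 2), round(alpha, 2))  # 0.83 0.89
```

Note that without the Spearman–Brown step the split-half correlation here is only about .71, illustrating the chapter's point that an uncorrected half-length comparison understates the full test's reliability.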
Near the beginning of this section on reliability I introduced the idea that reliability can be considered in terms of its impact on a given score using a statistic called the SEM. The SEM is discussed in greater detail at this point because it is usually based on a measure of internal consistency (possibly because of the easy availability of this type of reliability data, rather than for theoretical reasons). The formula for the SEM is relatively easy to understand and use. It is calculated by multiplying the standard deviation of the test by the square root of 1 minus the reliability coefficient. It represents the degree of error affecting an individual score. Recall that as reliability increases, the size of the SEM decreases: The more reliable a measure, the smaller the error affecting individual scores and the more precise the measurement. Thus, one can use the SEM directly as a means of determining which of two competing measures is more precise. For example, for a 4-year-old child you may want to compare the SEMs for two tests designed to address receptive vocabulary skills using very similar tasks. Although there are additional grounds on which you may want to compare the two tests, precision would be one important feature to consider in making a choice between them. Searching in their test manuals, you find that the SEM for the first test is 7 (for which the mean standard score is 100, SD = 15) and for the second test is 4 (for which the mean standard score is also 100, SD = 15). Thus, the second is the more precise of the two measures. (Although it is possible to make essentially the same comparison using the reliability coefficients for these measures, phrasing that comparison in terms of the SEM allows you to see much more vividly the impact on an individual score.)
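The two-test comparison above can be run in reverse: given the published SD and a reliability coefficient, the SEM follows directly from the formula just stated. The reliability values below are assumptions chosen to reproduce SEMs near 7 and 4; the tests, and the observed score of 92, are hypothetical:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD times the square root of (1 - r)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical tests, both with mean standard score 100 and SD 15.
# Assumed reliabilities of .78 and .93 yield SEMs of about 7 and 4.
print(round(sem(15, 0.78), 1))  # 7.0
print(round(sem(15, 0.93), 1))  # 4.0

# Looking ahead to the confidence-interval use of the SEM: a 95% interval
# around an observed score of 92 on the second test spans roughly
# plus or minus 1.96 SEMs
low = 92 - 1.96 * sem(15, 0.93)
high = 92 + 1.96 * sem(15, 0.93)
print(round(low), round(high))  # 84 100
```

The narrower interval produced by the smaller SEM is exactly what "more precise" means in practice: the same observed score pins down the child's likely true score within a tighter band.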
The SEM can also be used, along with information about the normal curve, to obtain a confidence interval around an obtained score—a concept I discuss more fully in chapter 9 as part of a larger discussion of test scores and identification decisions.

Interexaminer Reliability
Interexaminer reliability is studied in order to address concerns about a measure's consistency across examiners. Essentially, this form of reliability study addresses the question, Are different examiners likely to affect performance on the measure? Depending on the specific focus, it can be called by a variety of names: interscorer reliability, interobserver reliability, and interjudge reliability, among others. The nature of the study depends on which aspects of the sequence of activities involved in administering, scoring, and interpreting the measure are expected to be most vulnerable to inconsistency. For example, if a measure involves a sophisticated perceptual judgment on the part of the examiner (such as the application of a 5-point rating scale), that aspect of the test's use would be the primary focus of a reliability study. Alternatively, if the calculation of a measure's total score depended on the calculation and correct recording of numerous sums, then that aspect of the test's use would be a more important focus of study. Where possible during reliability studies, two testers are asked to perform the same function (e.g., scoring), either from tape (audio or video) or live, for a single group of test takers. Then the resulting scores are examined using a reliability coefficient. When the actual administration of items seems to provide a source of error, the same
group of test takers may be tested by two testers. The results will be less clear-cut in that case, however, because differences in the two testing times could be due to differences either in testers or in testing times (test–retest reliability). For informal measures, consistency across users of the measure is more commonly discussed in terms of agreement. For example, it is possible to calculate agreement for two examiners using a behavioral probe to examine performance within a specific treatment task. The methods are identical to those described in Fig. 3.3.

Alternate-Forms Reliability
Alternate-forms reliability is studied to address concerns about consistency across varying forms of the test. Multiple versions of a test, termed alternate or parallel forms, tend to be created when a test will probably be used on more than one occasion with an individual, thus making repeat testing subject to possible carryover effects. Alternate forms are created by selecting items for each form from a common pool of possible items. Alternate-forms reliability is studied by administering one version, then another (counterbalanced so that half of the test takers will take one version first and the other half will take the other version first), and then calculating a correlation coefficient for the resulting two sets of scores. Often the interval between testings is very short, and the correlation coefficient is thought to reflect only differences in the form used. If the interval is longer, however, the resulting correlation coefficient can be expected to reflect not only differences in content between the two forms, but also changes due to time. Therefore, information about the interval between testings should be reported as part of the test developer's description of the study. Alternate, or parallel, forms are rarely provided for tests used with children who have developmental language problems. They are typically reserved for tests that are used with greater frequency, such as some educational and intelligence tests. Nonetheless, there are a small number of tests (e.g., the PPVT-III; Dunn & Dunn, 1997) that do provide this information, which is why it is considered here.

Factors Affecting Reliability
Any factor that increases the likelihood that nonsystematic error will enter into the testing situation will, by definition, decrease a measure's measured reliability. Consequently, any lack of similarity between testing conditions during a study of test–retest reliability or interexaminer reliability, for example, is likely to result in lower reliability coefficients. In addition, a couple of factors that may not be so obvious can distort the magnitude of reliability coefficients. These are discussed further in a variety of sources, including Nitko (1983) and Gronlund (1993). First, the length of the measure used will affect the size of the reliability coefficient. In general, the longer a measure, the greater its reliability. This factor presents a significant challenge to those wishing to develop tests for test takers with shorter attention spans (e.g., children!). Second, the specific group on which reliability is studied may affect the size of the obtained reliability coefficient. One reason for this is a phenomenon known as restriction of range. What that means is that when there is little variability in performance in a distribution of scores (the restricted range), the size of the correlation coefficient will be smaller than if the same pattern of variation were extended over a larger range of scores. Another reason specific groups may affect the size of reliability coefficients is that characteristics of one group may make it susceptible to error that does not affect a different group. Take the performance on an IQ test of one group with and one group without an identified learning disability. The ability of those two groups to perform consistently under the same conditions may not be the same, leading to differing results if reliability coefficients were to be calculated for each group. The danger would be, however, that rather than looking for evidence for each group separately, one would consider evidence about the group without an identified learning disability as sufficient for both groups. Here, as has been stressed before, the adequacy of evidence concerning reliability (and validity) needs to be considered in light of the specific circumstances (who is being measured and for what purpose) motivating the clinician's search for an appropriate measure. In the next chapter, procedures are presented that are designed to help you learn how to evaluate individual measures within a client-oriented framework.

Summary

1. Although behavioral measurement has relatively ancient roots, clinical and educational testing began only at the end of the 19th century.
2. The most influential standards developed for educational and psychological testing have been those of APA, AERA, and NCME (1985). These standards apply to all behavioral measures, but apply most strictly to standardized tests.
3. The test user is responsible for assuring that a specific measure is likely to provide the information being sought (i.e., that the measure is a valid measure for the purpose to which it will be put).
4. Because all evidence of validity depends on demonstrations that the measure captures the theoretical construct it was intended to assess, construct validity can be seen as the overarching framework of validation. As a result of historical factors, however, three types of evidence are typically discussed: construct validity, content validity, and criterion-related validity.
5. Four specific methods of construct validation include the developmental method, the contrasting groups method, factor analytic studies, and studies of convergent and discriminant validity.
6. Content validation activities occur as part of the development process (e.g., documentation of the test plan, item analyses) and, following development, as part of validation activities.
7. Standardized measures designed for criterion-referenced interpretation and for norm-referenced interpretation are developed using similar steps, but differ in the methods used to make decisions at each step.
8. Face validity, or public relations validity as it is sometimes called, involves a measure's appearance of validity rather than the degree of validity it will be shown to have on closer, systematic scrutiny.
9. Criterion-related validity involves the collection of evidence suggesting that the target measure performs in a manner similar to that of an already validated criterion measure. Concurrent validity refers to criterion-related validation studies in which the criterion and the target measure are administered to the participant at the same point in time, whereas predictive validity refers to studies in which the target measure is obtained first and the criterion at a later time.
10. Validity is affected by appropriate measure selection, test administration conditions, reliability, and client factors such as motivation and enabling behaviors.
11. Reliability (consistency of measurement) places an upper limit on possible validity, but even perfect reliability does not ensure validity. Reliability is therefore said to be a necessary, but not sufficient, condition for validity.
12. Studies of reliability usually target consistency across testing occasions (test–retest reliability), across subsets of test items (internal consistency reliability), and across testers (interexaminer reliability). For speech-language pathology and audiology measures, consistency across test versions (alternate-forms reliability) is much less frequently examined.
13. When reliability information is reported for a particular measure, reliability correlation coefficients are used most frequently. When such information is reported in relation to a specific score obtained by an individual, the SEM is used.
14. When information about consistency is sought for informal measures, agreement measures are usually calculated. The most common measures of agreement are test–retest and interexaminer agreement.
15.
Classical true-score theory holds that the score actually received by an individual (the observed score) is composed of error and the theoretical score the individual "should" receive (the true score).
16. The SEM can be used to construct a confidence interval within which there is a high probability of finding the individual's true score.
17. Reliability is affected by test length (with fewer items resulting in lower reliability) and by the range of abilities represented in the reliability sample (with a smaller range of abilities resulting in lower reliability).

Key Concepts and Terms

construct validation: the accumulation of evidence showing that a measure relates in predicted ways to the construct it is being used to measure.

content validation: the accumulation of evidence suggesting that the content included in a measure is relevant and representative of the range of behaviors fitting within the construct being measured.
contrasting groups method of construct validation: the accumulation of evidence suggesting that groups known to differ in the extent to which the tested construct applies to them also differ in their performance on the target measure.

convergent and discriminant validation: the accumulation of evidence suggesting that a measure correlates significantly and highly with measures aimed at the same construct (convergent validation) as well as evidence that the measure does not correlate significantly and highly with measures targeting different constructs (discriminant validation).

criterion-related validation: the accumulation of evidence suggesting that the measure performs in a manner similar to another measure (the criterion) that is believed to be a valid indicator of the construct under study, either where both criterion and measure are administered at one point in time (concurrent validation) or with the criterion measured at a later point in time than the target measure (predictive validation).

developmental method of construct validation: the accumulation of evidence suggesting that performance on a measure changes with age (usually improves), when the measure is meant to target a construct that is thought to change with age.

enabling behaviors and knowledge: behaviors not related to the construct under study that are nonetheless required for successful test performance (e.g., vision for tasks using visual stimuli, previous exposure to vocabulary being used).

face validity: the appearance of validity of a measure, which is not necessarily reflective of its actual validity.

factor analysis: a number of statistical procedures used during test development and construct validation to describe and confirm the relationships of a number of variables.
informal measure: a measure developed for a limited measurement purpose for which a standardized measure was inappropriate or unavailable; for example, probes designed by speech-language pathologists to assess learning within a treatment session are usually informal measures.

interexaminer agreement: the extent to which results of a measure agree when it is administered, scored, or interpreted by two or more examiners.

interexaminer reliability: the consistency of a measure across two or more examiners; also termed interjudge reliability, interobserver reliability, and intertester reliability.

internal consistency: the consistency of a measure across subdivisions of its content, usually measured using split-half reliability, KR20, or coefficient α.

item analysis: a variety of procedures applied to the pool of items being considered for inclusion in a measure that examine each item's possible contributions to the overall measure.

observed score: the score actually achieved by a given test taker; usually contrasted with the true score in classical true-score theory.
reliability: consistency of a measure across changes in time (test–retest), in the individual administering or scoring it (interexaminer), and in the specific items it contains (internal consistency).

standard error of measurement (SEM): a measure of reliability that is expressed in terms of the original units of measurement (e.g., number of items).

test–retest reliability: the consistency of a measure that is administered at two points in time.

test: a behavioral measure in which a structured sample of behavior is obtained under conditions in which the "tested" individual is assumed to perform at his or her best (APA, AERA, & NCME, 1985).

true score: a theoretical value that hypothetically would be obtained by a test taker if the measure being used were perfectly reliable, that is, were unaffected by error.

validity: the extent to which a measure actually measures what it claims to measure.

Study Questions and Questions to Expand Your Thinking

1. Define validity.
2. Choose a specific test dealing with child language. Compare sources of information about its content: (a) content implied by the title, (b) its apparent content on the basis of the author's overview statements concerning the intended purpose of the test and its intended content, and (c) individual items. How might a naive test user be misled if he or she only considers the title?
3. Describe the three major kinds of validity evidence and their relationships with one another.
4. Translate the following sentence into a form that someone unfamiliar with testing would be able to understand: Reliability is a necessary but not sufficient condition for validity.
5. List the steps required in the development of a standardized measure. Compare and contrast these steps as they apply to criterion- versus norm-referenced measures.
6. Imagine that you've set up a task with 20 items that you believe may be difficult for you to rate consistently as correct or incorrect.
What procedure would you use to obtain a measure of your consistency in rating these items?
7. List three factors known to affect validity.
8. List three factors known to affect reliability.
9. Explain how the amount of variability in test scores affects the magnitude of correlation coefficients. What implications does this effect have for test developers?
10. What is meant by the convergent–discriminant approach to construct validity?
11. How does reliability relate to classical true-score theory?
12. Why is internal consistency associated with three measures: split-half reliability, KR20, and coefficient α?
13. List five enabling behaviors required for the performance of a picture vocabulary test in which the test taker is required to listen to the name of an action and pick out a picture (from a group of 4) that corresponds to the action.
14. Reflect on situations in which a teacher, coach, or parent has helped you do something that you found particularly difficult. What did they do that helped you feel motivated to try that difficult something? How might you apply the same approach to the testing of a reluctant child?

Recommended Readings

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Gronlund, N. E. (1993). How to make achievement tests and assessments (5th ed.). Boston: Allyn & Bacon.
Lyman, H. B. (1963). Test scores and what they mean. Englewood Cliffs, NJ: Prentice-Hall.
McReynolds, L., & Kearns, K. (1983). Single-subject experimental designs in communicative disorders. Baltimore: University Park Press.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education and Macmillan.
Sattler, J. (1988). Assessment of children (3rd ed.). San Diego, CA: Author.

References

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Washington, DC: Author.
American Psychological Association, American Educational Research Association, and National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Anastasi, A. (1954).
Psychological testing. New York: Macmillan.
Anastasi, A. (1980). Anne Anastasi. In G. Lindzey (Ed.), A history of psychology in autobiography (pp. 1–37). San Francisco: W. H. Freeman.
Anastasi, A. (1988). Psychological testing (6th ed.). Upper Saddle River, NJ: Prentice-Hall.
Berk, R. A. (1984). Screening and identification of learning disabilities. Springfield, IL: C. C. Thomas.
Bzoch, K. R., & League, R. (1991). Receptive–Expressive Emergent Language Test (REEL-2). Austin, TX: Pro-Ed.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix. Psychological Bulletin, 56, 81–105.
Carver, R. P. (1974). Two dimensions of tests: Edumetric and psychometric. American Psychologist, 29, 512–518.
Cordes, A. K. (1994). The reliability of observational data: I. Theories and methods for speech-language pathology. Journal of Speech and Hearing Research, 37, 264–278.
Cronbach, L. J., Gleser, G. D., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test–III. Circle Pines, MN: American Guidance Service.
Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley.
Fredericksen, N., Mislevy, R. J., & Bejar, I. I. (Eds.). (1993). Test theory for a new generation of tests. Hillsdale, NJ: Lawrence Erlbaum Associates.
German, D. J. (1986). Test of Word Finding. Allen, TX: DLM Teaching Resources.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519–521.
Glaser, R., & Klaus, D. J. (1962). Proficiency measurement: Assessing human performance. In R. Gagne (Ed.), Psychological principles in systems development (pp. 419–476). New York: Holt, Rinehart & Winston.
Gronlund, N. (1993). How to make achievement tests and assessments (5th ed.). Boston: Allyn & Bacon.
Hammill, D. D., & Newcomer, P. L. (1997). Test of Language Development Intermediate–3. Circle Pines, MN: American Guidance Service.
Hresko, W., Reid, D., & Hammill, D. D. (1981). Test of Early Language Development. Los Angeles: Western Psychological Associates.
Hsu, J. R., & Hsu, L. M. (1996). Issues in design research and evaluating data pertaining to children's syntactic knowledge. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children's syntax (pp. 303–341). Cambridge, MA: MIT Press.
Kent, R. D., Kent, J. F., & Rosenbek, J. C. (1987). Maximum performance tests of speech production. Journal of Speech and Hearing Disorders, 52, 367–387.
Kuder, G. F., & Richardson, M. W. (1937). The theory of estimation of test reliability. Psychometrika, 2, 151–160.
Maynard, D. W., & Marlaire, C. L. (1999). Good reasons for bad testing performance: The interactional substrate of educational testing. In D. Kovarsky, J. Duchan, & M. Maxwell (Eds.), Constructing (in)competence (pp. 171–196). Mahwah, NJ: Lawrence Erlbaum Associates.
McReynolds, L. V., & Kearns, K. P. (1983). Single-subject experimental designs in communicative disorders. Baltimore: University Park Press.
Messick, S. (1989). Validity. In R. L.
Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education and Macmillan. Mislevy, R. J. (1993). Foundations of a new test theory. In N. Fredericksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 19– 40). Hillsdale, NJ: Lawrence Erlbaum Associates. National Education Association. (1955). Technical recommendations for achievement tests. Washington, DC: Author. Nitko, A. J. (1983). Educational tests and measurement: An introduction. New York: Harcourt Brace Jovanovich. Pedhazur, R. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates. Salvia, J., & Ysseldyke, J. E. (1995). Assessment (6th ed.). Boston: Houghton Mifflin. Semel, E., Wiig, E. H., & Secord, W. (1987). Clinical Evaluation of Language FundamentalsRevised. San Antonio: The Psychological Corporation. Stillman, R., Snow, R., & Warren, K. (1999). ‘‘I used to be good with kids.” Encounters between speechlanguage pathology students and children with Pervasive Developmental Disorders (PDD). In D. Kovarsky, J. Duchan, & M. Maxwell (Eds.), Constructing (in)competence (pp. 29–48). Mahwah, NJ: Lawrence Erlbaum Associates. Stevens, G., & Gardner, S. (1982). The women of psychology. Cambridge, MA: Schenkman. Torgesen, J. K., & Bryant, B. R. (1994). Test of Phonological Awareness (TOPA). Austin, TX: ProEd. Van Riper, C., & Erickson, R. (1969). A predictive screening test of articulation. Journal of Speech and Hearing Disorders, 34, 214–219. Zimmerman, I. L., Steiner, V. G., & Pond, R. E. (1992). Preschool Language Scale–3. San Antonio, TX: Psychological Corporation.
CHAPTER 4

Evaluating Measures of Children's Communication and Related Skills

Contextual Considerations in Assessment: The Bigger Picture in Which Assessments Take Place
Evaluating Individual Measures

In the last chapter, you were introduced to the most important test-related considerations for evaluating individual measures: validity and reliability. In this chapter, you will learn about factors to consider in evaluating measures and about how to perform such an evaluation—a process that makes the most sense when the focus is shifted away from the test itself and toward the reason for its use: the child in question and the larger world in which he or she moves. Speech-language pathologists use measurement information to achieve goals affecting children's health, development, family life, education, and social well-being. They obtain this information cooperatively (working primarily with families and other professionals) and share it with others as a means of achieving the child's greatest good. This cooperative pursuit on behalf of the child is not simply a practical matter, although it certainly affects the logistics of measurement in very practical ways. Rather, a rich understanding of the way in which children's interactions with the world are mediated by their family and culture is critical to framing questions that will result in valid responses to the child's needs. Also needed, however, is an appreciation that the clinician brings his or her own history, culture, and workplace constraints to the question-asking situation—
all of which will also bear on which questions are asked and how they are answered. In the first half of this chapter, I discuss the larger context of measurement, focusing first on factors affecting the child and then on factors that more directly impinge on the clinician. Figure 4.1 presents a visual model for thinking about this larger context.

Contextual Considerations in Assessment: The Bigger Picture in Which Assessments Take Place

In 1974, Urie Bronfenbrenner offered an evaluation of the developmental research of that era that can still chill the hearts of researchers and clinicians who study children in highly structured contexts. Specifically, he described that research as "the study of the strange behavior of children in strange situations for the briefest possible period of time" (Bronfenbrenner, 1974). This quotation brings into sharp focus a deep concern that researchers were failing to capture the essential factors affecting the child by failing to study children and their most influential companions (usually parents) in the natural situations in which development occurs. Shifting species for a second, one could say that Bronfenbrenner essentially pointed out that drawing conclusions about children in real life from existing research paradigms was akin to concluding that one knew about lions in the wild by observing lions moving around the artificial rocks, caves, and ponds of their enclosure in a zoo. Anyone who has seen a wild-eyed, noncompliant, and virtually nonverbal child leave a clinic room to begin a fast-paced, detailed litany of his ordeals can understand Bronfenbrenner's point—as well as the relevance of the lion analogy. A vast research literature was spawned by Bronfenbrenner's criticism and by the program of research he and others undertook to understand development through observations of children and their caretakers in real-life settings.
The resulting literature is associated with an evolving theory of development (Bronfenbrenner, 1986; Bronfenbrenner & Morris, 1998) that can provide us with a valuable starting point for thinking about the larger context of assessment. A recent articulation of this model (Bronfenbrenner & Morris, 1998) was described by its authors as a "bioecological model" of development because it emphasizes both the child's characteristics and the context in which development occurs as contributors to the process of development. Among the most obvious modifications represented in this version of the model is the placing of greater emphasis both on biological factors affecting the child and those around him or her and on the child's role in affecting his or her environment as well as being affected by it. The enduring central component of the model, however, and the component that was most needed and championed in speech-language pathology, is its celebration of the importance of the child's environment—especially the social environment—to developmental processes (Crais, 1995; Muma, 1998). In the following pages I briefly discuss how current thoughts on the contexts of family, language, culture, and society as a whole continue to shape and reshape views of valid language evaluation, and how aspects of the clinician's context also affect the evaluation of children's language.
Fig. 4.1. A model of factors affecting the child and the clinician in the assessment process. From Assessing and Screening Preschoolers: Psychological and Educational Dimensions (p. 6), by Vasquez-Nuttall, Romero, and Kalesnik, 1999, Boston: Allyn & Bacon. Copyright 1999 by Allyn & Bacon. Adapted by permission.
Familial Contexts
Why should families be seen as the central forum for language development and, thus, language assessment? From the time the child is born, the family constitutes the most basic and enduring of the contexts in which children spend their time and their energies. Further, the foundation of communication and language is established in the give and take of early feeding and proceeds onward through all attained levels of linguistic achievement. Although these truths have probably always been recognized by professionals at some level, they tended to be overlooked in measurement practices until the diffusion of theories such as Bronfenbrenner's led to political action. Specifically, the Education of the Handicapped Act Amendments of 1986 required Individual Educational Plans (IEPs) for children ages 3 to 5 and Individualized Family Service Plans (IFSPs) for children younger than 3. Through these requirements, the law embodied the perspective that, because of the intertwined and interdependent nature of child and family needs, effective evaluation and intervention for children require inclusion of the family as collaborators—that is, as active agents in the life and affairs of the child rather than as passive recipients of professional activities. Particularly for children younger than 3, this perspective was seen as crucial; hence the requirement of the IFSP for that age group. Within the IFSP provisions, assessments include information about family strengths, needs, and variables related to program services, as well as about the child's current level of functioning (Radziewicz, 1995). Radziewicz noted that effective family assessment is conducted in a manner that is positive for the family, respectful of the family's values, inclusive of key family members, nonintrusive, and aimed at targeting family needs and resources.
New types of tools have been developed to address clinical questions concerning the nature of the family and of parent–child interactions; Radziewicz (1995) and Crais (1992, 1995) provided excellent discussions of these. In addition to serving as a focus of professionals' attention, however, parents and families have become more actively involved in a variety of "clinical" activities, including screening, providing descriptions and other data, validating evaluation findings, and even administering some tests. Although these activities are described in later chapters, they are mentioned here to help you become aware that your consideration of an instrument's validity will often include thinking about the suitability of its use with and by parents. Not surprisingly, this need is greatest for infants and younger children and for children who are more affected by their difficulties.

Cultural-Linguistic Contexts for Assessment
Just as the child is embedded within his or her family, so too is the family embedded within a specific culture and linguistic context. Thus, effective interaction with families depends not simply on the clinician’s choosing to include them in the process, but also on her or his knowledge of each family’s cultural and linguistic expectations. The variety of cultural and linguistic differences affecting a clinician’s interaction with
parents is quite awesome. Among just a few of the differences discussed in a growing literature (Damico, Smith, & Augustine, 1996; Donahue-Kilburg, 1992; van Kleeck, 1994) are the following:

1. differences in child-rearing practices (e.g., the appropriateness of asking children to engage in question asking or to recite information already known by listeners);
2. differences in patterns of decision making within families (e.g., which figures are seen as primary decision makers);
3. differences in family choices concerning language and dialect use (e.g., whether children are expected to use the language of the home); and
4. differences in how difficulties in communication are viewed (e.g., how they are seen as affecting the family and child).

Differences such as these can affect the nature and extent of communication between clinicians and parents, how parents are included in their child's care, the nature of intervention, and—most importantly for the purposes of this book—how language evaluations are planned, executed, and acted on. Also, because evaluations are prompted by heightened parental concern or can act to promote parents' focus on their child, evaluations that successfully involve parents can also enlist parents' continuing engagement in ways that are critical to the child's success. Table 4.1 summarizes reported trends in the attitudes of Asian Americans, African Americans, and Hispanic Americans toward children, family, and child rearing. Of course, these trends represent prejudices—that is, prejudgments—of a type: There is simply no substitute for finding out how a specific family functions and what its attitudes are, regardless of its culture. That each child is also a maturing user of ambient language(s) and dialect(s) will affect assessment dramatically.
Most obviously, clinicians are aware of this when they are asked to assess the communication skills of a child whose first language is not the same as their own and must decide whether and how they can be involved with the child. Clinicians are also aware of this when they serve children who differ from themselves in social or regional dialect. In both cases, the clinician must often determine whether differences from the mainstream, dominant, or school language are due to language disorder or to difficulties specific to second language or dialect acquisition (e.g., inadequate exposure, transference effects from the first language or dialect, motivational differences between first and second language acquisition; Damico et al., 1996). The presence of culturally and linguistically diverse clients was once seen as a matter of sporadic significance, considered more important in bigger cities with larger immigrant populations and in geographic regions with greater cultural, ethnic, and dialectal diversity. Now, however, it has been estimated that one in every three Americans is African American, Hispanic, Asian American, or American Indian (American Speech-Language-Hearing Association [ASHA], 1999). Although nationally and globally, diversity in language and culture is the rule rather than the exception, that fact is not represented in the demographics of the professions of
Table 4.1
Trends in Attitudes Toward Children, Family, and Child Rearing

Asian Americans
• Strict gender and age roles
• Father—the family leader, head of family
• Mother—the nurturer, caregiver
• Older males superior to younger males
• Females submissive to males
• Close, extended families
• Multigenerational families
• Older children strictly controlled, restricted, protected
• Physical punishment used
• Parents actively promote learning activities at home—may not participate in school functions
• Children are treasured
• Infant/toddler needs met immediately or anticipated
• Close physical contact between mother and child
• Touch rather than vocal/verbal is primary vehicle of early mother–infant interaction
• Harmony of society more important than individual
• Infant seen as independent and needing to develop dependence on family and society

African Americans
• Mothers and grandmothers may be greatest influences
• Strong extended family ties are encouraged
• Independence and assertiveness encouraged
• Infants may be focus of family attention
• Affectionate treatment of babies, but fear of "spoiling"
• Strong belief in discipline, often physical
• Caregiving of older toddler may be done by an older child

Hispanic Americans
• Strong identification with extended family
• Families tend to be patriarchal with males making most decisions
• Infants tend to be indulged; toddlers are expected to learn acceptable behavior
• Emphasis placed on cooperativeness and harmony in family and society
• Independence and ability to defend self encouraged
• Older siblings often participate in child care
Note. From Family-Centered Early Intervention for Communication Disorders: Prevention and Treatment (p. 21), by G. Donahue-Kilburg, 1992, Gaithersburg, MD: Aspen. Copyright 1992 by Aspen. Reprinted with permission.

speech-language pathology and audiology. Thus, clinicians are increasingly faced with the special challenge of enlarging their understanding of other cultures and linguistic communities and the skills required to implement that understanding in their work. The process of respecting diversity in children and in their families pervades all phases of clinical interaction. Because it is critical to valid screening, identification, description, and assessments of change, diversity arises as a continuing point of discussion throughout the remainder of this text. I highlight it here because of its particular relevance to the test review process discussed later in this chapter.
Societal and Legal Contexts
Just as the child whose language development is in doubt exists as a member of a larger community, so too does the speech-language pathologist who serves the child. He or she is also a participant in the larger social contexts of a given profession and workplace within a particular time and place—a given era within a given school district or institution, state, and country. Each of these contextual factors can affect decisions about assessment. A recent discussion of the roles and responsibilities of school speech-language pathologists, contained within an extensive ASHA document available on the association's website, emphasized this fact (ASHA, 1999). Table 4.2 includes just a small number of the many factors ASHA described as affecting clinical practice with children. In this brief section, two particularly compelling sources of effects on measurement practice are addressed: national legislation and changing global perspectives on disablement.

National Legislation
As mentioned briefly in terms of regulations regarding family involvement, legal influences on how children are evaluated for language problems represent some of the most powerful influences on clinical practice. In particular, federal legislation establishing the ways in which public schools address the needs of children has had profound effects on how children's problems are screened, identified, and addressed (ASHA, 1999; Demers & Fiorello, 1999). Thus, as described earlier, it was through the Education of the Handicapped Act Amendments of 1986 that ideas about the need for greater attention to families became a potent factor in shaping actual practice. In this section, I point out the even broader effects that have resulted from a number of other legislative initiatives, paying particular attention to the Individuals with Disabilities Education Act (IDEA), which was passed in 1990. The IDEA built on and modified earlier legislation, including two landmark federal laws: the Education for All Handicapped Children Act of 1975 (P.L. 94-142), which established many now-standard features of educational attention to children with special needs, and the Education of the Handicapped Act Amendments of 1986, which mandated services for those children from birth to age 21, in addition to its role in pressing for greater inclusion of families in educational evaluations. Since 1990, the IDEA has been amended (IDEA Amendments of 1997) and has had regulations developed for its implementation.

Table 4.2
A Brief List of Some of the Contextual Factors Affecting Speech-Language Pathology Practice Among School-Based Clinicians (ASHA, 1999)
• Specific federal legislative actions (e.g., the Individuals with Disabilities Education Act of 1990)
• State regulations and guidelines
• Local policies and procedures
• Staffing needs
• Caseload composition and severity
• Cutbacks in education budgets
• Personnel shortages
• Expanding roles
The IDEA and the 1997 amendments to it maintained numerous elements of the earlier legislation. Among the most important of these maintained features is a mandate for nondiscriminatory assessment. In such assessments, measures must be administered in the child's native language by trained personnel following the procedures outlined in the test manual. In addition, these more recent laws dictate that validity information for a test be specific to the purpose for which the test is used. Further, this legislation requires that evaluations of children be comprehensive, multifactored, and conducted by an interdisciplinary team. Although each of these components was viewed as best practice at the time of legislation, it is often legislation, and the potential for litigation when legislation is not followed, that gives rise to the actual implementation of professional and academic recommendations. However, it is important to recognize that legislation is not always in accord with best practices, as I discuss in later sections. New provisions of the IDEA, its amendments, and the more recent regulations implementing it include some changes in nomenclature, such as abandonment of the term handicapped in favor of the term disabled as the designation given to children covered by the law. In addition, these legal actions have added several new separate disability categories, with autism being the most relevant to discussions of language disorders. Other new elements consist of demands for increased accountability, with resulting increases in documentation requirements, and insistence that children's IEPs contain information connecting the child's disability to its impact on the general education curriculum (ASHA, 1999; Demers & Fiorello, 1999). Because of the legislation described above, speech-language pathologists who work with children in schools are involved in a broader range of responsibilities and potential roles (ASHA, 1999).
The children they evaluate are more diverse in age, language, and culture, and the collaborative nature of their work has increased dramatically. Also, clinicians are made more accountable for the validity of the instruments they use and the methods they follow in evaluating clients. To a great extent, the effects of national legislation are supportive of good measurement practices. At the same time, however, legislation introduces complexity for clinicians, who face increasing responsibilities, increasing demands for documentation, and the push to revise or develop strategies to deal with the specific ways in which individual states and school districts implement federal law. Some of the complications to clinical practice introduced by state Departments of Education are discussed as they relate to specific measurement questions in later chapters.

World Health Organization Definitions
At an international level, changes brought about by the World Health Organization (WHO) of the United Nations have affected assessment practices (WHO, 1980). As part of its charge to develop "a global common language in the field of health," WHO proposed guidelines reflecting changing views about health and departures from health that would affect a wide array of sectors, including health care, research, planning and policy formation, and education. Specifically, in 1980, WHO developed the International Classification of Impairments, Disabilities, and Handicaps (ICIDH), in which various types of outcomes associated with health conditions were considered.
The 1980 ICIDH classification recognized four levels of effects. These levels are summarized here with examples drawn from applications to language disorders. First, there is disease or disorder, the physical presence of a health condition, for which a language disorder can serve as the example. Next, there is impairment, an alteration of structure or function causing the individual with the condition to become aware of it. For children with language disorders, an example of a possible impairment would be inappropriate use of grammatical morphemes. The third level of effects is described as disability, an alteration in functional ability. For children with language disorders, the disability associated with their difficulties could be a decreased ability to communicate. The last level recognized in the ICIDH is that of handicap, which is a social outcome. Thus, negative attitudes on the part of playmates or teachers toward affected children constitute a possible handicap associated with language disorder. Although changes in these terms and the reasons for those changes are discussed in a moment, I first discuss two important implications of this classification system that have proven most significant. First, although these four types of effects tend to be related to one another (e.g., more severe disorders tend to be associated with greater handicaps), this is not always the case. For example, it is possible for a handicap to exist apart from the presence of a disease or disorder, as might be the case if societal prejudice against an individual occurred in the absence of actual impairment. A specific example might be a child excluded by a group of peers because of a cleft lip, an observable but functionally insignificant difference.
Similarly, it is possible for a more severe impairment to be associated with only a mild disability and minimal handicap because of successful compensatory strategies on the part of the individual, effective interventions on the part of professionals, or both. Imagine a child with a moderate hearing loss acquired after the initial stages of language acquisition are complete who has high overall intelligence, strong motivation, a supportive home environment, and effective auditory management. Such a child could be expected to experience lesser effects on communication effectiveness and on social roles than would be expected on the basis of the severity of hearing loss alone. This classification causes one to consider the role not only of the child, but also of his or her surroundings, in determining the nature of the negative effects experienced because of a disorder. A second major implication of the 1980 classification is that each of the four levels of effects is understood to be associated with different measurement goals for both research and clinical purposes. For example, measurement focused at the level of handicap requires information about how a child's social and educational roles are affected by his or her condition. This contrasts with measurement focused at the level of impairment, which requires information about the child's use of particular language structures. The greater attention paid to the larger ramifications of health conditions coincides with an urgent push in both clinical and educational settings for measuring and evaluating the effectiveness of interventions in terms of these higher order effects. Despite the widespread influence of the 1980 classification system, dissatisfaction existed with its terminology and with the ways in which the social contributions to the effects of health conditions were handled. Among specific criticisms was that the terminology was sometimes confusing and included potentially offensive terms
such as handicap (Frattali, 1998). The model underlying the classification was also criticized for failing to represent the influence of contextual factors. Because of concerns about the 1980 classification system, a draft revision was put forward in 1997 for comment and field testing, with approval of a final version expected in 2000 (WHO, 1998). The proposed classification system is called the ICIDH-2: International Classification of Impairments, Activities, and Participation (WHO, 1998), reflecting significant changes in theoretical orientation from the earlier classification of "Impairments, Disabilities, and Handicaps." The details of the final revision remain indefinite at the moment. Nonetheless, the current draft warrants discussion because of its value as an indicator of emerging trends and because it fits snugly with the view of children advanced up to this point in the chapter—that is, as deeply affecting and affected by their environment. As its most important change, the 1997 classification is designed to embrace a model in which human functioning and disablement result from an interaction of the individual's condition and his or her social and physical environment. In this system, therefore, the following definitions are used to describe levels of functioning (or, where decreased functioning is noted, disablement) in the context of a health condition:

1. "Impairment is a loss or abnormality of body structure or physiological or psychological function, e.g., loss of a limb, loss of vision" (WHO, 1998, p. 8). Notice that this level corresponds to the current ICIDH level of impairment and thus might refer to a child's abnormal or delayed language characteristics.

2. "An Activity is the nature and extent of functioning at the level of the person. Activities may be limited in nature, duration, and quality, e.g., taking care of oneself, maintaining a job" (WHO, 1998, p. 8).
Notice that this level replaces the current ICIDH level of disability and thus might refer to a child's reduced ability to communicate.

3. "Participation is the nature and extent of a person's involvement in life situations in relation to Impairment, Activities, Health Conditions and Contextual factors. Participation may be restricted in nature, duration and quality, e.g., participation in community activities, obtaining a driving license" (WHO, 1998, p. 8). This final level corresponds to the older level of handicap and thus might refer to negative social outcomes of a child's language problems.

On the basis of these new formulations, one can see continuities between the proposed and existing systems yet also notice a significant change in orientation—one that is both more positive in tone and more attentive to contextual influences. In the new classification system, a person's environmental (social and physical) and personal contexts are said to influence how disablement at each of these levels is experienced. In particular, two types of contextual factors are deemed most important: (a) environmental and physical factors (such as social attitudes, physical barriers posed by specific settings, climate, and public policy) and (b) personal factors (e.g., education, coping style, gender, age, and other health conditions; WHO, 1998, p. 8). From this overview, it is evident that the thrust of the ICIDH-2 will be support for many of the principles championed by Bronfenbrenner, by recent federal legislation,
and by advocates for an integrated view of validity, in which the effects of a decision made using a measure must be considered when one evaluates the measure's validity. Overall, a unifying principle is that decision making on behalf of children requires attention not simply to properties of the child but to the context in which the decision is being made and acted on. In the last half of this chapter, practical steps involved in the process of evaluating measures for possible use in decision making are described. Although I have rendered the larger context in which this process must take place in only the grossest detail, I hope that you can sense the sheer intricacy of the task at hand. On the one hand, confronting the very significant intellectual challenge entailed in the selection, use, and interpretation of appropriate measures makes me nearly turn tail and run. On the other hand, however, the rewards of successful clinical decision making and action would be less sweet if they were easily won.

Evaluating Individual Measures

Evaluating individual measures is like solving a mystery, where the mystery is how to view a measure for use with a particular client or group of clients. After a general plan is developed in the early stages of the review process, clues are collected and weighed. Most clues come from the clinician's knowledge of individual clients and their needs and from the manual for the particular measure. Additional sources of information, such as test reviews and pertinent research articles, can also help in the process. This chapter is arranged so that, following a brief overview of two modes of reviewing, you are introduced to the test manual and then to other sources of information to help you reach a final decision—to "crack the case," if we follow the detective analogy.

Client- Versus Population-Oriented Reviews of Measures
I have said that the validity of a measure depends on its ability to answer a particular clinical question for a particular child. Consequently, the appropriateness of a measure is determined within the realm of the particulars—ideally, with a firm appreciation of factors important to an individual child, such as coexisting handicapping conditions, language background, gender, and age—as one reviews the test manual and other sources of information for the measure. Such a review might be said to be a client-oriented review of the measure. Client-oriented review of measures is an ideal that is often unattainable. Given the pace of most clinical environments, clinicians are rarely able to review each potential measure thoroughly and compare it with competing measures immediately prior to each measurement they make. In fact, clinicians more commonly use what I would call a population-oriented evaluation. In a population-oriented review of a measure, the clinician reviews the measure's documentation in reference to a particular group or groups—usually those subgroups of children they serve most frequently. For example, a speech-language pathologist in a rural Vermont school would pay special attention to a test's likely value for a subgroup of children with few significant problems in other areas of development, who come from homes in which English is the only language spoken and socioeconomic status is middle to low. In contrast, a very different population-oriented assessment might be conducted by a speech-language pathologist in a Boston school district with a caseload consisting solely of children from French-speaking Haitian families living in poverty. Although evaluating a measure for these two populations would involve many of the same questions, each would require different answers reflecting sensitivity to the relevant population. Population-oriented reviews are most frequently conducted when a new measure is considered for purchase, when a measure is examined at a publisher's display at a convention, or when a speech-language pathologist enters a new position and inventories available measures. In contrast, client-oriented reviews of measures often arise when an uncommon clinical question emerges or when a child's particular pattern of problems (e.g., mental retardation and a severe hearing loss) makes the child's needs in a testing situation too unlike those for which the clinician has conducted a population-oriented evaluation.

How to Use Test Manuals
Regardless of the type of review you undertake, the outcome of your evaluation will never simply be a buy–don't-buy or use–don't-use decision. A thorough review provides potential users with an appreciation of the measure's limitations for answering specific clinical questions.

The test manual is the definitive source of information on a standardized measure. In fact, many of the recommendations made in the Standards for Educational and Psychological Testing (APA, AERA, & NCME, 1985) relate directly to material that should be provided in test manuals. Despite their importance, however, test manuals range widely in their sophistication and value. At their best, test manuals provide not only the basic information required to evaluate the measure's appropriateness for given uses with specific populations, but also insightful tutorial information that can reinforce and extend one's understanding of test construction and use. At their worst, test manuals appear to be little more than sales brochures designed to obscure a test's weaknesses and imply that it can be used for all clients and testing purposes. Even measures that are valuable additions to a clinician's arsenal may imply possible uses that really are not supportable. Consequently, a clinician's detective talents are called on to ferret out the truth!

The reviewing guide reproduced in Fig. 4.2 is a worksheet for evaluating behavioral measures. It is blank so that you can readily duplicate and use it. An annotated version of the guide, which appears as Fig. 4.3, summarizes the most important kinds of information—or "clues"—you will be looking for as you conduct a measure review. The annotated guide is designed to function like the ready reference cards available for many software applications. The reviewing guide and annotated guide are included to make reviewing a more efficient process, but their inclusion is not without hazards.
The danger of such worksheets and summaries is that some individuals may consider them all one needs to know to conduct a credible review. This is a big mistake! These guides are a first step that should always be accompanied by a willingness—even eagerness—to look back at trusted resources on measurement, especially the Standards for Educational and Psychological Testing (APA, AERA, & NCME, 1985). After all, even Sherlock Holmes depended on his learned friend Dr. Watson!

Fig. 4.2. Review form.

Fig. 4.3. Annotated review form.

Numerous authors writing about psychometric issues propose review procedures that are very similar to those described here (e.g., Anastasi, 1988, 1997; Hammer, 1992; Hutchinson, 1996; Salvia & Ysseldyke, 1998; Vetter, 1988b). Appendix 5 in Salvia and Ysseldyke (1998, pp. 763–766), "How to Review a Test," is a particularly informative and amusing description of the review process.

In the remainder of this section I lead you through the annotated guide, explaining why it is important to look for certain kinds of information. These sections are less sketchy versions of the brief summaries given in Fig. 4.3. Some of their content should sound quite familiar because it is based on the concepts discussed at length in chapter 3. This section ends with a review guide completed as part of a hypothetical client-oriented review (Fig. 4.4).

1. Reviewer

This information will probably be unnecessary for reviewers who function alone in their test selection and evaluation. On the other hand, it can be helpful in cases where multiple test users share reviewing responsibilities, at least for preliminary, population-oriented reviews. Use of a standard guide facilitates such sharing by reducing differences between reviewers and offering later reviewers a possible starting point for client-oriented reviews.

2. Identifying Information

Besides information that can help you locate or replace an instrument, this section provides preliminary clues to the scope and nature of the measure. Test names vary greatly in just how much they disclose about the nature of the test (e.g., whether it is comprehensive or aimed at only one modality or one domain of language), so they should be approached with caution.
Testing time, which users may want to break down into projected administration and scoring times, is of practical importance when scheduling testing. Information about basic characteristics of the measure, such as whether it is standardized versus informal or criterion-referenced versus norm-referenced, is used to determine the measure's suitability for certain clinical questions and guides expectations for other sections of the review guide. Although all major sections of the guide are relevant to all measures, the kinds and amounts of information provided vary depending on the measure's type. Manuals for standardized, norm-referenced measures probably provide the greatest amounts of information. On the other hand, more informal, criterion-referenced measures, which have often been created by an individual clinician for a specific purpose, have far less information available. (See Vetter, 1988a, however, for recommendations about the kind of information that should be kept for any procedure that might profitably be used on repeated occasions.)

3. Testing Purpose

Here, you summarize your knowledge of the intended client or population. Relevant information includes the client's age, other problems (e.g., visual, motor, or cognitive impairments), and important language characteristics (e.g., bilingual home, regional or social dialect use).
Fig. 4.4. Sample of a completed review form.
The main clinical questions leading to the search for an appropriate measure should also be recorded here: Is the measure going to be used for screening, identifying a problem or difference, treatment planning, or assessing change? Also, what language modalities and skill areas are of interest? As mentioned in chapter 1, each of these clinical questions requires different measurement solutions. Therefore, the reviewer should conduct all reviews with the assessment purpose vividly in mind. Chapters 9–12 address in considerable detail the demands associated with different clinical questions.

4. Test Content

This section returns the reviewer's attention to the test manual. Gaining a clear understanding of a test's content usually requires that you examine at least the early sections of the test manual and the test form itself. Homogeneous measures, in which all items are aimed at a single modality and language domain, are relatively easy to specify in terms of their content. For example, the Expressive One-Word Picture Vocabulary Test–Revised (Gardner, 1990) fits into this category; its content can be specified as expressive vocabulary or expressive semantics. Usually, however, measures address more than one content area, and these areas are indicated through the use of subtests or subscores.

For this section of the review guide, as well as for the sections that follow it, recording page numbers along with your findings is an excellent way to encourage checking against the manual during later use of the completed guide. As you record information about test content, you want to see how well the content areas covered by the measure match those of interest for your client. Even the nature of items (e.g., forced choice vs. open-ended responses) will be important in helping you determine whether the behaviors or abilities of interest will be the largest contributor to your client's performance.
Recall that one threat to validity introduced in chapter 3 was that of enabling behaviors—behaviors a test taker needs in order to take the test validly, even though they are not part of the targeted construct. For example, suppose that you were interested in assessing the receptive language skills of a child with cerebral palsy who fatigued easily when asked to show or act out responses. The motoric demands of measures become enabling behaviors that will negatively influence the child's performance even though they are independent of the targeted construct of receptive language.

In addition to providing a tangible reminder not to overlook possibly problematic enabling behaviors, this section of the review form should also stimulate clue gathering about what is actually being tested (Hammer, 1992; Sabers, 1996). Recall that as the test developer moves from an ideal formulation of the measure's underlying construct to the down-and-dirty task of writing sets of items, certain behaviors or skills necessarily tag along to yield a fleshed-out construct that may or may not match your own (or even the author's) intended formulations. As an example of how constructs can be modified as a test takes shape, imagine a test developer who decides to devise a measure of complex sentence use that employs methods placing a heavy demand on working memory capabilities. For example, the test developer could provide the test taker with a set of seven words, including the word because, that are to be used to create a single sentence. Although the final form in which the construct is realized may be acceptable to some test users, it may
not be to others, depending on their understanding of the targeted construct. It is primarily through careful attention to this step in the review process that you will become aware of correspondences—or disjunctions—between the test developer's view and your own view of what is being tested. Armed with this knowledge, you can make an informed decision as to whether the construct being measured is close enough to your reason for testing for you to consider using the measure.

5. Standardization Sample/Norms

At first glance, this section may seem to be primarily of interest when you are looking for a norm-referenced instrument, that is, one in which scores are interpreted primarily on the basis of how the test taker's performance compares in a quantitative way with that of a peer group. In fact, however, the nature of the standardization sample has important implications for all measures. It can determine the extent to which summary statistics (in the case of norm-referenced measures) or summary descriptions of behaviors (in the case of criterion-referenced measures) are likely to reflect characteristics of most children rather than those of a small, potentially nonrepresentative group (e.g., children of affluent, highly educated parents). Nonetheless, there are some differences in how the information provided in this section will be weighed depending on the nature of the instrument.

When a norm-referenced measure is being evaluated, you look for a clear description of the normative sample that was used: how many children were studied, whether and why any children were excluded, and how representative the sample is of the population your client (or subgroup of clients) fits into. Ideally, at least 50 children who are within a relatively small range in age from that of your client (usually no more than 6 months older or younger) will have been tested.
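The sample-size rule of thumb just stated can be expressed as a quick check. The sketch below is purely illustrative: the function name is hypothetical, and the ±6-month window and the minimum of 50 children simply encode the guideline above; manuals may report their normative subgroups in quite different formats.

```python
def norm_subgroup_adequate(client_age_months, subgroup_age_range,
                           n_in_subgroup, min_n=50, max_gap_months=6):
    """Rough check of a normative subgroup against the guideline above:
    at least `min_n` children, all within `max_gap_months` of the
    client's age. `subgroup_age_range` is the (youngest, oldest) age,
    in months, of the subgroup as reported in the manual."""
    youngest, oldest = subgroup_age_range
    within_gap = (client_age_months - youngest <= max_gap_months
                  and oldest - client_age_months <= max_gap_months)
    return within_gap and n_in_subgroup >= min_n

# Hypothetical manual entry: 62 children aged 106-114 months,
# reviewed for a client aged 110 months
print(norm_subgroup_adequate(110, (106, 114), 62))  # True
```

A subgroup that is large but spans too wide an age band (say, 200 children aged 96–120 months) would fail this check, which is exactly the situation in which age-based norms become least interpretable for an individual child.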
Also, you want these children to be similar in race, language background, and socioeconomic status to the child or children you have in mind. When there are significant differences between the normative sample and your client(s), you need to draw on your knowledge of the appropriate research base, as well as your own knowledge of cultural differences, to determine to what extent the validity of the measure is likely to be undermined. If a measure's validity is seriously undermined and alternative measures are unavailable, a variety of approaches, including dynamic assessment and the development of an informal measure, represent possible strategies (see chap. 10 for further discussion of this issue).

For a norm-referenced instrument, you also want to examine the types of scores the test uses to describe the test taker's performance. In terms of desirability, standard scores rank first, percentile scores are next, and developmental scores (such as age-equivalent or grade-equivalent scores) earn a sorry last place. In this section of the review form, you may also want to record the availability of tables that report the standard error of measurement (which is discussed at greater length below under reliability). Recording that information here is a good idea because it indicates the amount of error associated with a test taker's standard score.

When a criterion-referenced measure is evaluated, the composition of the groups used to determine cutoff scores will be the focus of your scrutiny at this point in the review form. I am not aware of recommendations concerning sample size and composition that are as specific as those given above for norm-referenced measures. However, you
want to be sure that the group for whom the cutoff scores are provided is similar to your client or clients and that the group is large enough that the cutoff is likely to be stable (McCauley, 1996).

6. Reliability

In this section, you will summarize relevant information about the test's reliability, which is almost always contained in a separate, clearly marked section of the test manual. The operative word here is relevant. The manual may report 6, 10, even 20 studies in which the reliability of the measure was examined. Nonetheless, the relevant ones are those (a) using participants who are as similar as possible to your client(s) and (b) focusing on the type of reliability that is either most at risk because of the nature of the instrument or most important to your clinical question. Recall that chapter 3 discusses the different kinds of reliability data that are typically of interest.

Once you have decided what forms of reliability are of greatest importance, how do you know whether the evidence is adequate? For norm-referenced tests, the evidence will almost always take the form of reliability coefficients. Traditionally, it has been suggested that one demand correlation coefficients that are statistically significant and at least .80 in magnitude for screening purposes and at least .90 when making more important decisions about individuals (Salvia & Ysseldyke, 1998). However, a more circumspect recommendation might be that you want the best reliability available on the market. By this I mean that when the ideal of .90 is not available and a decision must be made, you will want the best that you can find, as well as multiple, independent sources of information. For criterion-referenced measures, evidence for reliability can take a great many forms—from correlation coefficients to agreement indices (Feldt & Brennan, 1989).
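To make these numbers concrete, the sketch below shows two small calculations that follow from the concepts in this section: the classical test theory formula for the standard error of measurement, SEM = SD × √(1 − r), which links a reliability coefficient to the error band around an obtained standard score, and a simple percent-agreement index of the kind used to gauge how consistently a criterion-referenced cutoff classifies children. All values shown are hypothetical, and percent agreement is only one of the many agreement indices Feldt and Brennan (1989) describe.

```python
import math

def standard_error_of_measurement(sd, reliability):
    # Classical test theory: SEM = SD * sqrt(1 - r)
    return sd * math.sqrt(1 - reliability)

def confidence_band(score, sd, reliability, z=1.96):
    # Approximate 95% band around an obtained standard score
    sem = standard_error_of_measurement(sd, reliability)
    return (score - z * sem, score + z * sem)

def percent_agreement(decisions_1, decisions_2):
    # Proportion of children classified the same way (e.g., pass/fail
    # at a cutoff) on two administrations or by two examiners
    same = sum(a == b for a, b in zip(decisions_1, decisions_2))
    return same / len(decisions_1)

# Hypothetical test: mean 100, SD 15, reported reliability .90
print(round(standard_error_of_measurement(15, 0.90), 1))          # 4.7
print(tuple(round(x, 1) for x in confidence_band(85, 15, 0.90)))  # (75.7, 94.3)

# Hypothetical pass/fail cutoff decisions for eight children, twice
print(percent_agreement(list("PPFFPPFP"), list("PPFFPFFP")))      # 0.875
```

Note what the confidence band implies for interpretation: even at the "ideal" reliability of .90, an obtained standard score of 85 is consistent with true scores spanning almost 20 points, which is one reason manuals that report SEM tables deserve credit in this section of the review.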
Such evidence for criterion-referenced measures usually addresses the question of how consistently the cutoff can be used to reach a particular decision. As you would do for norm-referenced measures, focus on the results of those studies that involve research questions most like your clinical question and participants most like your client(s). Information about the relationship between types of reliability and clinical questions is provided in chapters 9 to 11.

7. Validity

Although the entire review form is aimed at your cracking the case of a measure's validity for a particular use, in this section of the review form you will summarize the most important of the information provided by the test developer for the purpose of evaluating validity. Although most of the information of interest will probably be found in clearly labeled sections of the manual, information relevant to considerations of content and construct validity is also frequently found in sections dealing with the measure's initial development and subsequent revisions (if any). Recall that some of the specific methods used to provide evidence of validity (e.g., developmental studies, contrasting-group studies) are discussed at some length in the previous chapter.

The statistical methods used to document validity vary from correlation coefficients to analyses of variance to factor analysis. Consequently, a discussion of what constitutes acceptable data must remain fairly general here. Overall, one looks
to see that the measure is shown to function as it is predicted to function if valid. As with reliability evidence, the nature of the participants in a study will affect the extent to which it is relevant for your client and purposes. As you complete this section of the review form, every skeptical bone in your body should be recruited for service. Claiming validity doesn't make a measure valid, although at times test developers seem to forget this.

8. Overall Impressions of Validity for Testing Purpose and Population

At this point in the review guide, you put the clues together to sum up the case. Your study of the pros and cons should be summarized, with holes in the evidence noted and discussed in terms of their implications for interpreting results. This is where you determine whether you believe the instrument can be safely used and, if used, what cautions should be kept in mind when it is administered and interpreted. Clearly, this is the most demanding point in the review process—akin to a final exam or the concluding paragraph of a large paper. Although practice is perhaps the best way of honing the requisite analytic skills, examination of other reviews of the instrument (when they're available) can help you make sure you have not overlooked any major clues and can also help you see how others have approached the task. Even examining reviews of other measures can prove helpful for getting a sense of how seasoned detectives sum up their cases. (See, e.g., the reviews in Conoley & Impara, 1995, of the Receptive–Expressive Emergent Language Test–2 [Bzoch & League, 1994], written by Bachman [1995] and Bliss [1995], and of the Test of Early Reading Ability–Deaf or Hard of Hearing [Reid, Hresko, Hammill, & Wiltshire, 1991], written by Rothlisberg [1995] and Toubanos [1995].)

Because examples can prove so helpful in developing one's understanding of a new process, I included Fig. 4.4, which illustrates how I would complete the reviewing guide for the Expressive Vocabulary Test (Williams, 1997) as I consider its validity for use with a hypothetical child, Melissa. Melissa is a 9-year, 2-month-old girl who has been receiving treatment for a specific language impairment. She is being tested as part of a periodic reevaluation, which will be used by an educational team to determine whether she will continue to receive services in her school. Melissa's unilateral hearing loss and problems with attention will require special attention during the review of the Expressive Vocabulary Test (Williams, 1997) for possible use.

How to Access Other Sources of Information
In addition to test manuals, independent test reviews are available to help in the review process in three different forms: reviews appearing in standard reference volumes on behavioral measures, journal articles reviewing one or more tests in a particular area, and computer databases of test reviews. Standard references and journal articles that include reviews of tests frequently used in the assessment of children with developmental language disorders, or that provide specific information relevant to an understanding of individual tests, are listed in Table 4.3.
Table 4.3
Books and Journal Articles Providing Information About Specific Tests Used With Children

Books

American Speech-Language-Hearing Association. (1995). Directory of speech-language pathology assessment instruments. Rockville, MD: Author.
Compton, C. (1996). A guide to 100 tests for special education. Upper Saddle River, NJ: Globe Fearon Educational.
Impara, J. C., & Plake, B. S. (Eds.). (1998). Thirteenth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements.
Keyser, D. J., & Sweetland, R. C. (Eds.). (1994). Test critiques (Vol. X). Austin, TX: Pro-Ed.
Murphy, L. L., Conoley, J. C., & Impara, J. C. (Eds.). (1994). Tests in print IV: An index to tests, test reviews, and the literature on specific tests. Lincoln, NE: Buros Institute of Mental Measurements.

Journal Articles

Huang, R., Hopkins, J., & Nippold, M. A. (1997). Satisfaction with standardized language testing: A survey of speech-language pathologists. Language, Speech, and Hearing Services in Schools, 28, 12–23.
McCauley, R. J., & Swisher, L. (1984). Psychometric review of language and articulation tests for preschool children. Journal of Speech and Hearing Disorders, 49, 34–42.
Merrell, A. W., & Plante, E. (1997). Norm-referenced test interpretation in the diagnostic process. Language, Speech, and Hearing Services in Schools, 28, 50–58.
Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25, 15–24.
Plante, E., & Vance, R. (1995). Diagnostic accuracy of two tests of preschool language. American Journal of Speech-Language Pathology, 4, 70–76.
Stephens, M. I., & Montgomery, A. A. (1985). A critical review of recent relevant standardized tests. Topics in Language Disorders, 5(3), 21–45.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Each new volume in the Mental Measurements Yearbook series contains reviews of commercially available tests that have just been published or have been revised since their review in a preceding volume. Entries are organized alphabetically by the name of the test, with two reviews prepared independently by individuals with expertise in testing, in the content area tested, or both. A new volume of this series appears about every three years. In addition, reviews published since 1989 are available on the Internet, allowing online searches that can help consumers find reviews as well as specific kinds of measures. Several recent journal articles reviewing tests in a particular content area or for a particular group of children with language impairments are also listed in Table 4.3.

Computer databases represent a more recent source of information on standardized measures. Reviews from the Mental Measurements Yearbook series are
available online through colleges, universities, and public libraries. Reviews included in this online database are identical in content to those included in the bound volumes of the Mental Measurements Yearbook. Further, these reviews are more timely than those appearing in the printed volumes, because reviews that will eventually be incorporated in a later bound volume are added every month.

The Health and Psychosocial Instruments (HaPI) database is also available at many libraries and can be searched online. It allows one to search for information about a specific test; to find the publishing information for a test through its name, acronym, or authorship; and to search for instruments by content or age group. HaPI provides abstracts and does not contain complete reviews of instruments. However, it does indicate whether information is reported for seven critical characteristics: internal consistency reliability, test–retest reliability, parallel forms reliability, interrater reliability, content validity, construct validity, and criterion-related validity.

Summary

1. Effective evaluation of measures of children's communication and related skills must be conducted with appreciation for the contextual variables affecting both children and clinicians.

2. The bioecological theory of Bronfenbrenner and his colleagues emphasizes the interplay of the child's characteristics with those of his or her environment, beginning with the family and extending to the broader physical, social, and historical environment as well. The relevance of this theory to the evaluation of measures and measurement strategies for children lies in the connection between validity and attention to these contextual variables.

3.
Among the contextual variables affecting clinicians as they interact with children and evaluate their language are not only personal variables (e.g., their own language and culture), but also legal variables and other variables affecting their professional practice.

4. Evaluation of individual measures requires the potential test user to gather clues suggesting the strengths and weaknesses of the measure for answering a particular clinical question for a particular client. Client-oriented reviews are conducted to refine information obtained from a population-oriented review or in response to the exceptional needs of a particular client.

5. Test manuals or other materials provided by the developer of a measure serve as the primary source of information to be considered in evaluating its usefulness for a given client.

6. The test reviewer needs to approach the review process armed with a skeptical attitude toward unproven claims and an arsenal of information regarding acceptable psychometric standards.

7. The Standards for educational and psychological testing (APA, AERA, & NCME, 1985) is the most widely accepted source for such information on standards.
8. Additional information for use in the reviewing process is available in the form of reviews published in standard reference books, relevant journal articles, and computer databases.

9. In spite of existing ideals for evidence of reliability and validity, the clinician may nonetheless decide to use a particular measure that does not reach an ideal when it is the best available for a particular client and a clinical decision must be made.

Key Concepts and Terms

client-oriented measure review: evaluation of a measure's appropriateness for use in answering a specific clinical question for a single client.

Individuals with Disabilities Education Act (IDEA): federal legislation addressing the education needs of individuals with disabilities, including children with communication disorders.

International Classification of Impairments, Disabilities, and Handicaps (ICIDH): a classification designed by the WHO for global use by health professionals, educators, legislators, and other groups concerned with health-related issues to serve as a common language.

Mental Measurements Yearbook series: a well-regarded source of test reviews.

nondiscriminatory assessment: the use of measures and procedures for administering and interpreting data that will not confound a child's language or dialect background with the target of testing.

population-oriented measure review: a preliminary evaluation of a measure's likely appropriateness for use in answering one or more clinical questions for a population of clients who share important characteristics. Population-oriented reviews of measures are often conducted for subgroups of clients who are frequently seen by a given clinician.

Study Questions and Questions to Expand Your Thinking

1. Consider your own social ecology. Think about a specific kind of decision you have made or will make (e.g., concerning school or employment). What institutions and people affect your decision?

2.
Talk to the parent of a young child about the contexts in which that child functions—daycare, time spent with extended family, and so forth. Determine how many hours the child spends in each setting and who the child's main interaction partners are. How might these settings influence the communication experiences of this child?

3. List five domains of language.

4. Does the time taken to conduct a test have any obvious potential relationship to the validity of testing? If so, when or for what groups of children?
5. Discuss the importance of conducting a client-oriented review rather than simply a population-oriented review of a measure you will use with a specific client.

6. Go to the library and examine several volumes of the Mental Measurements Yearbook series. Describe the process by which tests are selected to be reviewed, and examine two reviews for a single speech-language measure.

7. Choose a test that you have heard referred to in a course you have taken. See if you can find a review of it in the Mental Measurements Yearbook series or elsewhere. Also, consider the extent to which the interaction implicit in the testing procedures matches the kinds of experiences a child might have on an everyday basis.

8. Complete a review form for a norm-referenced speech-language test.

9. Complete a review form for a criterion-referenced speech-language measure.

Recommended Readings

Hutchinson, T. A. (1996). What to look for in the technical manual: Twenty questions for users. Language, Speech, and Hearing Services in Schools, 27, 109–121.

Sabers, D. L. (1996). By their tests we will know them. Language, Speech, and Hearing Services in Schools, 27, 102–108.

Salvia, J., & Ysseldyke, J. (1998). Appendix 5. In J. Salvia & J. Ysseldyke (Eds.), Assessment (5th ed., pp. 763–766). Boston: Houghton Mifflin.

References

American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

American Speech-Language-Hearing Association. (1995). Directory of speech-language pathology assessment instruments. Rockville, MD: Author.

American Speech-Language-Hearing Association. (1999). Guidelines for roles and responsibilities of the school-based speech-language pathologist [Online]. Available: http://www.asha.org/professionals/library/slpschool_i.htm#purpose

Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Anastasi, A. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.

Bachman, L. F. (1995). Review of the Receptive–Expressive Emergent Language Test (2nd ed.). In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 843–845). Lincoln, NE: Buros Institute of Mental Measurements.

Bliss, L. S. (1995). Review of the Receptive–Expressive Emergent Language Test (2nd ed.). In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (p. 846). Lincoln, NE: Buros Institute of Mental Measurements.

Bronfenbrenner, U. (1974). Developmental research, public policy, and the ecology of childhood. Child Development, 45, 1–5.

Bronfenbrenner, U. (1986). Recent advances in research on the ecology of human development. In R. K. Silbereisen, E. Eyferth, & G. Rudinger (Eds.), Development as action in context: Problem behavior and normal youth development (pp. 286–309). New York: Springer-Verlag.

Bronfenbrenner, U., & Morris, P. (1998). The ecology of developmental processes. In W. Damon & R. M. Lerner (Eds.), Handbook of child psychology: Theoretical models of human development (5th ed., Vol. 1, pp. 993–1028). New York: Wiley.

Bzoch, K. R., & League, R. (1994). Receptive-Expressive Emergent Language Test–2. Austin, TX: Pro-Ed.
Compton, C. (1996). A guide to 100 tests for special education. Upper Saddle River, NJ: Globe Fearon Educational.
Conoley, J. C., & Impara, J. C. (Eds.). (1995). The twelfth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements.
Crais, E. R. (1992). "Best practices" with preschoolers: Assessing within the context of a family-centered approach. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language assessment (pp. 33–42). San Antonio, TX: Psychological Corporation.
Crais, E. R. (1995). Expanding the repertoire of tools and techniques for assessing the communication skills of infants and toddlers. American Journal of Speech-Language Pathology, 4, 47–59.
Damico, J. S., Smith, M., & Augustine, L. E. (1996). In M. D. Smith & J. S. Damico (Eds.), Childhood language disorders (pp. 272–299). New York: Thieme.
Demers, S. T., & Fiorello, C. (1999). Legal and ethical issues in preschool assessment and screening. In E. V. Nuttall, I. Romero, & J. Kalesnik (Eds.), Assessing and screening preschoolers: Psychological and educational dimensions (2nd ed., pp. 50–58). Needham Heights, MA: Allyn & Bacon.
Donahue-Kilburg, G. (1992). Family-centered early intervention for communication disorders: Prevention and treatment. Gaithersburg, MD: Aspen.
Education for All Handicapped Children Act of 1975. Pub. L. No. 94-142, 89 Stat. 773 (1975).
Education of the Handicapped Act Amendments of 1986. Pub. L. No. 99-457, 100 Stat. 1145 (1986).
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). New York: American Council on Education and Macmillan.
Frattali, C. (1998). Outcome measurement: Definitions, dimensions, and perspectives. In C. Frattali (Ed.), Measuring outcomes in speech-language pathology (pp. 1–27). New York: Thieme.
Gardner, M. F. (1990). Expressive One-Word Picture Vocabulary Test–Revised. Novato, CA: Academic Therapy.
Hammer, A. L.
(1992). Test evaluation and quality. In M. Zeidner & R. Most (Eds.), Psychological testing: An inside view. Palo Alto, CA: Consulting Psychologists Press.
Hammill, D. D., & Newcomer, P. L. (1988). Test of Language Development–2: Intermediate. Austin, TX: Pro-Ed.
Huang, R., Hopkins, J., & Nippold, M. A. (1997). Satisfaction with standardized language testing: A survey of speech-language pathologists. Language, Speech, and Hearing Services in Schools, 28, 12–23.
Hutchinson, T. A. (1996). What to look for in the technical manual: Twenty questions for users. Language, Speech, and Hearing Services in Schools, 27, 109–121.
Impara, J. C., & Plake, B. S. (Eds.). (1998). The thirteenth mental measurements yearbook (pp. 1050–1052). Lincoln, NE: Buros Institute of Mental Measurements.
Individuals with Disabilities Education Act (IDEA). Pub. L. No. 101-476, 104 Stat. 1103 (1990).
Individuals with Disabilities Education Act Amendments of 1997. Pub. L. No. 105-17, 111 Stat. 37 (1997).
Keyser, D. J., & Sweetland, R. C. (Eds.). (1994). Test critiques (Vol. X). Austin, TX: Pro-Ed.
McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language, Speech, and Hearing Services in Schools, 27, 122–131.
McCauley, R. J., & Swisher, L. (1984). Psychometric review of language and articulation tests for preschool children. Journal of Speech and Hearing Disorders, 49, 34–42.
Merrell, A. W., & Plante, E. (1997). Norm-referenced test interpretation in the diagnostic process. Language, Speech, and Hearing Services in Schools, 28, 50–58.
Murphy, L. L., Conoley, J. C., & Impara, J. C. (Eds.). (1994). Tests in print IV: An index to tests, test reviews, and the literature on specific tests. Lincoln, NE: Buros Institute of Mental Measurements.
Muma, J. (1998). Effective speech-language pathology: A cognitive socialization approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach.
Language, Speech, and Hearing Services in Schools, 25, 15–24.
Plante, E., & Vance, R. (1995). Diagnostic accuracy of two tests of preschool language. American Journal of Speech-Language Pathology, 4, 70–76.
Radziewicz, C. (1995). In E. Tiegerman-Farber, Language and communication intervention in preschool children (pp. 95–128). Boston: Allyn & Bacon.
Reid, D. K., Hresko, W. P., Hammill, D. D., & Wiltshire, S. (1991). Test of Early Reading Ability–Deaf or Hard of Hearing. Austin, TX: Pro-Ed.
Rothlisberg, B. A. (1995). Review of the Test of Early Reading Ability–Deaf or Hard of Hearing. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 1049–1051). Lincoln, NE: Buros Institute of Mental Measurements.
Sabers, D. L. (1996). By their tests we will know them. Language, Speech, and Hearing Services in Schools, 27, 102–108.
Salvia, J., & Ysseldyke, J. E. (1995). Assessment (7th ed.). Boston: Houghton Mifflin.
Stephens, M. I., & Montgomery, A. A. (1985). A critical review of recent relevant standardized tests. Topics in Language Disorders, 5(3), 21–45.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Toubanos, E. S. (1995). Review of the Test of Early Reading Ability–Deaf or Hard of Hearing. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 1051–1053). Lincoln, NE: Buros Institute of Mental Measurements.
van Kleeck, A. (1994). Potential cultural bias in training parents as conversational partners with their children who have delays in language development. American Journal of Speech-Language Pathology, 3, 67–78.
Vetter, D. K. (1988a). Designing informal assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making in speech-language pathology (pp. 192–193). Philadelphia: B. C. Decker.
Vetter, D. K. (1988b). Evaluation of tests and assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making in speech-language pathology (pp. 190–191). Philadelphia: B. C. Decker.
Williams, K. T. (1997).
Expressive Vocabulary Test. Circle Pines, MN: American Guidance Service.
World Health Organization. (1980). ICIDH: The international classification of impairments, disabilities, and handicaps. Geneva: Author.
World Health Organization. (1998). Towards a common language for functioning and disablement: ICIDH-2: The International Classification of Impairments, Activities and Participation. Geneva: Author.
PART II: AN OVERVIEW OF CHILDHOOD LANGUAGE DISORDERS

Part II introduces the four most frequently occurring categories of childhood language disorders: specific language impairment (chap. 5) and language problems associated with mental retardation (chap. 6), autism spectrum disorders (chap. 7), and hearing impairment (chap. 8). Each chapter is designed to provide an overview of the nature and special testing problems associated with one category. Within each chapter, disorder categories are defined, where possible, according to criteria outlined in the Diagnostic and statistical manual of mental disorders (4th ed.; DSM–IV) of the American Psychiatric Association (1994) and, in some chapters, according to other influential definitions. Each disorder category is then further introduced in terms of its suspected causes, the special challenges to language assessment posed by children with the specific problem, their expected patterns of language performance, and accompanying problems that may further complicate these children's lives and communication functioning. Each chapter also contains a short passage written from the perspective of someone diagnosed with the condition addressed in the chapter.
CHAPTER 5

Children with Specific Language Impairment

Defining the Problem
Suspected Causes
Special Challenges in Assessment
Expected Patterns of Language Performance
Related Problems

Defining the Problem

Sandy is a compact 6-year-old who was late in talking and considered unintelligible by all but a few family members until about age 5. She is still often mistaken for a younger child because of her size, limited vocabulary, and frequent errors in grammar. Having recently transferred to a new school, Sandy is having trouble adjusting and has become very quiet except for occasional interactions with friends from her previous school.

Joshua, a 9-year-old with a history of delayed speech and language, continues to use short, simple sentences that are often ineffective in getting his message across. Despite significant gains in his oral communication, he has made little progress in early reading skills. Thus, despite two years of instruction and special support in both oral and written language, he names letters of the alphabet inconsistently and has a
sight vocabulary limited to about 30 words. Joshua also appears to have difficulty understanding many of the instructions given in the classroom.

Wilson is a 4-year-old whirlwind who augments his limited speech productions with animated gestures and, sometimes, truly gifted doodles. Because of his activity level and awkward, sometimes overwhelming style of interacting, he is avoided by his peers and has formed fierce attachments to the preschool teacher and his speech-language pathologist. Wilson's parents and educators are beginning to question whether his activity level falls within the normal range and will be discussing the possibility of having him evaluated for attention deficit disorder with hyperactivity at their next meeting. Wilson's ability to understand the communications of others has never been questioned.

Although Sandy, Joshua, and Wilson are varied in their patterns of communication difficulties, each can be described as demonstrating specific language impairment (SLI), a disorder estimated to affect between 1.5% and 7% of children (Leonard, 1998). A recently proposed figure of 7% for 5-year-olds may be the best current estimate of prevalence: The research on which it was based was rigorous and included the use of a carefully selected sample of 7,218 children (Tomblin et al., 1997). Although estimates differ considerably from study to study, it has generally been found that boys are affected more often than girls, with some studies suggesting that boys are at twice the risk of girls (Tomblin, 1996b). SLI can be defined as "delayed acquisition of language skills, occurring in conjunction with normal functioning in intellectual, social-emotional, and auditory domains" (Watkins, 1994, p. 1). Thus, SLI is frequently described as a disorder of exclusion.
As such, it can seem like a definition of leftovers, encompassing those instances in which language impairment exists but cannot readily be attributed to factors that clearly limit a child's access to information about language or to the abilities required to undertake the creative task of language acquisition. On the other hand, specific language impairment can be regarded as a "pure" form of developmental language disorder, one in which language alone is affected (Bishop, 1992b).

Hopes of defining the nature of specific language impairment have instigated a wealth of research in child language disorders over the past 50 years. Initially termed "congenital aphasia" or "developmental dysphasia," SLI seemed to offer the opportunity to look at a pure, or "specific," variety of communication disorder (Rapin, 1996; Rapin & Allen, 1983). Historically, each of the categories of developmental language disorders examined in other chapters in this section offered ostensibly obvious explanations for their existence. In contrast, children with SLI offered no apparent explanations yet promised an opportunity to look at the unique effects of impaired language on development. Or so it first appeared. In the Related Problems section of this chapter, you will read about the subtle differences in cognition and other attributes that have been identified in children with SLI and that thus threaten narrow conceptions of specific impairment.

The DSM–IV (American Psychiatric Association, 1994) does not use the term specific language impairment, but includes two disorders that together cover much of the same terrain: Expressive Language Disorder and Mixed Receptive–Expressive Language Disorder. Table 5.1 lists the diagnostic criteria for these two communication
Table 5.1
Summary of Criteria for Two Disorders Corresponding to Specific Language Impairment From the Diagnostic and Statistical Manual (4th ed.) of the American Psychiatric Association (1994)

Expressive Language Disorder (American Psychiatric Association, 1994, p. 58)
A. The scores obtained from standardized, individually administered measures of expressive language development are substantially below those obtained from standardized measures of both nonverbal intellectual capacity and receptive language development. The disturbance may be manifest clinically by symptoms that include having a markedly limited vocabulary, making errors in tense, or having difficulty recalling words or producing sentences with developmentally appropriate length or complexity.
B. The difficulties with expressive language interfere with academic or occupational achievement or with social communication.
C. Criteria are not met for Mixed Receptive–Expressive Language Disorder or Pervasive Developmental Disorder.
D. If Mental Retardation, a speech-motor or sensory deficit, or environmental deprivation is present, the language difficulties are in excess of those usually associated with these problems.

Mixed Receptive–Expressive Language Disorder (American Psychiatric Association, 1994, pp. 60–61)
A. The scores obtained from a battery of standardized individually administered measures of both receptive and expressive language development are substantially below those obtained from standardized measures of nonverbal intellectual capacity. Symptoms include those for Expressive Language Disorder as well as difficulty understanding words, sentences, or specific types of words, such as spatial terms.
B. The difficulties with receptive and expressive language significantly interfere with academic or occupational achievement or with social communication.
C. Criteria are not met for Pervasive Developmental Disorder.
D. If Mental Retardation, a speech-motor or sensory deficit, or environmental deprivation is present, the language difficulties are in excess of those usually associated with these problems.
disorders. The division of SLI into these two categories reflects a recurring impulse among researchers and clinicians to identify subgroups within the larger population—in this case, and most often, according to whether receptive language is significantly affected.

The DSM–IV criteria include a variation on the exclusionary elements of the SLI definition described up to this point. Specifically, in Criterion D for both disorders, the clinician is directed to look for language impairments whose severity is unexplained by the obvious threats to language development included in other exclusionary definitions (e.g., the presence of hearing impairment or mental retardation). The DSM–IV definitions thus allow both for the identification of a language impairment when no obvious threats exist and for cases where the presence of these threats does not seem sufficient to account for the degree of problem presented.

Most researchers over the past three decades have used definitions largely like those discussed and have particularly relied on the operationalization of SLI proposed by Stark and Tallal (1981, 1988). The details of such definitions, however, have proven quite controversial (Camarata & Swisher, 1990; Johnston, 1992, 1993; Kamhi,
1993; Plante, 1998). As attention moves from the laboratory to clinical practice in schools, the controversy intensifies because state policies are vigorous participants in the decision-making process. In particular, the use of difference or discrepancy scores is often mandated but has faced increasing criticism (e.g., Aram, Morris, & Hall, 1993; Fey, Long, & Cleave, 1994; Kamhi, 1998). Although methods used in the identification of SLI are discussed at some length in the Special Challenges in Assessment section of this chapter, they are mentioned here because they affect understanding of the nature of the problem and therefore affect research intended to obtain information about suspected causes, patterns of language performance, and related problems.

It seems important to recognize that SLI is a term that is often absent from the day-to-day functioning of speech-language pathologists in many clinical and educational settings. Instead, they frequently use the terms language delays or language impairments, thereby remaining silent on the specificity of a given child's problems (Kamhi, 1998). Nonetheless, the foundation of research on this population and clinical writings provides an important context for scientifically oriented clinical practice. In the same way that field geologists need to know about basic chemistry despite few encounters in the wild with pure iron or other elements, speech-language pathologists can learn from attempts to identify and understand SLI and can recognize it when they encounter it in their practice. The very length of this chapter compared with the others addressing subgroups of children with language disorders testifies to the fertility of the resulting explorations.

Suspected Causes

The question of what causes isolated language impairment has been approached from several perspectives—from genetic to linguistic, physiological to social.
It remains a question—or, more accurately, a series of related questions—that tantalizes researchers, clinicians, and parents alike. It is best viewed as a set of related questions because one can conceive of causes on several different levels (e.g., physical as well as social) and because effects are frequently the result of a convergence of causes rather than a single cause. Thus, two or more factors may need to come into play before impaired language occurs. Understanding causation is further clouded by the fact that researchers are frequently only in the position of identifying risk factors; that is, factors that tend to co-occur with the presence of SLI but that can only be thought of as potential causes until the nature of the association can be worked out through further research.

In this section, a review of suspected causes encompasses not only differences in brain structure and function, genetics, and selected environmental factors, but also more abstract linguistic and cognitive discussions of the origins of specific language disorder in children. Although there is considerable turmoil in the community of child language researchers concerning the more abstract accounts provided in linguistic and cognitive explanations, their role in assessment and planning for treatment has the potential to be more immediate and influential than that of accounts related to genetics and physiology.
Genetics
Genetic origins for SLI have probably been suspected for some years by anyone who has encountered families in which language problems seem more commonplace than one might expect given the relative rarity of language impairment. Nonetheless, serious study of genetic contributions to SLI has been undertaken only in the last couple of decades (Leonard, 1998; Pembrey, 1992; Rice, 1996). Largely, the increase in such studies has occurred because of advances in the study of behavioral genetics (Rice, 1996). In addition, however, the delayed interest in the genetics of language impairment has resulted from the need for agreement on a phenotype, that is, the behavior or set of behaviors that constitute critical characteristics of the disorder (Gilger, 1995; Rice, 1996).

Several different types of genetic studies are regularly used to link specific diseases or behavioral differences with genetic underpinnings (Brzustowicz, 1996). Among those that have been used to the greatest extent so far in studying SLI are family studies, twin studies, and pedigree studies. In family studies, the family members of a proband (i.e., an affected person who is the focus of study) are examined to determine whether they show evidence of the characteristic or disorder under study at rates that are higher than would be expected in the general population. If they do, the characteristic or disorder is considered familial—a state of affairs that could be due to genetic origins or to common exposure to other influences. Thus, for example, a fondness for chocolate might be found to be familial but, without further study, could just as easily be due to long exposure to a kitchen full of chocolate delicacies as to a genetic basis.

In twin studies, comparisons of the frequency of a characteristic or disorder are made between identical and fraternal twins.
Because identical twins share the same genetic makeup, they should show higher concordance for the characteristic if it has a genetic basis; that is, there should be a strong tendency for both identical twins to either have or not have the characteristic. In contrast, if their rates of concordance are relatively high but similar to those of the fraternal twins (who are no more genetically related than any pair of siblings and thus on average share 50% of their genetic makeup), the characteristic might still be considered familial. However, in that case, it would more likely be the result of environmental rather than genetic influences. (See Tomblin, 1996b, for a discussion of some of the complexities of this type of design.)

In pedigree studies, as many members as possible of a single proband's large, multigenerational family are examined in order to gain insight into patterns of inheritance associated with the targeted characteristic or disorder. Closely related to pedigree studies are segregation studies, in which multiple families with affected members are examined to compare observed patterns of inheritance with patterns that have been observed for other genetically transmitted diseases.

Despite the difficulties associated with defining a disorder as complex as SLI (Brzustowicz, 1996), considerable progress has been made over the past 15 years in understanding genetic contributions to the disorder. Familial studies (e.g., Neils & Aram, 1986; Tallal, Ross, & Curtiss, 1989; Tomblin, 1989) have consistently demonstrated higher risk among families selected because of an individual member with SLI than among families selected because of an unaffected member who is serving as a control
participant. Complicating these findings, however, have been observations that many children with SLI come from families in which they are the only affected member (Tomblin & Buckwalter, 1994). Further, family histories of SLI may be more common among children with expressive problems only than among those with both receptive and expressive problems (Lahey & Edwards, 1995).

Whereas some familial studies (e.g., Neils & Aram, 1986; Tallal et al., 1989; Tomblin, 1989) have used questionnaires to examine the language skills of other, often older, family members, others have used direct assessment of language skills (e.g., Plante, Shenkman, & Clark, 1996; Tomblin & Buckwalter, 1994). The latter studies are considered more desirable (Leonard, 1998) because they rely neither on participants' memories of childhood difficulties nor on potentially incomplete and inaccurate school or clinical records. Further, they seem to be more sensitive to manifestations of SLI in adults, thereby capturing a greater number of affected individuals for examination of inheritance patterns (Plante et al., 1996). Most importantly, however, both types of studies can demonstrate familial patterns of SLI, which are the first step toward proving its genetic underpinnings for at least some affected individuals.

Twin studies (e.g., Bishop, 1992a; Tomblin, 1996b) have demonstrated higher concordance for SLI among identical than fraternal twins, thus providing evidence of some degree of genetic influence. However, even among identical twins, concordance is not perfect, despite their identical genetic makeup. Consequently, it has been suggested that either the affected gene associated with SLI does not always produce the same outcome (due to incomplete penetrance) or it does not operate alone to produce SLI (Tomblin & Buckwalter, 1994; Leonard, 1998).
In the former case, incomplete penetrance refers to cases in which a gene associated with a disorder fails to act in an all-or-nothing fashion, with some people who carry the gene showing no ill effects (Gilger, 1995). The latter prospect means that SLI may be caused by more than one gene or that a gene or group of genes must operate in combination with environmental factors. Current research on the genetics of SLI is weighing these alternative scenarios. Among the kinds of studies needed are pedigree and segregation studies, in which a single family or groups of families are studied across generations.

One family, referred to as the KE family, has been under study for some time (e.g., Crago & Gopnik, 1994; Gopnik & Crago, 1991; Vargha-Khadem, Watkins, Alcock, Fletcher, & Passingham, 1995). This family continues to be examined to determine whether a hypothesized autosomal dominant transmission mode is at work. Briefly, autosomal dominant transmission means that the disorder is transmitted through a pair of autosomal chromosomes (i.e., one of the 22 chromosome pairs that are not sex-linked) and will occur even if only one of the two chromosomes in a pair is affected. The KE family has many affected members, as would be expected given an autosomal dominant mode of transmission, as opposed to modes involving the sex chromosomes (a single pair) or a recessive mode of transmission, in which both members of a chromosome pair would have to be affected for the disorder to result. In fact, most members of the KE family demonstrate both severely impaired speech and language, and several show cognitive impairment or psychiatric disorders as well. Thus, additional work is needed to examine other families who might be more representative of greater numbers of children with SLI.
Continuing pursuit of information about genetic bases is thought to be useful because it may be possible to determine what aspects of language impairment are more biologically determined and, therefore, perhaps less amenable to treatment. Once those determinations are made, clinicians could focus on the fostering of compensatory strategies or on the amelioration of remaining aspects of the language impairment that may be more modifiable through treatment (Rice, 1996).

Differences in Brain Structure and Function
The prospect of differences in brain structure and function between children with SLI and those without has beckoned as a potential explanation since researchers first began ruminating about this disorder, as illustrated by the use of the term childhood aphasia in the 1930s and for several decades thereafter. Among the possibilities that have been examined are early damage to both cerebral hemispheres and damage to the left hemisphere only (Aram & Eisele, 1994; Bishop, 1992a), as well as the possibility that differences are not the result of "damage" per se, but rather are the expression of natural genetic variation (Leonard, 1998).

Currently, cases of frank neurologic damage—for example, those following a stroke or head injury—are excluded from definitions of SLI. Somewhat more difficult to classify are the problems of children with Landau–Kleffner syndrome, also called acquired epileptic aphasia. These children fail to show signs of focal damage except for electroencephalographic abnormalities, yet they experience a profound loss of language skills (Bishop, 1993). Although included in early formulations of childhood aphasia, this syndrome has more recently been grouped with the conditions that are typically excluded from SLI.

Despite the exclusion of known brain damage from strict definitions of SLI, a relatively large number of studies using techniques such as magnetic resonance imaging (MRI) and, less frequently, autopsy examination have been undertaken to determine whether subtle differences in brain structure and function can account for the difficulties facing children with SLI. Often these differences have been structural anomalies that seem to depart from the configuration considered optimal for left-hemisphere dominance for speech—leading to either right-hemisphere dominance or a lack of dominance by either hemisphere (Gauger, Lombardino, & Leonard, 1997).
Increasingly, it is thought that such differences may reflect variations in structure that make language development less efficient (e.g., Leonard, 1998). Two areas of the cerebral hemispheres in which such variations have been identified are the plana temporale and the perisylvian areas, illustrated in Fig. 5.1. These two areas overlap, with the smaller planum temporale lying within the larger perisylvian region of each hemisphere; both lie within a region that has consistently been shown to be associated with language function.

Fig. 5.1. The left cerebral hemisphere with the planum temporale highlighted. From Neural bases of speech, hearing, and language (Figure 9-2), by D. P. Kuehn, M. L. Lemme, & J. M. Baumgartner, 1989, San Antonio, TX: Pro-Ed. Copyright 1989 by Pro-Ed. Adapted with permission.

Examinations of the plana temporale in individuals with SLI were sparked by a 1985 autopsy study (Galaburda, Sherman, Rosen, Aboitiz, & Geschwind) of adults who had had written language deficits. Detailed examination of these individuals' brains after death showed an atypical symmetry between the planum temporale on the left and the one on the right. This pattern contrasted with the more typical asymmetric arrangement, in which the planum temporale on the left is bigger than that on the right, with the larger size thought to reflect greater involvement in language processing. The atypical symmetry results from a typically sized left planum temporale and a larger-than-usual right planum temporale. In the only autopsy study conducted to date for a single child with SLI, this same atypical symmetry was observed (Cohen, Campbell, & Yaghmai, 1989). Similar asymmetries, with left perisylvian areas larger than right, have also been identified in autopsy studies performed on individuals who did not have SLI during their lifetimes (e.g., Geschwind & Levitsky, 1968; Teszner, Tzavares, Gruner, & Hecaen, 1972).

The perisylvian areas, rather than the smaller plana temporale, became the focus of a series of studies conducted by Plante and her colleagues (Plante, 1991; Plante, Swisher, & Vance, 1989; Plante, Swisher, Vance, & Rapcsak, 1991). In those studies, Plante and her colleagues compared the relative size of these areas between hemispheres and between family members who were affected or unaffected by SLI. The researchers focused on the perisylvian areas rather than the plana temporale because of limitations in the use of MRI (Plante, 1996)—a technique that was nonetheless highly desirable because it could be used even with very young, live participants. The researchers found that children with SLI and their families demonstrated perisylvian areas that were larger on the right than those typically seen in studies of individuals without SLI or a known family history of SLI (Plante, 1991; Plante et al., 1989, 1991).
These larger right perisylvian areas were sometimes associated with symmetry across hemispheres and sometimes with asymmetries favoring the right hemisphere. Nonetheless, because some individuals with atypical configurations did not show language impairment, and others with normal configurations did show such impairment, this structural difference cannot be seen as a single cause of language impairment. In a 1996 review of this literature, Plante noted that the absence of abnormal findings for some individuals may simply be due to the insensitivity of MRI techniques to subtle differences in brain structure. Nonetheless, her argument does not really explain the instances in which identified atypical structures are associated with normal language performance. Furthermore, Plante, as well as other researchers in the field (Leonard, 1998; Rice, 1996; Watkins, 1994), believes that a number of factors probably need to be in place for structural brain differences to culminate in language impairment.

More recent studies have looked not only at the perisylvian areas but also at other brain structures for differences that may help researchers better understand SLI (e.g., Clark & Plante, 1998; Gauger et al., 1997; Jackson & Plante, 1996). Whereas many of these have examined regions in or close to the perisylvian region (e.g., Clark & Plante, 1998; Jackson & Plante, 1996), others have looked at much larger areas of the cerebrum (Jernigan, Hesselink, Sowell, & Tallal, 1991), at the extensive tract of nerve fibers connecting the two cerebral hemispheres (Cowell, Jernigan, Denenberg, & Tallal, 1994, cited in Leonard, 1998), and at areas including the ventricles (Trauner, Wulfeck, Tallal, & Hesselink, 1995). All of these studies found at least some differences (Cowell et al., 1994). In a recent review of these studies and others using behavioral and neurophysiological data, Leonard (1998) summarized the evidence as indicating that the high percentage of atypical neurobehavioral findings for children with specific language impairment implicates a "constitutional basis" that may contribute to the presence of language impairment. The origins of these suspected differences in brain structure lead to other kinds of questions about causes, such as the role of environmental factors.

Environmental Variables
Environmental variables can encompass physical, social, emotional, or other aspects of the developing child’s surroundings from conception onward. Two types of environmental variables, however, have received the greatest amount of attention for SLI—(a) variables constituting the social and linguistic environment in which children with SLI are acquiring their language (Leonard, 1998) and (b) demographic variables, such as parental education, birth order, and family socioeconomic status (SES), that affect that environment in less direct ways (Tomblin, 1996b). A particularly engaging and clear account of the literature examining the conversational environment of children with SLI can be found in Leonard (1998, chap. 8). In the literature examining this type of environmental influence (e.g., Bondurant, Romeo, & Kretschmer, 1983; Cunningham, Siegel, van der Spuy, Clark, & Bow, 1985), most studies have focused on the nature and linguistic content of conversations occurring between children with SLI and their parents. Usually, comparisons are made to conversations between parents and their normally developing children (age-matched or language-matched, depending on the study). In addition, in order to clarify “chicken-or-the-egg” speculation about the direction of causation (i.e., Are differences in conversation causing children’s problems or resulting from them?), studies have also examined conversations between children with SLI and unrelated adults (Newhoff, 1977) and even with other children (e.g., Hadley & Rice, 1991). Despite the impediments posed by abundant methodological variations and challenging patterns of empirical disagreements, Leonard (1998) ventured a few generalizations about this line of investigation. First, most of the evidence in which children with SLI are compared with control children who are similar in age suggests that their
conversation partners (parents, other adults, and peers alike) make allowances for their diminished language skills and are thus reacting to, rather than causing, the children’s problems. For example, Cunningham et al. (1985) found that mothers of children with SLI interacted similarly to mothers of control children of similar ages in conditions of free play, but asked fewer questions during a structured task. In addition, for those children with SLI whose comprehension and production were both affected, mothers reduced their length of utterance, something that was not done by mothers whose children were either normally developing or had SLI in which only expressive language was affected. Second, Leonard (1998) contended that in studies where children with SLI are compared with younger children who are similar in language characteristics, findings are less consistent in showing differences. Nonetheless, the most reliable difference in how each group is spoken to by their parents involves the frequency with which recasts are used. A recast is a restatement of a child’s production using grammatically correct structures, often incorporating morphosyntactic forms that had been omitted or produced in error by the child (Leonard, 1998). Recent research has suggested that this conversational strategy is used frequently by parents of normally developing children at earlier stages, but is then faded over time. It has also been shown to be a useful therapeutic strategy (Nelson, Camarata, Welsh, Butkovsky, & Camarata, 1996). Interestingly, Leonard noted that rather than increasing their use of this kind of statement with children with SLI, as might be expected in compensation, parents of children with SLI use it less frequently than those of children without SLI.
Despite the possible value of additional research in clarifying why this difference is seen, all in all, this line of research has not proven as productive to the understanding of the genesis of SLI as was once hoped (Leonard, 1998). Turning to possible clues in the form of demographic variables, Tomblin (1996b) searched for risk factors in demographic data obtained from the preliminary results (consisting of 32 children with SLI and 114 controls) of a larger epidemiological study (planned to include 200 children with SLI and 800 controls). Specifically, he looked for associations between demographic and biological data and the presence of SLI. Among the variables he examined relative to the home environment were parent education, family income, and birth order of the child in the family. Although there were trends in the direction of children with SLI being later born and having parents with fewer years of education than unaffected children, neither of these trends was significant. Tomblin speculated that the two trends may have been due to the extent to which lower incomes are associated with larger families. Also available to Tomblin (1996b) were data concerning exposure to biological risk factors including maternal infection or illness, medication, use of alcohol, and use of tobacco during pregnancy, as well as the evidence of potential trauma at birth and the participants’ birthweights. In these preliminary data at least, Tomblin found no differences between the groups relative to maternal infection and illness during pregnancy, and actually found lower, but nonsignificant, rates of exposure to alcohol and medication. Birth histories and birthweights also did not differ significantly. Only maternal smoking showed a trend toward higher levels among the children with SLI. Although attributing the lack of significant findings to the relatively small sample sizes used,
Tomblin also suggested that the larger numbers associated with the completed study would be unlikely to reveal effect sizes of any major significance, where effect sizes relate to the magnitude of the difference between groups. Clearly, findings across several lines of research suggest the need for the continuation and coordination of efforts to understand the complexity of variables that put children at risk for SLI. Although neurologic and genetic research findings have been particularly exciting over the past two decades, these variables are not sufficient by themselves to explain SLI. Biological and environmental factors represent important frontiers for a more complete understanding of language impairment (Snow, 1996). At a different level of explanation, linguistic and cognitive accounts attempt to provide more immediate explanations for the specific patterns of language behaviors seen in SLI and their variability across children and over varying ages.

Linguistic and Cognitive Accounts
A large number of linguistic accounts of SLI, as well as cognitive accounts, have been advanced over the past several decades. At present, more than a dozen warrant serious consideration (Leonard, 1998). As a group, these accounts deserve some attention here because of their potential impact on assessment and treatment of children for whom SLI is suspected or confirmed. As discussed in previous chapters, the validity of an assessment tool chiefly turns on the extent to which it captures the construct being measured. Consequently, different models of SLI imply the need for different measures. In practice, however, the link between theoretical understandings of a complex behavior and readily available assessment procedures is usually far from direct. This is particularly true when there are a large number of competing accounts but no clear frontrunners—the current case for SLI. In addition, the term accounts, used here and used by Leonard, specifically implies acknowledgment that these formulations fail to tie together the breadth of data typically associated with use of the term theories. Despite these limitations, some familiarity with these competing accounts can help readers anticipate future trends both in theoretical efforts and in recommended assessment practice. Leonard (1998) reviewed a wide field of linguistic and cognitive explanations of SLI, dividing them into three categories. Specifically, he considered six explanations of SLI focusing on deficits in linguistic knowledge, three focusing on limitations in general processing capacity, and three focusing on specific processing deficits. Because of space limitations, each of these twelve accounts cannot be discussed in detail here. Instead, a small subset will be used to introduce readers to this complex debate and illustrate the challenges awaiting researchers and clinicians who seek to translate these accounts into assessment practice.

Language Knowledge Deficit Accounts
Leonard (1998) argued that Chomsky’s (1986) principles and parameters framework for language acquisition can be seen as a foundation for the major accounts in which deficits in linguistic knowledge are postulated as central to SLI. Stemming
from the transformational grammar of the 1960s and 1970s, “principles” represent universals of natural languages, and “parameters” the dimensions along which individual languages differ. Children are presumed to work within the constraints associated with universal principles to acquire the specific knowledge of the parameter settings associated with their ambient language. Chief among the difficulties facing children in this process is the apparent need to understand more than just the surface relations existing between words in sentences as they are heard. Rather, they must also understand the underlying, or inferred, relationships between lexical categories (e.g., noun, verb, adjective) and functional categories that explain relationships between words within sentences (e.g., complementizer, inflection, determiner). Differences among the accounts that Leonard (1998) placed within this category lie primarily in which area of linguistic knowledge is absent or, more often, incomplete in children with SLI. Leonard himself and several colleagues are associated with accounts in which knowledge of functional categories overall is deemed incomplete (Loeb & Leonard, 1991; Leonard, 1995). Alternatively, Rice, Wexler, and Cleave (1995) are associated with the extended optional infinitive account, in which children with SLI are thought to remain too long in a developmental phase in which tense is treated as optional. Other accounts see children with SLI as unable to develop implicit grammatical rules (Gopnik, 1990), as developing rules that are too narrow in their application (e.g., Ingram & Carr, 1994), or as lacking the ability to understand the different agreement or dependent relationships existing between functional categories (e.g., Clahsen, 1989; van der Lely, 1996).
Among the significant challenges facing these accounts is their need to provide more complete explanations of the variability in developmental patterns shown by children with SLI and of crosslinguistic differences in the error patterns and development of children with SLI. In addition, despite emerging efforts to tie linguistic accounts to genetic, biological, and environmental accounts (e.g., Gopnik & Crago, 1991), further steps in that direction are needed.

Accounts Positing General Processing Deficits
General processing deficit accounts of SLI place general deficits in cognitive processing at the core of SLI, with the most ambitious of them holding these deficits responsible for both the linguistic and nonlinguistic differences seen in children with SLI (Leonard, 1998). Rather than assume that specific cognitive mechanisms are affected—as is done in the third and final category of accounts—these accounts postulate a more widespread deficiency, offering a simpler, more elegant explanation of the patterns of deficits seen in children with SLI. Such accounts typically describe central cognitive deficits in terms of reductions in processing capacity or speed. Such accounts are particularly compelling as explanations of difficulties in word recall and retrieval and in comprehension, as well as of nonlinguistic cognitive deficits, but must also explain the special difficulties associated with morphosyntax in most English-speaking children with SLI. Among the numerous researchers cited by Leonard (1998) as working on accounts of this type are Ellis Weismer (1985), Bishop (1994), and Edwards and Lahey (1996; Lahey & Edwards, 1996), as well as Leonard himself.
Leonard’s surface hypothesis (e.g., Leonard, 1989; Leonard, Eyer, Bedore, & Grela, 1997) represents one of the most thoroughly probed of the general processing deficit accounts and, consequently, serves here as an important exemplar of such accounts. The surface hypothesis suggests that differences in the pattern of deficits observed crosslinguistically in children with SLI may be due to differences in language structure across languages. Such differences are thought to lead to differences in processing demands rather than to the impaired grammatical systems posited by linguistic accounts. This account emphasizes the importance of surface features of languages, such as the physical properties of English grammatical morphology, that may represent special challenges to children, particularly to those with reduced processing capabilities. According to the surface hypothesis, children with SLI will take longer to acquire the more difficult aspects of their language and may focus their processing efforts in some areas at the expense of others (e.g., on word order at the expense of morphology). Among those features of a language that are considered particularly vulnerable are those that are relatively brief, uncommon in languages of the world, or less regular within the language (e.g., numerous grammatical morphemes in English). Leonard (1998) provides a thorough description of the successes and failures of this account in explaining an ever-expanding body of empirical data from several language groups. Further, he shows the basic compatibility of the surface hypothesis with other processing limitation accounts that emphasize reduced speed of processing. As with the grammatical knowledge accounts, accounts that posit general processing deficits have a wide range of crosslinguistic data to address, including patterns of errors and of acquisition in children with SLI.
Further, the appeal of such accounts in terms of simplicity is enhanced if they can also address similar data for children without impaired language. Add to that the desirability of addressing emerging data on the genetic and biologic factors associated with SLI, and it is small wonder that consensus leading to a unified theory of SLI eludes the research community at this time. The last of the three types of accounts Leonard describes wrestles with this same list of empirical challenges but proposes cognitive limitations that are more specific in nature.

Specific Processing Deficit Accounts of SLI
According to Leonard (1998), three accounts have focused on specific deficits as responsible for far-reaching consequences for language function. Respectively, these accounts hypothesize deficits in phonological memory (Ellis Weismer, Evans, & Hesketh, 1999; Gathercole & Baddeley, 1990), in temporal processing (Tallal, 1976; Tallal & Piercy, 1973; Tallal, Stark, Kallman, & Mellits, 1981), and in the mechanisms used for grammatical analysis (Locke, 1994). These accounts are less well developed than the linguistic and general cognitive deficit accounts in terms of the breadth of data they encompass. Of these accounts, those associated with temporal processing (viz., Stark & Tallal, 1988; Tallal et al., 1996) have had the greatest recent impact, including considerable attention in the popular press (e.g., in a USA Today article [Levy, 1996]).
This attention has largely been the result of the popularization of a specific training program called Fast ForWord (Scientific Learning Corporation, 1998). After a long history of work on SLI, Tallal joined with Michael Merzenich and others to conduct a series of remarkable treatment studies (Merzenich et al., 1996; Tallal et al., 1996). In those studies, use of Fast ForWord, a computer training program designed to address hypothesized processing difficulties, resulted in significant gains in language performance and auditory processing. Development of that program was based on evidence that children with SLI have difficulty processing brief stimuli or stimuli that follow one another in rapid succession—difficulties that might significantly affect a child’s ability to process speech. Further, the program is based on the hypothesis that the deficit can be ameliorated by exposing children with SLI to stimuli that are initially recognizable but acoustically altered through the lengthening of formant transitions. During treatment, children participate in a large number of video-game-like trials in which they are required to make judgments about the altered stimuli. Across trials, the stimulus characteristics are altered in the direction of natural speech. Readers are encouraged to take note of the debate surrounding this account and the commercialization it has fostered (e.g., Gillam, 1999; Veale, 1999). Ironically, the authors of the other accounts discussed in this section of the chapter have appeared to take greater pains to tie together a huge number of empirical clues about the nature of SLI. However, it is rare to find the public so aware of an account—or at least the treatment program associated with it—and so ready to clamor for its use with children presenting with a wide range of communication-related disorders (including reading disabilities and autism).
These public responses alone make it a fascinating area for additional investigation by clinicians and researchers interested in children’s language disorders. Independent validation of this treatment and its theoretical underpinnings has yet to be provided (Gillam, 1999).

What’s Ahead for Accounts of SLI?
In this section, I have tried my best to point out the most important landmarks of this vast and changing terrain (helped considerably by the work of Leonard, 1998, and the urgings of Bernard Grela to address these complex issues). However, I am certain that I have missed some important vantage points and critical roadways. Nonetheless, I hope that this brief overview provides you with a sense of the complexities facing these researchers. The researchers working on this topic have immense amounts of data to address if they are to settle on a truly comprehensive theory, rather than fragmented accounts of isolated aspects of SLI. Not only must they deal with information about how children with SLI perform on a range of language and nonlanguage tasks, they must do so for the wide range of spoken languages and across the life span. Further, they must tie these together with the burgeoning findings about the genetics, brain structures, and social contexts of children with SLI. Other challenges facing researchers interested in SLI have been summarized by Tager-Flusberg and Cooper (1999), who reviewed the findings of a recent National Institutes of Health workshop focused on steps needed to produce clear definitions of SLI
for genetic study. Despite the narrow focus of that conference, the recommendations that came out of it appear germane to thoughts about the relation of theory to assessment practices. Among the recommendations summarized by Tager-Flusberg and Cooper are that researchers abandon exclusionary definitions of SLI, broaden the language domains and information-processing skills they assess, and develop a standard approach to defining SLI, not only in preschoolers but also in older school-age children, adolescents, and adults. These same recommendations are clinically relevant insofar as combining clinical and research efforts may result in the greatest gains in both arenas.

Special Challenges in Assessment

In addition to the theoretical challenges to the assessment of children with SLI, these children also come with a range of personal reactions to testing that are at least partially determined by the amount of success they expect. Any of us who has difficulty in certain areas, such as singing, drawing, or playing sports, knows how uncomfortable we feel when our performance in those areas is evaluated. Consequently, I urge you to refer back to chapter 3 for some of the general guidelines addressed in that chapter, which will serve as a useful exercise in preparing to work with children with SLI. Beyond the personal dynamics that should always be a special consideration in assessment, children with SLI present several problems related to how they are identified as needing help. Plante (1998) pointed out at least three problems with how such children have been identified by researchers. Some of Plante’s concerns about the literature also face clinicians. Even those that do not still deserve the attention of knowledgeable consumers of this research literature.
First, Plante (1998) argued, researchers have tended to use criteria for nonverbal IQ (often nonverbal IQ of 85 or greater) that exclude not only children with mental retardation but large numbers of others whose lower intelligence makes them no less relevant to our understanding of SLI. Second, Plante noted that in the identification process, researchers have tended to use tests and cutoff scores on those tests that have not been shown to successfully identify children with the disorder. Specifically, she questioned two particular aspects of the validity of those tests and cutoffs: their sensitivity (the extent to which individuals with disorders are actually identified as having the disorder) and specificity (the extent to which individuals without disorders are successfully identified as such). (See chap. 9 for more complete explanations of these concepts.) Third, Plante (1998) questioned the use of discrepancy or difference scores in the practice often referred to as cognitive referencing. Cognitive referencing occurs when the identification of SLI hinges on the demonstration of a specific difference between expected language function (based on nonverbal IQ) and language performance. Plante attacked this practice on two grounds: (a) because of a tendency for such comparisons to be based on age-equivalent scores, which are the targets of a long history of criticism from psychometric perspectives (e.g., see chap. 2) and (b) because there is no good evidence to support the use of nonverbal IQ as an indicator of language potential. As just one example of this lack of evidence, Krassowski and Plante (1997) reported a lack of stability in the performance IQ scores of 75 children with SLI over a 3-year time frame that would be inconsistent with their use as a constant
measure of language potential. Plante and her colleagues are joined by large numbers of the community of language researchers in finding serious—many would say fatal—flaws with cognitive referencing (e.g., Aram et al., 1993; Fey et al., 1994; Kamhi, 1998; Lahey, 1988). Along with the instability of categorizations obtained through cognitive referencing, others have noted that similar amounts of improvement in specific treatments are made by children who would fall on both sides of conventional cognitive criteria (e.g., Fey et al., 1994). Even readers who have simply skimmed earlier chapters on their way to this one will recognize certain common dilemmas facing clinicians as well as researchers regarding cognitive referencing. Thus, for example, both groups need to be as careful as possible to select measures that have been studied very carefully for the purpose to which they are being put. That is, evidence of criterion-related validity for how and with whom measures are used is something in which both clinicians and researchers have a prodigious stake. In addition, both groups should avoid the relatively unreliable and misleading nature of age-equivalent scores—insofar as they are able to do so. The “wiggle room” left by that last clause stems from the fact that clinicians may find themselves compelled to use age-equivalent scores by the settings in which they work, particularly for younger children. With regard to cognitive referencing, Casby (1992) noted that in 31 states, eligibility for services based on SLI demands its use in some form. In such situations, an ethical and sensible recommendation would be to provide the required documentation (i.e., to go ahead and report the cognitive-referenced information, age-equivalent scores, or both), but accompany it with appropriate warnings about the limitations of each and recommendations from a more scientifically supportable perspective.
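Plante’s concerns about sensitivity and specificity can be made concrete with a short worked example. The counts below are hypothetical, chosen only to illustrate the arithmetic; they do not come from any study cited in this chapter.

```python
# Hypothetical screening results for 100 children, 20 of whom truly have SLI.
true_positives = 14   # children with SLI whom the test flags
false_negatives = 6   # children with SLI whom the test misses
true_negatives = 72   # children without SLI whom the test clears
false_positives = 8   # children without SLI whom the test flags

# Sensitivity: proportion of children WITH the disorder who are identified.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: proportion of children WITHOUT the disorder identified as such.
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.2f}")  # 0.70
print(f"specificity = {specificity:.2f}")  # 0.90
```

Note that a test can show quite respectable specificity (0.90 here) while still missing nearly a third of affected children, which is exactly the kind of weakness Plante urged test users to check before relying on a published cutoff score.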
In a discussion of problems of differential diagnosis in SLI, Leonard (1998) called attention to a further difficulty associated with the assessment of children considered at risk for the disorder. Specifically, he called attention to the difficulty of distinguishing late talkers, who will ultimately prove to be simply late in developing language, from those children whose late talking foretells persisting problems in language acquisition. Most children with SLI have a history of late talking (which is usually defined in terms of late use of words). However, only one quarter to one half of late talkers will go on to be diagnosed with a language disorder. Developing accurate predictions of which children are showing early signs of SLI has spurred the efforts of a number of researchers who hope that early identification will lead to effective and efficient early intervention (e.g., Paul, 1996; Rescorla, 1991). Unfortunately, the dramatic variability in children’s normal language development is proving a considerable obstacle. Thus, reliable signs yielding reasonably accurate predictions have evaded researchers, leading Leonard (1998) to recommend withholding diagnoses until at least age 3 and Paul (1996) to advise a “watch and see” policy. A differing interpretation of the data on which Paul’s recommendations are based, one that includes a plea for more aggressive intervention, can be found in van Kleeck, Gillam, and Davis (1997). Also urging more aggressive responses to late-talking children, Olswang, Rodriguez, and Timler (1998) offered a somewhat more optimistic reading of the research evidence. Specifically, they outlined speech and language differences and other risk factors that they propose should prompt decisions to intervene. Table 5.2
Table 5.2
Predictors and Risk Factors Useful in Helping Clinicians Decide Whether to Enroll Toddlers Who Are Late Talkers for Intervention

Predictors: Speech

Language production
- Small vocabulary for age
- Few verbs
- Preponderance of general all-purpose verbs (e.g., want, go, get, do, put, look, make, got)
- More transitive verbs (e.g., John hit the ball)
- Few intransitive and ditransitive verb forms (e.g., he sleep, doggie run)

Language comprehension
- Presence of 6-month comprehension gap
- Large comprehension-production gap with comprehension deficit

Phonology
- Few prelinguistic vocalizations
- Limited number of consonants
- Limited variety in babbling structure
- Less than 50% consonants correct (substitution of glottal consonants and back sounds for front)
- Restricted syllable structure
- Vowel errors

Imitation
- Few spontaneous imitations
- Reliance on direct model and prompting in imitation tasks of emerging language forms

Predictors: Nonspeech

Play
- Primarily manipulating and grouping
- Little combinatorial and/or symbolic play

Gestures
- Few communicative gestures, symbolic gestural sequences, or supplementary gestures

Social skills
- Behavior problems
- Few conversational initiations
- Interactions with adults more than peers
- Difficulty gaining access to activities

Risk Factors

Otitis media
- Prolonged periods of untreated otitis media

Heritability
- Family member with persistent language and learning problems

Parent needs
- Parent characteristics: low SES; directive more than responsive interaction style
- Extreme parent concern

Note. From “Recommending Intervention for Toddlers With Specific Language Learning Difficulties: We May Not Have All the Answers, but We Know a Lot,” by L. Olswang, B. Rodriguez, & G. Timler, 1998, American Journal of Speech-Language Pathology, 7, p. 29. Copyright 1998 by American Speech-Language-Hearing Association. Reprinted with permission.
summarizes their list. They recommended that larger numbers of risk factors be viewed as cause for greater concern.

Expected Patterns of Language Performance

The language performance of children with SLI has undergone greater scrutiny than that of any other group of children with language difficulties. The diversity and depth of this research over several decades leads to some clear expectations of areas in which difficulties can be expected, but also to ubiquitous expectations that each child will be different. Therefore, before I delve into expected patterns of difficulties, I should mention again that general expectations lead to hypotheses about what might be expected in a given child—not infallible certainties. Generalizations also fail to capture either the variations found in studies identifying distinct subtypes of SLI (e.g., Aram & Nation, 1975; Rapin & Allen, 1988; Wilson & Risucci, 1986) or the changes in patterns of impairment that occur with age revealed in other studies (e.g., Aram, Ekelman, & Nation, 1984; Stothard, Snowling, Bishop, Chipchase, & Kaplan, 1998; Tomblin, Freese, & Records, 1992). Further, these generalizations have been identified for children acquiring English—a potentially serious limitation for clinicians working with children acquiring other, less-studied languages (Leonard, 1998). Thus, the expected patterns discussed here are described only briefly and are meant to prompt consideration of likely areas of difficulty, not to become the only ones given attention. Among the more robust findings from studies examining language skills in English-speaking children with SLI have been the findings that (a) expressive and receptive language are often differentially impaired, and (b) degree of involvement can vary from quite mild to quite severe.
Also, expressive language tends to be more frequently and severely affected—an observation that is borne out in much of the literature and is also reflected in the DSM–IV (American Psychiatric Association, 1994) definition shared at the beginning of the chapter. Recent research, however, suggests that this disparity may not be as large as has sometimes been thought. Among the children who were found to have impaired language in a report dealing with a large epidemiological study, Tomblin (1996a) identified 35% of children with expressive problems, 28% with receptive problems, and 35% with both expressive and receptive problems (given a cutoff of 1.25 standard deviations below the mean). In Table 5.3, specific areas of difficulty relative to normally developing peers are summarized on the basis of an extensive review of literature appearing in Leonard (1998; cf. Menyuk, 1993; Watkins, 1994). In Table 5.3, the density of comments falling under language production reflects not only the tendency for this modality to be affected by more obvious and often more severe deficits than comprehension, but also a tendency for it to have received much greater research attention. A related table, Table 5.4, lists specific grammatical morphemes that have been identified as particularly problematic. As you examine Table 5.3, notice that many—although not all—of the differences shown by children with SLI resemble patterns seen in younger children and are therefore characterized as delays. This observation may have implications related to the nature of this disorder. In addition, it supports the reasonableness of approaching
Table 5.3
Patterns of Oral Language Impairment by Modality and Domain Reported in Children With Specific Language Impairment (SLI) (Leonard, 1998)

Semantics: Lexical abilities and early word combinations
  Production:
  - Delays in acquiring first words and word combinations
  - Delays in verb acquisition, with overuse of some common verbs (e.g., do, go, get, put, want)
  - Word-finding difficulties (a), especially noted in school-age children
  Comprehension:
  - Deficient in learning to understand new words, particularly those involving actions

Semantics: Argument structure
  Production:
  - Increased tendency to omit obligatory arguments (e.g., omission of the object for a transitive verb) or even the verb itself
  - Increased tendency to omit optional but semantically important information (e.g., adverbials providing information regarding time, location, or manner of action) and use of an infinitival complement (e.g., He wants to do this)
  Comprehension:
  - Increased difficulty in acquiring argument structure information from syntactic information for new verbs

Grammatical morphology (b)
  Production:
  - Constitutes a relative and sometimes enduring weakness in children with SLI (see Table 5.4 for a list of grammatical morphemes that have received particular attention)
  - Grammatical morphology related to verbs is especially affected
  - Errors most often consist of omissions rather than inappropriate use, but are likely to be inconsistent in either case
  Comprehension:
  - Limited research suggests poorer comprehension of grammatical morphemes, especially those of shorter duration, and poorer identification of errors involving grammatical morphemes

Phonology
  Production:
  - Although occasionally occurring alone, phonological deficits are almost always accompanied by other language deficits, and vice versa
  - Delays are most frequently seen, with most errors resembling those of younger normally developing children
  - Unusual errors in production (c) occur rarely, but probably more often than in normally developing children
  - Greater variability in production than in children without SLI at similar stages of phonological development

Pragmatics
  Production:
  - Some evidence of pragmatic difficulties; although these difficulties largely seem due to communication problems posed by other language deficits, independent pragmatic deficits may occur as well
  - Participation in communication is negatively affected when communication involves adults or multiple communication partners
  Comprehension:
  - Limited research suggests that understanding of the speech acts of others may be affected
  - Comprehension of figurative language (e.g., metaphors, idioms) can be affected

Narratives
  Production:
  - Cohesion of narratives can be affected, and sometimes expected story components are absent
  Comprehension:
  - Comprehension of narratives can be affected when inferences need to be drawn from the literal narrative content
Notes. (a) Evidenced by unusually long pauses in speech, frequent circumlocution, or frequent use of nonspecific words such as it and stuff. (b) Grammatical morphology can be defined as “the closed-class morphemes of language, both the morphemes seen in inflectional morphology (e.g., ‘plays,’ ‘played’) and derivational morphology (e.g., ‘fool,’ ‘foolish’), and function words such as articles and auxiliary verbs” (Leonard, 1998, p. 55). (c) Among the unusual errors reported for this population are later developing sounds being used in place of earlier developing sounds, sound segment additions, and use of sounds not heard in the child’s ambient language.

treatment goals from a developmental perspective (Leonard, 1998). Also, notice the expanse of unmapped country revealed here. Despite several decades of work, much remains unknown about the abilities of children with SLI and how those abilities relate to one another. Consequently, the potential for valuable outcomes from experimental exploration is immense! Finally, on a very different note, readers of this table may find that their knowledge of some of the terminology used in linguistic descriptions of these children’s difficulties is outdated or incomplete. They are referred to Hurford (1994) as a reference guide to the more basic grammatical terms.

Related Problems

When compared with children described in other sections of this book, children with SLI have far fewer related problems. Despite the more restricted nature of their difficulties, however, children with SLI are at increased risk for a number of significant, ongoing problems in addition to the lengthening list of subtle perceptual and cognitive deficiencies described briefly earlier. Among these are increased risk for emotional, behavioral, and social difficulties. In addition, there is increased risk for ongoing academic difficulties often associated with diagnoses of learning disabilities (Wallach & Butler, 1994).
Table 5.4
Examples of Grammatical Morphemes, an Area of Special Difficulty for Children With Specific Language Impairment (SLI)

Inflectional morphemes
- Past tense (regular –ed: walked; irregular: slept, flew, hid)
- Third-person singular –s: sits, runs
- Progressive –ing: is running, is seeing
- Plural –s: coats, flowers
- Possessive ’s (also called genitive ’s): Sam’s, dog’s

Other grammatical morphemes
- Copula be: he is a boy; they are happy
- Auxiliary be: she is hunting; he was cooking
- Auxiliary do: I don’t hate you; Do you remember that man?
- Articles: the man; a cat
- Pronouns: anything, herself, I, he, they, them, her
Emotional, Behavioral, and Social Difficulties
The possibility that children with specific language disorders may be at risk for difficulties in personal adjustment has been examined at several levels of severity, ranging from studies examining the prevalence of identifiable psychiatric diagnoses (e.g., Baker & Cantwell, 1987a, 1987b; Beitchman, Nair, Clegg, & Patell, 1986; Beitchman, Brownlie, et al., 1996; Beitchman, Wilson, et al., 1996) to studies examining specific aspects of peer relationships or social maturation (e.g., Craig, 1993; Farmer, 1997; Fujiki & Brinton, 1994; Gertner, Rice, & Hadley, 1994; Records, Tomblin, & Freese, 1992; Rutter, Mawhood, & Howlin, 1992). Studies of this issue differ in a large number of methodological variables (e.g., ages studied, methods used to define language impairment and other problem areas). Nonetheless, a serviceable overview of their findings is that children with SLI are at increased risk for difficulties involving their emotional, behavioral, and social status. Further, this generalization holds both for children and for older individuals with a history of SLI—even when, through treatment or maturational processes alone, they no longer show persisting language impairment (e.g., Rutter, Mawhood, & Howlin, 1992). There is evidence that children with receptive problems, or with both expressive and receptive language problems, are at greater risk than those with expressive problems alone (e.g., Beitchman, Wilson, et al., 1996; Stevenson, 1996). The causal mechanisms involved in the co-occurrence of communication problems and difficulties in emotional, behavioral, and social realms are difficult to discern and far from being understood (Stevenson, 1996). Still, the implications of the co-occurrence alone are important for those who help children with SLI.
Among the specific problems associated with SLI that can be categorized as psychiatric are attention deficit disorder (ADD), conduct disorder, and anxiety disorders (Baker & Cantwell, 1987b). Of these, perhaps the most familiar to many people is attention deficit hyperactivity disorder (ADHD). With
an estimated prevalence of 4% to 6% among elementary-school-aged children, it has been described as the “most common significant behavioral syndrome in children” (Wender, 1995, p. 185). Recall that in the description of Wilson at the beginning of the chapter, it was suspected as a contributor to some of his difficulties in fitting into the classroom and interacting with peers. ADHD is typically diagnosed in children who show patterns of inattention, overactivity–impulsivity, or both, that seem inappropriate for their age and detrimental to functioning (American Psychiatric Association, 1994). Although symptoms of the disorder may be more common in some situations than in others, they occur across settings. Excellent practical recommendations for dealing with the symptoms of this disorder in the classroom are available in Dowdy, Patton, Smith, and Polloway (1998, Appendix A). Conduct disorder is diagnosed in children who demonstrate a repeated and consistent pattern of behavior that is inappropriate for their age and violates social or even legal norms (American Psychiatric Association, 1994; Goldman, 1995). Behaviors associated with this diagnosis can include aggression toward people and animals, destruction of property, deceitfulness, theft, truancy, and running away. Anxiety disorder is diagnosed in children who worry excessively, usually about their performance, with resulting negative effects on their functioning (American Psychiatric Association, 1994). Although the area of concern may shift from time to time, the intensity, duration, and frequency of the anxiety and worry are seen as out of proportion to their actual likelihood or impact. Children with this disorder may be overly concerned about approval and require excessive reassurance about the adequacy of their performance or other focus of concern.
Although diagnoses of attention deficit disorder, conduct disorder, and anxiety disorders are relatively rare among children with language impairment, another view of the association between psychiatric diagnoses and language skills has been taken by researchers who examine the language skills of groups of children seen as psychiatric outpatients. In one relatively recent study, one third of the 399 such children whose language was screened were identified as having an unsuspected language impairment (Cohen, Davine, Horodezky, Lipsett, & Isaacson, 1993). Thus, awareness of this possible association can help speech-language pathologists contribute to the development of children whose emotional and behavioral issues have previously overshadowed very real language difficulties, as well as to that of children for whom a language diagnosis has already been made.

Academic Difficulties
The connection between language difficulties and academic difficulties is a powerful one. In the early grades, academic skills build on language skills used in everyday experience. Later, academic demands, especially for written language acquisition but also for the understanding and use of figurative language, narrative construction, and the use of language in reasoning (Nippold, 1998), help fuel additional gains in language development. At least, that is the way things are thought to work for normally developing children.
Increasingly, it appears that the oral language difficulties of children with SLI may contribute to and be exacerbated by the unsuccessful language experiences they encounter in school. Bashir and Strominger (1996) described the interweaving of oral and written language problems as follows:

It is reasonable to argue that the continued academic vulnerability in children with language disorders in the middle grades reflects both the persistence of language problems and restrictions on later language development resulting from reduced reading as well as restricted exposure to different texts and text-based information. (p. 134)

Thus, not only may language impairments lead to academic difficulties, but difficulties with the language of the academic setting may contribute to children falling further behind their peers in language development. A recent study by Stothard et al. (1998) is quite representative of the literature on later language and academic outcomes (e.g., Hall & Tomblin, 1978; Tomblin et al., 1992; Weiner, 1974) and corroborates some of its more robust findings. The study reports data from the same children seen at ages 4, 5½, and 15½ years. Experimental measures included measures of oral language (receptive vocabulary, expressive vocabulary, general comprehension, grammatical understanding, naming), short-term memory and phonological skills (sentence repetition, nonword repetition, and spoonerisms), and written language (one test assessing single-word reading, single-word spelling, and reading comprehension). In addition, information about children’s special education status was examined. Results indicated that children who were seen as having persisting SLI at age 5½ demonstrated long-standing impairment, with performance at age 15½ falling below that of age-matched peers on all oral language measures.
In particular, 47% of these children obtained verbal composite scores more than 1 standard deviation below the mean, and 20% obtained scores more than 2 standard deviations below the mean. In addition, they showed persisting problems in reading and spelling that had resulted in a high percentage receiving special education assistance of some kind. Even children who seemed to have recovered by age 5½ performed significantly less well than age-matched peers at age 15½ on tests tapping short-term memory and phonological skills. Further, almost a third of these children, 31% (8 of 26), demonstrated performances consistent with the persisting SLI category; that is, they could again be considered language impaired. This finding is consistent with a pattern termed illusory recovery, in which seemingly resolved problems reemerge as the complexity of demands placed on children increases with grade level (Scarborough & Dobrich, 1990). The few studies exploring the language skills and academic accomplishments of adults with a history of SLI (Gopnik & Crago, 1991; Hall & Tomblin, 1978; Plante et al., 1996; Tomblin et al., 1992) confirm that many children with SLI will continue to be plagued by significant differences in language performance that affect other areas of functioning, including school advancement. The Personal Perspective for this chapter illustrates the possibility of long-term academic effects of SLI.
PERSONAL PERSPECTIVE

This perspective was provided by Michele, who told her story to Cynthia Roby, the author of When learning is tough: Kids talk about their learning disabilities (1994). Although Michele primarily discussed her learning disability and its impact on her school experiences, it seems likely that language problems were part of her “learning problem.”

“My apartment is above a video store in the city. I can go right downstairs and rent a movie. I don’t have any brothers or sisters. My mother is Korean. She works at a cafeteria. My dad works at the airport; he’s Chinese. We eat lots of Korean and Chinese food. I like rice and noodles. I love pizza, too.…

“I think my parents found out I had a learning problem when I was two. I had a problem when people would read to me. I would just draw on the books because I couldn’t understand the stories. It was hard for me to understand the words.

“I hated my old school. I felt a little bit mad about having a learning problem. I couldn’t read the words and the other kids could. I had to be sent to a quiet room so I could read. Somebody would help me there. It made me feel happy when I finally got extra help. It didn’t make me feel bad to go to the special classroom.

“Then a few years ago, my parents decided to send me to a special school for kids with learning disabilities. I like it there, and the teachers help me. They treat me nicely and help me with my reading. Even so, recess is still the most fun. I run around the playground with the girls.

“My parents help me with school work. My dad used to show me flash cards. He still helps me with my math, my reading, and my spelling. He made a list of all the math facts I have to learn; it’s taped next to my bed. My parents are good to me. They don’t get angry at me because I have learning problems.

“I think my cousin may have learning problems, too. He is just little. He goes to school, and when the teacher reads a book he won’t listen. He is like me at that age.
“I’m good at art. I like to do self-portraits and paint and do projects. I would rather paint all day instead of doing math or reading. I like classical music. And last year I learned to play ‘Can Can’ on the keyboard. I practiced every day. Sometimes I would mess up a little. Then I would do it over again, and I would do it right.

“I think high school will be hard, very hard. I am going to study biology in college—it’s all about human beings and the body parts. I’ll be a teacher when I grow up. I will tell kids not to fight or pinch. I want to teach little kids. They’re cute!”

Michele’s tip: “I would tell other kids with learning problems to get books and keep trying to read them.”
Although it is the exceptional child with written language difficulties who is without a history of spoken language difficulties (Stark & Tallal, 1988), not all children with SLI go on to be identified with difficulties in written language. Factors that appear to predict later problems in literacy include difficulties with receptive language, phonological awareness, and rapid naming (Leonard, 1998). Phonological awareness is explicit knowledge about the sound structure of the language—for instance, that words are made up of syllables and syllables of individual sounds (Ball, 1993). Other complications to be dealt with in understanding the relationship between language impairment and later academic difficulties are a tendency for lower intelligence and lower SES to encroach as potential confounding variables (Schachter, 1996).

Summary

1. Specific language impairment (SLI) is a research construct designed to help identify a “pure” language disorder and is usually defined in terms of exclusionary as well as inclusionary characteristics, although such definitions are increasingly controversial.
2. Among factors suspected in the causation of SLI are genetics, differences in brain structure and function, and other biological factors. Environmental factors, especially aspects of the child’s social environment, have been examined but appear less important at this time.
3. In addition, linguistic and cognitive accounts of causation in SLI have received extensive attention from researchers. According to Leonard (1998), the three major categories into which these accounts fit are those focused on linguistic knowledge deficits, general processing deficits, and specific processing deficits.
4. Although theoretical understanding of SLI can ultimately be expected to engender major shifts in assessment methods, little translation from experimental assessment tools to those available to practicing clinicians has yet occurred.
More immediate impact may derive from calls for researchers (and clinicians) to assess language and other performance domains more broadly and to seek consensus on diagnostic methods.
5. Special challenges to assessment include problems with the frequent exclusion of children with mental retardation from definitions of SLI, the use of cognitive referencing in research, bureaucratically dictated protocols, and the overuse of measures in identification without sufficient study of their validity for that purpose.
6. An additional challenge is differentiating young late talkers from children who will have a persistent impairment in language.
7. Patterns of language impairment can range from mild to quite severe and can affect both receptive and expressive language. Domains of language that are particularly problematic for young children learning English appear to include morphology, syntax, and phonology.
8. Related problems for these children include somewhat increased risk for emotional, behavioral, and social difficulties, as well as greater risk for persistent academic difficulties.
Key Concepts and Terms

anxiety disorder: an emotional disorder in children in which excessive anxiety and worrying, usually about performance, adversely affect their functioning in school and at home.

attention deficit disorder (with or without hyperactivity): a psychological disorder in which individuals demonstrate excessive inattention and distractibility, impulsivity and hyperactivity, or both when compared with other individuals of the same age.

cognitive referencing: the use of a measure of intelligence (usually nonverbal IQ) as a reference against which to define impaired language; it is based on the assumption that nonverbal cognition represents an upper bound for language function.

concordance: agreement in the presence or absence of a disorder between two individuals in a natural pair (e.g., a pair of identical or fraternal twins).

conduct disorder: a psychological disorder in which there is a persistent pattern of violating others’ rights or societal norms through behaviors such as aggression toward people or animals, destruction of property, theft, or deceitfulness.

effect size: a measure reflecting the magnitude of difference between groups in an experimental study. Whereas statistical significance addresses the reliability of a research finding, effect size provides important information for judging the importance of a statistically significant effect.

Fast ForWord: a computerized treatment developed by Paula Tallal, Michael Merzenich, and their colleagues, based on the premise that SLI is caused by difficulties in temporal processing.
general all-purpose verbs: verbs, such as do and get, that occur with relatively high frequency in the speech of normally developing children but that also tend to be overused in the speech of children who are “late talkers.”

general processing deficit accounts of SLI: explanations of SLI in which processing deficits are presumed to account for both the verbal and the nonverbal difficulties documented in children with SLI. The surface hypothesis is one such account.

incomplete penetrance: the failure of a gene to have the same effect on all individuals who carry it; for example, a gene that is usually associated with a specific disease may not produce that disease in some individuals who carry it.

late talkers: children who show delays in language production that may represent early signs of SLI or simply a delay in language development that is overcome as the child matures.

linguistic accounts of SLI: accounts in which deficits in linguistic knowledge are considered the core deficits in children with SLI. Rice, Wexler, and Cleave’s extended optional infinitive account is an example of this type.
magnetic resonance imaging (MRI): a relatively noninvasive radiographic technique used to study brain structure in living individuals.

phenotype: the behavioral outcome for which a genetic explanation is sought (Rice, 1996).

phonological awareness: explicit knowledge about the sound structure of the language, for example, knowing that words are made up of syllables, and syllables of individual sounds.

proband: the affected individual in a genetic study, whose identified disorder or difficulty leads researchers to include that person and members of his or her family in genetic research.

recast: a restatement of a child’s production using grammatically correct structures, often incorporating morphosyntactic forms that had been omitted or produced in error by the child.

risk factors: factors associated with an increased likelihood that a disorder will occur; these factors may or may not represent causes.

specific language impairment (SLI): delayed acquisition of language skills, usually defined as occurring in the absence of impairments in other areas of functioning, such as nonverbal cognition and hearing.

specific processing deficit accounts of SLI: explanations of SLI in which specific processing deficits (e.g., in auditory processing or phonological working memory) are thought to account for the language and other difficulties associated with SLI. Tallal’s account based on temporal processing deficits is one example.

Study Questions and Questions to Expand Your Thinking

1. How might knowledge that SLI is sometimes “caused” by differences in brain structure affect diagnosis? How might it affect treatment?
2. Remembering that co-occurrence does not mean causation, consider the significance of a physical marker, such as a specific neurological anomaly, for SLI. What other mechanisms might explain its presence besides a role in causing the appearance of language learning difficulties?
3.
If you were the parent of a child with SLI, what might you want to know about the genetics of this condition? How might you, as a clinician, explain this information, and where could you suggest that both you and the parent obtain additional information?
4. Describe three possible co-occurring problems that may affect the communication and test-taking behaviors of a child with SLI.
5. On the basis of your reading, what domains of language and communication have been considered important by researchers? Can you find standardized tests that correspond to these areas?
6. What research questions do you think are most important for furthering our understanding of this condition?

Recommended Readings

Gilger, J. W. (1995). Behavioral genetics: Concepts for research and practice in language development and disorders. Journal of Speech and Hearing Research, 38, 1126–1142.
Gillam, R. (1999). Computer-assisted language intervention using Fast ForWord: Theoretical and empirical considerations for clinical decision-making. Language, Speech, and Hearing Services in Schools, 30, 363–370.
Hurford, J. R. (1994). Grammar: A student’s guide. Cambridge, England: Cambridge University Press.
Leonard, L. (1998). Children with specific language impairment. Cambridge, MA: MIT Press.

References

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Aram, D. M., & Eisele, J. A. (1994). Limits to a left hemisphere explanation for specific language impairment. Journal of Speech and Hearing Research, 37, 824–830.
Aram, D. M., Ekelman, B., & Nation, J. (1984). Preschoolers with language disorders: 10 years later. Journal of Speech and Hearing Research, 27, 232–244.
Aram, D. M., Morris, R., & Hall, N. E. (1993). Clinical and research congruence in identifying children with specific language impairment. Journal of Speech and Hearing Research, 36, 580–591.
Aram, D. M., & Nation, J. (1975). Patterns of language behavior in children with developmental language disorders. Journal of Speech and Hearing Research, 18, 229–241.
Baker, L., & Cantwell, D. (1987a). Comparison of well, emotionally disordered and behaviorally disordered children with linguistic problems. Journal of the American Academy of Child and Adolescent Psychiatry, 26, 193–196.
Baker, L., & Cantwell, D. (1987b). A prospective psychiatric follow-up of children with speech/language disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 26, 546–553.
Ball, E. W. (1993).
Assessing phoneme awareness. Language, Speech, and Hearing Services in Schools, 24, 130–139.
Bashir, A., & Strominger, A. (1996). Children with developmental language disorders: Outcomes, persistence, and change. In M. D. Smith & J. S. Damico (Eds.), Childhood language disorders (pp. 119–140). New York: Thieme.
Beitchman, J. H., Brownlie, E. B., Inglis, A., Wild, J., Ferguson, B., Schachter, D., Lancee, W., Wilson, B., & Mathews, R. (1996). Seven-year follow-up of speech/language impaired and control children: Psychiatric outcome. Journal of Child Psychology and Psychiatry, 37, 961–970.
Beitchman, J. H., Nair, R., Clegg, M., & Patell, P. G. (1986). Prevalence of psychiatric disorders in children with speech and language disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 25, 528–535.
Beitchman, J. H., Wilson, B., Brownlie, E. B., Walters, H., Inglis, A., & Lancee, W. (1996). Long-term consistency in speech/language profiles: II. Behavioral, emotional, and social outcomes. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 815–825.
Bishop, D. V. M. (1992a). The biological basis of specific language impairment. In P. Fletcher & D. Hall (Eds.), Specific speech and language disorders in children (pp. 2–17). San Diego, CA: Singular Press.
Bishop, D. V. M. (1992b). The underlying nature of specific language impairment. Journal of Child Psychology and Psychiatry, 33, 3–66.
Bishop, D. V. M. (1993). Language development after focal brain damage. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 203–219). Hove, UK: Lawrence Erlbaum Associates.
Bishop, D. V. M. (1994). Grammatical errors in specific language impairment: Competence or performance limitations? Applied Psycholinguistics, 15, 507–550.
Bishop, D. V. M., & Edmundson, A. (1987). Language-impaired 4-year-olds: Distinguishing transient from persistent impairment. Journal of Speech and Hearing Disorders, 52, 156–173.
Bondurant, J., Romeo, D., & Kretschmer, R. (1983). Language behaviors of mothers of children with normal and delayed language. Language, Speech, and Hearing Services in Schools, 14, 233–242.
Brzustowicz, L. (1996). Looking for language genes: Lessons from complex disorder studies. In M. Rice (Ed.), Towards a genetics of language (pp. 3–25). Mahwah, NJ: Lawrence Erlbaum Associates.
Camarata, S., & Swisher, L. (1990). A note on intelligence assessment within studies of specific language impairment. Journal of Speech and Hearing Research, 33, 205–207.
Casby, M. (1992). The cognitive hypothesis and its influence on speech-language services in schools. Language, Speech, and Hearing Services in Schools, 23, 198–202.
Chomsky, N. (1986). Barriers. Cambridge, MA: MIT Press.
Clahsen, H. (1989). The grammatical characterization of developmental dysphasia. Linguistics, 27, 897–920.
Clark, M., & Plante, E. (1998). Morphology of the inferior frontal gyrus in developmentally language-disordered adults. Brain and Language, 61(2), 288–303.
Cohen, M., Campbell, R., & Yaghmai, F. (1989). Neuropathological abnormalities in developmental dysphasia. Annals of Neurology, 25, 567–570.
Cohen, N. J., Davine, M., Horodezky, N., Lipsett, L., & Isaacson, L. (1993). Unsuspected language impairment in psychiatrically disturbed children: Prevalence and language and behavioral characteristics. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 595–603.
Crago, M., & Gopnik, M. (1994). From families to phenotypes. In R. Watkins & M. Rice (Eds.), Specific language impairments in children (pp. 35–51). Baltimore: Paul H. Brookes.
Craig, H. K.
(1993). Social skills of children with specific language impairment: Peer relationships. Language, Speech, and Hearing Services in Schools, 24, 206–215.
Cunningham, C., Siegel, L., van der Spuy, H., Clark, M., & Bow, S. (1985). The behavioral and linguistic interactions of specifically language-delayed and normal boys with their mothers. Child Development, 56, 1389–1403.
Dowdy, C. A., Patton, J. R., Smith, T. E. C., & Polloway, E. A. (1998). Attention deficit/hyperactivity disorder in the classroom: A practical guide for teachers. Austin, TX: Pro-Ed.
Edwards, J., & Lahey, M. (1996). Auditory lexical decisions of children with specific language impairment. Journal of Speech and Hearing Research, 39, 1263–1273.
Ellis Weismer, S. (1985). Constructive comprehension abilities exhibited by language-disordered children. Journal of Speech and Hearing Research, 28, 175–184.
Ellis Weismer, S., Evans, J., & Hesketh, L. J. (1999). An examination of verbal working memory capacity in children with specific language impairment. Journal of Speech, Language, and Hearing Research, 42, 1249–1260.
Farmer, M. (1997). Exploring the links between communication skills and social competence. Educational and Child Psychology, 14(3), 38–44.
Fey, M., Long, S. H., & Cleave, P. L. (1994). Reconsideration of IQ criteria in the definition of specific language impairment. In R. Watkins & M. Rice (Eds.), Specific language impairments in children (pp. 161–178). Baltimore: Paul H. Brookes.
Fujiki, M., & Brinton, B. (1994). Social competence and language impairment in children. In R. V. Watkins & M. L. Rice (Eds.), Specific language impairments in children (pp. 123–143). Baltimore: Paul H. Brookes.
Galaburda, A., Sherman, G., Rosen, G., Aboitiz, F., & Geschwind, N. (1985). Developmental dyslexia: Four consecutive patients with cortical anomalies. Annals of Neurology, 18, 222–233.
Gathercole, S., & Baddeley, A. (1990).
Phonological memory deficits in language disordered children: Is there a causal connection? Journal of Memory and Language, 29, 336–360. Gauger, L., Lombardino, L., & Leonard, C. (1997). Brain morphology in children with specific language impairment. Journal of Speech, Language, and Hearing Research, 40, 1272–1284. Gertner, B. L., Rice, M. L., & Hadley, P. A. (1994). Influence of communicative competence on peer preferences in a preschool classroom. Journal of Speech and Hearing Research, 37, 913–923.
Page 142 Geschwind, N., & Levitsky, W. (1968). Human brain: Asymmetries in the temporal speech region. Science, 161, 186–187. Gilger, J. W. (1995). Behavioral genetics: Concepts for research and practice in language development and disorders. Journal of Speech and Hearing Research, 38, 1126–1142. Gillam, R. (1999). Computer assisted language intervention using Fast ForWord: Theoretical and empirical considerations for clinical decisionmaking. Language, Speech, and Hearing Services in Schools, 30, 363–370. Goldman, S. (1995). Disruptive behavior, lying, stealing, and aggression. In S. Parker & B. Zuckerman (Eds.), Behavioral and developmental pediatrics (pp. 110– 115). Boston: Little, Brown. Gopnik, M. (1990). Featureblind grammar and dysphasia. Nature, 344, 715. Gopnik, M., & Crago, M. (1991). Familial aggregation of a developmental language disorder. Cognition, 39, 1–50. Hadley, P., & Rice, M. L. (1991). Conversational responsiveness of speech and languageimpaired preschoolers. Journal of Speech and Hearing Research, 34, 1308–1317. Hall, P., & Tomblin, B. (1978). A followup study of children with articulation and language disorders. Journal of Speech and Hearing Disorders, 43, 227–241. Hurford, J. R. (1994). Grammar: A student’s guide. Cambridge, England: Cambridge University Press. Ingram, D., & Carr, L. (1994, November). When morphology ability exceeds syntactic ability: A case study. Paper presented at the Convention of the American SpeechLanguageHearing Association, New Orleans, LA. Jackson, T., & Plante, E. (1996). Gyral morphology in the posterior sylvian region in families affected by developmental language disorder. Neuropsychology Review, 6(2), 81–94. Jernigan, T., Hesselink, J., Sowell, E., & Tallal, P. (1991). Cerebral structure on magnetic resonance imaging in language and learningimpaired children. Archives of Neurology, 48, 539–545. Johnston, J. R. (1992). Cognitive abilities of languageimpaired children. In P. Fletcher & D. 
Hall (Eds.), Specific speech and language disorders in children (pp. 105–116). San Diego, CA: Singular Press. Johnston, J. R. (1993). Definition and diagnosis of language development disorders. In G. Blanken, J. Dittman, H. Grimm, J. C. Marshall, C. W. Wallesch (Eds.), Linguistic disorders and pathologies: An international handbook (pp. 574–585). Berlin, Germany: deGruyter. Kamhi, A. (1993). Children with specific language impairment (developmental dysphasia): Perceptual and cognitive aspects. In G. Blanken, J. Dittman, H. Grimm, J. C. Marshall, C. W. Wallesch (Eds.), Linguistic disorders and pathologies: An international handbook (pp. 574–585). Berlin, Germany: deGruyter. Kamhi, A. (1998). Trying to make sense of developmental language disorders. Language, Speech, and Hearing Services in Schools, 29, 35–44. Krassowski, E., & Plante, E. (1997). IQ variability in children with SLI: Implications for use of cognitive referencing in determining SLI. Journal of Communication Disorders, 30, 1–9. Kuehn, D. P., Lemme, M. L., & Baumgartner, J. M. (1989). Neural bases of speech, hearing, and language. San Antonio, TX: ProEd. Lahey, M. (1988). Language disorders and language development. New York: Macmillan. Lahey, M., & Edwards, J. (1995). Specific language impairment: Preliminary investigation of factors associated with family history and with patterns of language performance. Journal of Speech and Hearing Research, 38, 643–657. Lahey, M., & Edwards, J. (1996). Why do children with specific language impairment name pictures more slowly than their peers? Journal of Speech and Hearing Research, 39, 1081–1098. Leonard, L. (1989). Language learnability and specific language impairment in children. Applied Psycholinguistics, 10, 179–202. Leonard, L. (1995). Functional categories in the grammars of children with specific language impairment. Journal of Speech and Hearing Research, 38, 1270–1283. Leonard, L. (1998). Children with specific language impairment. 
Cambridge, MA: MIT Press. Leonard, L., Eyer, J., Bedore, L., & Grela, B. (1997). Three accounts of the grammatical morpheme difficulties of Englishspeaking children with specific language impairment. Journal of Speech and Hearing Research, 40, 741–753.
Page 143 Levy, D. (1996, January 5). Sound games help teach impaired kids. USA Today, p. 1A. Locke, J. (1994). Gradual emergence of developmental language disorders. Journal of Speech and Hearing Disorders, 37, 608–616. Loeb, D., & Leonard, L. (1991). Subject case marking and verb morphology in normally developing and specifically languageimpaired children. Journal of Speech and Hearing Research, 34, 340–346. Menyuk, P. (1993). Children with specific language impairment (Developmental dysphasia): Linguistic aspects. In G. Blanken, J. Dittman, H. Grimm, J. C. Marshall, C. W. Wallesch (Eds.), Linguistic disorders and pathologies. An international handbook (pp. 606–625). Berlin, Germany: deGruyter. Merzenich, M., Jenkins, W., Johnston, P., Schreiner, C., Miller, S., & Tallal, P. (1996, January). Temporal processing deficits of languagelearning impaired children ameliorated by training. Science, 271, 77–81. Neils, J., & Aram. D. (1986). Family history of children with developmental language disorders. Perceptual and Motor Skills, 63, 655–658. Nelson, K. E., Camarata, S., Welsh, J., Butkovsky, L., & Camarata, M. (1996). Effects of imitative and conversational recasting treatment on the acquisition of grammar in children with specific language impairment and younger languagenormal children. Journal of Speech and Hearing Research, 39, 850–859. Newhoff, M. (1977). Maternal linguistic behavior in relation to the linguistic and developmental ages of children. Unpublished doctoral dissertation, Memphis State University, Tennessee. Nippold, M. A. (1998). Later language development: The schoolage and adolescent years. (2nd ed.). Austin, TX: ProEd. Olswang, L., Rodriguez, B., & Timlet, G. (1998). Recommending intervention for toddlers with specific language learning difficulties: We may not have all the answers, but we know a lot. American Journal of SpeechLanguage Pathology, 7, 23–32. Paul, R. (1996). 
Clinical implications of the natural history of slow expressive language development. American Journal of SpeechLanguage Pathology, 5, 5–21. Pembrey, M. (1992). Genetics and language disorder. In P. Fletcher & D. Hall (Eds.), Specific speech and language disorders in children (pp. 51–62). San Diego, CA: Singular Press. Plante, E. (1991). MRI findings in the parents and siblings of specifically languageimpaired boys. Brain and Language, 41, 67–80. Plante, E. (1996). Phenotypic variability in brainbehavior studies of specific language impairment. In M. Rice (Ed.), Toward a genetics of language (pp. 317–335). Mahwah, NJ: Lawrence Erlbaum Associates. Plante, E. (1998). Criteria for SLI: The Stark and Tallal legacy and beyond. Journal of Speech, Language, and Hearing Research, 41, 951–957. Plante, E., Shenkman, K., & Clark, M. (1996). Classification of adults for family studies of developmental language disorders. Journal of Speech and Hearing Research, 39, 661–667. Plante, E., Swisher, L., & Vance, R. (1989). Anatomical correlates of normal and impaired language in a set of dizygotic twins. Brain and Language, 37, 643–655. Plante, E., Swisher, L., Vance, R., & Rapcsak, S. (1991). MRI findings in boys with specific language impairment. Brain and Language, 41, 52–66. Rapin, I. (1996). Practitioner review: Developmental language disorders: A clinical update. Journal of Child Psychology and Psychiatry, 37, 643–655. Rapin, I., & Allen, D. (1983). Developmental language disorders: Nosologic considerations. In U. Kirk (Ed.), Neurospychology of language, reading, and spelling (pp. 155–184). New York: Academic Press. Rapin, I., & Allen, D. (1988). Syndromes in developmental dysphasia and adult aphasia. In F. Plum (Ed.), Language, communication, and the brain (pp. 57–75). New York: Raven Press. Records, N. L., Tomblin, J. B., & Freese, P. (1992). The quality of life of young adults with histories of speechlanguage impairment. 
American Journal of Speech Language Pathology, 1, 44–53. Rescorla, L. (1991). Identifying expressive language delay at age 2. Topics in Language Disorders, 11(4), 14–20. Rice, M. L. (1996). Of language, phenotypes, and genetics: Building a crossdisciplinary platform for inquiry. In M. Rice (Ed.), Towards a genetics of language (pp. xi–xxv), Mahwah, NJ: Lawrence Erlbaum Associates.
Page 144 Rice, M. L., Wexler, K. and Cleave, P. (1995). Specific language impairment as a period of extended optional infinitive. Journal of Speech and Hearing Research, 38, 850–863. Roby, C. (1994). When learning is tough: Kids talk about their learning disabilities. Morton Grove, IL: Albert Whitman & Company. Rutter, M., Mawhood, L., & Howlin, P. (1992). Language delay and social development. In P. Fletcher & D. Hall (Eds.), Specific speech and language disorders in children (pp. 63–78). San Diego, CA: Singular Press. Scarborough, H., & Dobrich, W. (1990). Development of children with early language delay. Journal of Speech and Hearing Research, 33, 70–83. Schachter, D. C. (1996). Academic performance in children with speech and language impairment: A review of followup research. In J. H. Beitchman, N. J. Cohen, M. M. Konstantaras, & R. Tannock (Eds.), Language, learning, and behavior disorders: Developmental, biological, and clinical perspectives (pp. 515–529). Cambridge, England: Cambridge University Press. Scientific Learning Corporation. (1998). Fast ForWord [Computer software]. Berkeley, CA: Author. Snow, C. E. (1996). Toward a rational empiricism: Why interactionism is not behavior any more than biology is genetics. In M. L. Rice (Ed.), Toward a genetics of language (pp. 377–396). Mahwah, NJ: Lawrence Erlbaum Associates. Stark, R. E., & Tallal, P. (1981). Selection of children with specific language deficits. Journal of Speech and Hearing Disorders, 46, 114–122. Stark, R. E., & Tallal, P. (1988). R. J. McCauley (Ed.), Language, speech, and reading disorders in children: Neuropsychological studies. Boston: Little, Brown/CollegeHill. Stevenson, J. (1996). Developmental changes in the mechanisms linking language disabilities and behavior disorders. In J. H. Beitchman, N. J. Cohen, M. M. Konstantaras, & R. Tannock (Eds.), Language, learning, and behavior disorders: Developmental, biological, and clinical perspectives (pp. 78–99). 
Cambridge, England: Cambridge University Press. Stothard, S. E., Snowling, M. J., Bishop, D. V. M., Chipchase, B. B., & Kaplan, C. A. (1998). Languageimpaired preschoolers: A followup into adolescence. Journal of SpeechLanguageHearing Research, 41,407–418. TagerFlusberg, H., & Cooper, J. (1999). Present and future possibilities for defining a phenotype for specific language impairment. Journal of Speech, Language, and Hearing Research, 42, 1273–1278. Tallal, P. (1976). Rapid auditory processing in normal and disordered language development. Journal of Speech and Hearing Research, 19, 561–571. Tallal, P., & Piercy, M. (1973). Developmental aphasia: Impaired rate of nonverbal processing as a function of sensory modality. Neuropsychologia, 11, 389–398. Tallal, P., Ross, R., & Curtiss, S. (1989). Familial aggregation in specific language impairment. Journal of Speech & Hearing Disorders, 54, 167–173. Tallal, P., Stark, R. E., Kallman, C., & Mellits, D. (1980). Developmental aphasia: The relation between acoustic processing deficits and verbal processing. Neuropsychologia, 18, 273–284. Tallal, P., Miller, S., Bedi, G., Byma, G., Wang, X., Nagarajan, S., Schreiner, C., Jenkins, W., & Merzenich, M. (1996, January). Language comprehension in languagelearning impaired children with acoustically modified speech, Science, 271, 81–84. Teszner, D., Tzavares, A., Grunder, J., & Hecaen, H. (1972). L’asymetrie droitegauche du planum temporale: A propos de l’etude anatomique de 100 cerveau [Rightleft asymmetry of the planum temporale; apropos of the anatomical study of 100 brains]. Neurological Review, 146, 444–449. Tomblin, J. B. (1989). Familial concentration of developmental language impairment. Journal of Speech and Hearing Disorders, 54, 287–295. Tomblin, J. B. (1996a, June). The big picture of SLI: Results of an epidemiologic study of SLI among kindergarten children. Paper presented at the Symposium for Research in Child Language Disorders, Madison, WI. Tomblin, J. B. 
(1996b). Genetic and environmental contributions to the risk for specific language impairment. In M. Rice (Ed.), Toward a genetics of language (pp. 191–210). Hillsdale, NJ: Lawrence Erlbaum Associates.
Page 145 Tomblin, J. B., & Buckwalter, P. (1994). Studies of genetics of specific language impairment. In R. Watkins & M. Rice (Eds.), Specific language impairments in children (pp. 17–35). Baltimore: Paul H. Brookes. Tomblin, J. B., Freese, P., & Records, N. (1992). Diagnosing specific language impairment in adults for the purpose of pedigree analysis. Journal of Speech and Hearing Research, 35, 832–843. Tomblin, J. B., Records, N. L., Buckwalter, P., Zhang, X., Smith, E., & O’Brien, M. (1997). Prevalence of specific language impairment in kindergarten children. Journal of Speech, Language, and Hearing Research, 40, 1245–1260. Trauner, D., Wulfeck, B., Tallal, P., & Hesselink, J. (1995). Neurologic and MRI profiles of language impaired children. Technical Report CND9513. Center for Research in Language, University of California at San Diego. van der Lely, H. (1996). Specifically language impaired and normally developing children: Verbal passive vs. adjectival passive interpretation. Lingua, 98, 243–272. van Kleeck, A., Gillam, R. B., & Davis, B. (1997). When is “watch and see” warranted? A response to Paul’s 1996 article “Clinical implications of the natural history of slow expressive language development.” American Journal of SpeechLanguage Pathology, 6, 34–39. VarghaKadeem, F., Watkins, K., Alcock, K., Fletcher, P., & Passingham, R. (1995). Praxic and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder. Proceedings of the National Academy of Sciences, 92, 930–933. Veale, T. K. (1999). Targeting temporal processing deficits through Fast ForWord: Language therapy with a new twist. Language, Speech, and Hearing Services in Schools, 30, 353–362. Wallach, G. P., & Butler, K. G. (1994). Language learning disabilities in schoolage children and adolescents: Some principles and applications. New York: Macmillan. Watkins, R. V. (1994). Specific language impairments in children: An introduction. In R. V. Watkins & M. 
L. Rice (Eds.), Specific language impairments in children (pp. 1–15). Baltimore: Paul H. Brookes. Wender, E. (1995). Hyperactivity. In S. Parker & B. Zuckerman (Eds.), Behavioral and developmental pediatrics (pp. 185–194). Boston: Little, Brown. Weiner, P. (1974). A languagedelayed child at adolescence. Journal of Speech and Hearing Disorders, 39, 202–212. Wilson, B., & Risucci, D. (1986). A model for clinicalquantitative classification. Generation I: Application to languagedisordered preschool children. Brain & Language, 27, 281–309.
CHAPTER 6

Children with Mental Retardation

Defining the Problem
Suspected Causes
Special Challenges in Assessment
Expected Pattern of Strengths and Weaknesses
Related Problems

Tracy, a 10-year-old with Down syndrome, attends a regular classroom, where her voice often rings out as she expresses exuberant enthusiasm for all the fun things that happen. Tracy speaks in short sentences that are frequently difficult to understand. Although she sometimes shows considerable frustration with others' not understanding her, most of the time Tracy appears oblivious to their lack of understanding. A speech-language pathologist works with her on goals related to syntax and intelligibility, usually within the classroom.

Seth, a 4-year-old with cerebral palsy and epilepsy as well as mental retardation, attends a special preschool classroom irregularly because of his frequent illnesses. In the classroom, he spends much of his time in a wheelchair or adaptive seat, which was designed to provide him with the postural support needed for him to control his head movements. In addition to working with him in the classroom, a speech-language pathologist visits his home once a week to work with Seth and his mother. Seth vocalizes infrequently and often seems unaware of others in his environment. Goals for him
include establishing nonverbal turn-taking skills and increasing the frequency of his vocalizations.

Jake is a 12-year-old boy with mild mental retardation associated with fetal alcohol syndrome. Although his comprehension skills test within the normal range and he is generally understandable in his language production, Jake has considerable difficulty following directions in school. He has been diagnosed with ADD and requires frequent redirecting to stay involved in classroom activities. Although he is eager to establish friendships with his classmates, his ability to use social cues to guide his communications appears inconsistent. Intervention for Jake includes individual attention within the classroom and participation in a social skills group with the speech-language pathologist once per week.

Defining the Problem

Tracy, Seth, and Jake are representative of the approximately 3% of school-age children in the United States who exhibit problems associated with mental retardation (Roeleveld, Zielhuis, & Gabreels, 1997), where mental retardation can be defined as reduced intelligence accompanied by reduced adaptive functioning, that is, reduced ability to function in everyday situations in a manner considered culturally and developmentally appropriate. Because communication is a particularly important adaptive function affected by mental retardation, speech-language pathologists often work with affected children and their families.

About 85% of children with mental retardation experience mild problems (Lubetsky, 1990) and may not be identified as mentally retarded until they reach school age. Children with more significant degrees of impairment are often identified earlier because their delays in achieving developmental milestones are more pronounced and because they often have additional medical difficulties, such as cerebral palsy or epilepsy (Durkin & Stein, 1996). Although mental retardation is usually present from birth, it can also be diagnosed for conditions occurring up to 18 years of age, including exposure to environmental toxins such as lead over the first few years of life.

Despite the brief definition offered earlier, formulating a more complete, usable definition of mental retardation that is equally acceptable to families, advocates, scientists, clinicians, and politicians has proved controversial and difficult—some would say impossible—particularly where milder forms of retardation are concerned (Baumeister, 1997; Roeleveld et al., 1997). Table 6.1 provides two of the most influential definitions currently being used—those proposed by the American Association on Mental Retardation (AAMR) and the American Psychiatric Association.

The AAMR and American Psychiatric Association definitions both specify impairment in adaptive skills as a critical element in the identification process. Traditionally, IQ score alone, with less attention to adaptive skills, was central to the identification process. These two newer definitions address essentially the same adaptive skills (viz., communication, self-care, home living, social skills, community use, self-direction, health and safety, functional academics, leisure, and work). Despite this uniformity, however, these definitions are still quite controversial because of significant concerns
Table 6.1 Two Influential Definitions of Mental Retardation

American Association on Mental Retardation (Luckasson, 1992)

Mental retardation refers to substantial limitations in present functioning. It is characterized by significantly subaverage intellectual functioning, existing concurrently with related limitations in two or more of the following applicable adaptive skill areas: communication, self-care, home living, social skills, community use, self-direction, health and safety, functional academics, leisure, and work. Mental retardation manifests before age 18.

American Psychiatric Association (1994)

Diagnostic criteria:
A. Significantly subaverage intellectual functioning: an IQ of approximately 70 or below on an individually administered IQ test (for infants, a clinical judgment of significantly subaverage intellectual functioning);
B. Concurrent deficits or impairments in present adaptive functioning (i.e., the person's effectiveness in meeting the standards expected for his or her age by his or her cultural group) in at least two of the following areas: communication, self-care, home living, social and interpersonal skills, use of community resources, self-direction, functional academic skills, work, leisure, health, and safety;
C. The onset is before age 18 years.
about the lack of valid measures for many adaptive skill areas (e.g., Jacobson & Mulick, 1996; Macmillan & Reschly, 1997) and because of debates about the number of dimensions needed to capture adaptive functioning (Simeonsson & Short, 1996).

Although not evident in Table 6.1, the complete AAMR and American Psychiatric Association definitions differ sharply in their handling of severity. Whereas the American Psychiatric Association definition maintains a traditional treatment of severity using a system with five levels (see Table 6.2), the AAMR system (Luckasson, 1992) replaces those levels with a description of the levels of support needed by the individual (intermittent, limited, extensive, and pervasive) for intellectual ability and for each adaptive skill separately. Because treatment recommendations are often formulated on the basis of severity (Durkin & Stein, 1996), this change in the AAMR definition represents a major departure from longstanding practice.

Table 6.2 Degrees of Severity of Mental Retardation Used by the American Psychiatric Association (DSM-IV, 1994)

Degree: IQ level
Mild mental retardation: 50–55 to approximately 70
Moderate mental retardation: 35–40 to 50–55
Severe mental retardation: 20–25 to 35–40
Profound mental retardation: Below 20 or 25
Mental retardation, severity unspecified: Used when there is a strong presumption of mental retardation but the individual cannot be tested using standardized instruments
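For readers who find it helpful to see the identification logic laid out explicitly, the sketch below renders the American Psychiatric Association criteria (Table 6.1) and severity bands (Table 6.2) in Python. This is purely illustrative: the function names are invented here, the single cutoffs are simplifications of DSM-IV's deliberately fuzzy boundaries ("approximately 70," "50-55 to..."), and actual identification rests on clinical judgment and attention to measurement error, not a threshold check.

```python
# Illustrative sketch only: simplified rendering of the DSM-IV-style
# diagnostic criteria (Table 6.1) and severity bands (Table 6.2).
# Names and exact cutoffs are assumptions made for this example.

ADAPTIVE_AREAS = {
    "communication", "self-care", "home living",
    "social and interpersonal skills", "use of community resources",
    "self-direction", "functional academic skills", "work",
    "leisure", "health", "safety",
}

def meets_dsm_iv_criteria(iq, impaired_areas, onset_age):
    """True only if simplified versions of criteria A, B, and C all hold."""
    a = iq <= 70                                        # A: subaverage IQ
    b = len(set(impaired_areas) & ADAPTIVE_AREAS) >= 2  # B: >= 2 adaptive areas
    c = onset_age < 18                                  # C: onset before age 18
    return a and b and c

def dsm_iv_severity(iq):
    """Map an IQ score to a DSM-IV severity label (simplified cutoffs)."""
    if iq > 70:
        return "not classified"
    if iq >= 50:       # band given as 50-55 to approximately 70
        return "mild"
    if iq >= 35:       # band given as 35-40 to 50-55
        return "moderate"
    if iq >= 20:       # band given as 20-25 to 35-40
        return "severe"
    return "profound"  # below 20 or 25

print(meets_dsm_iv_criteria(65, ["communication", "self-care"], 5))  # True
print(dsm_iv_severity(65))                                           # mild
```

Note how criterion B makes adaptive impairment, not IQ alone, decisive: a child with an IQ of 65 but deficits in only one adaptive area would not meet this definition.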
Radical changes in definitions such as those just described can affect the ways in which governmental and other agencies determine which children are eligible for assistance. They also affect researchers, who must identify the group of individuals to whom their research can be generalized, and clinicians as they work within the bureaucracy to help affected children and their families (Macmillan & Reschly, 1997). Therefore, although wrangles over definitions can seem irrelevant to a basic understanding of language impairment and its assessment in children with mental retardation, they are powerful in determining how such children can be helped. For example, depending on which of the two definitions described in this section is used and exactly how it is implemented, Jake, the third child described at the beginning of the chapter, might not be identified as a child requiring special attention in the school setting.

Suspected Causes

Until the past decade, only about 25% of cases of mental retardation were associated with known organic causes (e.g., Down syndrome, perinatal trauma; Grossman, 1983). Recent advances, however, bring that figure up to about 50% (American Psychiatric Association, 1994; Baumeister, 1997), with a wide range of organic causes now identified. Such causes are often associated with more severe cases of mental retardation (Rosenberg & Abbeduto, 1993).

Organic Causes

Classification of the many pre-, peri-, and postnatal organic causes of mental retardation reveals human vulnerability to a myriad of factors that can alter later neurologic development and function. Table 6.3 presents a lengthy but far from complete list of predisposing factors. Knowledge of causation can help in efforts to prevent retardation in some individuals, to counsel families regarding its likelihood of recurring in later children, and to develop treatments that can prevent or ameliorate long-term negative consequences.

Three important known causes of mental retardation are Down syndrome, fragile X syndrome, and fetal alcohol syndrome. Each of these conditions is described as a syndrome because it is associated with a "common set of physical traits or malformations sharing a similar prognosis" (Batshaw & Perret, 1981). Two of these syndromes have genetic causes; the third, fetal alcohol syndrome, has a preventable cause—namely, intrauterine exposure to alcohol, a powerful toxin to the developing brain. Consideration of these syndromes demonstrates the intimate connections between the cause of mental retardation and the nature of communication and other difficulties confronting affected children (Cromer, 1981; Hodapp & Dykens, 1994; Hodapp, Leckman, Dykens, Sparrow, Zelinsky, & Ort, 1992; cf. Hodapp & Zigler, 1990).

Down syndrome and fragile X syndrome are the most common genetic birth defects associated with mental retardation. Beginning to understand these two conditions, therefore, depends on at least a bare-bones grasp of human genetics, which will be offered here. More lengthy treatments can be found in resources such as M. M. Cohen (1997).
Table 6.3 Categories of Organic Predisposing Factors Associated With Mental Retardation (American Psychiatric Association, 1994, p. 43)

Heredity (5% of cases): Inborn errors of metabolism (e.g., Tay-Sachs disease); single-gene abnormalities (e.g., tuberous sclerosis); chromosomal aberrations (e.g., fragile X syndrome, a small number of cases of Down syndrome)

Early alterations of embryonic development (30% of cases): Chromosomal changes (most cases of Down syndrome—those due to trisomy 21); prenatal damage due to toxins (e.g., maternal alcohol consumption, infections)

Pregnancy and perinatal problems (10% of cases): Fetal malnutrition, prematurity, hypoxia (oxygen deficiency), viral and other infections, and trauma

General medical conditions acquired in infancy or childhood (5% of cases): Infections, traumas, and poisoning (e.g., due to lead)
Perhaps the most basic fact in genetics is that all cells in the human body except for the reproductive cells (sperm in men and ova in women) contain 23 pairs of chromosomes. These 23 chromosome pairs consist of 22 pairs of numbered autosomes and 1 pair of sex chromosomes, which are identified as XX for women and XY for men. These chromosomes, which hold many individual genes, act as the blueprints for cell function and thus determine an individual's physical makeup. Unlike other human cells, ova and sperm have half the usual number of chromosomes—23 unpaired chromosomes, consisting of 22 autosomes and one sex chromosome. During the reproductive process, this feature of reproductive cells allows each parent to contribute one half of each offspring's genetic material as the genetic materials of both reproductive cells are combined during fertilization. Because chromosomes contain numerous genes, defects to either the larger chromosomes or to individual genes can result in impaired cellular function during embryonic development and later life.

Down syndrome is an example of an autosomal genetic disorder in which extra genetic material is found at chromosome pair 21. The condition arises about once in every 800 live births, making it the most common genetic disorder associated with mental retardation. About 95% of the time, Down syndrome occurs because an entire extra chromosome is present, resulting in the individual's possessing three copies of chromosome 21 (known as trisomy 21) instead of the normal pair (Bellenir, 1996). Figure 6.1 illustrates the complete set of chromosomes associated with a girl who has Down syndrome.

Fig. 6.1. Graphic representation of the genetic test used to identify the presence of trisomy 21. From Babies With Down Syndrome: A New Parents Guide (p. 8), by K. Stray-Gunderson (Ed.), 1986, Kensington, MD: Woodbine House. Copyright 1986 by Woodbine House. Reproduced with permission.

Less frequently, Down syndrome is associated with only a portion of an extra chromosome occurring at chromosome 21, or with an entire extra chromosome 21 that is present in only some cells within the body (termed mosaic Down syndrome). Usually the chromosomal defect occurs during the development of an individual ovum, but it can occur because of a sperm defect or a defect arising after the uniting of the sperm and ovum in fertilization. Because of this timing of the change in the genetic material, Down syndrome is described as a genetic disorder but not an inherited one (i.e., not one in which both parent and child are affected).

Down syndrome is associated with a characteristic physical appearance, involving slanted eyes, small skin folds on the inner corners of the eyes (epicanthal folds), slightly protruding lips, small ears, an overly large tongue (macroglossia), and short hands, feet, and trunk (Bellenir, 1996). Figure 6.2 shows two young children with this syndrome. Other more serious physical anomalies found among children with Down syndrome affect the cervical spine, bowel, thyroid, eyes, and heart (Cooley & Graham, 1991). Children with Down syndrome are more susceptible to infection, including otitis media
(Cooley & Graham, 1991), and are 20 times more likely than other children to develop and die from leukemia. Because of these abnormalities, their life expectancy as a group is somewhat shortened, despite recent advances in the correction of congenital heart defects, improved control of infections, and avoidance of institutionalization. Roughly 80% of these children will live to the age of 30 and beyond (Cooley & Graham, 1991). Adults with Down syndrome have also been shown to be at increased risk for the onset of Alzheimer's-like dementia, or decline in intellectual function (Connor & Ferguson-Smith, 1997; Zigman, Schupf, Zigman, & Silverman, 1993).

Fig. 6.2. Two children with Down syndrome.

Fragile X syndrome is currently thought to be the single most common inherited cause of mental retardation (Baumeister & Woodley-Zanthos, 1996). Although it occurs less frequently than Down syndrome—the most frequent genetic cause of mental retardation—fragile X is more frequently inherited than Down syndrome because Down syndrome is almost never passed from one generation to the next. Fragile X occurs about once in every 1,250 to 2,500 men and about half that often in women (Bellenir, 1996). Although fragile X can occur in either gender, it is more often associated with mental retardation in affected men. When mental retardation occurs, it can range from mild to profound levels, with generally milder impairments in affected women (Dykens, Hodapp, & Leckman, 1994). Because its patterns of inheritance are more complex than those seen in other previously identified genetic disorders, fragile X was only identified in the 1970s (Lehrke, 1972).

Fragile X syndrome involves the single gene FMR1, present on the X chromosome, which can be defective or absent. A partially defective gene is referred to as a premutation and may be associated with very mild or even no obvious problems in the affected person. When the defect is greater, or the gene FMR1 is absent, more serious problems, including severe to profound mental retardation, are the likely outcome. Fragile X syndrome is inherited through an X-linked mode of transmission (similar to hemophilia) in which some individuals are "carriers" (usually women) and others are affected individuals (usually men). Fathers have only one X and one Y chromosome. Consequently, they can only transmit a defective X chromosome to a daughter, who will have received a second X from her mother (who has two X and no Y chromosomes). Because only one of the two X chromosomes in a girl is likely to be active, it is possible for daughters to appear unaffected but to be carriers of the defective chromosome. They can also be affected, however, if they possess two defective X chromosomes or if the defective X chromosome for some reason is the active one. About one third of girls with the defective gene will be of normal intelligence, one third will have borderline intelligence, and one third will have greater degrees of mental retardation (American College of Medical Genetics, 1997). About 50% of the male offspring of carrier women will demonstrate fragile X syndrome (Dykens et al., 1994), and most of these children will have mental retardation.

Boys with fragile X and mental retardation often share the following physical traits: a long, narrow face; long, thick, prominent ears; and overly large testicles (Dykens et al., 1994). Figure 6.3 shows two youngsters with this condition. Beyond the physical traits noted throughout life, these children are at risk for obesity during adolescence. On the basis of a smaller number of studies than those undertaken for males with fragile X, it appears that females with fragile X show traits similar to those of males, although to a lesser extent.
Conditions that tend to accompany mental retardation in children with fragile X are ADD and ADHD, anxiety and mood difficulties, as well as auditory and visual problems (Dykens et al., 1994). Considerable controversy has surrounded the relationship between fragile X and autistic disorder (chap. 7, this volume; I. L. Cohen, 1995). There has been some speculation that the rate of co-occurrence may be due to the level of mental retardation rather than to etiology (Dykens et al., 1994). However, work by I. L. Cohen (1995) suggests that boys with both autism and fragile X are more significantly impaired than would be expected if the effects of each condition were simply additive.

Fetal alcohol syndrome (FAS) refers to the constellation of physical abnormalities, deficient growth patterns, and cognitive and behavioral problems found in children whose mothers drank heavily during pregnancy. Fetal alcohol effect (FAE) is a closely related diagnosis in which only some portion of the constellation of abnormalities described for FAS is seen in the affected child (Stratton, Howe, & Battaglia, 1996). Although a possible connection between alcohol consumption by mothers during pregnancy and subsequent birth defects has been suspected throughout history, only in the late 1960s and early 1970s was FAS formally described (Stratton et al., 1996). Despite its having received considerable attention only recently, FAS has been proposed as the "most common known nongenetic cause of mental retardation" (Stratton et al., 1996, p. 7), with estimates of incidence ranging from 0.5 to 3 per 1,000 live births (Stratton et al., 1996). The higher of these incidence figures makes FAS a more frequent cause of mental retardation than either Down syndrome or fragile X syndrome.
Fig. 6.3. Two young boys with fragile X syndrome.

In addition, it is thought to be widely underdiagnosed (Maxwell & Geschwint-Rabin, 1996). Alcohol is one of many different substances well known to be toxic to the developing central nervous system. However, the specific mechanism by which alcohol consumption leads to the variety of difficulties seen in FAS or FAE is poorly understood (Baumeister & Woodley-Zanthos, 1996). In general, the magnitude and nature of a toxin's effects on prenatal development are thought to be closely related to the amount of the toxin, the timing of exposure, and the genetic makeup of the mother and child (Stratton et al., 1996). Currently, however, little is known about how those variables interact to produce the broad range of effects seen in children with full FAS or with FAE. Particularly puzzling are observations that some women who drink very heavily throughout their pregnancy can give birth to unaffected children, whereas other women who drink far less can give birth to children with severe symptoms. This uncertainty about how damage is caused has resulted in strong prohibitions against drinking during pregnancy until more is known about what, if any, degree of exposure is safe.

As a group, children with full FAS tend to have mild mental retardation, but for individual children, cognitive levels can range from severe retardation to normal function. In addition to mental retardation, cardiac and skeletal abnormalities and vision problems have also been noted. Facial abnormalities apparent during early childhood include the presence of epicanthal folds (such as those seen in Down syndrome), eyelids that are overly narrow from inner to outer corner, a flat midface, smooth or long philtrum (area above the upper lip), and thin upper lip (Sparks, 1993). These facial features are sometimes less pronounced in infancy and after childhood, so they are not as useful as indicators of this problem for some age groups as for others. Congenital hearing loss is another area of increased risk (Stratton et al., 1996). Figure 6.4 shows two youngsters affected by FAS.

Nonorganic Suspected Causes
Despite the growing frequency with which biological causes of mental retardation are identified, about half of all cases of mental retardation do not have such well-defined explanations. In such cases, the degree of retardation tends to be milder, and the retardation tends to be associated with a family history of mental retardation and low SES (Rosenberg & Abbeduto, 1993). Historically, such cases were classified as "nonorganic" or "familial" mental retardation.
Fig. 6.4. Two youngsters with fetal alcohol syndrome. From Fetal Alcohol Syndrome: Diagnosis, Epidemiology, Prevention, and Treatment (Figure 11, p. 18), by K. Stratton, C. Howe, & F. Battaglia (Eds.), 1996, Washington, DC: National Academy Press. Copyright 1996 by National Academy Press. Reproduced with permission.
Despite the implication that these cases involve social or experiential bases, there is considerable speculation that nonorganic cases of mental retardation may actually reflect our current lack of knowledge rather than truly nonorganic causes (Baumeister, 1997; Richardson & Koller, 1994). Many cases now identified as nonorganic may be recategorized as the relationships of low SES and family history to exposure to environmental toxins (e.g., lead), poor nutrition, and other ultimately organic causes are uncovered. The one major, truly nonorganic factor associated with mental retardation is severe social deprivation, as a result of either inadequate institutional conditions or limitations of a child's principal caregiver (Richardson & Koller, 1994). Yet even that mechanism may act by depriving the infant's maturing nervous system of the proper inputs to promote specific physiological states required for brain development.

Special Challenges in Assessment

One of the most important things to keep in mind when trying to understand any child is his or her uniqueness—the uniqueness of current strengths and weaknesses, history, and family situation. Most important is the need to remember the uniqueness that makes a child "Tracy" or "Seth" or "Jake," rather than just the child with a particular syndrome and pattern of deficits. Assessing children with mental retardation tempts some individuals to equate a child with his or her level of retardation or its etiology, and tempts some to pay attention to what the child cannot do rather than to what he or she is doing in communication. Personal Perspective 6 hints at the negative effects of such a mistake.

PERSONAL PERSPECTIVE

The following passage is taken from a book written by a pair of young adult friends who have each been diagnosed with Down syndrome. The title of their book is Count Us In: Growing Up With Down Syndrome (Kingsley & Levitz, 1994, p. 35).
August '90

Mitchell: I wish I didn't have Down syndrome because I would be a regular person, a regular mainstream normal person. Because I didn't know I had Down syndrome since a long time ago, but I feel very special in many ways. I feel that being with, having Down syndrome, there's more to it than I expected. It was very difficult but… I was able to handle it very well.

Jason: I'm glad to have Down syndrome. I think it's a good thing to have for all people that are born with it. I don't think it's a handicap. It's a disability for what you're learning because you're learning slowly. It's not that bad. (p. 35)

How do you avoid these temptations? First, plan assessments based on initial hypotheses about developmental levels and patterns of impairment (which will be described in the next section) and on information obtained from caregivers or others who know
Page 157 the child well. Framing the assessment questions with special clarity can help you anticipate the particular challenges individual children might pose to the validity of conventional instruments. Second, prepare to alter your plan as needed to keep the child engaged and interacting. Not only does this mean that you may need to turn away from a standardized instrument midstream (e.g., if it is developmentally inappropriate) in favor of a more informal or dynamic assessment method (see chap. 10), you may also want to consider the use of adaptations. Test adaptations are changes made in the test stimuli, response required of the child, or testing procedures (Stagg, 1988; Wasson, Tynan, & Gardiner, 1982). On the one hand, the use of test adaptations threatens the validity of normreferenced comparisons that may be made using the instrument. Therefore, if a clinical question that really requires that kind of comparison is at stake (e.g., an initial evaluation in which a difference from norms must be demonstrated to help a child receive services), the clinician will avoid adaptations if possible. On the other hand, when some aspect of the standard administration other than the basic skill or knowledge being tested interferes with a child’s ability to reveal his or her actual skill or knowledge, one can argue that the validity of the comparison has already been severely compromised. Table 6.4 lists some of the most common adaptations used. Regardless of which adaptations are used, they should be described in reports of test results and the clinician should com Table 6.4 Examples of Testing Adaptations Used Frequently With Children With Mental Retardation and Frequent Coexisting Problems (Stagg, 1988) Reason for Adaptation
Recommended Adaptations Increased use of social, tangible, and activity reinforcers (Fox & Wise, 1981) Breaking up administration into smaller periods of time to maximize attention l Use of auditory commands or visual cueing (e.g., with a light pen) to direct attention prior to each item (Wasson, Tynan, & Gardiner, 1982) l
Attention and motivation
l
Replacement of tabletop administration to position a child to achieve optimal motor performance Use of alternative response modes (e.g., gazel., head pointers, oral instead of pointing; Wasson, Tynan, & Gardiner, 1982) l Removal of response time restrictions l Breaking up administration into smaller periods of time to address fatigue l l
Motor skills
l
Hearing
l l
Vision
l l
Substitution of sign for oral presentation Addition of gesture or sign to oral presentation (Wasson, Tynan, & Gardiner, 1982) Positioning to enhance child’s access to visual information and to optimize residual hearing Substitution of standard visual stimuli by highcontrast stimuli or larger stimuli Placement of all stimuli within the child’s visual field (as determined prior to testing)
Page 158 ment on the extent to which these adaptations are likely to interfere with the valid use of norms. Related to the use of adaptations is a method that Sattler (1988) has proposed as a followup to standardized test administration —testing of limits. A test of limits involves (a) providing additional cues, (b) changing test modality (e.g., from written to oral), (c) establishing methods used by the tested child, (d) eliminating time limits, and (e) asking probing questions designed to clarify a child’s thinking leading to a response. It is meant to help the clinician gain an appreciation of how a child has approached the task and what aspects of it interfered with success. It is closely related to dynamic assessment approaches, which I describe in greater detail in chapter 10. Special issues in testing that I discuss in later chapters of the book are outoflevel testing and discrepancy testing. Except for brief definitions, these topics are not addressed here because they are also relevant to some of the other groups of children discussed in the next few chapters. Outoflevel testing (Berk, 1984) refers to the use of an instrument developed for children of a different age group from that of the child to be tested. In the context of children with mental retardation, this is done in order to use content that is developmentally appropriate. This practice is discussed again in chapter 10. Discrepancy testing refers to the comparison of performances in two different behavioral or skill areas (e.g., between ability and achievement) to determine whether a discrepancy exists. This kind of testing is important for children with mental retardation because it will often be required as part of the procedures dictated within an educational system to justify the provision of specific kinds of assistance. 
This topic is discussed repeatedly throughout this book, but especially in chapters 9 and 10, because it represents one of the greatest contemporary challenges to assessment.

Expected Pattern of Strengths and Weaknesses

In psychology and special education, level of mental retardation has played a much greater role than etiology in the identification of participants for research studies and the development of treatment approaches (Baumeister, 1997; Hodapp & Dykens, 1994). However, there is a growing sensitivity that both etiology (e.g., Down syndrome, fragile X syndrome) and level of mental retardation (viz., mild, moderate, severe, profound) provide useful bases for some tentative predictions regarding likely patterns of behavioral strengths and weaknesses (Miller & Chapman, 1984). Syndromes for which communication skills have been extensively studied are Down syndrome and, to a lesser extent, fragile X syndrome. Several other syndromes, such as Williams syndrome (Bellugi, Marks, Bihrle, & Sabo, 1993; Mervis, 1998), Prader-Willi syndrome (Donaldson, Shu, Cooke, Wilson, Greene, & Stephenson, 1994), and Turner syndrome (Downey, Ehrhardt, Gruen, Bell, & Morishima, 1989), have begun to be studied. Table 6.5 summarizes tentative patterns of strengths and weaknesses as they have been suggested for children with Down syndrome, fragile X, FAS, and Williams syndrome, a congenital metabolic disease usually associated with moderate to severe learning difficulties. (Williams syndrome was not discussed previously in this chapter
Table 6.5
Patterns of Strengths and Weaknesses Among Children With Mental Retardation Associated With Down Syndrome, Fragile X Syndrome, Fetal Alcohol Syndrome, and Williams Syndrome

Down syndrome
Relative strengths in communication: Semantics (Rondal, 1996); pragmatics (e.g., turn-taking, diversity of speech acts; Rondal, 1996); nonverbal social interaction skills (Hodapp, 1996); nonverbal requesting behavior (Hodapp, 1996).
Relative weaknesses in communication: Morphology (Fowler, 1990; Rondal, 1996); syntax (Fowler, 1990; Rondal, 1996); phonology (Rondal, 1996); expressive skills relative to receptive skills (Dykens, Hodapp, & Leckman, 1994); plateauing of development in the above areas from late childhood on (Rondal, 1996); auditory processing (Hodapp, 1996); increased risk of hearing loss (Bellenir, 1996); increased risk of fluency disorder (Bloodstein, 1995).
Other strengths: Adaptive behavior (Hodapp, 1996); "pleasant personality" (Hodapp, 1996).
Other weaknesses: Low task persistence (Hodapp, 1996); mathematics (Hodapp, 1996); inadequate motor organization (Hodapp, 1996); visually directed reaching (Hodapp, 1996); visual monitoring; hypotonia (Hodapp, 1996); slow orienting to auditory information (Hodapp, 1996).

Fragile X syndrome^a
Relative strengths in communication: Expressive vocabulary skills (Rondal & Edwards, 1997); possibly syntax (although sometimes grammar has been identified as a weakness; Dykens, Hodapp, & Leckman, 1994; Rondal & Edwards, 1997).
Relative weaknesses in communication: Fluency abnormalities (e.g., perseverative and staccato speech, rate of speech, cluttering; Rondal & Edwards, 1997); pragmatics, especially poor eye contact and other autistic-like behaviors (Rondal & Edwards, 1997); phonology, difficulty in sequencing syllables (Dykens, Hodapp, & Leckman, 1994; Rondal & Edwards, 1997).
Other strengths: Adaptive skills (especially personal and domestic skills; Dykens, Hodapp, & Leckman, 1994).
Other weaknesses: Attention deficits and hyperactivity (Dykens, Hodapp, & Leckman, 1994); social avoidance and shyness (Dykens, Hodapp, & Leckman, 1994).

Fetal alcohol syndrome and fetal alcohol effect
Relative strengths in communication: Most areas of language relatively unaffected.
Relative weaknesses in communication: Comprehension; pragmatics (e.g., frequently tangential responses; Abkarian, 1992).
Other strengths: Cognitive delays, when present, are usually mild.
Other weaknesses: Attentional problems or hyperactivity (Stratton, Howe, & Battaglia, 1996); increased risk for visual and hearing problems (Stratton, Howe, & Battaglia, 1996); increased risk for behavior problems (Stratton, Howe, & Battaglia, 1996).

Williams syndrome^b
Relative strengths in communication: Expressive language (Rondal & Edwards, 1997); morphology and syntax (Rondal & Edwards, 1997); lexical knowledge (Rondal & Edwards, 1997); receptive language (Udwin & Yule, 1990); fluency and prosody (Rondal & Edwards, 1997); narrative skills (Rondal & Edwards, 1997); phonological skills (Rondal & Edwards, 1997).
Relative weaknesses in communication: Metalinguistic knowledge (Rondal & Edwards, 1997); pragmatic skills (socially inappropriate content, poor eye contact; Rondal & Edwards, 1997).
Other strengths: Facial recognition (Rondal & Edwards, 1997).
Other weaknesses: Severe visuospatial deficits (Rondal & Edwards, 1997); hyperacusis (oversensitivity to noise), especially in younger children.

^a Patterns relate almost entirely to affected males because of the paucity of data on affected females.
^b Patterns based on a very limited database.
because of its rarity.) Because the four groups of children described in Table 6.5 have experienced very different levels of scrutiny, they differ in the certainty with which these strengths and weaknesses are known (Hodapp & Dykens, 1994). Specifically, children with Down syndrome have received much more attention than those with fragile X, who have, in turn, received considerably more attention than those with Williams or FAS. Interestingly, there has even been some work suggesting that the specific type of chromosomal abnormality underlying Down syndrome results in different prognoses for communication outcomes, with better communication skills predicted for children with mosaic Down syndrome than for those with the more common trisomy 21 (Rondal, 1996).

Related Problems

Children with mental retardation are at risk for a variety of additional health-related and social problems, particularly if the retardation is more severe (American Psychiatric Association, 1994). For example, two medical conditions that occur frequently among children with severe or profound mental retardation are epilepsy, with expected occurrence rates of 19% to 36%, and cerebral palsy, with rates of 20% to 40% (Richardson & Koller, 1994). Overall, children with mental retardation, regardless of etiology, appear to be at four times the normal risk level for ADHD, although there is some question as to whether their attention problems are really manifestations of mental retardation rather than an independent additional problem (Biederman, Newcorn, & Sprich, 1997). Other behavioral and emotional problems are also observed more frequently among individuals with mental retardation than among others, including conduct disorder, anxiety disorders, psychozoidal disorder, and depression (Eaton & Menolascino, 1982). Often, the etiology of mental retardation is closely associated with risk levels for particular problems.
For example, different kinds of visual problems are found in children with Down syndrome than in children with fragile X syndrome. Whereas children with Down syndrome will frequently experience nearsightedness and cataracts (Connor & Ferguson-Smith, 1997; Lubetsky, 1990), children with fragile X syndrome will more commonly have strabismus, a problem in the coordination of eye movements (Maino, Wesson, Schlange, Cibis, & Maino, 1991). Children with developmental and speech delays have also been found to be at increased risk for maltreatment, including physical abuse, sexual abuse, and neglect (Sandgrund, Gaines, & Green, 1974; Taitz & King, 1988). Given the close contact that speech-language pathologists frequently have with their clients, this increased incidence of maltreatment makes it particularly important for them to be aware of signs of maltreatment (Veltkamp, 1994).

Summary

1. Mental retardation, which affects about 3% of children in the United States, involves reduced intelligence and reduced adaptive functioning.
2. More severe levels of mental retardation (i.e., moderate, severe, and profound) are often diagnosed relatively early, but are relatively uncommon, affecting only 15%
of those children diagnosed with mental retardation. Mild mental retardation affects about 85% of children with mental retardation but tends to be diagnosed later—sometimes not until school age.
3. Definitions of mental retardation proposed by the AAMR and the American Psychiatric Association differ primarily in their characterization of severity, with the AAMR definition proposing levels of support needed for numerous intellectual and adaptive functions in place of levels of impairment.
4. Increasingly, organic factors, as opposed to familial or nonorganic factors, are being identified as reasonable explanations for cases of mental retardation. The three most common organic causes of mental retardation are Down syndrome, fragile X syndrome, and FAS.
5. Down syndrome and fragile X syndrome are the most frequent genetic sources of mental retardation. Down syndrome is almost always associated with a chromosomal abnormality, whereas fragile X syndrome is associated with an error involving a single gene on the X chromosome.
6. FAS, which is usually associated with mild mental retardation, is considered the most frequent preventable cause of mental retardation.
7. Assessment challenges include the need for particularly careful selection of developmentally appropriate instruments, increased need for less formal measures because of a lack of appropriate standardized measures, and the need to adapt tests to help ensure that aspects of the child's difficulties that are unrelated to the concept being tested are not preventing successful performance.
8. Expected patterns of communication performance are related to level of mental retardation and to etiology.

Key Concepts and Terms

adaptive functioning: the ability to function in everyday situations in a manner considered culturally and developmentally adequate.

autosomes: the most common type of chromosome within the human cell.
They are usually contrasted with the sex chromosomes, which typically consist of a single pair (XX for women and XY for men).

chromosomes: structures within human cells that carry the genes that act as blueprints for cell function.

dementia: a significant decline in intellectual function, usually after a period of normal intellectual function.

discrepancy testing: the comparison of performances in two different behavioral or skill areas (e.g., between ability and achievement) to determine whether a discrepancy exists; often used as a requirement for services in education systems.

Down syndrome: an autosomal genetic disorder that is considered the most common genetic abnormality resulting in mental retardation. It is associated with mild to severe mental retardation and particularly marked difficulties with syntax and phonology.
fetal alcohol effect (FAE): a diagnosis related to FAS, in which some but not all of the abnormalities required for a diagnosis of FAS are observed.

fetal alcohol syndrome (FAS): the constellation of physical abnormalities, deficient growth patterns, and cognitive and behavioral problems found in children with a significant prenatal exposure to alcohol.

fragile X syndrome: the most common inherited cause of mental retardation; it is related to an X-chromosome abnormality that may be passed through several generations before becoming severe enough to result in mental retardation. The syndrome more commonly affects men than women.

mental retardation: reduced intelligence accompanied by reduced adaptive functioning.

mosaic Down syndrome: an uncommon form of Down syndrome occurring in less than 5% of cases, when trisomy 21 affects only some rather than all cells in the body.

out-of-level testing: the use of an instrument developed for children whose age differs from that of the child to be tested (Berk, 1984).

premutation: a gene that is somewhat defective but not associated with significant abnormalities, as can happen in families where fragile X syndrome is subsequently identified.

sex chromosomes: gene-bearing chromosomes associated with gender-related characteristics; these are related to numerous birth defects in which patterns of transmission appear to be affected by gender.

strabismus: a problem in eye movement coordination, sometimes referred to as crossed eyes.

trisomy 21: the most typical chromosomal abnormality in Down syndrome, consisting of a third chromosome 21.

Williams syndrome: a congenital metabolic disease usually associated with moderate to severe learning difficulties.

Study Questions and Questions to Expand Your Thinking

1. What are the major common components of the definitions of mental retardation provided in this chapter?
2.
Describe three possible co-occurring problems that may affect the communication and test-taking behaviors of a child with mental retardation.
3. What is the most common inherited cause of mental retardation? What is the most common preventable cause?
4. Determine the definition for mental retardation used in a school system near you. How does that definition compare to those of the AAMR and the American Psychiatric Association?
5. One test of adaptive skills that is frequently used is the Vineland Adaptive Behavior Scales (Sparrow, Balla, & Cicchetti, 1984). Examine that measure in terms
of items related to communication. What language domains (e.g., semantics, syntax, morphology, pragmatics) and what language modalities (speaking, listening, writing, reading) are emphasized?
6. Using a format like that used in Table 6.5, identify a syndrome not described in this chapter (e.g., Prader-Willi syndrome, cri du chat) and prepare a brief list of expected patterns of language and communication.
7. Examine the test manual of a language test to determine (a) what, if anything, is said about the appropriateness of the measure for a child with mental retardation, and (b) what aspects of one or more tasks included in the test might be incompatible with the characteristics of the following children:
- a child with severe cerebral palsy and moderate retardation whose only reliable response mode is a slow, effortful pointing response;
- a child with mild retardation but severe attention and motivational problems; and
- a child with Down syndrome who has moderate retardation and a severe visual impairment.
Recommended Readings

Cohen, M. M. (1997). The child with multiple birth defects (2nd ed.). New York: Oxford University Press.
Dykens, E. M., Hodapp, R. M., & Leckman, J. F. (1994). Behavior and development in fragile X syndrome. Thousand Oaks, CA: Sage.
Hersen, M., & Van Hasselt, V. (Eds.). (1990). Psychological aspects of developmental and physical disabilities: A casebook. Newbury Park, CA: Sage.
Rondal, J. A., & Edwards, S. (1997). Language in mental retardation. San Diego, CA: Singular.
Stray-Gunderson, K. (Ed.). (1986). Babies with Down syndrome: A new parents' guide. Kensington, MD: Woodbine House.

References

Abkarian, G. (1992). Communication effects of prenatal alcohol exposure. Journal of Communication Disorders, 25(4), 221–240.
American College of Medical Genetics. (1997). Policy statement on fragile X syndrome: Diagnostic and carrier testing [Online]. Available: http://www.faseb.org/genetics/acmg/pol16.htm
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Batshaw, M. L., & Perret, Y. M. (1981). Children with handicaps: A medical primer. Baltimore: Brookes Publishing Company.
Baumeister, A. A. (1997). Behavioral research: Boom or bust? In W. E. MacLean, Jr. (Ed.), Ellis' handbook of mental deficiency, psychological theory and research (3rd ed., pp. 3–45). Mahwah, NJ: Lawrence Erlbaum Associates.
Baumeister, A. A., & Woodley-Zanthos, P. (1996). Prevention: Biological factors. In J. W. Jacobson & J. A. Mulick (Eds.), Manual of diagnosis and professional practice in mental retardation (pp. 229–242). Washington, DC: APA.
Bellenir, K. (1996). Facts about Down syndrome. In Genetic disorders handbook (pp. 3–14). Detroit, MI: Omnigraphics.
Bellugi, U., Marks, S., Bihrle, A., & Sabo, H. (1993). Dissociation between language and cognitive functions in Williams syndrome. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 177–189).
Mahwah, NJ: Lawrence Erlbaum Associates.
Berk, R. A. (1984). Screening and diagnosis of children with learning disabilities. Springfield, IL: Thomas.
Biederman, J., Newcorn, J. H., & Sprich, S. (1997). Comorbidity of attention-deficit/hyperactivity disorder. In T. A. Widiger, A. J. Frances, H. A. Pincus, R. Ross, M. B. First, & W. Davis (Eds.), DSM–IV sourcebook. Washington, DC: American Psychiatric Association.
Bloodstein, O. (1995). A handbook on stuttering (5th ed.). San Diego, CA: Singular.
Cohen, I. L. (1995). Behavioral profiles of autistic and nonautistic fragile X males. Developmental Brain Dysfunction, 8, 252–269.
Cohen, M. M. (1997). The child with multiple birth defects (2nd ed.). New York: Oxford University Press.
Connor, M., & Ferguson-Smith, M. (1997). Essential medical genetics (5th ed.). Oxford, England: Blackwell.
Cooley, W. C., & Graham, J. M. (1991). Common syndrome and management issues for primary care physicians: Down syndrome—An update and review for the primary pediatrician. Clinical Pediatrics, 30(4), 233–253.
Cromer, R. (1981). Reconceptualizing language acquisition and cognitive development. In R. L. Schiefelbusch & D. D. Bricker (Eds.), Early language: Acquisition and intervention. Baltimore: University Park Press.
Donaldson, M. D. C., Shu, C. E., Cooke, A., Wilson, A., Greene, S. A., & Stephenson, J. B. (1994). The Prader-Willi syndrome. Archives of Disease in Childhood, 70, 58–63.
Downey, J., Ehrhardt, A. A., Gruen, R., Bell, J. J., & Morishima, A. (1989). Psychopathology and social functioning in women with Turner syndrome. Journal of Nervous and Mental Disease, 177, 191–201.
Durkin, M. S., & Stein, Z. A. (1996). Classification of mental retardation. In J. W. Jacobson & J. A. Mulick (Eds.), Manual of diagnosis and professional practice in mental retardation (pp. 67–73). Washington, DC: APA.
Dykens, E. M., Hodapp, R. M., & Leckman, J. F. (1994). Behavior and development in fragile X syndrome. Thousand Oaks, CA: Sage.
Eaton, L. F., & Menolascino, F. J. (1982).
Psychiatric disorder in the mentally retarded: Types, problems, and challenges. American Journal of Psychiatry, 139, 1297–1303. Fowler, A. E. (1990). Language abilities in children with Down syndrome: Evidence for a specific syntactic delay. In D. Cicchetti & M. Beeghley (Eds.), Children with Down syndrome (pp. 302–328). Cambridge, England: Cambridge University Press. Fox, R., & Wise, P. S. (1981). Infant and preschool reinforcement survey. Psychology in the Schools, 18, 87–92. Gottlieb, M. L. (1987). Major variations in intelligence. In M. I. Gottlieb & J. E. Williams (Eds.), Textbook of developmental pediatrics (pp. 127–150). New York: Plenum. Grossman, H. J. (Ed.). (1983). Classification in mental retardation. Washington, DC: American Association on Mental Deficiency. Hersen, M., & Van Hasselt, V. B. (Eds.). (1990). Psychological aspects of developmental and physical disabilities: A casebook. Newbury Park, CA: Sage Publications. Hodapp, R. M. (1996). Cross-domain relations in Down's syndrome. In J. A. Rondal, J. Perera, L. Nadel, & A. Comblain (Eds.), Down's syndrome: Psychological, psychobiological, and socioeducational perspectives (pp. 65–79). San Diego, CA: Singular. Hodapp, R. M., & Dykens, E. M. (1994). Mental retardation's two cultures of behavioral research. American Journal on Mental Retardation, 98, 675–687. Hodapp, R. M., & Zigler, E. (1997). New issues in the developmental approach to mental retardation. In W. E. MacLean, Jr. (Ed.), Ellis' handbook of mental deficiency, psychological theory and research (3rd ed., pp. 1–28). Mahwah, NJ: Lawrence Erlbaum Associates. Hodapp, R. M., Leckman, J. F., Dykens, E. M., Sparrow, S. S., Zelinsky, D. G., & Ort, S. I. (1992). K-ABC profiles of children with fragile X syndrome, Down syndrome, and nonspecific mental retardation. American Journal on Mental Retardation, 97, 39–46. Jacobson, J. W., & Mulick, J. A. (Eds.). (1996). Manual of diagnosis and professional practice in mental retardation. Washington, DC: APA.
Page 166 Kingsley, J., & Levitz, M. (1994). Count us in: Growing up with Down syndrome. New York: Harcourt Brace. Lehrke, R. G. (1972). A theory of X-linkage of major intellectual traits. American Journal of Mental Deficiency, 76, 611–619. Lubetsky, M. J. (1990). Diagnostic and medical considerations. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological aspects of developmental and physical disabilities: A casebook (pp. 25–53). Newbury Park, CA: Sage Publications. Luckasson, R. (1992). Mental retardation: Definition, classification, and systems of support. Washington, DC: American Association on Mental Retardation. MacMillan, D. L., & Reschly, D. J. (1997). Issues of definition and classification. In W. E. MacLean, Jr. (Ed.), Ellis' handbook of mental deficiency, psychological theory and research (3rd ed., pp. 47–71). Mahwah, NJ: Lawrence Erlbaum Associates. Maino, D. M., Wesson, M., Schlange, D., Cibis, G., & Maino, J. H. (1991). Optometric findings in the fragile X. Optometry and Vision Science, 68, 634–640. Maxwell, L. A., & Geschwint-Rabin, J. (1996). Substance abuse risk factors and childhood language disorders. In M. D. Smith & J. S. Damico (Eds.), Childhood language disorders (pp. 235–271). New York: Thieme. Mervis, C. B. (1998). The Williams syndrome cognitive profile: Strengths, weaknesses, and interrelations among auditory short-term memory, language, and visuospatial constructive cognition. In E. Winograd, R. Fivush, & W. Hirst (Eds.), Ecological approaches to cognition. Mahwah, NJ: Lawrence Erlbaum Associates. Miller, J. F., & Chapman, R. (1984). Disorders of communication: Investigating the development of language of mentally retarded children. American Journal of Mental Deficiency, 88, 536–545. Richardson, S. A., & Koller, H. (1994). Mental retardation. In I. B. Pless (Ed.), The epidemiology of childhood disorders (pp. 277–303). New York: Oxford University Press. Roeleveld, N., Zielhuis, G. A., & Gabreels, F. (1997).
The prevalence of mental retardation: A critical review of the literature. Developmental Medicine and Child Neurology, 39, 125–132. Rondal, J. A. (1996). Oral language in Down's syndrome. In J. A. Rondal, J. Perera, L. Nadel, & A. Comblain (Eds.), Down's syndrome: Psychological, psychobiological, and socioeducational perspectives (pp. 99–117). San Diego, CA: Singular. Rondal, J. A., & Edwards, S. (1997). Language in mental retardation. San Diego, CA: Singular Publishing Group. Rosenberg, S., & Abbeduto, L. (1993). Language and communication in mental retardation: Development, processes, and intervention. Hillsdale, NJ: Lawrence Erlbaum Associates. Sandgrund, A., Gaines, R., & Green, A. (1974). Child abuse and mental retardation: A problem of cause and effect. American Journal of Mental Deficiency, 79, 327–330. Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego, CA: Author. Simeonsson, R. J., & Short, R. J. (1996). Adaptive development, survival roles, and quality of life. In J. W. Jacobson & J. A. Mulick (Eds.), Manual of diagnosis and professional practice in mental retardation (pp. 137–146). Washington, DC: APA. Sparks, S. N. (1993). Children of prenatal substance abuse. San Diego, CA: Singular. Sparrow, S., Balla, D., & Cicchetti, D. (1984). Vineland Adaptive Behavior Scales. Circle Pines, MN: American Guidance Service. Stagg, V. (1988). Clinical considerations in the assessment of young handicapped children. In T. D. Wachs & R. Sheehan (Eds.), Assessment of young developmentally disabled children (pp. 61–73). New York: Plenum. Stratton, K., Howe, C., & Battaglia, F. (1996). Fetal alcohol syndrome: Diagnosis, epidemiology, prevention, and treatment. Washington, DC: National Academy Press. Stray-Gunderson, K. (Ed.). (1986). Babies with Down syndrome: A new parents' guide. Kensington, MD: Woodbine House. Taitz, L. S., & King, J. M. (1988). A profile of abuse. Archives of Disease in Childhood, 63, 1026–1031. Udwin, O., & Yule, W. (1990). Expressive language of children with Williams syndrome. American Journal of Medical Genetics, Supplement 6, 108–114.
Page 167 Veltkamp, L. J. (1994). Clinical handbook of child abuse and neglect. Madison, CT: International Universities Press. Wasson, P., Tynan, T., & Gardiner, P. (1982). Test adaptations for the handicapped. San Antonio, TX: Education Service Center, Region 20. Zigman, W. B., Schupf, N., Zigman, A., & Silverman, W. (1993). Aging and Alzheimer's disease in people with mental retardation. In N. W. Bray (Ed.), International review of research in mental retardation (Vol. 19, pp. 41–70). New York: Academic Press.
Page 168 CHAPTER
7 Children with Autistic Spectrum Disorder

Defining the Problem
Suspected Causes
Special Challenges in Assessment
Expected Patterns of Language Performance
Related Problems

Andrew is a 4-year-old who rarely speaks or vocalizes. He also fails to respond or make eye contact when others speak to him. He has some activities he will engage in incessantly, such as spinning parts of a toy truck or twirling his fingers in front of his eyes. Andrew has epileptic seizures almost daily, is not yet toilet trained, and rises early in the morning and awakens once or twice each night—problems that provide additional stress to his caring, beleaguered parents. He was initially identified as having severe to profound mental retardation and has more recently been identified as having Autistic Disorder.

Peter is a 12-year-old who speaks infrequently and often appears to ignore remarks directed to him by others. He occasionally repeats the full text of a television commercial containing words he neither uses nor appears to understand in other contexts. Peter's expressive and receptive language, as measured through standardized tests, appear delayed; his vocal intonation sounds unmodulated in pitch; and he rarely
Page 169 seems able to practice the give-and-take required for conversation. Although Peter was initially identified as having autism, he has recently been diagnosed as having pervasive developmental disorder not otherwise specified.

Amelia is a 10-year-old girl who was considered normal in her development of language until her extreme difficulty in using language for communication was noticed when she entered preschool. Despite having near-normal language abilities on standardized measures, her need for sameness and her difficulty in engaging in social interaction make her a very solitary child. She performs best in school subjects such as mathematics and geography, which appear to interest her greatly. Her problems have been tentatively identified as associated with Asperger's Disorder.

Defining the Problem

Autistic spectrum disorder, the diagnostic category that encompasses many of the problems of Andrew, Peter, and Amelia, is found in 0.02% to 0.05% of the population, or in about 2 to 5 of every 10,000 people (American Psychiatric Association, 1994). Recently, somewhat higher estimates have suggested as many as 10 to 14 of every 10,000 individuals (Trevarthen, Aitken, Papoudi, & Robarts, 1996). Even with these higher estimates, autistic spectrum disorder is relatively rare. The magnitude of its impact on affected children and their families, however, has made it the focus of considerable research and clinical writing. Its impact stems from the severity of its symptoms, which include delayed or deviant language and social communication and abnormal ways of responding to people, places, and objects. There is also some evidence to suggest that it is becoming more prevalent (Wolf-Schein, 1996; cf. Trevarthen et al., 1996). About 75% of children with autism are diagnosed with mental retardation as well (Rutter & Schopler, 1987), with about 50% reportedly having IQs less than 50 and fewer than 33% having IQs greater than 70 (Waterhouse, 1996).
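The prevalence figures just cited are simple rate conversions. As an illustrative aside (not part of the original text), the arithmetic can be sketched in a few lines of Python; the population sizes used here are hypothetical.

```python
def expected_cases(prevalence_percent: float, population: int) -> float:
    """Expected number of affected individuals at a given prevalence."""
    return prevalence_percent / 100.0 * population

# 0.02%-0.05% corresponds to roughly 2 to 5 per 10,000, as stated above.
low = expected_cases(0.02, 10_000)
high = expected_cases(0.05, 10_000)
print(low, high)  # approximately 2 and 5

# The later, higher estimate of 10 to 14 per 10,000, expressed as percentages:
print(10 / 10_000 * 100, 14 / 10_000 * 100)  # approximately 0.1 and 0.14
```

The same function scales to any population of interest, for example a school district, which is often how clinicians encounter such base rates in practice.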
There is great uncertainty associated with these figures, however, because the diagnosis of mental retardation is often questionable given the difficulty these children have in participating in formal assessment procedures (Wolf-Schein, 1996). In the influential DSM–IV system of nomenclature (American Psychiatric Association, 1994), autistic spectrum disorder is referred to as Pervasive Developmental Disorder (PDD), a category that includes autistic disorder, Rett's disorder, childhood disintegrative disorder, Asperger's disorder, and pervasive developmental disorder not otherwise specified (PDD-NOS) (Waterhouse, 1996). Readers should be aware that an alternative and somewhat more complicated set of diagnoses related to autism has been formulated by the World Health Organization (WHO) in the International Classification of Diseases (ICD; WHO, 1992, 1993), although it is not discussed here. Autistic disorder is sometimes referred to as Kanner's autism or infantile autism and is the most common of the spectrum disorders. Its symptoms are similar to those of the other disorders within the PDD category, including severe delays in "reciprocal social interaction skills, communication skills, and the presence of stereotyped behavior, interests and activities" (American Psychiatric Association, 1994, p. 65).

Page 170 Although children with autistic disorder share many characteristics with children with other PDD disorders, the primary focus of this chapter is children with autistic disorder and their surprising degree of heterogeneity with regard to levels of cognitive function, language outcomes, and specific symptoms (Hall & Aram, 1996; Myles, Simpson, & Becker, 1995). The considerable differences within this single disorder are illustrated by the range of difficulties described at the outset of the chapter in relation to Peter and Andrew. The American Psychiatric Association (1994) definition for Autistic Disorder is presented in Table 7.1. Besides calling attention to these children's very marked problems in social interaction and language, this definition emphasizes the abnormal and

Table 7.1 A Definition of Autistic Disorder (American Psychiatric Association, 1994)

A. A total of six (or more) items from (1), (2), and (3), with at least two from (1) and one each from (2) and (3):

(1) Qualitative impairment in social interaction, as manifested by at least two of the following:
(a) marked impairment in the use of multiple nonverbal behaviors such as eye-to-eye gaze, facial expression, body postures, and gestures to regulate social interaction;
(b) failure to develop peer relationships appropriate to developmental level;
(c) a lack of spontaneous seeking to share enjoyment, interests, or achievements with other people (e.g., by a lack of showing, bringing, or pointing out objects of interest);
(d) lack of social or emotional reciprocity.

(2) Qualitative impairments in communication as manifested by at least one of the following:
(a) delay in, or total lack of, the development of spoken language (not accompanied by an attempt to compensate through alternative modes of communication such as gestures or mime);
(b) in individuals with adequate speech, marked impairment in the ability to initiate or sustain a conversation with others;
(c) stereotyped and repetitive use of language or idiosyncratic language;
(d) lack of varied, spontaneous make-believe play or social imitative play appropriate to developmental level.

(3) Restricted repetitive and stereotyped patterns of behavior, interests, and activities, as manifested by at least one of the following:
(a) encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is abnormal either in intensity or focus;
(b) apparently inflexible adherence to specific, nonfunctional routines or rituals;
(c) stereotyped and repetitive motor mannerisms (e.g., hand or finger flapping or twisting, or complex whole-body movements);
(d) persistent preoccupation with parts of objects.

B. Delays or abnormal functioning in at least one of the following areas, with onset prior to age 3 years: (1) social interaction, (2) language as used in social communication, or (3) symbolic or imaginative play.

C. The disturbance is not better accounted for by Rett's syndrome or Childhood Disintegrative Disorder.

Note. From Diagnostic and Statistical Manual of Mental Disorders (4th ed., pp. 70–71) by the American Psychiatric Association, 1994, Washington, DC: Author. Copyright 1994 by the American Psychiatric Association. Adapted with permission.
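The tallying rule in Criterion A (six or more items overall, with at least two from cluster 1 and at least one each from clusters 2 and 3) can be expressed as a short function. This sketch is mine, not the text's; it only illustrates the counting logic and is not a diagnostic tool, and the example tallies are hypothetical.

```python
def meets_criterion_a(social: int, communication: int, behavior: int) -> bool:
    """Check the DSM-IV Criterion A tally for Autistic Disorder:
    six or more items in total, with at least two from cluster (1)
    (social interaction) and at least one each from clusters (2)
    (communication) and (3) (restricted/repetitive behavior).
    Arguments are counts of endorsed items per cluster (0-4 each).
    """
    total = social + communication + behavior
    return (total >= 6 and social >= 2
            and communication >= 1 and behavior >= 1)

# Hypothetical tallies:
print(meets_criterion_a(3, 2, 1))  # True: six items, distribution satisfied
print(meets_criterion_a(1, 3, 2))  # False: only one social-interaction item
print(meets_criterion_a(2, 2, 1))  # False: only five items in total
```

Note how the rule is conjunctive: a high total alone is not enough unless the distribution requirements across the three clusters are also met.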
Page 171 often rigid pattern of interaction with objects and other aspects of their environment that is characteristic of children with autism. In this definition, the onset is specified as being prior to age 3 because of the variety of ages at which marked changes in development are reported: Although many children are described by their parents as having always been distant and unresponsive, others are described as having responded to social interaction normally until age 1 or 2 (American Psychiatric Association, 1994; Prizant & Wetherby, 1993). Difficulties in defining autistic disorder arise from the remarkable heterogeneity of children with the disorder and from the extent to which their problems overlap with those associated with other developmental disorders and with mental retardation (Carpentieri & Morgan, 1996; Nordin & Gillberg, 1996; Waterhouse et al., 1996). Table 7.2 lists the other disorders included within PDD and the characteristics that are thought to distinguish autistic disorder from them. A number of researchers (e.g., Rapin, 1996; Waterhouse, 1996; Wing, 1991) have explored common features across specific disorders included within PDD and have suggested that frequent changes in terminology and clinical categories are likely to continue as more is learned about these children (Waterhouse, 1996). In particular, considerable research has recently been devoted to defining the boundaries between Asperger's syndrome and autistic disorder in individuals with higher measured IQs (Ramberg, Ehlers, Nyden, Johansson, & Gillberg, 1996; Wing, 1991). The overlap between mental retardation and autistic spectrum disorder also presents major challenges to researchers and clinicians. As mentioned earlier, about 75% of children with autistic spectrum disorder are diagnosed with mental retardation. In addition, the severity of mental retardation appears to be related to the frequency of autistic symptoms.
For example, in one recent Swedish study (Nordin & Gillberg, 1996a), autistic spectrum disorder was identified in about 12% of children with mild retardation, whereas it was identified in 29.5% of those with severe retardation. The fact that not all children with mental retardation show autistic symptoms, however, suggests that much more needs to be done to understand the relationship of these two conditions. Increased understanding of the nature of the relationship between mental retardation and the specific cognitive deficits associated with autistic spectrum disorder should help improve the quality of care directed to children with these combined difficulties. Additional difficulties in diagnosis arise because the symptoms associated with autistic disorder change with age, although currently there is considerable disagreement over the nature and direction of those changes (i.e., improvement vs. decline; e.g., see Eaves & Ho, 1996; Piven, Harper, Palmer, & Arndt, 1996). Despite possible changes over time, however, it is rare for individuals diagnosed as autistic in childhood to enter adulthood without significant residual problems (e.g., see Piven et al., 1996). A personal experience with an acquaintance in graduate school—who in retrospect would probably have been identified as having Asperger's disorder and whom I will call Matthew Metz—captures this generality for me: Although Matthew would eventually complete a Ph.D. in history, he invariably greeted members of our graduate house whom he saw on campus with an introduction: "Hi, you may not remember me, but my name is Matthew Metz." This greeting persisted despite months of having
Page 172 Table 7.2 Differentiating Autistic Disorder From Other Disorders Within the Autistic Spectrum Disorder (Called Pervasive Developmental Disorders, PDD, by the American Psychiatric Association, 1994)

Rett's disorder
Major characteristics:
- An autosomal disorder affecting only women (probably no men are identified because of fetal mortality)
- Normal pattern of early physical, motor development with later loss of skills and deceleration in head growth
- Associated with severe or profound mental retardation and limited language skills
- Characteristic hand movements ("wringing" or "washing" of hands)
Basis for differentiation from autistic disorder:
- Differences in sex ratios (female only, versus predominately male in autism)
- Head growth slows down after infancy only in Rett's; autistic disorder may actually be associated with an abnormally large head circumference (Waterhouse et al., 1996)
- Social interaction difficulties are more persistent into late childhood in autism than in Rett's disorder

Childhood disintegrative disorder
Major characteristics:
- Marked regression after at least 2 years of seemingly normal development
- Social, communication, and behavioral characteristics similar to autism
- Usually associated with severe mental retardation
- Very rare disorder, possibly more common in men than women
Basis for differentiation from autistic disorder:
- Differentiation from autism depends on good evidence of normal development during the first two years; otherwise, the autism categorization is preferred

Asperger's disorder
Major characteristics:
- Preserved language function in the presence of "severe and sustained impairment in social interaction" (p. 75)
- Restricted, repetitive patterns of behavior, interests, and activities (e.g., pronounced interest in train schedules)
Basis for differentiation from autistic disorder:
- Absence of significant language and cognitive deficits in Asperger's disorder, but very significant delays in autism
- Except for social communication deficits, adaptive skills are developmentally appropriate in Asperger's, but not in autism
- Asperger's disorder is typically diagnosed later than autism, often at school age, possibly due to later onset than autism

Pervasive Developmental Disorder Not Otherwise Specified (PDD-NOS)
Major characteristics:
- Severe and pervasive impairment in social interaction and/or verbal and nonverbal communication, and/or presence of restricted, repetitive patterns of behavior, interests, and activities
- Sometimes referred to as "atypical autism"
Basis for differentiation from autistic disorder:
- Onset or symptoms failing to conform to criteria for other PDD, including autism
- Failure to meet specific criteria required for other PDD categories described above with regard to severity of symptoms or age of onset
Page 173 shared dinners at a common table with the acquaintances he addressed. As you may expect, Matthew had a very restricted social sphere that was largely confined to fellow students in his graduate program. When I last heard of him, he was living with his elderly parents and earning a limited income by writing entries on historical subjects for the publishers of an encyclopedia. Thus, even in the presence of the intellectual abilities required for completion of a graduate degree, significant challenges for Matthew persisted well into adulthood.

Suspected Causes

To date, discussions of etiology for autistic spectrum disorder have focused on socioenvironmental, behavioral, and purely organic possibilities (Haas, Townsend, Courchesne, Lincoln, Schreibman, & Yeung-Courchesne, 1996; Waterhouse, 1996; Wolf-Schein, 1996). The socioenvironmental perspective had strong proponents in the 1960s, especially among psychoanalysts who held that poor parenting was the source of these children's difficulties (e.g., Bettelheim, 1967). More recently, however, such theories have lost favor with almost all researchers and clinicians. Currently, the dominant perspective on autism is that it has one or more organic bases in the form of underlying neurological abnormalities. The nature of the neurologic abnormalities underlying autism has not yet been well documented and constitutes a major area of research (Rapin, 1996). Proposed sites of suspected neurologic abnormalities include the frontal lobe (Frith, 1993), the reticular formation of the brain stem (Rimland, 1964), and the cerebellum (Courchesne, 1995)—just to name a few (cf. Cohen, 1995; Wolf-Schein, 1996). In addition, the role that the right hemisphere of the brain plays in autistic symptoms has received some attention (e.g., Shields, Varley, Broks, & Simpson, 1996). Although localized functional abnormalities have been sought, it has frequently been suggested that the underlying abnormalities are in fact likely to be diffuse (Rapin, 1996).
As a more distal causal factor leading to the brain abnormalities that are then believed to cause autistic symptoms more directly, genetic factors are implicated in some cases of autism. Evidence supporting this reasoning includes (a) the preponderance of males in all categories within PDD except Rett's disorder (American Psychiatric Association, 1994; Waterhouse et al., 1996),1 (b) the tendency for PDD to occur much more frequently in some families than in others (Folstein & Rutter, 1977), and (c) the tendency for PDD to occur frequently among individuals with fragile X, where genetic abnormalities are well documented (Cohen, 1995). Many cases of autism, however, have yet to be linked to genetic abnormalities. Nonetheless, it is suspected that these cases are still due to organic factors arising before rather than during or after the child's birth (Rapin, 1996). Other suspected sources of the presumed neurologic abnormalities include metabolic disorders and infectious disorders (e.g., congenital rubella, encephalitis, or meningitis; Rapin, 1996; Wolf-Schein, 1995). In some cases, no likely causal factor is suggested—leading to cases that are termed

1 The reasoning is that male preponderance may exist because males' single X chromosome puts them at special risk for X-chromosome defects.
Page 174 idiopathic, that is, without a known cause. Efforts to identify the true nature of such idiopathic cases and to identify the specific mechanisms by which known causes act to create autistic symptoms represent some of the most needed areas for research on PDD.

Special Challenges in Assessment

Children with autistic spectrum disorder present the greatest imaginable challenges to the clinician contemplating formal testing as a means of collecting information. Frequently, these children's essential social interaction deficits dramatically limit their participation in the usual give-and-take required by most standardized language instruments. Consequently, informal measures, especially parent questionnaires and behavioral checklists, are used very frequently for purposes of screening, diagnosis, and description of language among children and adults with autistic spectrum disorder (Chung, Smith, & Vostanis, 1995; DiLavore, Lord, & Rutter, 1995; Gillberg, Nordin, & Ehlers, 1996; Nordin & Gillberg, 1996; Prizant & Wetherby, 1993; Sponheim, 1996). Alternatives to standardized tests are particularly valuable for those children whose communication repertoire is very limited, a group that includes as many as 50% of all children with autism (Paul, 1987). Where the purpose of an evaluation is to aid in diagnosis of the disorder, it has been argued that parent interviews may be considerably better than observational methods that may be applied by clinicians (Rapin, 1996). Table 7.3 lists some of the most common questionnaires, interview schedules, checklists, and other instruments used in screening and diagnosing autistic spectrum disorders. Although many of these focus on the entire range of difficulties often seen as part of autism, some focus on selected skill areas, such as communication or play.
Despite the frequent need for nontraditional, observational techniques, more traditional, standardized speech and language tests can play a useful role in language assessments of some children with autism. In particular, children with more elaborate language and communication skills—children who are often described as "high functioning"—may be amenable to standardized testing when appropriate attention is paid to motivation and other enabling factors. Information obtained from family members and other individuals who are very familiar with the child can help pinpoint the reinforcers that will prove most helpful in facilitating a child's participation and warn against specific stimuli (e.g., types of environmental noise such as traffic noise or the sound of some electrical devices) that are likely to be distracting or disturbing to the individual child. For higher functioning children, standardized speech and language testing may be not only feasible but quite vital to a thorough understanding of their strengths and weaknesses—particularly for receptive skills, which, unlike expressive skills, are not readily observable in spontaneous productions. Even when expressive language testing is feasible, analysis of spontaneous productions will almost always constitute a particularly desirable tool for expressive language assessment. Not only does analysis of spontaneous language allow one to examine variables related to numerous expressive language domains simultaneously (Snow & Pan, 1993), but one can also argue that the validity of such measures will be especially high for children who are so reactive to standardized testing procedures. In
Page 175 Table 7.3 Recent Behavioral Checklists and Interviews for Screening and Description of Autistic Spectrum Disorder (Chung, Smith, & Vostanis, 1995; Gillberg, Nordin, & Ehlers, 1996; Wolf-Schein, 1996)

Screening

Checklist for Autism in Toddlers (CHAT; Baron-Cohen, Allen, & Gillberg, 1992)
Age group: Children from 18 to 30 months
Description: Uses 14 items that are responded to by parent (n = 9) and by clinician (n = 5); found to have a low rate of false positives and reported to have good reliability (Gillberg, Nordin, & Ehlers, 1996)

Pre-Linguistic Autism Diagnostic Observation Schedule (PL-ADOS; DiLavore, Lord, & Rutter, 1995)
Age group: Children under 6 years of age
Description: Uses 12 play-based activities with 17 associated ratings, with items administered by the examiner or through one of the child's caregivers; designed to relate directly to the DSM–IV or ICD-10 criteria

Asperger Syndrome Screening Questionnaire (ASSQ; Ehlers & Gillberg, 1993)
Age group: 7 to 16 years
Description: A teacher questionnaire containing 27 items; it appears to consistently identify Asperger's disorder, but it may overidentify in cases of other social abnormalities; one of the few measures developed to be sensitive to Asperger's disorder

Diagnosis and Description

Autism Diagnostic Interview-Revised (ADI-R; Lord, Rutter, & Le Couteur, 1994)
Age group: Children from 18 months to adults
Description: Uses an interview of parents or caregivers of individuals with suspected autistic disorder; designed to relate directly to the DSM–IV or ICD-10 criteria

Childhood Autism Rating Scale (CARS; Schopler, Reichler, & Renner, 1986)
Age group: Children
Description: Uses direct observation of children with suspected autistic disorder; designed to be used in diagnosis and description of severity
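Table 7.3 notes that the CHAT was "found to have a low rate of false positives." As an illustrative sketch (not from the text) of how such screening accuracy indices are computed from a 2x2 table of screen results against a reference diagnosis, with entirely hypothetical counts:

```python
def screening_indices(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard accuracy indices for a screening instrument, computed
    from a 2x2 table: tp/fp/fn/tn = true/false positives/negatives."""
    return {
        "sensitivity": tp / (tp + fn),          # affected children correctly flagged
        "specificity": tn / (tn + fp),          # unaffected children correctly passed
        "false_positive_rate": fp / (fp + tn),  # unaffected children wrongly flagged
    }

# Entirely hypothetical counts for illustration:
indices = screening_indices(tp=18, fp=4, fn=6, tn=972)
print(round(indices["sensitivity"], 3))          # 0.75
print(round(indices["false_positive_rate"], 3))  # 0.004
```

A low false-positive rate matters for screening instruments like those in the table because most children screened in a general population will not have the disorder, so even a small false-positive rate can produce many incorrect referrals in absolute terms.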
Page 176 chapter 10, the use of spontaneous language sample analyses is discussed at some length.

Expected Patterns of Language Performance

Certain specific language behaviors are frequently associated with autism, although they may also occur infrequently in normal language development and in other language disturbances. Among these behaviors are echolalia, pronominal reversals, and stereotypic or nonreciprocal language (Fay, 1993; Paul, 1995). Echolalia consists of the immediate or delayed repetition of speech, often without evident communicative intent. Echolalic productions can often be quite complex in their language structure relative to the level of the child's spontaneous communications and may simply represent memorized routines rather than creatively generated language. The presence of echolalic productions often appears to indicate a child's attempt to stay engaged in the social interaction despite failing to understand what has just been said or being unable to produce a more suitable response. Such productions, consequently, may be communicative in intent and therefore provide information about the nature of the child's pragmatic skills (Paul, 1995). Pronominal reversal involves an apparent confusion in pronoun choice in which first and second person pronouns are substituted for one another. Thus, for example, a child might say "you go" when apparently referring to him- or herself. Although these errors were once thought to reflect the child's failure to distinguish him- or herself from the environment, they are currently taken to reflect the child's inflexible use of language forms. In short, the child treats pronouns, which are sometimes referred to as "deictic shifters," as unchanging labels, thereby failing to recognize the shift that allows "I" to refer to several different speakers in turn simply by virtue of their role as speaker, and "you" to refer to several different listeners by virtue of their role as listener.
Although once considered a hallmark of the disorder, pronominal reversals are not necessarily used frequently (Baltaxe & D'Angiola, 1996). The Personal Perspective included in this chapter contains the reflections of Donna Williams, an adult with autism, who argues persuasively for the relative unimportance of pronoun use as a target for therapy, given all of the words one needs to learn.

PERSONAL PERSPECTIVE

The following passage comes from a book written by a young woman who describes herself as having autism associated with high functioning (Williams, 1996, pp. 160–161). In this passage, she discusses which words are important and which are unimportant to learn: "Words to do with the names of objects are probably the most important ones to connect with as it is hard to ask for help if you haven't got these. If someone can only say 'book,' at least you can work out what they might want done with
Page 177 it. If they just say 'look' but haven't connected with 'book,' you have a whole house full of things that can be 'looked' (at or for). "Words to do with what things are contained in (box, bottle, bag, packet), made of (wood, metal, cloth, leather, glass, plastic, powder, goo) or what is done with them (eating, drinking, closing, warming, sleeping) are also really important to learn. Much later, less tangible, less directly observable words such as those to do with feelings (had enough, hurt, good, angry) or body sensations (tired, full, cold, thirsty) are really important to connect with. "Words to do with pronouns, such as 'I,' 'you,' 'he,' 'she,' 'we' or 'they,' aren't so important. Too many people make a ridiculous big hooha about these things, because they want to eradicate this 'symptom of autism,' or for the sake of 'manners' or impressiveness. Pronouns are 'relative' to who is being referred to, where you are and where they are in space and who you are telling all this to. That's a lot of connections and far more than ever have to be made to correctly access, use and interpret most other words. Pronouns are, in my experience, the hardest words to connect with experienceable meaning because they are always changing, because they are so relative. In my experience, they require far more connections, monitoring and feedback than in the learning of so many other words. "Too often so much energy is put into teaching pronouns and the person being drilled experiences so little consistent success in using them that it can really strongly detract from any interest in learning all the words that can be easily connected with.
I got through most of my life using general terms like ‘a person’ and ‘one,’ calling people by name or by gender with terms like ‘the woman’ or ‘the man’ or by age with terms like ‘the boy.’ It didn’t make a great deal of difference to my ability to be comprehended whether I referred to these people’s relationship to me or in space or not. These things might have their time and place but there are a lot of more important things to learn which come easier and can build a sense of achievement before building too great a sense of failure.” Stereotypic or nonreciprocal language refers to idiosyncratic use of words or even whole sentences (Paul, 1995). Often the particular word or phrase seems to be used because it was first heard in a particular situation or in conjunction with a specific event or object. Thereafter, it is used to stand for the associated situation, event, or object, despite its lack of meaning to anyone except a very perceptive individual present at the time the association was formed. Temple Grandin, a college professor who has recently published several books about her experiences as someone with autism, describes a personal example of nonreciprocal language: Teachers who work with autistic children need to understand associative thought patterns. An autistic child will often use a word in an inappropriate manner. Sometimes these uses have a logical associated meaning and other times they don’t. For example, an autistic child might say the word “dog” when he wants to go outside. The word “dog” is associated with going outside. In my
Page 178 own case, I can remember both logical and illogical use of inappropriate words. When I was six, I learned to say ‘prosecution.’ I had absolutely no idea what it meant, but it sounded nice when I said it, so I used it as an exclamation every time my kite hit the ground. I must have baffled more than a few people who heard me exclaim “Prosecution!” to my downward spiraling kite. (Grandin, 1995, p. 32) In addition to characteristic kinds of atypical language use, patterns of language strengths and weaknesses among children with autistic disorder and Asperger’s disorder have received extensive attention from researchers. Table 7.4 summarizes the language characteristics described for three diagnoses in the spectrum: two forms of autistic disorder and Asperger’s disorder. The two descriptions provided under autistic disorder are included because of the relatively rich research base that has identified very different skills seen in individuals who can be described as high- versus low-functioning in terms of severity as well as in terms of nonverbal intelligence scores. A study by a large group of researchers headed by Isabelle Rapin (1996) provides the most comprehensive examination of the largest number of children with autism to date; it made use of normal controls and two other control groups—(a) a group of language-impaired children to act as controls for the high-functioning children with autism and (b) a group of children without autism but with low nonverbal IQs to act as a control group for the low-functioning children with autism. That multiyear, multisite study provided much of the information included in Table 7.4. Despite my use of the subcategories high- and low-functioning, it should be noted that researchers have identified several subgroupings of autistic spectrum disorder beyond those discussed in this chapter, including aloof, passive, and active-but-odd (e.g., Frith, 1991; Sevin et al., 1995; Waterhouse, 1996; Waterhouse et al., 1996). 
Related Problems Autistic disorder, and indeed most of the disorders on the autistic spectrum, is characterized by a number of behavioral problems in addition to those already discussed in terms of communication and language. Two of these, “restricted repetitive and stereotyped patterns of behavior, interests, and activities” and “lack of varied, spontaneous make-believe play or social imitative play appropriate to developmental level,” are considered central enough to the nature of the disorder to be listed in the DSM–IV definition (American Psychiatric Association, 1994). They are closely related. Restricted and stereotyped patterns of behavior, interests, and activities can include behaviors such as the child’s rocking, flapping one or both hands in front of his or her own eyes, repeatedly manipulating parts of objects (such as spinning the wheel on a toy or repeatedly opening and closing a hallway door), or, more alarmingly, repeatedly biting or striking others or him- or herself. Some of these repetitive behaviors can be interpreted as self-stimulatory or as efforts by the child to deal with anxiety and avoid overstimulation (e.g., Cohen, 1995); others are more difficult to interpret. Stereotyped, repetitive behaviors (sometimes referred to as stereotypies) will often need to be addressed in order to free the child to attend to important interactions (such as assessment or establishing relationships with peers). How they should be addressed
Page 179
Table 7.4
Patterns of Strengths and Weaknesses Among Children With Autistic Disorder—High-Functioning, Autistic Disorder—Low-Functioning, and Asperger’s Disorder

Autistic disorder—high-functioning (ADHF)

Relative strengths in communication:
- Expressive vocabulary (Rapin, 1996)
- Written language superior to oral language, and superior to the written language skills of children with delayed language skills but normal intelligence (Rapin, 1996)
- Rapid naming within a category (Rapin, 1996)
- Relatively less use of echolalia than in ADLF
- Delayed development of question-asking not as pronounced as in ADLF (Rapin, 1996)

Relative weaknesses in communication:
- Receptive language more affected than expressive language (Rapin, 1996)
- Functional use of expressive language below performance on most tests of expressive language (Rapin, 1996)
- Pragmatic skills
- Formulated output of connected speech (Rapin, 1996)
- Verbal reasoning (Rapin, 1996)

Other strengths and weaknesses:
Strengths:
- Preserved function on visuospatial and visual perceptual skills (Rapin, 1996)
Weaknesses:
- Marked delay in onset of ability to engage in symbolic play (Rapin, 1996)
- Possible deficits in memory (Rapin, 1996)
- Subtle motor deficits, especially affecting gross motor skills, that are more consistent with language skills than nonverbal IQ (Rapin, 1996)

(Continued)
Page 180
Table 7.4 (Continued)

Autistic disorder—low-functioning (ADLF)

Relative strengths in communication:
- Expressive vocabulary is a relative strength and is generally better than receptive vocabulary
- Patterns of strength and weakness may be especially difficult to determine because of floor effects on many measures (Rapin, 1996)

Relative weaknesses in communication:
- Verbal communication may be absent in about half of these children (Rapin, 1996)
- When present, most areas of language are severely affected (Rapin, 1996)
- Reported temporary regression of language skills in early development (Rapin, 1996)

Other strengths and weaknesses:
Strengths:
- Nonverbal performance superior to verbal performance (Rapin, 1996)

Asperger’s Disorder (AD)

Relative strengths in communication:
- Generally preserved language skills (American Psychiatric Association, 1994)
- Phonology, except possibly in the area of prosody
- Syntax

Relative weaknesses in communication:
- Pragmatic skills (Ramberg, Ehlers, Nyden, Johansson, & Gillberg, 1996; Wing, 1991)
- Atypical prosody and vocal characteristics (Ramberg et al., 1996)

Other strengths and weaknesses:
Strengths:
- Normal nonverbal intelligence
Weaknesses:
- Motor clumsiness (Ramberg et al., 1996; Wing, 1991)

Note. Asperger’s Disorder is considered equivalent to Autistic Disorder—High-Functioning by some authors (e.g., Rapin, 1996).
Page 181 must be determined in relation to their potentially adaptive role from the child’s perspective. Team approaches using behavioral interventions and, at times, drug intervention are sometimes useful. The high frequency of these stereotyped patterns of interaction is combined with a lack of the spontaneous, imaginative play considered so characteristic of childhood. Although this deficiency has been noted since autism was first described by Kanner in 1943, it has recently been seen as related to these children’s apparent inability to assume alternative perspectives—an ability that also supports social interaction. It has been said that one of the chief cognitive deficits in children with autistic disorder may be their lack of a theory of mind, the ability to think about emotions, thoughts, and motives—either in themselves or in others (Frith, 1993). Pronounced sensory abnormalities have been inferred in many autistic children on the basis of their apparent avoidance of and negative reactions to many auditory, visual, and tactile stimuli. In particular, hypersensitivity and hyposensitivity have been associated with autistic spectrum disorders (e.g., Roux et al., 1995; Sevin et al., 1995). Recently, a controversial therapy technique, auditory integration training (Rimland & Edelson, 1995), has been devised in an attempt to eliminate these abnormal responses to auditory stimuli seen in some children. In a growing number of studies, children with autism spectrum disorder have been found to be at increased risk for motor abnormalities. For example, in a recent large-scale study, children in both high- and low-functioning groups showed a greater frequency of motor abnormalities than did groups of children with either mental retardation without autism or SLI (Rapin, 1996). However, oromotor impairments tended to be more common and more severe among children in the low-functioning group. 
Among the difficulties noted have been akinesia (absent or diminished movement), bradykinesia (delay in initiating, stopping, or changing movement patterns), and dyskinesia (involuntary tics or stereotypies; Damasio & Maurer, 1978), as well as problems with muscle tone, posture, and gait (Page & Boucher, 1998). Of particular interest to speech-language pathologists who may wish to work on oral motor activities in efforts to foster speech, or on manual gestures, have been reports of oral and manual dyspraxia, difficulties in the performance of purposeful voluntary movements in the absence of paralysis or muscular weakness (Page & Boucher, 1998; Rapin, 1996). Other problems that are more common among children on the autistic disorder spectrum than among children without identified problems are epilepsy, especially a form called infantile spasms, and sleep disorders (Rapin, 1996). ADHD (discussed in chap. 5) is also more prevalent (Wender, 1995). Summary 1. Autistic spectrum disorder, also termed Pervasive Developmental Disorder (PDD), encompasses at least five related and relatively rare disorders: Rett’s disorder, autistic disorder, Asperger’s syndrome, childhood disintegrative disorder, and pervasive developmental disorder not otherwise specified (PDD-NOS), according to the diagnostic system of the DSM–IV (American Psychiatric Association, 1994).
Page 182 2. Difficulties shared by children with autistic spectrum disorders include delayed or deviant language and social communication, and abnormal ways of responding to people, places, and objects. 3. Autistic spectrum disorders frequently co-occur with mental retardation, perhaps because of a shared cause: underlying neurologic abnormalities. 4. Although the source of underlying neurologic abnormalities is generally unknown, genetic factors and prenatal infections are suspected in some cases. 5. Children with autistic spectrum disorder are often unable to participate in the standardized testing required for the diagnosis of their disorder, making the use of observational methods and parental questionnaires a very frequent and relatively well-studied alternative. 6. Echolalia, pronominal reversals, and stereotypic language are abnormal features of language that are seen more frequently in autistic disorder than in other developmental language disorders. 7. Other problems affecting children with autistic spectrum disorders include a lack of spontaneous, imaginative play and restricted patterns of behavior, interests, and activities. In addition, these children are at increased risk for motor abnormalities, seizures, and sleep disorders. Key Concepts and Terms akinesia: absent or diminished movement. autistic disorder: the major and most frequently occurring disorder category within the larger DSM–IV (American Psychiatric Association, 1994) definition of Pervasive Developmental Disorders; often used synonymously with infantile autism or Kanner’s autism. Asperger’s disorder: an autistic disorder within the larger DSM–IV category of Pervasive Developmental Disorders in which early delays in communication are absent; often considered synonymous with high-functioning autism. bradykinesia: a motor abnormality characterized by delays in initiation, cessation, or alteration of movement patterns. 
childhood disintegrative disorder: a very rare autistic disorder within the larger DSM–IV category of Pervasive Developmental Disorders in which a period of about 2 years of normal development is followed by autistic symptoms. dyskinesia: a movement abnormality characterized by involuntary tics or stereotypies. dyspraxia: difficulties in the performance of purposeful voluntary movements in the absence of paralysis or muscular weakness; for example, oral dyspraxia, manual dyspraxia, verbal dyspraxia (also frequently referred to as verbal apraxia). echolalia: immediate or delayed repetition of a previous speaker’s or one’s own utterance.
Page 183 epilepsy: a chronic disorder associated with excessive neuronal discharge and with alterations of consciousness, sensory activity, motor activity, or some combination of these. Pervasive Developmental Disorders (PDDs): the group of severe disorders having their onset in childhood, characterized by significant deficits in social interaction and communication, as well as the presence of stereotyped behavior, interests, and activities; considered synonymous with autistic spectrum disorder. pervasive developmental disorder not otherwise specified (PDD-NOS): within the DSM–IV system of disorder classification, this diagnosis is made when some but not all of the major criteria for autistic disorder are met; also referred to as atypical autism. pronominal reversals: incorrect use of first- and second-person pronouns (e.g., “you want” to mean “I want”), considered typical of autistic speech. Rett’s disorder: a severe pervasive developmental disorder affecting only girls, in which a brief period of normal development is followed by regression; associated with severe or profound levels of mental retardation. stereotypy: frequent repetition of a meaningless gesture or movement pattern. theory of mind: the ability to think about emotions, thoughts, and motives—either in oneself or in others; considered to be a primary deficit among individuals whose difficulties fall along the autistic disorder spectrum. Study Questions and Questions to Expand Your Thinking 1. On the Internet, look for sites related to PDD. For which disorders within that designation do you find web sites? Who are the main audiences for these sites? How do sites respond differently to these various audiences? 2. On the basis of Table 7.2, list the major characteristics of a child’s behavior that will be needed to determine which PDD label is most appropriate. 3. On the basis of the discussion of suspected causes of PDD, outline two major research needs that should be pursued by future researchers. 4. 
List in order of importance the problems—other than those intrinsic to autism itself—presented to adults who wish to interact with children with PDD. 5. What features of a child’s communication would cause you to be most concerned that he or she was showing symptoms of autism? What features of his or her language? 6. What practical problems might a parent of a child with PDD face that are different from those faced by other parents? 7. Find out what definition of autistic spectrum disorders is used in a local school system. How does it differ from the system described in DSM–IV (American Psychiatric Association, 1994)?
Page 184 Recommended Readings Angell, R. (1993). A parent’s perspective on the preschool years. In E. Schopler, M. E. Van Bourgondien, & M. M. Bristol (Eds.), Preschool issues in autism. New York: Plenum. Campbell, M., Schopler, E., Cueva, J. E., & Hallin, A. (1996). Treatment of autistic disorder. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 124–143. Grandin, T. (1995). Thinking in pictures and other reports from my life with autism. New York: Doubleday. Schopler, E. (1994). Behavioral issues in autism. New York: Plenum. Strain, P. S. (1990). Autism. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological aspects of developmental and physical disabilities: A casebook (pp. 73–86). Newbury Park, CA: Sage. References American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author. Angell, R. (1993). A parent’s perspective on the preschool years. In E. Schopler, M. E. Van Bourgondien, & M. M. Bristol (Eds.), Preschool issues in autism (pp. 17–38). New York: Plenum. Baltaxe, C. A. M., & D’Angiola, N. (1996). Referencing skills in children with autism and specific language impairment. European Journal of Disorders of Communication, 31, 245–258. Baron-Cohen, S., Allen, J., & Gillberg, C. (1992). Can autism be detected at 18 months? The needle, the haystack, and the CHAT. British Journal of Psychiatry, 161, 839–843. Bettelheim, B. (1967). The empty fortress. New York: Collier Macmillan. Carpentieri, S., & Morgan, S. B. (1996). Adaptive and intellectual functioning in autistic and nonautistic retarded children. Journal of Autism and Developmental Disorders, 26, 611–620. Chung, M. C., Smith, B., & Vostanis, P. (1995). Detection of children with autism. Educational and Child Psychology, 12(2), 31–36. Cohen, I. L. (1995). Behavioral profiles of autistic and nonautistic fragile X males. Developmental Brain Dysfunction, 8, 252–269. Courchesne, E. (1995). 
New evidence of cerebellar and brainstem hypoplasia in autistic infants, children, and adolescents: The MR imaging study by Hashimoto and colleagues. Journal of Autism and Developmental Disorders, 25, 19–22. Damasio, A. R., & Maurer, R. G. (1978). A neurological model for childhood autism. Archives of Neurology, 35, 779–786. DiLavore, P. C., Lord, C., & Rutter, M. (1995). The Pre-Linguistic Autism Diagnostic Observation Schedule. Journal of Autism and Developmental Disorders, 25, 355–379. Eaves, L. C., & Ho, H. H. (1996). Brief report: Stability and change in cognitive and behavioral characteristics of autism through childhood. Journal of Autism and Developmental Disorders, 26, 557–569. Ehlers, S., & Gillberg, C. (1993). The epidemiology of Asperger syndrome: A total population study. Journal of Child Psychology and Psychiatry, 34, 1327–1350. Fay, W. (1993). Infantile autism. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 190–202). Mahwah, NJ: Lawrence Erlbaum Associates. Folstein, S., & Rutter, M. (1977). Infantile autism: A genetic study of 21 twin pairs. Journal of Child Psychology and Psychiatry, 18, 297–321. Frith, U. (1991). Asperger and his syndrome. In U. Frith (Ed.), Autism and Asperger syndrome (pp. 1–36). Cambridge, England: Cambridge University Press. Frith, U. (1993). Autism and Asperger syndrome. Cambridge, England: Cambridge University Press. Gillberg, C., Nordin, V., & Ehlers, S. (1996). Early detection of autism: Diagnostic instruments for clinicians. European Child and Adolescent Psychiatry, 5, 67–74. Grandin, T. (1995). Thinking in pictures and other reports from my life with autism. New York: Doubleday. Hall, N. E., & Aram, D. M. (1996). Classification of developmental language disorders. In I. Rapin (Ed.), Preschool children with inadequate communication (pp. 10–20). London: MacKeith Press.
Page 185 Haas, R. H., Townsend, J., Courchesne, E., Lincoln, A. J., Schreibman, L., & Yeung-Courchesne, R. (1996). Neurologic abnormalities in infantile autism. Journal of Child Neurology, 11(2), 84–92. Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child, 2, 217–250. Lord, C., Rutter, M., & Le Couteur, A. (1994). Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism and Developmental Disorders, 24, 659–685. Myles, B. S., Simpson, R. L., & Becker, J. (1995). An analysis of characteristics of students diagnosed with higher-functioning autistic disorder. Exceptionality, 5(1), 19–30. Nordin, V., & Gillberg, C. (1996a). Autism spectrum disorders in children with physical or mental disability or both: I. Clinical and epidemiological aspects. Developmental Medicine and Child Neurology, 38, 297–311. Nordin, V., & Gillberg, C. (1996b). Autism spectrum disorders in children with physical or mental disability or both: II. Screening aspects. Developmental Medicine and Child Neurology, 38, 314–324. Page, J., & Boucher, J. (1998). Motor impairments in children with autistic disorder. Child Language Teaching and Therapy, 14, 233–259. Paul, R. (1987). Communication. In D. J. Cohen & A. M. Donnellan (Eds.), Handbook of autism and pervasive developmental disorders (pp. 61–84). New York: Wiley. Paul, R. (1995). Language disorders from infancy through adolescence: Assessment and intervention. St. Louis, MO: Mosby. Piven, J., Harper, J., Palmer, P., & Arndt, S. (1996). Course of behavioral change in autism: A retrospective study of high-IQ adolescents and adults. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 523–529. Prizant, B. M., & Wetherby, A. M. (1993). Communication in preschool autistic children. In E. Schopler, M. E. Van Bourgondien, & M. M. Bristol (Eds.), Preschool issues in autism (pp. 95–128). 
New York: Plenum. Ramberg, C., Ehlers, S., Nyden, A., Johansson, M., & Gillberg, C. (1996). Language and pragmatic functions in school-age children on the autism spectrum. European Journal of Disorders of Communication, 31, 387–414. Rapin, I. (1996). Classification of autistic disorder. In I. Rapin (Ed.), Preschool children with inadequate communication (pp. 10–20). London: MacKeith Press. Rimland, B. (1964). Infantile autism. New York: Appleton. Rimland, B., & Edelson, S. M. (1995). Brief report: A pilot study of auditory integration training in autism. Journal of Autism and Developmental Disorders, 25, 61–70. Roux, S., Malvy, J., Bruneau, N., Garreau, B., Guerin, P., Sauvage, D., & Barthelemy, C. (1994). Identification of behaviour profiles within a population of autistic children using multivariate statistical methods. European Child and Adolescent Psychiatry, 4, 249–258. Rutter, M., & Schopler, E. (1987). Autism and pervasive developmental disorders: Concepts and diagnostic issues. Journal of Autism and Developmental Disorders, 17, 159–186. Schopler, E., Reichler, R. J., & Renner, B. R. (1988). The Childhood Autism Rating Scale (CARS), revised. Los Angeles: Western Psychological Services. Sevin, J. A., Matson, J. L., Coe, D., Love, S. R., Matese, M. J., & Benavidez, D. A. (1995). Empirically derived subtypes of pervasive developmental disorders: A cluster analytic study. Journal of Autism and Developmental Disorders, 25, 561–578. Shields, J., Varley, R., Broks, P., & Simpson, A. (1996). Hemispheric function in developmental language disorders and high-level autism. Developmental Medicine and Child Neurology, 38, 473–486. Snow, C. E., & Pan, B. A. (1993). Ways of analyzing the spontaneous speech of children with mental retardation: The value of cross-domain analyses. In N. W. Bray (Ed.), International review of research in mental retardation (Vol. 19, pp. 163–192). New York: Academic Press. Sponheim, E. (1996). 
Changing criteria of autistic disorders: A comparison of the ICD-10 research criteria and DSM–IV with DSM–III–R, CARS, and ABC. Journal of Autism and Developmental Disorders, 26, 513–525. Strain, P. S. (1990). Autism. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological aspects of developmental and physical disabilities: A casebook (pp. 73–86). Newbury Park, CA: Sage.
Page 186 Trevarthen, C., Aitken, K., Papoudi, D., & Robarts, J. (1996). Children with autism: Diagnosis and interventions to meet their needs. London: Jessica Kingsley. Waterhouse, L. (1996). Classification of autistic disorder (AD). In I. Rapin (Ed.), Preschool children with inadequate communication (pp. 21–30). London: MacKeith Press. Waterhouse, L., Morris, R., Allen, D., Dunn, M., Fein, D., Feinstein, C., Rapin, I., & Wing, L. (1996). Diagnosis and classification in autism. Journal of Autism and Developmental Disorders, 26, 59–86. Wender, E. (1995). Hyperactivity. In S. Parker & B. Zuckerman (Eds.), Behavioral and developmental pediatrics (pp. 185–194). Boston: Little, Brown. Williams, D. (1996). Autism—An inside-out approach: An innovative look at the mechanics of ‘autism’ and its developmental ‘cousins.’ Bristol, PA: Jessica Kingsley. Wing, L. (1991). The relationship between Asperger’s syndrome and Kanner’s autism. In U. Frith (Ed.), Autism and Asperger syndrome (pp. 93–121). Cambridge, England: Cambridge University Press. Wolf-Schein, E. G. (1996). The autistic spectrum disorder: A current review. Developmental Disabilities Bulletin, 24(1), 33–55. World Health Organization. (1992). The ICD-10 classification of mental and behavioral disorders: Clinical descriptions and diagnostic guidelines. Geneva, Switzerland: Author. World Health Organization. (1993). The ICD-10 classification of mental and behavioral disorders: Diagnostic criteria for research. Geneva, Switzerland: Author.
Page 187 CHAPTER
8 Children with Hearing Impairment Defining the Problem Suspected Causes Special Challenges in Assessment Expected Patterns of Oral Language Performance Related Problems Bradley was 5 years old when it was determined that he had a mild, bilateral sensorineural hearing loss. Prior to entering kindergarten, his parents described him as a shy child who disliked larger play groups and preferred playing alone or with one close friend. In a noisy 16-child classroom, the adequacy of his hearing was first questioned by his kindergarten teacher, who reported that she often had difficulty getting his attention and found his poor attention during circle time inconsistent with his good attention in one-on-one situations. A hearing screening by a speech-language pathologist, performed because of concerns about delayed phonologic development, was the immediate source of a referral for the complete audiological examination in which his hearing loss was identified. After detection of the hearing loss, Bradley was fitted with binaural behind-the-ear aids. (He loved the bright blue earmolds and tubing he was allowed to choose.) Within a short time of the fitting, Bradley appeared more attentive during circle time and readily made progress in work on targeted speech distortions.
Page 188 Sammy, or Samantha on formal occasions, is a 3-year-old whose moderate high-frequency hearing loss was identified shortly after birth following her failure on a high-risk screening conducted because of her family history of hearing loss. Because an ear-level fitting initially proved unfeasible, Sammy used a body-worn aid, which was replaced by a behind-the-ear fitting at age 1½. Six months ago, the use of an FM trainer was extended to the home after continuous use in a preschool group that she had attended since age 1½. Although she is experiencing some delays in speech, her communication development otherwise appears on target. Desmond’s profound hearing loss was identified using auditory brainstem response (ABR) testing during his 3-week stay in a neonatal intensive care unit, following his premature birth at 7 months gestational age with a birth weight of 3.1 pounds. He required ventilator support for 5 days after birth. Now that Desmond is 5 years old, his parents have been frustrated by his slow progress in oral language development despite years of participation in special education and several failed attempts at successful amplification. Desmond currently uses a vibrotactile aid to increase his awareness of environmental sounds and his speech reception and is being considered as a candidate for a cochlear implant. Defining the Problem Estimates of the prevalence of hearing impairment in children vary from 0.1 to 4%—or from 1 in every 1,000 to 1 in every 25 children—depending on the definitions used (Bradley-Johnson & Evans, 1991; Northern & Downs, 1991). Of children between the ages of 3 and 17, about 52,000 have impairments severe enough to be termed deafness, where deafness can be defined as a hearing loss, usually above 70 dB, that precludes the understanding of speech through listening (Ries, 1994). When all levels of hearing loss are considered, hearing impairment is the most common disability among American school children (Flexer, 1994). 
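The percentage and "1 in N" notations for prevalence are interchangeable; the conversion is simple division. A minimal sketch (the function name is mine, not from the text):

```python
def percent_to_one_in_n(percent):
    """Convert a prevalence percentage to an equivalent '1 in N' rate."""
    return 100 / percent

# 4% of children corresponds to 1 in 25; 0.1% corresponds to 1 in 1,000
print(percent_to_one_in_n(4))    # 25.0
print(percent_to_one_in_n(0.1))  # 1000.0
```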
The negative impact of deafness on the normal acquisition of oral language may seem obvious: You cannot learn about phenomena with which you have limited experience. In addition, for children with profound hearing loss, this experience is largely restricted to a sensory channel (i.e., vision) that is mismatched to the most distinctive characteristics of that phenomenon (i.e., oral language). One line of evidence suggesting how great this mismatch is comes from a growing body of research suggesting that the structure of oral languages differs substantially from that of visuospatial languages (such as sign; Bellugi, van Hoek, Lillo-Martin, & O’Grady, 1993). Nonetheless, there is research suggesting that lipreading becomes more important to oral language development as hearing impairment worsens (Mogford-Bevan, 1993). Because limiting auditory exposure limits learning opportunities, even children with milder hearing losses—who therefore obtain greater amounts of acoustic information about oral language than children with greater hearing losses—experience significant consequences for their spoken language reception in everyday situations. Therefore, although this chapter focuses most intently on children with greater degrees of hearing impairment, it also alerts readers to the jeopardy in which children
Page 189 with even unilateral or “mild” bilateral hearing impairments are placed when it comes to language learning and academic success (Bess, 1985; Bess, Klee, & Culbertson, 1986; Carney & Moeller, 1998; Culbertson & Gilbert, 1986; Oyler, Oyler, & Matkin, 1988). In the Personal Perspective for this chapter, a teenager describes the ways in which deafness has affected her school life. PERSONAL PERSPECTIVE The following is an excerpt from the transcript of a statement made by Darby, a high school junior with a profound hearing impairment. She speaks about the academic and personal challenges facing her in school: “I have never, and most likely never will, hear sounds in the same way as a hearing person. As a result, hearing people experience things every millisecond of the day that I never will. By the same token, I have experienced things and will experience things that no hearing person can. “My deafness makes me different, and that difference makes me strong. I seem to get respect from other people just for doing things a hearing person can do with ease. For example, watch television, use the telephone, listen to music, and so on. For whatever reason, I never think about the fact that I am doing something that would normally be difficult for someone who couldn’t hear. In fact, I have never looked at myself as someone who was limited in any way, someone who couldn’t do something that any other hearing person could do. I’ve always known that I was different, but even though people would intimate that I wasn’t able to compete on the same level as hearing people, I would ignore them, or maybe I just didn’t “hear” them. “I have always attended Dalton, a private hearing school. It has never been, and never will be, easy for me. I have experienced periods of rejection and isolation, but I have proven myself worthy of the privilege of attending this school by receiving grades as good as many of my hearing peers and better than most. 
"I have definitely survived the academic challenges of my school and life. Socially, I still feel, though, that I'm not accepted as a true equal, but hey, that's their problem, they don't know what they're missing." (Ross, 1990, pp. 304–305)

Overall degree of hearing loss, or magnitude, is a major descriptor of hearing impairment, usually based on an estimate of an individual's ability to detect the presence of a pure tone at three frequencies important for speech information (500, 1000, and 2000 Hz; Bradley-Johnson & Evans, 1991). Table 8.1 lists major categories of hearing loss and provides some preliminary information about the effects of each level of loss. Although deafness is not listed as a category in the table, the term is frequently used to refer to a hearing loss greater than or equal to 70 dB (Northern & Downs, 1991).
Table 8.1
Effects of Differing Magnitudes of Hearing Loss (average hearing level, 500–2000 Hz; handicapping effects assume the loss is not treated in the first year of life)

0 to 15 dB (normal range). Can be heard without amplification: all speech sounds. Handicapping effects: none. Probable needs: none.

15 to 25 dB (slight hearing loss). Can be heard without amplification: vowel sounds are heard clearly; unvoiced consonant sounds may be missed. Handicapping effects: mild auditory dysfunction in language learning. Probable needs: consideration of need for hearing aid, speechreading, auditory training, speech therapy, preferential seating.

25 to 30 dB (mild hearing loss). Can be heard without amplification: only some of the speech sounds—the louder voiced sounds. Handicapping effects: auditory learning dysfunction, mild language retardation, mild speech problems, inattention. Probable needs: hearing aid, speechreading, auditory training, speech therapy.

30 to 50 dB (moderate hearing loss). Can be heard without amplification: almost no speech sounds when produced at normal conversational level. Handicapping effects: speech problems, language retardation, learning dysfunction, inattention. Probable needs: all of the above, plus consideration of special classroom situation.

50 to 70 dB (severe hearing loss). Can be heard without amplification: no speech sounds at normal conversational level. Handicapping effects: severe speech problems, language retardation, learning dysfunction, inattention. Probable needs: all of the above, probable assignment to special classes.

70+ dB (profound hearing loss). Can be heard without amplification: no speech or other sounds. Handicapping effects: severe speech problems, language retardation, learning dysfunction, inattention. Probable needs: all of the above, probable assignment to special classes.

Note. From Hearing in Children (4th ed., p. 14), by J. L. Northern and M. P. Downs, 1991, Baltimore: Williams & Wilkins. Copyright 1991 by Williams & Wilkins. Reprinted with permission.
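The relationship between the three-frequency pure-tone average and the magnitude categories of Table 8.1 can be expressed as a short sketch. This is illustrative only: the function names are invented for this example, and because the table's ranges meet at their boundary values (e.g., 15 to 25 dB, 25 to 30 dB), the code makes the assumption that each upper boundary belongs to the lower category.

```python
def pure_tone_average(thresholds_db):
    """Average hearing level (dB HL) across thresholds measured at
    500, 1000, and 2000 Hz -- the three frequencies most important
    for speech information."""
    if len(thresholds_db) != 3:
        raise ValueError("expected thresholds at 500, 1000, and 2000 Hz")
    return sum(thresholds_db) / 3.0

def classify_loss(pta_db):
    """Map a pure-tone average onto the magnitude categories of
    Table 8.1 (upper boundaries treated as inclusive -- an assumption)."""
    if pta_db <= 15:
        return "normal range"
    if pta_db <= 25:
        return "slight hearing loss"
    if pta_db <= 30:
        return "mild hearing loss"
    if pta_db <= 50:
        return "moderate hearing loss"
    if pta_db <= 70:
        return "severe hearing loss"
    return "profound hearing loss"

# Example: a child with thresholds of 60, 75, and 80 dB HL has a
# pure-tone average of about 71.7 dB -- consistent with the use of
# "deafness" for losses of 70 dB or greater (Northern & Downs, 1991).
pta = pure_tone_average([60, 75, 80])
category = classify_loss(pta)  # "profound hearing loss"
```
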
The term hard of hearing is used to refer to lesser degrees of hearing loss that allow speech and language acquisition to occur primarily through audition (Ross, Brackett, & Maxon, 1991). In addition to the magnitude of loss, related variables that influence how children's language is affected include (a) variables affecting the auditory nature of the loss (such as type, configuration, and whether the loss is unilateral or bilateral), (b) the age at which the hearing loss is acquired, (c) the age at which it is identified, and (d) how well the loss is managed. Type of hearing loss—conductive, sensorineural, or mixed—refers to the physiological site responsible for reduced sensitivity to auditory stimuli. Conductive hearing losses result from conditions that prevent adequate transmission of sound energy somewhere along the pathway leading from the external auditory canal to the inner ear. They can result from conditions that block the external ear canal or interfere with the energy-transferring movement of the ossicles (small bones) of the middle ear. Conductive losses are generally similar across frequencies and, at their most severe, do not exceed 60 dB (Northern & Downs, 1991). Such losses can often be corrected or significantly reduced using medical or surgical therapies (Paul & Jackson, 1993). One particularly common cause of conductive hearing loss is middle ear infection, otitis media. The hearing loss associated with this condition may be the most widely experienced form of hearing loss, given that 90% of children in the United States have had at least one episode of otitis media by age 6 (Northern & Downs, 1991). Although not all episodes of otitis media are associated with hearing losses, when they are observed the overall magnitudes of loss have generally been found to fall from 20 to 30 dB in the affected ear (Fria, Cantekin, & Eichler, 1985).
Sensorineural hearing losses result from damage to the inner ear or to some portion of the nervous system pathways connecting the inner ear to the brain. They are responsible for the most serious hearing losses, accounting for or contributing to most hearing losses in the severe to profound range. In addition, they account for most congenital hearing losses (Scheetz, 1993) and are rarely reversible (Northern & Downs, 1991). Mixed hearing losses refer to losses in which both conductive and sensorineural components are evident. Because the conductive components of a mixed hearing loss are generally treatable, such losses often become sensorineural in nature following effective treatment for the condition underlying the conductive loss. For example, a child with Down syndrome may experience a mixed loss consisting of a sensorineural loss exacerbated by poor eustachian tube function and chronic otitis media. Effective management of the middle ear condition can reduce the magnitude of the loss substantially in many cases. Consequently, clinicians who work with children who have sensorineural losses need to be especially aware that an already significant degree of loss can be further worsened if middle ear disease goes undetected. Central auditory processing disorders refer to abnormalities in the processing of auditory stimuli occurring in the absence of reduced acuity for pure tones or at a more pronounced level than would be expected given the degree of reduced acuity. In especially severe cases, such difficulties have been described as a specific type of language disorder: verbal auditory agnosia (Resnick & Rapin, 1991). Although central auditory processing disorders receive increasing attention from audiologists, their separability from language disabilities and other learning disabilities continues to be debated (Cacace & McFarland, 1998; Rees, 1973).

Hearing loss configuration refers to the relative amount of loss occurring at different frequency regions of the sound spectrum. For example, a high-frequency loss is one in which the loss is largely or solely confined to the higher frequencies of the speech spectrum. In contrast, a flat hearing loss is one in which the degree of loss is relatively constant across the spectrum. Knowing the magnitude and configuration of an individual's hearing loss can help you predict what sounds will be difficult for him or her to hear at specific loudness levels. A pair of figures may help illustrate this. Figure 8.1 consists of two frequency × intensity graphs (like those of a traditional audiogram) on which are plotted a variety of common sounds occurring at various intensity levels and frequencies. The shaded area in Figure 8.1A indicates the sound frequencies and intensities that might not be heard by children with severe high-frequency hearing losses—children such as Sammy, who was described at the beginning of the chapter. Although Sammy would easily hear environmental sounds such as car horns or telephones as well as many speech sounds when they are produced at conversational loudness levels, she would probably miss most fricative sounds because of their high frequency (high pitch) and low intensity (softness) when they are produced in the same conversations. Figure 8.1B represents the kind of loss frequently associated with deafness, the kind of loss demonstrated by Desmond. The negligible amount of auditory information to which Desmond has access is well illustrated by this figure. The centrality of visual information to Desmond's interactions with the world is further brought home when you are told that even the best available amplification would probably fail to improve Desmond's access to sound information.
Consequently, it is not surprising that vision has been called "the primary input mode of deaf children" (Ross, Brackett, & Maxon, 1991) and that management of the communication needs of such children often veers away from methods in which auditory information plays a major role (Nelson, Loncke, & Camarata, 1993), although the growing effectiveness of cochlear implants may change that somewhat, especially as cochlear implants are used at younger ages (Tye-Murray, Spencer, & Woodworth, 1995). A cochlear implant entails the insertion of a sophisticated device that includes an internal receiver/stimulator and an external transmitter and microphone with a micro speech processor (Sanders, 1993). Their rapid development and increasing application make them an exciting development in the management of severe hearing losses.

Whether one or both ears are affected represents another important factor determining the significance of a hearing loss. Unilateral hearing losses, ones affecting only one ear, usually have fewer negative consequences than bilateral hearing losses. That does not mean, however, that unilateral losses are insignificant. Adequate hearing in both ears is of particular importance when listening to quiet sounds or in noisy surroundings—especially for children. This special importance of bilateral hearing in children arises because their incomplete language acquisition makes using language knowledge and environmental context to "guess" the message being conveyed by an imperfect signal much harder for them than it is for adults. In a study conducted by Bess et al. (1986), about one third of the children who exhibited unilateral sensorineural hearing losses of 45 dB HL or greater were found to have either failed a grade or required special assistance in school.

Fig. 8.1. Figures illustrating the types of sounds that are likely to be heard (unshaded areas) and not heard (shaded areas) for two different hearing losses: a severe high-frequency loss (8.1A) and a profound hearing loss (8.1B). For purposes of clarity, but contrary to most instances in real life, these figures represent hearing loss as identical for each ear. From Hearing in Children (4th ed., p. 17), by Northern and Downs, Baltimore: Williams & Wilkins. Copyright © 1991 by Williams & Wilkins. Adapted by permission.

Despite the importance of the nature of a child's hearing loss to that child's overall outcome for speech and oral language, several nonauditory factors can play a very significant role. For example, the age at which a hearing loss is acquired has a tremendous impact on the extent to which it will interfere with the acquisition of oral language. Congenital hearing losses, those present at birth, are more detrimental than those acquired in early childhood, which in turn are more detrimental than those acquired in later childhood or adulthood. Even 3 or 4 years of good hearing can dramatically alter a child's later language skills (Ross, Brackett, & Maxon, 1991). This fact has led to the use of the term prelingual hearing loss to refer to a hearing loss acquired before age 2, which is thus thought to be associated with a more significant impact (Paul & Jackson, 1993). The age of detection of hearing loss in children is yet another variable affecting the oral language of hearing-impaired children. The earlier the detection of hearing loss in children, the better the outcome for language acquisition—assuming, of course, that adequate intervention follows. Recently devised methods, such as the measurement of auditory brainstem-evoked responses and transient-evoked otoacoustic emissions, permit the detection of even mild hearing loss in children from shortly after birth (Carney & Moeller, 1998; Mauk & White, 1995; Northern & Downs, 1991). Between 10 and 26% of hearing loss is estimated to exist at birth or to occur within the first 2 years of life (Kapur, 1996), thus making efforts at detection an ongoing need. Despite the possibility of early detection, however, hearing loss will escape detection for varying periods of time in children whose hearing is not screened or is screened prior to the onset of the loss.
In a recent study, Harrison and Roush (1996) surveyed the parents of 331 children who had been identified with hearing loss. They found that when there was no known risk factor, the median age of identification of hearing loss was about 13 months for severe to profound losses and 22 months for mild to moderate losses. Although the presence of known risk factors was associated with a decreased age at identification for milder losses (down to about 12 months), the age of identification for more severe losses remained about the same in this group (12 months). Median additional delays of up to 10 months were observed between identification of hearing loss and early intervention. These delays represent precious lost time for children whose auditory experience of the world is compromised. Only in late 1999 did efforts to make universal screening of infant hearing a reality (Mauk & White, 1995) receive momentous support in the form of the Newborn and Infant Hearing Screening and Intervention Act of 1999. This federal legislation provides new funding for newborn hearing screening grants to individual states. It is hoped that this funding will lead all states to implement infant screening programs, producing a revolution in the early identification of hearing loss.

A fourth factor influencing how hearing loss will affect children's language development is the management of the loss. For children with mild and moderate bilateral or unilateral losses, there is considerable agreement as to the approaches that will optimize their access to the auditory signal on which they will rely for processing information about oral language. Table 8.2 lists some of the types of interventions typically considered in the hearing management of children with these lesser degrees of loss. When it comes to children with greater losses, however, there is much controversy among professionals as well as members of the Deaf community (Coryell & Holcomb, 1997). A frequent battleground for those interested in interventions for deaf youngsters concerns the primacy of oral versus signed language. Arguments favoring an emphasis on oral language stress that the vast majority of society are users of oral language and, therefore, deaf children should be given tools with which to negotiate effectively within that context. Further, it can be stressed that their families will almost always (90% of the time) be composed entirely of hearing individuals (Mogford, 1993). Arguments favoring an emphasis on sign language stress that the Deaf community is a cohesive subculture in which visuospatial communication is the effective norm. In fact, in recent years, the Deaf community has begun to advocate for a difference rather than disorder perspective on hearing impairment, a political perspective thought to be vital to the emotional and social well-being of its members (Corker, 1996; Harris, 1995). Arguments favoring a strong emphasis on sign language also sadly note that only poor levels of achievement in oral language and particularly poor levels of achievement in written language (which often plateaus at a third-grade level) have been the norm in studies of individuals with severe to profound hearing losses (Dubé, 1996; Paul, 1998).

Table 8.2
Interventions Used With Children Who Have Mild and Moderate Hearing Impairment (Brackett, 1997)

Personal amplification; FM radio systems used with remote microphones. Function: increase loudness levels of acoustic signals; acoustic signal enhanced relative to background noise levels; a far superior means of dealing with a noisy classroom than preferential seating (Flexer, 1994); one of several types of special amplification systems (Sanders, 1993).

Sound treatment of classrooms (e.g., using carpets, acoustic ceiling tiles, curtains). Function: reduction of reverberation and other sources of noise.

Preferential seating. Function: reduction of distance between speaker and child can increase audibility of a signal; sitting next to a child is better than sitting in front of the child (Flexer, 1994), although for children who require visual information, this strategy decreases access to visual information.

Inclusion in regular classroom with supplementation through pullout services. Function: provision of the wealth of social and academic experiences afforded by regular classrooms, with support designed to preview and review instructional vocabulary as well as work on communication goals inconsistent with the classroom setting (e.g., the earliest stages involved in acquiring a new communicative behavior).

Auditory learning program (e.g., Ling, 1989; Stout & Windle, 1992). Function: improvement of the child's attention to and use of auditory information enhanced by personal and classroom amplification.

Total communication was originally proposed as the simultaneous use of multiple communication modes (e.g., fingerspelling, sign language, speech, and speechreading) selected with the child's individual needs in mind. As implemented, however, total communication has been found typically to consist of the simultaneous use of speech and one of several sign systems other than American Sign Language (ASL) that use word order and word inflections closely resembling those of spoken English (Coryell & Holcomb, 1997). The most prominent examples of these systems, sometimes referred to as manually coded English systems, are Seeing Essential English (SEE-1), Signing Exact English (SEE-2), and Signed English. Although most classroom teachers report using this relatively limited form of total communication (sometimes termed simultaneous communication), it is infrequently used among adults in the Deaf community (Coryell & Holcomb, 1997). In a review of studies of treatment efficacy for hearing loss in children, Carney and Moeller (1998) noted a current trend toward considering oral language as a potential second language for deaf children, to be acquired after some degree of proficiency in a first (visuospatial) language is attained. This approach, termed the bilingual education model, is seen by some as having the strengths associated with learning a language (i.e., ASL) for which a cohesive community of users exists, while at the same time valuing the importance of English competence as a curricular rather than rehabilitative issue (Coryell & Holcomb, 1997; Dubé, 1996). Data supporting this approach, however, are relatively sparse as yet.
To date, such data consist of evidence of strong academic performance in English by deaf children reared by deaf parents who are proficient in ASL and evidence that skills in English are strongly related to skills in ASL, independent of parental hearing status (Moores, 1987; Spencer & Deyo, 1993; Strong & Prinz, 1997). A recent position statement of the Joint Committee of ASHA and the Council on Education of the Deaf (1998) illustrates the growing influence of the Deaf community's insistence that deafness be viewed as a "cultural phenomenon" rather than a clinical condition (Crittenden, 1993). In that position statement, professionals are cautioned to adopt terminology that respects individual and family or caregiver preferences while facilitating the individual's access to services and assistive technology. Sensitivity to cultural factors is a requisite for speech-language pathologists in all settings working with all populations. For speech-language pathologists working with members of the Deaf community, it is a requirement of critical importance to the deaf child's social and emotional development.

Suspected Causes

What is currently known about the causes of permanent hearing impairment in children is almost entirely restricted to studies focused on more serious levels of hearing loss, especially deafness. Although there may be considerable overlap in the known
causes of deafness and milder degrees of impairment, differences also exist. Because this section limits itself to causes related to these more severe levels of hearing loss, I remind readers that what I say relates less clearly to children with milder losses. Genetic factors are suspected in about half of all cases of deafness (Kapur, 1996; Vernon & Andrews, 1990). Of these genetically based instances of deafness, about 80% are due to autosomal recessive disorders, almost 20% are autosomal dominant disorders, and the remainder are sex-linked (Fraser, 1976). Because recessive disorders require that both parents contribute a defective gene for their offspring to demonstrate the disorder, without the parents necessarily showing evidence of the disorder themselves, it is relatively uncommon for children with congenital deafness to have parents who are also deaf. This information is important for appreciating that most congenitally deaf children grow up with parents whose first language is oral and who will need to acquire sign as a belated second language if they are to assist their child's acquisition of sign. Genetically caused deafness sometimes occurs within the context of genetic syndromes in which one or more specific organ systems (e.g., the skeleton, skin, nervous system) are also affected. About 70 such syndromes have been identified, including Down syndrome, Apert syndrome, Treacher Collins syndrome, Pierre Robin syndrome, and muscular dystrophy (Bergstrom, Hemenway, & Downs, 1971). Although most genetically caused deafness will be sensorineural in type, conductive components are also observed. Some syndromes are associated with hearing losses that are progressive, causing increasing hearing loss over time, often at unpredictable rates. Examples of such syndromes are Friedreich's ataxia, severe infantile muscular dystrophy, and Hunter syndrome, as well as the closely related Hurler syndrome.
Nongenetic causes of deafness include prenatal rubella, postnatal infection with meningitis, prematurity, Rh factor incompatibility between mother and infant, exposure to ototoxic drugs, syphilis, Meniere's disease, and mumps (Vernon & Andrews, 1990). Four of these factors—prenatal rubella, meningitis, syphilis, and mumps—are infectious diseases, meaning that their successful prevention can drastically reduce instances of deafness from those causes. The three noninfectious factors most commonly associated with hearing loss in children are Rh factor incompatibility, exposure to ototoxic drugs, and Meniere's disease. Rh factor incompatibility refers to a condition in which a mother and the embryo she is carrying have blood types characterized by discrepant Rh factors, a circumstance that stimulates the production of maternal antibodies against the developing child. This condition is currently considered preventable through maternal immunization or the treatment of the infant using phototherapy or transfusions (Kapur, 1996). Ototoxicity refers to a drug's toxicity to the inner ear. Although the use of drugs with this side effect is usually avoided in pregnant women and infants, they may be required as the only effective treatment for some diseases. Monitoring of hearing can frequently prevent hearing loss in children who require treatment with ototoxic drugs because of infections or cancer (Kapur, 1996). Prematurity, birth 2 or more weeks prior to the expected due date (Dirckx, 1997), is an increasingly frequent correlate of hearing impairment. Whereas mortality was once
an almost certain outcome of prematurity, improved neonatal care over the past half century (Vernon & Andrews, 1990) has resulted in the increased survival of children who nonetheless may show residual effects. Premature birth is most directly associated with hearing impairment and other co-occurring difficulties (e.g., mental retardation, cerebral palsy) through the neurologic stresses it places on the infant. Indirect links between prematurity and hearing impairment lie in the fact that premature birth is frequently precipitated by conditions that are themselves associated with hearing impairment (such as prenatal rubella, meningitis, and Rh factor incompatibility). Prematurity increases the risk of deafness by 20 times (Kapur, 1996).

Special Challenges in Assessment

When assessing the oral communication skills of children with hearing impairment, the speech-language pathologist is confronted with numerous threats to the validity of his or her decision making. Therefore, in addition to the usual care that must be taken to determine the precise questions prompting assessment and the factors that may complicate accurate information gathering, clinicians working with children whose hearing is temporarily (e.g., during episodes of otitis media) or permanently impaired must consider a larger than usual range of possible complicating factors and necessary adaptations. Table 8.3 lists some of the considerations related to the evaluation of the language skills of a child with hearing impairment. A major first consideration for children with very severe hearing loss is the choice of language or languages in which the child is to be assessed. Often, testing in both a sign and an oral language is reasonable for obtaining information about potentially optimal performance as well as about development with the alternative form.
Complexities of the child's hearing loss and of its management will need to be considered in making this decision, because children who may be considered deaf do not always receive enough exposure to sign language to consider it their first language (Mogford, 1993). Although efforts to standardize assessments of ASL have begun (e.g., Lillo-Martin, Bellugi, & Poizner, 1985; Prinz & Strong, 1994; Supalla et al., 1994), children's performance in ASL (the most common sign language system in the United States) is usually informally assessed by individuals with high levels of proficiency in ASL. A small number of standardized tools have been developed. Among these is the Carolina Picture Vocabulary Test (Layton & Holmes, 1985), which is designed for use with children age 2 years–8 months to 18 years whose primary mode of communication is sign. Dubé (1996) discusses the current pressing need for better methods of assessing children's competence in both ASL and English.

Table 8.3
Considerations When Planning the Assessment of a Child With Impaired Hearing

Determine what modality or modalities will be used.
Match demands placed on hearing to the assessment question.
Ensure that instructions are understood.
Identify an appropriate normative group for norm-referenced interpretations.
Ensure optimal attention and minimal distractions.
Consider the use of modifications and describe them in reports of testing.
Rely on multiple measures and team input for preparation and interpretation.

For children with severe to profound hearing losses, assessment of oral language may also require interactions in ASL (e.g., to assure that a task is understood). Maxwell (1997) reasonably pointed out that deaf individuals often use both sign and spoken language depending on the demands of the communicative situation, and that determinations of what modes of communication a child uses are too often based on hearsay or on the clinician's own limitations in identifying the communication system being used. Therefore, speech-language pathologists who work frequently with hearing-impaired children should be proficient in sign themselves and, ideally, in both signed English and ASL modes. Those who are not proficient but receive occasional requests to serve hearing-impaired children should proceed carefully in determining what can be done in the absence of such proficiency and should be prepared to make referrals as needed to ensure optimal assessment data. The majority of this section of the chapter is devoted to considerations coming into play during oral language testing. For children with all degrees of hearing loss, one of the first considerations in oral language testing is the listening condition confronting the child. For example, is the setting in which the testing is done relatively quiet? If very quiet, optimal performance may be assessed (assuming other factors are optimal). If less quiet, optimal performance will be unlikely, but useful information for extrapolating typical performance in similar settings may be obtained. In many cases, language testing is performed for purposes of examining optimal performance.
However, if the purpose of testing is to determine the kind of difficulty facing the child in a conventional classroom, then testing in noisier environments would be indicated. Ying (1990) discussed a systematic approach to examining the child's functional auditory skills under conditions that vary (a) access to both visual and auditory information versus auditory information only, (b) auditory stimuli that are close versus far, and (c) surroundings that are noisy versus quiet. Knowledge of the child's listening conditions includes not only information about the ambient environment, but also about the status of the child's hearing and hearing aid at the time of testing. Recalling that children's hearing can be affected more readily by middle ear infections than can adults' makes it particularly important to know whether the child has an upper respiratory infection or is showing signs of reduced hearing, such as altered response to auditory stimuli or appearing confused or in pain in noisy situations (Flexer, 1994). Ascertaining directly or indirectly that a hearing aid has charged batteries and is functioning well is time well spent, given studies indicating that children's hearing aids are frequently found to be functioning unacceptably (e.g., Musket, 1981; Worthington, Stelmachowicz, & Larson, 1986). In addition, when hearing aids are in place but not functioning because of a dead battery, their "use" has been found to reduce hearing by an additional 25 to 30 dB at critical speech frequencies (Smedley & Plapinger, 1988). Ensuring that directions are understood is obviously most crucial for receptive language testing, but can be critical for expressive language testing as well, particularly when verbal instructions are used. Besides the steps described earlier, which are
Page 200 aimed at improving the child’s auditory access to information, seating to make the tester’s face visible and welllit (not backlit) can help. Because children with hearing loss may fail to signal their incomplete understanding of directions (Paul & Jackson, 1993), the clinician must be particularly watchful for hesitations or facial expressions indicating a lack of understanding. In addition, students should be encouraged to ask questions when they are uncertain (BradleyJohnson & Evans, 1991). When the questions being asked in an assessment are in regard to whether the child’s performance is like that of his or her peers, the ticklish question of norms is presented. Sometimes it is assumed that those peers should be a group that is similar in age to the tested child (e.g., because these are the children with whom a child will be compared at school and with whom he or she shares a common developmental history; BradleyJohnson & Evans, 1991). In such cases, finding appropriate norms is relatively easy, and the use of peers with normal hearing is quite appropriate (Brackett, 1997; Ying, 1990). However, using that normative group will not help you figure out whether any observed decrements in performance are due primarily to hearing differences, or whether additional cognitive or environmental barriers to language learning exist. For those questions, norms should ideally consist of children with similar patterns of hearing impairment and similar developmental experiences. Alternatively, interpretations based on information other than norms should be considered (for detailed discussion of informal methods, see Maxwell, 1997; Moeller, 1988; Ross, Brackettt, & Maxon, 1991; and YoshinagaItano, 1997). Table 8.4 lists language tests that have been developed for or normed on children with hearing impairment (BradleyJohnson & Evans, 1991). 
Even when such norms are available, determining the appropriateness of the norms still hinges on the test user's examination of the test manual for specific information about the normative sample. For children with hearing impairment, factors affecting the relevance of norms include the group's age of onset of the hearing loss, degree and type of loss, etiology, presence of other significant problems, and the mode of communication used during testing (Bradley-Johnson & Evans, 1991). Once an appropriate measure has been selected, increasing the child's attention to the task and minimizing distractions further enhance the possibility of obtaining information reflecting optimal performance. Positioning oneself close to the child and paying close attention to the child's gaze as a signal of current focus can help increase attention while also minimizing distractions (Maxwell, 1997). By modifying testing procedures, one risks invalidating normative comparisons. However, when testing modifications are noted in reports on the testing and discussed for their possible effects on test validity, their use can actually improve validity by removing sources of error that are unrelated to the skill or attribute being tested. Ying (1990) discussed a number of possible modifications to use when testing children with hearing impairment. These include asking the child to repeat all verbal stimuli to ensure that poor reception is not undermining performance and using extra demonstration items to ensure that the child understands the task demands. Another possible modification she recommended was repeating verbally presented test items. Also, when standardized instructions call for simultaneous presentation of verbal and visual stimuli, she suggested altering procedures so that the verbal stimulus is presented
Table 8.4
Language Tests Designed or Adapted for Children With Hearing Impairment (Bradley-Johnson & Evans, 1991)

Battelle Developmental Inventory (Newborg, Stock, Wnek, Guidubaldi, & Svinicki, 1984)
Ages: Birth to 8 years of age
Description: Tests across 5 domains: personal-social, adaptive, motor, communication, and cognitive; purpose is to identify children with handicaps, determine strengths and weaknesses, and help in planning instruction and monitoring progress
Comments: Although children with hearing impairment are described as an appropriate population for testing, neither norms nor studies validating that use are contained in the test manual; adaptations of items in the communication domain have been described as "inappropriate"

Grammatical Analysis of Elicited Language—Pre-Sentence Level (GAEL-P; Moog, Kozak, & Geers, 1983)
Ages: 3 to 6 years
Description: Skills assessed for comprehension, prompted production, and imitated production, with items at 3 levels: readiness, single words, and word combinations; scores expressed as percentiles
Comments: Standardized on 150 hearing-impaired children enrolled in oral educational programs whose hearing impairment was not described; no data for children who use manual communication

Grammatical Analysis of Elicited Language—Simple Sentence Level (GAEL-S; Moog & Geers, 1985)
Ages: 5 to 9 years
Description: Skills are assessed in terms of prompted production and imitation; 94 items assess articles, modifiers, pronouns, subject nouns, object nouns, wh- questions, verbs, verb inflections, copula inflections, prepositions, and negation; scores expressed as percentiles or language quotients (M = 100; SD = 15)
Comments: Norms obtained for 3 groups of children with hearing impairment and one non-hearing-impaired group; considerable information is available about these groups; one of the groups with hearing impairment came from total communication backgrounds and was tested using that method

Grammatical Analysis of Elicited Language—Complex Sentence Level (GAEL-C; Moog & Geers, 1980)
Ages: 8 to 12 years
Description: Skills are assessed in terms of prompted production and imitation; 16 grammatical categories are assessed: articles, noun modifiers, subject nouns, object nouns, noun plurals, personal pronouns, indefinite and reflexive pronouns, conjunctions, auxiliary verbs, first clause verbs, verb inflections, infinitives and participles, prepositions, negation, and wh- questions; scores expressed as percentiles or language quotients (M = 100; SD = 15)
Comments: Two groups of children, one with and one without hearing impairment, were studied; the hearing-impaired children had severe to profound levels of impairment and were without other problem areas

Rhode Island Test of Language Structure (Engen & Engen, 1983)
Ages: 5 to 17+ years
Description: Designed to assess comprehension of syntax; 100 items are used to assess 20 sentence types, including simple sentences, imperatives, negatives, passives, dative sentences, expanded simple sentences, adverbial clauses, relative clauses, conjunctions, deleted sentences, noninitial subjects, embedded imperatives, and complements; results are presented as percentiles or standard scores
Comments: Normed on 364 children with hearing impairment ranging from moderate to profound and 283 children without hearing impairment; considerable information is available about the hearing-impaired group; test may be presented orally or through simultaneous presentation of signed and spoken English

Scales of Early Communication Skills (SECS; Moog & Geers, 1975)
Ages: 2 to 9 years
Description: Verbal and nonverbal skills are assessed receptively and expressively through teacher ratings
Comments: Standardized on 372 children from 2 years to 8 years, 11 months, with profound hearing impairments from oral programs; interexaminer reliability data only; no test–retest data or validity information

Teacher Assessment of Grammatical Structures (TAGS; Moog & Kozak, 1983)
Ages: Not specified
Description: Criterion-referenced teacher rating of children's grammatical structures at four levels: comprehension, imitated production, prompted production, and spontaneous production
Comments: There are 3 levels of the test: pre-sentence, simple sentences, and complex sentences; can be used with children who use signed or spoken English; structures examined are less comprehensive than in other measures developed by Moog and her colleagues
first, followed by the visual stimulus, thus allowing the child to look at the clinician as he or she speaks. Use of an FM listening system during testing can also be recommended for obtaining information about optimal performance (Brackett, 1997). It is unlikely that one measure or one person who interacts with a hearing-impaired child will capture all of the child's strengths and weaknesses as a communicator (Moeller, 1988). Consequently, the speech-language pathologist will need to rely on multiple measures and seek team input both as an assessment is planned and as it is interpreted. In addition to the audiologist, the child's educators, psychologists, and especially those who know the child best—the child him- or herself and the child's parents—can be valuable sources of information. Excellent sources of recommendations for effective interactions with families can be found in Donahue-Kilburg (1992) and Roush and Matkin (1996).

Expected Patterns of Oral Language Performance

Despite evidence that even children with mild or unilateral hearing losses are at risk for academic difficulties (Bess, 1985; Bess et al., 1986; Carney & Moeller, 1998; Culbertson & Gilbert, 1986; Oyler et al., 1988), relatively little is known about their oral or sign language development (Mogford-Bevan, 1993). To date, most research on oral language development in children with hearing impairment has focused on children with more severe congenital losses (Mogford-Bevan, 1993) or with the fluctuating hearing loss associated with otitis media (Klein & Rapin, 1992). That fluctuating loss appears more important when combined with other risk factors for disordered language development than when viewed as a single explanatory factor (Klein & Rapin, 1992; Paul, 1995).
In contrast, there is considerable evidence that deaf children and those who are hard of hearing experience difficulties across all oral language domains and modalities—at least when comparisons are made against same-age peers (Mogford-Bevan, 1993). Syntax has been described as the "most severely affected aspect of language" in children with hearing loss that occurs congenitally or in early childhood (Mogford-Bevan, 1993). Phonology is understandably quite affected, although some children who appear to derive all of their phonological information visually (through speech reading) demonstrate the ability to use the phonological code and show many phonological patterns consistent with those of younger, hearing children (Mogford-Bevan, 1993). Documented semantic deficits involve lexical items referring to sounds and to concepts related to the ordering of events across time and, possibly, to the use of metaphorical language (Mogford-Bevan, 1993). Pragmatic deficits are sometimes described and attributed to the close relationship of pragmatics to syntax, as well as to changes that occur in conversational interaction on the part of speaker and listener when one is deaf. A different pattern of conversational initiation and turn-taking represents the milieu in which such children acquire their knowledge of language use (Mogford-Bevan, 1993; Yoshinaga-Itano, 1997). Therefore, it has been suggested that comparisons with hearing peers may not prove to be a useful means of understanding the pragmatic development of deaf children. In a recent article, Yoshinaga-Itano (1997)
described a comprehensive approach to assessing pragmatics, semantics, and syntax among children with hearing impairment in which the interrelationships of these domains were stressed and both informal and formal measures were used.

Related Problems

Children with hearing loss appear to be at increased risk for a number of problems (e.g., Voutilainen, Jauhiainen, & Linkola, 1988). This increased risk may arise because the cause of the hearing loss has multiple negative outcomes (e.g., some genetic syndromes or infections can cause both mental retardation and hearing loss). Alternatively, hearing loss may make children more vulnerable (e.g., children who are less able to communicate for any reason may be at greater risk for psychosocial difficulties). Despite a convergence of evidence suggesting increased risk, the specific prevalence of multiple handicaps in children with hearing loss is a matter of considerable debate (Bradley-Johnson & Evans, 1991). The prevalence of specific problems also appears to be related to etiology. For example, whereas children whose hearing impairments are of inherited or unknown etiology tend to have fewer additional problems, those whose hearing impairment is due to cytomegalovirus are at increased risk for behavioral problems (Bradley-Johnson & Evans, 1991). In a 1979 study looking at additional problem areas for children with hearing impairment (Karchmer, Milone, & Wolk, 1979), the most common additional problems were mental retardation (7.8%), visual impairment (7.4%), and emotional–behavioral disorder (6.7%). Although each of these problems was found to occur in less than 10% of children with hearing loss, their prevalence was still considerably higher than in children without hearing loss (Bradley-Johnson & Evans, 1991). The increased prevalence of emotional–behavioral disorders is of interest because of the special management issues that accompany it.
Biological factors may be responsible for emotional–behavioral disorders in children with hearing loss. However, it has also been suggested that mismatches between the child's communication needs and capacities and those of his or her caregivers and peers may contribute to special environmental stresses that increase a child's risk of these disorders (Paul & Jackson, 1993). Paul and Jackson provided a fascinating discussion of the literature describing the subtle and not-so-subtle differences in world experience that accompany deafness. The one problem area in which children with hearing loss were found to be at reduced risk in the study by Karchmer et al. (1979) was learning disorders, a finding that some authors have attributed to the effects of overshadowing (Goldsmith & Schloss, 1986). Overshadowing is the tendency for professionals to focus on a primary problem to a degree that causes them to overlook other, significant problem areas. Although overshadowing may be one source of underidentification of learning disabilities in children with hearing loss, another possible source is certainly the tendency of researchers and clinicians to define learning disabilities as "specific learning disabilities," a definition that excludes children whose difficulties can be explained by other problems known to affect learning. The question remains, however, whether some children with a hearing loss have a learning disability whose origin is unrelated to that hearing loss.
Summary

1. Permanent hearing loss in children encompasses both (a) children who are hard of hearing, who will learn speech primarily through auditory means, and (b) children who are deaf, who may acquire speech primarily through vision.

2. Characteristics of hearing losses that affect the impact of the loss include degree of loss (mild, moderate, severe, profound; hard of hearing, deafness), type of loss (conductive, sensorineural, mixed), configuration (flat, high-frequency, low-frequency), laterality (unilateral vs. bilateral), and age of onset (congenital, acquired).

3. Genetic sources account for about 50% of all cases of deafness, with remaining causes including infectious disease, Rh factor incompatibility, and exposure to ototoxic drugs.

4. Even mild or unilateral hearing loss can negatively affect children's language learning and academic progress, and there is some evidence to suggest that the transient hearing loss associated with otitis media can interact with other risk factors to undermine children's learning (Peters, Grievink, van Bon, Van den Bercken, & Schilder, 1997).

5. Management of the hearing loss for children who are hard of hearing ideally includes amplification (hearing aids and FM system use), sound treatment of the child's language learning environment, speech-language intervention, and classroom support as needed.

6. Under most current programs of early identification and subsequent intervention, deafness poses a grim threat to children's normal acquisition of an oral language.

7. Current controversies in deafness include the relative importance of oral versus sign languages in children's acquisition of communication competence and the role of the Deaf culture as a political force.

8. Challenges in the assessment of communication of children with hearing loss include difficulties in determining the mode(s) in which to conduct testing (e.g., oral, ASL, Total Communication), as well as a scarcity of both appropriate developmental expectations for communication acquisition and standardized norm-referenced measures for this population in any mode.

Key Concepts and Terms

cochlear implant: a prosthetic device that provides stimulation of the acoustic nerve in response to sound and is used with individuals who have little residual hearing.

conductive hearing loss: a hearing loss caused by an abnormality affecting the transmission of sound and mechanical energy from the outer to the inner ear.

deafness: a hearing loss greater than or equal to 70 dB HL, which precludes the understanding of speech through audition.
FM (frequency modulated) radio systems: one of several systems designed to address the problems of low signal-to-noise ratios and reverberation occurring in settings such as classrooms; these are used in combination with personal hearing aids.

hard of hearing: having a degree of hearing loss usually less than 70 dB HL, which allows speech and language acquisition to occur primarily through audition.

hearing loss configuration: the pattern of hearing loss across sound frequencies—for instance, a high-frequency loss is one in which the loss is greatest in the high frequencies.

mixed hearing loss: a hearing loss with both conductive and sensorineural components.

otitis media: middle ear infection.

ototoxicity: the property, found in some drugs and environmental substances, of being poisonous to the inner ear.

otoacoustic emissions: low-level audio frequency sounds that are produced by the cochlea as part of the normal hearing process (Lonsbury-Martin, Martin, & Whitehead, 1997).

overshadowing: the tendency for professionals to focus on a primary problem to a degree that causes them to overlook other, significant problem areas.

prelingual hearing loss: a hearing loss acquired before age 2, which is thought to be associated with a more significant impact.

prematurity: birth 2 or more weeks prior to the expected due date.

Rh factor incompatibility: a condition in which the blood of mother and infant have discrepant Rh factors, resulting in maternal antibody production that can prove harmful to the infant if untreated.

sensorineural hearing loss: hearing loss due to pathology affecting the inner ear or the nervous system pathways leading to the cortex.

Study Questions and Questions to Expand Your Thinking

1. The tendency to have a diagnosis such as deafness overshadow other significant but less severe conditions is an understandable but quite unfortunate clinical error. How might you avoid this kind of error in clinical practice?

2. Protective ear plugs (e.g., EAR Classic) produce the equivalent of a mild (approximately 20–30 dB) hearing loss. Find a pair and use them in three different listening conditions: for example, talking with a friend face to face in a quiet setting, listening to a lecture from your usual seat in the classroom, and watching the TV news with the loudness level set at a comfortable listening level (before you put the plugs in). Write down what you hear.

3. Repeat the experiment from Question 2 using only one ear plug. Besides noting what you hear, note whether you changed anything else about your behavior as you listened and talked.
4. Briefly describe an argument you might make favoring the use of total communication with a deaf child born to hearing parents.

5. Repeat Question 4, but argue in favor of the use of ASL only with the same child.

6. Consider the etiologies described for hearing loss in this chapter. What preventive measures might help reduce the occurrence of hearing loss in infants? Are there any of these measures in which you could play a role as a school-based speech-language pathologist? As a citizen of your local community?

7. List four things you would want to be sure to remember as you prepare for the oral language evaluation of a child who is hard of hearing and who regularly uses a hearing aid, where the purpose of the evaluation is to determine the child's optimal performance.

Recommended Readings

Carney, A. E., & Moeller, M. P. (1998). Treatment efficacy: Hearing loss in children. Journal of Speech, Language, and Hearing Research, 41, S61–S84.

Northern, J. L., & Downs, M. P. (1991). Hearing in children (4th ed.). Baltimore: Williams & Wilkins.

Paul, P. V., & Quigley, S. P. (1994). Language and deafness (2nd ed.). San Diego, CA: Singular.

Scheetz, N. A. (1993). Orientation to deafness. Boston: Allyn & Bacon.

References

American Speech-Language-Hearing Association and the Council on Education of the Deaf. (1998). Hearing loss: Terminology and classification; position statement and technical report. ASHA, 40(Suppl. 18), 22–23.

Bellugi, U., van Hoek, K., Lillo-Martin, D., & O'Grady, L. (1993). The acquisition of syntax and space in young deaf signers. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 132–149). Mahwah, NJ: Lawrence Erlbaum Associates.

Bergstrom, L., Hemenway, W. G., & Downs, M. P. (1971). A high risk registry to find congenital deafness. Otolaryngological Clinics of North America, 4, 369–399.

Bess, F. H. (1985). The minimally hearing-impaired child. Ear and Hearing, 6(1), 43–47.

Bess, F., Klee, T., & Culbertson, J. L.
(1986). Identification, assessment and management of children with unilateral sensorineural hearing loss. Ear and Hearing, 7(1), 43–51.

Brackett, D. (1997). Intervention for children with hearing impairment in general education settings. Language, Speech, and Hearing Services in Schools, 28, 355–361.

Bradley-Johnson, S., & Evans, L. D. (1991). Psychoeducational assessment of hearing-impaired students: Infancy through high school. Austin, TX: Pro-Ed.

Cacace, A. T., & McFarland, D. J. (1998). Central auditory processing disorder in school-aged children: A critical review. Journal of Speech, Language, and Hearing Research, 41, 355–373.

Carney, A. E., & Moeller, M. P. (1998). Treatment efficacy: Hearing loss in children. Journal of Speech, Language, and Hearing Research, 41, S61–S84.

Corker, M. (1996). Deaf transitions: Images and origins of deaf families, deaf communities and deaf identities. Bristol, PA: Jessica Kingsley.

Coryell, J., & Holcomb, T. K. (1997). The use of sign language and sign systems in facilitating the language acquisition and communication of deaf students. Language, Speech, and Hearing Services in Schools, 28, 384–394.

Crittenden, J. B. (1993). The culture and identity of deafness. In P. V. Paul & D. W. Jackson (Eds.), Toward a psychology of deafness: Theoretical and empirical perspectives (pp. 215–235). Needham Heights, MA: Allyn & Bacon.
Culbertson, J. L., & Gilbert, L. E. (1986). Children with unilateral sensorineural hearing loss: Cognitive, academic, and social development. Ear and Hearing, 7(1), 38–42.

Dirckx, J. H. (1997). Stedman's concise medical dictionary for the health professions. Baltimore: Williams & Wilkins.

Donahue-Kilburg, G. (1992). Family-centered early intervention for communication disorders: Prevention and treatment. Gaithersburg, MD: Aspen.

Dubé, R. V. (1995). Language assessment of deaf children: American Sign Language and English. Journal of the American Deafness and Rehabilitation Association, 29, 8–16.

Engen, E., & Engen, T. (1983). The Rhode Island Test of Language Structure. Baltimore: University Park Press.

Flexer, C. (1994). Facilitating hearing and listening in young children. San Diego, CA: Singular.

Fraser, G. R. (1976). The causes of profound deafness in childhood. Baltimore: The Johns Hopkins University Press.

Fria, T. J., Cantekin, E. I., & Eichler, J. A. (1985). Hearing acuity of children with otitis media with effusion. Otolaryngology—Head and Neck Surgery, 111, 10–16.

Goldsmith, L., & Schloss, P. J. (1986). Diagnostic overshadowing among school psychologists working with hearing-impaired learners. American Annals of the Deaf, 131, 288–293.

Harris, J. (1995). The cultural meaning of deafness. Brookfield, VT: Ashgate.

Harrison, M., & Roush, J. (1996). Age of suspicion, identification, and intervention for infants and young children with hearing loss: A national study. Ear and Hearing, 17(1), 55–62.

Kapur, Y. P. (1996). Epidemiology of childhood hearing loss. In S. E. Gerber (Ed.), The handbook of pediatric audiology (pp. 3–14). Washington, DC: Gallaudet University Press.

Karchmer, M. A., Milone, M. N., & Wolk, S. (1979). Educational significance of hearing loss at three levels of severity. American Annals of the Deaf, 124, 97–109.

Klein, S. K., & Rapin, I. (1992). Intermittent conductive hearing loss and language development. In D. Bishop & K.
Mogford (Eds.), Language development in exceptional circumstances (pp. 96–109). Mahwah, NJ: Lawrence Erlbaum Associates.

Layton, T. L., & Holmes, D. W. (1985). Carolina Picture Vocabulary Test. Austin, TX: Pro-Ed.

Lillo-Martin, D., Bellugi, U., & Poizner, H. (1985). Tests for American Sign Language. San Diego: The Salk Institute for Biological Studies.

Ling, D. (1989). Foundations of spoken language for hearing impaired children. Washington, DC: A. G. Bell Association for the Deaf.

Lonsbury-Martin, B. L., Martin, G. K., & Whitehead, M. L. (1997). Distortion-product otoacoustic emissions. In M. S. Robinette & T. J. Glattke (Eds.), Otoacoustic emissions: Clinical applications (pp. 83–109). New York: Thieme.

Mauk, G. W., & White, K. R. (1995). Giving children a sound beginning: The promise of universal newborn hearing screening. Volta Review, 97(1), 5–32.

Maxwell, M. M. (1997). Communication assessments of individuals with limited hearing. Language, Speech, and Hearing Services in Schools, 28, 231–244.

Moeller, M. P. (1988). Combining formal and informal strategies for language assessment of hearing-impaired children. Journal of the Academy of Rehabilitative Audiology, Monograph Supplement, 21, 73–99.

Mogford, K. (1993). Oral language acquisition in the prelinguistically deaf. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 110–131). Mahwah, NJ: Lawrence Erlbaum Associates.

Mogford-Bevan, K. (1993). Language acquisition and development with sensory impairment: Hearing impaired children. In G. Blanken, J. Pitman, H. Grimm, J. C. Marshall, & C. W. Wallesch (Eds.), Linguistic disorders and pathologies: An international handbook (pp. 660–679). Berlin, Germany: de Gruyter.

Moog, J. S., & Geers, A. E. (1975). Scales of Early Communication Skills. St. Louis, MO: Central Institute for the Deaf.

Moog, J. S., & Geers, A. E. (1980). Grammatical analysis of elicited language: Complex sentence level. St. Louis, MO: Central Institute for the Deaf.
Moog, J. S., & Geers, A. E. (1985). Grammatical analysis of elicited language: Simple sentence level. St. Louis, MO: Central Institute for the Deaf.

Moog, J. S., & Kozak, V. J. (1983). Teacher assessment of grammatical structures. St. Louis, MO: Central Institute for the Deaf.

Moog, J. S., Kozak, V. J., & Geers, A. E. (1983). Grammatical analysis of elicited language: Pre-sentence level. St. Louis, MO: Central Institute for the Deaf.

Moores, D. F. (1987). Educating the deaf. Boston: Houghton Mifflin.

Musket, C. H. (1981). Maintenance of personal hearing aids. In M. Ross, R. J. Roeser, & M. Downs (Eds.), Auditory disorders in school children (pp. 229–248). New York: Thieme-Stratton.

Nelson, K. E., Loncke, F., & Camarata, S. (1993). Implications of research on deaf and hearing children's language learning. In M. Marschark & M. D. Clarke (Eds.), Psychological perspectives on deafness (pp. 123–152). Hillsdale, NJ: Lawrence Erlbaum Associates.

Newborg, J., Stock, J. R., Wnek, L., Guidubaldi, J., & Svinicki, J. (1984). Battelle Developmental Inventory. Allen, TX: DLM Teaching Resources.

Northern, J. L., & Downs, M. P. (1991). Hearing in children (4th ed.). Baltimore: Williams & Wilkins.

Oyler, R. F., Oyler, A. L., & Matkin, N. D. (1988). Unilateral hearing loss: Demographics and educational impact. Language, Speech, and Hearing Services in Schools, 19, 201–209.

Paul, P. V. (1998). Literacy and deafness. Boston: Allyn & Bacon.

Paul, P. V., & Jackson, D. W. (1993). Toward a psychology of deafness: Theoretical and empirical perspectives. Needham Heights, MA: Allyn & Bacon.

Paul, P. V., & Quigley, S. P. (1994). Language and deafness (2nd ed.). San Diego, CA: Singular.

Paul, R. (1995). Language disorders from infancy through adolescence: Assessment and intervention. St. Louis: Mosby.

Peters, S. A. F., Grievink, E. H., van Bon, W. H. J., Van den Bercken, J. H. L., & Schilder, A. G. M. (1997).
The contribution of risk factors to the effect of early otitis media with effusion on later language, reading, and spelling. Developmental Medicine and Child Neurology, 39, 31–39.

Prinz, P., & Strong, M. (1994). A test of ASL. Unpublished manuscript, San Francisco State University, California Research Institute.

Rees, N. S. (1973). Auditory processing factors in language disorders: The view from Procrustes' bed. Journal of Speech and Hearing Disorders, 38, 304–315.

Resnick, T. J., & Rapin, I. (1991). Language disorders in children. Psychiatric Annals, 21, 709–716.

Ries, P. W. (1994). Prevalence and characteristics of persons with hearing trouble: United States, 1990–91. National Center for Health Statistics. Vital Health Statistics, 10(188).

Ross, M. (1990). Hearing impaired children in the mainstream. Parkton, MD: York Press.

Ross, M., Brackett, D., & Maxon, A. (1991). Assessment and management of mainstreamed hearing-impaired children: Principles and practices. Austin, TX: Pro-Ed.

Roush, J., & Matkin, N. D. (1994). Infants and toddlers with hearing loss: Family-centered assessment and intervention. Baltimore: York Press.

Sanders, D. A. (1993). Management of hearing handicap. Englewood Cliffs, NJ: Prentice-Hall.

Scheetz, N. A. (1993). Orientation to deafness. Needham Heights, MA: Allyn & Bacon.

Smedley, T., & Plapinger, D. (1988). The nonfunctioning hearing aid: A case of double jeopardy. The Volta Review, February/March, 77–84.

Spencer, P. E., & Deyo, D. A. (1993). Cognitive and social aspects of deaf children's play. In M. Marschark & M. D. Clarke (Eds.), Psychological perspectives on deafness (pp. 65–91). Hillsdale, NJ: Lawrence Erlbaum Associates.

Stout, G. G., & Windle, J. (1992). Developmental approach to successful listening II—DASL II. Denver: Resource Point.

Strong, M., & Prinz, P. (1997). A study of the relationship between American Sign Language and English literacy. Journal of Deaf Studies and Deaf Education, 2(1), 37–46.
Supalla, T., Newport, E., Singleton, J., Supalla, S., Metlay, D., & Coulter, G. (1994). Test Battery for American Sign Language Morphology and Syntax. Burtonsville, MD: Linstok Press.

Tye-Murray, N., Spencer, L., & Woodworth, G. G. (1995). Acquisition of speech by children who have prolonged cochlear implant experience. Journal of Speech and Hearing Research, 38(2), 327–337.
Vernon, M., & Andrews, J. F. (1990). Other causes of deafness: Their psychological role. In The psychology of deafness (pp. 40–67). New York: Longman.

Voutilainen, R., Jauhiainen, T., & Linkola, H. (1988). Associated handicaps in children with hearing loss. Scandinavian Audiological Supplement, 33, 57–59.

Worthington, D. W., Stelmachowicz, P., & Larson, L. (1986). Audiological evaluation. In M. J. Osberger (Ed.), Language and learning skills of hearing impaired students. American Speech-Language-Hearing Association Monographs, 23, 12–20.

Ying, E. (1990). Speech and language assessment: Communication evaluation. In M. Ross (Ed.), Hearing impaired children in the mainstream (pp. 45–60). Parkton, MD: York Press.

Yoshinaga-Itano, C. (1997). The challenge of assessing language in children with hearing loss. Language, Speech, and Hearing Services in Schools, 28, 362–373.
PART III
CLINICAL QUESTIONS DRIVING ASSESSMENT
CHAPTER 9
Screening and Identification: Does This Child Have a Language Impairment?

The Nature of Screening and Identification
Special Considerations When Asking This Clinical Question
Available Tools
Practical Considerations

Since his infancy, Serge's parents had suspected that there was something different about their third child. Although he was a healthy and friendly baby, he rarely vocalized and used only a few intelligible words by the time he was 3. He also seemed able to ignore much of what went on around him while being extraordinarily sensitive to loud noises such as motorcycles or a TV turned up by his older siblings. On the basis of Serge's mother's reports and the results of the Denver II (Frankenburg, Dodds, & Archer, 1990), an early educator at a preschool screening recommended a complete speech-language and hearing evaluation.

Amelia had "just gotten by" in the early grades. Although she never performed particularly well, she rarely failed assignments and never received a failing grade. She was well organized, attentive, and ever so eager to please. Her parents were accepting of her performance because they, too, had never done terribly well in school; they had just been happy that she was enjoying it so much. All of her enjoyment vanished,
however, in the fourth grade, when the language of the classroom became more complex and more dependent on the books being used. She pretended to be sick in order to avoid school and cried in frustration when the work seemed too hard. Her teacher and the school speech-language pathologist were so alarmed by her behavior and by the quality of her written and oral discourse that they decided an in-depth examination of her oral language and literacy skills was necessary immediately.

The Nature of Screening and Identification

Screening and identification of language disorders are closely related enterprises. Screening procedures aid clinicians in making a relatively gross decision—Should this child’s communication be scrutinized more closely for the possible presence of a language disorder? Identification, on the other hand, takes that question several steps further. Does this child have a language disorder, a difference in language, or both? Often this complex question is tied to yet another question: Is this child eligible for services within a particular setting?

Screening
In many cases, referrals by concerned parents, teachers, or physicians function as indirect screening mechanisms. Nonetheless, alternative procedures are needed in cases when such indirect methods are unlikely to occur or are unsuccessful. Although detection may readily occur at the behest of concerned families facing severe problems, detection may be delayed when the problems are mild (e.g., when they consist of subtle difficulties in comprehension) or when they are unaccompanied by obvious physical or cognitive disabilities (Prizant & Wetherby, 1993). Screening is typically used when the number of individuals under consideration makes the use of more elaborate methods impractical—usually from the perspectives of both time and money. Much of the current thinking about screening and its relationship to identification is borrowed from the realm of public health (e.g., Thorner & Remein, 1962). In that context, screenings are designed to be quick, inexpensive, and capable of being conducted by individuals with less training. Similarly, in speech-language pathology, the administration and interpretation of screening methods should require minimal time and expertise. Nonetheless, validity continues to be of critical importance because an inaccurate screening procedure is useless no matter how quick or inexpensive it may be! A number of different kinds of screening mechanisms occur in the detection and management of language disorders. Of greatest importance for our purposes is screening for the presence of a language disorder. Such a screening, for example, might be performed on all 3- to 5-year-olds in a given school district, often as part of a broader screening for a variety of health and developmental risks. Another example of such a comprehensive screening would occur as part of neonatal intensive care follow-up.
When examined alone, communication is screened using a great variety of measures with selected aspects of speech, language, and hearing as their major foci.
In practice, such screenings are often informal and frequently make use of several measures—some formal and some informal—to increase the comprehensiveness of the examination. Specific tools used in a more focused approach to language screening are discussed in the Available Tools section of this chapter. When examined as part of a broader screening effort, communication is frequently assessed using a measure designed to address a variety of major areas of functioning. One example of these kinds of screening measures is the Denver Developmental Screening Test—Revised (Frankenburg, Dodds, Fandal, Kazuk, & Cohrs, 1975; Feeney & Bernthal, 1996), a screening tool for children from birth to age 6 that makes use of direct elicitation and parental reports. Another is the Developmental Indicators for Assessment of Learning—Revised (DIAL-R; Mardell-Czudnowski & Goldenberg, 1983), a screening tool for children ages 2–6 that is often used to screen larger numbers of children through the use of a team of evaluators, each of whom elicits behaviors from an individual child within a given area. In a study of the 19 measures most commonly used in federally funded demonstration projects around the United States, Lehr, Ysseldyke, and Thurlow (1986) found only 3 that they judged to be technically adequate: the Vineland Adaptive Behavior Scales (Sparrow, Balla, & Cicchetti, 1984), the McCarthy Scales of Children’s Abilities (McCarthy, 1972), and the Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983). Bracken (1987) noted similar problems with available screening measures, especially among measures designed for children younger than 4. This lack of well-developed comprehensive screening tests is particularly problematic given the demand inherent in the Individuals with Disabilities Education Act (IDEA, 1990), which compels identification of at-risk children at very young ages.
Screening procedures are also used by speech-language pathologists during comprehensive communication assessments to determine (a) whether specific areas of communication (e.g., voice, fluency, hearing) need in-depth testing and (b) whether problems requiring referral exist in other major areas of functioning (e.g., vision, cognition). Nuttall, Romero, and Kalesnik (1999) provided a wide-ranging discussion of various types of developmental preschool screenings.

Identification
Essentially, identification procedures for language disorders in children are intended to verify the existence of a problem that may have been suspected by referral sources or uncovered through a screening program. For the purposes of this book, identification is seen as synonymous with the term diagnosis, when that term is defined as the “identification of a disease, abnormality, or disorder by analysis of the symptoms presented” (Nicolosi, Harryman, & Kresheck, 1996, p. 86). Diagnosis is often defined so that it includes the larger set of questions leading to conclusions regarding etiology, prognosis, and recommendations for treatment (e.g., see Haynes, Pindzola, & Emerick, 1992). Here, however, the term identification is preferred as a means of expediting our focus on the special measurement considerations it entails.
Identification decisions involving children are crucial for at least two reasons. First, identification is usually the first step that enables the child to receive help, often in the form of intervention. This step is a critical one because of the emotional, monetary, and temporal demands that accompany intervention, demands that will be met to varying degrees by the child, the parents, the speech-language pathologist, and the larger community. Second, by leading to effective intervention, correct identification can help prevent or mitigate the additional social and scholastic problems that may accompany language impairment. Identification decisions are among the most important ones made by speech-language pathologists and, therefore, should be among the most carefully made. Because identification decisions often involve the assignment of a label, they are often associated with a fear on the part of many parents and some theorists (Shepard, 1989) that the child will be equated with the disorder. For example, the parents may fear that their child will no longer be seen as “a cute, complicated child” when he or she becomes an “autistic child.” Although person-first nomenclature (e.g., referring to “a person with autism” rather than “an autistic person” or, worse yet, “an autistic”) is intended to make the process of labeling more benign, the negative implications of being identified as having a communication disorder exist nonetheless in the minds of parents and perhaps in the understandings of naive observers. This is evident when parents find one label—for example, “language impaired”—more acceptable than another—such as “language delayed”—as clinicians frequently discover during their interactions with families (Kamhi, 1998).
Concerns about labeling in the special education community are intense and have led to recommendations to avoid labels as much as possible, particularly for younger children and in cases where only a screening has been conducted (Nuttall et al., 1999). Many of the measurement issues associated with identification mirror those of screening. However, the more permanent nature of identification and its association with decisions about access to continuing services raise the stakes in the quality of decision making required. In the next section, special measurement considerations affecting both screening and identification are discussed in some detail, with efforts made to call readers’ attention to points where the two differ.

Special Considerations When Asking This Clinical Question

If I were reading this book as a student (or as a clinician who finds measurement less interesting than I do), I would be hoping that my friendly author would offer several easy steps toward accurate and efficient screening and identification. Better yet, perhaps she would tell me exactly which screening and identification measures I should purchase and exactly which three simple steps I should follow for infallible clinical decision making. Sadly, as much as I would like to help, a blanket prescription for test purchasing and use cannot be made for all of the testing situations facing even a very small group of readers. Instead, what I can do is provide basic information about some special considerations and then, in the next section, introduce some of the many available measures that can be used for screening and identification.
In this section of the chapter, several special considerations are explored to help readers engage in the process of test selection and interpretation for the purposes of screening and identification. These special considerations represent refinements of some of the information presented in earlier chapters—refinements dictated by the particular demands of screening and identification as testing purposes. When I was learning how to choose the best possible measure for a given purpose, the tie between measurement purpose and methodology was not always obvious to me. Some time ago, in my first published article, a colleague and I used 10 operational definitions of psychometric guidelines offered by the APA, AERA, and NCME (1985) to evaluate 30 language and articulation tests used with preschool children (McCauley & Swisher, 1984a). The criteria included an adequate description of tester qualifications, evidence of test–retest reliability, information about criterion-related validity, and others. Almost instantly, a well-known language researcher, John Muma (1985), chastised us, citing, among other reasons, the danger that readers would assume that each of the criteria we included was as important as every other. Today, as in 1985, it seems to me that although Muma failed to understand the basic intent of the article, he was absolutely on the mark in his concern about its fostering misunderstanding. In fact, as you will see in the next chapters, different purposes of testing will draw special attention to different aspects of the measures one might use. It is important to pay attention to this ironclad connection in order to make ethical decisions. The appropriateness of standardized norm-referenced tests for purposes of identifying a language disorder or difference is almost universally accepted in the clinical literature (e.g., see Kelly & Rice, 1986; Merrell & Plante, 1997; Sabatino, Vance, & Miller, 1993; cf. Muma, 1998).
In addition, such instruments are widely favored for that purpose by practicing speech-language pathologists (e.g., see Huang, Hopkins, & Nippold, 1997). Often, their use is mandated as the backbone of screening and identification efforts. In an ideal world, speech-language pathologists would be able to predict flawlessly which children would experience persistent, penalizing differences in communication based on a description of each child’s current language status. Thus, criterion-referenced measures would generally suffice for both identification and treatment planning. However, given the current level of understanding, the best strategy is to (a) identify those children whose performance seems sufficiently different from the performances of a relatively large group of peers as to warrant concern and (b) supplement that information with other sources of information, particularly from persons familiar with the child’s functional communication. Because of the tie between norm-referenced measures and identification procedures, most of the special considerations regarding screening and identification discussed next relate to the use of norm-referenced measures in decision making. The six special considerations involve (a) weighing measure sensitivity and specificity in test selection, (b) deciding on cutoff scores, (c) remembering measurement error in score interpretation, (d) wrestling with the disorder–difference question, (e) conducting comparisons between scores, and (f) taking into account base rates and referral rates in evaluating screening measures. The first two of these considerations address concerns that will primarily be dealt with by the clinician prior to use of an instrument in a particular case. The next three address concerns arising during the process of test use. The last consideration relates to one’s thinking about how to implement and potentially evaluate a screening program—a more specific concern than the other five.

Weighing Measure Sensitivity and Specificity in Test Selection
On the basis of previous discussions of validity, readers can anticipate that a measure used to screen or identify children for language disorders should provide, as a cornerstone of evidence supporting its validity, convincing empirical documentation of its ability to distinguish children with and without such disorders (Plante & Vance, 1994). One method used to examine the accuracy of classification achieved by screening and identification measures entails the comparison of the measure under study with a measure that is considered valid or at least acceptable given the state of the art. Comparison against an ideal is often described as a comparison with a gold standard, a measure that has been so thoroughly studied that it is thought to represent the very best measure available for a given purpose. Because of the scarcity of gold standards in arenas related to child language assessment, the more typical scenario involves a comparison with a well-studied and respected measure. In the case of a screening measure, the comparison is often made between the results of a screening procedure and those of a more elaborate and established method of identification. The comparison may involve the use of a more well-established test or test battery that has been independently validated. As you may recognize in the discussion that follows, the method used to compare these performances is largely an elaboration of the contrasting-groups method described in chapter 3. The comparison often makes use of a contingency table, such as that portrayed in Fig. 9.1 and in earlier sections of the book. In Fig. 9.1, two tables are used—one to illustrate the components of this type of table and the other to show a hypothetical example: the results of the Hopeful Screening Test contrasted with those of the Firmly Established Identification Measure for a group of 1000 individuals.
As you can see from the first table in the figure, sensitivity is simply the proportion of individuals who truly have the disorder whom the measure correctly flags (its true positives). Thus, it reflects how frequently those children needing further evaluation are accurately found using this measure. According to a more formal definition, sensitivity is a measure of the ability of a test or procedure to give a positive result when the person being assessed truly does have the disorder. Specificity is a measure of the ability of a measure to give a negative result when the person being assessed truly does not have the disorder. It is usually described as the proportion of true negatives associated with the measure. Thus, for a screening measure, specificity reflects how frequently individuals who are problem-free are correctly excluded from additional evaluation. In other words, a test or procedure that underidentifies children suffers from poor sensitivity, and a test or procedure that overidentifies children suffers from poor specificity. In the case of the hypothetical Hopeful Screening Test of Language, sensitivity seems to be less than most people would be happy with: on the basis of its results,
Fig. 9.1. Information contained in a contingency table and an example showing how it can be used to calculate sensitivity and specificity.
22%, or about one fifth, of children with the disorder would go undetected and thus be excluded from further assessment. In contrast, the measure’s specificity is excellent, with only about 5 out of every 100 children who are performing normally recommended for unnecessary testing. In discussions of what constitutes acceptable levels of overall accuracy for language identification measures, Plante and Vance (1994) noted that overall accuracy (i.e., true positives plus true negatives as a percentage of all individuals tested) should be at least 90% for an evaluation of “good” and 80% for an evaluation of “fair.” Thus, although the Hopeful Screening Test of Language might be considered good in its overall accuracy (about 94%), its sensitivity cannot be regarded nearly so highly (78%). With regard to sensitivity and specificity for language-screening procedures, Plante and Vance (1995) recommended that a higher standard be met for sensitivity than for specificity. Specifically, they recommended that sensitivity should be at 90% or above, whereas for specificity they accepted levels of 80% as “good” and 70% as “fair.” Thus, although sensitivity and specificity are both inversely related to the frequency of errors (also called “misses”) in decision making associated with a particular test or procedure, it is important to examine them independently rather than lumping them together in a single measure of accuracy because their effects differ. As Plante and Vance noted, sensitivity is more important for screening measures than specificity because the underreferrals associated with poorer sensitivity may have greater negative effects on children than overreferrals associated with poorer specificity.
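The arithmetic behind these figures can be sketched as follows. The cell counts below are invented (the chapter reports only the resulting percentages for the hypothetical Hopeful Screening Test), but they were chosen to be consistent with the sensitivity, specificity, and overall accuracy values just given:

```python
# Hypothetical 2x2 contingency table for the fictional Hopeful Screening Test.
# Cell counts are invented but consistent with the figures reported in the text
# (sensitivity 78%, specificity about 95%, overall accuracy about 94%).
true_positives = 39    # children with the disorder who fail the screen (correctly referred)
false_negatives = 11   # children with the disorder who pass the screen (missed)
true_negatives = 903   # typically developing children who pass the screen (correctly excluded)
false_positives = 47   # typically developing children who fail the screen (over-referred)

total = true_positives + false_negatives + true_negatives + false_positives  # 1000 screened

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
accuracy = (true_positives + true_negatives) / total

print(f"Sensitivity: {sensitivity:.1%}")       # 78.0%
print(f"Specificity: {specificity:.1%}")       # 95.1%
print(f"Overall accuracy: {accuracy:.1%}")     # 94.2%
```

Note that sensitivity and specificity are each computed within one column of the contingency table (disorder present vs. absent), whereas overall accuracy pools across the whole sample—which is how a test can look excellent overall while still missing roughly a fifth of affected children.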
Taking Plante and Vance’s (1995) line of thought one step further, not only should clinicians go beyond overall accuracy of classification in their evaluations of measures, they should also consider the implications of a measure’s sensitivity and specificity levels in light of the specific testing situation. Properties of that testing situation include the gravity of the decision to be made and its irreversibility. For example, lower sensitivity may be more acceptable in settings where failures to refer for testing or to take steps toward identification will be corrected—such as a situation in which a well-informed teaching staff will be likely to bring a child to the clinician’s attention regardless of previous screening results. Similarly, lower specificity may be tolerated in situations where testing resources are not sorely taxed (if there are such places). Finally, as a point that cannot be overstressed—the relative sensitivity and specificity of accessible alternatives need to enter into the clinician’s decision making: It makes little sense to jump from a rocking boat to a sinking one. Yet this is the action that may be taken regularly by clinicians who choose reliance on their own untested “judgment” over a flawed but better understood screening mechanism. Lest the reader hope that if other indicators of validity and reliability look promising all is likely to be well with regard to a test’s sensitivity and specificity, consider a relevant finding of Plante and Vance’s (1994) research. Using criteria closely related to those used in McCauley and Swisher (1984a), Plante and Vance rated 21 language tests designed for use with 4- to 5-year-olds. The researchers then conducted a study of 4 of the tests that met a relatively larger number of criteria (6 out of 10) to determine their sensitivity and specificity. Of the 4 they examined, only one achieved
acceptable levels. Thus, it pays to look for specific information on sensitivity and specificity—and to demand it from publishers as a prerequisite to purchase. In summary, sensitivity and specificity data provide special insight into the way measures function for purposes of screening and identification and thus constitute enormously valuable evidence of a measure’s worth for those purposes. Whereas for many purposes sensitivity is even more important than specificity, the specific context in which the measure is used and the availability of preferable alternatives will ultimately affect clinical perceptions of acceptable levels. Finally, it seems quite probable that the absence of this information from test manuals, although currently commonplace, will be rectified only when clinicians begin to discriminate among tests on this basis and to directly urge publishers to take action.

Choosing a Cutoff Score
One factor that affects both sensitivity and specificity is the cutoff used to determine whether a positive or negative result has been obtained. When a screening or identification decision is made using a normative comparison, a cutoff score is selected to indicate the score at which a child’s performance is seen as crossing an invisible boundary from a region of normal variation for that particular group on that particular measure into a region suggesting a difficulty or difference worthy of attention. Clearly, however, the location of the cutoff point is both arbitrary and significant. Shifting its location can decrease a test’s specificity while increasing its sensitivity, or vice versa. Thus, the choice of a cutoff is not a trivial matter. Clinically oriented authors writing about language disorders have recommended a variety of possible cutoffs for use when norm-referenced instruments are used as part of developmental language assessments. For example, Owens (1995) noted that scores falling below the 10th percentile are often considered “other-than-normal.” Leonard (1998) also observed that researchers frequently use cutoffs falling 1.25 or 1.5 standard deviations below the mean, thus falling close to Owens’s 10th percentile. Similarly, Paul (1995) endorsed a cutoff at the 10th percentile, corresponding to a standard score of about 80 and a z score falling 1.25 standard deviations below the mean for scores that are normally distributed. She indicated that she based her recommendation, in part, on similar levels previously recommended by Fey (1986) and Lee (1974). However, because of concerns about its arbitrariness and questionable psychometric defensibility, Paul’s complete criterion is somewhat more elaborate.
Specifically, she required that a child thought by significant adults in his or her life to have a communication handicap should score below the tenth percentile or below a standard score of 80 on two well-constructed measures of language function to be thought of as having a language disorder. (p. 5) Paul’s intention was to make sure that this definition would not strong-arm children who had no real-life problems into diagnoses simply because of differences in test scores that, although detectable, are of little or no practical significance. (See a longer discussion of clinical or practical significance in chap. 11.)
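Assuming normally distributed standard scores with a mean of 100 and a standard deviation of 15 (a common, though not universal, scaling), the relationships among the cutoffs just discussed—the 10th percentile, a z score of -1.25, and a standard score of about 80—can be checked with Python's standard library:

```python
from statistics import NormalDist

# Relating the commonly cited cutoffs, assuming normally distributed standard
# scores with mean 100 and SD 15 (an assumption; not every test uses this scale).
standard_normal = NormalDist()  # mean 0, sd 1

# A cutoff 1.25 standard deviations below the mean (z = -1.25):
proportion_below = standard_normal.cdf(-1.25)
print(f"z = -1.25 falls at about the {proportion_below:.1%} percentile")  # 10.6%

# The same cutoff expressed as a standard score on the mean-100, SD-15 scale:
standard_score = 100 + (-1.25) * 15
print(standard_score)  # 81.25

# Conversely, the exact z score at the 10th percentile:
z_at_10th = standard_normal.inv_cdf(0.10)
print(round(z_at_10th, 2))  # -1.28
```

The small discrepancies (81.25 rather than 80; the 10.6th rather than the 10th percentile) illustrate why these commonly cited cutoffs are described in the text as close to one another but not identical.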
It is also important to note that Paul (1995) recommended the use of two “well-constructed” measures, given that the use of one or two measures that fall short of that standard will undermine the intent of the recommendation. Just as a chain is no stronger than its weakest link, a battery (even of just two measures) will be no more accurate than its least accurate member (Plante & Vance, 1994; Turner, 1988). Because of this concern, Plante (1998) recently recommended that a single valid test along with a second functional indicator (e.g., clinician judgment, enrollment in treatment) be used for verification of specific language impairment for research purposes. This recommendation suggests an obvious parallel for clinical identification, one that can be seen as consistent with IDEA (Plante, personal communication). Sometimes, when cutoffs are selected in accordance with test developer recommendations, clinicians and researchers use different cutoffs for different tests. Usually, the recommendations of the test developers result in cutoffs very similar to those discussed earlier. Looking back at the normal curve and its relationship to different types of scores in Fig. 2.5 suggests that small differences in cutoffs should result in only small shifts in selection, thus suggesting that the method used to select a cutoff probably does not matter. Surprisingly, however, Plante and Vance (1994, 1995) demonstrated that an empirically derived cutoff can greatly enhance a measure’s sensitivity and specificity. Further, they showed that empirically derived cutoffs are likely to vary from test to test, thus making a “one-cutoff-fits-all-tests” practice something that they would advise against. Their work is described briefly in the next paragraphs to help illustrate the value of research into basic measurement issues such as cutoff selection.
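The weakest-link point about batteries can be illustrated with a toy calculation. The sensitivity and specificity values below are invented, and the calculation assumes—unrealistically for real language tests, whose errors tend to be correlated—that the two measures err independently:

```python
# Toy illustration of combining two identification measures under a
# "fail both" rule, assuming (unrealistically) independent errors.
# All sensitivity/specificity values are invented for illustration.
sens_a, spec_a = 0.90, 0.85  # hypothetical test A
sens_b, spec_b = 0.80, 0.90  # hypothetical test B

# A child is identified only if BOTH tests give a positive result.
combined_sensitivity = sens_a * sens_b                    # both must detect the disorder
combined_specificity = 1 - (1 - spec_a) * (1 - spec_b)    # both must misfire to over-identify

print(round(combined_sensitivity, 3))  # 0.72
print(round(combined_specificity, 3))  # 0.985
```

Under the fail-both rule, combined sensitivity (0.72) drops below even the weaker test's 0.80, while combined specificity rises—one concrete way of seeing why a battery's ability to detect true cases is dragged down by its least accurate member.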
In their studies, Plante and Vance (1994, 1995) used a statistical technique called discriminant analysis—a form of regression analysis—to examine outcomes associated with different cutoffs. Using this technique, the experimenter determines to what extent variation in scores is accounted for by group membership and then examines the accuracy of predictions of group membership made from a resulting regression equation. It allows one to examine the ways in which changing the cutoff affects sensitivity and specificity. Plante and Vance (1994, 1995) recommended two strategies for ensuring the availability of empirically derived cutoffs such as those that can be obtained through discriminant analysis. First, they advised clinicians to insist that standardized measures offer such cutoffs along with data concerning sensitivity and specificity. Second, they noted the possibility of developing local cutoffs, a process that requires fewer participants than local norming but that can require clinicians who attempt it to seek statistical assistance (Plante & Vance, 1995). Although not endorsed by Plante and Vance (1995), the development of local norms may also represent a responsible strategy for increasing the availability of data concerning sensitivity and specificity of decisions in settings where sufficient resources and numbers of children (including those with disorders) exist (e.g., see Hirshoren & Ambrose, 1976; Norris, Juarez, & Perkins, 1989; Smit, 1986). Software designed to aid in the construction of local norms (Sabers & Hutchinson, 1990) makes this strategy more feasible than it once was (Hutchinson, 1996). In addition, the development and use of local norms has been recommended as a means of dealing with
bias in testing that results from the use of inappropriate norms (e.g., see Vaughn-Cooke, 1983). In summary, then, the cutoffs used to identify children’s performance as falling below expectations are often arbitrarily set at about 1.25 to 1.5 standard deviations below the mean. However, greater sensitivity and specificity can be achieved when empirical methods are used to optimize the performance of the measures used. Not only does this practice constitute another step that can be taken by test authors and publishers to improve the quality of clinical decision making in the field, it represents a topic of such practical significance as to invite a wealth of applied research. In addition, as Paul (1995) suggested, the current state of the art precludes reliance on a single measure—or even a single battery of measures—to lead in a lockstep fashion to decision making. Integration of functional data about the child will remain a necessary component of screening and identification for the foreseeable future. As understanding of functional or qualitative data—such as portfolios and teacher reports of critical incidents—increases (e.g., Schwartz & Olswang, 1996), their role will probably increase as well (see chap. 10), with beneficial results for the sensitivity and specificity of the process. Further, in many clinical and especially educational settings, the choice of cutoff to be used can seem—and in some cases may be—outside the control of the speech-language pathologist. The role played by educational agencies in establishing guidelines for measurement use and clinicians’ productive responses to these are discussed later in this chapter in the section called Practical Considerations. There are theoretical concerns, too, about the use of cutoffs that relate to our understanding of the very nature of language impairment in all children, but particularly in those for whom no obvious cause exists: children with specific language impairment (SLI).
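A greatly simplified sketch of the empirical approach to cutoff selection summarized above—substituting a brute-force search for the discriminant analysis that Plante and Vance actually used—might look like this. All scores are invented:

```python
# Brute-force stand-in for empirically derived cutoff selection (Plante and
# Vance used discriminant analysis; this sketch simply scans candidate cutoffs).
# All scores below are invented for illustration.
disorder_group = [62, 68, 70, 74, 75, 78, 80, 82, 85, 88]
typical_group = [78, 84, 88, 90, 92, 95, 98, 100, 104, 110]

def evaluate(cutoff):
    """Sensitivity and specificity if scores below `cutoff` are flagged."""
    sensitivity = sum(score < cutoff for score in disorder_group) / len(disorder_group)
    specificity = sum(score >= cutoff for score in typical_group) / len(typical_group)
    return sensitivity, specificity

# Choose the cutoff maximizing sensitivity + specificity across a score range.
best_cutoff = max(range(60, 111), key=lambda c: sum(evaluate(c)))
sens, spec = evaluate(best_cutoff)
print(best_cutoff, sens, spec)
```

With these invented scores, the search settles on a cutoff of 83 (sensitivity .80, specificity .90); a different sample of scores would yield a different optimum, which is one way of seeing why empirically derived cutoffs vary from test to test.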
Dollaghan and Campbell (1999) recently called attention to the fact that the use of an arbitrary cutoff at a point along a normal distribution of scores is at odds with theoretical notions that language impairment represents a natural category, or taxon. Instead, they say that it implies an assumption that children with “impaired” language may simply represent those children who have less language ability, in the same way that short persons have less height. This possibility has been pointed out by several theoreticians addressing the question of etiology for children with SLI (e.g., see Lahey, 1990; Leonard, 1987) but has failed to receive sustained attention. As an important step toward reviving consideration of this hypothesis, Dollaghan and Campbell noted that the question of whether “language impairment” represents a distinct category versus the lower range of a continuum of performance is an empirical one with potentially powerful repercussions for both assessment and treatment. Specifically, as a working hypothesis they predict that if language impairment is taxonic, language deficits would be likely to be more focused and would therefore require more focused assessments and treatments. Dollaghan and Campbell (1999) also noted that the time may be ripe for addressing the question of the nature of language impairment because parallel concerns in clinical psychology with regard to schizophrenia and depression have spawned rich advances in methodology (Meehl, 1992; Meehl & Yonce, 1994, 1996). They conjectured that these advances might provide an auspicious starting point for additional efforts. Among the implications of this work are the possibility of identifying those cutoffs that truly identify children who are categorically different in their language
skills from other children rather than those who simply seem quantitatively suspicious because of their lower performances. Thus, these methods may offer additional strategies for more rational cutoff selection.

Remembering Measurement Error in Score Interpretation
Once a measure has actually been selected and administered and a cutoff level settled on, the clinician uses the test taker’s score to assist in a decision regarding screening or identification. During this process, because of the weight attached to individual scores in screening and identification decisions, remembering measurement error in score interpretation becomes critical to solid clinical decision making—even when functional criteria are incorporated. Recall that in chapter 3 the concept of SEM was described as a means of conveying the impact of a test’s reliability on an individual score. Specifically, the lower the reliability of the instrument, the higher the error (quantified using SEM) attached to the individual score. The importance of reliability and SEM is not due to their ability to remove error (because they can’t), but rather to their helping us understand the magnitude of error we face. Figure 9.2 is intended to provide an example illustrating the effect of SEM on a screening decision. It shows the same score achieved by a child on two different screening measures—one with a larger SEM and the other with a smaller SEM for that child’s age group. Around each of these scores, there is a 95% confidence interval. The confidence interval represents a range of scores in which it is likely (although not absolutely assured) that the test taker’s true score falls. A 95% confidence level means that there is a probability of 95% that the interval contains the child’s true score and, of course, 5% that it does not. It is often recommended that clinicians characterize children’s performance using the range of scores encompassed within the confidence interval, rather than a single score. Further, it has been suggested that the SEM for a measure should be no more than one third to one half of its standard deviation (Hansen, 1999). 
If a score of 75 is used as a cutoff on each test in the example, the task of deciding whether the child's performance falls below that value clearly becomes much trickier for test A than for test B, despite identical scores. In fact, one might be tempted to refrain from using test A in favor of test B when screening children of this particular age. However, perhaps test A is preferable as a screening tool for other reasons, for example, because it has a more appropriate normative sample and better evidence of validity for children similar to the one being tested. In that case, the clinician may decide to use the measure but view the resulting data with greater circumspection. Some tests make it quite easy to take error into account during score interpretation because of the way a child's scores are plotted on the test form. For tests that do not provide this user-friendly feature, however, the test user can calculate a confidence interval using the tables and following the example laid out in Fig. 9.3. Although the choice of confidence level is somewhat arbitrary, more stringent levels are usually selected for more momentous decisions. Confidence intervals of 68, 95,
Page 225
Fig. 9.2. Two 95% confidence intervals calculated for the same score using two different screening measures, one with a larger SEM, on the left, and the other with a smaller SEM, on the right.
and 99% are the ones most typically reported, with 85 and 90% used less frequently (Sattler, 1988).1 The old adage "know your limitations," including knowing the limitations of your data, would serve as an apt summary of this brief section. Information about SEM can help clarify the significance of reliability data for individual clients and can thus help the clinician make choices among the measures he or she adopts. Further, through the
1 Also note that Salvia and Ysseldyke (1991) and others (including McCauley & Swisher, 1984b; Nunnally, 1978) recommended a slightly more complex procedure in which an estimated true score is calculated first. This procedure is offered as a first step in appreciating the potential value of confidence intervals but should not be taken as definitive.
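The estimated-true-score procedure mentioned in the footnote (after Nunnally, 1978, and Salvia & Ysseldyke, 1991) can be sketched in a few lines. The mean and reliability below are illustrative assumptions, not values from any particular test manual:

```python
# Sketch of the estimated-true-score step: the obtained score is
# regressed toward the normative mean in proportion to the test's
# reliability. Illustrative values only.

def estimated_true_score(obtained, mean, reliability):
    """Estimated true score: mean + reliability * (obtained - mean)."""
    return mean + reliability * (obtained - mean)

# A score of 70 on a test with mean 100 and reliability .90 yields
# an estimated true score of 100 + .90 * (70 - 100) = 73.0.
print(estimated_true_score(70, 100, 0.90))  # 73.0
```

In the more complex procedure the footnote describes, a confidence interval would then be centered on this estimated true score rather than on the obtained score; see the cited sources for the full method.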
Page 226
Fig. 9.3. Table to be used in calculating confidence intervals, with an example. From "The Truth About Scores Children Achieve on Tests" by J. Brown, 1989, Language, Speech, and Hearing Services in Schools, 20, p. 371. Copyright © 1989 by American Speech-Language-Hearing Association. Reprinted with permission.
use of confidence intervals during interpretation of an individual's performance, the clinician is given the opportunity to gauge the possible effect of a measure's known imperfection (imperfect reliability, in this case). Therefore, what may have begun to sound like a repeated refrain in the last three sections can be sounded again here: One should always make use of such information when it is readily available, calculate it if possible, and encourage test publishers to provide it when it is neither offered nor calculable.
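For readers who prefer arithmetic to tables, the quantities behind Figs. 9.2 and 9.3 can be computed directly. The standard deviation (15) and reliability (.91) below are hypothetical, chosen only to illustrate the calculation:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Interval of score +/- z * SEM; z = 1.96 gives the 95% level,
    1.00 gives 68%, and 2.58 gives 99%."""
    margin = z * sem(sd, reliability)
    return (score - margin, score + margin)

# Hypothetical test: SD 15, reliability .91 -> SEM 4.5, which also
# satisfies the suggested ceiling of one third to one half of the SD.
lo, hi = confidence_interval(75, 15, 0.91)
print(round(lo, 1), round(hi, 1))  # 66.2 83.8
```

A score of 75 on this hypothetical test would thus be reported as the band 66.2 to 83.8 at the 95% level, rather than as a single point.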
Page 227 Wrestling with the Disorder–Difference Question
The diversity of cultural and language backgrounds represented among any group of children can be quite breathtaking. Even in Vermont, which is often cited as one of the least diverse states in the country, the school district of the state's largest city, Burlington (population 40,000), has children whose first languages include Vietnamese, Serbo-Croatian, Mandarin, and Arabic. In fact, in 1998 and 1999, about 25 languages other than English were spoken by children whose proficiency in English was sufficiently low to require special intervention. Between 1987–1988 and 1998–1999, the number of such children grew from just below 20 to about 300 (Horness, personal communication). Because several national companies are represented in Burlington, numerous children have moved here with their parents from different regions of the United States. Whereas some of these families have moved from other New England regions with dialects similar to Vermont's, others have moved from the Deep South or other regions claiming distinct regional dialects. Further, children in this same school district come from families with incomes below the poverty level as well as from families with incomes in the stratosphere of affluence. On the basis of these few facts, it seems safe to say that each speech-language pathologist working in this school district confronts issues related to differences in culture, regional dialect, social dialect, and primary language on a daily basis. Even in Vermont! As this example illustrates, diversity affecting language use among young native speakers of English and language use by children who are acquiring English as a second language is the rule rather than the exception. Consequently, professionals who work with children are challenged to remain vigilant to cultural and linguistic factors in the selection and use of screening and identification measures.
Clearly, the magnitude of the challenge differs substantially when the clinician works with children who speak a minority dialect of English compared with those who are being exposed to English for the first time in a school setting. Children in this latter group are sometimes referred to as having limited English proficiency (LEP). Regardless of whether they are seen as having a language disorder, they will often be served through an English as a Second Language (ESL) program in school systems. In contrast, children who speak a minority dialect of English are perhaps more easily misunderstood by the SLP because their differences in dialect may go unappreciated, on the assumption that they are bidialectal—that is, able to use the dialect of the school as well as a regional or social dialect. They may also include children whose first dialect is unknown to both their classmates and the speech-language pathologist, further increasing the complexity of the speech-language pathologist's work. Regardless of the differences between these groups of children, any time there is a mismatch between the tools being used, or between the clinician's language and culture and the language and culture of the child, the issue of difference versus disorder becomes relevant. Table 9.1 offers a pair of hypothetical scenarios in which challenges of this type are presented. Before figuring out exactly how to respond to the challenges of linguistic and cultural diversity, however, we need to remind ourselves of what threats to validity are
Page 228
Table 9.1
Scenarios Illustrating the Challenges of Cultural and Linguistic Diversity

Little English, Little Vietnamese: An Experiential Deficit or a Disorder?
Although Van and a twin sister were born in the southwestern United States to parents of Vietnamese heritage, Van was adopted at age 5 by a professional couple in New England after he was removed from his home because of severe neglect. Not much was known about his life before the adoption. However, informants knowledgeable in Vietnamese indicated that although his understanding of that language seemed excellent, he spoke little. During a foster placement immediately preceding his adoption, he had begun to use English as frequently as Vietnamese, but he continued to be very quiet around everyone except his new parents. The speech-language pathologist and the educational team assigned to work with Van and his new family were interested in obtaining information about Van's language status in both languages.

An American English Dialect: Probably Not a Disorder, but a Problematic Difference
Raymond moved from a school district in New Orleans, in which about 95% of his kindergarten classmates were Black, to a racially mixed suburb of Chicago, where a White speech-language pathologist who had been raised in Toronto, Canada, was assigned to serve him. Concerns had been raised about his speech intelligibility and his vocabulary use and understanding by his classroom teacher, a White native of Indiana. Although both professionals had many years of experience working with children and colleagues in their racially diverse school, neither had had such a difficult time understanding a speaker of Black English. They wanted to determine whether Raymond's speech was simply "different" because of his dialect or whether it represented a genuine problem.
Although they were relieved to find out that Raymond's family considered him a competent, if young, speaker, they were even more perplexed about how they might smooth his transition into his new school.
interwoven with diversity. I begin by considering the threats that occur when a child speaks a dialect of English or is acquiring English as a second language—for example, Black English or Spanish-influenced English. Among the threats to valid testing in English that have been most thoroughly discussed are those arising from the potential for measures to use situations, directions, formats, or language that are inconsistent with the child's previous experience (Taylor & Payne, 1983). Here, the chief concern is correctly recognizing the presence of a language difference: a difference in language use associated with systematic variation in semantics, phonology, and so on, when compared with the idealized dialect typically represented in standardized language measures. The danger, of course, is erroneously identifying a difference as a disorder. ASHA (1993) has defined language difference more elaborately as

a variation of a symbol system used by a group of individuals that reflects and is determined by shared regional, social, or cultural/ethnic factors. A regional, social or ethnic variation of a symbol system is not considered a disorder of speech or language. (p. 41)

For children using minority dialects, English-language measures developed without attention to dialectal and accompanying cultural variation are especially problematic for purposes of screening and identification. The advantages and disadvantages
Page 229 of alternatives for children who speak Black English and other minority dialects fuel continuing discussion (e.g., see Damico, Smith, & Augustine, 1996; Kamhi, Pollock, & Harris, 1996; Kayser, 1989, 1995; Reveron, 1984; Taylor & Payne, 1983; Terrell & Terrell, 1983; Van Keulen, Weddington, & DeBose, 1998; Vaughn-Cooke, 1983). Not surprisingly, many strategies for coping with this complex issue have been considered, but none are completely satisfactory for use with children speaking minority dialects (Vaughn-Cooke, 1983; Washington, 1996). When the continuing use of norm-referenced instruments for these children is entertained (e.g., see Kayser, 1989; Vaughn-Cooke, 1983), it is generally recognized that few existing measures have been found suitable. The strategies that have been recommended and tried include the development of alternative norms, either by adding minorities in small numbers to normative samples or by obtaining separate normative data for minority children; these ideas are, respectively, ineffective and impractical in addressing problems with the norms (e.g., Vaughn-Cooke, 1983). A second method involves modifying objectionable test components (e.g., Kayser, 1989), and a third involves developing alternative scoring rules designed to give credit for "correct" answers in the dialect being considered (e.g., Terrell, Arensberg, & Rosa, 1992). Both of these latter methods have been found lacking because they invalidate the norms, thus transforming the targeted measure into an informal criterion-referenced measure. Table 9.2 lists some modifications in test administration that have been proposed for use with minority children who have been tested with existing norm-referenced tests; these modifications might profitably be applied in cases where a description of the child's responses to certain kinds of stimuli is wanted. Usually, however, those cases will arise not during identification of a language impairment, but during the descriptive process that follows it (see chap. 10).

Table 9.2
Modifications of Testing Procedures
1. Reword instructions.
2. Provide additional time for the child to respond.
3. Continue testing beyond the ceiling.
4. Record all responses, particularly when the child changes an answer, explains, comments, or demonstrates.
5. Compare the child's answers to dialect or to first- or second-language learning features. Rescore articulation and expressive language samples, giving credit for variation or differences.
6. Develop several more practice items so that the process of "taking the test" is established.
7. On picture vocabulary recognition tests, have the child name the picture in addition to pointing to the stimulus item to ascertain the appropriateness of the label for the pictorial representation.
8. Have the child explain why the "incorrect" answer was selected.
9. Have the child identify the actual object, body part, action, photograph, and so forth, particularly if he or she has had limited experience with books, line drawings, or the testing process.
10. Complete the testing in several sessions.
11. Omit items you expect the child to miss because of age, language, or culture.
12. Change the pronunciation of vocabulary.
13. Use different pictures.
14. Accept culturally appropriate responses as correct.
15. Have parents or another trusted adult administer the test items.
16. Repeat the stimuli more than specified in the test manual.
Note. From "Speech and Language Assessment of Spanish-Speaking Children," by H. Kayser, 1989, Language, Speech, and Hearing Services in Schools, 20, p. 244. Copyright 1989 by American Speech-Language-Hearing Association. Reprinted with permission.

Page 230 A fourth method consists of supplementing existing norm-referenced measures with descriptive tools (Vaughn-Cooke, 1983). This approach presents a very difficult interpretation challenge for the clinician: the norm-referenced measures will be assumed to be biased, and the descriptive measures are usually not up to the challenge of identification. Finding more widespread approval than the methods just discussed are strategies that entail the abandonment of currently available measures. These include (a) the substitution of descriptive methods, such as language sample analysis or criterion-referenced measures (e.g., see Damico, Smith, & Augustine, 1996; Leonard & Weiss, 1983; Schraeder, Quinn, Stockman, & Miller, 1999), and (b) the development of new, more appropriate norm-referenced instruments (Vaughn-Cooke, 1983; Washington, 1996). Sole use of criterion-referenced approaches, such as language sampling, has the chief disadvantage that few data support that strategy for screening and identification. Washington also noted that language analyses that might be conducted for young speakers of Black English are hampered by the absence of appropriate norms, because normative data are currently available only for adolescents and adults. However, the many proponents of a criterion-referenced or descriptive approach (e.g., see Damico, Secord, & Wiig, 1992; Robinson-Zañartu, 1996) would argue that despite their drawbacks, descriptive strategies offer the least dangerous of the choices.
Not much progress has been made in the development of appropriate norm-referenced instruments; however, that may change in response to pressures for improved nonbiased assessment. In addition, perusal of recently developed tests suggests that more sophisticated efforts are being made to consider dialect use in the development of tests for more diverse populations. This has included the test developer's examination of item bias for minority children (Plante, personal communication). Depending on when it is obtained, the resulting data can be used early in a test's development to lead to less biased testing or can be presented to show that a relatively unbiased measure has been achieved. Beyond the realm of traditional recommendations for improving language assessment validity for diverse groups of children, attention has recently been paid to the development of methods that seek to reduce the effects of prior knowledge and experience on performance. Two approaches of particular interest are processing-dependent measures and dynamic assessment methods. The development of processing-dependent measures involves the use of tasks with either high novelty or high familiarity for all participants (e.g., Campbell, Dollaghan, Needleman, & Janosky, 1997). Dynamic assessment methods focus on the child's learning of new material rather than acquired knowledge. This is done as a means of leveling the effects of prior experience and obtaining information about how to support the child's learning beyond the assessment situation (e.g., Gutierrez-Clellan, Brown, Conboy, & Robinson-Zañartu, 1998; Olswang, Bain, & Johnson, 1992; Peña, 1996). Although proposed as being applicable to identification decisions, these two types of measures are more frequently used for descriptive purposes and are discussed more thoroughly in the next chapter.
Page 231 Assessments designed to address the needs of children who can be described as having LEP are growing in number. Table 9.3 illustrates some of the measures being developed for use with children from diverse linguistic and cultural backgrounds. At this point, the majority of these measures have clearly been developed for children with Spanish as their first language. Some of these measures are developed "from scratch" and thus can take advantage of the existing knowledge base concerning development and disorders in the target languages. In contrast, others are little more than translations of existing tests, a practice that requires considerable care and may still result in measures that do not get at the heart of major developmental tasks in the language. For example, translations can be hampered by items that have no true counterparts or that require greater linguistic complexity to convey information in the target language than in the original. Consumers should be cautioned to be skeptical of their own comfort level with such adaptations of familiar tests. Further, they will want to be careful of the match between the dialect spoken by the child and the dialect in which a test is written. I encourage you to look at more thorough discussions of the special challenges posed during the identification of language impairment in several groups whose first or major language or dialect is either not English or not the dialect of English typical of standardized tests. Sources warranting particular attention exist for children who are Native American (Crago, Annahatak, Doehring, & Allen, 1991; Leap, 1993; Robinson-Zañartu, 1996), Hispanic American (Kayser, 1989, 1991, 1995), or Asian American (Cheng, 1987; Pang & Cheng, 1998), and for children who speak Black English (Kamhi et al., 1996; Van Keulen et al., 1998) or regional dialects (Wolfram, 1991).

Page 232
Table 9.3
Selected Tests Designed for Children Whose Primary Language Is Not English (Compton, 1996; Roussel, 1991)

Test | Ages | Language | Oral Language Modalities & Domains | Reference
Bilingual Syntax Measure–Chinese (Tsang, n.d.) | Grades K–12 | Chinese | E-Sem | Tsang, C. (n.d.). Bilingual Syntax Measure–Chinese. Berkeley, CA: Asian-American Bilingual Center.
Spanish Structured Photographic Expressive Language Test (Werner & Kresheck, 1989) | 3-0 to 5-11; 4-0 to 9-5 | Spanish | E | Werner, E. O., & Kresheck, J. S. (1989). Spanish Structured Photographic Expressive Language Test. Sandwich, IL: Janelle.
Ber-Sil Spanish Test (Beringer, n.d.) | 4 to 12 years | Spanish | R-Sem, Morph | Beringer, M. (n.d.). Ber-Sil Spanish Test. Rancho Palos Verdes, CA: The Ber-Sil Company.
Austin Spanish Articulation Test (Carrow-Woolfolk, n.d.) | 3 years to adult | Spanish | E-Phon | Carrow-Woolfolk, E. (n.d.). Austin Spanish Articulation Test. Allen, TX: DLM Teaching Resources.
Compton Speech and Language Screening Evaluation–Spanish (Compton & Kline, n.d.) | 3 to 6 years | Spanish | R & E-Phon, Sem, Syn | Compton, A. J., & Kline, M. (n.d.). Compton Speech and Language Screening Evaluation–Spanish. San Francisco: Institute of Language.
Test de Vocabulario en Imagenes Peabody (Dunn, Lugo, Padilla, & Dunn, 1986) | 2-6 to 17-11 | Spanish | R-Sem | Dunn, L. M., Lugo, D. E., Padilla, E. R., & Dunn, L. M. (1986). Test de Vocabulario en Imagenes Peabody. Circle Pines, MN: American Guidance Service.
Expressive One-Word Picture Vocabulary Test–Spanish (Gardner, n.d.) | 2 to 11 | Spanish | E-Sem | Gardner, M. F. (n.d.). Expressive One-Word Picture Vocabulary Test–Spanish. San Francisco: Children's Hospital of San Francisco.
Prueba del Desarrollo Inicial del Lenguaje (Hresko, Reid, & Hammill, n.d.) | 3 to 7 | Spanish | R-Sem, Syn | Hresko, W. P., Reid, D. K., & Hammill, D. D. (n.d.). Prueba del Desarrollo Inicial del Lenguaje. San Antonio, TX: Pro-Ed.
Clinical Evaluation of Language Function–3, Spanish Edition (Semel, Wiig, & Secord, n.d.) | 6 to 21 | Spanish | R & E-Sem, Morph, Syn, Prag | Semel, E., Wiig, E. H., & Secord, W. (n.d.). Clinical Evaluation of Language Function–3, Spanish Edition. San Antonio, TX: Psychological Corporation.
Del Rio Language Screening Test (Toronto, Leverman, Hanna, Rosenzweig, & Maldonado, n.d.) | 3 to 6 | Spanish | R-Sem | Toronto, A. S., Leverman, D., Hanna, C., Rosenzweig, P., & Maldonado, A. (n.d.). Del Rio Language Screening Test. Austin, TX: National Educational Laboratory.
Preschool Language Scale–3 (Zimmerman, Steiner, & Pond, 1992) | Birth to 6 years | Spanish | E & R | Zimmerman, I. L., Steiner, V., & Pond, R. (1992). Preschool Language Scale–3. San Antonio, TX: Psychological Corporation.
Sequenced Inventory of Communication Development–Revised (Hedrick et al., 1984) | 0-4 to 4-0 | Spanish translation | E & R | Hedrick, D. L., Prather, E. M., Tobin, A. R., Allen, D. Y., Bliss, L. S., & Rosenberg, L. R. (1984). Sequenced Inventory of Communication Development–Revised Edition. Seattle, WA: University of Washington Press.
Bilingual Syntax Measure–Tagalog (Tsang, n.d.) | Grades K–12 | Tagalog | E | Tsang, C. (n.d.). Bilingual Syntax Measure–Tagalog. Berkeley, CA: Asian-American Bilingual Center.
Note. E = Expressive. R = Receptive. E-Sem = Expressive Semantics, etc. Morph = Morphology. Phon = Phonology. Syn = Syntax. Prag = Pragmatics.

Conducting Comparisons between Scores

Clinicians rarely compare scores on different instruments as part of screening. Instead, such comparisons occur more commonly during identification. They are particularly common in settings requiring a comparison of nonverbal and verbal skills, called cognitive referencing. Despite widespread criticism of this practice (Aram, Morris, & Hall, 1993; Fey, Long, & Cleave, 1994; Kamhi, 1998; Krassowski & Plante, 1997; Lahey, 1988), its use is nonetheless mandated in several states to justify services. In addition, it has sometimes been used in research definitions of SLI and other learning disabilities (see a lengthier discussion of this point in chap. 5). Comparisons of this kind are also used as a means of identifying strengths and weaknesses in preparation for planning intervention, a descriptive use that is touched on in the next chapter. When single pairs of scores are compared, the comparison is frequently referred to as discrepancy analysis; when larger numbers of scores are compared, it is more frequently referred to as profile analysis. Numerous discussions of the hazards of this type of comparison are provided in the literature (e.g., McCauley & Swisher, 1984b; Salvia & Ysseldyke, 1998). The focus of the current discussion is the use of such comparisons in identification.

For purposes of illustration, imagine that a child's overall score on a language measure is to be compared with her performance on a nonverbal measure of intelligence. Imagine that she receives a standard score of 70 on the former and 90 on the latter. On the face of this comparison, it looks like there is quite a difference. However, differences between scores, also called difference scores or discrepancies, are often less reliable than the scores on which they are based. In fact, the likelihood that observed differences are due to error rather than real differences is affected by three factors: the reliability of each measure, the correlation of the two measures, and the similarity of their normative samples (Salvia & Ysseldyke, 1998). The task of assessing norm comparability is as straightforward as looking over descriptions of each normative group to determine whether they seem to differ in ways that could affect the scores to be compared. To see why this is necessary, recall that the standard scores best used to summarize test performance include the group mean in their calculation. Therefore, something about the normative group may push one group mean higher (e.g., one group is more "elite" in some sense than the other). Consequently, one would fare more poorly in a comparison against that group than against a group with a lower mean, even if one's true abilities in the two areas were comparable. To provide a poignant example, imagine that a ruthless clinician has decided to compare your language and nonverbal skills, using scores obtained by comparing your performances against those of Nobel laureates in literature for the former and fifth graders for the latter. Not only could you legitimately question the appropriateness of the norms as a basis for each of the scores, you could also vehemently protest the resulting comparison. Thankfully, flagrant mismatches between test norms used in comparisons may not occur outside of examples like this one. However, if overlooked, more subtle mismatches can nonetheless contribute to poor decisions and inappropriate clinical actions.
Taking test error and test correlation into account is less straightforward than inspecting norms. On the basis of ideas analogous to those used for calculating a confidence interval around a single score, however, it is possible to calculate a confidence interval around a difference score. Salvia and Ysseldyke (1998) described two methods based on differing assumptions about the causal relationship of the two skills being compared. In addition to the actual score data, both methods require information about the reliability of the measures being used and about their correlation. Whereas the relevant information about reliability and the nature of normative samples should be readily available for individual measures, information about the correlation between measures will often be lacking. In that event, abandoning a direct comparison and instead noting the results of each test as supporting or not supporting the identification of a problem in a given area may represent the best alternative (McCauley & Swisher, 1984b). Even when a difference between two scores is found to be reliable, Salvia and Good (1982) pointed out, a difference of that magnitude may not be particularly uncommon, or, even more importantly, it may not be functionally meaningful. Because of the resources involved, determining the functional significance of differences in skill levels represents yet another area in which clinicians must look to the research literature to help them interpret their clinical data. Fortunately, in cases where comparisons between scores affect identification decisions, there is a rich literature examining these issues (e.g., for SLI). Clinicians can be more active and work to change policy in settings in which the use of discrepancies is mandated for purposes for which they have been found to lack meaning.
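The classical formulas behind these cautions can be sketched with hypothetical values (reliabilities of .90 and .85, an intertest correlation of .60, and a shared standard deviation of 15; none of these come from real tests):

```python
import math

def difference_reliability(r_xx, r_yy, r_xy):
    """Classical reliability of a difference score: the average of the two
    reliabilities, minus the intertest correlation, over 1 - correlation."""
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)

def se_difference(sd, r_xx, r_yy):
    """Standard error of a difference when both tests share the same SD:
    sqrt(SEM_x**2 + SEM_y**2) = SD * sqrt(2 - r_xx - r_yy)."""
    return sd * math.sqrt(2.0 - r_xx - r_yy)

# Two well-correlated tests yield a markedly less reliable difference score
# than either test alone:
print(round(difference_reliability(0.90, 0.85, 0.60), 3))  # 0.688

# 95% margin around an observed difference:
margin = 1.96 * se_difference(15, 0.90, 0.85)
print(round(margin, 1))  # 14.7 -> the 20-point gap (70 vs. 90) exceeds it
```

Even under these favorable assumptions, an observed difference must exceed roughly 15 points before it can be called reliable at the 95% level, which illustrates why a reliable difference is a necessary but not sufficient basis for clinical action.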
Page 235 In summary, comparing scores is a more complicated endeavor than it first appears, involving as it does not only the child's test scores but also the properties of the two tests, especially their norms and intercorrelation. A well-reasoned conservatism in undertaking identifications based on such comparisons should be joined by a healthy appetite for the clinical literature exploring their significance.

Taking into Account Base Rates and Referral Rates
Each of the special considerations addressed earlier focused on test selection or on the use of tests with a particular child. Two other factors that affect screening and identification decisions are really features of the clinical environment: the rarity of the disorder (the base rate of the disorder) and the frequency with which referrals are made in a particular setting (the referral rate). In this section, these two topics are discussed briefly because of their effect on screening programs. The lower the base rate of the disorder—that is, the rarer the disorder in the general population—the more likely it becomes that the positive results of screening or identification are actually false positives rather than true positives (Hummel, 1999). Shepard (1989) pointed out that although people understand that classification error will occur with fallible measures and decision processes, they fail to appreciate how that error will be distributed between children who are identified as having a disorder and those who are not, even when the validity coefficient for the measure being used is quite large. She concluded that when base rates are low, "even with reasonably valid measures, the identifications will be equally divided between correct decisions and false positive decisions" (Shepard, 1989, p. 551). This problem is particularly acute when measures are less valid for a given population, such as minority children, where overidentification is very likely to result (Schraeder et al., 1999). Concern about low base rates has led public health researchers and psychologists interested in rare psychiatric outcomes (e.g., suicide) to develop several strategies designed to target screening at subsets of the larger population with higher base rates.
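Bayes' rule makes Shepard's point concrete. In the sketch below, the base rate, sensitivity, and specificity are invented for illustration; they are not estimates for any actual screening measure:

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Proportion of positive screens that are true positives (Bayes' rule)."""
    true_pos = base_rate * sensitivity              # affected and flagged
    false_pos = (1.0 - base_rate) * (1.0 - specificity)  # unaffected but flagged
    return true_pos / (true_pos + false_pos)

# With a hypothetical base rate of 7% and a screen whose sensitivity and
# specificity are both .90, well under half of the positives are real:
print(round(positive_predictive_value(0.07, 0.90, 0.90), 2))  # 0.4
```

Raising the base rate in the screened group (e.g., screening only children about whom concerns have already been expressed) raises this proportion without changing the test itself, which is exactly the rationale for the targeted-screening strategies described above.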
These include the use of multistep screening procedures and the application of screening procedures to subgroups expected to have higher prevalence rates than the general population (Derogatis & DellaPietra, 1994). Currently, the prevalence of childhood language disorders across all types is not particularly low, as illustrated by the estimate that children with language disorders constitute 53% of all speech-language pathologists' caseloads (Nelson, 1993). Nonetheless, it is sufficiently low that careful selection of groups for language screening makes good sense. Children about whom concerns are expressed, or who are demonstrably failing in some aspect of their adaptation to school or home environments, make obvious candidates for more focused screenings and indeed are often seen for screening prior to more comprehensive evaluations. Screening programs in preschool education are associated with enormous differences in referral rates (Thurlow, Ysseldyke, & O'Sullivan, 1985, as cited in Nuttall et al., 1999), the rates at which children who are screened are referred on for additional assessment. This variability leads to concerns about overreferral when referral rates are particularly high and underreferral when they are particularly low. Because
Page 236 overreferrals needlessly tax clinical resources, parental concern, and the child's patience, whereas underreferrals deprive children of needed attention, steps to study and alter referral rates have been recommended. Changes in the targets for screening and in the criteria (including cutoffs) used can be made to address verified inadequacies in the screening mechanism. In addition, the use of a second-level screening, using measures intermediate in efficiency and comprehensiveness between initial screenings and full-fledged assessments, has been recommended (Nuttall et al., 1999).

Available Tools

Screening
Available screening measures differ in terms of whether information is obtained directly by the speech-language pathologist and whether the measurement is formal or informal. Screening methods include the use of norm-referenced standardized tools as well as informal clinician-developed measures. Over the past few years there has been growing interest in the development of questionnaires that might be used to increase the involvement of parents and others familiar with the child and to improve the quality of information obtained from them. More recently still, there has been interest in the development of criterion-referenced authentic assessments in which specific minimal competencies are evaluated in a familiar setting. Schraeder et al. (1999) described such a protocol developed for use with young speakers of Black English. Because its elements were selected for their high degree of overlap with features of Standard American English, Schraeder and her colleagues suggested its potential relevance for many children in the targeted age group of 3-year-olds.

Parent Questionnaires and Related Instruments
Although historically some instruments have incorporated the use of parent report for very young children (e.g., the Sequenced Inventory of Communicative Development; Hedrick, Prather, & Tobin, 1975), extensive development of parent questionnaires for language-disorder screening has blossomed only in the past decade. The use of such instruments is welcomed from a family-centered perspective (Crais, 1993) because parents are given the opportunity to share their expertise concerning the child as part of their collaboration in the assessment process. In addition, these measures show good potential for efficient, valid use from a psychometric point of view. One obvious advantage that they have over clinician-administered procedures is their ability to capture information that the parent has accumulated over time, using questions that cover a variety of situations and settings. For some children and at some times, this advantage is irrefutable: The child will simply not cooperate for more direct testing or is so thoroughly affected by the testing situation as to make the results of structured observations hopelessly flawed. Even when children are more amenable to interacting with strangers, parent questionnaires may help remove the subtler invalidating influence of the clinician on the child’s behavior (Maynard & Marlaire, 1999).
On the basis of a growing number of studies, it appears that parent questionnaires may reliably and validly be used to obtain information about a number of language areas, especially expressive vocabulary and syntax—although most individual measures are still very undeveloped. Leading the trend toward increased development of these measures, the MacArthur Communicative Development Inventories (Fenson et al., 1991) has been thoroughly studied (e.g., Bates, Bretherton, & Snyder, 1988; Dale, Bates, Reznick, & Morisset, 1989; Reznick & Goldsmith, 1989). In addition, it has been effectively adapted for use with other languages, including Italian, Spanish, and Icelandic (Camaioni, Castelli, Longobardi, & Volterra, 1991; Jackson-Maldonado, Thal, Marchman, Bates, & Gutierrez-Clellen, 1993; Thordardottir & Ellis Weismer, 1996). Other tools that assess communication more broadly have also been developed but have received less widespread attention and validation (e.g., Girolametto, 1997; Hadley & Rice, 1993; Haley, Coster, Ludlow, Haltiwanger, & Andrellos, 1992). Table 9.4 lists five instruments for use with English-speaking children under the age of 3, each of which consists of a parent questionnaire or makes use of parent report for at least some items. Questionnaires that take advantage of the familiarity of other adults with the child—usually classroom teachers—are also being developed (Bailey & Roberts, 1987; Sanger, Aspedon, Hux, & Chapman, 1995; Semel, Wiig, & Secord, 1996; Smith, McCauley, & Guitar, in press; Stokes, 1997). Results of these have also been compared with parent questionnaires (Whitworth, Davies, & Stokes, 1993) and against formal assessments (Botting, Conti-Ramsden, & Crutchley, 1997). Usually, however, these questionnaires have not been developed for use in the identification process, but rather to describe the nature of problems facing the child in the classroom. Thus, they will be considered in the next chapter, which deals with description.

Table 9.4 Instruments for Use With Children Under 3 Years of Age, Including Parent Reports

Language Development Survey (Rescorla, 1989). From “The Language Development Survey: A screening tool for delayed language in toddlers.” Journal of Speech and Hearing Disorders, 54, 587–599. Ages covered: 2-year-olds. Receptive or expressive: E. Areas of language covered: semantics.

MacArthur Communicative Development Inventories (Fenson, Dale, Reznick, Thal, Bates, Hartung, Pethick, & Reilly, 1991). San Diego, CA: San Diego State University, Center for Research in Language. Ages covered: 8 months to 2½ years. Receptive or expressive: E. Areas of language covered: semantics.

Receptive-Expressive Emergent Language Test (2nd ed.; Bzoch & League, 1971). Austin, TX: Pro-Ed. Ages covered: 0 to 3 years. Receptive or expressive: R and E.

Rossetti Infant–Toddler Language Scale (Rossetti, 1990). East Moline, IL: LinguiSystems. Ages covered: 0 to 3 years. Receptive or expressive: R and E. Areas of language covered: pragmatics, play, comprehension, expression.

Sequenced Inventory of Communication Development–Revised (Hedrick, Prather, & Tobin, 1984). Seattle, WA: University of Washington Press. Ages covered: 4 months to 4 years. Receptive or expressive: R and E. Areas of language covered: phonology, morphology, syntax, semantics.

Norm-Referenced Standardized Measures
Standardized measures are not well established as screening tools in the field. Only 50% of the 109 clinicians in Oregon responding to a survey concerning their test use reported that they used standardized measures for screening (Huang et al., 1997). Another related result from that same study was that only 1 screening test (the Screening Test of Adolescent Language; Prather, Breecher, Stafford, & Wallace, 1980) appeared in the list of the 10 tests most commonly used by speech-language pathologists in their work with four age groups (0–3, 4–5, 6–12, and 13–19). Nonetheless, standardized screening of younger children has received increased attention with the IDEA requirement that children with communication disorders be identified before entering school (Nuttall, Romero, & Kalesnik, 1999; Sturner, Layton, Evans, Heller, Funk, & Machon, 1994). Sturner et al. (1994) reviewed 51 measures available for speech and language screening covering at least some part of the 3- to 6-year age span. In that review, the researchers found that only 6 of the measures they examined provided sufficient normative data and were both brief (i.e., requiring 10 minutes or less) and comprehensive (i.e., covering more than one modality or domain). Thus, despite a playing field filled with many players, the number of instruments that warrant serious consideration as comprehensive language screening tools is relatively small. Table 9.5 describes the tools supported in Sturner et al.’s review. Although Sturner et al. (1994) focused on preschool screening measures, many of the measures they studied also extend to cover school-age children. Nonetheless, the availability of measures for both younger school-age children and adolescents is greatly reduced compared with those available for preschoolers. This is probably due, for the most part, to the various referral mechanisms that can reduce the need for formal screenings.
Also, the persistent nature of language problems means that screening of older children and adolescents for language disorders will usually be needed only if screenings have been absent or ineffective at younger ages.

Identification

Norm-Referenced Standardized Instruments
Even children within any specific category of developmental language disorders (i.e., language disorder associated with hearing loss, autism spectrum disorder, mental retardation, and SLI) vary considerably in the areas of language that are affected. Thus, it is important to be quite comprehensive in the identification process, particularly because part of that process will often be the identification of which aspects of language are affected. More comprehensive coverage across modalities (receptive, expressive) and domains of language (e.g., syntax, phonology) can be achieved through the use of a measure designed for that purpose (e.g., the Test of Language Development—Primary: 3; Newcomer & Hammill, 1997). It can also be achieved through the use of a battery of tests that provide more comprehensive coverage or through a combination of these methods. Even when a “comprehensive” measure is used, however, certain aspects of language function (especially pragmatics and discourse) are almost certainly overlooked.

Table 9.5 Communication Screening Measures for Children Between 3 and 7 Years of Age That Were Found to Be Brief, Norm-Referenced, and Comprehensive (Defined as Phonology [Articulation] and Other Language Domains) by Sturner, Layton, Evans, Heller, Funk, and Machon (1994)

For each test, the table marks coverage of receptive and expressive modalities; the domains of phonology, semantics, morphosyntax, and pragmatics; and whether the test has been reviewed in the MMY.

Communication Screen (Striffler & Willis, 1981). Ages covered: 3 to 7 years.
Fluharty Preschool Speech and Language Screening Test (Fluharty, 1978). Ages covered: 2 to 6 years.
Physician’s Developmental Quick Screen (Kulig & Baker, 1975). Ages covered: <1 to 6 years.
Stephens Oral Language Screening Test (Stephens, 1977). Ages covered: PreK to 1st grade.
Sentence Repetition Screening Test (Sturner, Kunze, Funk, & Green, 1993). Ages covered: 4 to 5 years.
Texas Preschool Screening Inventory (Haber & Norris, 1983). Ages covered: 4 to 6 years.

Note. MMY = Mental Measurements Yearbooks.

The Appendix lists over 50 tests that have been described as useful in the identification process. The table includes very basic information about each test’s identifying information, content, and intended population. Almost all of the measures published between 1989 and 1996 have been reviewed for the Mental Measurements Yearbook online review service, thus allowing anyone with access to the Internet an opportunity to examine at least one, and often two, independent reviews. Earlier tests are likely to have been reviewed in the printed Mental Measurements Yearbook volumes. Tests published after about 1996 are likely to be reviewed soon, perhaps even before the publication of this book. Although the Appendix is not intended to be exhaustive, the number of tests it includes illustrates the staggering task facing clinicians who must choose among them. It is interesting to note the relatively large number of tests created in the 1990s and the relatively small number of publishing houses responsible for their availability if not their original construction. On the plus side, this means that in efforts to increase the quality of available measures, individual clinicians and the profession can focus their cooperative interactions with fewer parties.
On the negative side, it means that publishers are often in the position of competing largely with their own products—a prospect that makes it unlikely for free-market pressures to help drive the quality of tests higher.

Criterion-Referenced Measures
In the realm of criterion-referenced measures, specific measures obtained through language analysis (e.g., mean length of utterance, or MLU; 14-morpheme count; type–token ratio) are gaining increasing support in the identification process (e.g., Aram et al., 1993). In particular, some researchers have used MLU as an identification tool and found it to be more consistent with clinician judgments than certain test data (M. Dunn, Flax, Sliwinski, & Aram, 1996). Usually, however, MLU is used in combination with norm-referenced measures (Leonard, 1998). Because language analysis measures are typically considered more useful in description than identification, the next chapter contains a more detailed account of recent studies in which their strengths and limitations are examined. Nonetheless, it is important to reiterate here that their use in identification is growing in significance.

Practical Considerations

In chapter 4, several variables were highlighted for their potential effects on speech-language pathologists. These variables included federal legislation, local regulations, and global changes in perspective toward behavioral problems. In cases of screening and identification, particularly as they are practiced in school settings, those variables can dramatically affect the shape of practice—both for better and for worse (Cirrin et al., 1989). In this brief section, the effects of these factors on screening and identification are discussed primarily through practice constraints related to determining children’s eligibility for services.

In 1989, Nye and Montgomery examined the criteria used in 47 states to identify children as having a language disorder. They used a case example in which a 13-year-old girl who moved frequently because her father was in the military had variously been considered “language disordered in one state, learning disabled in another, ineligible in a third, and eligible only for tutorial support in a fourth” (Nye & Montgomery, 1989, p. 26). In the 47 states they examined, they found that although most provided specific definitions of language disorder, the definitions were highly inconsistent from one state to the next. Only about half of the states made some reference to the components of language, and among those that did, semantics and syntax were included far more frequently than phonology, morphology, and pragmatics. Twenty-one states required the use of at least one standardized language test, and only 7 required use of a language sample. Three different means of finding eligibility were identified across the states—the use of a discrepancy formula, a rating–severity scale, and professional report. Nye and Montgomery noted the poor reliability likely to be attached to the use of rating–severity scales. Consistent with the poor evaluation of discrepancy scores even in the 1980s, the authors expressed dismay at the relative frequency with which discrepancy formulas were used.
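Eligibility cutoffs of the kind discussed here are usually stated in standard-deviation terms, which map directly onto standard scores, and the measurement error emphasized throughout this book indicates how cautiously scores near a cutoff should be read. The sketch below uses illustrative numbers only: a mean of 100 and SD of 15 are typical of many norm-referenced tests, and the standard error of measurement (SEM) of 4 points is a hypothetical value, not that of any particular instrument.

```python
# Illustrative values only: a test scaled like many norm-referenced
# language measures (mean 100, SD 15), with a hypothetical SEM of 4.
mean, sd, sem = 100.0, 15.0, 4.0

# A "1.5 standard deviations below the mean" eligibility criterion
# translates into a concrete cutoff score.
cutoff = mean - 1.5 * sd          # 100 - 22.5 = 77.5

# A 68% confidence band (plus or minus one SEM) around an observed
# score near the cutoff shows why boundary scores should be
# interpreted cautiously rather than treated as exact.
observed = 80.0
band = (observed - sem, observed + sem)   # (76.0, 84.0)

print(f"cutoff = {cutoff}")               # 77.5
print(f"confidence band = {band}")
```

Because the confidence band here straddles the cutoff, a child scoring 80 might or might not fall below the criterion on retesting, which is one reason the chapter recommends pairing cutoffs with attention to measurement error and functional criteria.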
However, they seem to have combined instances in which a cutoff is used (e.g., 1.5 standard deviations below the mean on a standardized measure) with the truly more notorious instances of cognitive referencing, in which a discrepancy is found between two measures for a given child. This combination makes the extent of cognitive referencing difficult to determine from their report. In their conclusions, Nye and Montgomery pointed out the need for greater uniformity in the terminology and criteria used with this population. In case you have been reading this account and hoping that things changed rapidly, a brief look at similar variables 4 years later (Apel, Hodson, Shulman, & Gordon-Brannan, 1994) will be of interest. Apel and his colleagues examined the eligibility guidelines for most states (data for Tennessee could not be obtained) and for the District of Columbia. The data showed continuing inadequacy in the definitions being used. Definitions of language used by state Departments of Education included reference to both oral and written language only 40% of the time, with the remaining state definitions including either no reference to oral or written language (40%) or addressing oral language only (20%). Specific guidelines for eligibility were often missing (46%) or were quite heterogeneous. Although standard scores often figured in available guidelines, cutoffs were quite variable (ranging from 1.5 to greater than 2 standard deviations below the mean), encouragement to use multiple standardized measures was often absent, and severity ratings were sometimes used as bases for eligibility. When specific criteria for preschool children were sought, only 8 states (16%) had developed criteria for that population, and the types of criteria used were quite variable. Among the practices incorporated in these guidelines was the use of percentage delay as the sole criterion or as part of a more complex criterion—a
practice that, unfortunately, relies on the use of notoriously unreliable age-equivalent scores. In short, 4 years did not appear to have resulted in many improvements in the practices reflected in state regulations.

Where are we today? A study of state regulations comparable with those of Nye and Montgomery (1989) and Apel et al. (1994) is currently underway at ASHA (Susan Karr, personal communication). Although these data are still being analyzed, it seems unlikely that the fit between legislatively influenced practice and “best” practices will have been brought into much better alignment than that reported a decade ago by Nye and Montgomery. One positive trend, however, is the intention of the developers of this most recent report to pair recommendations for the components of a defensible set of guidelines with preliminary follow-up efforts designed to result in the redrafting of guidelines in at least a small number of states (Susan Karr, personal communication, October, 1999).

In 1997, Merrell and Plante called attention to the need for more studies aimed at furthering the development of empirical bases for test selection. In particular, they noted that such studies can minimize test selection based on subjective grounds, such as test familiarity and the recommendations or mandates of supervisors or districts. Although responding to legal and workplace obligations is a necessary part of clinical practice, the nature of the response can go beyond simple compliance with an unsatisfactory status quo. More satisfying and ethical responses include increasing the knowledge base of the profession through studies intended to identify best practices, increasing the knowledge base of individuals around measurement issues, and working with professional organizations at the state and national levels to effect needed changes.

Summary

1. Screening procedures, which typically are designed to be efficient in terms of time and other resources, lead to decisions about whether a child should receive further assessment. Strategic targeting of groups to be screened and the use of multiple steps in screening procedures can improve screening accuracy when concerns about rarity of the disorder (low base rates) and about overly high referral rates are encountered in a particular setting.

2. Identification involves the determination that a language problem exists, usually through the use of normative comparisons facilitated by norm-referenced measures. Methods used in identification are often affected by the eligibility requirements instituted by state Departments of Education.

3. Measures of sensitivity and specificity provide important empirical bases for test selection.

4. Cutoff scores are used in research and clinical practice to standardize identification decisions. Although their empirical determination is feasible, they are often related to state eligibility requirements and are best used in conjunction with awareness of the possible influence of measurement error and of functional criteria that associate test performance with real-world effects on the child’s social functioning.
5. For all children, but particularly for those with limited English proficiency or dialect use that differs substantially from the clinician’s, the clinician’s attention to the effects of language difference and cultural effects on assessment can enhance validity. Although controversy persists in the face of an inadequate but growing literature on the subject of language and cultural influences on assessment, clinical strategies for mitigating negative effects on assessment abound.

6. When scores on different measures are compared with one another during the identification process, factors requiring consideration include the effects of test error, test correlation, and differences in normative groups.

7. Although most measures used in screening of older children are standardized norm-referenced tests, the use of parent questionnaires for younger children and of criterion-referenced measures derived from language analyses is becoming more common with increasing research.

8. Among practical factors affecting the selection of measures for use in screening and identification are state guidelines, which have historically been slow to respond to professional recommendations regarding best practices.

Key Concepts and Terms

cutoff score: the score that serves as a decision boundary in screening or identification, such that scores above a particular level are seen as representing nonproblematic performance and those below that level are seen as indicative of potential disorder or difference.

gold standard: a measure used as the basis for comparison when a second measure is being evaluated. It is thought to provide a “true” measurement of the behavior or characteristic being measured.

language difference: a difference in language use reflecting systematic variation in phonology, syntax, semantics, and so forth, when compared with the dialect that is typically represented in standardized language measures.
limited English proficiency (LEP): language difficulties in English that appear to be related primarily to ineffective or insufficient exposure to the language rather than to a language disorder, which may nonetheless be coexisting.

person-first nomenclature: using terms such as “a child with impaired language” instead of “a language-impaired child” to avoid undue emphasis on the role of the problem in understanding the child.

referral rate: the rate at which children who are screened are referred on for additional assessment.

sensitivity: the ability of a measure to give a positive result when the person being assessed truly has the disorder.

specificity: the ability of a measure to give a negative result when the person being assessed truly does not have the disorder.
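The indices defined above lend themselves to a small worked example. All counts below are hypothetical, invented solely to show the arithmetic for sensitivity, specificity, and referral rate when screening outcomes are cross-tabulated against a gold-standard evaluation.

```python
# Hypothetical outcomes for 200 screened children, cross-tabulated
# against a gold-standard diagnostic evaluation (all counts invented
# purely to illustrate the arithmetic).
true_pos = 18    # failed screening, disorder confirmed (hits)
false_neg = 6    # passed screening, disorder present (misses)
false_pos = 22   # failed screening, no disorder (overreferrals)
true_neg = 154   # passed screening, no disorder

sensitivity = true_pos / (true_pos + false_neg)   # 18 / 24 = 0.75
specificity = true_neg / (true_neg + false_pos)   # 154 / 176 = 0.875

# Referral rate: the proportion of screened children referred on for
# additional assessment, i.e., everyone who failed the screening.
referral_rate = (true_pos + false_pos) / 200      # 40 / 200 = 0.20

print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
print(f"referral rate = {referral_rate:.2f}")
```

Note that when the base rate of the disorder is low, even good specificity produces many false positives relative to true positives, which is one reason the chapter recommends targeting higher-prevalence subgroups for screening.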
Study Questions and Questions to Expand Your Thinking

1. What might the effects of poor sensitivity be on the following decisions? Screenings for hearing loss in children with known language impairments; identification testing for children’s eligibility for communication problems warranting early intervention services; and determination of the presence of a language disorder in a bilingual child.

2. What might the effects of poor specificity be on the following decisions? Screenings of a large group of kindergarten children for speech and language disorders; identification of oral language disorder in children who are failing academically; and language screenings for children who speak Spanish-influenced English.

3. Imagine that you are a school speech-language pathologist who is interested in obtaining information about the specificity and sensitivity of your own screening procedures. How might you obtain the information you need for looking at both hits and both kinds of misses—false positives and false negatives? Which kind of information will be most difficult to obtain?

4. Use Appendix A and your reading of this chapter to consider the following questions. What domains of language and what age groups appear to be less well represented in standardized tests? Besides those reasons given in the text, can you think of reasons for these patterns?

5. Take two measures listed in Appendix A that are said to target one or more language domains and modalities in common. Compare and contrast the content of these shared components in terms of numbers and kinds of items, tasks, and stimuli.

6. On the basis of what you have read, create a list of 5 research questions that, if answered, would greatly improve screening and assessment practices in speech-language pathology.

Recommended Readings

American Speech-Language-Hearing Association (1993). Definitions of communication disorders and variations. Asha, 35(Suppl. 10), 40–41.

Hansen, J. C. (1999). Test psychometrics.
In J. W. Lichtenberg & R. K. Goodyear (Eds.), Scientist–practitioner perspectives on test interpretation (pp. 15–30). Boston: Allyn & Bacon.

Maynard, D. W., & Marlaire, C. L. (1999). Good reasons for bad testing performance: The interactional substrate of educational testing. In D. Kovarsky, J. Duchan, & M. Maxwell (Eds.), Constructing (in)competence (pp. 171–196). Mahwah, NJ: Lawrence Erlbaum Associates.

References

American Speech-Language-Hearing Association (1999). Guidelines for roles and responsibilities of the school-based speech-language pathologist [Online]. Available: http://www.asha.org/professionals/library/slpschool_i.htm#purpose.
Apel, K., Hodson, B., Shulman, B., & Gordon-Brannan, M. (November, 1994). Severity ratings and eligibility criteria: A (confused?) state of the union. Miniseminar presented at the annual convention of the American Speech-Language-Hearing Association, New Orleans, LA.

Aram, D. M., Morris, R., & Hall, N. E. (1993). Clinical and research congruence in identifying children with specific language impairment. Journal of Speech and Hearing Research, 36, 580–591.

Bailey, D., & Roberts, J. E. (1987). Teacher–Child Communication Scale. Chapel Hill, NC: University of North Carolina.

Bates, E., Bretherton, I., & Snyder, L. (1988). From first words to grammar: Individual differences and dissociable mechanisms. Cambridge, England: Cambridge University Press.

Botting, N., Conti-Ramsden, G., & Crutchley, A. (1997). Concordance between teacher/therapist opinion and formal language assessment scores in children with language impairment. European Journal of Disorders of Communication, 32, 317–327.

Bracken, B. A. (1987). Limitations of preschool instruments and standards for minimal levels of technical adequacy. Journal of Psychoeducational Assessment, 4, 313–326.

Brown, J. (1989). The truth about scores children achieve on tests. Language, Speech, and Hearing Services in Schools, 20, 366–371.

Bzoch, K. R., & League, R. (1971). The Receptive–Expressive Emergent Language Scale—Revised. Gainesville, FL: Language Education Division, Computer Management Corporation.

Camaioni, L., Castelli, M. C., Longobardi, E., & Volterra, V. (1991). A parent report instrument for early language assessment. First Language, 11, 345–359.

Campbell, T., Dollaghan, C., Needleman, H., & Janosky, J. (1997). Reducing bias in language assessment: Processing-dependent measures. Journal of Speech, Language, and Hearing Research, 40, 519–525.

Cheng, L. L. (1987). Assessing Asian language proficiency: Guidelines for evaluating limited-English-proficient students. Rockville, MD: Aspen.

Cirrin, F. M., Bashir, A., Brinton, B., Damico, J. S., Dublinske, S., Edwards, E. B., Grimes, A. M., Kamhi, A. G., Prelock, P. A., Rodriguez, J. M., Shulman, B. B., Tibbits, D. F., & Westby, C. (March, 1989). Issues in determining eligibility for language intervention. Asha, 113–118.

Compton, C. (1996). A guide to 100 tests in special education. Upper Saddle River, NJ: Globe Fearon Educational Publisher.

Crago, M. B., Annahatak, B., Doehring, D. G., & Allen, S. (1991). First language evaluation by native speakers: A preliminary study. Journal of Speech-Language Pathology and Audiology, 15, 43–48.

Crais, E. R. (1993). Families and professionals as collaborators in assessment. Topics in Language Disorders, 14(1), 29–40.

Dale, P., Bates, E., Reznick, S., & Morisset, C. (1989). The validity of a parent report instrument of child language at twenty months. Journal of Child Language, 16, 239–249.

Damico, J. S., Secord, W. A., & Wiig, E. H. (1992). Descriptive language assessment at school: Characteristics and design. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language assessment (pp. 1–8). San Antonio, TX: Psychological Corporation.

Damico, J. S., Smith, M., & Augustine, L. E. (1996). Multicultural populations and language disorders. In M. D. Smith & J. S. Damico (Eds.), Childhood language disorders (pp. 272–299). New York: Thieme.

Derogatis, L. R., & DellaPietra, L. (1994). The use of psychological testing for treatment planning and outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 22–54). Hillsdale, NJ: Lawrence Erlbaum Associates.

Dollaghan, C., & Campbell, T. (1999, November). Is child language impairment a taxon? Paper presented at the American Speech-Language-Hearing Association, San Francisco.

Dunn, L. M., & Dunn, L. M. (1997). Peabody Picture Vocabulary Test–III (3rd ed.). Circle Pines, MN: American Guidance Service.

Dunn, L. M., Lugo, D. E., Padilla, E. R., & Dunn, L. M. (1986). Test de Vocabulario en Imagenes Peabody [Peabody Picture Vocabulary Test]. Circle Pines, MN: American Guidance Service.

Dunn, M., Flax, J., Sliwinski, M., & Aram, D. M. (1996). The use of spontaneous language measures as criteria for identifying children with specific language impairment: An attempt to reconcile clinical and research incongruence. Journal of Speech and Hearing Research, 39, 643–654.
Erickson, J. G., & Iglesias, A. (1986). Assessment of communication disorders in non-English proficient children. In O. Taylor (Ed.), Nature of communication disorders in culturally and linguistically diverse populations (pp. 181–218). San Diego, CA: College-Hill Press.

Feeney, J., & Bernthal, J. E. (1996). The efficiency of the Revised Denver Developmental Screening Test as a language screening tool. Language, Speech, and Hearing Services in Schools, 27, 330–332.

Fenson, L., Dale, P., Reznick, S., Thal, D., Bates, E., Hartung, J., Pethick, S., & Reilly, J. (1991). Technical manual for the MacArthur Communicative Development Inventories. San Diego, CA: San Diego State University.

Fey, M. (1986). Language intervention with young children. San Diego, CA: College-Hill Press.

Fey, M., Long, S., & Cleave, P. (1994). Reconsideration of IQ criteria in the definition of specific language impairment. In R. Watkins & M. Rice (Eds.), Specific language impairments in children (pp. 161–178). Baltimore: Paul H. Brookes.

Fluharty, N. (1978). Fluharty Preschool Speech and Language Screening Test. Boston: Teaching Resources.

Frankenburg, W. K., Dodds, J., & Archer, P. (1990). Denver–II: Technical manual. Denver, CO: Denver Developmental Materials.

Frankenburg, W. K., Dodds, J., Fandal, A., Kazuk, E., & Cohrs, M. (1975). The Denver Developmental Screening Test: Revised. Denver, CO: Denver Developmental Materials.

Girolametto, L. (1997). Development of a parent report measure for profiling the conversational skills of preschool children. American Journal of Speech-Language Pathology, 6, 25–33.

Gutierrez-Clellen, V. F., Brown, S., Conboy, B., & Robinson-Zañartu, C. (1998). Modifiability: A dynamic approach to assessing immediate language change. Journal of Children’s Communication Development, 19, 31–42.

Haber, J. S., & Norris, M. L. (1983). The Texas Preschool Screening Inventory: A simple screening device for language and learning disorders. Children’s Health Care, 12(1), 11–18.
Hadley, P. A., & Rice, M. L. (1993). Parental judgments of preschoolers’ speech and language development: A resource for assessment and IEP planning. Seminars in Speech and Language, 14, 278–288. Haley, S. M., Coster, W. J., Ludlow, L. H., Haltiwanger, J. T., & Andrellos, P. J. (1992). Pediatric evaluation of disability inventory, Version 1.0. Boston: New England Medical Center Hospitals. Hansen, J. C. (1999). Test psychometrics. In J. W. Lichtenberg & R. K. Goodyear (Eds.), Scientist–practitioner perspectives on test interpretation (pp. 15–30). Boston: Allyn & Bacon. Haynes, W. O., Pindzola, R. H., & Emerick, L. L. (1992). Diagnosis and evaluation in speech pathology (4th ed). Englewood Cliffs, NJ: Prentice–Hall. Hedrick, D., Prather, E., & Tobin, A. (1975). Sequenced Inventory of Communication Development. Seattle, WA: University of Washington Press. Hirshoren, A., & Ambrose, W. R. (1976). Language, Speech, and Hearing Services in Schools, 7(2), 86–89. Huang, R., Hopkins, J., & Nippold, M. A. (1997). Satisfaction with standardized language testing: A survey of speechlanguage pathologists. Language, Speech, and Hearing Services in Schools, 28, 12–29. Hummel, T. J. (1999). The usefulness of tests in clinical decisions. In J. W. Lichtenberg & R. K. Goodyear (Eds.), Scientist–practitioner perspectives on test interpretation (pp. 59–112). Boston: Allyn & Bacon. Hutchinson, T. A. (1996). What to look for in the technical manual: Twenty questions for users. Language, Speech, and Hearing Services in Schools, 27, 109–121. Individuals With Disabilities Education Act (IDEA). Pub. L. No. 101 –476, 104 Stat. 1103 (1990). JacksonMaldonado, D., Thal, D., Marchman, V., Bates, E., & GutierrezClellen, V. (1993). Early lexical development in Spanishspeaking infants and toddlers. Journal of Child Language, 20(3), 523–549. Kamhi, A. (1998). Trying to make sense of developmental language disorders. Language, Speech, and Hearing Disorders in Schools, 29, 3 544. Kamhi, A., Pollock, K. 
E., & Harris, J. L. (Eds.). (1996). Communication development and disorders in African American children: Research, assessment, and intervention. Baltimore: Paul H. Brookes.
Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman Assessment Battery for Children. Circle Pines, MN: American Guidance Service.
Kayser, H. (1989). Speech and language assessment of Spanish-speaking children. Language, Speech, and Hearing Services in Schools, 20, 226–244.
Kayser, H. (1991). Interpreters in speech-language pathology. Texas Journal of Audiology and Speech Pathology, 17, 28–29.
Kayser, H. (Ed.). (1995). Bilingual speech-language pathology: An Hispanic focus. San Diego, CA: Singular Publishing Group.
Kelly, D. J., & Rice, M. L. (1986). A strategy for language assessment of young children: A combination of two approaches. Language, Speech, and Hearing Services in Schools, 17, 83–94.
Krassowski, E., & Plante, E. (1997). IQ variability in children with SLI: Implications for use in cognitive referencing in determining SLI. Journal of Communication Disorders, 30, 1–9.
Kulig, S. G., & Baker, K. A. (1975). Physician's Developmental Quick Screen for Speech Disorders. Galveston, TX: Department of Pediatrics, University of Texas Medical Branch at Galveston.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
Lahey, M. (1990). Who shall be called language disordered? Some reflections and one perspective. Journal of Speech and Hearing Disorders, 55, 612–620.
Leap, W. L. (1993). American Indian English. Salt Lake City, UT: University of Utah Press.
Lee, L. (1974). Developmental sentence analysis. Evanston, IL: Northwestern University Press.
Lehr, C. A., Ysseldyke, J. E., & Thurlow, M. L. (Eds.). (1986). Assessment practices in model early childhood education programs. Psychology in the Schools, 24, 390–399.
Leonard, L. (1987). Is specific language impairment a useful construct? In S. Rosenberg (Ed.), Advances in applied psycholinguistics, 1: Disorders of first language acquisition (pp. 1–39). New York: Cambridge University Press.
Leonard, L. (1998). Children with specific language impairment. Cambridge, MA: MIT Press.
Leonard, L., & Weiss, A. L. (1983). Application of nonstandardized assessment procedures to diverse linguistic populations. Topics in Language Disorders, 3(3), 35–45.
Mardell-Czudnowski, C., & Goldenberg, D. (1983).
Developmental Indicators for Assessment of Learning—Revised (DIAL-R). Edison, NJ: Childcraft Education Corporation.
Maynard, D. W., & Marlaire, C. L. (1999). Good reasons for bad testing performance: The interactional substrate of educational testing. In D. Kovarsky, J. Duchan, & M. Maxwell (Eds.), Constructing (in)competence (pp. 171–196). Mahwah, NJ: Lawrence Erlbaum Associates.
McCarthy, D. A. (1972). McCarthy Scales of Children's Abilities. San Antonio, TX: Psychological Corporation.
McCauley, R. J., & Swisher, L. (1984a). Psychometric review of language and articulation tests for preschool children. Journal of Speech and Hearing Disorders, 49, 34–42.
McCauley, R. J., & Swisher, L. (1984b). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical case. Journal of Speech and Hearing Disorders, 49, 338–348.
Meehl, P. E. (1992). Factors and taxa, traits and types, differences of degree and differences in kind. Journal of Personality, 60, 117–174.
Meehl, P. E., & Yonce, L. J. (1994). Taxometric analysis: I. Detecting taxonicity with two quantitative indicators using means above and means below a sliding cut (MAMBAC procedure). Psychological Reports, 74 (Monograph Supplement 1-V74), 1059–1274.
Meehl, P. E., & Yonce, L. J. (1996). Taxometric analysis: II. Detecting taxonicity using two quantitative indicators in successive intervals of a third indicator (MAXCOV procedure). Psychological Reports, 78 (Monograph Supplement 1-V78), 1091–1227.
Merrell, A., & Plante, E. (1997). Norm-referenced test interpretation in the diagnostic process. Language, Speech, and Hearing Services in Schools, 28, 50–58.
Muma, J. (1985). "No news is bad news": Response to McCauley and Swisher (1984). Journal of Speech and Hearing Disorders, 50, 290–293.
Muma, J. (1998). Effective speech-language pathology: A cognitive socialization approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Nelson, N. W. (1993). Language intervention in school settings. In D. K. Bernstein & E.
Tiegerman (Eds.), Language and communication disorders in children (3rd ed., pp. 273–324). New York: Merrill.
Newborn Infant Hearing Screening and Intervention Act. Pub. L. No. 106-113, 113 Stat. 1501 (1999).
Newcomer, P. L., & Hammill, D. D. (1997). Test of Language Development—Primary: 3. Austin, TX: Pro-Ed.
Nicolosi, L., Harryman, E., & Kresheck, J. (1996). Terminology of communication disorders (4th ed.). Baltimore: Williams & Wilkins.
Norris, M. K., Juarez, M. J., & Perkins, M. N. (1989). Adaptation of a screening test for bilingual and bidialectal populations. Language, Speech, and Hearing Services in Schools, 20, 381–390.
Nunnally, J. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Nuttall, E. V., Romero, I., & Kalesnik, J. (Eds.). (1999). Assessing and screening preschoolers: Psychological and educational dimensions (2nd ed.). Needham Heights, MA: Allyn & Bacon.
Nye, C., & Montgomery, J. K. (1989). Identification criteria for language disordered children: A national survey. Hearsay: The Journal of the Ohio Speech and Hearing Association, Spring, 26–33.
Olswang, L., Bain, B., & Johnson, G. (1992). Using dynamic assessment with children with language disorders. In S. F. Warren & J. Reichle (Eds.), Causes and effects in communication and language intervention (pp. 187–215). Baltimore: Paul H. Brookes.
Owens, R. E. (1995). Language disorders: A functional approach to assessment and treatment (2nd ed.). Boston: Allyn & Bacon.
Pang, V. O., & Cheng, L. L. (Eds.). (1998). Struggling to be heard: The unmet needs of Asian Pacific American children. Albany: State University of New York Press.
Paul, R. (1995). Language disorders from infancy through adolescence: Assessment and intervention. St. Louis, MO: Mosby Yearbook.
Peña, E. (1996). Dynamic assessment: The model and its language applications. In K. Cole, P. Dale, & D. Thal (Eds.), Advances in assessment of communication and language (pp. 281–308). Baltimore: Paul H. Brookes.
Plante, E. (1998). Criteria for SLI: The Stark and Tallal legacy and beyond. Journal of Speech, Language, and Hearing Research, 41, 951–957.
Plante, E., & Vance, R. (1994). Selection of preschool speech and language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25, 15–23.
Plante, E., & Vance, R. (1995). Diagnostic accuracy of two tests of preschool language. American Journal of Speech-Language Pathology, 4, 70–76.
Prather, E. M., Breecher, S. V. A., Stafford, M. L., & Wallace, E. M. (1980). Screening Test of Adolescent Language (STAL). Seattle, WA: University of Washington Press.
Prizant, B. M., & Wetherby, A. M. (1993). Communication and language assessment for young children. Infants and Young Children, 5, 20–34.
Rescorla, L. (1989). The language development survey: A screening tool for delayed language in toddlers. Journal of Speech and Hearing Disorders, 54, 587–599.
Reveron, W. W. (1984). Language assessment of Black children: The state of the art. Papers in the Social Sciences, 4, 79–94.
Reznick, S., & Goldsmith, L. (1989). A multiple form word production checklist for assessing early language. Journal of Child Language, 16, 91–100.
Robinson-Zañartu, C. (1996). Serving Native American children and families: Considering cultural variables. Language, Speech, and Hearing Services in Schools, 27, 373–384.
Rossetti, L. (1990). Rossetti Infant-Toddler Language Scale. East Moline, IL: LinguiSystems.
Roussel, N. (1991). Appendix A: Annotated bibliography of Communicative Abilities Test. In E. V. Hamayan & J. S. Damico (Eds.), Limiting bias in the assessment of bilingual children (pp. 320–343). Austin, TX: Pro-Ed.
Sabatino, A. D., Vance, H. B., & Miller, T. L. (1993). Defining best diagnostic practices. In H. B. Vance (Ed.), Best practices in assessment for school and clinical settings (pp. 1–28). Brandon, VT: Clinical Psychology Publishing.
Sabers, D., & Hutchinson, T. (1990). User norms software. Chicago: Riverside Publishing.
Salvia, J., & Good, R. (1982). Significant discrepancies in the classification of pupils: Differentiating the concept. In J. T. Neisworth (Ed.), Assessment in special education (pp. 77–82). Rockville, MD: Aspen.
Salvia, J., & Ysseldyke, J. E. (1991). Assessment (5th ed.). Boston: Houghton Mifflin.
Salvia, J., & Ysseldyke, J. E. (1998). Assessment (7th ed.). Boston: Houghton Mifflin.
Sanger, D., Aspedon, M., Hux, K., & Chapman, A. (1995). Early referral of school-age children with language problems. Journal of Childhood Communication Disorders, 16, 3–9.
Sattler, J. M. (1988). Assessment of children. San Diego, CA: Author.
Schraeder, T., Quinn, M., Stockman, I. J., & Miller, J. F. (1999). Authentic assessment as an approach to preschool speech-language screening. American Journal of Speech-Language Pathology, 8, 195–200.
Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.
Semel, E., Wiig, E. H., & Secord, W. A. (1996). Observation Rating Scales, Clinical Evaluation of Language Fundamentals (3rd ed.). San Antonio, TX: The Psychological Corporation.
Shepard, L. A. (1989). Identification of mild handicaps. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 545–572). New York: American Council on Education and Macmillan.
Smit, A. (1986). Ages of speech sound acquisition: Comparisons and critiques of several normative studies. Language, Speech, and Hearing Services in Schools, 17, 175–186.
Smith, A. R., McCauley, R. J., & Guitar, B. (in press). Development of the Teacher Assessment of Student Communicative Competence (TASCC) for children in grades 1 through 5. Communication Disorders Quarterly.
Sparrow, S., Balla, D., & Cicchetti, D. (1984). Vineland Adaptive Behavior Scales. Circle Pines, MN: American Guidance Service.
Stephens, M. I. (1977). Stephens Oral Language Screening Test (SOLST). Peninsula, OH: Interim Publishers.
Stokes, S. F. (1997). Secondary prevention of paediatric language disability: A comparison of parents and nurses as screening agents. European Journal of Disorders of Communication, 32, 139–158.
Striffler, N., & Willis, S. (1981). The Communication Screen. Tucson, AZ: Communication Skill Builders.
Sturner, R. A., Kunze, L., Funk, S. G., & Green, J. A. (1993). Elicited imitation: Its effectiveness for speech and language screening. Developmental Medicine and Child Neurology, 35, 715–726.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J.
H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Taylor, O. L., & Payne, K. T. (1983). Culturally valid testing: A proactive approach. Topics in Language Disorders, 3, 8–20.
Terrell, S. L., Arensberg, K., & Rosa, M. (1992). Parent-child comparative analysis: A criterion-referenced method for the nondiscriminatory assessment of a child who spoke a relatively uncommon dialect of English. Language, Speech, and Hearing Services in Schools, 23, 34–42.
Terrell, S. L., & Terrell, F. (1983). Distinguishing linguistic differences from disorders: The past, present, and future of nonbiased assessment. Topics in Language Disorders, 3, 107.
Thordardottir, E. T., & Ellis Weismer, S. (1996). Language assessment via parent report: Development of a screening instrument for Icelandic children. First Language, 16, 265–285.
Thorner, R. M., & Remein, Q. R. (1962). Principles and procedures in the evaluation of screening for disease. Public Health Monograph, 67, 408–421.
Turner, R. G. (1988). Techniques to determine test protocol performance. Ear and Hearing, 9, 177–189.
Van Keulen, J. E., Weddington, G. T., & DeBose, C. E. (1998). Speech, language, and learning and the African American child. Boston: Allyn & Bacon.
Vaughn-Cooke, F. B. (1983). Improving language assessment in minority children. Asha, 25, 29–34.
Washington, J. A. (1996). Issues in assessing the language abilities of African American children. In A. G. Kamhi, K. E. Pollock, & J. L. Harris (Eds.), Communication development and disorders in African American children (pp. 35–54). Baltimore: Paul H. Brookes.
Weddington, G. T. (1987). The assessment and treatment of communication disorders in culturally diverse populations. Unpublished manuscript.
Whitworth, A., Davies, C., & Stokes, S. F. (1993). Identification of communication impairments in preschoolers: A comparison of parent and teacher success.
Australian Journal of Human Communication Disorders, 21, 112–133.
Wolfram, W. (1991). Dialects and American English. Englewood Cliffs, NJ: Prentice-Hall.
CHAPTER 10

Description: What Is the Nature of This Child's Language?

The Nature of Description
Special Considerations for Asking This Clinical Question
Available Tools
Practical Considerations

Nigel is a 9-year-old with mild mental retardation whose placement in a multiage classroom is complicated by a moderate hearing loss and ADD. A 3-year reevaluation conducted at the beginning of the school year included extensive audiological assessment as well as standardized language testing that confirmed particular difficulties in expressive phonology and morphosyntax. Language sampling and a classroom checklist were used to help determine the educational impact of Nigel's difficulties and to help plan accommodations and develop Nigel's individualized educational plan.

Tao has a long history of communication problems that have changed with age. She was diagnosed with autism at age 4, then Asperger's syndrome at age 8. Now, at age 12, with appropriate accommodations and intensive treatment, she is in a regular junior high school. Speech-language intervention has centered on addressing her pragmatic challenges with peers and teachers. Goals in this area have been identified and tracked during the semester using a variety of descriptive measures
created by her clinicians. Recently a dynamic assessment designed to examine Tao's emerging awareness of the perspectives of others was undertaken as part of this process.

The Nature of Description

Describing the skills and problems of children with suspected language impairments sometimes occurs as part of screening, thus preceding the use of formal procedures associated with identification. More often, description represents a critical component of initial assessments and continues throughout all of the later steps involved in speech-language management. With such pervasiveness, description undoubtedly constitutes the major measurement task facing clinicians.

The purposes served by description are varied. Descriptive measures are initially used to characterize the specific areas of linguistic or communicative difficulty facing a child, the functional limitations those difficulties impose, and, increasingly, the effects on the child's social roles that are associated with the child's language disorder (Goldstein & Gierut, 1998). At the same time, descriptive measures can be used to help plan initial treatment strategies, choose specific treatment goals, and provide the basis for later comparisons. During treatment, descriptive probes—especially of untreated but related stimuli—and other descriptive measures are likely to provide some of the best evidence of treatment effectiveness (Bain & Dollaghan, 1991; Olswang & Bain, 1994; Schmidt & Bjork, 1992) because they reflect the extent to which generalization is occurring. In fact, much of the profession's recent focus on measuring outcomes to document the value of treatment (see Frattali, 1998) involves the development and use of descriptive measures.
Despite the ubiquity of descriptive measures (and perhaps because of it), the measurement challenges they present can be overlooked, or at least underappreciated (Leonard, Prutting, Perozzi, & Berkley, 1978; McCauley, 1996; Minifie, Darley, & Sherman, 1963). Illustrating a growing interest in those challenges, Secord (1992) devoted an entire book to descriptive, nonstandardized language assessment. In an early chapter of that book, Damico, Secord, and Wiig (1992) noted that effective descriptive assessment procedures need to be "as rigorous as norm-referenced tests" (p. 1). The source of that rigor, however, is much less obvious than that associated with measures used for purposes of classification. Much of the rigor associated with methods used in the identification of language impairment appears to reside in the hands of others (e.g., test authors and publishers, individual researchers). In contrast, for descriptive measures, the responsibility for rigor falls largely into the hands of the clinician. As Leonard (1996) observed, such measures are "essentially experimental tasks"—often created by clinicians and sometimes borrowed directly from experimenters. Fortunately, in creating and understanding such measures, the clinician has allies in the increasing number of clinician–researchers in speech-language pathology and related fields who develop and share individual methods and reflections on the measurement challenges they present. In this chapter, I try to pass along some of their insights and direct readers to particularly helpful examples.
Special Considerations for Asking This Clinical Question

The process of description can sometimes use norm-referenced measurement. When profiles of performance are examined to assess broader patterns of strengths and challenges within different areas of communication, standardized norm-referenced measures can provide useful information (Olswang & Bain, 1991). This is especially true when limitations due to test content and measurement error are taken into account (McCauley & Swisher, 1984; Salvia & Ysseldyke, 1981).

Usually, however, the process of description makes use of criterion-referenced measurement. Such measurement can function at several levels of detail—from more global categorizations of language function in different modalities to the detailed description of a specific language or communication skill (e.g., frequency of use of a particular grammatical morpheme or communicative intent in a given conversational context). Although such descriptions may not always fit within a view of measurement as the assignment of numbers to behaviors, they fit within the broader view of behavioral measurement as a simplification process or as information compression used to aid decision making (Barrow, 1992; Morris, 1994). Thus, as with all cases of measurement, our central concern with validity remains (APA, AERA, & NCME, 1985; Messick, 1989). However, validity is fostered through means that may superficially appear unrelated to the psychometric concerns described for norm-referenced instruments. For example, rather than a study of criterion-related validity using numerous participants and other norm-referenced measures, evidence for descriptive measures may involve the collection of supporting qualitative and subjective data for a much smaller number of cases, or even a single case.
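To make the idea of fine-grained criterion-referenced description concrete, consider how a clinician might summarize coded language-sample data for a specific grammatical morpheme. The sketch below is illustrative only: the codes, the morphemes chosen, and the 90% mastery criterion are hypothetical assumptions, not values drawn from the text.

```python
# Hypothetical coding of one child's language sample: each entry marks an
# obligatory context for a grammatical morpheme as used (True) or omitted (False).
sample_codes = {
    "present progressive -ing": [True, True, True, False, True,
                                 True, True, True, True, True],
    "regular past -ed": [True, False, False, True, False, False, True, False],
}

MASTERY_CRITERION = 90.0  # hypothetical criterion: percent use in obligatory contexts

for morpheme, contexts in sample_codes.items():
    percent_use = 100.0 * sum(contexts) / len(contexts)
    status = "meets criterion" if percent_use >= MASTERY_CRITERION else "below criterion"
    print(f"{morpheme}: {percent_use:.1f}% use in {len(contexts)} contexts ({status})")
```

The same tally generalizes to any behavior a clinician can code as present or absent in context, which is one reason such simple summaries are the workhorse of informal description.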
Because a principal value of such measures is their close tie to a specific construct, the user's alertness to the nature of a targeted construct and the degree to which a specific measure serves as an acceptable indicator of it rises in importance from large to gargantuan proportions.

Damico et al. (1992) discussed three complex characteristics pivotal to effective descriptive assessment techniques: authenticity, functionality, and richness of description. Authenticity is used to refer to three related concepts: linguistic realism, ecological validity, and psychometric veracity. Linguistic realism involves the treatment of communication in data collection and analysis as a complex and synergistic process with the sharing of meaning as its goal, whereas ecological validity refers to the preservation of natural communicative contexts in assessment. The third concept, psychometric veracity, encompasses the traditional concepts of reliability and validity as well as the clinical practicality of the measures in terms such as time and required resources. Concerns regarding authenticity have led to the use of the term authentic assessment to refer to assessments designed with authenticity as their paramount virtue (e.g., Schraeder, Quinn, Stockman, & Miller, 1999).

The term functionality as used by Damico et al. (1992) relates to effectiveness, fluency, and appropriateness of conveyed meaning. This criterion focuses not just on obtaining information about clients' underlying competence but also on their ability to put knowledge into play effectively to achieve communication goals. The criterion of richness of description, cited by those same authors, entails the use of assessment procedures designed to provide detailed descriptions of communicative performance leading to explanatory hypotheses for detected communication difficulties. This criterion, then, associates descriptive measures with the manipulation of variables in the environment (materials used, identity of communication partner, etc.) that can be studied for their immediate effect on performance.

I urge readers to examine the original source (Damico et al., 1992) in order to get a deeper feel for the intricacies involved in assessment that preserve those characteristics of communication that make communication what it is. I also suggest, however, that the overarching point Damico and his colleagues were making is that descriptive measures of communication need to be valid—they need to measure what they purport to measure. Specifically, to the very great extent to which communication is embedded in social interaction, intended to share meaning, and constrained by the physiological and social makeup of its users, its measurement must honor those properties or suffer the fate of reduced validity. The work of Damico et al. and numerous others (e.g., Kovarsky, Duchan, & Maxwell, 1999; Lund & Duchan, 1983, 1993; Muma, 1998) is extremely valuable in calling attention to these special properties—an endeavor made all the more necessary by the frequent equating of principles such as validity only with norm-referenced measurement.

Because of growing sensitivity to the demands for a widening range of descriptive measures, advice about construction of such measures by clinicians themselves has become increasingly available (e.g., Miller & Paul, 1995; Vetter, 1988). Providing a succinct foundation for these recommendations, Vetter outlined a systematic process for developing informal assessment procedures.
In an earlier publication on criterion-referenced measures (McCauley, 1996), I modified that process somewhat and have modified it further in Fig. 10.1 through the addition of a step encouraging clinicians to seek out existing probes for possible use or adaptation.

In the process outlined in Fig. 10.1, the crucial first step is the formulation of the specific clinical question. In questions of description, the clinician is relatively unencumbered by the external, regulatory forces (e.g., state requirements) that affect both the kinds of clinical questions that are asked and the methods used to answer them. However, that does little to decrease, and may even increase, the clinical perspicacity required at this step. The multiple levels of WHO's classification systems (WHO, 1980, 1998) come into play in the complexity of this step. Recall that these levels (e.g., impairment, disorder, disability, and handicap in the 1980 version) consider the broader effects of health conditions and the role that society plays in determining the implications of a given condition for the individual. These levels bring to mind the challenge of describing a child's communication in terms of effects on the child's participation in social roles, as well as in the specifics of lexicon, grammar, and so forth. Consequently, the clinician who wishes to describe a child's communication will need to choose selectively from a large number of possible levels and areas for which description is possible. In so doing, the clinician can focus on a smaller number of clinical questions whose answers can have a powerful impact on the child's treatment and subsequent functioning.

Fig. 10.1. Steps in the development of an informal measure.

The remaining steps in Vetter's process entail tailoring the procedure to meet the demands of a specific clinical question and client, implementing it, and then evaluating its effectiveness—ideally through the accumulation of data spanning several clients. The clinician's reactions to this evaluative step in the process can include changing instructions or the specific items used, increasing the number of items used in order to increase reliability, or abandoning the procedure altogether. Particularly when a measure lends itself to use with numerous children, additional steps such as a more rigorous evaluation of reliability and the development of local
norms can be well worth the additional effort. Cirrin and Penner (1992) discussed how descriptive measures can be implemented districtwide. Their recommendations make use of multiple stages to ensure feasibility and validity. Among the factors they stressed are the need to (a) use pilot procedures with a small number of clinicians prior to widespread use, (b) conduct initial training and follow-up sessions for all users, and (c) undertake a districtwide trial period. Cirrin and Penner stressed the value of local norms as a means of improving eligibility decisions, but they also acknowledged the heavy administrative demands this entails in terms of expertise and staff time. In so doing, they point to one of the chief challenges of descriptive measures—making their construction and use fit within the sometimes harsh demands for efficiency (especially time demands) facing most speech-language pathologists. However, it should always be remembered that cutting corners in the name of efficiency can yield an incomplete picture of a child's problem, resulting in far greater losses of time in the long run. This issue receives additional attention later in the chapter in the section entitled "Practical Considerations."

Available Tools

One reason that description can seem relatively perplexing from a measurement perspective is the diversity of available tools and strategies. This diversity includes tools that are quite standardized, tools proposed informally in research or clinical publications, and tools that the clinician may decide to develop on demand to address a specific clinical question for which no commercially developed alternative is available. Although not exhaustive, a relatively detailed list of available types of such measures has been offered by Damico et al. (1992)—language sample analyses, probes, rating scales, and online observations.
Although one can broadly categorize the tools and strategies listed by Damico et al. (1992) as norm-referenced and criterion-referenced measures falling at various levels of standardization, considering them in greater detail seems warranted. Consequently, all of the categories described by Damico et al. as well as standardized norm-referenced measures and standardized criterion-referenced measures are briefly discussed in this section. Two additional assessment strategies are also highlighted—dynamic assessment (Gutierrez-Clellen, Brown, Conboy, & Robinson-Zañartu, 1998; Lidz, 1987; Lidz & Peña, 1996) and qualitative measures (Olswang & Bain, 1994; Schwartz & Olswang, 1996). These techniques are singled out for special attention by virtue of their emerging status as innovative approaches to description. Although dynamic assessment has received considerable attention in the professional literature (Butler, 1997), the use of qualitative measures represents a refinement of clinical practice that has received less direct critical attention.

1. Standardized Norm-Referenced Measures
Standardized norm-referenced measures are frequently used to characterize areas of greater or lesser deficit—a type of description that involves what is sometimes termed profile analysis or discrepancy analysis. For example, many clinicians make use of the structure of available norm-referenced tests in which both receptive and expressive skills are examined to determine the extent of problems in each area. Additionally, they may make use of subtest structure, when it is available, to further refine a list of more specific strengths and challenges. For example, the clinician may note a child's better performance on receptive subtests with longer stimuli (e.g., listening to paragraphs) than on those with shorter stimuli (e.g., word classes).

In chapter 9, problems in profile analysis were discussed in relation to using profiles in identification decisions (see the section on conducting comparisons between scores). As a brief reprise, these problems relate to the difficulty in distinguishing real differences between scores from those due to measurement error or to differences in normative groups. In addition, when measures used in a profile are highly correlated, the comparison may offer little or no new information (Olswang & Bain, 1994; Turner, 1988). Finally, even differences between tests or subtests that are real (i.e., are not due to error) and have occurred on measures of independent skills may not represent differences that are any greater than those that may be observed in normal development (Berk, 1984; Olswang & Bain, 1991).

The strategy of simply distinguishing between age-appropriate and non-age-appropriate functioning seems a useful alternative to more elaborate but problematic strategies of interpretation (McCauley & Swisher, 1984). This strategy consists of making decisions about the adequacy of functioning in a given area independently, rather than in relation to function in other areas.

Several difficulties in addition to those described in chapter 9 arise when norm-referenced tests are used to identify a detailed set of strengths and challenges for purposes of description. One difficulty lies in the relatively small number of content areas for which subtests are available.
Looking only at those areas for which subtests do exist is quite akin to the story of the intoxicated soul who looks under the lamppost for lost keys. Turning to non-norm-referenced measures presents a logical, "sober" alternative in many such cases.

Even when a subtest contains items that seem perfectly relevant to a description of a child's communication, however, norm-referenced tests can also be used erroneously in efforts to provide detailed information. Treating individual items or even subtests as reliable descriptors is likely to be erroneous in part because of the unreliability of small sample sizes (i.e., the small amount of the child's behavior that was sampled; McCauley & Swisher, 1984). In addition, because items in such tests are usually selected because they discriminate between individuals rather than because of the specific content they reflect, they can provide a spotty representation of the specific content area (McCauley & Swisher, 1984).

In addition to their use in profiles, norm-referenced tests are used in out-of-level testing, the practice of using a test that may not be appropriate for a client of a given age to sample a set of behaviors. This descriptive information is intended to help define what an individual does and does not do in response to a standard task and set of stimuli. Although this practice is probably most frequently used with individuals with mental retardation, it can be used at any time when more appropriate measures are wanting (Berk, 1984). When used in this way, the measure is treated as if it were criterion-referenced, with the sampling of content becoming critically important to its value. The problem of small, unrepresentative samples of behavior described earlier will require cautious interpretation or, more probably, a search for a more appropriate tool.
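The caution about measurement error in score comparisons can be made concrete with standard psychometric arithmetic. In the sketch below, the scores, standard deviation, and subtest reliabilities are invented for illustration; the formulas used (standard error of measurement as SD × √(1 − reliability), and the standard error of a difference as the square root of the sum of squared SEMs) are the conventional ones found in measurement texts, not procedures prescribed here.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def minimum_reliable_difference(sd: float, rel_a: float, rel_b: float,
                                z: float = 1.96) -> float:
    """Smallest score difference larger than what measurement error alone
    would be expected to produce at roughly 95% confidence (z = 1.96)."""
    se_difference = math.sqrt(sem(sd, rel_a) ** 2 + sem(sd, rel_b) ** 2)
    return z * se_difference

# Hypothetical subtest standard scores (mean 100, SD 15) with
# assumed reliabilities of .90 and .85.
observed = abs(92 - 81)
needed = minimum_reliable_difference(sd=15, rel_a=0.90, rel_b=0.85)
print(f"Observed difference of {observed} points; differences smaller than "
      f"{needed:.1f} points may reflect only measurement error.")
```

In this invented example an 11-point gap falls short of the roughly 14.7-point threshold, so it would not be treated as a real difference, which is exactly why profile interpretation demands caution.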
Page 257 2. Standardized Criterion-Referenced Measures
Criterion-referenced measures have traditionally been applauded for their descriptive powers. After all, they are generally constructed to enable a description of an individual's knowledge base, rather than to facilitate comparisons between individuals. However, there are relatively few criterion-referenced measures of communication that demonstrate the degree of standardization seen in norm-referenced tests. Because criterion-referenced measures require more comprehensive coverage of smaller content areas, the demand for any single measure may not be sufficient to support more extensive development. Recalling also that interest in the measurement community has only lately turned to criterion referencing, it is easy to understand why informal criterion-referenced measures abound. Nonetheless, a few more elaborately developed criterion-referenced measures exist, and many are in various stages of development. The specific procedures used to collect data for criterion-referenced interpretation vary significantly and include each of the measurement types discussed in the remainder of this chapter. The decision to highlight standardized criterion-referenced measures in this separate section was based on a desire to emphasize the potential value of strengthening such measures through the additional empirical scrutiny that accompanies formal development. Table 10.1 provides several examples of criterion-referenced measures encompassing diverse communication domains and modalities. They vary in the extent to which they have been standardized; at a minimum, however, they demonstrate several of the hallmarks of standardization for a criterion-referenced instrument: guidelines for appropriate use, administration procedures, scoring procedures, and a method of interpretation.

3. Probes
Probes involve the use of structured tasks or contexts intended to elicit a given behavior (Damico et al., 1992). Although that definition could also apply to the contents of standardized measures, the term probe is typically reserved for more informal measures. Elicitation greatly increases the probability of obtaining information about a given behavior within a given time span, particularly for behaviors that occur infrequently in natural conversation. However, elicitation procedures represent potential intrusions on the naturalness of the elicited behavior. This potential means that, insofar as naturalness is a major concern in description, their use should primarily be limited to behaviors that occur only rarely without elicitation. In addition, special care should be taken during their construction to preserve the authenticity of the communication exchange in which they are embedded. When such care is impractical, the resulting data more closely resemble a standardized test in miniature than a descriptive procedure meeting the more intense demands for naturalness of context desirable for this type of measurement question. Data obtained from probes are frequently evaluated by the clinician in terms of number or percentage correct. (See the discussion of observational codes under On-Line Observations later in this chapter.)
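The arithmetic behind a percent-correct summary is simple, but a small sketch makes the bookkeeping explicit. The scores below are hypothetical; in practice, the clinician's scoring judgments supply the ones and zeros:

```python
# Sketch: summarizing probe responses as a percentage correct.
# The probe scores below are invented for illustration only.

def percent_correct(scores):
    """Scores: a list of 1 (correct) / 0 (incorrect) clinician judgments."""
    if not scores:
        raise ValueError("no scorable responses")
    return 100.0 * sum(scores) / len(scores)

# Ten elicited attempts at a target structure, scored by the clinician
probe_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
print(f"{percent_correct(probe_scores):.0f}% correct")  # prints "70% correct"
```

Keeping the raw ones and zeros, rather than only the percentage, also preserves the information needed to compare performance across repeated administrations of the same probe.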
Page 258 Table 10.1 A List of Some Criterion-Referenced Measures Available for the Description of Language Disorders in Children

Assessment of Children's Language Comprehension
  Reference: Foster, R., Giddan, J. J., & Stark, J. (1983). Assessment of Children's Language Comprehension. Palo Alto, CA: Consulting Psychologists Press.
  Ages: 3 years to 6 years, 11 months
  Receptive and/or Expressive: R

Miller–Yoder Language Comprehension Test
  Reference: Miller, J. F., & Yoder, D. E. (1984). Miller–Yoder Language Comprehension Test. Austin, TX: Pro-Ed.
  Ages: 4 to 8 years
  Receptive and/or Expressive: R

Preschool Language Assessment Instrument
  Reference: Blank, M., Rose, S. A., & Berlin, L. J. (1978). Preschool Language Assessment Instrument (PLAI). San Antonio, TX: Psychological Corporation.
  Ages: 2 years, 9 months to 5 years, 8 months
  Receptive and/or Expressive: R/E

Receptive–Expressive Emergent Language Test–2
  Reference: Bzoch, K. R., & League, R. (1991). Receptive–Expressive Emergent Language Test–2. Austin, TX: Pro-Ed.
  Ages: 0 to 3 years
  Receptive and/or Expressive: R/E

Wiig Criterion Referenced Inventory of Language
  Reference: Wiig, E. (1990). Wiig Criterion Referenced Inventory of Language. San Antonio, TX: Psychological Corporation.
  Ages: 4 to 13 years
  Receptive and/or Expressive: E
Page 259 An extended example in which measures varying in naturalness are described may help readers see the trade-offs among naturalness, efficiency, and the clinician's control of variables affecting performance. The procedures in this example derive from attempts to examine phonological performance on a single sound or sound pattern in some detail and over time. The first part of this example was created in 1967, when Elbert, Shelton, and Arndt developed the Sound Production Task (SPT). In that task, the client imitated the production of 30 to 60 items containing a particular target sound. Some items on the SPT consisted of nonsense syllables, others of single words, and others of short phrases containing the sound. The SPT was designed to obtain relatively large numbers of observations in varying phonetic and linguistic contexts, while avoiding repeated, inappropriate use of entire norm-referenced tests or items from them. In a study of patterns of acquisition for /s/ and /r/ in treatment, Diedrich and Bangert (1980) used the SPT, but they also devised a less reactive procedure, that is, one that was more covert in its focus and thus less apt to elicit uncharacteristically careful speech from the tested child. For this second procedure, called the Talking Task (TT), the clinician engaged in a 3-minute conversation with the child and covertly noted the number of correct productions out of those attempted. Although the TT represented an interesting innovation, it left the clinician at the mercy of chance, in that infrequently occurring sounds might occur only a few times during the 3-minute sample—something that might be addressed by defining the sample length in terms of a certain number of attempts, rather than in terms of time.
In 1981, Secord developed a set of tasks, the Clinical Probes of Articulation Consistency (C-PAC), recently replaced by the Secord Contextual Articulation Tests (S-CAT; Secord & Shine, 1997), which bears some relationship to each of these previous two tasks. In the S-CAT, probes for each consonant /r/ and vocalic / / are elicited in prevocalic and postvocalic positions, as well as in clusters—in imitations of single words, short phrases, and sentences, as well as in delayed retellings of a story containing many words with the target sound. Thus, this set of probes can efficiently help the clinician consider the possible effects of linguistic complexity (single-word, sentence, and narrative contexts) and phonetic context (prevocalic, postvocalic, and cluster contexts). However, naturalness is somewhat reduced in a story-retelling format and is reduced still further in imitation. These kinds of trade-offs abound in the construction of probes, making the sharing of successful creations with colleagues a substantial, time-saving contribution. In the book Methods for Assessing Children's Syntax (McDaniel, McKee, & Cairns, 1996), a variety of elicitation strategies for both comprehension and production are discussed in detail by researchers who have considerable experience in their application. Table 10.2 lists a number of these elicitation strategies. The descriptions of these strategies reveal the common techniques available both to professional test authors and to clinicians wishing to construct a syntactic probe for a particular client. Informal probes have also been developed to examine pragmatic skills—an area in which there is a dearth of standardized measures (Lund & Duchan, 1993). For example, Lucas, Weiss, and Hall (1993) described the development of a probe designed to examine the extent to which children with communication disorders are sufficiently
Page 260 Table 10.2 Elicitation Strategies for Assessing the Comprehension and Production of Syntax in Children

Production

Elicited imitation (Lust, Flynn, & Foley, 1996)
  Description: The child is asked to repeat an utterance (usually a single sentence) exactly as produced by an adult. It is assumed that only structures reflecting the child's grammatical competence will be produced. An easy technique, even for children as young as 1 or 2.
  Strengths: You can choose stimuli very precisely and "know" what the child is attempting to say. Studies show good agreement with comprehension and other data. The technique is applicable with small changes for children from a wide range of cultures and languages and can be used at relatively low developmental levels.
  Weaknesses: Stimulus design is complex due to the need to control variables that are not of direct interest (e.g., cognitive demand, attention, grammatical complexity, sentence length). The technique has been criticized for relying unduly on short-term memory.

Elicited production (Thornton, 1996)
  Description: Situations are created to increase the likelihood that the child will attempt to produce a given structure, usually including the use of a "lead-in" sentence produced by the adult to "provide the context and 'ingredients' for production of the structure without modeling it." Sometimes this technique makes use of a puppet who can be asked questions, directed to do things, or corrected. Typically used with normally developing children 3 years and older.
  Strengths: Generation of the targeted structure rests more entirely with the child and is unlikely to be due to chance. A large number of such probes have been described in the research literature.
  Weaknesses: The child's enjoyment level is key to the success of the strategy because she or he needs to be an active participant. The awkwardness associated with a "no response" from the child may be intensified relative to other methods and may make children less willing to continue. Working out the details required to elicit production may require considerable piloting with adults or normally developing children. Similarly, correct productions are far more straightforwardly interpreted than incorrect or untargeted productions.

Page 261 Comprehension

Intermodal preferential looking (Hirsh-Pasek & Golinkoff, 1996)
  Description: The child is seated on a parent's lap, hears a stimulus, and then is presented simultaneously with two novel video images—one matching and the other not matching what has been said. Greater time spent watching the matching video is expected for comprehended structures. Used for children between 12 months and 4 years of age.
  Strengths: Minimal action is required. Use of videos allows the presentation of dynamic relationships. Can be used at lower developmental levels than many other tasks.
  Weaknesses: Considerable time and expertise are required to create the video stimuli. Only a few stimuli can be studied at any point in time.

Picture selection task (Gerken & Shady, 1996)
  Description: The child hears the adult or a recorded voice presenting a verbal stimulus and then points to one of two to four pictures. Typically this task is useful with normally developing children 20 to 24 months and older.
  Strengths: This technique has been widely used to assess understanding or grammaticality of specific phonological distinctions, lexical comprehension, or comprehension of specific morphosyntactic structures. It tends to produce results comparable to object selection where either task is feasible.
  Weaknesses: Considerable time can be required to produce comparable target and foil items. Although use of tape-recorded speech or synthetic speech can help increase children's attention, it increases the complexity of task construction. Failures to respond are difficult to interpret.

Acting-out tasks (Goodluck, 1996)
  Description: The child is asked to use provided props to act out a sentence that is read or played back from tape. Typically used for children older than 3 years.
  Strengths: The task has a long history of use and is easy and inexpensive to use. It can be fun for the child and can be particularly effective in assessing understanding of anaphora and pronominalization. It is a relatively open-ended task that may be less sensitive to response bias than many others, yet it may be associated with a tendency to repeatedly use a prop once it is picked up.
  Weaknesses: It cannot be used with constructions or predicates that are difficult to act out and can be associated with responses that are difficult to interpret. Because of the cognitive complexity of the task, it typically is used with normally developing children older than 3 years, thus limiting use with children with language difficulties.
Page 262 informative in their utterances as they participate in a role-playing game. The child is assigned the role of "warehouse manager" and is approached by the clinician, playing a "toy buyer," who asks where different toys might be found in the warehouse. In a similar vein, Roeper, de Villiers, and de Villiers (1999) recently described their ongoing efforts to design an extensive number of probes for assessing important interacting knowledge in pragmatics, semantics, and syntax for 5-year-olds—for example, the need to know specific semantic and syntactic forms to achieve particular pragmatic functions. Elaborately developed in terms of materials, instructions, and scoring procedures, both the probes developed by Lucas et al. and those developed by Roeper et al. illustrate that a measure's formality is better conceived of as a continuum than as a dichotomy. Further, the thorough description of the probes offered by Lucas et al. illustrates the extent to which sharing the results of well-developed probes can increase the efficiency of clinicians' efforts. Professional journals and a growing number of books on language development and disorders describe numerous clinical and research probes (e.g., Brinton & Fujiki, 1992; Lund & Duchan, 1993; Miller, 1981; Miller & Paul, 1995; Simon, 1984). Table 10.3 showcases a modest sample of these probes for children across a wide range of ages and developmental levels. It is offered to provide a feel for the heterogeneity and considerable potential of such measures.

4. Rating Scales
Rating scales involve assigning numerals or labels to an individual's behavior in a particular context. Rating scales are typically completed by the clinician or another observer after the observation of individual communication events. At times, such scales can be used to help observers summarize their experience across multiple observations. Rating scales differ from on-line observations, another type of descriptive measure, in that on-line judgments are made during rather than after the actual communicative event. Rating scales have a lengthy history in psychology and speech-language pathology (e.g., see Schiavetti, 1992), but primarily in research rather than clinical settings (e.g., Burroughs & Tomblin, 1990; Campbell & Dollaghan, 1992). However, increasing attention to the documentation of children's functional limitations (Goldstein & Gierut, 1998) may cause rating scales to be used with greater frequency in the future. The two types of rating scales that have been most influential in speech-language pathology are interval scaling and direct magnitude estimation (Campbell & Dollaghan, 1992; Schiavetti, 1992). These rating scales are usually used to compare a large number of stimulus examples—something that is not always done with rating scales. When interval scaling is used, the rater assigns each characteristic or behavior being rated to a linearly partitioned continuum whose points are marked off using numerals or descriptive labels. For example, a rater might be asked to rate a behavior on a 6- or 7-point scale running from uncommon to most common, with each point identified either by a numeral alone or by a descriptive phrase.

Page 263 When direct magnitude estimation is used, the rater is asked to rate each characteristic or behavior either as a proportion of a standard stimulus provided as part of the rating system or as a proportion of other rated stimuli. Thus, for example, Campbell and Dollaghan (1992) described a method in which no standard stimulus is provided. In their study, listeners were instructed to assign any number of their choice to the first of 36 speech samples they were asked to rate. Later samples were then rated subjectively on the basis of (a) their proportional informativeness relative to the other judgments made in the sample and (b) the understanding that higher numbers were to be associated with greater informativeness than lower numbers.

Table 10.3 A Sample of Probes Used in the Description of Children's Language

Comprehension of action words (Miller & Paul, 1995)
  Approximate ages (if specified): 12 to 24 months
  Description: The child is asked to perform actions that the child's parent(s) believe he or she may understand on familiar objects and people. Unconventional actions may be requested to help distinguish action unconnected to the request from intentional responses.

Bellugi's negation test (Miller, 1981)
  Approximate ages (if specified): Not specified
  Description: The child is asked to provide the negative of an utterance produced by an adult. Variations can include different auxiliaries, negatives with indefinites, imperatives, and multipropositional sentences.

Production of question forms (Lund & Duchan, 1993)
  Approximate ages (if specified): Not specified
  Description: The Messenger Game. The child is asked to get information from a third party, ideally one who is out of view. For example, "Ask her how she got to this school."

Comprehension of nonliteral meaning (Lund & Duchan, 1993)
  Approximate ages (if specified): Early adolescence
  Description: Joke explanations. The child is asked to explain a joke that he or she finds humorous.

Comprehension of classroom direction vocabulary (Miller & Paul, 1995)
  Approximate ages (if specified): 6 to 12 years
  Description: Classroom directions and vocabulary that are thought to be difficult for the child are incorporated in instructions that the child must follow using paper and pencil.

Production of sequential description (Simon, 1984)
  Approximate ages (if specified): Middle and high school students
  Description: The child is shown a picture of a pay phone and asked to give a step-by-step description of how it is used.

Page 264 The Observational Rating Scales that are included as part of the third edition of the Clinical Evaluation of Language Fundamentals (Semel, Wiig, & Secord, 1996) provide an example of how a rating scale can be used to enrich the clinician's understanding of the school-age child and his or her communication environment. They are mentioned here because of the relative dearth of such scales for school-age children, although such scales are becoming more common—for example, the Functional Status Measures (Educational Settings) of the Pediatric Treatment Outcomes Form (ASHA, 1995) and the Teacher Assessment of Student Communicative Competence (Smith, McCauley, & Guitar, in press). In addition, the Observational Rating Scales are of particular interest because of their novel inclusion of parallel rating forms, so that comparable information can be obtained from the child and his or her parent(s) and teacher(s). They represent an example of the interval scaling method, one in which individuals are asked to respond in a summative fashion to past observations. Each scale of the Observational Rating Scales consists of 40 items addressing "troubles" facing the child in listening (9 items), speaking (19 items), reading (6 items), and writing (6 items).
To illustrate the nature of these items, let me indicate that the first listening item is "I have trouble paying attention" for the student version (often completed with the speech-language pathologist); "My child has trouble paying attention" for the parent version; and "The student has trouble paying attention" for the teacher version. Each item is rated as occurring never, sometimes, often, or always, with DK (Don't know) used to mark items for which the rater feels unable to pass judgment. The Observational Rating Scales also describe procedures for the observers to identify and provide examples of their top five concerns, thus paving the way for functionally oriented intervention planning. The chief appeals of rating scales are the apparent ease with which they can be created and administered, as well as their wide applicability (Pedhazur & Schmelkin, 1991; Salvia & Ysseldyke, 1998). These virtues, however, may mask their susceptibility to a number of problems, especially ones stemming from poorly defined points along an interval scale and from differences introduced by different raters. In a brief review of such measurement issues facing rating scales, Pedhazur and Schmelkin (1991) concluded that ratings may often "tell more about the raters than about the objects they rate" (p. 121). They cited a rich literature in which the perceptual aspects of the rating task make raters vulnerable to several types of bias. Two common types of bias are halo effects, in which raters allow impressions of general characteristics or previous knowledge to have a consistent effect on ratings, and leniency effects, in which overly positive judgments appear to occur because the rater is familiar with the person whose characteristics are being rated (Primavera, Allison, & Alfonso, 1996). An additional challenge to the valid use of rating scales lies in the need to achieve a successful fit between the nature of the characteristic being rated and the type of scaling method used to rate it (Campbell & Dollaghan, 1992; Schiavetti, 1992). Page 265 In particular, researchers have noted a difference in what kind of scale is appropriate depending on whether the rated characteristic falls along a metathetic versus a prothetic continuum. On a metathetic continuum, raters' responses to differences between rated entities seem to reflect qualitative distinctions, whereas on a prothetic continuum, raters' responses to differences between rated entities appear to reflect quantitative distinctions (Stevens, 1975). The classic contrastive pair illustrating these two types of continuum is pitch and loudness. Without looking ahead to the next paragraph, can you anticipate which of those two characteristics of sound is prothetic (i.e., characterized by quantitative rather than qualitative differences)? If you decided that loudness was prothetic, you are in agreement with a large body of research suggesting that people tend to treat judgments such as loudness as judgments about whether a stimulus has "more" or "less" of something (Stevens, 1975). In contrast, pitch differences tend to be judged as if they represent qualitatively different stimuli. The challenge in devising appropriate rating scales is that whereas direct magnitude estimation can validly be used to measure either type of characteristic, interval scaling appears valid only for measuring characteristics that are metathetic. Campbell and Dollaghan (1992) suggested that, because of the lack of research determining which language characteristics are metathetic versus prothetic, direct magnitude estimation is a less risky choice for researchers and clinicians who wish to use rating scales in their descriptions of children's language disorders. They noted that direct magnitude estimation can be used to provide a comparison of children's spontaneously produced language against that of their peers.
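Because each listener in a free-modulus procedure of this kind starts from an arbitrary number of his or her own choosing, ratings are typically rescaled before being pooled across raters. The sketch below shows one common convention (dividing each rater's judgments by that rater's geometric mean so that all raters share a common modulus); the data are invented, and this illustrates the general technique rather than Campbell and Dollaghan's exact computation:

```python
import math

def normalize_rater(ratings):
    """Rescale one rater's magnitude estimates by that rater's geometric
    mean, removing the arbitrary modulus the rater chose."""
    gmean = math.exp(sum(math.log(r) for r in ratings) / len(ratings))
    return [r / gmean for r in ratings]

def pooled_estimates(all_raters):
    """Average the normalized estimates across raters, sample by sample."""
    normalized = [normalize_rater(r) for r in all_raters]
    n_samples = len(all_raters[0])
    return [sum(rater[i] for rater in normalized) / len(normalized)
            for i in range(n_samples)]

# Three hypothetical raters judging the same four speech samples;
# each rater's numbers sit on a different arbitrary scale.
raters = [[10, 20, 5, 40],
          [100, 210, 55, 380],
          [2, 4, 1, 8]]
print(pooled_estimates(raters))  # relative informativeness, modulus-free
```

Note that raters who assign proportionally identical judgments (such as the first and third raters above) become indistinguishable after normalization, which is exactly the property a free-modulus procedure depends on.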
Among the most important uses they saw for such judgments was the examination of change occurring as a result of, or in the absence of, treatment. In particular, Campbell and Dollaghan described a method in which 10 to 15 listeners could be used to provide ratings with a stable percentage of variability. Specifically, Campbell and Dollaghan (1992) had 13 listeners compare the informativeness—"amount of verbal information conveyed by a speaker during a specified period of spontaneous language production" (p. 50)—achieved by three children who had sustained severe brain injury with that achieved by three age-matched controls, when both sets of children were engaged in a video-narration task (Dollaghan, Campbell, & Tomlin, 1990). (Recall that the particulars of the direct magnitude estimation method involved in this study were described earlier in the chapter when that rating method was introduced.) The use of this technique provided social validation of the recovery patterns shown by the three children with brain injury who participated in the study. The relatively large number of raters required for direct magnitude estimation may preclude its use in many clinical situations. However, it may prove valuable as a means of validating more efficient methods of social validation. In addition, it may prove valuable as a method that provides exactly the information required for certain clinical situations. For example, it might be used as described by Campbell and Dollaghan to support a relatively costly or lengthy treatment approach for a given child or group of similar children. Not surprisingly, then, it appears that the use of rating scales as a descriptive measurement tool, like the others discussed in this section, has a greater complexity than might
Page 266 at first be apparent. Thus, wise users will require as much evidence regarding validity as possible for specific methods prior to deciding to implement them clinically. Belief in their promise should also prompt users to participate in providing such evidence.

5. Language Analysis
Language sampling and analysis have enjoyed a long history of use in studies of children's language acquisition (e.g., Brown, 1973; Miller, 1981; Templin, 1957). The variety of procedures recommended for eliciting language samples and for deriving measures based on them has grown appreciably over the past 40 years and has changed as understandings of the nature of language impairments have changed (Evans, 1996a; Gavin, Klee, & Membrino, 1993; Miller, 1996; Stromswold, 1996). In a study of some 253 American speech-language pathologists who work with preschool children, Kemp and Klee (1997) found that 85% used language analysis in their practice, with most preferring nonstandardized forms to formal procedures. Language analyses are sometimes avoided by clinicians who report that they do not have the time to incorporate them into practice or that they lack the computer resources that would make their use more time efficient (Kemp & Klee, 1997). However, these objections are rapidly being addressed by the refinement and proliferation of computerized analysis programs (Long, 1999). Innovations such as transcription laboratories staffed by nonprofessional transcribers, the creation of databases reporting findings for large numbers of children, and the availability of analysis procedures at no cost also point to the greater practicality of language analysis in the future (Evans & Miller, 1999; Long, personal communication, January 7, 2000; Miller, 1996; Miller, Freiberg, Rolland, & Reeves, 1992).
Among the numerous discussions extolling the virtues of language sampling and analysis, Evans and Miller (1999) offered one that is particularly powerful:

The language sample, by contrast [with available standardized tools], represents the child's integration of specific intervention goals within the larger communication context and provides clinicians with an opportunity to assess children's language skills dynamically across a range of situations that vary in communicative demand (e.g., free play, interview, narration, picture description). Language samples can be collected as often as necessary without performance bias, and changes in children's abilities can be documented across a wide range of linguistic levels. (Evans & Miller, 1999, pp. 101–102)

Additionally, such analyses can examine not only many aspects of language but also how complexity in one area may affect another—a theme of growing interest in the evolution of language assessment tools. Although language analyses are typically used to assess aspects of expressive communication, they are also frequently used as a means of examining receptive skills. In particular, children's responses to the directions and comments of their conversational partners provide data that are valued by many clinicians (Beck, 1996). In the next section, the evolution
Page 267 of language sampling and analysis is described to help readers understand the variety of available measures and how these measures have changed over time.

The Evolution of Language Analyses
Evans (1996a) reviewed the changes in emphasis in language sampling techniques that have accompanied changes in theoretical perspectives on language development and language disorders. In particular, she discussed the influence of three dominant research paradigms spanning the past half-century: (a) the behaviorist learning paradigm, (b) the formalist competence-based paradigm (encompassing "generative syntax, generative semantics, and a narrow interpretation of syntax"; Evans, 1996a, p. 208), and (c) the functionalist paradigm. A brief summary of her comments is relevant to anyone using language analysis because so many of the measures associated with earlier paradigms remain available and in widespread use—sometimes in revised versions and sometimes in their original form (Kemp & Klee, 1997). In the heyday of the behaviorist learning paradigm, the role of the environment in learning and the word as the unit of analysis were emphasized. Language acquisition was understood to occur through the reinforcement of the correct use of words and sentences (word sequences). Although standardized language tests (e.g., the Peabody Picture Vocabulary Test, the Illinois Test of Psycholinguistic Abilities) dominated language assessment methods during this period, language analysis techniques were used as well and emphasized counts or descriptions of different verbal behaviors (e.g., type–token ratio, measures of sentence length). The second paradigm discussed by Evans (1996a), the formalist competence-based paradigm, was designed to address the generativity of children's language, that is, the use of novel and therefore unmodeled and presumably unreinforced utterances (e.g., overregularization of the past tense, as in "he goed"). As Evans notes, this paradigm was made possible by the linguistic theory of the day (particularly the work of Chomsky), in which a major goal of linguists became the identification of language-independent competencies, termed linguistic universals.
Such universals were thought to suggest features of language and linguistic structure likely to occur in all languages. Evans (1996a) suggested that initial orientations within the formalist paradigm were largely syntactic in nature and proceeded on the assumption that the domains of language—syntax, semantics, and so forth—could be viewed independently. An assumption was also made that variability in performance was more likely to be a function of a child's knowledge than a function of contextual factors. According to Evans's account, later developments in this paradigm, fueled by theory and data from a variety of sources, shifted the focus somewhat—first to semantics, then to pragmatics. Evans pointed out that language analyses associated with the formalist period similarly shifted, although sometimes subtly, from largely syntactic measures (e.g., Developmental Sentence Scoring, DSS; Language Sampling, Analysis, and Training, LSAT; and the Language Assessment, Remediation and Screening Procedure, LARSP) to measures focusing on semantics (e.g., mean length of utterance in morphemes, MLUm) and, later, on pragmatics (e.g., Roth & Spekman, 1984).
Page 268 Evans (1996a) noted that, throughout this period, the child's task in language acquisition was largely seen as that of acquiring competence in the underlying rules of the ambient language. Predictably, then, childhood language disorders within this paradigm were seen as difficulties in acquiring the rules of the individual subsystems of language. In Evans's view, language assessments have thus grown through accretion to require elaborate analyses across semantics, syntax, and pragmatics—a process that has been made more feasible through modern technology. Among the analyses she associates with this period are the Systematic Analysis of Language Transcripts (SALT; Miller & Chapman, 1982, 1998) and the Child Language Analysis programs (CLAN; MacWhinney, 1991). Evans (1996a) suggested that functionalist theories, the last of the three paradigms, were prompted by difficulties in accounting for children's variability across contexts: if rule acquisition is what is taking place, then a form evidencing that rule should either be present or absent in a child's productions—not present in some situations but not others, or with some conversational partners but not others. The functionalist paradigm is reflected in works such as Bates and MacWhinney (1989). According to Evans, it is based on the following premise:

Variability in speaker performance is simply the final solution to the interaction among the internal state of a complex system (i.e., the underlying speaker competence), the structure of the system (e.g., word order, lexical items, morphonology, suprasegmentals), and the impact of external constraints such as real-time language processing demands. (Evans, 1996a, p. 254)

Within the functionalist paradigm, then, variability becomes a major source of information about the current state of a child's dynamic system (linguistic and nonlinguistic) as it responds to external conditions (e.g., situational or attentional factors).
Increased variability is seen as an opportunity for positive change. In addition, this paradigm emphasizes the necessity of examining the interplay of language domains, an area identified by numerous authors as among the most exciting challenges facing clinicians this decade (Howard, Hartley, & Muller, 1995). Evans (1996b) provided an example of such interactions when she found fewer morphosyntactic omissions in the speech of children with SLI when their utterances occurred within a conversational turn rather than adjacent to a shift in conversational turn.

Numerous studies beyond those just cited (e.g., Crystal, 1987; Panagos & Prelock, 1982; Paul & Shriberg, 1982) have argued that rich and powerful understandings of children's speech and language development emerge from the kinds of detailed analyses called for by current theory. Certainly one of the major advantages of language sampling, then, is the variety of questions the resulting sample can be used to answer. For example, Dollaghan and Campbell (1992) described a taxonomy of within-utterance disruptions arising from language rather than fluency disorders to help characterize the subtle deficits, lying across language domains, that plague young speakers with language disorders, both developmental and acquired.

Table 10.4 lists some of the standardized measures currently used to describe children's language skills based on language samples. In this table, information about the content of each procedure's analyses is provided, and those procedures that are available on computer are indicated (√).

Page 269
Table 10.4
Tools Available for Detailed Analyses of Language Samples (Evans, 1996a; Long, 1999; Owens, 1998)

Procedure | Content of Analyses | Computerized?
Assigning structural stage (Miller, 1981) | Morphology, Syntax |
Communication analyzer (Finnerty, 1991) | Morphology, Syntax | √
Computerized Language Assessment, Remediation, and Screening Procedure (LARSP; Bishop, 1985) | Morphology, Syntax | √
Computerized Profiling, versions 6.2 and 1.0 (Long & Fey, 1989) | Morphology, Syntax, Narrative | √
Computerized Language Analysis (CLAN; MacWhinney, 1991) | Morphology, Syntax, Narrative | √
Computerized Language Error Analysis Report (CLEAR; Baker-van den Goorbergh, 1990) | Morphology, Syntax, Pragmatics | √
Computerized Profiling (CP; Long, Fey, & Channell, 1998) | Morphology, Syntax | √
Developmental Sentence Scoring (DSS) Computer Program (Hixson, 1985) | Morphology, Syntax | √
Content, form, and use analysis (Lahey, 1988) | Semantics, Morphology, Syntax, Pragmatics |
Index of Productive Syntax (IPSyn; Scarborough, 1990) | Morphology, Syntax |
Language Assessment, Remediation, and Screening Procedure (LARSP; Crystal, Fletcher, & Garman, 1989) | Morphology, Syntax |
Language Sampling, Analysis, and Training (LSAT; Tyack & Gottsleben, 1974) | Morphology, Syntax |
Lingquest (Mordecai, Palin, & Palmer, 1985) | Morphology, Syntax | √
Parrot Early Language Sample Analysis (PELSA; Weiner, 1988) | Morphology, Syntax | √
Profile in Semantics-Grammar (PRISM-G; Crystal, 1982) | Semantics |
Profile in Semantics-Lexicon (PRISM-L; Crystal, 1982) | Morphology |
Pye Analysis of Language (PAL; Pye, 1987) | Morphology, Syntax | √
Systematic Analysis of Language Transcripts (SALT; Miller & Chapman, 1998) | Morphology, Syntax, Narrative | √

Recently, one of these computerized programs, CP (Long, Fey, & Channell, 1998), has been made available without charge at the following Internet website: http://www.cwru.edu/artsci/cosi/cp.htm (Long, personal communication, January 7, 2000). Readers are reminded that computerized measures should be viewed hopefully (Long, 1991, 1999; Long & Masterson, 1993), but with caution as well (Cochran &
Page 270
Masterson, 1995). After all, computers make it possible to conduct language analyses that would be prohibitively time-consuming if performed by hand, but they also make it possible to commit silly or wrongheaded mistakes more quickly than ever, for example, by using the wrong analysis for a particular child. The user of such measures must exercise as much caution as ever in selecting the specific sample to be used as input and in "buying into" the specific techniques used. Further, one should recognize that although language samples are "natural" in the sense that they are often not consciously structured by the clinician, they are nonetheless subject to the same contextual effects that affect norm-referenced test performance (Plante, personal communication, February 18, 2000).

A growing literature on language analyses can help clinicians determine what is available and likely to be useful for their clients (Cochran & Masterson, 1995; Long, 1991, 1999; Long & Masterson, 1993). Although a detailed account of even a single analysis tool is beyond the scope of this book, a summary of some recent research may help the reader see the wealth of information obtainable through language analysis. Table 10.5 lists some patterns of disordered language performance that can be described using the SALT (Miller, 1996). Miller and Klee (1995) used these categories to characterize the problems of 256 children ranging in age from 2 years, 9 months to 13 years, 8 months. The data were based on conversational and narrative samples, contexts selected because of the wealth of research on the former and the important connection to literacy of the latter (Miller, 1996). Miller and Klee found significant numbers of children at varying ages falling into one or more categories, with only 20 children not described by any category.
For preschool children, one very specific measure that has remained in use in a relatively consistent form across the paradigms described by Evans is MLU, measured in morphemes. Guidelines for the calculation of MLU as described by Chapman (1981) are shown in Table 10.6. MLU is regularly used clinically (Kemp & Klee, 1997; Miller, 1996) and has been incorporated into several of the procedures described in Table 10.4, including SALT. Its use is based on the premise that, at least in younger children, increasing syntactic complexity also requires increasing utterance length, especially when length is measured in morphemes and is therefore sensitive to increases in either words or grammatical and derivational morphemes. Numerous studies lend credence to the value of MLU in describing language change through the preschool years (Conant, 1987; Rondal, Ghiotto, Bredart, & Bachelet, 1988; Scarborough, Wyckoff, & Davidson, 1986). Blake, Quartaro, and Onorati (1993) found evidence that MLU correlated highly with a measure of grammatical complexity obtained using the LARSP until an MLU of 4.5 was reached. Findings such as these have provided considerable support for MLU's widespread use in research as a means of grouping children according to language skill (Miller, 1996), although the appropriateness of MLU depends on the precise focus of the study.[1] Recent research (e.g., Aram, Morris, & Hall, 1993) has also suggested the diagnostic utility

[1] Leonard (1996) described several alternative measures for equating research groups that will be more appropriate in certain circumstances, including mean number of arguments expressed per utterance, mean number of open-class words per utterance, measures of unstressed syllable production or word-final consonant production, and expressive vocabulary.
Page 271
Table 10.5
A Clinical Typology of Disordered Language Performance Based on Use of the SALT

Clinical type | Characteristics
Utterance formulation | Maze revisions at word- and phrase-level units; increased MLU; pauses within and between utterances; word-order errors
Word finding | Maze revisions and repetitions at word- and part-word-level units; pauses within utterances; word omissions; word-choice errors
Hypoverbal rate | Decreased number of utterances and words per minute; pauses within and between utterances
Hyperverbal rate | Increased number of utterances and words per minute, which may be combined with reduced semantic content
Pragmatic or discourse | Noncontingent utterances; pronominal reference errors; problems with topic maintenance, new versus old information, and narrative structure
Semantic or reference | Overgeneralization, word-choice, and NP-VP symmetry errors; abandoned utterances; redundancy
Delayed development | Decreased number of different words and total number of words; delayed syntactic development as measured in MLU and other detailed syntactic analyses

Note. SALT = Systematic Analysis of Language Transcripts; MLU = mean length of utterance; NP-VP = noun phrase-verb phrase. From "Progress in Assessing, Describing, and Defining Child Language Disorder," by J. Miller, 1996, in K. N. Cole, P. S. Dale, and D. J. Thal (Eds.), Assessment of Communication and Language (p. 319), Baltimore: Brookes Publishing. Copyright 1996 by Brookes Publishing. Reprinted with permission.

in clinical settings, particularly where production difficulties are prominent features of the child's profile.

Technical Considerations: Sample Size and Variations in Language Sampling Conditions
Recently, Muma et al. (1998) reported on a study, conducted several years earlier, in which language samples were obtained from a group of seven normally developing children between the ages of 2 years, 2 months and 5 years, 2 months. They noted that 200 to 300 utterances were needed to obtain acceptable error rates for many grammatical structures related to the child's use of different grammatical systems (nominal, auxiliary, verbal) and grammatical operations (use of relative clauses, do insertion, participle shifts, etc.). Specifically, they found a 15% error rate for 200- to 300-utterance samples versus error rates of 55% and 40%, respectively, for 50-utterance and 100-utterance samples. Not surprisingly, then, these data suggest that the more specific the information sought in a language analysis (i.e., when detailed information about specific structures is needed), the longer the sample will need to be (Plante, personal communication, February 20, 2000).

Page 273
In a similar study, Gavin and Giles (1996) conducted a SALT analysis on language samples of varying sizes based on either increments of time (12 or 20 minutes) or number of utterances (25-175, in increments of 25). Study participants were 20 children from 31 to 46 months of age. The researchers examined the test-retest reliability of four measures (MLU, number of different words, total number of words, and mean syntactic length) in samples of these different lengths. They found that only at the largest number of utterances (about 175) did reliability coefficients meet or exceed .90, the value considered acceptable for diagnostic use.

The implication of these findings extends beyond a simple admonition for clinicians to obtain larger samples on which to base language analyses, or to be keenly aware of the potential for error dogging analyses based on smaller samples, although those are clear and potent implications. Even more important, these findings illustrate the connection between reliability and sample size that haunts many, if not most, descriptive measures. Obviously, rarer structures or phenomena are more likely to be vulnerable, and additional research will prove helpful in guiding us toward best practices in our choice of tools and sample sizes.

The conditions under which language samples are collected are known to affect numerous measures obtained in language analyses (Agerton & Moran, 1995; Landa & Olswang, 1988; Miller, 1981; Moellman-Landa & Olswang, 1984; Terrell, Terrell, & Golin, 1977). Even a partial list of the variables affecting a child's productions can be daunting: race and familiarity of the communication partner, stimulus materials, number of communication partners, number and types of questions asked, and type of communication required (e.g., narrative, description of a procedure), to name a few. It is possible to leave these variables uncontrolled, as is often done when an unstructured conversation between clinician and child is used as the sample. In such cases, the clinician will want to consider these variables during analysis and interpretation.

Page 272
Table 10.6
A Summary of the Method for Calculating Mean Length of Utterance (MLU) in Morphemes, as Described by Chapman (1981) as an Adaptation From Brown (1973)

Preparing the speech sample for calculation of MLU
• The child's speech is segmented using the criterion of terminal intonation (rising or falling).
• These procedures differ from those of Brown (1973) in that a sample of the first consecutive 50 utterances (including the first page of transcription), rather than 100 utterances (excluding the first page), is recommended.
• Excluded from the sample of utterances are unintelligible or partially unintelligible utterances. Included are "doubtful" transcriptions and exact utterance repetitions.

Counting morphemes in each utterance
Morphemes are defined as minimal meaningful units of a language, with dog and s given as examples. Counting rules based on those of Brown (1973) are given to address the greater uncertainty of what constitutes a morpheme in the speech of a child. The total count for each utterance is calculated, summed, and divided by the total number of utterances spoken to yield the MLU. The counting rules are given verbatim:

"(1) Stuttering is marked as repeated efforts at a single word; the word is counted once in the most complete form produced. In the few cases where a word is produced for emphasis, or the like (no, no, no), each occurrence is counted separately.
(2) Such fillers as mm or oh are not counted, but no, yeah, and hi are.
(3) All compound words (two or more free morphemes), proper nouns, and ritualized reduplications count as single words. Some examples are birthday, rackety-boom, choo-choo, quack-quack, night-night, pocketbook, seesaw. The justification for this decision is that there is no evidence that the constituent morphemes function as such for these children.
(4) All irregular pasts of the verb (got, did, went, saw) count as one morpheme. Again, there is no evidence that the child relates these to the present form.
(5) All diminutives (doggie, mommie) count as one morpheme because these children do not seem to use the suffix productively. Diminutives are the standard forms used by the child.
(6) All auxiliaries (is, have, will, can, must, would) count as separate morphemes, as do all catenatives (gonna, wanna, hafta, gotta). The catenatives are counted as single morphemes, rather than as going to or want to, because evidence is that they function as such for children. All inflections, for example, possessive (s), plural (s), third person singular (s), regular past (ed), and progressive (ing), count as separate morphemes." (Chapman, 1981, p. 24)

Chapman (1981) identified several special characteristics of a sample that may affect the representativeness of the MLU: a high rate of imitation (i.e., >20% of the child's utterances), frequent self-repetitions within a speech turn, a high proportion of answers occurring in response to adult questions (i.e., >30-40% of the child's utterances), frequent use of routines (such as "counting, saying the alphabet, nursery rhymes, song fragments, commercial jingles, or long utterances made up by listing objects in a book or the room"), and a high proportion of utterances in which clauses are conjoined by and. Among the strategies she suggested for addressing these problems are calculations conducted with and without imitations, self-repetitions, frequent routines, and responses to questions. In addition, she suggested obtaining additional samples with another adult who asks fewer questions when high rates of question responses are noted, and the use of another measure (the T-unit) when a high proportion of utterances consist of clauses conjoined by and.
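The method Chapman (1981) summarizes in Table 10.6 reduces, computationally, to dividing the total morpheme count by the number of countable utterances. The sketch below is a rough illustration only: it hard-codes a small subset of Chapman's rules (fillers, catenatives, irregular pasts, diminutives, and three regular inflections) with hypothetical word lists, and makes no attempt at the transcription judgment a real analysis demands.

```python
# Illustrative MLU-in-morphemes sketch; word lists and the suffix heuristic are
# simplifications of Chapman's (1981) counting rules, not a full implementation.

FILLERS = {"mm", "oh", "um", "uh"}                      # not counted (rule 2)
ONE_MORPHEME = {"gonna", "wanna", "hafta", "gotta",     # catenatives (rule 6)
                "got", "did", "went", "saw",            # irregular pasts (rule 4)
                "doggie", "mommie"}                     # diminutives (rule 5)
INFLECTIONS = ("ing", "ed", "s")                        # counted separately

def count_morphemes(word):
    word = word.lower()
    if word in FILLERS:
        return 0
    if word in ONE_MORPHEME:
        return 1
    for suffix in INFLECTIONS:
        # naive heuristic: require a stem of 3+ letters so "is" and "red" count as 1
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return 2
    return 1

def mlu(utterances):
    """utterances: list of utterances, each a list of word strings."""
    counts = [sum(count_morphemes(w) for w in u) for u in utterances]
    counts = [c for c in counts if c > 0]   # drop utterances that were all fillers
    return sum(counts) / len(counts)

sample = [["doggie", "running"], ["he", "gonna", "jump"], ["mm"], ["dogs", "walked"]]
print(round(mlu(sample), 2))  # prints 3.33
```

Even this toy version shows why hand rules matter: the orthographic suffix check would still miscount words such as during or glass, which is one reason real analyses depend on transcription conventions and human judgment rather than raw text.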
As an alternative to unstructured language samples, structured sampling tasks have been recommended as providing more relevant (i.e., valid) information for some clinical questions. Following is a list of five sets of tasks designed to elicit structured language samples from school-age children (Cirrin & Penner, 1992):

1. describing an object or picture that is in view;
2. recalling a two-paragraph story told by the clinician without pictures;
3. describing a person, place, or thing that is not present in the immediate surroundings;
4. providing a description of how to do something familiar (e.g., making a sandwich); and
5. telling what the child would do in a given situation (e.g., waking up and seeing a house on fire).
This list illustrates tasks that manipulate some of the variables that may present a child with particular difficulty, thus allowing the clinician to target language sampling to those areas of special importance for the individual child. However, it is important to remember that each of these conditions is likely to affect more about the child's productions than simply the variable that appears to be manipulated. For example, depending on precisely how the task is set up by the clinician, variables beyond the desired topic or level of language complexity will probably be affected.
Page 274
In another effort to help clinicians standardize the conditions under which they collect conversational language samples, Campbell and Dollaghan (1992) offered a sequence of topic questions that they suggested be used in order, but only as spurs to conversation. Thus, only topics in which the child showed genuine interest would be continued. Further, additional topics introduced by the child would be pursued as long as they continued to interest the child. The intended result was increased consistency across examiners. In brief, the sequence begins with questions about the child's age, birth date, and siblings; then proceeds to questions about family pets, favorite home activities, and school affairs; and closes with questions about vacations, favorite books, and TV shows. Although this list is relatively conventional, the decision of a group of colleagues to adopt it, or some other consistent set of starter questions, might help lend greater consistency to the language samples obtained across children. This, in turn, would increase the integrity of local measures that might be made using the data from a number of clients. However, it should be noted that standardization of this kind will not necessarily add to the representativeness of the sample for the individual child; that may best be achieved by entering one of the child's favorite activities and simply observing what happens there.

6. On-Line Observations
This category of descriptive measures is characterized by Damico et al. (1992) as real-time observation and coding of behaviors exhibited during communicative interactions as they happen. These measures thus differ from rating scales, which are completed outside of that time frame. Although such measures are not at all rare in research on communication, Damico et al. noted the relative rarity with which they are applied by speech-language clinicians in clinical practice.

McReynolds and Kearns (1983) described five kinds of observational information, or codes, that are frequently used in applied research settings to obtain on-line measures: (a) trial scoring, (b) event recording, (c) interval recording, (d) time sampling, and (e) response duration. As each of these is described, the reader will see that these same categories can be used to describe the outcomes of probes. The chief difference between probes and on-line observations is that the latter involve responses to a more naturalistic communication event, whereas the former involve a greater level of contrivance on the part of the clinician.

In trial scoring, responses following a specific stimulus, or trial, are scored as correct or incorrect. Such responses can occur either naturally or with prompting. Although correct versus incorrect are the labels most commonly applied to responses in trial scoring, a numerical code (which may in fact represent a type of rating scale) may be used to provide greater detail about the nature of responses. One example of a numerical code is the multidimensional scoring system used in the Porch Index of Communicative Ability in Children (Porch, 1979), which uses a 16-point scoring system to reflect five dimensions (accuracy, responsiveness, completeness, promptness, and efficiency).
Readers should note that such combinations of rating scales and trial scoring are only rarely used in on-line situations because of the intense demands they place on the rater, which leave such measures quite vulnerable to problems with reliability.
Page 275
In event recording, a code is established consisting of the behaviors of interest (verbal, nonverbal, or both). That code is then used to summarize the targeted child's behaviors over a given time period (e.g., a 15-minute period). One example of a code that might be used in event recording is the one developed by Dollaghan and Campbell (1992) to describe within-utterance speech disruptions (i.e., pauses, repetitions, revisions, and orphans, which are linguistic units such as sounds or words that are not reliably related to other such units within an utterance). Whereas Dollaghan and Campbell used that code in an analysis of previously recorded language samples, it could also be used for on-line observation.

Interval recording and time sampling are closely related both to each other and to event recording (McReynolds & Kearns, 1983). In interval recording, a set time period is divided into short, equal intervals (e.g., 10 seconds), and an event is noted as having occurred once if it occurs at any point during an interval. In time sampling, a set time period is again divided into intervals, but only the presence of the behavior at the very end of the interval is recorded. In addition to the intervals devoted to observation, this approach also includes recording intervals in which no observations are attempted. In time sampling, therefore, a 7.5-second observation interval might be followed by a 2.5-second recording interval. Time sampling has been thought to be associated with fewer problems affecting accuracy than interval recording. However, both methods require that care be taken in the selection of interval sizes (McReynolds & Kearns, 1983). Intervals that are too short are likely to increase recording errors; those that are too long are likely to lose information due to waning observer attention.
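The difference between these recording methods can be made concrete with a short sketch. The function names, the event-time representation, and the tolerance window below are our own illustrative choices; McReynolds and Kearns (1983) describe the methods, not this code.

```python
# Sketch of interval recording vs. time sampling over a 30-second observation.
# Events are the times (in seconds) at which the target behavior occurred.

def interval_recording(event_times, total, interval):
    """Mark each interval 1 if the behavior occurred at ANY point within it."""
    n = int(total // interval)
    marks = [0] * n
    for t in event_times:
        i = int(t // interval)
        if i < n:
            marks[i] = 1
    return marks

def time_sampling(event_times, total, observe, record, tolerance=0.5):
    """Score a behavior only if it occurs at the very END of each observation
    window; each window is followed by a recording interval with no observation."""
    marks, cycle, t = [], observe + record, 0.0
    while t + observe <= total:
        end = t + observe
        marks.append(int(any(end - tolerance <= e <= end for e in event_times)))
        t += cycle
    return marks

events = [3.0, 14.5, 22.0, 29.8]
print(interval_recording(events, 30, 10))   # prints [1, 1, 1]
print(time_sampling(events, 30, 7.5, 2.5))  # prints [0, 0, 0]
```

The contrast in the two printed records illustrates the caution in the text: time sampling observes less and so can miss behavior entirely, whereas interval recording says only that the behavior occurred at some point in each interval, not how often.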
The last of the observational codes described by McReynolds and Kearns (1983) is the recording of response duration, in which the duration of a specific event of interest (e.g., pause duration) is recorded using a stopwatch or other timing device. Although response duration may not be applicable to many language phenomena, it can nonetheless prove quite useful for children with language disorders. For example, a functional measure for a child with SLI who demonstrates pragmatic difficulties might consist of time spent engaged in conversation with one or more peers during recess. Alternatively, time spent in perseverative or noncommunicative speech (e.g., repeated recitation of a television commercial) during a group activity might be used as a functional measure for a child with autism.

Damico (1992) provided an example of an on-line observational system, called Systematic Observation of Communicative Interaction (SOCI), which makes use of event recording and time sampling. In SOCI, problematic verbal and nonverbal behaviors are recorded, along with information about several dimensions (such as illocutionary purpose), each time they occur within a fixed time period (a 10-second period consisting of a 7-second observation interval and a 3-second recording interval). Recorded behaviors include failure to provide significant information, nonspecific vocabulary, message inaccuracy, poor topic maintenance, inappropriate responses, linguistic nonfluency, and inappropriate intonation contour. Four to seven recording periods of approximately 12 minutes each are recommended. Although some data regarding the reliability of this procedure are mentioned in Damico (1992), this type of procedure clearly warrants additional evidence to provide better guidance regarding its interpretation and validity.
Page 276
7. Dynamic Assessment
Dynamic assessment encompasses a large family of procedures designed to examine a child's changing response to levels of support provided by the clinician. Proponents of dynamic assessment might balk at its inclusion in the list of measures reviewed in this chapter, maintaining that it represents an approach to assessment that is entirely different from the rest. In fact, for proponents of dynamic assessment, most other forms of descriptive assessment can be lumped into a single, usually less desirable, category: static. Within this conceptualization, static assessments assume a constant set of stimuli and interactions between the child and tester, whereas dynamic assessments assume a changing set of stimuli and interactions that are manipulated to provide a richer description of how the child's performance can be modified. A wide variety of related assessment strategies fall within this category, referred to here simply as dynamic assessment.

For those unfamiliar with the term dynamic assessment, Olswang and Bain (1991), two of its foremost advocates in language assessment, helpfully noted its strong resemblance to a more familiar and venerable concept: stimulability, in which unaided productions (usually in articulation testing) are followed by efforts to obtain the child's "best" productions when aided by the clinician's visual, auditory, and attentional prompts. In both stimulability and dynamic assessment procedures, facilitating actions on the part of the clinician are designed to help determine the upper limits of a child's performance. As a result, the boundaries of assessment and treatment are blurred. This blurring has led to the use of the term mediated learning experience (Feuerstein, Rand, & Hoffman, 1979; Lidz & Peña, 1996) to refer to one model of dynamic assessment. It also foreshadows the integration of such assessment techniques into treatment (e.g., Norris & Hoffman, 1993).
Initially applied in cognitive and educational psychology by Feuerstein and others (e.g., Feuerstein, Rand, & Hoffman, 1979; Feuerstein, Miller, Rand, & Jensen, 1981; Lidz, 1987), dynamic assessment models are typically based on the work of Vygotsky (1978), who proposed the zone of proximal development (ZPD) as a conceptualization of the moving boundary of a child's learning. The zone of proximal development is defined as "the distance between the actual developmental level as determined by independent problem solving and the level of potential development as determined through problem solving under adult guidance or in collaboration with more capable peers" (Vygotsky, 1978, p. 86). Problem solving or behaviors lying within this zone are thought to represent those areas where maturation is occurring and to characterize development "prospectively" rather than "retrospectively," as is done with typical, static assessment (Vygotsky, 1978).

The ZPD has been interpreted as indicating learning readiness. Therefore, its description through dynamic assessment has been considered especially useful for identifying treatment goals (Bain & Olswang, 1995; Olswang & Bain, 1991, 1996). Specifically, Olswang and Bain (1991, 1996) suggested that tasks that children perform with little assistance do not warrant treatment, and those that children fail to perform even when provided with maximal assistance are not yet appropriate targets. Instead, the most appropriate targets are likely to be those that children perform only
Page 277
when given considerable assistance. Modifiability of performance in response to adult facilitation has also been shown to predict generalization of performance to new situations, such that children who demonstrate less modifiability show less transfer (Campione & Brown, 1987; Olswang, Bain, & Johnson, 1992).

Another benefit of dynamic assessment observed by Olswang and Bain (1991) is that its strategies allow the clinician to determine not only what the child is learning, but also how that learning can be supported through the manipulation of antecedent and consequent events. They note that whereas consequent events, such as the nature of reinforcement (e.g., tangible vs. social) and the schedule of reinforcement (e.g., continuous vs. variable), have received attention for many years in speech-language pathology, antecedent events receive greater attention in dynamic assessment. Among the antecedent events highlighted in dynamic assessment are the use of models or prompts, the selection of the modalities of stimuli or cues, and the number of stimulus presentations provided.

Table 10.7 provides a hierarchy of verbal cues used to provide differing levels of support for children with specific expressive language impairment who are learning two-word utterances (Bain & Olswang, 1995).

Table 10.7
A Sample Hierarchy of Verbal Cues

Condition | Cue
General statement | Opportunity to draw the child's attention: "Oh, look at this."
Elicitation question | Opportunity plus an elicitation cue: "What's happening?" "What's he doing?"
Cloze or sentence completion | More salient opportunity, contrasting the particular feature of what is to be coded: "Look, the dog is sitting and __." (manipulating the dog so it is walking)
Indirect model | Repetition of opportunity plus an embedded or delayed model and elicitation cue: "See, the dog is walking; what is he doing?"
Direct model evoking spontaneous imitation | Opportunity plus a direct model of the desired utterance without an elicitation cue; the participant spontaneously imitates the utterance: "Dog walk."
Direct model plus an elicitation statement | Opportunity plus a direct model of the desired utterance with an elicitation statement: "Tell me, dog walk."

Note. This table represents a sample hierarchy of verbal cues arranged from those providing least to most support for the production of two-word utterances in children with specific expressive language impairment who are producing few or no utterances of this type. This example uses cues designed to elicit Agent + Action ("dog walk") as relevant objects are manipulated. From "Examining Readiness for Learning Two-Word Utterances by Children With Specific Expressive Language Impairment: Dynamic Assessment Validation," by B. A. Bain and L. B. Olswang, 1995, American Journal of Speech-Language Pathology, 4, p. 84. Copyright 1995 by the American Speech-Language-Hearing Association. Reprinted with permission.

In the study, which was designed to validate the
Page 278
use of dynamic assessment, 15 children who were producing few or no two-word utterances were assessed using standardized measures, language samples, and dynamic assessment, then treated for 3 weeks. Construct validity was supported by the demonstration that more supportive cues (i.e., those providing more information) resulted in more correctly produced two-word utterances than did less supportive cues. In addition, predictive validity was supported by the demonstration that children who showed the greatest responsiveness to the hierarchy (i.e., responded to the less supportive cues) showed the greatest language change over the study period. One unexpected finding was that language sampling was associated with a greater variety of word combinations and two-word utterance types than was dynamic assessment. This finding was inconsistent with the outcome needed to support concurrent validity, thus suggesting the need for further study.

The collaborative nature of the interaction promoted in dynamic assessment is thought to have immediate benefits for the child's motivation. Lidz (1996) described this interaction as promoting "rapport-building and motivational variables, including reduced anxiety, [such that] assessment becomes more of an instructional conversation than a test" (p. 11). At the same time, a number of authors (Gutierrez-Clellen, Brown, Conboy, & Robinson-Zañartu, 1998; Lidz, 1996) noted that the use of dynamic assessment allows the clinician to determine how assessment conditions facilitate or obstruct the child's attention or arousal, perception, memory, conceptual processing, and metacognitive processing. Thus, dynamic assessment may provide information not only about the child's current and potential level of functioning on a given task, but also about learning needs and styles that extend beyond the task at hand.
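The decision logic behind a least-to-most-supportive cue hierarchy such as the one in Table 10.7 can be sketched as follows. This is a hypothetical illustration only: the labels paraphrase the table, and `child_responds` stands in for the clinician's on-line judgment of whether the child produced the target utterance.

```python
# Hypothetical sketch of probing a cue hierarchy (after Bain & Olswang, 1995).

CUE_HIERARCHY = [                      # ordered from least to most support
    "general statement",
    "elicitation question",
    "cloze or sentence completion",
    "indirect model",
    "direct model evoking spontaneous imitation",
    "direct model plus elicitation statement",
]

def support_needed(child_responds):
    """Return (level, cue) for the least supportive cue that elicits the target,
    or None if the child does not respond even with maximal support."""
    for level, cue in enumerate(CUE_HIERARCHY):
        if child_responds(cue):
            return level, cue
    return None

# Example: a child who produces the target only once some model is provided.
print(support_needed(lambda cue: "model" in cue))  # prints (3, 'indirect model')
```

In Olswang and Bain's terms, targets elicited only at the deeper levels of such a hierarchy (i.e., with considerable assistance) would be the most promising treatment candidates, whereas a response at the first level, or no response even with maximal support, would argue against selecting that target.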
Because of its complexity, dynamic assessment is recommended for some, but not all, children whose language requires description. Much of the early work on dynamic assessment was directed at its use for children with mental retardation (Feuerstein et al., 1979). More recently, it has received considerable attention as a nonbiased approach for use with children who come from linguistically or culturally diverse communities (Gutierrez-Clellen et al., 1998; Lidz, 1996; Lidz & Peña, 1996; Peña, Quinn, & Iglesias, 1992). Reduced bias is expected for at least three reasons. First, dynamic assessment techniques can either circumvent or alter as needed the unfamiliar language and interaction routines that may penalize children from nondominant cultural backgrounds. Second, the collaborative nature of the interaction of child and clinician can facilitate more relaxed, confident, and, consequently, valid efforts from the child. Third, the embedding of instruction in dynamic assessment can reduce the effects of previous experience, a major source of bias for children who lack the experiences of the mainstream culture (Lidz, 1996). Bain and Olswang (1995) summarized the promise of dynamic assessment techniques as follows:
Dynamic assessment offers clinicians the opportunity to obtain information as to who to treat, when to treat, what to treat, how to treat, and to determine prognosis. Such information will enable clinicians to make informed decisions as they provide services to children with language impairment. (p. 90)
A growing body of data bolsters portions of these claims (e.g., see Long & Olswang, 1996; Olswang & Bain, 1996). However, the complexity and variety of procedures fitting within the umbrella of dynamic assessment mean that much work remains to be done to optimize the validity of these procedures for individual children and assessment purposes—or even to understand the extent to which traditional psychometric concepts can be applied to their evaluation (Embretson, 1987).
8. Qualitative Measures
Speech-language pathologists have always paid attention to a very wide range of information sources beyond those described thus far in the chapter, including teacher and parent comments, client observations, interviews, and official documents. More recently, sources such as student journals, portfolios, clinician journals, and critical incident reports, or “stand outs,” have been added (Schwartz & Olswang, 1996; Silliman & Wilkinson, 1991). Olswang and Bain (1994) used the terms descriptive and qualitative to refer to these sources of information and described them as subjective, in contrast to the more typical, operationally defined quantitative data. Olswang and Bain based their discussion of such measures on the work of authors (e.g., Bogdan & Biklen, 1992; Glesne & Peshkin, 1992) describing qualitative research, an umbrella term used to describe several research strategies in which subjective, inductive, and richly descriptive measures are systematically used to examine participants’ perspectives on phenomena of interest. Because of this close connection to a type of research that may be unfamiliar to many readers, a brief discussion of qualitative research is offered as background. Historically, qualitative research methods have been developed somewhat independently in anthropology, nursing, education, sociology, and social work, among other disciplines (Bogdan & Biklen, 1998; Lancy, 1993), but have shown increasing cross-fertilization. Recently, these methods, especially those described as “ethnographic,” have begun to be adopted in research and, to a lesser extent, in clinical practice in speech-language pathology (Kovarsky, 1994; Kovarsky et al., 1999; Silliman & Wilkinson, 1991; Westby, 1990).
A thorough description of qualitative research is beyond the scope of this text, having, in fact, served as the focus for a dazzling array of texts in just the past decade (e.g., Berg, 1998; Bogdan & Biklen, 1998; Creswell, 1998; Denzin & Lincoln, 2000; Kelley, 1999; Lancy, 1993; Taylor & Bogdan, 1998). Nonetheless, a brief overview of some of the theoretical threads uniting different approaches within qualitative research can help guide our thinking about how qualitative data may be used in the assessment of children’s language disorders. Qualitative research strategies have been described as demonstrating, to greater or lesser degrees, the following five features, many of which clearly contrast with quantitative strategies (Bogdan & Biklen, 1998). First, the focus of qualitative research is a natural context in which the researcher serves as the primary “instrument.” Second, data are descriptive, rather than quantitative, in nature. Third, interactive social processes, rather than products, are of interest. Fourth, methods are inductive; thus, abstractions are made from the data that are present, rather than tested from data that
are sought out. Fifth, meaning as experienced by individuals from their personal perspectives is of paramount interest. From a clinical vantage point, one of the chief attractions of qualitative methods is their potential to guide clinicians in the use of data that may have previously been seen as illicit. One of the major sources of evidence for the validity of qualitative data lies in the process of triangulation, which can be defined as the believability provided by repeated examples of a given behavior obtained in a variety of settings or using a variety of methods (Schwartz & Olswang, 1996). Janesick (1994) described five kinds of triangulation: triangulation across (a) data sources, (b) researchers or evaluators, (c) multiple perspectives, (d) multiple methods, and (e) disciplines. It is the preponderance of evidence gained under these conditions that validates findings (Berg, 1998). Some authors (e.g., Bogdan & Biklen, 1998) object that the term triangulation is used differently by different authors and thus argue that the exact methods used to provide rich support for validity need to be specified. However, it is a useful term for capturing the way in which validity, or believability, is alternatively characterized within this research paradigm. Also, interestingly, it is related to the concept pervading mainstream psychometric discussions that theoretical constructs need to be studied using several indicators (Pedhazur & Schmelkin, 1991; Primavera, Allison, & Alfonso, 1996). When thinking about how qualitative data may complement quantitative descriptions of the language and linguistic context for children with language impairment, the concept points toward a need for multiple sources and settings.
Although additional research may help us understand how such data can best be used in combination with more established, quantitative measures, existing work on qualitative research can point to the types of questions for which qualitative data may be best suited (Schwartz & Olswang, 1996). Specifically, questions that relate to the diverse ways in which a child is viewed in his or her linguistic community or to the special expectations falling on a specific child in a specific community may best be addressed using qualitative data. Thus, questions that address concerns about handicap and disability, which relate to functional and participative effects of impairments (WHO, 1980), may be very effectively answered using qualitative methods.
Practical Considerations
Numerous practical considerations affect the way in which speech-language pathologists currently describe language disorders in children. Influences related to the consideration of the larger contexts in which language impairments occur include movements toward increasing use of assessments in which several professionals contribute their insights into the functioning of a child (coordinated assessment strategies). In addition, there has been a continuing movement in the past few decades toward assessments for school-age children in which the functional demands of academic settings are recognized as the chief challenges facing them (curriculum-based assessment). These coordinated approaches toward assessment could have been profitably discussed in chapter 9, which dealt with identification. Nonetheless, they are discussed here because of the closer connection of description than identification to
treatment planning—the area of clinical decision making thought to benefit most from coordination. Beyond these assessment strategies, perhaps the most important practical consideration affecting the descriptive measures chosen by clinicians is that of time and other practical resources. In this section, the role of coordinated assessment strategies and other practical factors is discussed briefly to illustrate some of the forces shaping descriptive practices in work with children with language impairments.
Coordinated Assessment Strategies
Children with language impairment experience a range of needs that require the attention and care of individuals from a variety of disciplines, for example, speech-language pathology and audiology, psychology, social work, occupational therapy, physical therapy, and a variety of other health professions. As a child’s physical problems and other problems increase in number, coordination of the assessments and interventions conducted by these professionals becomes crucial (Linder, 1993; Rosetti, 1986). Without coordination, professionals may work at cross purposes with families, overwhelm them with excessive or contradictory recommendations, and, as a result, facilitate small gains in individual domains while undermining the overall quality of the child’s life (Calhoon, 1997; Raver, 1991). Rosetti described the difficulty facing a professional working alone with a child with many problems as suffering from tunnel vision, in which the child may be viewed from the exceedingly narrow perspective of that single individual’s academic and professional background. Particularly for very young children with multiple needs, the need for coordination has been recognized in legislation and in the development of sophisticated strategies of coordination. Three general strategies that attempt to meet the needs of children and families are multidisciplinary, interdisciplinary, and transdisciplinary approaches (Calhoon, 1997; Raver, 1991). Multidisciplinary assessment involves parallel planning, administration, and interpretation processes in which parents interact independently with individual disciplines. Interdisciplinary assessment involves coordination through team planning of assessment; consultation with team members occurs as assessments are conducted within individual disciplines, and parent involvement is encouraged. Transdisciplinary assessment involves shared assessments conducted by the entire team.
This approach involves participation of all team members as well as the child’s parents throughout the planning, administration, and interpretation process. Two examples of specific transdisciplinary approaches are play-based and arena assessment, in which a criterion-referenced measurement strategy is implemented within a naturalistic context. Whereas multidisciplinary and interdisciplinary approaches tend to predominate in systems designed for older children, transdisciplinary approaches have become particularly popular for the assessment of infants and toddlers (Calhoon, 1997). Attempts at the coordination of disciplines, particularly those that increase the involvement of parents, are presumed to increase the validity of measurement and the effectiveness with which clinical decisions can be implemented (Crais, 1993). Further, greater coordination, particularly with parents, is required through IDEA (1990).
As a consequence, it is likely that increasing attention will be paid to the validation of coordinated approaches to assessment and to the development of methods to increase their efficiency.
Curriculum-Based Assessment
For school-age children with language impairments, coordination of disciplines is often more limited than for younger children, although it is at least as vital. For school-age children, coordination will entail collaboration among classroom teachers, special educators, and speech-language pathologists. For this age group, collaborative assessment approach is the term most frequently used to refer to the way in which professionals (speech-language pathologists in this case) attempt to coordinate their activities with those of the other professionals serving the child in a school setting. Curriculum-based assessment is one particularly widespread component of collaborative assessment approaches (Prelock, 1997). Collaboration enables the speech-language pathologist and other members of the educational team to understand the specific language and communication demands facing the child with a given teacher, classroom, and curriculum. The purposes of this collaboration are to determine what demands present particular challenges to the child and to identify team resources for addressing them (Creaghead, 1992; Prelock, 1997; Silliman & Wilkinson, 1991). Curriculum-based assessment has been defined broadly as “evaluation of a student’s ability to meet curriculum objectives so that school success can be achieved” (Prelock, 1997, p. 35). Adding more detail to this concept, Nelson (1989, 1994) called attention to the presence of numerous kinds of curricula. Thus, for example, in addition to the official curriculum of the school district, there are the cultural curriculum, consisting of unspoken expectations based on the mainstream culture, and the underground curriculum, consisting of the rules affecting peer social interactions. In order to understand and respond to school curricula in both the broad and more detailed senses, the speech-language pathologist will almost always need to use criterion-referenced measures.
Such measures are sometimes aimed at characterizing the educational setting and its demands and sometimes aimed at determining whether the child is or is not able to meet those demands. Identifying the taxing aspects of language and communication within the classroom will benefit not just the child with language impairments but all students within that classroom (Prelock, 1997). Obviously, the benefit of collaborative curriculum-based assessments to children with language impairments is the possibility of describing and then responding to their difficulties. These responses by the speech-language pathologist and other team members can result in accommodations or other active steps to foster greater success in the regular classroom. In essence, curriculum-based assessments can help prevent impairment from necessarily being realized as a disability or handicap, in the terminology of the ICIDH (WHO, 1980). Alternatively, it can also be seen as preventing impairment from being realized as a limitation in activities or participation opportunities, in the terminology of the ICIDH-2 (WHO, 1998).
Other Practical Factors
Practical factors beyond those discussed in this chapter, such as time and money, appear to affect the ways in which clinicians conduct language assessments (Beck, 1996; Wilson, Blackmon, Hall, & Elcholtz, 1991), including assessments designed to plan for treatment (Beck, 1996). Time demands seem to stem from the pressures of large caseloads. In particular, whereas ASHA (1993) recommended caseload sizes of 40, Shewan and Slater (1993) found that school clinicians have average caseloads of 52! Beck’s survey found that clinicians frequently reported that they did not have sufficient time to conduct complete assessments. In addition, clinicians also reported insufficient funds to buy “adequate materials for assessment.” Other data from the same source led Beck to ponder whether frequency of use might result from properties of a test as simple as its being appropriate for a wide age range and its addressing both receptive and expressive concerns. This possibility led her to comment that “these are certainly not the ideal criteria on which to base selection of assessment methods” (Beck, 1996, p. 58). Further, Beck (1996) and Wilson et al. (1991) did not obtain detailed information about the entire range of descriptive measures used by clinicians. However, they did find that language sampling is very widely used. Given the expressed concerns about time and money, however, it seems likely that the time-consuming descriptive measures and many of the exciting but emerging descriptive measures described in this chapter may not make it into the repertoire of techniques used by clinicians. At least this conclusion seems reasonable in the absence of considerable effort on the part of individual clinicians and the profession. These efforts may take the form of working to reduce caseload sizes and increase budgets. Alternatively, they may take the form of research studies aimed at increasing the efficiency and variety of descriptive measures.
Fortunately, there is widespread realization that descriptive measures are the most appropriate tools to use in addressing many critical clinical questions—the first step needed to engage the attention of individual clinicians and of the profession as a whole.
Summary
1. Descriptive measurement of language presents both greater challenges and greater rewards to the practicing clinician than does assessment aimed at screening or identification because of its steadfast tie to the heart of clinical practice: interventions designed to improve the social, communicative lives of children. 2. Even more than measures used in identification, descriptive measures of language require scrupulous attention by the clinician to achieve a match between the specific clinical question being posed and the method used to achieve it. This is true largely because the specificity of the question being asked necessitates the use of informal measures that can only be “validated” through the actions of the individual clinician. 3. Damico et al. (1992) described authenticity, functionality, and richness of description as critical characteristics for descriptive measures. 4.
A wealth of strategies has been proposed for use in description, including standardized norm-referenced measures, standardized criterion-referenced measures, probes, rating scales, language analysis, online observations, dynamic assessment, and qualitative measures. 5. Because children with special language needs often require the attention of other professionals as well, assessment frameworks have arisen that reflect differing degrees of coordination across disciplines, as well as differing degrees of parent involvement. These range from multidisciplinary to interdisciplinary to transdisciplinary assessments. 6. The nature of coordinated assessment efforts changes according to the age of the child, with younger children more frequently served using methods that involve a greater degree of integration across professions and older children served using methods that acknowledge the primacy of the school environment for the school-age child. Terms associated with coordinated assessments include arena and play-based assessment methods for younger children as well as curriculum-based assessment for older children. 7. Recent innovations, such as dynamic assessment and the thoughtful use of qualitative measures, challenge researchers and clinicians with opportunities for a richer description of the effects of language disorders on children, including those from nonmainstream cultures. 8. Future developments with regard to descriptive measures are likely to include the development and validation of new methods as well as the development of better practices leading to more efficient and effective application of existing approaches.
Key Concepts and Terms
authentic assessment: assessment occurring when skills to be assessed are selected to represent realistic learning demands and are assessed in real-life settings, such as classrooms, in which artificial and standard conditions are avoided (Schraeder et al., 1999).
authenticity: the most complex of three primary characteristics described by Damico et al. (1992) as necessary for descriptive measures; it includes respect for and preservation of the intricate and meaning-directed nature of communication as well as traditional concepts of reliability and validity.
collaborative assessment approach: any of several approaches in which professionals from different disciplines (e.g., speech-language pathologists, audiologists, special educators) work together to provide information leading to effective and efficient intervention for a given child.
curriculum-based assessment: assessment aimed at examining a child’s skills and challenges in relation to curricular demands for purposes of planning interventions that may occur within and outside of the classroom.
direct magnitude estimation: a type of rating method in which stimuli to be rated are compared with one another or against a standard stimulus.
dynamic assessment: a variety of approaches to description in which stimuli and procedures are modified to identify the child’s potential performance with adult collaboration to help determine treatment goals and facilitative methods; considered especially useful as a means of nonbiased assessment for children who are bilingual or from nondominant cultural backgrounds.
event recording: an observational method in which the frequency of specific behaviors (events) is recorded across the entire observational time period.
functionality: one of three primary characteristics described by Damico et al. (1992) as necessary for descriptive measures, consisting of their ability to capture a child’s skill in transmitting meaning effectively, fluently, and appropriately.
interval recording: a method of obtaining online observational data in which the observer notes the presence of a behavior or targeted characteristic within a relatively short time frame (e.g., 10 seconds).
interval scaling: a rating technique in which raters are asked to assign a number or verbal label to a set of related stimuli.
metathetic continuum: the type of rating shown when raters’ responses to differences between rated entities seem to reflect qualitative distinctions. Auditory stimuli differing in pitch appear to be treated in this fashion by raters.
multidisciplinary assessment: assessment in which professionals involved with a child work in parallel to plan, conduct, and interpret their individual assessments, with interactions between professionals occurring in a less structured fashion than in interdisciplinary or transdisciplinary assessments.
probe: an informal measure in which the clinician attempts to devise conditions that will elicit a response demonstrating a child’s knowledge of a particular area of form, content, or use.
prothetic continuum: the type of rating shown when raters’ responses to differences between rated entities appear to reflect quantitative distinctions.
Auditory stimuli differing in loudness appear to be judged in this fashion by raters.
qualitative research: a range of research strategies designed to be naturalistic, descriptive, inductive in nature, and concerned with process and meaning (Bogdan & Biklen, 1998).
richness of description: one of three primary characteristics described by Damico et al. (1992) as necessary for descriptive measures; it entails the use of sufficient detail to lead to an understanding of causality that may be used in planning treatment.
time sampling: a method of observation in which the observation time period is divided into intervals and the presence of a targeted behavior is recorded at the end of each interval.
transdisciplinary assessment: assessments in which team members from different disciplines share maximally in the assessment process; specific examples of this type of assessment include arena and play-based assessments, which are used most frequently with infants and toddlers.
trial scoring: the recording of a response as correct or incorrect following a specific stimulus or trial (McReynolds & Kearns, 1983).
triangulation: an approach to validation in which convergent findings are sought across varying methods, data sources, and evaluators; recently emphasized in relation to qualitative research methods.
zone of proximal development (ZPD): the range of behaviors lying between independent functioning and functioning that must be facilitated by a more expert interaction partner; thought to illustrate a child’s emerging mastery or learning readiness.
Study Questions and Questions to Expand Your Thinking
1. On the basis of your reading of this chapter, formulate three ideas for research projects aimed at clarifying some psychometric characteristic (e.g., validity for a purpose, reliability) of a specific descriptive measure, thus making it more clinically useful. 2. Look at a recent issue of a journal containing articles on children with language impairments. See if you can find examples of probes that could be added to Table 10.3. 3. Engage in a conversation with two different people for a period of 10 minutes each, ideally tape-recording it with their knowledge so that you can go back over the conversations. Then create a list of the factors affecting your word choice, the length of your sentences, the structure of your sentences, the nature of your turn-taking, and so forth. Can you group the items on your list into related factors? Once you have done this, consider the extent to which children’s communications are likely to be similarly affected in the course of collecting a language sample. 4. Consider ways to triangulate information about a child’s lack of success in a reading class in a regular first-grade class. Develop a small set of related questions about the child and the context and then consider what kinds of measures might provide you with a rich understanding of the child’s difficulties. 5.
Find out what coordinated assessment methods exist in any clinical settings that serve children to which you have access. Consider what benefits might be gained, and at what costs, if greater integration were to occur across professional roles within that setting.
Recommended Readings
Damico, J. S., Secord, W. A., & Wiig, E. H. (1992). Descriptive language assessment at school: Characteristics and design. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language assessment (pp. 1–8). San Antonio, TX: Psychological Corporation.
Kovarsky, D. (1994). Distinguishing quantitative and qualitative research methods in communication sciences and disorders. National Student Speech Language Hearing Association Journal, 21, 59–64.
Olswang, L. B., & Bain, B. A. (1991). When to recommend intervention. Language, Speech, and Hearing Services in Schools, 22, 255–263.
References
Agerton, E. P., & Moran, M. J. (1995). Effects of race and dialect of examiner on language samples elicited from southern African American preschoolers. Journal of Childhood Communication Disorders, 16, 25–30.
American Psychological Association, American Educational Research Association, National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: APA.
American Speech-Language-Hearing Association. (1993). Guidelines for caseload size and speech-language delivery in the schools. Asha, 35(Suppl. 10), 33–39.
Aram, D. M., Morris, R., & Hall, N. E. (1993). Clinical and research congruence in identifying children with specific language impairment. Journal of Speech and Hearing Research, 36, 580–591.
Bain, B. A., & Dollaghan, C. (1991). The notion of clinically significant change. Language, Speech, and Hearing Services in Schools, 22, 264–270.
Bain, B. A., & Olswang, L. B. (1995). Examining readiness for learning two-word utterances by children with specific expressive language impairment: Dynamic assessment validation. American Journal of Speech-Language Pathology, 4, 81–91.
Baker-van den Goorbergh, L. (1990). CLEAR: Computerized language-error analysis report. Clinical Linguistics and Phonetics, 4, 285–293.
Barrow, J. D. (1992). Pi in the sky: Counting, thinking, and being. New York: Oxford University Press.
Bates, E., & MacWhinney, B. (1989). Functionalism and the competition model. In B. MacWhinney & E. Bates (Eds.), The crosslinguistic study of sentence processing (pp. 3–73). New York: Cambridge University Press.
Beck, A. R. (1996). Language assessment methods for three age groups of children. Journal of Children’s Communication Development, 17, 51–66.
Berg, B. L. (1998). Qualitative research methods for the social sciences (3rd ed.). Boston: Allyn & Bacon.
Berk, R. A. (1984). Screening and identification of learning disabilities. Springfield, IL: Thomas.
Bishop, D. V. M. (1985).
Automated LARSP (Language Assessment, Remediation, and Screening Procedure) [Computer program]. Manchester, England: University of Manchester.
Blake, J., Quartaro, G., & Onorati, S. (1993). Evaluating quantitative measures of grammatical complexity in spontaneous speech samples. Journal of Child Language, 20, 139–152.
Bogdan, R. C., & Biklen, S. K. (1992). Qualitative research for education: An introduction to theory and methods (2nd ed.). Boston: Allyn & Bacon.
Bogdan, R. C., & Biklen, S. K. (1998). Qualitative research for education: An introduction to theory and methods (3rd ed.). Boston: Allyn & Bacon.
Brinton, B., & Fujiki, M. (1992). Setting the context for conversational language sampling. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language assessment (pp. 9–19). San Antonio, TX: Psychological Corporation.
Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
Burroughs, E. I., & Tomblin, J. B. (1990). Speech and language correlates of adults’ judgments of children. Journal of Speech and Hearing Disorders, 55, 485–494.
Butler, K. G. (1997). Dynamic assessment at the millennium: A transient tutorial for today! Journal of Children’s Communication Development, 19, 43–54.
Calhoon, J. M. (1997). Comparison of assessment results between a formal standardized measure and a play-based format. Infant–Toddler Intervention, 7, 201–214.
Campbell, T., & Dollaghan, C. (1992). A method for obtaining listener judgments of spontaneously produced language: Social validation through direct magnitude estimation. Topics in Language Disorders, 12, 42–55.
Campione, J., & Brown, A. L. (1987). Linking dynamic assessment with school achievement. In C. S. Lidz (Ed.), Dynamic assessment: An interactional approach to evaluating learning potential (pp. 82–116). New York: Guilford.
Chapman, R. (1981). Computing mean length of utterance in morphemes. In J.
F. Miller (Ed.), Assessing language production in children (pp. 22–25). Baltimore: University Park Press.
Cirrin, F. M., & Penner, S. G. (1992). Implementing change to descriptive language assessment approaches in the schools. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language assessment (pp. 23–131). San Antonio, TX: Psychological Corporation.
Cochran, P. S., & Masterson, J. J. (1995). NOT using a computer in language assessment/intervention: In defense of the reluctant clinician. Language, Speech, and Hearing Services in Schools, 26, 213–222.
Conant, S. (1987). The relationship between age and MLU in young children: A second look at Klee and Fitzgerald’s data. Journal of Child Language, 14, 169–173.
Crais, E. R. (1993). Families and professionals as collaborators in assessment. Topics in Language Disorders, 14(1), 29–40.
Creaghead, N. A. (1992). Classroom interactional analysis/script analysis. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language assessment (pp. 65–72). San Antonio, TX: Psychological Corporation.
Creswell, J. W. (1998). Qualitative inquiry and research design: Choosing among five traditions. Thousand Oaks, CA: Sage Publications.
Crystal, D. (1982). Profiling linguistic disability. London: Edward Arnold.
Crystal, D. (1987). Toward a “bucket” theory of language disability: Taking account of interaction between linguistic levels. Clinical Linguistics and Phonetics, 1, 7–22.
Crystal, D., Fletcher, P., & Garman, M. L. (1989). Grammatical analysis of language disability (2nd ed.). London: Whurr.
Damico, J. S. (1992). Systematic observation of communicative interaction: A valid and practical descriptive assessment technique. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language assessment (pp. 133–143). San Antonio, TX: Psychological Corporation.
Damico, J. S., Secord, W. A., & Wiig, E. H. (1992). Descriptive language assessment at school: Characteristics and design. In W.
Secord (Ed.), Best practices in school speechlanguage pathology: Descriptive /nonstandardized language assessment (pp. 1–8). San Antonio, TX: Psychological Corporation. Denzin, N. K., & Lincoln, Y. S. (2000). The handbook of qualitative research (2nd ed.). Thousand Oaks, CA: Sage. Diedrich, W. M., & Bangert, J. (1980). Articulation learning. Houston, TX: CollegeHill Press. Dollaghan, C., & Campbell, T. (1992). A procedure for classifying disruptions in spontaneous language samples. Topics in Language Disorders, 12, 56–68. Dollaghan, C., Campbell, T., & Tomlin, R. (1990). Video narration as a language sampling context. Journal of Speech and Hearing Disorders, 55, 582–590. Elbert, M., Shelton, R. L., & Arndt, W. B. (1967). A task for evaluation of articulation change: I. Development of methodology. Journal of Speech and Hearing Research, 10, 281–289. Embretson, S. E. (1987). Toward development of a pyschometric approach. In C. S. Lidz (Ed.), Dynamic assessment: An interactional approach to evaluating learning potential (pp. 141–170). New York: Guilford. Evans, J. L. (1996a). Plotting the complexities of language sample analysis. In K. N. Cole, P. S. Dale, & D. J. Thal (Eds.), Assessment of communication and language (pp. 207–256). Baltimore: Brookes Publishing. Evans, J. L. (1996b). SLI subgroups: Interaction between discourse constraints and morphological deficits. Journal of Speech and Hearing Research, 39, 655–660. Evans, J. L., & Miller, J. F. (1999). Language sample analysis in the 21st century. Seminars in Speech and Language, 20, 101–115. Feuerstein, R., Miller, R., Rand, Y., & Jensen, M. (1981). Can evolving techniques better measure cognitive change? Journal of Special Education, 15, 201–219. Feuerstein, R., Rand, Y., & Hoffman, M. (1979). The dynamic assessment of retarded performers. Baltimore: University Park Press. Finnerty, J. (1991). Communication analyzer [Computer software]. Lexington, MA: Educational Software Research. Frattali, C. (Ed.). (1998). 
Measuring outcomes in speechlanguage pathology. New York: Thieme. Gavin, W. J., & Giles, L. (1996). Sample size effects on temporal reliability of language sample measures of preschool children. Journal of Speech, Language, and Hearing Research, 39, 1258–1262.
Page 289
Gavin, W. J., Klee, T., & Membrino, I. (1993). Differentiating specific language impairment from normal language development using grammatical analysis. Clinical Linguistics and Phonetics, 7, 191–206.
Gerken, L., & Shady, M. (1996). The picture selection task. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children's syntax (pp. 287–302). Cambridge, MA: MIT Press.
Goldstein, H., & Gierut, J. (1998). Outcomes measurement in child language and phonological disorders. In C. Frattali (Ed.), Measuring outcomes in speech-language pathology (pp. 406–437). New York: Thieme.
Goodluck, H. (1996). The acting-out task. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children's syntax (pp. 147–162). Cambridge, MA: MIT Press.
Gutierrez-Clellen, V. F., Brown, S., Conboy, B., & Robinson-Zañartu, C. (1998). Modifiability: A dynamic approach to assessing immediate language change. Journal of Children's Communication Development, 19, 31–42.
Hirsh-Pasek, K., & Golinkoff, R. M. (1996). The intermodal preferential looking paradigm: A window onto emerging language comprehension. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children's syntax (pp. 105–124). Cambridge, MA: MIT Press.
Hixson, P. K. (1985). Developmental Sentence Scoring computer program [Computer program]. Omaha, NE: Computerized Language Analysis.
Howard, S., Hartley, J., & Muller, D. (1995). The changing face of child language assessment: 1985–1995. Child Language Teaching and Therapy, 11, 7–22.
Janesick, V. J. (1994). The dance of qualitative research design. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 209–219). Thousand Oaks, CA: Sage.
Kelley, D. L. (1999). Measurement made accessible: A research approach using qualitative, quantitative and TQM methods. Thousand Oaks, CA: Sage.
Kemp, K., & Klee, T. (1997). Clinical language sampling practices: Results of a survey of speech-language pathologists in the United States. Child Language Teaching and Therapy, 13, 161–176.
Kovarsky, D. (1994). Distinguishing quantitative and qualitative research methods in communication sciences and disorders. National Student Speech Language Hearing Association Journal, 21, 59–64.
Kovarsky, D., Duchan, J., & Maxwell, M. (Eds.). (1999). Constructing (in)competence. Mahwah, NJ: Lawrence Erlbaum Associates.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
Lancy, D. (1993). Qualitative research in education: An introduction to the major traditions. White Plains, NY: Longman.
Landa, R. M., & Olswang, L. (1988). Effectiveness of language elicitation tasks with two-year-olds. Child Language Teaching and Therapy, 4, 170–192.
Lee, L. (1974). Developmental sentence analysis. Evanston, IL: Northwestern University Press.
Leonard, L. (1996). Assessing morphosyntax in clinical settings. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children's syntax (pp. 287–302). Cambridge, MA: MIT Press.
Leonard, L., Prutting, C. A., Perozzi, J. A., & Berkley, R. K. (1978). Nonstandardized approaches to the assessment of language behaviors. Asha, 20, 371–379.
Lidz, C. S. (1987). Dynamic assessment: An interactional approach to evaluating learning potential. New York: Guilford.
Lidz, C. S. (1996, November). Dynamic assessment: Theory, application and research. Handout for seminar presented at the American Speech-Language-Hearing Association meeting, Seattle, WA.
Lidz, C. S., & Peña, E. D. (1996). Dynamic assessment: The model, its relevance as a nonbiased approach, and its application to Latin American preschool children. Language, Speech, and Hearing Services in Schools, 27, 367–372.
Linder, T. W. (1993). Traditional assessment and transdisciplinary play-based assessment. In T. W. Linder (Ed.), Transdisciplinary play-based assessment (pp. 9–22). Baltimore: Brookes Publishing.
Long, S. H. (1991). Integrating microcomputer applications into speech and language assessment. Topics in Language Disorders, 11, 1–17.
Long, S. H. (1999). Technology applications in the assessment of children's language. Seminars in Speech and Language, 20, 117–132.
Long, S. H., & Fey, M. E. (1989). Computerized Profiling Version 6.2 (Macintosh and MS-DOS series) [Computer program]. Ithaca, NY: Ithaca College.
Page 290
Long, S. H., Fey, M., & Channell, R. W. (1998). Computerized Profiling (CP) [Computer program]. Cleveland, OH: Department of Communication Sciences, Case Western Reserve University.
Long, S. H., & Masterson, J. J. (1993, September). Computer technology: Use in language analysis. Asha, 35, 40–41, 51.
Long, S. H., & Olswang, L. B. (1996). Readiness and patterns of growth in children with SELI. American Journal of Speech-Language Pathology, 5, 79–85.
Lucas, D. R., Weiss, A. L., & Hall, P. K. (1993). Assessing referential communication skills: The use of a nonstandardized assessment procedure. Journal of Childhood Communication Disorders, 15, 25–34.
Lund, N. J., & Duchan, J. (1983). Assessing children's language in naturalistic contexts. Englewood Cliffs, NJ: Prentice-Hall.
Lund, N. J., & Duchan, J. (1993). Assessing children's language in naturalistic contexts (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Lust, B., Flynn, S., & Foley, C. (1996). Why children know what they say: Elicited imitation as a research method for assessing children's syntax. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children's syntax (pp. 55–76). Cambridge, MA: MIT Press.
MacWhinney, B. (1991). The CHILDES project: Tools for analyzing talk. Hillsdale, NJ: Lawrence Erlbaum Associates.
McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language, Speech, and Hearing Services in Schools, 27, 122–131.
McCauley, R., & Swisher, L. (1984). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical case. Journal of Speech and Hearing Disorders, 49, 338–348.
McDaniel, D., McKee, C., & Cairns, H. S. (Eds.). (1996). Methods for assessing children's syntax. Cambridge, MA: MIT Press.
McReynolds, L. V., & Kearns, K. (1983). Single-subject experimental designs in communicative disorders. Austin, TX: Pro-Ed.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education and Macmillan.
Miller, J. F. (1981). Assessing language production in children: Experimental procedures. Baltimore, MD: University Park Press.
Miller, J. F. (1996). Progress in assessing, describing, and defining child language disorder. In K. N. Cole, P. S. Dale, & D. J. Thal (Eds.), Assessment of communication and language (pp. 309–324). Baltimore: Brookes Publishing.
Miller, J. F., & Chapman, R. (1982). SALT: Systematic Analysis of Language Transcripts [Computer software]. Madison, WI: University of Wisconsin-Madison, Waisman Research Center, Language Analysis Laboratory.
Miller, J. F., & Chapman, R. (1998). SALT: Systematic Analysis of Language Transcripts [Computer software]. Madison, WI: University of Wisconsin-Madison, Waisman Research Center, Language Analysis Laboratory.
Miller, J., Freiberg, C., Rolland, M. B., & Reeves, M. (1992). Implementing computerized language sample analysis in the public schools. Topics in Language Disorders, 12(2), 69–82.
Miller, J. F., & Klee, T. (1995). Computational approaches to the analysis of language impairment. In P. Fletcher & B. MacWhinney (Eds.), The handbook of child language (pp. 545–572). Oxford: Blackwell.
Miller, J. F., & Paul, R. (1995). The clinical assessment of language comprehension. Baltimore: Brookes Publishing.
Minifie, F., Darley, F., & Sherman, D. (1963). Temporal reliability of seven language measures. Journal of Speech and Hearing Research, 6, 139–149.
Moellman-Landa, R., & Olswang, L. B. (1984). Effects of adult communication behaviors on language-impaired children's verbal output. Applied Psycholinguistics, 5, 117–134.
Mordecai, D. R., Palin, M. W., & Palmer, C. B. (1985). Lingquest 1 [Computer software]. Columbus, OH: Macmillan.
Morris, R. (1994). A review of critical concepts and issues in the measurement of learning disabilities. In R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 615–626). Baltimore: Brookes Publishing.
Page 291
Muma, J. (1998). Effective speech-language pathology: A cognitive socialization approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Muma, J., Morales, A., Day, K., Tackett, A., Smith, S., Daniel, B., Logue, B., & Morriss, D. (1998). Language sampling: Grammatical assessment. In J. Muma (Ed.), Effective speech-language pathology: A cognitive socialization approach (pp. 310–345). Mahwah, NJ: Lawrence Erlbaum Associates.
Nelson, N. W. (1989). Curriculum-based language assessment and intervention. Language, Speech, and Hearing Services in Schools, 20, 170–184.
Nelson, N. W. (1994). Curriculum-based language assessment and intervention across the grades. In G. P. Wallach & K. G. Butler (Eds.), Language learning disabilities in school-age children and adolescents: Some principles and applications (pp. 104–131). New York: Merrill.
Norris, J., & Hoffman, P. (1993). Whole language intervention for school-age children. San Diego, CA: Singular Publishing.
Olswang, L. B., & Bain, B. A. (1991). When to recommend intervention. Language, Speech, and Hearing Services in Schools, 22, 255–263.
Olswang, L. B., & Bain, B. A. (1994). Data collection: Monitoring children's treatment progress. American Journal of Speech-Language Pathology, 3, 55–66.
Olswang, L. B., & Bain, B. A. (1996). Assessment information for predicting upcoming change in language production. Journal of Speech and Hearing Research, 39, 414–423.
Olswang, L. B., Bain, B. A., & Johnson, G. A. (1992). Using dynamic assessment with children with language disorders. In S. F. Warren & J. Reichle (Eds.), Causes and effects in communication and language intervention (pp. 187–215). Baltimore: Brookes Publishing.
Panagos, J., & Prelock, P. (1982). Phonological constraints on the sentence productions of language disordered children. Journal of Speech and Hearing Research, 25, 171–176.
Paul, R., & Shriberg, L. (1982). Associations between phonology and syntax in speech disordered children. Journal of Speech and Hearing Research, 25, 536–546.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates.
Peña, E., Quinn, R., & Iglesias, A. (1992). The application of dynamic assessment methods to language assessment: A nonbiased procedure. Journal of Special Education, 26, 269–280.
Prelock, P. A. (1997). Language-based curriculum analysis: A collaborative assessment and intervention process. Journal of Children's Communication Development, 19, 35–42.
Primavera, L. H., Allison, D. B., & Alfonso, V. C. (1996). Measurement of dependent variables. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 41–92). Mahwah, NJ: Lawrence Erlbaum Associates.
Pye, C. (1987). The Pye analysis of language [Computer software]. Lawrence, KS: Author.
Raver, S. A. (1991). Transdisciplinary approach to infant and toddler intervention. In S. A. Raver (Ed.), Strategies for teaching at-risk and handicapped infants and toddlers: A transdisciplinary approach (pp. 26–44). New York: Merrill.
Roeper, T., de Villiers, J., & de Villiers, P. (1999, November). What every 5 year old should know: Syntax, semantics and pragmatics. Presentation to the American Speech-Language-Hearing Association convention, San Francisco.
Rondal, J. A., Ghiotto, M., Bredart, S., & Bachelet, J. F. (1988). Age-relation, reliability, and grammatical validity of measures of utterance length. Journal of Child Language, 14, 433–446.
Rossetti, L. (1986). High-risk infants: Identification, assessment, and intervention. Boston: College-Hill.
Roth, F., & Spekman, N. (1984). Assessing the pragmatic abilities of children: Part I. Organizational framework and assessment parameters. Journal of Speech and Hearing Disorders, 49, 2–11.
Salvia, J., & Ysseldyke, J. E. (1981). Assessment in remedial and special education (2nd ed.). Boston: Houghton Mifflin.
Scarborough, H. S. (1990). Index of productive syntax. Applied Psycholinguistics, 11, 1–22.
Scarborough, H. S., Wyckoff, J., & Davidson, R. (1986). A reconsideration of the relation between age and mean utterance length. Journal of Speech and Hearing Research, 29, 394–399.
Page 292
Schiavetti, N. (1992). Scaling procedures for the measurement of speech intelligibility. In R. Kent (Ed.), Intelligibility in speech disorders: Theory, measurement and management (pp. 11–34). Philadelphia: John Benjamins.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217.
Schraeder, T., Quinn, M., Stockman, I. J., & Miller, J. F. (1999). Authentic assessment as an approach to preschool speech-language screening. American Journal of Speech-Language Pathology, 8, 195–200.
Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.
Secord, W. A. (1981). CPAC: Clinical Probes of Articulation Consistency. San Antonio, TX: Psychological Corporation.
Secord, W. A., & Shine, R. E. (1997). Secord Contextual Articulation Tests. Sedona, AZ: Red Rock Educational Publications.
Semel, E., Wiig, E. H., & Secord, W. A. (1996). Clinical Evaluation of Language Fundamentals 3. San Antonio, TX: Psychological Corporation.
Shewan, C., & Slater, S. (1993). Caseloads of speech-language pathologists. Asha, 35, 64.
Silliman, E. R., & Wilkinson, L. C. (1991). Communicating for learning: Classroom observation and collaboration. Gaithersburg, MD: Aspen Publishers.
Simon, C. (1984). Evaluating communicative competence: A functional pragmatic procedure. Tucson, AZ: Communication Skill Builders.
Smith, A. R., McCauley, R. J., & Guitar, B. (in press). Development of the Teacher Assessment of Student Communicative Competence (TASCC) for children in grades 1 through 5. Communication Disorders Quarterly.
Stevens, S. S. (1975). Psychophysics. New York: Wiley.
Stromswold, K. (1996). Analyzing spontaneous speech. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children's syntax (pp. 23–53). Cambridge, MA: MIT Press.
Taylor, S. J., & Bogdan, R. (1998). Introduction to qualitative research methods: A guidebook and resources (3rd ed.). New York: Wiley.
Templin, M. C. (1957). Certain language skills in children. Minneapolis: University of Minnesota Press.
Terrell, F., Terrell, S. L., & Golin, S. (1977). Language productivity of black and white children in black versus white situations. Language and Speech, 20, 377–383.
Thornton, R. (1996). Elicited production. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children's syntax (pp. 77–102). Cambridge, MA: MIT Press.
Turner, R. G. (1988). Techniques to determine test protocol performance. Ear and Hearing, 9, 177–189.
Tyack, D., & Gottsleben, R. (1974). Language sampling, analysis, and training. Palo Alto, CA: Consulting Psychologists Press.
Vetter, D. K. (1988). Designing informal assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making in speech-language pathology (pp. 192–193). Baltimore: Brookes Publishing.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Cambridge, MA: Harvard University Press.
Weiner, F. F. (1988). Parrot Easy Language Sample Analysis (PELSA) [Computer software]. State College, PA: Parrot Software.
Westby, C. E. (1990). Ethnographic interviewing: Asking the right questions to the right people in the right ways. Journal of Childhood Communication Disorders, 13, 101–111.
Wilson, K., Blackmon, R., Hall, R., & Elcholtz, G. (1991). Methods of language assessment: A survey of California public school clinicians. Language, Speech, and Hearing Services in Schools, 22, 236–241.
World Health Organization. (1980). ICIDH: The International Classification of Impairments, Disabilities, and Handicaps. Geneva, Switzerland: World Health Organization.
World Health Organization. (1998). Toward a common language for functioning and disablement: ICIDH-2: The International Classification of Impairments, Activities and Participation. Geneva, Switzerland: World Health Organization.
Page 293
CHAPTER 11
Examining Change: Is This Child's Language Changing?

The Nature of Examining Change
Special Considerations for Asking This Clinical Question
Available Tools
Practical Considerations

David is 8 years old and was diagnosed at the age of 7 with a fatal form of a genetic neurodegenerative disease, adrenoleukodystrophy. He had developed normally until about age 6½, when he began showing signs of clumsiness and behavior problems that had initially been attributed to the stresses of a cross-country move and beginning first grade. Currently, he follows simple verbal directions with some consistency but rarely speaks. His family is interested both in his current level of comprehension and in information about the rate at which his communication skills are declining, so that they can facilitate his participation in the family and plan for his ongoing care.

Tamika, a 5-year-old girl with specific expressive language impairment, has been seen for treatment since age 3. Initially her treatment was aimed at increasing the frequency and intelligibility of single-word productions; more recent goals have focused on her use of grammatical morphemes and monitoring comprehension of directions. In her efforts to adjust Tamika's treatment and monitor her overall progress, Tamika's speech-language pathologist uses periodic standardized testing along with frequent
Page 294
informal probes, including probes of treated, generalization, and control items. The speech-language pathologist is concerned about her ability to assess the true impact of treatment on Tamika's social communication with peers and family members because Tamika's family speaks Black English, whereas the clinician does not. She would like to find an appropriate assessment strategy to help document Tamika's ongoing communication skills.

The five certified speech-language pathologists working within a small Vermont school district are eager to demonstrate the efficacy of their work with school-age children because of concerns about cutbacks in neighboring special education budgets. They decide to participate in ASHA's National Outcomes Measurement System and begin collecting data for each of their students. In addition, because of their commitment to improving the quality of their practice, they also decide to use a computerized language sampling system with all of their preschool and first-grade children with language problems.

The Nature of Examining Change

The examination of change in children's language disorders actually encompasses a fairly large number of related questions: Is this child's overall language changing? What aspects in particular are changing? Is observed change likely to be due to treatment rather than to maturation or other factors? Should a specific treatment be continued, or has maximum progress been made? Should treatment be terminated? How effective is this particular clinical practice group in achieving change with the children it serves? These assessment questions present some of the most challenging issues facing speech-language pathology professionals (e.g., Diedrich & Bangert, 1980; Elbert, Shelton, & Arndt, 1967; Mowrer, 1972; Olswang, 1990; Olswang & Bain, 1994).
Described with regard to a single child, methods used to examine change will fuel decisions regarding how the child moves through a given treatment plan, whether alternative treatment strategies should be explored, and, finally, whether treatment should be terminated. Providing a more formal categorization, Campbell and Bain (1991) drew on the framework of Rosen and Proctor (1978, 1981) to describe three dimensions, or kinds, of change: ultimate, intermediate, and instrumental.

Ultimate outcomes constitute grounds for ending treatment, and they should be established at the initiation of treatment. They are similar to long-term treatment objectives, with levels of final expected performance defined in terms of "age appropriate, functional, or maximal communicative effectiveness" (Campbell & Bain, 1991, p. 272). Modification of an ultimate outcome might occur. For example, a functional outcome level might initially be set for a child because performance at a level with same-age peers was expected to be unrealistic. However, if treatment data suggested otherwise, a revision in outcome level would be appropriate (Campbell & Bain, 1991).

Intermediate outcomes were seen by Campbell and Bain (1991) as more specific and numerous for a given client. They relate to individual behaviors that must be acquired in order for the ultimate outcome to be achieved and for progression through
Page 295
a given hierarchically arranged treatment to occur. Data from treatment tasks within a session are given as an example. Instrumental outcomes indicate the likelihood that additional change will occur without additional treatment (Campbell & Bain, 1991). Data documenting generalization fit into this third category. Campbell and Bain acknowledged that this type of outcome is challenging to identify because of the difficulty of knowing at what point evidence of generalization reliably predicts improvement toward ultimate outcomes.

The feature that most complicates the assessment of change in children is that children's behavior is characterized by change stemming from a variety of sources, most of which are related to growth and development. With few exceptions, children—even those with quite significant difficulties—are benefiting from developmental advances that enhance their communication skills. Sometimes change occurs broadly, and sometimes more in some areas than in others. Even children who have sustained severe brain damage during early childhood will experience developmental benefits as well as the physiological benefits of biological recovery. Only a few exceptions to this upward trend exist—for example, in children with very severe neurologic damage or with neurodegenerative disease, and in children who tend to regress in performance when therapy is withdrawn (e.g., some children with developmental dyspraxia of speech or mental retardation). In all cases, however, the speech-language pathologist's assessment of whether change is occurring and why it is occurring must be gauged against a terrain that is rarely flat and is sometimes a series of foothills.

Clinical questions involving change make use of many of the same types of measures discussed in chapters 9 and 10 and often examine similar issues across the added dimension of time.
Nonetheless, despite their importance for work with children with language disorders, such questions have, at least until recently, generally received less attention than questions related to screening, identification, or description at a given point in time. Thankfully, a variety of external factors affecting clinical practice described in preceding chapters, such as the demand for greater accountability in schools and hospitals, are helping to encourage and even mandate greater research attention to the assessment of change (Frattali, 1998b; Olswang, 1990, 1993, 1998).

Once, broad questions regarding the value of treatment approaches lay principally within the purview of researchers, who conducted treatment efficacy research under highly controlled conditions. Over the past decade, however, concerns about accountability have led individual professionals in speech-language pathology to become more active in collecting and using such data as well (Eger, 1988; Eger, Chabon, Mient, & Cushman, 1986). The primary emphasis on evidence obtained under tightly controlled conditions has shifted to include evidence obtained under the very conditions in which treatment is typically conducted—data that are typically referred to as outcomes.

In this chapter, the specific considerations affecting the assessment of change in clinical practice are addressed, followed by special considerations relating to the tools that are available to address this issue. Finally, practical considerations related to outcome assessment are discussed for the ways in which they shape professional practices in this area of assessment.
Page 296
Special Considerations for Asking This Clinical Question

At least four special concerns complicate the process of answering clinical questions regarding change: (a) identifying reliable, or real, change; (b) determining that the change that is observed is important; (c) determining responsibility for change; and (d) predicting the likelihood of future change (Bain & Dollaghan, 1991; Campbell & Bain, 1991; McCauley & Swisher, 1984; Schwartz & Olswang, 1996). These concerns affect both global inferences regarding a child's overall progress (ultimate outcomes) and the more specific decisions involved in specific treatment goals (intermediate and instrumental outcomes) (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Olswang & Bain, 1996).

Identification of Reliable and Valid Change
Because examination of change depends on a comparison of measurements made on at least two occasions, reliability in the measurement of change is no more certain than the reliability of a single measurement. In fact, there is every indication that it is less so (McCauley & Swisher, 1984; Salvia & Ysseldyke, 1995). To get an idea of the effect of measurement error on the examination of change, consider the case of a child whose score on a specific measure, taken 4 months apart, changes from 15 to 30, where 80 is the highest possible score. Initially, this change would appear to be cause for some degree of celebration—more restrained if you looked just at the number of points gained out of the number possible, less restrained if you looked at the fact that the child had doubled his score. However, once you remind yourself that measures vary in their reliability (sometimes quite wildly), you realize that more information is needed before party invitations can be sent out. Depending on the reliability of the measure, each observed score could fall quite far from the test taker's true score, with unfortunate consequences for the believability of observations about the difference between the two testings.

The difference between these two scores could be described as a difference score or, more frequently in this kind of situation, a gain score. In fact, gain scores are often less reliable than the measures on which they are based (Mehrens & Lehmann, 1980; Salvia & Ysseldyke, 1995). Although concerns about gain scores are typically expressed in relation to standardized norm-referenced measures, they apply equally to other quantitative measures. The nature of the measure used in the preceding example was intentionally left ambiguous to emphasize that point. The advantage of some standardized norm-referenced tests is the availability of information allowing one to estimate the risk of error associated with individual gain scores.
Using the standard error of measurement and methods like those used to examine difference scores when they occur in profiles, it is possible to examine the likelihood that a difference score is reliable (Anastasi, 1982; Salvia & Ysseldyke, 1995). Indeed, some tests include graphic devices on their scoring sheets that help users determine whether a difference is likely to be reliable. However, there is still reason to believe that numerous norm-referenced tests continue to fail to provide this information for users (Sturner et al., 1994).
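The standard error of measurement (SEM) logic behind these judgments can be sketched numerically. The Python sketch below uses purely hypothetical values (a test standard deviation of 10 and two illustrative reliability coefficients; neither comes from any published test) to show how the minimum gain needed to exceed measurement error shrinks as reliability rises.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def min_reliable_gain(sd: float, reliability: float, z: float = 1.96) -> float:
    """Smallest gain exceeding chance at roughly 95% confidence.

    The standard error of a difference between two testings, assuming
    equal SEMs on both occasions and uncorrelated errors, is
    sqrt(2) * SEM; a gain must exceed z times that value.
    """
    return z * math.sqrt(2.0) * sem(sd, reliability)

# Hypothetical numbers for the example in the text: a child gains
# 15 points (15 -> 30) on a measure whose SD is assumed to be 10.
gain = 30 - 15
for r in (0.70, 0.90):
    threshold = min_reliable_gain(sd=10.0, reliability=r)
    verdict = "reliable" if gain > threshold else "within measurement error"
    print(f"reliability {r}: a gain of {gain} is {verdict} "
          f"(needs > {threshold:.1f})")
```

Under these assumed values, the seemingly dramatic doubled score clears the error band only when reliability is high (.90), not when it is modest (.70), which is exactly why the text cautions against celebrating a gain score before checking the measure's reliability.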
Page 297
The problem facing norm-referenced instruments, however, is equally shared, or even intensified, for informal measures: Informal quantitative measures will almost never provide that information. Thus, additional strategies are needed for providing evidence of reliability—that is, evidence that a measure is likely to be consistent over short periods of time, when used by different clinicians, and so forth—and is thus able to reflect real change, rather than error, when it occurs. As you will see later in this chapter, single-subject designs constitute the most powerful of these strategies.

As a sophisticated observer of psychometric properties, you may be waiting for the other shoe to drop—the validity shoe. Although it might be possible for developers of well-developed standardized measures to study the ability of their measure to capture significant change as a form of criterion-related validity evidence, they almost never do so. Instead, for most measures in speech-language pathology, as well as in other applied behavioral sciences, the examination of validity has been couched in terms of discussions of "importance": Is observed change that appears to be reliable also important?

Determining That Observed Change Is Important
Issues about the importance of change can be complex. They include questions such as: Is the change large enough to be significant? Is the nature of the change such that it is likely to affect the child's communicative and social life? These are some of the questions that Bain and Dollaghan (1991) explored under the notion of clinically significant change. A number of complementary indicators of "importance" have been put forward. The most important of these are (a) effect size—Did much happen? (Bain & Dollaghan, 1991); (b) social validation—Did it make a difference in this person's communicative life? (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Kazdin, 1977, 1999; Schwartz & Olswang, 1996); and (c) the use of multiple measures (Campbell & Bain, 1991; Olswang & Bain, 1994; Schwartz & Olswang, 1996).

Effect Size
In the statistical and research design literature, a distinction is made between statistical significance and substantive importance, or meaningfulness. That distinction, although often overlooked by researchers who focus on statistical significance as if it were the holy grail (Young, 1993), is a valuable one for our thinking about the clinical importance of change we observe in children. Effect size, which refers to the magnitude of difference observed, is frequently discussed in relation to substantive importance, or clinical significance, and is discussed at some length later in this section.

Statistical significance is a relatively straightforward concept. Specifically, when a research finding is statistically significant, a statistical test has suggested that the finding is unlikely to have occurred by chance, that it is rare (Pedhazur & Schmelkin, 1991). More complex, however, is the matter of determining whether a statistically significant finding is meaningful, that is, whether it says anything important about the matter under study (Pedhazur & Schmelkin, 1991). A term frequently used to refer to the meaningfulness or substantive importance of a difference to clinical decision making is clinical significance (Bain & Dollaghan, 1991; Bernthal & Bankson, 1998). Other terms applied to this concept in the rich psychological literature on the topic include social validity, clinical importance, qualitative change, educational relevance, ecological validity, and cultural validity (Foster & Mash, 1999).

A research example using a difference between two groups at a single point in time can help illustrate the distinction between statistical significance and substantive importance. In a research study, one might compare the performance of two groups on a given test with 100 items and find that the two groups differed in their performance by just 2 items. Further, the difference might be shown to be statistically significant. Despite the statistical significance, however, most observers, if aware of the size of the difference, would consider a difference of just 2 points to merit no more than a yawn—no matter how much verbal arm waving the researcher in question might use to inspire interest. In contrast, if a much larger difference had been obtained and found to be statistically significant, most observers would be moved to rapt attention, having been persuaded that the basis for group assignments had at least some sort of important relationship to the subject covered by the test.

Using an analogous clinical example, one can imagine achieving a very consistent result when using a particular treatment with a given child—for instance, Tamika, from the introduction of this chapter. Perhaps Tamika makes gains of one or two items on untreated probes that are used over the course of a semester to monitor her progress in the use of grammatical morphemes. That relatively high consistency (or reliability) of change, however, would probably not please you (or Tamika) and would probably send you scrambling to find an alternative, more effective intervention strategy.
The clinical significance of change observed for Tamika simply would not warrant contentment with the current treatment.

Effect size, which can be measured in a variety of ways, generally refers to the magnitude of the difference between two scores or sets of scores, or of the correlation between two sets of variables (Pedhazur & Schmelkin, 1991). Authors regularly suggest that researchers in speech-language pathology and elsewhere appear to fixate on statistical significance at the expense of effect size or other measures that are more amenable to decisions about the value of information to decision making (e.g., Pedhazur & Schmelkin, 1991; Young, 1993). Because information about the reliability of difference scores is difficult and often impossible to come by for the measures clinicians use to examine change, clinicians and their constituents are much more likely to want to inspect the actual magnitude of changes with an eye toward their clinical meaning. Effect size alone cannot be the sole basis for determining the meaning of a particular difference, because other factors will need to be taken into account (e.g., the social significance of the difference, the likely generalizability of the difference). However, it can be an important element in that process (Bain & Olswang, 1995).

Bain and Dollaghan (1991) described two strategies for looking at effect size. One of these strategies uses standard scores, takes into account the absolute amount of change that has occurred, and is therefore primarily limited to use with norm-referenced standardized measures. The other uses age-equivalent scores, looks at the relative size of change, and is subject to the vagaries associated with that inferior method of characterizing performance.
Using standard scores to examine change, Bain and Dollaghan (1991) noted that the amount of change can be expressed in terms of standard deviation units and compared against an arbitrary standard. Thus, a difference might be considered of practical significance if it met or exceeded a change of so many standard deviation units—with those authors citing 1 standard deviation as a frequently used standard. For instance, imagine that at Time 1, a child receives a standard score of 70 on a test with a mean of 100 and a standard deviation of 10. Then, at Time 2, the child receives a score of 81 on that same test. The amount of change would be considered of clinical significance because it corresponded to slightly more than one standard deviation.

As long as the measure that is being used has been carefully selected for its validity for the given child and content area, this method seems a reasonable one for many purposes. In particular, its use is strengthened if the time period encompassed by the comparison results in a comparison against a single normative subgroup. Specifically, if a child's performance can be compared with just a single normative subgroup over time (e.g., all of the children age 5 years, 1 month to 6 years), then the extra variability introduced by comparing his or her first performance with one set of children (e.g., the children from 5 years to 5 years, 6 months) and then with another (e.g., the children from 5 years, 7 months to 6 years) can be avoided. The use of standard scores is also preferable to the same method applied using age-equivalent scores and a cutoff established around a certain age-equivalent gain (Bain & Dollaghan, 1991) because of the poor reliability of such scores (McCauley & Swisher, 1984). Admittedly, at this point, selection of the cutoff in this strategy using standard scores is arbitrary—how much change should be regarded as clinically significant remains a point of considerable argument.
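The arithmetic behind this standard-score strategy can be sketched in a few lines of code. The example below is purely illustrative (the book presents no code); the function names and the 1-standard-deviation criterion are assumptions drawn from the discussion above, and the numbers are those of the hypothetical child just described.

```python
def change_in_sd_units(score_t1, score_t2, sd):
    """Express the difference between two standard scores in
    standard deviation units of the test's normative sample."""
    return (score_t2 - score_t1) / sd

def is_clinically_significant(score_t1, score_t2, sd, criterion=1.0):
    """Apply an (admittedly arbitrary) cutoff; 1 standard deviation
    is the frequently cited standard noted in the text."""
    return change_in_sd_units(score_t1, score_t2, sd) >= criterion

# The hypothetical child: 70 at Time 1, 81 at Time 2,
# on a test with mean 100 and standard deviation 10.
print(change_in_sd_units(70, 81, 10))       # 1.1 standard deviations
print(is_clinically_significant(70, 81, 10))  # True
```

A smaller gain, say from 70 to 75 on the same test, would fall short of the 1-standard-deviation criterion and would not be flagged as clinically significant by this (again, arbitrary) standard.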
However, additional research by test developers and others could validate specific levels in a manner quite analogous to that proposed for cutoffs used in other areas of clinical decision making (Plante & Vance, 1994).

The Proportional Change Index (PCI), the alternative strategy for examining effect size described by Bain and Dollaghan (1991), provides a relative measure of change arising from the work of Wolery (1983). The measure is relative in the sense that it compares the rate of change characteristic of the child's behavior for the period before treatment with the rate observed during treatment. Specifically, the PCI is the proportion created when the child's rate of development during intervention is divided by the child's preintervention rate of development. The preintervention rate of development is estimated by dividing the child's age-equivalent score on a measure taken just before the beginning of treatment by his or her chronological age in months. The rate of development during intervention is estimated by dividing the gain score obtained for that measure when it is readministered after a period of treatment by the duration of treatment in months. For a child whose behavior is being monitored over time without intervention, the measure might be used to compare the rate of change estimated for the period before observation with that observed during the observation period. The merit of this particular measure is that it "takes into account the number of months actually gained, the number of months in intervention [or observation] and the child's rate of development at the pretest date" (Wolery, 1983, p. 168). Figure 11.1 illustrates the calculation of PCI for two children: Shana, who shows excellent gains in receptive vocabulary, with twice as
Fig. 11.1. A hypothetical example showing the calculation of the Proportional Change Index (Bain & Dollaghan, 1991; Wolery, 1983) for two children.
much progress in treatment as prior to treatment; and Jason, who shows progress in receptive vocabulary acquisition that is no better in treatment than it had been prior to treatment.

If the two rates of change used in the equation for PCI are similar, the calculated value for PCI will approach a value of 1. On the other hand, if treatment or other factors have accelerated development, the PCI will exceed 1, with larger PCIs indicating greater acceleration. Thus, for example, a PCI of 3 would imply that change had occurred three times as quickly during treatment as preceding it. Alternatively, a PCI of .5 would suggest that change had occurred at half the rate during the treatment or observation period as preceding it.

As described earlier, the PCI is usually recommended for its utility in examining change during a period of intervention in which positive change is expected. Nonetheless, it might also be used if one were interested in examining alterations in rates of change occurring under conditions like those described for David at the beginning of the chapter. Recall that David had been diagnosed with a neurodegenerative disease that was predicted to result in skill loss. It might also be used under conditions in which problems in development were suspected (as in the case of a suspected "late talker"), but the child's clinician had opted for a watch-and-see strategy with a planned 6-month reevaluation.

Bain and Dollaghan (1991) noted that the PCI rests on two problematic assumptions, with the first being that change in children's skills occurs at a constant rate in the absence of intervention. A plausible alternative to this assumption is that change may occur at varying rates during development—with children's behaviors sometimes racing ahead, sometimes holding steady, and sometimes, perhaps, even regressing for a time.
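The PCI calculation itself is simple enough to sketch directly. The script below is illustrative only: it follows the definition from Wolery (1983) as summarized above, and the numbers are hypothetical values in the spirit of Fig. 11.1 rather than the figure's actual data.

```python
def pci(pre_age_equiv, chron_age_at_pretest, ae_gain, months_of_treatment):
    """Proportional Change Index (Wolery, 1983): the rate of development
    during intervention divided by the preintervention rate.

    All quantities are in months; the age-equivalent (AE) scores come from
    the same measure administered before and after treatment."""
    pre_rate = pre_age_equiv / chron_age_at_pretest   # AE months gained per month, birth to pretest
    treatment_rate = ae_gain / months_of_treatment    # AE months gained per month of treatment
    return treatment_rate / pre_rate

# A "Shana"-like child: AE of 15 months at age 30 months,
# then a 6-month AE gain across 6 months of treatment.
print(pci(15, 30, 6, 6))   # 2.0 -- twice the pretreatment rate
# A "Jason"-like child: same starting point, but only a 3-month AE gain.
print(pci(15, 30, 3, 6))   # 1.0 -- no better than before treatment
```

Note how the first result, 2.0, corresponds to the text's description of Shana (twice as much progress in treatment as before it), whereas the second, 1.0, corresponds to Jason's unchanged rate.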
The problem with the assumption of constant change embodied in the PCI is addressed to some extent by the use of single subject designs, a specific method that is described in greater detail later in the chapter. Single subject designs escape this assumption through the clinician's active examination of change patterns during periods in which intervention is not occurring as well as when it is. Thankfully, too, the question of whether change is constant can be addressed empirically. Although additional information is needed to determine the extent to which this assumption is tenable, efforts to examine patterns of change are underway and suggest that over shorter time periods the assumption of a constant rate of change is probably false (Diedrich & Bangert, 1980; Olswang & Bain, 1985).

The second problematic assumption of the PCI lies in its use of age-equivalent scores and the temptation that it presents for clinicians to use tests that report such scores without much in the way of empirical support—either for the age-equivalent scores or for the test in its entirety. Bain and Dollaghan (1991) acknowledged this potential drawback and implicitly recommended that clinicians search for the highest quality measures to use for documenting change. However, they also suggested that in the absence of such measures, the PCI may offer a better alternative than the simple assumption that a gain in age-equivalent scores over time represents progress. An additional limitation affecting the PCI is the need for users to adopt an arbitrary basis for determining when a certain amount of change is sufficient to support the use of time and other resources required to achieve a particular gain. Thus far, no measure described herein or proposed elsewhere has been able to claim a rational basis for its particular standard or cutoff.

In principle, then, the two measures of effect size that I have described (standard score gain scores and PCI) seem to represent strong contenders for use in decisions about the importance of observed change—both for change observed during treatment and for change observed over a period of time in which intervention is not used but a child's performance is monitored. However, additional research is needed to validate their use in decision making, particularly in the case of the PCI, in which the strength of the logic behind the measure is undermined by its dependence on age-equivalent scores. I also call readers' attention to the fact that both of these methods will be more readily implemented for standardized norm-referenced tests than for other types of measures that might be used to describe a child's language.

Social Validation
In examining the importance of change, clinicians are almost always interested in considering whether observed changes conform to theoretical expectations, especially developmental expectations, that imply a hierarchy of learning in which some behaviors are seen as prerequisites to others (Bain & Dollaghan, 1991; Lahey, 1988). Put differently, clinicians are interested in determining whether the child has made gains that theoretically appear to be movements along the "right" path. Gains on those behaviors that are seen as precursors to further advancement are judged to be more important than those that are not. Additionally, clinicians have always valued and sometimes solicited family and teacher reports asserting progress as de facto evidence that change has occurred and is important. This way of thinking about the importance of language change falls under the term social validation. Social validation also complements the use of effect size in fostering the richest possible conceptualization of "importance."

Acknowledging that such evidence has value is consistent, first of all, with an appreciation that the functional and social effects of communication disorders warrant greater incorporation into clinical practice (Frattali, 1998b; Goldstein & Gierut, 1998; Olswang & Bain, 1994). In a different context (discussing research significance as opposed to clinical significance), Pedhazur and Schmelkin (1991) offered a quotation from Gertrude Stein: "A difference in order to be a difference must make a difference" (p. 203). If rephrased slightly, this quotation also seems to speak to efforts to examine the importance of change in children's language: For change in a child's language to be significant, it must make a difference in the child's life. Use of measures to examine the functional and social impact of change is also consistent with the growing appreciation of qualitative data described in the last chapter.
Because qualitative data are unapologetically subjective in nature (Glesne & Peshkin, 1992), they may be used very effectively—more effectively than reams of quantitative data—to address questions related to the social context supporting and affecting a child and to how the child is viewed in that context. Over the past few decades, quantitative as well as qualitative measures have received growing attention for the purpose of assessing the functional and social impacts of treatment (Bain & Dollaghan, 1991;
Campbell & Bain, 1991; Campbell & Dollaghan, 1992; Koegel, Koegel, Van Voy, & Ingham, 1988; Olswang & Bain, 1994; Schwartz & Olswang, 1996).

Kazdin (1977) described a process by which such measures can be used to look at the importance of behavioral change. In particular, he focused on behavioral change achieved through applied behavior analysis and based his work on that of Wolf and his colleagues (e.g., Maloney et al., 1976; Minkin et al., 1976; Wolf, 1978). Kazdin defined social validation as the assessment of "the social acceptability of intervention," where such acceptability could be assessed with regard to intervention focus, procedures, and—importantly for this discussion—behavior change. More recently, he has defined clinical significance as "the practical or applied value or importance of the effect of an intervention—that is, whether the intervention makes a real (e.g., genuine, palpable, practical, noticeable) difference in everyday life to the clients or others with whom the clients interact" (Kazdin, 1999, p. 332). Although Kazdin and numerous other authors working in the area of clinical psychology (e.g., Foster & Mash, 1999; Jacobson, Roberts, Berns, & McGlinchey, 1999; Kazdin, 1999) have continued to elaborate on the concepts outlined in Kazdin (1977), basic issues raised in that earlier work remain relevant. In particular, this relevance derives from the lack of empirical validation supporting many of the highly developed measures of clinical significance proposed in the clinical psychology literature (Kazdin, 1999).

Kazdin (1977) recommended two general approaches to the social validation of behavior change that have been embraced by a number of researchers in child language disorders—social comparison and subjective evaluation (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Campbell & Dollaghan, 1992; Olswang & Bain, 1994; Schwartz & Olswang, 1996).
Social comparison involves comparisons, conducted pre- and postintervention, between behaviors exhibited by the child receiving intervention and those of a group of same-age peers who are unaffected by language impairment (Campbell & Bain, 1991). Astute readers will find this method reminiscent of a normative comparison. However, instead of comparisons on a standardized measure against a relatively large group of ostensible "peers," here the child's performance on a more informal measure (usually a clinician-designed probe) is compared against that of a relatively small group of actual peers. The value of this technique will certainly be affected by the care taken to choose a representative, if small, comparison group. In addition, it may also prove most valuable in cases where a norm-referenced comparison using a larger group is unavailable because no appropriate measures or appropriate normative samples exist for the targeted behavior and particular client.

Subjective evaluation involves the use of procedures designed to determine whether individuals who interact frequently with the child see perceived changes as important (Kazdin, 1977). Methods that have been proposed for these purposes in speech-language pathology range from quite informal to relatively sophisticated. Thus, for example, at the informal end of the continuum, it has been suggested that parents, teachers, and other adults who are familiar with the child be asked to appraise the adequacy of a child's performance following a period of intervention (Bain & Dollaghan, 1991; Campbell & Bain, 1991). Clearly these data may be qualitative in nature (Olswang & Bain, 1994; Schwartz & Bain, 1995) and would benefit from the clinician's use of triangulation with other sources, as discussed in the previous chapter,
Page 305 thus implying the use of multiple measures. This is consistent with the idea emphasized in Kazdin’s (1999) recent work, that “clinical significance invariably includes a frame of reference or perspective” (p. 334). A more intermediate level of complexity might involve use of an existing rating scale, such as the Observational Rating Scales of the Clinical Evaluation of Language Functions—3 (Semel, Wiig, & Secord, 1996), in which a similar rating scale is completed by the child, the parent(s), and a classroom teacher. The growing interest in the development of functional measures for use with children in school settings will certainly provide many new alternatives of this kind. Addition of this type of measure to the very detailed measures of progress being used for Tamika may not only provide strong evidence of functional impact, but may also help reduce possible bias in the assessment of progress achieved by a child who speaks a dialect usually underrepresented in standardized measures. A higher level of complexity in the use of subjective evaluation would involve the use of a panel of naive listeners who could be asked to use a rating strategy such as direct magnitude estimation to make judgments about some aspect of the communicative effectiveness of a child’s productions. Campbell and Dollaghan (1992) described the use of a 13person panel that was asked to rate the informativeness (“amount of verbal information conveyed by a speaker during a specified period of spontaneous language”, p. 50) of utterances produced by nine children with brain damage and their controls. This example of social validation is particularly complex given that Campbell and Dollaghan applied a hybrid method that used both social comparison and subjective evaluation components. Although methods as complex as these are probably not practical in many clinical settings, they provide a valuable illustration of how flexible social validation procedures can be. 
In summary, social validation methods add greatly to our estimation of how important an observed change is. In particular, they can help us see how observed differences make a difference in a child's communicative and social functions and opportunities. They vary dramatically in terms of their complexity and sophistication. Further, because they can be applied to qualitative as well as quantitative data, they are especially attractive for use with informal measures.

Use of Multiple Measures
The augmentation of measures designed to directly assess linguistic behaviors with measures intended to provide social validation constitutes one very important way in which multiple measures may be used to enhance our ability to tease out the contribution of treatment to change. However, the kinds of multiple sources of data recommended by clinical researchers do not stop there (Campbell & Bain, 1991; Olswang & Bain, 1994; Schwartz & Olswang, 1996). They extend to considering the value of multiple indicators in helping one best address the construct of interest—an idea that was introduced in Fig. 2.2 and in chapter 2. Whether the construct is one related to a particular linguistic skill or to a child’s communicative function within a given setting, there is general agreement that making use of several measures can best support conclusions about the construct under consideration.
Writing from a research perspective, Primavera, Allison, and Alfonso (1996) noted that Cook and Campbell (1979) introduced the idea of multioperationalism into behavioral research, in which a construct is operationalized using as many indicators as possible in order to truly capture its essence. In a similar vein, Pedhazur and Schmelkin (1991) offered a detailed account explaining why the use of a single indicator of a construct "almost always poses insurmountable problems" (p. 56) related to knowing to what extent the indicator reflects the construct rather than error. Whereas researchers may have greater opportunities and rewards for practicing multioperationalism, clinicians, too, can benefit from its application. When a clinician uses a single measure (e.g., a single test of receptive vocabulary) to support conclusions about a construct (e.g., receptive language), both the clinician and his or her audience either immediately feel skeptical that the part (receptive vocabulary) represents the whole (receptive language) or should feel skeptical if they give it much thought. Even if conclusions are limited to those about receptive vocabulary, however, a quick reminder about the nature of most such tests—that they frequently address only pictureable nouns—should cause the clinician to pause. Clearly the single indicator seems unlikely to capture the construct of interest. The time demands of clinical practice can sometimes make the collection of even one measure seem onerous and the idea of multiple measures an author's fantasy and clinician's nightmare. However, becoming aware of the value of such measures may help clinicians decide to take the extra time and provide support for that decision in select cases. Further, in cases where the use of multiple measures has not seemed practical, it can help lead to more limited and therefore more valid interpretations.
In this section, three principal strategies for examining the importance of change were briefly introduced: effect size, social validation, and the use of multiple measures. Authors such as Bain, Campbell, Dollaghan, and Olswang have begun to venture deep into the literatures of related disciplines to explore this relatively new territory for the resources it might contribute to measurement in communication disorders. Given the value of their work to date, their efforts will undoubtedly continue and be joined by those of others who respond to recent calls for more persuasive evidence that speech-language pathology services make a difference for children with communication disorders.

Determining Responsibility for Change
Whereas determining the extent to which change in language has occurred and determining its importance are closely related tasks, verifying the clinician's contributions to that change is an altogether different and more daunting task. Granted, simply noting the extent to which change has occurred and its nature can be useful in instances where no intervention has taken place—for example, in cases where a child's development is being monitored because of suspicion that the child is a late talker. More commonly, assessment of change for children in treatment involves cases where all stakeholders are comfortable with the unexamined assumption that change will be primarily the result of intervention efforts. However, there are times when demonstrating that treatment is responsible for observed changes is crucial. In this era of growing attention to accountability and quality assurance, these times are becoming more common (Eger, 1988; Frattali, 1998a, 1998b).
The difficulty in pinning down causal explanations for human behavior or behavior change has been a driving force behind developments in psychology and related disciplines over the past 100 years. Again and again, the problem with determining causality seems to be ruling out alternative explanations in cases where stringent control over potential causes is either not possible or not ethical. Treatment for language disorders in children presents the classic difficulty in this regard. The possibility of factors other than treatment—such as development, environmental influences, and changes in the child's physiology through recovery from a disease process or trauma—makes it very difficult to identify treatments or indirect management strategies as having "caused" gains that are seen in a child's performance.

At least two design elements have provided a logical basis for increasing the plausibility that gains in performance seen while a child is undergoing treatment are attributable to treatment rather than to alternative explanations. These two elements are repeated observations over a period of time prior to the onset of treatment and the use of treatment, generalization, and control probes. Both of these elements have been incorporated into the framework of research known as single subject experimental design (Franklin, Allison, & Gorman, 1996; Kratochwill & Levin, 1992; McReynolds & Kearns, 1983). In addition, each has been identified separately as a means of enhancing support for treatment as a causal factor in cases of behavioral gains (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Olswang & Bain, 1994; Schwartz & Olswang, 1996).

Pretreatment Baselines
The use of multiple observations over a period of time prior to the initiation of treatment is frequently referred to as a baseline, or the A condition in a single subject experimental design. Multiple observations function as a window into the stability of the behavior and the measure used to characterize it. If little variation is observed, it seems most likely that the behavior is not changing and that the measure being used to track the behavior is not introducing error (i.e., that it is probably reliable). This means that departures from stability observed after the onset of treatment can be more readily attributed to treatment than to either the instability of the behavior being measured or to measurement error. The presence of stability during baseline observations might alternatively be interpreted as suggesting that the behavior being measured and the measure being used for that purpose are varying in ways that cancel each other out—a most unlikely prospect. In contrast, when considerable variation is observed, it can be difficult to determine which of the two possible sources of variation (change in the behavior vs. error in the measurement) is the culprit.

Consequently, as a rule, baselines are easiest to interpret, and they provide the strongest support for observing changes that might occur under conditions such as treatment, when they are sufficiently lengthy, show no obvious trends, and appear to be stable (McReynolds & Kearns, 1983). With regard to length, three observations are often cited as a minimum (McReynolds & Kearns, 1983), with longer baselines required if the behavior shows a trend or other lack of stability. The presence of a trend (a consistent increase or decrease in data values in the direction of expected change with treatment) can be problematic, as can lack of stability in
which both increases and decreases in a specific measure are noted. Because stability is a relative quality, we again are in a position of looking toward expert advice to help us agree on an acceptable range of variation. McReynolds and Kearns (1983) pointed to a historic standard of 5 to 10%. However, they noted that lower levels of stability achieved during a baseline will simply necessitate greater amounts of change to justify claims of effective treatment.

Proponents of single subject experimental designs, who are the chief resources for interpreting baseline data, have often suggested that visual inspection of such data is sufficient for the detection of stability and systematic change. Recently, however, the complexity of this judgment task has led to questions about its use (Franklin, Gorman, Beasley, & Allison, 1996; Parsonson & Baer, 1992). In particular, researchers have noted a tendency for visual analysis to fail to detect change when it has actually occurred, thus suggesting a lack of sensitivity to smaller levels of change. This reduced sensitivity may present serious problems for clinicians who believe that small amounts of change will be important to documenting the effect of their treatment. On the other hand, for those who attempt to target behaviors on which they expect larger changes (larger effect sizes, to use our previous terminology), the reduction in sensitivity may represent a reasonable tradeoff against the relative simplicity of graphic analysis. Nonetheless, clinicians who wish to rely on visual analysis would do well to look into the emerging complexities of this aid to data interpretation (Franklin et al., 1996; Parsonson & Baer, 1992). Researchers and clinicians with sufficient resources might also consider alternative interpretations that make use of emerging methods (Gorman & Allison, 1996).

Treatment, Control, and Generalization Probes
The idea of treatment and control probes draws once again on the single subject experimental design literature (Bain & Dollaghan, 1991). In that context, treatment probes are quantitative measures focusing on behaviors that are or will be the target of treatment. They are usually the minimum type of data collected to provide evidence of change. In contrast, control probes are quantitative measures obtained periodically over the course of a study to allow the clinician to monitor the effects of extraneous variables on an individual's behavior. They are usually constructed or selected so that they measure behaviors that are unrelated to the treated behavior. If the treated behavior shows change whereas the untreated, control behavior monitored using control probes does not, then the clinician can feel confident that maturation and other factors have not produced global advances from which treated stimuli would have benefited with or without the implementation of treatment. (Of course, one of the perils involved in the selection of control probes is that developmental forces may cause changes in the behavior they are used to track even without a direct effect of treatment; Demetras, personal communication, February, 2000.)

Generalization probes are used to track behaviors that are related to but distinct from those receiving treatment. Thus, their use involves a violation of the expected lack of relationship to treated behaviors characteristic of control probes within single subject designs (Bain & Dollaghan, 1991; Fey, 1988). In the construction of generalization probes, the clinician looks for behaviors that are related to treated behaviors in a manner thought likely to cause generalization that will affect them. On the basis of the current understanding of generalization, generalization probes would be expected to show similar but smaller changes than treatment probes in response to the implementation of an effective treatment. Although generalization across behaviors may be the most common dimension in which generalization probes are studied clinically, generalization across situations will also prove of interest, as will generalization across time (McReynolds & Kearns, 1983).

The use of generalization and control probes allows for a clear demonstration that treatment is behaving as predicted relative to the targeted behavior. Specifically, their use can help demonstrate that treatment is having its greatest effect on treated behaviors, a lesser effect on untreated or other generalization behaviors, and no effect on control behaviors. Their use can thus contribute to the plausibility of arguments that treatment, rather than the myriad of other variables that might help a child's behavior improve, is the agent responsible for observed change. Campbell and Bain (1991) further argued that evidence of generalization obtained during treatment offers speech-language pathologists their clearest opportunity to show instrumental outcomes (i.e., outcomes suggesting the likelihood that treatment will lead to additional outcomes without further treatment). More support for these varied measures comes from the motor learning literature, in which it has been observed that data obtained during a learning condition (e.g., a treatment session) can overestimate learning compared to generalization or maintenance data (e.g., see Schmidt & Bjork, 1992).
An example illustrating the use of treatment, generalization, and control probes is described in Bain and Dollaghan (1991) as part of a single subject design. Using the case of a hypothetical preschooler with SLI, they suggested a treatment target consisting of the production of the two-word semantic relation Agent + Action. As a generalization behavior, they proposed the production of Action + Object because its shared component, Action, was thought to make generalization likely. Finally, as a control behavior, they proposed the production of Entity + Locative because it seemed unlikely to change without direct treatment. Each probe consisted of the child's percentage of correct production of 10 unfamiliar exemplars that the clinician attempted to elicit through manipulation of several toys and the context. Treatment, generalization, and control probes often involve elicited behaviors such as those described under that heading in the preceding chapter. However, other measures, such as language samples and their analyses, could also be used to examine treatment, generalization, and control behaviors. Although treatment probes tend to be obtained frequently so that the process of treatment as well as its product may be illuminated (McReynolds & Kearns, 1983), generalization and control probes are typically evaluated less often (Bain & Dollaghan, 1991). The frequency with which treatment probes are used may depend on the expected rate of change; Bain and Dollaghan pointed out that the behaviors of a child with cognitive delays indicative of an overall slower rate of learning may require less frequent collection of data.
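To make the probe scheme concrete, here is a minimal sketch of how one session's probe data might be tallied. Only the 10-exemplar, percent-correct scheme comes from Bain and Dollaghan's example; the function name, dictionary labels, and particular response patterns are my own illustrative assumptions.

```python
def probe_score(responses):
    """Percent correct across a probe's exemplars (True = correct)."""
    return 100.0 * sum(responses) / len(responses)

# Hypothetical responses to 10 unfamiliar exemplars for each probe type.
session = {
    "treatment: Agent + Action":       [True, True, True] + [False] * 7,
    "generalization: Action + Object": [True, True] + [False] * 8,
    "control: Entity + Locative":      [True] + [False] * 9,
}
scores = {name: probe_score(r) for name, r in session.items()}
# e.g., scores["treatment: Agent + Action"] -> 30.0
```

Scores tallied this way, session after session, are what populate the graphs of a single subject design.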
Page 310 Determining Whether Additional Change Is Likely to Occur
As an additional aspect of examining change, authors have sometimes called attention to the value of predicting whether future change is likely. In particular, this general question has been asked specifically with regard to addressing predictions of change at two different ends of the treatment process: initiation and termination. First, successful prediction of whether change is likely might help in judging whether treatment should be initiated because of a child’s ‘‘readiness” for change in a particular area (Bain & Olswang, 1995; Long & Olswang, 1996; Olswang & Bain, 1996). Second, successful prediction might help in judging whether treatment should be terminated, or at least temporarily discontinued, because additional change is unlikely (Campbell & Bain, 1991; Eger et al., 1986). Both kinds of questions will require substantial empirical investigations to arrive at universal recommendations for best practices. Nonetheless, each depends on evidence that a particular technique is valid for predicting a given outcome—thus suggesting that evidence of predictive criterionrelated validity is at the root of both of these questions. This realization is implicit in the work of Bain and Olswang (1995), in which they sought to demonstrate the predictive validity of dynamic assessment to support its use in determining readiness for the production of twoword phrases. Posing the question of when treatment might most profitably be initiated goes beyond the clinical assumption that treatment should be undertaken any time a child is found to demonstrate a significant problem in language or communication skills. The question itself suggests the possibility that there are times when children may exhibit evidence of a language disorder but that treatment would be unlikely to be effective—either in a global sense or in relation to a specific domain or behavior. 
Timing the onset of treatment or at least the onset of treatment aimed at specific targets to coincide with children’s areas of readiness could be expected to yield major enhancements to treatment efficiency (Long & Olswang, 1996). Olswang and Bain (1996) discussed the use of profiling in static assessment versus dynamic assessment as tools to use in addressing the question of readiness. The use of profiles, which are most often created by comparing a child’s performances on several tests or subtests, was discussed at some length in chapter 9. Even though the use of profiles has been largely debunked as a strategy for highlighting domains or children that might exhibit the greatest change in treatment, Olswang and Bain (1996) decided both to pursue it as one of the few methods in static assessment that has been proposed for addressing the prediction of future change and to compare it with techniques from dynamic assessment. One of the greatest promises of dynamic assessment has been its use in identifying the moving boundary of a child’s learning, or zone of proximal development (ZPD; Olswang & Bain, 1996; Vygotsky, 1978). As described in chapter 10, the ZPD is thought to reflect the loci of a child’s active developmental processes and thus to suggest areas in which treatment might be aimed to achieve optimal change. As a result of this promise, Olswang and Bain (1996) decided to compare the relative merits of profiles based on static assessments as well as performances on other selected variables versus measures of dynamic assessment techniques in predicting responses to
Page 311 treatment. The dynamic measures were found to correlate more strongly than the static measures with a measure of change (PCI) calculated following a 3-week treatment period. The results of their study led Olswang and Bain (1996) to propose that dynamic assessment procedures are better than other techniques at determining the likelihood of immediate change. However, they noted that additional research is needed to determine whether observed changes would have occurred even in the absence of treatment. They might also have noted that additional research is needed to determine whether the predictive powers of dynamic assessment would hold up as well over longer periods of treatment. As Campbell and Bain (1991) advised, decisions regarding treatment termination can be based on predetermined exit criteria or on demonstrations that no change has occurred over a given period of time. Such decisions, however, can also be based on empirical evidence that additional change is unlikely. This last alternative thus demands a prediction of future change levels akin to that sought by Olswang and Bain in their efforts to identify harbingers of change prior to treatment initiation. Campbell and Bain (1991) touched on the possibility of predicting future change for purposes of making a rational decision about the end of treatment in their discussion of ultimate or instrumental outcomes. Whereas ultimate outcomes can be defined as a child's achievement of age-appropriate or maximal communicative effectiveness, such outcomes can also be defined as functional communicative effectiveness, which implies that the child has achieved his or her best approximation of maximal communicative effectiveness. Additionally, "instrumental outcomes" can be defined as outcomes suggesting that additional change will be forthcoming in the absence of treatment.
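The kind of comparison Olswang and Bain made can be thought of in simple correlational terms: how strongly does each predictor covary with later change? The sketch below computes Pearson correlations from scratch; the data are invented for illustration and do not reproduce their results, although the direction (dynamic predictor correlating more strongly with change) mirrors their finding.

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented scores for five children (NOT Olswang & Bain's data):
static_pretest = [82, 95, 85, 88, 90]   # norm-referenced pretest scores
dynamic_rating = [2, 4, 3, 5, 4]        # e.g., a dynamic assessment rating
change_score   = [3, 8, 5, 11, 7]       # gain over a treatment period

r_static  = pearson_r(static_pretest, change_score)
r_dynamic = pearson_r(dynamic_rating, change_score)
# Here r_dynamic exceeds r_static, the pattern described above.
```

Demonstrating predictive validity would, of course, require real data and attention to sample size; the point is only that the comparison reduces to correlating each candidate predictor with a subsequent measure of change.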
The notions of “functional” communicative effectiveness and instrumental outcomes each involve implications related to the prediction of future change. Specifically when functional communicative effectiveness is seen as a legitimate ultimate outcome, it is almost invariably because the prospect of additional change is seen as unlikely or as prohibitive in terms of the time and effort required to produce it. Similarly, instrumental outcomes depend on the notion that additional change is likely. At this point in time, it appears that generalization data, such as that described in the preceding section, may represent the best method for addressing questions regarding future change. Research designed to identify more appropriate methods of predicting future change will undoubtedly need to proceed handinhand with research aimed at understanding the nature of language learning and of threats to language learning posed by language disorders before substantial progress on these clinical questions can be made. Measures of predictive validity will also undoubtedly play a role in helping us arrive at satisfying answers. Available Tools The kinds of tools available for use in addressing questions of change in children’s language disorders largely overlap those available for description that were described in the preceding chapter. Therefore, in this chapter, discussion of available tools is
Page 312 quite brief and focuses on those measures that are most frequently used to examine behavioral change and the special considerations that arise when they are used for that purpose. The only new tool to be introduced in this chapter is the single subject design, a family of methods that has been alluded to throughout this chapter but has not yet been adequately introduced as a specific approach to examining change.

Standardized, Norm-Referenced Tests
Repeated administration of standardized, normreferenced tests is probably the most widespread method used by speechlanguage pathologists to examine broad changes in language behaviors over time (McCauley & Swisher, 1984). More so than other measures used to examine change, standardized normreferenced measures are often accompanied by data concerning their reliability and validity. This represents a distinct potential advantage because such data can enhance the clinician’s ability to determine whether observed changes are likely to be reliable and important. Regrettably, however, normreferenced measures often do not provide sufficiently detailed data to make this potential a reality (Sturner et al., 1994). As additional barriers to their effective use for evaluating change, there are a number of pitfalls that must be avoided. The most important of these relates to the tendency for such measures to have been devised so that they are more sensitive to large differences in knowledge between individuals than to small differences (Carver, 1974; McCauley & Swisher, 1984). Yet it is small differences that are characteristic of the changes most likely to occur in treatment within a given individual (Carver, 1974; McCauley & Swisher, 1984). Thus, clinicians who use such measures to assess change must be aware that their efforts are likely to prove insensitive to very important changes in behaviors that simply are not addressed by a given test. Such tests should be used when broad changes are of interest. Among other possible pitfalls cited by McCauley and Swisher (1984), as well as others, are the need to avoid situations in which the test is explicitly taught by a well meaning clinician or implicitly taught through repeated administrations that occur so closely in time as to allow the child an unwarranted advantage at the second administration. 
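One widely used way to ask whether a retest gain outstrips measurement error, when a manual reports a reliability coefficient and the normative standard deviation, is the reliable change index of Jacobson and Truax. This comes from the broader psychometric literature rather than from the sources discussed above, and the sketch below is illustrative, not a prescribed clinical procedure.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def reliable_change_index(pre, post, sd, reliability):
    """Difference score divided by the standard error of the difference
    (sqrt(2) * SEM). |RCI| > 1.96 suggests change beyond what
    measurement error alone would plausibly produce."""
    return (post - pre) / (math.sqrt(2.0) * sem(sd, reliability))

# A 10-point gain on a standard-score test (mean 100, SD 15, r = .90):
rci = reliable_change_index(85, 95, sd=15, reliability=0.90)
# rci is about 1.49 -- short of 1.96, so this gain could still
# reflect measurement error rather than real change.
```

The example makes the chapter's caution tangible: even a gain that looks substantial on a score report may fall inside the band that test unreliability alone can generate.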
Another pitfall arises when norm-referenced instruments are used to assess change despite changes in the normative groups over the time interval studied or the use of different measures (albeit ones that ostensibly tap the same behavior) at different times. It may be tempting to view change as having occurred because a child has received a relatively better score on Test B of Language Behavior X than she or he did on Test A of Language Behavior X. However, the huge amount of error that could be introduced by differences in the content of Tests A and B (despite their similar names), as well as by differences in their normative samples, is likely to make such a conclusion completely erroneous. One method that has been recommended (e.g., McCauley & Swisher, 1984) as helping remove the additional error associated with gain scores is simply to reexamine a child with the same initial question: Is this child's language (or the particular aspect of it that is under scrutiny) impaired? However, a recent study looking at remission rates for reading disability among children examined in two studies over
Page 313 a 2year time period suggested that measurement error can lead to significant overestimates of recovery rates even when this more cautious strategy is applied (Fergusson, Horwood, Caspi, Moffitt, & Silva, 1996). However, the chief source of difficulty was not in how change was examined, but that the question of measurement error had not been explored sufficiently by the original investigators at the time of the children’s original diagnoses. Careful analysis by Fergusson and his colleagues suggested that the overidentification of many children at their first testing, due to a lack of appreciation of testing error, was the villain. It is now an empirical question to determine whether the findings of Fergusson et al. are echoed in the identification of children as having a language impairment. However, I include this brief description of their work here as a cautionary tale suggesting that careful use of normreferenced measures in assessing change begins with their careful use in identification processes. In short, despite their frequent use for the assessment of change, normreferenced tests are most useful when broad changes are expected and when clinicians are careful to avoid the several problems that can undermine the validity of their use for this purpose as well as for purposes of identification. Standardized CriterionReferenced Measures
Because criterionreferenced measures are more often developed so that they exhaustively examine knowledge within a given domain, they have been hailed as superior to normreferenced measures for purposes of examining change (Carver, 1974; McCauley, 1996; McCauley & Swisher, 1984). However, their relative rarity (as shown by the sampling of such tools in Table 10.1) means that their value in assessing language change in children has not been extensively evaluated. Clinicians need to examine documentation for such measures to determine whether the author has presented a reasonable evidence base supporting their use to examine change over time. Especially desirable is evidence suggesting that changes in performance of specific magnitudes are likely to reflect significant functional changes in performance. Nonetheless, where they are used as a simple description of the specific content on which gains have been achieved, such evidence is not as critical. Probes and Other Informal CriterionReferenced Measures
As argued throughout this book, probes have a relative advantage in their adaptability to the specific clinical questions posed by the speech-language pathologist. Thus, they can be devised or selected to address very specific questions about change that coincide with the very focus of treatment for a given child. That they are often relatively brief and straightforward to interpret represents a further advantage. To contemplate the possible pitfalls of the use of probes, however, readers need only return to their description in chapter 10. Without the considerable effort entailed in standardization, clinician-devised probes or probes borrowed from other nonstandardized sources are of unknown reliability and validity. Although their close fit to the question being asked creates great potential for excellent construct validity, the tendency for probes to be haphazardly constructed,
Page 314 administered, and interpreted represents a potentially devastating threat to that potential. Because of the expectation that repeated use of probes will be required if they are to be used to assess change, the standardization strategies described in Figure 10.1 become particularly vital defenses against those threats. Dynamic Assessment Methods
The growing literature exploring the utility of dynamic assessment methods in predicting readiness for language change (Bain & Olswang, 1995; Long & Olswang, 1996; Olswang & Bain, 1996) supports a hopeful but cautious view of how uniformly such techniques succeed. Although by definition such methods are intended to create conditions that change a child's likelihood of acquiring a more mature behavior, they may at times yield predictions so short-lived as to be of limited value in selecting a treatment focus. Nonetheless, their predictive value in specific domains and for specific clients warrants further investigation. In the meantime, their greatest promise appears to lie in the insights they provide regarding how intervention might best take place and in providing more valid assessments for children who are highly reactive to testing. There are also numerous suggestions that they promise to provide more valid assessments than other available methods for children from diverse backgrounds who may lack the experiences assumed by more conventional testing methods.

Single Subject Designs
In their groundbreaking work on the application of single subject experimental designs to speechlanguage pathology, McReynolds and Kearns (1983) noted that such designs had the promise of wide application by clinicians because of their practicality and clinical relevance. Despite their wide acceptance as an alternative method of scientific inquiry, however, such designs have been resisted by speechlanguage clinicians in daily practice—probably because their practicality falls short of that demanded by most clinical settings. Nonetheless, they remain the strongest available method when the clinical question at hand centers on whether treatment is the likely cause of observed changes in behavior. The most frequently used measures in single subject designs are elicited probes and other informal measures, which are referred to as dependent measures in this context. These informal measures often lack the documentation regarding validity and reliability that can adorn more formal measures. Nonetheless, their use is strengthened by their close tie to the specific construct for which they have been created or selected. Ideally, they represent highly defensible operationalizations of the behavior or ability of interest. Their use is further strengthened when measures of inter and intraexaminer agreement, or other basic measures aimed at demonstrating reliability, are obtained. They can also be enhanced by blind measurement procedures in which the person making the measurement is unaware of the purpose it will serve or, ideally, the individual on whom it was obtained (Fukkink, 1996).
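One conventional quantitative adjunct to visual inspection of such dependent measures is the two-standard-deviation band rule, which flags change when successive treatment-phase points escape a band defined by the baseline. The rule is a general convention from the single subject literature, not a procedure from McReynolds and Kearns, and the function below (its name, the population SD choice, and the default run length) is an illustrative sketch.

```python
import statistics

def beyond_band(baseline, treatment, run=2):
    """Two-standard-deviation band rule: flag change when `run`
    consecutive treatment-phase points fall outside the band
    baseline mean +/- 2 SD (population SD of the baseline)."""
    mean = statistics.mean(baseline)
    sd = statistics.pstdev(baseline)
    lo, hi = mean - 2 * sd, mean + 2 * sd
    outside = [not (lo <= x <= hi) for x in treatment]
    return any(all(outside[i:i + run])
               for i in range(len(outside) - run + 1))

baseline_probes  = [20, 22, 18, 20]        # stable pretreatment scores
treatment_probes = [24, 31, 45, 58, 70]    # rising after treatment begins
```

With the stable baseline above, the band is roughly 17 to 23, so the first two treatment points already fall outside it and the rule signals change; a treatment series hovering around 20 would not.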
Page 315 As part of the systematic structuring of observations that underlies the rationale behind single subject designs, dependent measures are obtained frequently and can thus provide persuasive evidence of consistency or change. In addition, the temporal structure of such designs is intended to provide logical support for the role of treatment versus alternative explanations as agents of change. On the basis of these ideals, single subject experimental designs have been lauded not only for their ability to provide superior evidence about causation at the level of the individual but also about both the outcome and process of treatment (McReynolds & Kearns, 1983; McReynolds & Thompson, 1986). A simple consideration of a few of the books on the subject suggests that detailed discussion of the methods and logic supporting the application of single subject designs in communication disorders is well beyond the scope of this book (e.g., Franklin, Allison, & Gorman, 1996; Kratochwill & Levin, 1992; McReynolds & Kearns, 1983). Nonetheless, a simple example can be used to illustrate the logic that supports causal interpretation of such designs and thus their potential for addressing the question of whether treatment is likely to be responsible for a child’s behavioral change. The example I show in Fig. 11.2 is a hypothetical example from Bain and Dollaghan (1991). It was described previously for its use of control, generalization, and treatment probes. It is described here for the way in which the stability of data, timing of treatment, and demonstrations of change lead one to the conclusion that observed changes probably resulted from treatment. As you look at Fig. 11.2, notice first the top graph, in which probes for the primary focus of treatment (Agent + Action) are studied first without the presence of treatment during a baseline condition. 
Because the baseline is clearly unchanging, it is reasonable to conclude that factors such as maturation, informal instruction by a parent, and so forth are not playing a role in the child's acquisition of the target form prior to the initiation of treatment. Although the initiation of treatment does not result in instantaneous change, change does occur over the course of the treatment interval. Further, that change seems likely to be due to the effects of treatment rather than alternative explanatory factors because of the implausibility that such factors would commence by chance in such close proximity to the onset of treatment. Whereas in most single subject designs the period labeled "withdrawal" is considered a second baseline, here it is described as withdrawal because the experimenter would probably expect some additional growth (generalization) due to learning effects. This kind of design, in which treatment is absent, then present, then absent again, is often referred to as an ABA or withdrawal design. ABA designs are often avoided in classical single subject designs in cases where an effective treatment would be expected to show "carryover" in this way. Instead, such designs would more typically be used for behaviors that are expected to return to baseline when treatment is ended. When language development is studied, however, the presence of generalization is not considered a serious detractor from the logic of an experiment when it occurs as part of a set of predictions made in advance by the clinician or experimenter. In the second graph of Fig. 11.2, a second dependent measure (or generalization probe), Action + Object, is observed with the expectation that its relationship to the targeted variable, Agent + Action, will cause some developmental change to occur during treatment and possibly beyond.

Page 316

Fig. 11.2. A hypothetical multiple baseline single subject design that makes use of treatment (Agent + Action), generalization (Action + Object), and control (Entity + Locative) probes (Graphs 1, 2, and 3, respectively). From "The Notion of Clinically Significant Change," by B. A. Bain and C. A. Dollaghan, 1991, Language, Speech, and Hearing Services in Schools, 22, p. 266. Copyright 1991 by the American Speech-Language-Hearing Association. Reprinted with permission.

However, the presence of an initial period of stability prior to the onset of change in this measure is again helpful in strengthening the plausibility of the argument that the observed change is likely to result from the treatment rather than other factors. In addition, that argument is strengthened if the generalization probe does not improve to the same extent as the target probe, or does so following a delay relative to the actual target of treatment. In the third graph of Fig. 11.2, the control probe, Entity + Locative, is shown with a stable but longer baseline, thus indicating that extraneous variables are unlikely to
Page 317 be acting on the child’s language development for the entire duration of the baseline. It is important that the baseline for this variable, which was predicted to be unaffected by generalization, remained stable throughout the entirety of treatment directed at Agent + Action and its withdrawal period in order to support the treatment effect on the other variables. As importantly, it begins to show improvement only after the initiation of treatment in which it has become the direct target. The practical requirements in terms of data collection and display are not inconsequential for single subject designs. However, as this example illustrates, they do not have to be overly burdensome either, with the chief investment here being the periodic (and staggered) collection of probe data for two additional forms. This cost seems well worth it when weighed against the value of evidence documenting the effectiveness of the treatment used for two different targets and of realtime insights into the generalization patterns of the individual child. In addition to numerous books dealing more comprehensively with the large number of designs that can be applied in clinical settings (Franklin, Allison, & Gorman, 1996; Kratochwill & Levin, 1992; McReynolds & Kearns, 1983), a set of three classic articles (Connell & Thompson, 1986; Kearns, 1986; McReynolds & Thompson, 1986) represent a wonderful initiation to the promise such designs hold for clinicians interested in children’s language disorders. Practical Considerations With regard to assessing change, the largest practical consideration appearing on the horizon has been the presence of professional and societal forces urging clinicians to find measures that document the value of what they do on a broader scale and with greater regularity. 
Therefore, although other practical issues exist as very real pressures on clinicians’ decision making regarding all of the areas of change discussed in this chapter, the issue of outcome measurement seems to warrant the full attention of the remaining pages of this chapter and, indeed, the concluding pages of this book. In speechlanguage pathology, interest in how language treatment affects children has been around for quite some time (e.g., Schriebman & Carr, 1978; Wilcox & Leonard, 1978). However, a continuing complaint has been that not enough such research on treatment is being done (e.g., McReynolds, 1983; Olswang, 1998), and the research that is being done involves treatment procedures that, although useful for purposes of scientific rigor, cannot readily be applied to real clinical settings. Thus, the generalizability of a small research base has been at issue. Nonetheless, existing treatment research has provided at least some preliminary evidence of the effectiveness of treatment extending beyond the level of the individual clinician. More recently, interest in accountability (e.g., Eger, 1988; Eger et al., 1986; Mowrer, 1972) has arisen at a grassroots level because of growing demands from individual consumers and their advocates. This interest has been joined in an intense topdown fashion by ASHA as it responds to protect its members’ roles in fast changing health care and educational systems (Frattali, 1998a,b; Hicks, 1998). In a chapter addressing the specific nature of topdown pressures necessitating greater attention to outcomes
Page 318 assessment in speech-language pathology, Hicks (1998) described at least three sources of influence to which the profession must respond:

1. accrediting agencies (e.g., the Rehabilitation Accreditation Commission; the Joint Commission on Accreditation of Healthcare Organizations, JCAHO; ASHA's Professional Services Board, PSB);
2. payer requirements (e.g., Medicare; Medicaid; and managed care organizations, MCOs); and
3. legislative and regulatory requirements (e.g., the Omnibus Budget Reconciliation Act of 1987, Public Law 100-203, and the Social Security Act, Part 484).

At first glance, these forces would seem to come primarily from those clinical settings that serve adults and, thus, it might be thought that they would not affect clinicians who work with children in primarily educational settings. However, as appreciation of the value of outcomes measures has become more widespread and as the great divide between education and healthcare breaks down (as illustrated in Medicaid funding for some children enrolled in school programs), the blissful luxury of considering treatment outcomes someone else's challenge has all but disappeared. Eger (1998) noted that Congress's passage of the Education of All Handicapped Children Act of 1975 (P.L. 94-142) served as a possible precursor to formal outcomes measurement activities in special education because it included as one of its four main goals the assessment and assurance of educational effectiveness. The passage of the 1997 amendments to IDEA (P.L. 105-17) further reinforces the importance of further developments in this area. In order to respond to the challenges facing the professions across settings, ASHA has begun the development of treatment outcomes measures that can be used by groups of clinicians to document their value and provide a basis for comparisons by important groups (e.g., school districts, third-party payers).
At this point, readers who are unfamiliar with the terminology that accompanies outcomes measurement may feel a tad bewildered. Therefore, some background on the relationship between treatment efficacy research and treatment outcomes research seems in order. Despite some important underlying similarities and overlapping methods, an important distinction can be made between these two terms (Frattali, 1998a; Olswang, 1998). Olswang (1998) pointed out that both efficacy research and outcome research represent strategies for examining the influence of treatment on individuals with communication disorders. Nonetheless, whereas efficacy research emphasizes the importance of documenting treatment as a cause for change, outcomes research emphasizes the benefits associated with treatment as it is administered in real-world circumstances. Frattali (1998a) described the distinction quite succinctly by saying that "efficacy research is designed to prove," whereas "outcomes research can only identify trends, describe, or make associations or estimates" (p. 18). Whereas past efficacy research has focused primarily on the behaviors that fall at the impairment level in terms of the ICIDH classification system, a broadening of concerns to embrace behaviors falling at the levels of disability and handicap is an emerging trend (Olswang, 1998).
Page 319 Treatment efficacy is often defined as encompassing treatment effectiveness, efficiency, and effects (e.g., see Kreb & Wolf, 1997; Olswang, 1990, 1998). Treatment effectiveness refers to the traditional idea of whether or not a given treatment is likely to be responsible for observed changes in behavior. Treatment efficiency refers to the relative effectiveness of several treatments or to the role of components of a treatment in contributing to its effectiveness. Finally, treatment effects refers to the specific changes that can be seen in a constellation of behaviors in response to a given treatment. Similar components have also been identified as falling within the province of treatment outcomes as well (Kreb & Wolf, 1997). Whereas treatment efficacy research is usually conducted under optimal conditions, or at least wellcontrolled clinical conditions, outcomes measurement is, by definition, conducted under typical conditions (Frattali, 1998b; Olswang, 1998). On the downside, this means treatment outcomes research will almost never be able to contribute to arguments about the cause and effect relationships of treatments and observed benefits. Nonetheless, outcomes research will almost always be in a better position than treatment efficacy research to address concerns about the value of services offered to professional constituencies (e.g., within a given hospital or school district). Consequently, outcomes research has a very special value to individual clinicians. It can enable them to demonstrate accountability not in the abstract, based on treatments conducted solely by other clinician–researchers working under controlled conditions, but by comparing their own outcomes with those obtained by others through participation in the largescale, multisite efforts that are characteristic of such research. 
In 1997, the National Center for Treatment Effectiveness in Communication Disorders began work on a database that will involve clinicians in the collection of outcomes data on a national basis. This complex database, the National Outcomes Measurement System (NOMS), will eventually include information about all of the populations served by speech-language pathologists and audiologists. Currently, however, NOMS is limited to information about adults seen in healthcare settings, preschool children who are served in school or healthcare settings, and children in kindergarten through the sixth grade who are seen in schools. (Note that data concerning infant hearing screenings are just beginning to be collected.) In order to participate, school-based clinicians work cooperatively to provide data for a given school system in which at least 75% of the speech-language pathologists hold ASHA certification and in which all students will be included in the data that are collected. These two restrictions are designed to improve the quality and representativeness of the data. For schools, data for the NOMS are collected at the beginning and conclusion of services, or at the beginning and end of the school year, with data collection procedures designed to take no more than 5 to 10 minutes per child. Data include information about demographics, eligibility for services, the nature of treatment (i.e., model, amount, and frequency of services), teacher and family satisfaction, and the results of the Functional Communication Measures (FCMs), 7-point rating scales developed by ASHA. The scales address functional performance within the educational environment. They include items such as "The student responds to questions regarding everyday and classroom activities" and "The student knows and uses age-appropriate interaction with peers and staff." These items are rated on the following scale: 0 = No basis for rating; 1 = Does not do; 2 = Does with maximal assistance; 3 = Does with moderate to maximal assistance; 4 = Does with moderate assistance; 5 = Does with minimal to moderate assistance; 6 = Does with minimal assistance; and 7 = Does.

ASHA's goals for the NOMS are lofty. Besides demonstrating positive outcomes for children receiving speech-language pathology services, it is hoped that the NOMS will facilitate administrative planning (e.g., caseload assignments) as well as individual decisions about intervention. Among particular aspirations are that it will provide information about when intervention is most effective, how much progress can be expected over an academic year, what service delivery model and frequency of service result in the greatest gains for a given kind of communication disorder, and what entrance and dismissal criteria are reasonable. In addition, it is hoped that comparative NOMS data might allow individual school systems or groups of school systems to demonstrate their effectiveness and efficiency in ways that will help them negotiate in an era of strained educational resources. The success of the system in meeting these goals will depend greatly on widespread participation, which would allow the representative samples required for specific generalizations such as those just described. In terms of the utility of the system for providing comparative data across school systems or units, a greater tailoring of the reports available to participants may be necessary before those aspirations can be actualized. Beyond the NOMS, Eger (1998) described numerous ways in which an outcomes approach can be incorporated within school practice.
These range from simple modifications of the way goals and objectives are written for individualized educational plans (IEPs), to the development of empirically motivated dismissal criteria, to more elaborate investigations of the effectiveness of specific service delivery models (e.g., classroom-based interventions, self-contained classrooms). These three examples run the gamut from those that can be implemented by the individual clinician to those requiring more extensive resources, akin to those required by the NOMS. In terms of how individual speech-language pathologists can modify the IEPs they write, Eger (1998) provided an example. She noted that a goal that might currently be written as "The student will improve expressive language skills" could be replaced with one or more of the following: "The student will apply problem solving and decision making skills in math and English classes," "The student will use language to create dialogues with teachers and peers to facilitate learning," or "The student will be able to follow written directions on objective tests" (Eger, 1998, p. 447).

Regardless of whether speech-language pathologists working with children actively work to include an outcomes perspective in their practice, the outcomes movement will undoubtedly drive extensive changes in clinical practice over the next decade, especially as these relate to the documentation of change in children's communication. Responsible reactions to these changes will depend on sensitivity to the measurement virtues (i.e., functionality and the development of common best practices) as well as the measurement perils. Many of these perils are shared with all measurement strategies, such as concerns about the quality of data collection at its source and the size of the sample used for any particular decision. Some, however, are unique to such a large undertaking—the relinquishment of decisions about how interpretation will take place and, thus, the possible relinquishment of feelings of personal responsibility as well. Still, it is an exciting time for measurement in communication disorders, one in which sizeable resources may finally be funneled to some of the questions that most trouble speech-language pathologists. The desired outcome of such investments is the proliferation of innovative measurement strategies and the refinement of existing tools, so that we arrive at a sophisticated armamentarium for addressing our clinical questions.

Summary

1. The assessment of change underlies both critical and commonplace decisions made in the management of children's language disorders. These include decisions about individuals, such as when to begin and end treatment and whether treatment tactics should be altered during the course of treatment.

2. When questions of treatment efficacy and accountability are raised, the assessment of change can also fuel decisions about the relative merit of various treatment approaches or the relative productivity of groups of clinicians.

3. Three types of outcomes observed in clinical settings are ultimate outcomes, intermediate outcomes, and instrumental outcomes. Whereas ultimate outcomes relate to decisions about treatment termination, intermediate and instrumental outcomes relate to clinical decisions made during the course of treatment.

4. Measurement error presents an especially difficult challenge to interpretation when measures are examined at multiple points in time, such as when past change is examined or future change is predicted.

5. Clinically significant change must not only be reliable, it must also represent an important change in the life of the child. Three methods used to address whether an observed change is likely to be important involve considerations of effect size, social validation, and the use of multiple measures.

6. Determining that positive changes in a child's language are caused by treatment is made extraordinarily difficult by the thankfully unavoidable but nonetheless confounding influences of growth and development. Increased understanding of those influences within and across children is needed to help address this very thorny measurement problem.

7. Single subject experimental designs offer clinicians the best currently available means for demonstrating that treatment is responsible for observed changes, but they have thus far been used primarily by researchers.

8. Measurement elements strengthening arguments that treatment is the cause of observed changes include the presence of pretreatment baselines and the use of treatment, generalization, and control probes.

9. Treatment efficacy research is concerned with documenting whether treatment is effective and efficient, and whether the effects of treatment extend to a number of significant behaviors.

10. Treatment outcomes research is designed to demonstrate benefits associated with treatment as it is conducted in everyday contexts. Cooperation from all members of the profession is needed to collect some kinds of particularly persuasive treatment outcomes data, such as those being collected in the NOMS database by ASHA.

Key Concepts and Terms

clinically significant change: a change that makes an immediate impact on the communicative life of a child or that represents significant progress toward the acquisition of critical aspects of language.

effect size: the magnitude of the difference between two scores or sets of scores, or of the correlation between two sets of variables.

Functional Communication Measures (FCMs): one of several rating scales designed by ASHA for use in tracking functional communication gains made by clients.

gain scores: the difference between scores obtained by an individual at two points in time when that difference represents a positive change in performance; also called difference scores.

instrumental outcomes: individual behaviors acquired during treatment that suggest the likelihood of additional change; generalization probe data function as instrumental outcomes.

intermediate outcomes: individual behaviors that must be acquired for progress in treatment to have occurred; treatment probe data can function as intermediate outcomes.

National Outcomes Measurement System (NOMS): an outcomes database for speech-language pathology and audiology that is being developed to address the professions' need for large-scale outcomes data.

outcome measurement: the use of measures designed to describe the effects of treatment conducted under typical, rather than controlled, conditions.

Proportional Change Index (PCI): a method for examining the rate of change observed in a given behavior during treatment relative to that observed prior to treatment.
single subject experimental designs: a group of related research designs that permit the user to support claims of causal relationship between variables, such as the effect of treatment on a targeted behavior.

social comparison: a social validation method that involves the use of a comparison between language behaviors of a given child or group of children and those of a small group of peers.

social validation: methods used to indicate the social importance of changes occurring in treatment.
subjective evaluation: a social validation method in which procedures are used to determine whether individuals who interact frequently with a child who is receiving treatment perceive changes as important.

treatment effectiveness: the demonstration that a treatment, rather than other variables, is responsible for changes in behavior (Kreb & Wolf, 1997; Olswang, 1990).

treatment effects: changes in multiple behaviors that appear to result from a given treatment (Olswang, 1990).

treatment efficacy research: research designed to demonstrate the complex property of a treatment that includes its effectiveness, efficiency, and effects (Olswang, 1990, 1998).

treatment efficiency: the effectiveness of a treatment relative to an alternative; a more efficient treatment is one in which goals are accomplished more rapidly, more completely, or more cost-effectively than in a less efficient treatment (Olswang, 1990).

ultimate outcomes: individual behaviors that signal successful treatment, either because age-appropriate or functionally adequate levels of performance have been achieved or because further treatment would be unlikely to yield significant additional gains.

Study Questions and Questions to Expand Your Thinking

1. Arrange to see a clinical case file for a child who is receiving treatment for a language disorder. List the ways in which change is currently documented. Consider ways in which that documentation might be strengthened, including how efforts might be made to address changes in educational or social function as well as in the nature of impairment.

2. Discuss the advantages and disadvantages of using a standard battery of norm-referenced tests to look at a child's overall language functioning over time. If you were to devise such a battery, what would you look for in its components? Would that battery differ on the basis of the etiology of the disorder? If so, how?

3. With regard to the different tools that might be used to examine change, discuss how you might explain each method to a child's parents.

4. Visit the web site for the NOMS at http://www.asha.org/nctecd/treatment_outcomes.htm. Determine what barriers might exist to participating in the NOMS. On the basis of the information you obtained in this chapter and through that web site, what arguments might be made to justify efforts to overcome these barriers?

5. Look at the treatment efficacy studies for child language disorders collected at the NOMS web site under the Efficacy Bibliographies link. On the basis of the information you can glean from reading the titles of the articles listed there, what aspects of treatment efficacy seem to have gotten the greatest attention?

6. On the basis of what you know about clinical decisions regarding change, discuss specific changes that might warrant the use of a method such as a single subject design or social validation techniques. Although these methods are more complex than some other methods, they have the respective advantages of demonstrating the clinician's responsibility for change or the social impact of change.

Recommended Readings

Bain, B. A., & Dollaghan, C. (1991). The notion of clinically significant change. Language, Speech, and Hearing Services in Schools, 22, 264–270.

Kazdin, A. E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical Psychology, 67, 332–339.

Kreb, R. A., & Wolf, K. E. (1997). Treatment outcomes terminology. In R. A. Kreb & K. E. Wolf (Eds.), Successful operations in the treatment-outcomes-driven world of managed care. Rockville, MD: National Student Speech-Language-Hearing Association.

Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.

References

Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan.

Bain, B. A., & Dollaghan, C. (1991). The notion of clinically significant change. Language, Speech, and Hearing Services in Schools, 22, 264–270.

Bain, B. A., & Olswang, L. B. (1995). Examining readiness for learning two-word utterances by children with specific expressive language impairment: Dynamic assessment validation. American Journal of Speech-Language Pathology, 4, 81–91.

Bernthal, J. E., & Bankson, N. W. (1998). Articulation and phonological disorders (4th ed.). Englewood Cliffs, NJ: Prentice-Hall.

Campbell, T., & Bain, B. A. (1991). Treatment efficacy: How long to treat: A multiple outcome approach. Language, Speech, and Hearing Services in Schools, 22, 271–276.

Campbell, T., & Dollaghan, C. (1992). A method for obtaining listener judgments of spontaneously produced language: Social validation through direct magnitude estimation. Topics in Language Disorders, 12(2), 42–55.

Carver, R. (1974). Two dimensions of tests: Psychometric and edumetric. American Psychologist, 29, 512–518.

Connell, P. J., & Thompson, C. K. (1986). Flexibility of single-subject experimental designs. Part III: Using flexibility to design and modify experiments. Journal of Speech and Hearing Disorders, 51, 214–225.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston: Houghton Mifflin.

Diedrich, W. M., & Bangert, J. (1980). Articulation learning. Houston, TX: College-Hill Press.

Education for All Handicapped Children Act of 1975. Pub. L. No. 94–142. 89 Stat. 773 (1975).

Eger, D. (1988). Accountability in action: Entry, measurement, exit. Seminars in Speech and Language, 9, 299–319.

Eger, D. (1998). Outcomes measurement in the schools. In C. Frattali (Ed.), Measuring outcomes in speech-language pathology (pp. 438–452). New York: Thieme.

Eger, D., Chabon, S. S., Mient, M. G., & Cushman, B. B. (1986). When is enough enough? Articulation therapy dismissal considerations in the public schools. Asha, 28, 23–25.

Elbert, M., Shelton, R. L., & Arndt, W. B. (1967). A task for evaluation of articulation change: I. Development of methodology. Journal of Speech and Hearing Research, 10, 281–289.

Fergusson, D. M., Horwood, L. J., Caspi, A., Moffitt, T. E., & Silva, P. A. (1996). The (artefactual) remission of reading disability: Psychometric lessons in the study of stability and change in behavioral development. Developmental Psychology, 32, 132–140.
Fey, M. (1988). Generalization issues facing language interventionists: An introduction. Language, Speech, and Hearing Services in Schools, 19, 272–2 1.

Foster, S. L., & Mash, E. J. (1999). Assessing social validity in clinical treatment research: Issues and procedures. Journal of Consulting and Clinical Psychology, 67, 308–319.

Franklin, R. D., Allison, D. B., & Gorman, B. S. (Eds.). (1996). Design and analysis of single-case research. Mahwah, NJ: Lawrence Erlbaum Associates.

Franklin, R. D., Gorman, B. S., Beasley, T. M., & Allison, D. B. (1996). Graphical display and visual analysis. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 119–158). Mahwah, NJ: Lawrence Erlbaum Associates.

Frattali, C. (1998a). Measuring modality-specific behaviors, functional abilities, and quality of life. In C. Frattali (Ed.), Measuring outcomes in speech-language pathology (pp. 55–88). New York: Thieme.

Frattali, C. (Ed.). (1998b). Measuring outcomes in speech-language pathology. New York: Thieme.

Fukkink, R. (1996). The internal validity of aphasiological single-subject studies. Aphasiology, 10, 741–754.

Glesne, C., & Peshkin, A. (1992). Becoming qualitative researchers: An introduction. White Plains, NY: Longman.

Goldfried, M. R., & Wolfe, B. E. (1998). Toward a more clinically valid approach to therapy research. Journal of Consulting and Clinical Psychology, 66, 143–150.

Goldstein, H., & Gierut, J. (1998). Outcomes measurement in child language and phonological disorders. In C. Frattali (Ed.), Measuring outcomes in speech-language pathology (pp. 406–437). New York: Thieme.

Gorman, B. S., & Allison, D. B. (1996). Statistical alternatives for single-case designs. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 159–214). Mahwah, NJ: Lawrence Erlbaum Associates.

Hicks, P. L. (1998). Outcomes measurement requirements. In C. Frattali (Ed.), Measuring outcomes in speech-language pathology (pp. 28–49). New York: Thieme.

Individuals with Disabilities Education Act (IDEA) Amendments of 1997. Pub. L. 105–17. 111 Stat. 37 (1997).

Jacobson, N. S., Roberts, L. J., Berns, S. B., & McGlinchey, J. B. (1999). Methods for defining and determining the clinical significance of treatment effects: Description, application, and alternatives. Journal of Consulting and Clinical Psychology, 67, 300–307.

Kamhi, A. (1991). Clinical forum: Treatment efficacy, an introduction. Language, Speech, and Hearing Services in Schools, 22, 254.

Kazdin, A. E. (1977). Assessing the clinical or applied significance of behavioral change through social validation. Behavior Modification, 1, 427–452.

Kazdin, A. E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical Psychology, 67, 332–339.

Kazdin, A. E., & Weisz, J. R. (1998). Identifying and developing empirically supported child and adolescent treatments. Journal of Consulting and Clinical Psychology, 66, 19–36.

Kearns, K. P. (1986). Flexibility of single-subject experimental designs. Part II: Design selection and arrangement of experimental phases. Journal of Speech and Hearing Disorders, 51, 204–214.

Koegel, R., Koegel, L. K., Van Voy, K., & Ingham, J. (1988). Within-clinic versus outside-of-clinic self-monitoring of articulation to promote generalization. Journal of Speech and Hearing Disorders, 53, 392–399.

Kratochwill, T. R., & Levin, J. R. (1992). Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum Associates.

Kreb, R. A., & Wolf, K. E. (1997). Treatment outcomes terminology. In Successful operations in the treatment-outcomes-driven world of managed care. Rockville, MD: National Student Speech-Language-Hearing Association.

Lahey, M. (1988). Language disorders and language development. New York: Macmillan.

Long, S. H., & Olswang, L. B. (1996). Readiness and patterns of growth in children with SELI. Language, Speech, and Hearing Services in Schools, 5, 79–85.

Maloney, D. M., Harper, T. M., Braukmann, C. J., Fixsen, D. L., Phillips, E. L., & Wolf, M. M. (1976). Teaching conversation-related skills to predelinquent girls. Journal of Applied Behavior Analysis, 9, 371.
McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language, Speech, and Hearing Services in Schools, 27, 122–131.

McCauley, R. J., & Swisher, L. (1984). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical case. Journal of Speech and Hearing Disorders, 49, 338–348.

McReynolds, L. V. (1983). Discussion: VII. Evaluating program effectiveness. ASHA Reports, 12, 298–306.

McReynolds, L. V., & Kearns, K. P. (1983). Single-subject experimental designs in communicative disorders. Austin, TX: Pro-Ed.

McReynolds, L. V., & Thompson, C. K. (1986). Flexibility of single-subject experimental designs. Part I: Review of the basics of single-subject designs. Journal of Speech and Hearing Disorders, 51, 194–203.

Mehrens, W., & Lehman, I. (1980). Standardized tests in education (3rd ed.). New York: Holt, Rinehart & Winston.

Minkin, N., Braukmann, C. J., Minkin, B. L., Timbers, G. D., Timbers, B. J., Fixsen, D. L., Phillips, E. L., & Wolf, M. M. (1976). The social validation and training of conversational skills. Journal of Applied Behavior Analysis, 9, 127–139.

Mowrer, D. (1972). Accountability and speech therapy in the public schools. Asha, 14, 111–115.

Olswang, L. B. (1990). Treatment efficacy research: A path to quality assurance. Asha, 32, 45–47.

Olswang, L. B. (1993). Treatment efficacy research: A paradigm for investigating clinical practice and theory. Journal of Fluency Disorders, 18, 125–131.

Olswang, L. B. (1998). Treatment efficacy research. In C. Frattali (Ed.), Measuring outcomes in speech-language pathology (pp. 134–150). New York: Thieme.

Olswang, L. B., & Bain, B. A. (1985). Monitoring phoneme acquisition for making treatment withdrawal decisions. Applied Psycholinguistics, 6, 17–37.

Olswang, L. B., & Bain, B. A. (1994). Data collection: Monitoring children's treatment progress. American Journal of Speech-Language Pathology, 3, 55–66.

Olswang, L. B., & Bain, B. A. (1996). Assessment information for predicting upcoming change in language production. Journal of Speech and Hearing Research, 39, 414–423.

Parsonson, B. S., & Baer, D. M. (1992). The visual analysis of data, and current research into the stimuli controlling it. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and education (pp. 15–40). Hillsdale, NJ: Lawrence Erlbaum Associates.

Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates.

Plante, E., & Vance, R. (1994). Selection of preschool speech and language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25, 15–23.

Primavera, L. H., Allison, D. B., & Alfonso, V. C. (1996). Measurement of dependent variables. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 41–89). Mahwah, NJ: Lawrence Erlbaum Associates.

Rosen, A., & Proctor, E. K. (1978). Distinctions between treatment outcomes and their implications for treatment process: The basis for effectiveness research. Journal of Social Service Research, 2, 25–43.

Rosen, A., & Proctor, E. K. (1981). Distinctions between treatment outcomes and their implications for treatment evaluation. Journal of Consulting and Clinical Psychology, 49, 418–425.

Salvia, J., & Ysseldyke, J. E. (1995). Assessment (6th ed.). Boston: Houghton Mifflin.

Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217.

Schreibman, L., & Carr, E. G. (1978). Elimination of echolalic responding to questions through the training of a generalized verbal response. Journal of Applied Behavior Analysis, 11, 453–463.

Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.

Semel, E., Wiig, E. H., & Secord, W. A. (1996). Clinical Evaluation of Language Fundamentals–3. San Antonio, TX: Psychological Corporation.

Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.

Wilcox, M. J., & Leonard, L. B. (1978). Experimental acquisition of Wh-questions in language-disordered children. Journal of Speech and Hearing Research, 21, 220–239.

Wolery, M. (1983). Proportional change index: An alternative for comparing child change data. Exceptional Children, 50, 167–170.

Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203–214.

Young, M. A. (1993). Supplementing tests of statistical significance: Variation accounted for. Journal of Speech and Hearing Research, 36, 644–656.
Page 328
APPENDIX A
Page 329 NormReferenced Tests Designed for the Assessment of Language in Children, Excluding Those Designed Primarily for Phonology (Appendix B)
Test
Ages
Oral Language Modalities and Domains
Written Language Included?
Complete Reference
Reviewed in MMY? (x = Computer Form)
Assessing Semantic Skills Through Everyday Themes
3 to 9 years
R and ESem no
Barrett, M., Zachman, L., & Huisingh, R. (1988). Assessing Semantic Skills Through Everyday Themes. East Moline, IL: LinguiSystems.
no
Bankson Language Test–2
3 years to 6 years, 11 months
ESem, Morph, Syn, Prag
no
Bankson, N. W. (1990). Bankson Language Test–2. San Antonio, TX: ProEd.
x
Boehm Test of Basic Concepts–Preschool
3 to 5 years
RSem
no
Boehm, A. E. (1986). Boehm Test of Basic Concepts– Preschool Version. San Antonio, TX: Psychological Corporation.
x
Boehm Test of Basic Concepts–Revised
Kindergarten to Grade 2
RSem
no
Boehm, A. E. (1986). Boehm Test of Basic Concepts–Revised. x San Antonio, TX: Psychological Corporation.
Bracken Basic Concept Scale– Revised
2½ to 8 years RSem
no
Bracken, B. A. (1986). Bracken Basic Concept Scale. San Antonio, TX: Psychological Corporation.
x
Carrow Elicited Language Inventory
3 years to 7 years, 11 months
EMorph, Syn no
CarrowWoolfolk, E. (1974). Carrow Elicited Language Inventory. Austin, TX: Learning Concepts.
x
Clinical Evaluation of Language Fundamentals–3
6 to 21 years
R and ESem, Syn, Rapid no Naming
Semel, E., Wiig, E. H., & Secord, W. A. (1996). Clinical Evaluation of Language Fundamentals–3. San Antonio, TX: Psychological Corporation.
x
Clinical Evaluation of Language Fundamentals– Preschool
3 to 6 years, 11 months
R and ESem, no Syn
Wiig, E. H., Secord, W., & Semel, E. (1992). Clinical Evaluation of Language Fundamentals–Preschool. San Antonio, TX: Psychological Corporation.
x
(Continued)
Page 330 Appendix A (Continued)
Test Communication Abilities Diagnostic Test
Ages
3 to 9 years
Comprehensive Assessment of Spoken 3 to 21 years Language
Oral Language Modalities and Domains
Written Language Included?
Complete Reference
Reviewed in MMY? (x = Computer Form)
R and ESem, no Syn, Prag
Johnston, E. B., & Johnston, A. V. (1990). Communication Abilities Diagnostic Test. Chicago: Riverside.
x
R and ESem, Morph, no Syntax, Prag
CarrowWoolfolk, E. (1999). Comprehensive Assessment of Spoken Language. Circle Pines, MN: American Guidance Service.
no
Comprehensive Receptive and 4 to 17 years, R and ESem no Expressive Vocabulary 11 months Test
Wallace, G., & Hammill, D. D. (1994). Comprehensive Receptive and Expressive Vocabulary Test. San Antonio, TX: x Psychological Corporation.
Evaluating Acquired Skills in Communication– Revised
Riley, A. M. (1991). Evaluating Acquired Skills in Communication–Revised. San Antonio, TX: Psychological Corporation.
x
no
Gardner, M. F. (1990). Expressive OneWord Picture Vocabulary Test–Revised. Austin, TX: ProEd.
x
no
Williams, K. T. (1997). Expressive Vocabulary Test. Circle Pines, MN: American Guidance Service.
no
Thorum, A. R. (1986). Fullerton Language Test for Adolescents (2nd ed.). San Antonio, TX: ProEd.
x
R and ESem, 3 months to 8 Morph, no years Syntax, Prag
Expressive OneWord Picture Vocabulary 2 to 12 years Test Revised
ESem
Expressive Vocabulary 2½ to 90 years E Test Fullerton Language Test for Adolescents
11 years to adult
R and ESem, no Morph, Syntax
Language Processing Test–Revised
5 to 11 years, ESem 11 months
no
Richard, G. J., & Hanner, M. A. (1985). Language Processing x Test–Revised. East Moline, IL: LinguiSystems.
Oral and Written Language Scales: Listening Comprehension and Oral Expression
3 to 21 years R and E for oral
no
CarrowWoolfolk, E. (1995) Oral and Written Language Scales: Listening Comprehension and Oral Expression. Circle x Pines, MN: American Guidance Service.
Page 331 Oral and Written Language Scales: Written Expression
5 to 21 years
E
Writing Morph, Syn
CarrowWoolfolk, E. (1996). Oral and Written Language Scales: Written Expression. Circle Pines, MN: American Guidance Service.
x
Patterned Elicitation Syntax Test With Morphophonemic Analysis
3½ to 7 years
ESem, Morph, Syn
no
Young, E. C., & Perachio, J. J. (1993). The Patterned Elicitation Syntax Test with Morphophonemic Analysis. Tucson, AZ: Communication Skill Builders.
x
Peabody Picture Vocabulary Test–III
2½ to 90+ years
RSem
no
Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test–III. Circle Pines, MN: American Guidance Service.
no
R and E
no
Porch, B. E. (1979). Porch Index of Communicative Ability in x Children. Chicago: Riverside.
Porch Index of Communicative Ability 4 to 12 years in Children Preschool Language Scale–3
Birth to 6 years, 11 months
R and ESem, no Morph, Syntax
Zimmerman, I. L., Steiner, V., & Pond, R. (1992). Preschool Language Scale–3. San Antonio, TX: Psychological Corporation.
no
Receptive OneWord 12 years to 15 Picture Vocabulary years, 11 RSem Test–Upper Extension months
no
Brownell, R. (1987). Receptive OneWord Picture Vocabulary TestUpper Extension. Novato, CA: Academic Therapy x Publications.
Receptive OneWord 2 years, 11 Picture Vocabulary months to 12 Test years
RSem
no
Gardner, M. F. (1985). Receptive OneWord Picture x Vocabulary Test. Novato, CA: Academic Therapy Publications.
Reynell Developmental Language Scales–U.S. Edition
1 year to 6 years, 11 months
R and E
no
Reynell, J., & Gruber, C. P. (1990). Reynell Developmental Language Scales–U.S. Edition. Windsor, Ontario, Canada: NFER-Nelson.
x
Structured Photographic Expressive Language Test–II
4 to 9 years, 5 months
E–Morph, Syn
no
Werner, E., & Kresheck, J. D. (1983). Structured Photographic Expressive Language Test–II. Sandwich, IL: Janelle.
MMY9a
Test for Examining Expressive Morphology
3 years to 7 years, 11 months
E–Syn
no
Shipley, K. G., Stone, T. A., & Sue, M. B. (1983). Test for Examining Expressive Morphology. Tucson, AZ: Communication Skill Builders.
MMY10b
(Continued)
Page 332
Appendix A (Continued)
Test
Ages
Oral Language Modalities and Domains
Written Language Included?
Complete Reference
Reviewed in MMY? (x = Computer Form)
Test of Adolescent and Adult Language–3
12 to 21 years
R and E–Sem, Morph, Syn
Writing: Sem, Syn
Hammill, D. D., Brown, V. L., Larsen, S. C., & Wiederholt, J. L. (1994). Test of Adolescent and Adult Language–3. Austin, TX: Pro-Ed.
x
Test of Adolescent/Adult Word Finding
12 to 80 years
E–WF
no
German, D. J. (1990). Test of Adolescent/Adult Word Finding. San Antonio, TX: Psychological Corporation.
x
Test of Auditory Comprehension of Language–3
3 years to 9 years, 11 months
R–Sem, Morph, Syn
no
Carrow-Woolfolk, E. (1999). Test of Auditory Comprehension of Language–3. Austin, TX: Pro-Ed.
no
Test of Children’s Language
5 years to 8 years, 11 months
E
Reading, writing
Barenbaum, E., & Newcomer, P. (1996). Test of Children’s Language. San Antonio, TX: Pro-Ed.
x
Test of Early Language Development
3 years to 7 years, 11 months
R and E–Sem, Syn
no
Hresko, W. P., Reid, K., & Hammill, D. D. (1991). Test of Early Language Development (2nd ed.). Austin, TX: Pro-Ed.
x
Test of Language Competence—Expanded
5 to 18 years, 11 months
R and E–Sem, Syn, Prag
no
Wiig, E. H., & Secord, W. (1989). Test of Language Competence—Expanded Edition. San Antonio, TX: Psychological Corporation.
x
Test of Language Development–Intermediate: 3
8 years to 12 years, 11 months
R and E–Sem, Syn
no
Hammill, D. D., & Newcomer, P. L. (1997). Test of Language Development–Intermediate: 3. Circle Pines, MN: American Guidance Service.
x
Test of Language Development–Primary: 3
4 years to 8 years, 11 months
R and E–Phon, Sem, Syn
no
Newcomer, P., & Hammill, D. (1997). Test of Language Development–Primary: 3. Austin, TX: Pro-Ed.
x
Test of Pragmatic Language
5 to 13 years, 11 months
R and E
no
Phelps-Terasaki, D., & Phelps-Gunn, T. (1992). Test of Pragmatic Language. San Antonio, TX: Psychological Corporation.
x
Test of Pragmatic Skills (Revised)
3 to 8 years
R and E–Sem, Prag
no
Shulman, B. B. (1986). Test of Pragmatic Skills (Revised). Tucson, AZ: Communication Skill Builders.
no
Page 333
Test of Relational Concepts
3 years to 7 years, 11 months
R–Sem
no
Edmonston, N., & Thane, N. L. (1988). Test of Relational Concepts. Austin, TX: Pro-Ed.
Test of Word Finding
6½ to 12 years, 11 months
E–WF
no
German, D. J. (1989). Test of Word Finding. San Antonio, TX: Psychological Corporation.
x
Test of Word Finding in Discourse
6½ to 12 years, 11 months
E–WF
no
German, D. J. (1991). Test of Word Finding in Discourse. Chicago: Riverside Publishing.
x
Test of Word Knowledge
5 to 17 years
R and E–Sem
no
Wiig, E. H., & Secord, W. (1992). Test of Word Knowledge. San Antonio, TX: Psychological Corporation.
x
Test of Written Expression
6½ years to 14 years, 11 months
—
Writing
McGhee, R., Bryant, B. R., Larsen, S. C., & Rivera, D. M. (1995). Test of Written Expression. San Antonio, TX: Pro-Ed.
x
Test of Written Language–2
17 years, 11 months
Writing
Hammill, D. D., & Larsen, S. C. (1988). Test of Written Language–2. San Antonio, TX: Psychological Corporation.
x
The Word Test–Adolescent
12 years to 17 years, 11 months
E–Sem
no
Bowers, L., Huisingh, R., Orman, J., Barrett, M., & LoGiudice, C. (1989). The Word Test–Adolescent. East Moline, IL: LinguiSystems.
no
The Word Test–Revised Elementary
7 to 11 years
E–Sem
no
Bowers, L., Huisingh, R., Barrett, M., LoGiudice, C., & Orman, J. (1990). The Word Test–Revised Elementary. East Moline, IL: LinguiSystems.
no
Token Test for Children
3 to 12 years
R–Sem, Syn
no
DiSimoni, F. (1978). Token Test for Children. Chicago: Riverside.
MMY9
Utah Test of Language Development–3
3 years to 10 years, 11 months
R and E–Syn
no
Mecham, M. J. (1989). Utah Test of Language Development–3. Austin, TX: Pro-Ed.
x
Woodcock Language Proficiency Battery–Revised
2 to 95 years
R and E–Sem, Syn
Reading, writing
Woodcock, R. W. (1991). Woodcock Language Proficiency Battery–Revised. Chicago: Riverside.
x
Note. Modalities and domains are abbreviated as follows: Receptive (R), Expressive (E), Semantics (Sem), Morphology (Morph), Syntax (Syn), Pragmatics (Prag), Phonology (Phon), and Word Finding (WF). The presence of a review in the Mental Measurements Yearbook (MMY) database or print series is noted in the final column, with x indicating a computerized version and numerals representing the specific print volume containing the review.
a Mitchell, J. V. (Ed.). (1985). The ninth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements.
b Conoley, J. C., & Kramer, J. J. (Eds.). (1989). The tenth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements.
Page 334
APPENDIX B
Page 335
Norm-Referenced and Criterion-Referenced Tests Designed Primarily for the Assessment of Phonology in Children
Test with Reference Information
Criterion-referenced (CR) and/or Norm-referenced (NR)
Ages
Stimuli, Processes, and Other Features
Reviewed in MMY? (x = computer form)
Assessment Link Between Phonology and Articulation: ALPHA (Revised ed.) (Lowe, 1995). Mifflinville, PA: Speech and Language Resources.
NR/CR
3–0 to 8–11
Sentences or single words elicited using delayed imitation and pictures; 15 processes examined
no
Assessment of Phonological Processes–Revised (Hodson, 1986). Danville, IL: Interstate Press.
CR
Preschool to age 10
Single words elicited using objects; 30 processes, including 10 basic processes that are used in calculating an overall score that allows classification of severity
x
Arizona Articulation Proficiency Scale, 2nd ed. (Fudala & Reynolds, 1994). Los Angeles: Western Psychological Services.
NR/CR
1–6 to 13–11
Single words and stimuli for eliciting connected speech; consonants, consonant clusters, vowels, and diphthongs are assessed; omission, substitution, and distortion error analysis; also allows calculation of severity
x
BanksonBernthal Test of Phonology (Bankson & Bernthal, 1990). Chicago: Riverside Press.
NR/CR
3–0 to 9–11
Single words; 10 most frequently occurring processes in standardization samples
x
Fisher-Logemann Test of Articulatory Competence (Fisher & Logemann, 1971). Boston: Houghton Mifflin.
CR
2 to 3 years and up
Single-word and sentence forms; consonants, consonant clusters, vowels, and diphthongs are assessed; place/manner/voicing analysis only
?
(Continued)
Page 336
Appendix B (Continued)
Test with Reference Information
Criterion-referenced (CR) and/or Norm-referenced (NR)
Ages
Stimuli, Processes, and Other Features
Reviewed in MMY? (x = computer form)
Goldman-Fristoe Test of Articulation–2 (Goldman & Fristoe, 2000). Austin, TX: Pro-Ed.
NR/CR
2 to 21 years
44 single words and 2 sets of pictures for connected speech elicitation; error analysis does not include features, but the Khan-Lewis is designed for use with the earlier version of this test
x
Kaufman Speech Praxis Test for Children (Kaufman, 1995). Detroit: Wayne State University Press.
NR/CR
2 years to 6 years
Limited normative data; assesses productions at 4 levels: oral movement, simple phonemic/syllabic, complex phonemic/syllabic, and spontaneous length and complexity
no
Khan-Lewis Phonological Analysis (Khan & Lewis, 1986). Circle Pines, MN: American Guidance Service.
NR/CR
2 to 6 years
Stimulus materials are those of the Goldman-Fristoe Test of Articulation–Revised; 15 phonological processes; one of the few tests with normative data for processes
x
Natural Process Analysis (Shriberg & Kwiatkowski, 1980). New York: John Wiley & Sons.
CR
any age
Analysis method for continuous speech sample; 8 natural processes
no
Phonological Process Analysis (Weiner, 1979). Baltimore: University Park Press.
CR
preschool children
Single words or sentences; 16 phonological processes
no
Page 337
Photo Articulation Test (Pendergast, Dickey, Selmar, & Soder, 1984). Austin, TX: Pro-Ed.
NR/CR
3 to 12 years
Single words and stimuli to elicit connected speech; consonants, consonant clusters, and diphthongs; omissions, substitutions, and distortions are scored
MMY9a
Screening Test for Developmental Apraxia of Speech (Blakely, 1980). Austin, TX: Pro-Ed.
NR/CR
4 to 12 years
Scores given for expressive language discrepancy, vowels and diphthongs, oral motor movement, verbal sequencing, articulation, motorically complex words, transpositions, prosody, and total; based on a small population
x
SCAT. Secord Consistency of Articulation Tests (Secord, 1997). Sedona, AZ: Red Rock Educational Publications.
CR
all ages
Two test components: (1) Contextual Probes of Articulation Competence (CPAC) probes for production of individual sounds and processes in words, clusters, and sentences; (2) Storytelling Probes of Articulation Competence (SPAC) probes for production in a narrative task
no
Smit-Hand Articulation and Phonology Evaluation (SHAPE; Smit & Hand, 1997). Los Angeles: Western Psychological Services.
NR/CR
3 to 9 years
Single words elicited through pictures or delayed imitation; 11 processes examined
no
Templin-Darley Tests of Articulation (Templin & Darley, 1969). Iowa City, IA: Bureau of Educational Research and Service, University of Iowa.
NR/CR
3 to 8 years
Single words; several subtests, including screening, Iowa Pressure consonants test (those affected by velopharyngeal insufficiency), vowels, and diphthongs; omissions, substitutions, and distortions are scored
MMY7b
Note. The presence of a review in the Mental Measurements Yearbook (MMY) is noted in the final column, with x indicating a computerized version and numerals representing the specific print volume containing the review.
a Mitchell, J. V. (Ed.). (1985). The ninth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements.
b Buros, O. K. (Ed.). (1972). The seventh mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Page 338
Page 339
AUTHOR INDEX Entries in italics appear in reference lists. A Abbeduto, L., 149, 155, 166 Abkarian, G., 160, 164 Aboitiz, F., 119, 141 Agerton, E. P., 273, 287 Aitken, K., 169, 186 Alcock, K., 118, 145 Alfonso, V. C., 264, 280, 291, 306, 326 Allen, D., 114, 130, 143, 171, 172, 173, 178, 186, 233 Allen, J., 175, 184 Allen, M. J., 22, 47, 55, 56, 57, 58, 59, 66, 68, 76 Allen, S., 231, 245 Allison, D. B., 264, 280, 291, 306, 307, 308, 315, 317, 325, 326 Ambrose, W. R., 222, 246 American College of Medical Genetics, 153, 164 American Educational Research Association (AERA), 10, 12, 31, 47, 50, 62, 72, 75, 76, 89, 96, 105, 107, 252, 287 American Psychiatric Association, 111, 114, 115, 130, 134, 140, 148, 149, 150, 161, 164, 169, 170, 171, 172, 173, 178, 180, 181, 182, 183, 184 American Psychological Association (APA), 10, 12, 31, 47, 50, 62, 72, 75, 76, 89, 96, 105, 107, 217, 228, 244, 252, 287 American SpeechLanguageHearing Association (ASHA), 82, 84, 85, 104, 107, 196, 207, 264, 287 Anastasi, A., 36, 47, 55,60, 61, 62, 76, 96, 107, 296, 324 Andrellos, P. J., 237, 246 Andrews, J. F., 197, 198, 210 Angell, R., 181, 184 Annahatak, B., 231, 245 Apel, K., 241, 242, 245 Aram, D. M., 116, 117, 118, 119, 128, 130, 140, 143, 170, 184, 231, 240, 245, 270, 287 Archer, P., 213, 246 Arensberg, K., 229, 249 Arndt, S., 171, 185 Arndt, W. B., 259, 288, 294, 324 Aspedon, M., 238, 248 Augustine, L. E., 82, 108, 229, 230, 245 B Bachelet, J. F., 270, 291 Bachman, L. F., 103, 107 Baddeley, A., 125, 141 Badian, N., 27, 47 Baer, D. M., 308, 326 Bailey, D., 237, 245
Page 340 Bain, B. A., 230, 248, 251, 252, 255, 256, 276, 277, 278, 279, 286, 287, 291, 294, 295, 296, 297, 298, 299, 300, 302, 303, 304, 305, 307, 308, 309, 310, 311, 314, 315, 316, 324, 326 Baker, K. A., 239, 247 Baker, L., 133, 140 Baker, N. E., 7, 13 Bakervan den Goorbergh, L., 269, 287 Ball, E. W., 137, 140 Balla, D., 163, 166, 215, 249 Baltaxe, C. A. M., 176, 184 Bangert, J., 259, 288, 294, 302, 324 Bankson, N. W., 29, 40, 47, 298, 324, 329, 335 Barenbaum, E., 332 BaronCohen, S., 175, 184 Barrett, M., 329, 333 Barrow, J. D., 252, 287 Barsalou, L.W., 7, 12 Barthelemy, C., 181, 185 Bashir, A., 135, 140, 241, 245 Bates, E., 237, 245, 246, 268, 287 Batshaw, M. L., 149, 164 Battaglia, F., 153, 154, 155, 160, 166 Baumeister, A. A., 147, 149, 152, 154, 156, 158, 164 Baumgartner, J. M., 120, 142 Beasley, T. M., 308, 325 Beck, A. R., 266, 283, 287 Becker, J., 170, 185 Bedi, G., 125, 126, 144 Bedor, L., 125, 142 Beitchman, J. H, 130, 140 Bejar, I. I., 66, 77 Bell, J. J., 158, 165 Bellenir, K., 151, 152, 159, 164 Bellugi, U., 158, 166, 188, 198, 207, 208 Benavidez, D. A., 178, 181, 185 Berg, B. L., 279, 280, 287 Bergstrom, L., 197, 207 Beringer, M., 232 Berk, R. A., 6, 13, 56, 76, 158, 163, 165, 256, 287 Berkley, R. K., 251, 289 Berlin, L. J., 258 Bernthal, J. E., 29, 40, 47, 215, 246, 298, 324, 335 Berns, S. B., 304, 325 Bess, F. H., 189, 192, 203, 207 Bettelheim, B., 173, 184 Biederman, J., 161, 165 Bihrle, A., 158, 166 Biklen, S. K., 279, 280, 287 Bishop, D. V. M., 114, 118, 119, 124, 130, 140, 141, 144, 269, 287 Bjork, R. A., 251, 292, 309, 326 Blackmon, R., 283, 292 Blake, J., 270, 287 Blakeley, R. W., 337 Blank, M., 258 Bliss, L. S., 103, 107, 233 Bloodstein, O., 159, 165 Boehm, A. E., 329 Bogdan, R. C., 279, 280, 287, 292 Bondurant, J., 121, 141 Botting, N., 238, 245 Boucher, J., 181, 185 Bow, S., 121, 122, 141 Bowers, L., 333 Bracken, B. 
A., 215, 245, 329 Brackett, D., 191, 192, 194, 195, 200, 203, 207, 209 Bradley, L., 27, 47 BradleyJohnson, S., 188, 189, 200, 201, 204, 207 Braukmann, C. J., 304, 325, 326 Bredart, S., 270, 291 Breecher, S. V. A., 238, 248 Brennan, R. L., 102, 108 Bretherton, I., 237, 245 Bridgman, P. W., 19, 47 Brinton, B., 133, 141, 241, 245, 262, 287 Broks, P., 173, 185 Bronfenbrenner, U., 79, 107 Brown, A. L., 226, 245, 272, 277, 287 Brown, J., 226, 245 Brown, R., 266, 287 Brown, S., 230, 246, 255, 278, 289 Brown, V. L., 332 Brownell, R., 331 Brownlie, E. B., 133, 140 Bruneau, N., 181, 185 Bryant, B. R., 59, 77, 333 Bryant, P., 27, 47 Brzustowicz, L., 117, 141 Buckwalter, P., 114, 118, 145 Bunderson, C. V., 31, 47 Buros, O., 337 Burroughs, E. I., 262, 287 Butkovsky, L., 122, 143 Butler, K. G., 132, 145, 255, 287 Byma, G., 125, 144 Bzoch, K. R., 59, 76, 103, 107, 237, 245, 258 C Cacace, A. T., 192, 207 Cairns, H. S., 259, 290
Page 341 Calhoon, J. M., 281, 287 Camarata, M., 122, 143 Camarata, S., 115, 122, 141, 143, 192, 209 Camaioni, L., 237, 245 Campbell, D., 55, 76, 306, 324 Campbell, M., 184 Campbell, R., 120, 141 Campbell, T., 223, 230, 245, 262, 264, 265, 268, 274, 275, 287, 288, 294, 295, 296, 304, 305, 307, 309, 311, 324 Campione, J., 277, 287 Cantekin, E. I., 191, 208 Cantwell, D., 133, 140 Carney, A. E., 189, 194, 196, 203, 207 Carpentieri, S., 171, 185 Carr, E. G., 317, 326 Carr, L., 124, 142 CarrowWoolfolk, E., 232, 329, 330, 331, 332 Carver, R., 57, 76, 312, 313, 324 Casby, M., 128, 141 Caspi, A., 313, 324 Castelli, M. C., 237, 245 Chabon, S. S., 295, 310, 317, 324 Channell, R. W., 269, 290 Chapman, A., 238, 248 Chapman, J. P., 8, 13 Chapman, L. J., 8, 13 Chapman, R., 158, 166, 268, 272, 287, 290 Cheng, L. L., 231, 245 Chial, M. R., 30, 47 Chipchase, B. B., 130, 144 Chomsky, N., 123, 141 Chung, M. C., 174, 175, 184 Cibis, G., 161, 166 Cicchetti, D., 163, 166, 215, 249 Cirrin, F. M., 241, 245, 255, 273, 288 Clahsen, H., 124, 141 Clark, M., 118, 121, 122, 135, 141, 143 Cleave, P. L., 116, 124, 128, 141, 144, 231, 246 Clegg, M., 133, 140 Cochran, P. S., 269, 270, 288 Coe, D., 178, 181, 185 Cohen, I. L., 153, 165, 173, 178, 184 Cohen, M., 120, 141, 149, 164, 165 Cohen, N. J., 134, 141 Cohrs, M., 215, 246 Compton, A. J., 232 Compton, C., 104, 108, 232, 245 Conant, S., 270, 288 Conboy, B., 230, 246, 255, 278, 289 Connell, P. J., 317, 324 Connor, M., 152, 161, 165 Conoley, J. C., 103, 104, 108, 333 Conover, W. M., 30, 47 ContiRamsden, G., 238, 245 Cook, T. D., 306, 324 Cooke, A., 158, 165 Cooley, W. C., 151, 152, 165 Cooper, J., 126, 144 Cordes, A. K., 66, 68, 76 Corker, M., 195, 207 Coryell, J., 195, 196, 207 Coster, W. J., 237, 246 Courchesne, E., 173, 184, 185 Crago, M., 118, 124, 135, 141, 142, 231, 245 Craig, H. K., 133, 141, 268 Crais, E. R., 79, 81, 108, 236, 245, 281, 288 Creaghead, N. A., 10, 13, 282, 288 Creswell, J. W., 279, 288 Crittenden, J. 
B., 196, 207 Cromer, R., 149, 165 Cronbach, L. J., 66, 76 Crutchley, A., 238, 245 Crystal, D., 268, 269, 288 Cueva, J. E., 184 Culatta, B., 23, 48 Culbertson, J. L., 189, 192, 203, 207, 208 Cunningham, C., 121, 122, 141 Curtiss, S., 117, 118, 144 Cushman, B. B., 295, 310, 317, 324 D Dale, P., 237, 246 Damasio, A. R., 181, 184 Damico, J. S., 82, 108, 229, 230, 241, 245, 251, 252, 253, 255, 257, 274, 275, 283, 284, 285, 286, 288 D’Angiola, N., 176, 184 Daniel, B., 271, 291 Darley, F., 251, 290, 337 Davidson, R., 270, 291 Davies, C., 238, 249 Davine, M., 134, 141 Davis, B., 128, 145 Dawes, R. M., 8, 13 Day, K., 271, 291 de Villiers, J., 262, 291 de Villiers, P., 262, 291 DeBose, C. E., 229, 231, 249 DellaPietra, L., 235, 245 Demers, S. T., 84, 85, 108 Denzin, N. K., 279, 288 Derogatis, L. R., 235, 245
Page 342 Deyo, D. A., 196, 209 Dickey, S., 337 Diedrich, W. M., 259, 288, 294, 302, 324 DiLavore, P., 174, 175, 184 Dirckx, J. H., 197, 208 DiSimoni, F., 333 Dobrich, W., 135, 144 Dodds, J., 213, 215, 246 Doehring, D. G., 231, 245 Dollaghan, C., 223, 230, 245, 251, 262, 264, 265, 268, 274, 275, 287, 288, 296, 297, 298, 299, 300, 302, 303, 304, 305, 307, 308, 309, 315, 316, 324 DonahueKilburg, G., 82, 83, 108, 203, 208 Donaldson, M. D. C., 158, 165 Dowdy, C. A., 134, 141 Downey, J., 158, 165 Downs, M. P., 188, 189, 190, 191, 193, 194, 197, 207, 209 Dubé, R. V., 196, 208 Dublinske, S., 241, 245 Duchan, J., 253, 259, 262, 263, 279, 289, 290 Dunn, Leota, 40, 51, 57, 71, 76, 232, 245, 331 Dunn, Lloyd, 40, 51, 57, 71, 76, 232, 245, 331 Dunn, M., 171, 172, 186, 240, 245 Durkin, M. S., 147, 148, 165 Dykens, E. M., 149, 152, 153, 158, 159, 161, 164, 165 E Eaton, L. F., 161, 165 Eaves, L. C., 171, 184 Edelson, S. M., 181, 185 Edmonston, A., 333 Edwards, E. B., 241, 245 Edwards, J., 118, 124, 141, 142 Edwards, S., 159, 160, 164, 166 Eger, D., 295, 306, 310, 317, 318, 320, 324 Ehlers, S., 171, 174, 175, 180, 184, 185 Ehrhardt, A. A., 158, 165 Eichler, J. A., 191, 208 Eisele, J. A., 119, 140 Ekelman, B., 130, 140 Elbert, M., 259, 288, 294, 324 Elcholtz, G., 283, 292 Ellis Weismer, S., 124, 125, 141, 237, 249 Ellis, J., 23, 48 Embretson, S. E., 279, 288 Emerick, L. L., 215, 248 Engen, E., 202, 208 Engen, T., 202, 208 Erickson, J. G., 62, 77 Evans, A. W., 104, 109, 238, 239, 249, 296, 312, 326 Evans, J., 125, 141, 266, 267, 268, 288 Evans, L. D., 188, 189, 200, 201, 204, 207 Eyer, J., 125, 142 F Fandal, A., 215, 246 Farmer, M., 133, 141 Faust, D., 7, 8, 13 Fay, W., 176, 184 Feeney, J., 215, 246 Fein, D., 171, 172, 173, 178, 186 Feinstein, C., 171, 172, 173, 178, 186 Feldt, L. S., 102, 108 Fenson, L., 237, 246 Ferguson, B., 133, 140 FergusonSmith, M., 152, 161, 165 Fergusson, D. 
M., 313, 324 Feuerstein, R., 276, 278, 288 Fey, M., 116, 128, 141, 221, 231, 246, 269, 290, 309, 325 Finnerty, J., 269, 288 Fiorello, C., 84, 85, 108 Fisher, H. B., 335 Fiske, D. W., 55, 76 Fixsen, D. L., 304, 325, 326 Flax, J., 240, 247 Fleiss, J. L., 68, 77 Fletcher, J. M., 21, 48 Fletcher, P., 118, 145, 269, 288 Flexer, C., 188, 195, 199, 208 Fluharty, N., 239, 246 Flynn, S., 260, 290 Foley, C., 260, 290 Folstein, S., 173, 184 Foster, R., 258 Foster, S. L., 298, 304, 325 Fowler, A. E., 159, 165 Fox, R., 157, 165 Francis, D. J., 21, 48 Frankenburg, W. K., 213, 215, 246 Franklin, R. D., 307, 308, 315, 317, 325 Fraser, G. R., 197, 208 Frattali, C., 87, 108, 251, 288, 295, 303, 306, 318, 319, 325 Fredericksen, N., 66, 77 Freedman, D., 30, 48 Freese, P., 130, 133, 135, 143, 145 Freiberg, C., 266, 290
Page 343 Fria, T. J., 191, 208 Fristoe, M., 336 Frith, U., 173, 178, 181, 184 Fudala, J., 335 Fujiki, M., 133, 141, 262, 287 Fukkink, R., 314, 325 Funk, S. G., 104, 109, 238, 239, 249, 296, 312, 326 G Gabreels, F., 147, 166 Gaines, R., 161, 166 Galaburda, A., 119, 141 Gardiner, P., 157, 167 Gardner, M. F., 40, 48, 60, 77, 100, 108, 233, 330, 331 Garman, M. L., 269, 288 Garreau, B., 181, 185 Gathercole, S., 125, 141 Gauger, L., 119, 121, 141 Gavin, W. J., 266, 271, 289 Geers, A. E., 201, 202, 208, 209 Geirut, J., 251, 262, 289, 303, 325 Gerken, L., 261, 289 German, D. J., 54, 77, 332, 333 Gertner, B. L., 133, 141 Geschwind, N., 119, 120, 141, 142 GeschwintRabin, J., 154, 166 Ghiotto, M., 270, 291 Gibbons, J. D., 30, 47 Giddan, J. J., 258 Gilbert, L. E., 189, 203, 208 Giles, L., 271, 289 Gilger, J. W., 117, 118, 140, 142 Gillam, R., 126, 128, 140, 142, 145 Gillberg, C., 171, 174, 175, 180, 184, 185 Girolametto, L., 237, 246 Glaser, R., 58, 77 Gleser, G. D., 66, 76 Glesne, C., 279, 303, 325 Goldenberg, D., 215, 247 Goldfield, B. A., 302 Goldman, R., 336 Goldman, S., 134, 142 Goldsmith, L., 204, 208, 237, 248 Goldstein, H., 251, 262, 289, 303, 325 Golin, S., 273, 292 Golinkoff, R. M., 261, 289 Good, R., 234, 248 Goodluck, H., 261, 289 Gopnik, M., 118, 124, 135, 141, 142 GordonBrannan, M., 241, 242, 245 Gorman, B. S., 307, 308, 315, 317, 325 Gottlieb, M. L., 165 Gottsleben, R., 269, 292 Gould, S. J., 20, 47, 48 Graham, J. M., 151, 152, 165 Grandin, T., 178, 184 Green, A., 161, 166 Green, J. A., 239, 249 Greene, S. A., 158, 165 Grela, B., 125, 142 Grievink, E. H., 205, 209 Grimes, A. M., 241, 245 Gronlund, N., 31, 48, 67, 71, 76, 77 Grossman, H. J., 149, 165 Gruber, C. P., 331 Gruen, R., 158, 165 Gruner, J., 120, 144 Guerin, P., 181, 185 Guidubaldi, J., 201, 209 Guitar, B., 10, 13, 238, 249, 264, 292 GutierrezClellen, V. F., 230, 237, 246, 255, 278, 289 H Haas, R. H., 173, 185 Haber, J. S., 239, 246 Hadley, P., 121, 133, 141, 142, 237, 246 Haley, S. M., 237, 246 Hall, N. 
E., 116, 128, 135, 140, 142, 170, 184, 231, 245, 270, 287 Hall, P., 259, 290 Hall, R., 283, 292 Hallin, A., 184 Haltiwanger, J. T., 237, 246 Hammer, A. L., 96, 100, 108 Hammill, D. D., 40, 48, 54, 57, 77, 103, 109, 233, 240, 247, 330, 332, 333 Hand, L., 337 Hanna, C., 233 Hanner, M. A., 330 Hansen, J. C., 224, 244, 246 Harper, T. M., 171, 185, 304, 325 Harris, J. L., 229, 231, 246 Harris, J., 195, 208 Harrison, M., 194, 208 Harryman, E., 215, 248 Hartley, J., 268, 289 Hartung, J., 237, 246 Haynes, W. O., 215, 248 Hecaen, H., 120, 144
Page 344 Hedrick, D., 233, 236, 237, 246 Heller, J. H., 104, 109, 238, 239, 249, 296, 312, 326 Hemenway, W. G., 197, 207 Hersen, M., 164, 165 Hesketh, L. J., 125, 141 Hesselink, J., 121, 142 Hicks, P. L., 317, 318, 325 Hirshoren, A., 222, 246 HirshPasek, K., 261, 289 Hixson, P. K., 269, 289 Ho, H. H., 171, 184 Hodapp, R. M., 149, 152, 153, 158, 159, 161, 164, 165 Hodson, B., 241, 242, 245, 335 Hoffman, M., 276, 278, 288 Hoffman, P., 276, 291 Holcomb, T. K., 195, 196, 207 Holmes, D. W., 199, 208 Hopkins, J., 104, 108, 217, 238, 246 Horodezky, N., 134, 141 Horwood, L. J., 313, 324 Howard, S., 268, 289 Howe, C., 153, 154, 155, 160, 166 Howlin, P., 133, 144 Hresko, W., 54, 77, 103, 109, 233, 332 Hsu, J. R., 68, 77 Hsu, L. M., 68, 77 Huang, R., 104, 108, 217, 238, 246 Huisingh, R., 329, 333 Hummel, T. J., 235, 246 Hurford, J. R., 132, 140, 142 Hutchinson, T. A., 96, 107, 108, 222, 246, 248 Hux, K., 238, 248 I Iglesias, A., 278, 291 Impara, J. C., 103, 104, 108 Ingham, J., 304, 325 Inglis, A., 133, 140 Ingram, D., 124, 142 Inouye, D. K., 31, 47 Isaacson, L., 134, 141 J Jackson, D. W., 121, 142, 194, 200, 204, 209 JacksonMaldonado, D., 237, 246 Jacobson, J. W., 148, 165, 304, 325 Janesick, V. J., 280, 289 Janosky, J., 230, 245 Jauhiainen, T., 204, 210 Jenkins, W., 125, 126, 143, 144 Jensen, M., 276, 288 Jernigan, T., 121, 142 Johansson, M., 171, 180, 185 Johnson, G. A., 277, 291 Johnson, G., 230, 248 Johnston, A. V., 330 Johnston, E. G., 330 Johnston, J. R., 115, 142 Johnston, P., 126, 143 Jones, S. S., 31, 48 Juarez, M. J., 222, 248 K Kahneman, D., 8, 13 Kalesnik, J. O, 215, 216, 235, 236, 238, 248 Kallman, C., 125, 144 Kamhi, A., 8, 13, 115, 116, 128, 142, 216, 229, 231, 241, 245, 246 Kanner, L., 181, 185 Kaplan, C. A., 130, 144 Kapur, Y. P., 194, 197, 198, 208 Karchmer, M. A., 204, 208 Kaufman, A. S., 215, 246 Kaufman, N. L., 215, 246, 336 Kayser, H., 229, 231, 246, 247 Kazdin, A. E., 297, 304, 305, 324, 325 Kazuk, E., 215, 246 Kearns, K. 
P., 68, 76, 77, 274, 275, 286, 290, 307, 308, 309, 314, 315, 317, 325, 326 Kelley, D. L., 279, 289 Kelly, D. J., 217, 247 Kemp, K., 266, 267, 270, 289 Kent, J. F., 64, 77 Kent, R. D., 8, 13, 64, 77 Kerlinger, F. N., 19, 48 Keyser, D. J., 104, 108 Khan, L. M., 336 King, J. M., 161, 166 Kingsley, J., 156, 166 Klaus, D. J., 58, 77 Klee, T., 189, 192, 203, 207, 266, 267, 270, 289, 290 Klein, S. K., 203, 208 Kline, M., 232 Koegel, L. K., 304, 325 Koegel, R., 304, 325 Koller, H., 156, 161, 166
Page 345 Kovarsky, D., 253, 279, 286, 289 Kozak, V. J., 201, 202, 209 Kramer, J. J., 333 Krassowski, E., 127, 142, 231, 247 Kratochwill, T. R., 307, 315, 317, 325 Kreb, R. A., 319, 323, 324, 325 Kretschmer, R., 121, 141 Kresheck, J., 215, 232, 248, 331 Kuder, G. F., 69, 77 Kuehn, D. P., 120, 142 Kulig, S. G., 239, 247 Kunze, L., 239, 249 Kwiatkowski, J., 336 L Lahey, M., 10, 13, 48, 118, 124, 128, 141, 142, 223, 231, 247, 269, 289, 303, 325 Lancee, W., 133, 140 Lancy, D., 279, 289 Landa, R. M., 273, 289 Larsen, S. C., 332, 333 Larson, L., 199, 210 Layton, T. L., 104, 109, 199, 208, 238, 239, 249, 296, 312, 326 Le Couteur, A., 175, 185 League, R., 59, 76, 103, 107, 237, 246, 258 Leap, W. L., 231, 247 Leckman, J. F., 149, 152, 153, 159, 164, 165 Lee, L., 218, 247 Lehman, I., 296, 326 Lehr, C. A., 215, 247 Lehrke, R. G., 152, 166 Lemme, M. L., 120, 142 Leonard, C., 119, 121, 141 Leonard, L., 114, 117, 118, 119, 121, 122, 123, 124, 125, 126, 128, 130, 131, 132, 137, 140, 142, 221, 223, 230, 240, 247, 251, 270, 289, 317, 327 Leverman, D., 233 Levin, J. R., 307, 315, 317, 325 Levitsky, W., 120, 142 Levitz, M., 156, 166 Levy, D., 125, 143 Lewis, N. P., 336 Lidz, C. S., 255, 276, 278, 289 LilloMartin, D., 188, 198, 207, 208 Lincoln, A. J., 173, 185 Lincoln, Y. S., 279, 288 Linder, T. W., 281, 289 Ling, D., 195, 208 Linkola, H., 204, 210 Lipsett, L., 134, 141 Locke, J., 125, 143 Loeb, D., 124, 143 Logemann, J. A., 335 Logue, B., 271, 291 LoGuidice, C., 333 Lombardino, L., 119, 121, 141 Loncke, F., 192, 209 Long, S. H., 116, 128, 141, 231, 246, 266, 269, 270, 278, 289, 290, 310, 314, 325 Longobardi, E., 237, 245 LonsburyMartin, B. L., 206, 208 Lord, C., 174, 175, 184, 185 Love, S. R., 178, 181, 185 Lowe, R., 335 Lubetsky, M. J., 147, 161, 166 Lucas, C. R., 259, 290 Luckasson, R., 148, 166 Ludlow, L. H., 237, 246 Lugo, D. E., 232, 245 Lund, N. J., 253, 259, 262, 263, 290 Lust, B., 260, 290 Lyman, H. B., 76 M Machon, M. W., 104, 109, 238, 239, 249, 296, 312, 326 Macmillan, D. 
L., 148, 149, 166 MacWhinney, B., 268, 269, 287, 290 Maino, D. M., 161, 166 Maloney, D. M., 304, 325 Malvy, J., 181, 185 Marchman, V., 237, 246 MardellCzudnoswki, C., 215, 247 Marks, S., 158, 166 Marlaire, C. L., 63, 77, 236, 244, 247 Martin, G. K., 206, 208 Mash, E. J., 298, 304, 325 Masterson, J. J., 269, 270, 288, 290 Matese, M. J., 178, 181, 185 Matkin, N. D., 189, 203, 209 Matson, J. L., 178, 181, 185 Matthews, R., 133, 145 Mauk, G. W., 194, 208 Maurer, R. G., 181, 184 Mawhood, L., 133, 144 Maxon, A., 191, 192, 194, 200, 209 Maxwell, L. A., 154, 166, 199, 200, 208 Maxwell, M. M., 253, 279, 289
Page 346 Maynard, D. W., 63, 77, 236, 244, 247 McCarthy, D. A., 215, 249 McCauley, R. J., 7, 12, 13, 35, 38, 48, 102, 104, 108, 217, 220, 225, 231, 234, 238, 247, 249, 251, 252, 253, 256, 264, 290, 292, 296, 299, 312, 313, 326 McClave, J. T., 30, 48 McDaniel, D., 259, 290 McGhee, R., 333 McGlinchey, J. B., 304, 325 McKee, C., 259, 290 McFarland, D. J., 192, 207 McReynolds, L. V., 68, 76, 77, 274, 275, 286, 290, 307, 308, 309, 314, 315, 317, 326 Mecham, M. J., 333 Meehl, P. E., 8, 13, 223, 247 Mehrens, W., 296, 326 Mellits, D., 125, 144 Membrino, I., 266, 289 Menolascino, F. J., 161, 165 Menyuk, P., 130, 143 Merrell, A. M., 104, 108, 217, 242, 247 Mervis, C. B., 158, 166 Merzenich, M., 125, 126, 143, 144 Messick, S., 4, 13, 76, 77, 252, 290 Mient, M. G., 295, 310, 317, 324 Miller, J. F., 158, 166, 230, 235, 236, 249, 252, 253, 262, 258, 263, 266, 268, 269, 270, 271, 273, 284, 288, 290, 292 Miller, R., 276, 288 Miller, S., 125, 126, 143, 144 Miller, T. L., 217, 248 Milone, M. N., 204, 208 Minifie, F., 251, 290 Minkin, B. L., 304, 326 Minkin, N., 304, 326 Mislevy, R. J., 66, 77 Mitchell, J. V., 333, 337 Moeller, M. P., 189, 194, 196, 200, 203, 207, 208 MoellmanLanda, R., 273, 290 Moffitt, T. E., 313, 324 Mogford, K., 195, 208 MogfordBevan, K., 188, 203, 208 Moldonado, A., 233 Montgomery, A. A., 104, 109 Montgomery, J. K., 241, 242, 248 Moog, J. S., 201, 202, 208, 209 Moores, D. F., 196, 209 Morales, A., 271, 291 Moran, M. J., 273, 287 Mordecai, D. R., 269, 290 Morgan, S. B., 171, 184 Morishima, A., 158, 165 Morisset, C., 237, 245 Morris, P., 79, 107 Morris, R., 116, 128, 140, 171, 172, 173, 178, 186, 231, 245, 252, 270, 287, 290 Morriss, D., 271, 291 Mowrer, D., 294, 317, 326 Mulick, J. A., 148, 165 Muller, D., 268, 289 Muma, J., 79, 108, 217, 247, 253, 271, 290, 291 Murphy, L. L., 104, 108 Musket, C. H., 199, 209 Myles, B. 
S., 170, 185 N Nagarajan, S., 125, 126, 144 Nair, R., 133, 140 Nanda, H., 66, 76 Nation, J., 130, 140 National Council on Measurement in Education (NCME), 10, 12, 31, 47, 50, 62, 72, 75, 76, 89, 96, 105, 107, 252, 287 Needleman, H., 230, 245 Neils, J., 117, 118, 143 Nelson, K. E., 122, 143, 192, 209 Nelson, N. W., 235, 247, 282, 291 Newborg, J., 201, 209 Newcomer, P. L., 40, 48, 57, 77, 103, 108, 240, 247, 332 Newcorn, J. H., 161, 165 Newhoff, M., 121, 143 Newman, P. W., 10, 13 Newport, E., 198, 209 Nicolosi, L., 215, 248 Nielsen, D. W., 6, 10, 13 Nippold, M. A., 104, 108, 134, 143, 217, 238, 246, 248 Nitko, A. J., 49, 58, 67, 71, 77 Nordin, V., 171, 174, 175, 184, 185 Norris, J., 276, 291 Norris, M. K., 222, 248 Norris, M. L., 239, 246 Northern, J. L., 188, 189, 190, 191, 193, 194, 207, 209 Nunnally, J., 225, 248 Nuttall, E. V., 215, 216, 235, 236, 238, 248 Nyden, A., 171, 180, 185 Nye, C., 241, 242, 248 O O’Brien, M., 114, 145 O’Grady, L., 188, 207
Page 347 Olsen, J. B., 31, 47 Olswang, L. B., 128, 129, 143, 223, 230, 248, 249, 251, 252, 255, 256, 273, 276, 277, 278, 279, 280, 286, 287, 289, 290, 291, 292, 294, 295, 296, 297, 298, 302, 303, 304, 305, 307, 310, 311, 314, 317, 318, 319, 323, 324, 325, 326 Onorati, S., 270, 287 Orman, J., 333 Ort, S. I., 149, 165 Owens, R. E., 221, 248 Oyler, A. L., 189, 203, 209 Oyler, R. F., 189, 203, 209 P Padilla, E. R., 232, 245 Page, J. L., 23, 48, 181, 185 Palin, M. W., 269, 290 Palmer, P., 171, 185, 269, 290 Pan, B. A., 174, 185 Panagos, J., 268, 291 Pang, V. O., 231, 248 Papoudi, D., 169, 186 Parsonson, B. S., 308, 326 Passingham, R., 118, 145 Patell, P. G., 133, 140 Patton, J. R., 134, 141 Paul, P. V., 194, 196, 200, 204, 207, 209 Paul, R., 128, 143, 174, 176, 177, 185, 203, 209, 221, 222, 223, 248, 253, 262, 263, 268, 290, 291 Payne, K. T., 228, 229, 249 Pedhazur, R. J., 10, 13, 17, 18, 22, 23, 24, 28, 48, 55, 56, 76, 264, 280, 291, 297, 298, 303, 306, 326 Pembrey, M., 117, 143 Peña, E., 230, 248, 255, 276, 278, 289 Pendergast, K., 337 Penner, S. G., 255, 273, 288 Perachio, J. J., 331 Perkins, M. N., 222, 248 Perozzi, J. A., 251, 289 Perret, Y. M., 149, 164 Peshkin, A., 279, 303, 325 Peters, S. A. F., 205, 209 Pethick, S., 237, 246 PhelpsGunn, T., 332 PhelpsTerasaki, D., 332 Phillips, E. L., 304, 325, 326 Piercy, M., 125, 144 Pindzola, R. H., 215, 248 Pisani, R., 30, 48 Piven, J., 171, 185 Plake, B. S., 104, 108 Plante, E., 104, 108, 116, 118, 120, 121, 127, 135, 141, 142, 143, 217, 218, 220, 222, 231, 242, 247, 299, 326 Plapinger, D., 199, 209 Poizner, H., 198, 208 Pollock, K. E., 229, 231, 246 Polloway, E. A., 134, 141 Pond, R. E., 59, 77, 233, 331 Porch, B. E., 274, 331 Prather, E. M., 233, 236, 237, 238, 246, 248 Prelock, P. A., 241, 245, 268, 282, 289, 291 Primavera, L. H., 264, 280, 291, 306, 326 Prinz, P., 196, 198, 209 Prizant, B. M., 171, 174, 185, 214, 248 Proctor, E. K., 294, 326 Prutting, C. 
A., 251, 289 Purves, R., 30, 48 Pye, C., 269, 291 Q Quartaro, G., 270, 287 Quigley, S. P., 207 Quinn, M., 230, 235, 236, 249, 252, 284, 292 Quinn, R., 278, 291 R Radziewicz, C., 81, 109 Rajaratnam, N., 66, 76 Ramberg, C., 171, 180, 185 Rand, Y., 276, 278, 288 Rapcsak, S., 120, 143 Rapin, I., 114, 130, 143, 171, 172, 173, 174, 178, 179, 180, 181, 185, 186, 191, 203, 208, 209 Raver, S. A., 281, 291 Records, N. L., 7, 10, 13, 114, 130, 133, 135, 143, 145 Rees, N. S., 192, 209 Reeves, M., 266, 290 Reichler, R. J., 175, 185 Reid, D., 54, 77, 103, 109, 233, 332 Reilly, J., 237, 246 Remein, Q. R., 6, 13, 214, 249 Renner, B. R., 175, 185 Reschly, D. J., 148, 149, 166 Rescorla, L., 128, 143, 237, 248 Resnick, T. J., 191, 209 Reveron, W. W., 229, 248 Reynell, J., 331 Reynolds, W. M., 335
Page 348 Reznick, S., 237, 246 Rice, M. L., 117, 119, 121, 124, 133, 141, 142, 143, 144, 217, 237, 247, 249 Richard, G. J., 330 Richardson, M. W., 69, 77 Richardson, S. A., 156, 161, 166 Ries, P. W., 188, 209 Riley, A. M., 330 Rimland, B., 173, 181, 185 Risucci, D., 130, 145 Rivera, D. M., 333 Robarts, J., 169, 186 Roberts, J. E., 237, 245 Roberts, L. J., 304, 325 Robinson-Zañartu, C., 230, 231, 248, 255, 278, 289 Roby, C., 136, 144 Rodriguez, B., 128, 129, 143, 241, 245 Roeleveld, N., 147, 166 Roeper, T., 262, 291 Rolland, M. B., 266, 290 Romeo, D., 121, 141 Romero, I., 215, 216, 235, 236, 238, 248 Rondal, J. A., 159, 160, 161, 164, 166, 270, 291 Rosa, M., 229, 249 Rose, S. A., 258 Rosen, A., 294, 326 Rosen, G., 119, 141 Rosenbek, J. C., 64, 77 Rosenberg, L. R., 233 Rosenberg, S., 149, 155, 166 Rosenzweig, P., 233 Rosetti, L., 237, 248, 281 Ross, M., 189, 191, 192, 194, 200, 209 Ross, R., 117, 118, 144 Roth, F., 267, 291 Rothlisberg, B. A., 103, 109 Rounds, J., 7, 13 Rourke, B., 21, 48 Roush, J., 194, 203, 208, 209 Roussel, N., 232, 248 Roux, S., 181, 185 Rowland, R. C., 3, 13 Ruscello, D., 40, 48 Rutter, M., 133, 144, 169, 173, 174, 175, 184, 185 S Sabatino, A. D., 217, 248 Sabers, D. L., 100, 107, 109, 222, 248 Sabo, H., 158, 166 Salvia, J., 33, 35, 36, 48, 63, 64, 77, 96, 102, 107, 109, 225, 231, 233, 248, 252, 264, 291, 296, 326 Sanders, D. A., 192, 195, 209 Sandgrund, A., 161, 166 Sanger, D., 238, 248 Sattler, J. M., 37, 47, 48, 76, 158, 166, 225, 226, 249 Sauvage, D., 181, 185 Scarborough, H., 135, 144, 269, 270, 291 Schachter, D. C., 133, 137, 140, 144 Scheetz, N. A., 191, 207, 209 Schiavetti, N., 262, 265, 292 Schilder, A. G. M., 205, 209 Schlange, D., 161, 166 Schloss, P. J., 204, 208 Schmelkin, L. P., 10, 13, 17, 18, 22, 23, 24, 28, 48, 55, 56, 76, 264, 280, 291, 297, 298, 303, 306, 326 Schmidt, R.
A., 251, 292, 309, 326 Schopler, E., 169, 175, 184, 185 Schraeder, T., 230, 235, 236, 249, 252, 284, 292 Schreibman, L., 173, 185, 317, 326 Schreiner, C., 125, 126, 143, 144 Schupf, N., 152, 167 Schwartz, I. S., 223, 249, 255, 279, 280, 292, 296, 297, 304, 305, 307, 324, 326 Scientific Learning Corporation, 126, 144 Secord, W. A., 10, 13, 59, 77, 230, 233, 238, 245, 249, 251, 252, 253, 255, 257, 259, 264, 274, 283, 284, 285, 286, 288, 292, 305, 326, 329, 332, 333, 337 Selmar, J., 337 Semel, E., 59, 77, 233, 238, 249, 264, 292, 305, 326, 329 Sevin, J. A., 178, 181, 185 Shady, M., 261, 289 Shanteau, J., 7, 13 Shaywitz, B., 21, 48 Shaywitz, S. E., 21, 48 Shelton, R. L., 259, 288, 294, 324 Shenkman, K., 118, 135, 143 Shepard, L. A., 235, 249 Sherman, D., 251, 290 Sherman, G., 119, 141 Shewan, C., 283, 292 Shields, J., 173, 185 Shine, R. E., 259, 292 Shipley, K. G., 331 Short, R. J., 148, 166 Shriberg, L., 268, 291, 336 Shu, C. E., 158, 165
Page 349 Shulman, B., 241, 242, 245, 332 Siegel, L., 121, 122, 141 Silliman, E. R., 279, 282, 292 Silva, P. A., 313, 324 Silverman, W., 152, 167 Simeonsson, R. J., 148, 166 Simon, C., 262, 263, 292 Simpson, A., 173, 185 Simpson, R. L., 170, 185 Slater, S., 283, 292 Sliwinski, M., 240, 247 Smedley, T., 199, 209 Smit, A., 222, 249, 337 Smith, A. R., 238, 249, 264, 292 Smith, B., 174, 175, 184 Smith, E., 114, 145 Smith, M., 82, 108, 229, 230, 245 Smith, S., 271, 291 Smith, T. E. C., 134, 141 Snyder, L., 237, 245 Snow, C. E., 123, 144, 174, 185 Snow, R., 63, 77 Snowling, M. J., 130, 144 Soder, A. L., 337 Sowell, E., 121, 142 Sparks, S. N., 155, 166 Sparrow, S. S., 149, 163, 165, 166, 215, 249 Spekman, N., 266, 290 Spencer, L., 192, 196, 209 Sponheim, E., 174, 185 Sprich, S., 161, 165 St. Louis, K. O., 40, 48 Stafford, M. L., 238, 248 Stagg, V., 157, 166 Stark, J., 258 Stark, R. E., 115, 125, 137, 144 Stein, Z. A., 147, 148, 165 Steiner, V., 59, 77, 233, 331 Stelmachowicz, P., 199, 210 Stephens, M. I., 104, 109, 239, 249 Stephenson, J. B., 158, 165 Stevens, G., 60, 77 Stevens, S. S., 20, 43, 48, 265, 292 Stevenson, J., 133, 144 Stewart, T. R., 7, 13 Stillman, R., 63, 77 Stock, J. R., 201, 209 Stockman, I. J., 230, 235, 236, 249, 252, 284, 292 Stokes, S., 238, 249 Stone, T. A., 331 Stothard, S. E., 130, 144 Stout, G. G., 195, 209 Strain, P. S., 184, 185 Stratton, K., 153, 154, 155, 160, 166 Stray-Gunderson, K., 151, 164, 166 Striffler, N., 239, 249 Strominger, A., 135, 140 Stromswold, K., 266, 292 Strong, M., 196, 198, 209 Sturner, R. A., 104, 109, 238, 239, 249, 296, 312, 326 Sue, M. B., 331 Supalla, S., 198, 209 Supalla, T., 198, 209 Svinicki, J., 201, 209 Sweetland, R. C., 104, 108 Swisher, L., 35, 48, 104, 108, 115, 120, 141, 143, 217, 220, 225, 231, 234, 247, 252, 256, 290, 296, 299, 312, 313, 326 T Tackett, A., 271, 291 Tager-Flusberg, H., 126, 144 Taitz, L. S., 161, 166 Tallal, P., 115, 117, 118, 121, 125, 126, 137, 142, 143, 144, 145 Taylor, O.
L., 228, 229, 249 Taylor, S. J., 279, 292 Templin, M. C., 266, 292, 337 Terrell, F., 229, 249, 273, 292 Terrell, S. L., 229, 249, 273, 292 Teszner, D., 120, 144 Thal, D., 237, 246 Thane, N. L., 333 Thompson, C. K., 315, 317, 324, 326 Thordardottir, E. T., 237, 249 Thorner, R. M., 6, 13, 214, 249 Thorton, R., 260, 292 Thorum, A. R., 330 Thurlow, M. L., 215, 247 Tibbits, D. F., 241, 245 Timbers, B. J., 304, 326 Timbers, G. D., 304, 326 Timler, G., 128, 145, 146 Tobin, A., 233, 236, 237, 246 Tomblin, J. B., 7, 10, 13, 114, 117, 118, 121, 122, 130, 133, 135, 142, 143, 144, 145, 262, 265, 287 Tomlin, R., 265, 288 Torgesen, J. K., 59, 77 Toronto, A. S., 233 Toubanos, E. S., 103, 109
Page 350 Townsend, J., 173, 185 Tracey, T. J., 7, 12, 13 Trauner, D., 121, 145 Trevarthen, C., 169, 186 Tsang, C., 232, 233 Turner, R. G., 6, 10, 13, 222, 249, 256, 292 Tversky, A., 8, 13 Tyack, D., 269, 292 Tye-Murray, N., 192, 209 Tynan, T., 157, 167 Tzavares, A., 120, 144 U Udwin, O., 160, 166 V van Bon, W. H. J., 205, 209 Van den Bercken, J. H. L., 205, 209 van der Lely, H., 124, 145 van der Spuy, H., 121, 122, 141 Van Hasselt, V. B., 164, 165 van Hoek, K., 188, 207 Van Keulen, J. E., 229, 231, 249 van Kleeck, A., 82, 109, 128, 145 Van Riper, C., 62, 77 Van Voy, K., 304, 325 Vance, H. B., 217, 248 Vance, R., 104, 108, 120, 143, 218, 220, 222, 247, 299, 326 Vargha-Kadeem, F., 118, 145 Vaughn-Cooke, F. B., 223, 229, 230, 249 Veale, T. K., 126, 145 Veltkamp, L. J., 161, 167 Vernon, M., 197, 198, 210 Vetter, D. K., 10, 13, 96, 109, 253, 292 Volterra, V., 237, 245 Vostanis, P., 174, 175, 184 Voutilainen, R., 204, 210 Vygotsky, L. S., 276, 292, 310, 327 W Wallace, E. M., 238, 248 Wallace, G., 330 Wallach, G. P., 132, 145 Walters, H., 133, 140 Wang, X., 125, 126, 144 Warren, K., 63, 77 Washington, J. A., 229, 230, 249 Wasson, P., 157, 167 Waterhouse, L., 169, 171, 172, 173, 178, 186 Watkins, K., 118, 145 Watkins, R. V., 114, 121, 130, 145 Wechsler, D., 18, 48 Weddington, G. T., 229, 231, 249 Weiner, F. F., 269, 292, 336 Weiner, P., 135, 145 Weiss, A., 7, 13, 230, 247, 259, 290 Welsh, J., 122, 143 Wender, E., 134, 145, 181, 186 Werner, E. O., 232, 331 Wesson, M., 161, 166 Westby, C., 241, 245, 279, 292 Wetherby, A. M., 171, 174, 185, 214, 248 Wexler, K., 124, 144 White, K. R., 194, 208 Whitehead, M. L., 206, 208 Whitworth, A., 238, 249 Wiederholt, J. L., 332 Wiig, E. D., 31, 48 Wiig, E. H., 59, 77, 230, 233, 238, 245, 249, 251, 252, 253, 255, 257, 258, 264, 274, 283, 284, 285, 286, 288, 292, 305, 326, 329, 332, 333 Wiig, E. S., 31, 48 Willis, S., 239, 249 Wilcox, M. J., 317, 327 Wild, J., 133, 140 Wilkinson, L.
C., 279, 282, 292 Williams, D., 176, 186 Williams, F., 24, 28, 48 Williams, K. T., 97, 103, 109, 330 Wilson, A., 158, 165 Wilson, B., 130, 133, 140, 145 Wilson, K., 283, 292 Wiltshire, S., 103, 109 Windle, J., 195, 209 Wing, L., 171, 172, 173, 178, 180, 186 Wise, P. S., 157, 165 Wnek, L., 201, 209 Wolery, M., 299, 300, 327 Wolf, M. M., 304, 305, 319, 323, 324, 325, 326 Wolfram, W., 231, 249 Wolf-Schein, E. G., 169, 173, 175, 186 Wolk, S., 204, 208 Woodcock, R. W., 333 Woodley-Zanthos, P., 152, 154, 164 Woodworth, G. G., 192, 198, 209 World Health Organization, 85, 87, 109, 169, 186, 253, 280, 282, 292
Page 351 Worthington, D. W., 199, 210 Wulfeck, B., 121, 145 Wyckoff, J., 270, 291 Y Yaghmai, F., 120, 141 Yen, W. M., 22, 47, 55, 56, 57, 58, 59, 68, 76 Yeung-Courchesne, R., 173, 185 Ying, E., 199, 200, 210 Yoder, D. E., 8, 13, 258 Yonce, L. J., 223, 245 Yoshinaga-Itano, C., 200, 203, 210 Young, E. C., 331 Young, M. A., 29, 48, 297, 298, 327 Ysseldyke, J. E., 33, 35, 36, 48, 64, 77, 96, 102, 107, 109, 215, 225, 231, 234, 247, 248, 252, 264, 291, 296, 326 Yule, W., 160, 166 Z Zachman, L., 329 Zelinsky, D. G., 149, 165 Zhang, X., 114, 145 Zielhuis, G. A., 147, 166 Zigler, E., 149, 165 Zigman, A., 152, 167 Zigman, W. B., 152, 167 Zimmerman, I. L., 59, 77, 233, 331
Page 352
Page 353
SUBJECT INDEX Page numbers followed by a t indicate tables and those followed by an f indicate figures. 14-morpheme count, 240 A Ability testing, 30, 44 Accountability, 295, 306–309, 315, 317 Achievement testing, 30, 44 Acquired epileptic aphasia, see Landau-Kleffner syndrome Acting-out tasks, 261 Activity, ICIDH-2 proposed definition of, 87 African American culture, Black English, 236 family attitudes, 83 Age differentiation studies of construct validity, see Construct validity, developmental studies of Age-equivalent scores, 35–36t, 44 Agreement measures, 68–69f Akinesia, 181–182 Alternate-forms reliability, 67–68, 71 Amazing University of Vermont Test, 32 American Sign Language (ASL), 196 American Speech-Language-Hearing Association, 317, 319–320 Anastasi, Anne, autobiographical statement, 60–61 Anxiety disorder, 134, 138 Arena assessment, 281, 284 Arizona Articulation Proficiency Scale, 2nd ed., 335t Asian culture, 83 Asperger’s disorder, 169, 171–172t, 180t, 182 Asperger syndrome, see Asperger’s disorder Asperger Syndrome Screening Questionnaire (ASSQ), 175 Assessing Semantic Skills Through Everyday Themes, 329t Assessment Link Between Phonology and Articulation: ALPHA (Revised ed.), 335t Assessment of change, importance of, 294, 317–321 outcome measurement and, 294–295, 317, 322 prediction of future change, 310–311 recommended readings, 324 special considerations, 296–311 types of methods used, dynamic assessment, 310–311, 314 informal criterion-referenced measures, 313–314 norm-referenced tests, 312–313 single subject experimental designs, 314–317, 316f standardized criterion-referenced measures, 258t, 313 Assessment of Children’s Language Comprehension, 258t Assessment of Phonological Processes–Revised, 335t
Page 354 Assigning Structural Stage, 269t Attention deficit hyperactivity disorder (ADHD), 133–134, 138 definition, 134 specific language impairment and, 133–134 Atypical autism, see Pervasive developmental disorder not otherwise specified (PDD-NOS) Auditory integration training, 181 Auditory training, 190 Austin Spanish Articulation Test, 232t Authentic assessment, 236, 252, 284 Authenticity, 252, 284 Autism, see Autistic spectrum disorder Autistic disorder, definition, 170t, 182 high-functioning, 174, 176, 179t low-functioning, 180t other terms for, 169 symptoms of, 169 Autistic spectrum disorder, behavioral checklists and interviews, 174–175t classification of subgroups, 169, 178 DSM-IV diagnostic categories, 169 dyspraxia and, 181 fragile X syndrome and, 153, 173 mental retardation and, 169, 171 motor abnormalities, 179t–180t, 181 personal perspective, 176–177 play and, 170, 174, 178 pragmatic deficits, 169–170t, 172, 174, 179t, 180t prevalence, 169 recommended readings, 184 sensory differences, 181 sleep disorders and, 181 stereotypical behaviors, 170t, 178 suspected causes, 173–174 genetic, 173 infectious disease, 173 neurologic, 173 suspected neurologic abnormalities, 173 theory of mind and, 181 written language and, 179 Autism Diagnostic Interview-Revised (ADI-R), 175t B Bankson-Bernthal Test of Phonology, 335t Bankson Language Test-2, 329t Baseline measures, 307, 315–317, 316f Battelle Developmental Inventory, 201t Behavioral objectives, 19, 44 Belief in the law of small numbers, 8, 11 Bellugi’s negation test, 263 Ber-Sil Spanish Test, 232t Bilingual Syntax Measure-Chinese, 232t Bilingual Syntax Measure-Tagalog, 232t Bioecological model of development, 79 Blind measurement procedures, definition of, 314 Boehm Test of Basic Concepts-Preschool, 329t Boehm Test of Basic Concepts-Revised, 329t Bracken Basic Concept Scale-Revised, 329t Bradykinesia, 181, 182 Bronfenbrenner, influence in developmental research, 79 C Carolina Picture Vocabulary Test, 199 Carrow Elicited
Language Inventory, 329t Case examples, 1, 2, 3, 38–42, 113–114, 168–169, 187–188, 213–214, 228t, 250–251, 293–294 Caseloads and assessment practices, 283 Causation, confusion with correlation, 29–30, 43 single subject design and study of, 307 Central auditory processing disorders, 191 Chapter summary, 11, 46–47, 75–76, 106–107, 137, 161–162, 181–182, 205, 242–243, 283–284, 323–324 Checklist for Autism in Toddlers (CHAT), 175t Childhood Autism Rating Scale (CARS), 175t Childhood disintegrative disorder, 169, 172t, 182 Chinese, 232t Chromosomes, 162 Classical psychometric theory, 66–67 Classical true score theory, see Classical psychometric theory Clinical decision making, definition, 4, 11 disconfirmatory strategy in, 8 ethics and, 50, 72 fallacies in, 7, 8 measurement and, 252 model of, 7, 9f types of, 5t Clinical Evaluation of Language Fundamentals-3, 264, 329t Clinical Evaluation of Language Fundamentals-3 Spanish Edition, 233t Clinical Evaluation of Language Fundamentals-Preschool, 329t Clinical Probes of Articulation (CPAC), 259 Clinically significant change, 321–322 Clinical significance, 29, 44, 297–306, 321 Cochlear implants, 192, 205 Coefficient alpha, 69
Page 355 Coefficient of determination, 29 Cognitive referencing, see also Discrepancy testing definition, 127, 138, problems with, 127–128, 231–235 Collaborative assessment approaches, 280–282 types of, 281 for younger children, 236–237 Communication Abilities Diagnostic Test, 330t Communication Analyzer, 269t Communication Screen, 239t Compton Speech and Language Screening Evaluation-Spanish, 232t Comprehensive Assessment of Spoken Language, 330t Comprehensive Receptive and Expressive Vocabulary Test, 330t Computerized Language Analysis (CLAN), 268 Computerized Language Error Analysis Report (CLEAR), 269t Computerized Profiling Version 6.2 and 1.0, 269t Computers and language assessment and treatment, 31, 44, 126 Concurrent validity, 59t, 61–62 Conduct disorder, 134, 138 Confidence interval, 70, 73, 224–226, 225f Confirmatory strategy in decision making, 7–8, 11 Congenital aphasia, see Specific language impairment Construct validity, centrality of, 53, 72 contrasting groups evidence, 53–55, 54t, 74 convergent and discriminant validation, 55–56, 74 definition, 52, 73 developmental studies of, 53–54, 54t, 74 factor analysis and, 55 Content, Form and Use Analysis, 269 Content-related validity, see Content validity Content validity, see also Item analysis, Content coverage, 56 Content relevance, 56 Definition, 73 Expert evaluation of, 56 Test design and, 56 Contexts, affecting children and families, 79–83, 80f, 83t, 87 affecting clinicians, 79–80f, 84t, 84–88, 240–242, 283, 317–321 Coordinated assessment strategies, 280–282, 284 Correlation, 26–28, 27f Correlation coefficients, interpretation of magnitude, 28t Correlation coefficients, types of, 28 Criterion-referenced measures construction of, 33–34f, 58, 253–255, 254f definition of, 31, 44 examples of, 32t interpretation of, 31, 43, 60 scores for, 38, 101–102 use in screening and identification, 217, 230, 236 Criterion-related validity, 61–62, 74 concurrent validity, 59t criterion selection, 61 predictive validity, 59t,
61–62, 310 Cultural validity, see Clinical significance Curricula, types of, 282 Curriculum-based assessment, 280, 282, 284 Cutoff score confidence intervals and, 224–226 definition, 33, 243 determining local cutoffs, 222 empirical selection of, 222 recommended levels for identification of language impairment, 221–224 Cutting score, see Cutoff score D Deaf culture, 195–196 Deafness, see Hearing impairment, deafness Decision matrix, 5–6f, 11, 219f Del Rio Language Screening Test, 233 Denver Developmental Screening Test-Revised, 215 Derived scores, 35–37, 44 Description of language, see Descriptive measures Descriptive measures, see also Criterion-referenced measures; Informal measures characteristics of, 252–253, 283 criterion-referenced tests as, 257 norm-referenced tests as, 255–256 purposes, 230 recommended readings, 286 types of criterion-referenced, 257, 258t dynamic assessment, 276–279, 277t, 310–311 online observations, 274–275 norm-referenced, 255–256 probes, see also Informal measures, 257, 259–261, 260t–261t, 263t, 285, 308–310, 316f qualitative measures, 279–280 rating scales, 262–266 use in examining treatment effectiveness, 251 use in treatment planning, 251 validity and, 250–255, 280, 283 Developmental dysphasia, see Specific language impairment
Page 356 Developmental Indicators for Assessment of Learning-Revised, 215 Developmental scores, see Age-equivalent scores; Grade-equivalent scores Developmental Sentence Scoring (DSS), 267, 269t Deviation IQ, 38 Diadochokinesis, 64 Diagnosis, see Identification Diagnostic and Statistical Manual of Mental Disorders IV, diagnostic categories related to autistic disorder, 170t diagnostic categories related to specific language impairment, 114–115t Dichotomous scoring, 69 Difference scores, 116, 296 Differential diagnosis, 3, 12 Direct magnitude estimation, 263, 284 Disability, ICIDH definition of, 86 Discrepancy analysis, see Discrepancy testing Discrepancy testing, see also Cognitive referencing criticisms of, 116, 231–235 mental retardation and, 158, 162 specific language impairment and, 116 state regulations and, 241–242 use in description, 255–257 Discriminant analysis, 222 Distributions, statistical, 24, 37f, 43–44 Down syndrome, definition, 162 dementia and, 5, 152 health problems and, 151–152 pattern of strengths and weaknesses, 159t personal perspective, 156 prevalence, 150–152, 151f, 152f Dynamic assessment, 276–279 definition, 276, sample hierarchy of cues, 277t use in identification, 278 use in planning treatment, 276–278 use with children from diverse cultures, 230, 278 use with children with mental retardation, 278 validation, 278–279 Dyskinesia, 181–182 Dyspraxia, 181–182 E Echolalia, 176, 179, 182 Ecological validity, see Clinical significance Educational relevance, see Clinical significance Effect size, 123, 138, 297–303, 322–323, see also Clinical significance Elicitation strategies, imitation, 260t production, 260t syntax, 260t–261t Eligibility for special education services, 241–242 Emotional/Behavioral problems hearing impairment and, 204 mental retardation and, 161 specific language impairment and, 133–134 Enabling behaviors, 63–65, 74, 100 English as a Second Language (ESL), 227 Epilepsy and language disorders, 119, 161, 181, 183 Error, see
Measurement error Evaluating Acquired Skills in Communication—Revised, 330t Event recording, 275, 285 Expert systems, 7 Expressive language disorder, 114–115t Expressive One-Word Picture Vocabulary Test-Revised, 330t Expressive One-Word Picture Vocabulary Test-Spanish, 233t Expressive Vocabulary Test, 97f–99f, 103, 109, 330t Extended optional infinitive account of SLI, 124 F Face validity, 61, 74 Factor analysis, 74 Fallacies in decision making, 7–8 Family assessment, 81 Family members as partners in assessment, 78, 81, 236–237, 281 Fast ForWord, 126, 138 Fetal alcohol effect (FAE), 153, 163 Fetal alcohol syndrome, 153–155t, 163 Fisher-Logemann Test of Articulatory Competence, 335t Fluharty Preschool Speech and Language Screening Test, 239t FM radio systems, 206 Formative testing, 30, 31 Fragile X syndrome attention deficit and hyperactivity disorder, 153 autism and, 153, 173 definition, 163 gender and, 152 prevalence and, 152 sensory problems, 153 Fullerton Language Test for Adolescents, 330t
Page 357 Functional Communication Measures (FCMs), 319–320, 322 Functional Status Measures (Educational Settings) of the Pediatric Treatment Outcomes Form, 264 Functionality, 252, 285 G Gain scores, 296, 322, see also Difference scores General all-purpose verbs, 129t, 138 General processing deficit accounts of SLI, 124–125, 138 Generalizability theory, 66 Generalization, 295, 311, 315 Genetics, basic concepts, 150, 162–163 chromosomal disorders, 150 concordance, 117, 138 Down syndrome and, 150–151f family studies of specific language impairment, 117–118 fragile X syndrome and, 152–153, 154f genetic disorders versus inherited disorders, 151 hearing impairment and, 197 incomplete penetrance, 118, 138 pedigree studies of specific language impairment, 117 premutation, 152–153, specific language impairment and, 117–119 transmission modes autosomal versus X-linked, 118, 162 dominant versus recessive, 118 twin studies of specific language impairment, 117 Goldman-Fristoe Test of Articulation—Revised, 336t Gold standard, 218, 243 Grade-equivalent scores, 35, 36t, 44 Grammar, recommended tutorial text, 132 Grammatical Analysis of Elicited Language—Complex Sentence Levels (GAEL-C), 201t Grammatical Analysis of Elicited Language—Presentence Level (GAEL-P), 201t Grammatical Analysis of Elicited Language—Simple Sentence Level (GAEL-S), 201t Grammatical complexity, see Linguistic complexity Grammatical morphemes, inflectional morphemes, 133 specific language impairment and, 131, 133 H Handicap, ICIDH definition of, 86 objections to use of this term, 86–87 Hard of hearing, definition, 191, 206 Health and Psychosocial Instruments (HaPI) database, 105 Hearing aids, 195 Hearing impairment, academic difficulties, 189, 203 age at identification, 194 assessment of American sign language (ASL), 198–199 bilingual model of language development for Deaf children, 196 causes genetic, 197 infectious disease, 197 ototoxic agents, 197, 206 prematurity, 197–198, 206 Rh incompatibility, 197, 206
configuration of, 192–193f, 206 deafness cultural considerations, 195, 196, see also Deaf culture definition, 188, 205 differences from other levels of hearing impairment effects on oral language acquisition, 203–204 emotional/behavioral disorders and, 204 implications for oral language assessment, norms, 200 procedures, 199–201t interventions for mild and moderate hearing impairment, 190t, 195t for profound hearing impairment, 190t laterality of, 192 magnitude of, 189–190t personal perspective, 189 prelingual, 194 prevalence, 188 recommended readings, 207 sign language, 188, 195–196 special considerations in assessment planning, 198–200, 203 syndromes associated with, 197 total communication and, 195–196 types of, central auditory processing disorders, 191–192 conductive, 191, 205 mixed, 191, 206 sensorineural, 191, 206 Hispanic culture, 83t Homogeneity of item content, 69
Page 358 I ICIDH: International Classification of Impairments, Disabilities, and Handicaps, 85–87, 282 ICIDH-2: International Classification of Impairments, Activities, and Participation of the World Health Organization, 87 IDEA, see Legislation, Individuals with Disabilities Education Act of 1990 (IDEA) Identification of language impairment, cognitive referencing and, 231–235 definition, 215 diagnosis versus, 215 disorder versus difference question, 227–231 federal legislation and, see Legislation importance of, 216 local regulations and, 128 recommended cutoffs, 221–224 recommended levels of sensitivity and specificity, 220, 222 recommended readings, 244 special challenges in, 217–236 use of criterion-referenced measures in, 238–240 use of norm-referenced measures in, 217–240 use of standardized measures in, 217 Illinois Test of Psycholinguistic Abilities, 267 Index of Productive Syntax (IPSyn), 269t Imitation, 260t Impairment, ICIDH definition of, 86 ICIDH-2 proposed definition of, 87 Indicators definition, 17, 19f, 43, 44 formative, 18, 19, 44 reflective, 18, 45 value of multiple indicators, 305–306 Individual Educational Plans (IEPs), 81, 320 Individualized Family Service Plans (IFSPs), 81 Individuals with Disabilities Education Act (IDEA), 84, 106, 108 Informal measures, see also Criterion-referenced measures; Descriptive measures development of, 254f relationship to criterion-referenced measures, 251 relationship to experimental measures, 251 reliability, 68–69t Informativeness, 265 Instrumental outcomes, 295, 311–322 Intelligence testing, 20, see also Cognitive referencing Interdisciplinary teams, 281 for children with autistic spectrum disorder, 3 for children with hearing impairment, 203 requirement for nondiscriminatory assessment, 85 Interexaminer agreement, 69t, 74 Interexaminer reliability, 70, 74 Intermediate outcomes, 294, 322 Internal consistency, see Reliability, types of Interval level of measurement, 21t–22, 43–44 Interval recording, 275, 285 Interval
scaling, 262, 285 Item analysis, 57–59, 74 Item difficulty, 57 Item discrimination, 57 Item formats, 100 Item tryout, 57 J Jangle fallacy, 56 Jingle fallacy, 56 K Kaufman Assessment Battery for Children, 215 Kaufman Speech Praxis Test for Children, 336t KE family, 118 Key concepts and terms, 11–12, 44–46, 73–75, 106, 138–139, 162–163, 182–183, 205–206, 243, 284–286, 322–323 Khan-Lewis Phonological Analysis, 336t Kuder-Richardson formula 20 (KR-20), 69 L Labeling negative effects of, 216 purposes of, 216 Landau-Kleffner syndrome, 119 Language Assessment, Remediation, and Screening Procedure (LARSP), 269t Language development, as a guide to treatment planning, 130 regression in childhood disintegrative disorder, 172t regression in Rett’s disorder, 172t regression in Landau-Kleffner syndrome, 119 variability in, 128–129 domains, 90 modalities, 90 Language Development Survey, 237t Language difference, 228, 243
Page 359 Language diversity, current levels of diversity, 82, 227 implications for screening and identification, 3, 33, 81, 227–231, 243 norms, 33, 229–230 recommended readings, 231 Language impairment versus language delay, 130, 132, 216 Language knowledge deficit accounts of SLI, 123–124 Language Processing Test-Revised, 330t Language sample analysis, 266–274 analysis methods, 267–271 computerized programs, 266, 269t–270 elicitation procedures, 269t, 273–274 factors affecting results, 271, 273–274 history of use, 266–271, 283 innovations in, 266 use in assessing change, 266 use in examining interactions in language performance, 268 use in identification, 240 use in treatment planning, 266 use with diverse populations, 230 Language tests, criterion-referenced measures, 32t for children under age 3, 237t for children with hearing impairment, 201t–202t for languages other than sign languages or English, 232t–233t norm-referenced measures, 329t–337t processing-dependent measures, 230 sign languages, 198–199 written language, 329t–337t Latent variables, 18 Late talkers, 128–130, 129t, 138 Learning disabilities and measurement issues, 18 Learning readiness, see Assessment of change, prediction of future change Legislation Education for All Handicapped Children Act of 1975 (PL 94–142), 84, 108 Education of the Handicapped Act Amendments of 1986, 81, 108 Individuals with Disabilities Education Act of 1990 (IDEA), 84, 85, 106, 108, 281–282 Individuals with Disabilities Education Act Amendments of 1997, 84, 85, 108, 318 Newborn and Infant Hearing Screening and Intervention Act of 1999, 194 Limited English proficiency (LEP), 227, 243 Lingquest, 269 Linguistic complexity, 259, 266 Linguistic universals, 267 Lipreading, see Speech reading Local norms, 33, 44, 222–223 M MacArthur Communicative Development Inventories, 237t Magnetic resonance imaging (MRI), 119–120, 139 Manual communication, see Sign languages Mastery, 33
Maximal performance measures, 64 McCarthy Scales of Children’s Abilities, 215 Mean, 24, 45 Mean length of utterance (MLU), 240, 267, 270–271 calculation of, 272t Measurement of behavior definition, 4, 12, 252 history of, 20, 49 levels of, 20–23 relationship to selection of appropriate statistical methods, 23 Measurement error, 224–226f assessment of change and, 296 base rates and, 235 referral rates and, 235 relationship to reliability, 67, 224 types, 6 false negatives, 219f false positives, 219f Measurement scales, see Measurement of behavior, levels of Median, 25, 45 Mental measurements yearbook series, 104–105, 106, 240 Mental retardation adaptive functioning and, 147, 162 age at identification, 147, 161–162 alcohol and, 153–154 attention deficit and hyperactivity disorder, 153, 159t–161, 160t, autism and, 153, 171 causes nonorganic, 155–156 organic, 149–155 toxins, 153–154, 156 cerebral palsy and, 147 communication strengths and weaknesses, 159t–160t definitions of, 147, 148t, 163
Page 360 Mental retardation (Continued) dementia and, 152, 162 emotional/behavioral disorders, 153, 159t–161, 160t familial, 155 fetal alcohol syndrome and, 153–155f fluency disorder, 159t fragile X syndrome and, 149, 152–153, 154f hearing impairment and, 151, 155, 159t, 160t longterm outcomes, 171 maltreatment and, 161 personal perspective, 156 prevalence, 147, 161 recommended readings, 164 sensory differences, 151, 153, 155, 159t–160t severity, 147, 148 MillerYoder Language Comprehension Test, 258t Mixed expressivereceptive language disorder, 114–115, 115t Mode, 25, 45 Mosaic Down syndrome, 151, 163 Multidisciplinary assessment, 281, 285 Multiple measures, 223, 305–306, see also Multipleoperationalism Multipleoperationalism, 306 N National norms, 32, 45 National Outcomes Measurement System (NOMS), 294, 319–320, 322–323 Native American culture family attitudes, 83t Natural Process Analysis, 336t Nominal level of measurement, 20–21t, 43, 45 Nondiscriminatory assessment definition of, 85, 106 methods for achieving, 229–231, 278 Nonparametric statistics, 30, 45 Nonreciprocal language, see Stereotypic language Normal curve, see Normal distribution Normal distribution, 30, 37f Normative group, 32, 45, 101, 234 Normreferenced measures construction of, 33–34f, 57–58 definition of, 31, 45 examples of, 32t interpretation of, 31, 43, 60, 218–227, 234 scores, 34–35, 101 use in description, 255–256 use in screening and identification, 217 Norms definition, 32, 45 local, 33, 44 national, 32, 45 O Observational Rating Scales, 305 Observed score, 66, 74 Omega squared, 29 Operational definitions, 19, 45 Oral and Written Language Scales: Listening Comprehension and Oral Expression, 330t Oral and Written Language Scales: Written Expression, 331t Ordinal level of measurement, 21t–22, 43, 45 Otitis media, 129, 151, 191, 199, 203, 205–206 Otoacoustic emissions and early identification of hearing impairment, 194, 206 Outliers, 24 Outoflevel testing, 158, 163, 256 Overshadowing, 204 P 
Paper-and-pencil tests, 31, 45 Parallel-forms reliability, see Alternate-forms reliability Parametric statistics, 30 Parent involvement in assessment, 236, 304 Parent questionnaires, 236–238 Parrot Early Language Sample Analysis (PELSA), 269 Participation, ICIDH-2 proposed definition of, 87 Patterned Elicitation Syntax Test with Morphophonemic Analysis, 331t Peabody Picture Vocabulary Test, 267 Peabody Picture Vocabulary Test-III, 51–52, 57, 71, 331t Pearson Product Moment correlation coefficient, 28, 43 Percentile ranks, 36 Performance standard, 38 Performance testing, 31, 45 Perisylvian areas, 119–120 Person-first nomenclature, 216, 243 Pervasive developmental disorder (PDD), 169, 172t, 183 Pervasive developmental disorder not otherwise specified (PDD-NOS), 169, 172t, 183 Phenotype, 117, 139 Phonological awareness, 137, 139 Phonological memory deficit account of specific language impairment, 125 Phonological Process Analysis, 336t Phonology tests, 335t–337t Photo Articulation Test, 337t Physician’s Developmental Quick Screen, 239t Picture selection task, 261
Page 361

Placement testing, 30
Play-based assessment, 281, 284
Porch Index of Communicative Ability in Children, 274, 331t
Prader-Willi syndrome, 158
Predictive validity, see Criterion-related validity, predictive validity
Preferential looking, 261t
Preferential seating, 190t, 195
Prelingual hearing loss, 206
Pre-Linguistic Autism Diagnostic Observation Schedule (PL-ADOS), 175
Preschool Language Assessment Instrument (PLAI), 258t
Preschool Language Scale-3 (PLS-3), 331t
Preschool Language Scale-3, Spanish edition, 233t
Prueba del Desarrollo Inicial del Lenguaje, 233t
Principles and parameters framework (Chomsky), 123–124
Proband, 117, 139
Probes, see also Criterion-referenced measures; Descriptive measures; Informal measures
  control probes, 308, 315
  generalization probes, 251, 308–309, 315
  phonology, 259
  pragmatics, 259, 260
  sources for finding, 259, 260t–261t
  syntax, 260t–261t
  treatment probes, 308–309, 315
Profile analysis, see Discrepancy testing
Profile in Semantics-Grammar (PRISM-G), 269t
Profile in Semantics-Lexicon (PRISM-L), 269t
Pronominal reversals, 176, 183
Proportional Change Index (PCI), 299–303, 322
Psychiatric diagnoses and language impairment, 134
Public relations validity, 73, see also Face validity
Pye Analysis of Language (PAL), 269t

Q

Qualitative change, see Clinical significance
Qualitative measures, 279–280
Qualitative research, 280, 285

R

Range, 26, 45
Rating scales, 262–266
  halo effects, 264
  leniency effects, 264
  metathetic continuum, 265, 285
  prothetic continuum, 265, 285
Ratio level of measurement, 21t–23, 43, 45
Raw scores, 34–35
Recasts, 122, 139
Receptive-Expressive Emergent Language Test-2, 103, 237t, 258t
Receptive One-Word Picture Vocabulary Test-Upper Extension, 331t
Receptive One-Word Picture Vocabulary Test, 331t
Regional dialect, 82, 227–231
Reification and intelligence tests, 20
Reliability
  coefficients, 66
  definition, 65–66, 75
  differences in methods for criterion- versus norm-referenced measures, 67, 102
  recommendation regarding levels, 102
  relationship to agreement, 68–69f
  relationship to validity, 51, 65f–66, 73
  types of, 73
    alternate-forms reliability, 67–68, 71
    internal consistency, 68–70
    test-retest reliability, 67–68, 75
Restriction of range, effect on reliability, 71
Rett's disorder, 169, 172t, 183
Reynell Developmental Language Scales-U.S. Edition, 331t
Rhode Island Test of Language Structure, 202
Richness of description, 253, 285
Risk factors
  definition of, 116, 139
  for language impairment, 116–127
Rossetti Infant-Toddler Language Scale, 237t

S

Scales of Early Communication Skills (SECS), 202
School language, 82
Scores, types of
  age-equivalent, 35–36t, 44
  criterion-referenced, 38
  grade-equivalent, 35–36t, 44
  norm-referenced, 34–38
  percentile ranks, 36
  standard scores, 36–37, 46
Screening, see also Identification
  base rates and, 235–236
  characteristics of, 214, 242
  comprehensive tests that include communication, 215
  federal and local legislation, 240–242
  indirect methods, 214
  language measures for, 236–239t
Page 362

Screening (Continued)
  reasons for, 214–215
  referral rates, 235–236, 243
Screening Test for Developmental Apraxia of Speech, 337t
Secord-Consistency of Articulation Tests (SCAT), 259, 337t
Segregation studies, see Genetics, pedigree studies of specific language impairment
SEM, see Standard error of measurement (SEM)
Sensitivity
  definition of, 218, 243
  language tests and, 220–221
Sentence Repetition Screening Test, 239t
Sequenced Inventory of Communication Development (SICD), 236–237t
Sequenced Inventory of Communication Development (SICD), Spanish translation, 233t
Severity ratings, 43
Sex chromosomes, 163
Sign languages
  tests of, 198–199
  varieties of, 196
Signed English, 196
Signing Essential English (SEE-1), 196
Signing Exact English (SEE-2), 196
Simultaneous communication, see Hearing impairment, total communication and
Single-subject experimental designs
  clinical use of, 307–310, 314
  definition, 322
  interpretation of, 307–308, 315–317
  recommended readings, 317
  statistical versus visual analysis, 308
  withdrawal, 315
Smit-Hand Articulation and Phonology Evaluation, 337t
Social comparison as a method of social validation, 304, 322
Social deprivation, effects on development, 156
Social dialect, 82, 227–231
Social validation, 297, 303, 322
Social validity, see Clinical significance
Sound Production Task (SPT), 259
Spanish, 231–233t, 237
Spanish Structured Photographic Expressive Language Test, 232t
Specific language impairment (SLI)
  academic difficulties and, 132, 134–135
  alternative terms for, 114–116, 119
  argument structure and, 131t
  brain differences, 119–121
    affecting dominance, 119
    perisylvian areas, 119–121
    planum temporale, 119–120f
    versus damage, 119
  definition of, 114–115, 137, 139
  demographic variables and, 121–122
  emotional/behavioral disorders and, 133
  environmental variables and, 121–123
  figurative language and, 132t, 134
  gender differences and, 114
  genetic factors, 117–119
  illusory recovery, 135
  language patterns, 130–133t, 137
  long-term outcomes, 135
  morphological deficits and, 131t
  narrative skills and, 132
  nature of, 223–224
  personal perspective, 136
  phonology and, 131t, 135, 137
  pragmatics and, 132t
  prevalence, 114
  recommended readings, 140
  subgroup identification, 115, 130
  suspected causes, 116–127
  syntactic deficits and, 131t
  theoretical accounts (after Leonard), 123–127
    cross-linguistic data and, 123–125
    linguistic knowledge deficit accounts, 123–124, 138
    generalized processing deficit accounts, 124–125
    specific processing deficit accounts, 125–126, 139
  written language and, 135, 137
Specificity
  definition, 218, 243
  language tests and, 218–221, 222
Speech reading, 188, 190t
Split-half reliability, 68–69
Stability, 67
Standard deviation, 25, 46
Standard error of measurement (SEM), 67, 70, 75, 101, 224
Standard scores, 36–37, 46
Standards for Educational and Psychological Testing, 50, 62, 89, 96, 105, 107
Statistical measures
  of central tendency, 24–25, 43
  of variability, 24–26, 43
Statistical significance, 28–29, 46, 297
Stephens Oral Language Screening Test, 239t
Stereotypic language, 177
Stereotypy, 182–183
Page 363

Stimulability testing, relationship to dynamic assessment, 276
Strabismus, 161, 163
Structured Photographic Expressive Language Test-II, 331t
Subjective evaluation as a method of social validation, 304–305, 323
Summative testing, 31
Surface hypothesis account of SLI, 125
Syndrome, definition of, 149
Systematic Analysis of Language Transcripts (SALT), 268, 269t, 271t

T

Tagalog, 233
Talking Task (TT), 259
Teacher Assessment of Student Communicative Competence (TASCC), 264
Teacher Assessment of Grammatical Structures (TAGS), 202
Teacher questionnaires, 237–238
Templin-Darley Tests of Articulation, 336t
Temporal processing account of specific language impairment, 125–126
Termination of treatment, 4, 310–311
Test
  definition, 49, 75
  effect of length on reliability, 71
Test administration
  adaptations, 63, 157–158, 200, 203, 229t
  importance of, 10, 63
  motivation, 63
  suggestions for, 64t
Test de Vocabulario en Imagenes Peabody, 232t
Test for Examining Expressive Morphology, 331t
Test manuals, how to use, 88–103
Test of Adolescent and Adult Language, 332t
Test of Adolescent/Adult Word Finding, 332t
Test of Auditory Comprehension of Language-3, 332t
Test of Children's Language, 332t
Test of Early Language Development, 332t
Test of Early Reading Ability-Deaf or Hard of Hearing, 103, 109
Test of Language Competence-Expanded, 332t
Test of Language Development-Intermediate: 3, 57, 332t
Test of Language Development-Primary: 3, 240, 332t
Test of Pragmatic Language, 332t
Test of Pragmatic Skills (Revised), 332t
Test of Relational Concepts, 333t
Test of Word Finding, 333t
Test of Word Finding in Discourse, 333t
Test of Word Knowledge, 333t
Test of Written Expression, 333t
Test of Written Language-2, 333t
Test review guide
  annotated, 90f–92f
  basic form, 93f–95f
  completed example, 97f–99f
Test reviews
  client-oriented, 88–89, 106
  computerized sources of, 104–105
  population-oriented, 88–89, 106
  steps in, 88–103
  sources of published reviews, 103–105, 104t
Testing of limits, 158
Texas Preschool Screening, 239t
Theoretical construct, 18–19f, 43, 46, 51, 57, 306
Theory, 18, 46
Theory of mind, 181, 183
Time sampling, 275, 285
Token Test for Children, 333t
Transdisciplinary assessment, 281, 285
Treatment
  effectiveness, 319, 323
  effects, 319, 323
  efficacy research, 295, 318, 321
  efficiency, 319, 323
  outcomes, 294–295
  outcomes research, 318
Trial scoring, 274, 286
Triangulation of qualitative data, 280, 286
Trisomy 21, 151, 163
True score, 66, 75
T score, 38
Turner syndrome, 158
Type-token ratio, 240

U

Ultimate outcomes, 294, 311, 323
Utah Test of Language Development-3, 333t

V

Validity
  centrality to discussions of measurement quality, 50
  definition, 51, 75
  factors affecting, 10, 61, 62–66, 235–236
  "types of," see Validation, strategies of evidence gathering
Page 364

Validation
  differences for criterion- versus norm-referenced measures, 56–60
  strategies of evidence gathering, 52–62
    content validity, 52, 56t–60
    criterion-related validity, 52, 61–62, 310
    construct validity, 52–56, 53f
Variable, 19, 46
Variance, 25, 46
Variance accounted for, 29
Verbal auditory agnosia, 191
Vineland Adaptive Behavior Scales, 163, 215
Visuospatial languages, see Sign languages

W

"Watch and see" policy toward late talkers, 128
Wechsler Intelligence Scale for Children-Revised, 18
Wiig Criterion-Referenced Inventory of Language, 258t
Williams syndrome, 158, 160t, 163
Woodcock Language Proficiency Battery-Revised, 333t
Word Test-Adolescent, 333t
Word Test-Revised, 333t
World Health Organization, 85
Written language, 241

Z

Zone of proximal development (ZPD), 276, 286
z-scores, 37