Series editor: Alan Bryman
Surveying the Social World: Principles and practice in survey research
Understanding Social Research Series Editor: Alan Bryman
Published titles
Surveying the Social World, by Alan Aldridge and Ken Levine
Ethnography, by John D. Brewer
Unobtrusive Methods in Social Research, by Raymond M. Lee
Biographical Research, by Brian Roberts
Surveying the social world
PRINCIPLES AND PRACTICE IN SURVEY RESEARCH
ALAN ALDRIDGE and KEN LEVINE
Open University Press Buckingham • Philadelphia
For Meryl, Eileen, Alice and Max

Open University Press
Celtic Court
22 Ballmoor
Buckingham MK18 1XW
email: [email protected]
world wide web: www.openup.co.uk
and
325 Chestnut Street
Philadelphia, PA 19106, USA

First published 2001
Copyright © Alan Aldridge and Ken Levine, 2001

All rights reserved. Except for the quotation of short passages for the purpose of criticism and review, no part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher or a licence from the Copyright Licensing Agency Limited. Details of such licences (for reprographic reproduction) may be obtained from the Copyright Licensing Agency Ltd of 90 Tottenham Court Road, London, W1P 0LP.

A catalogue record of this book is available from the British Library
ISBN 0 335 20240 3 (pb) 0 335 20241 1 (hb)

Library of Congress Cataloging-in-Publication Data
Aldridge, Alan (Alan E.)
Surveying the social world : principles and practice in survey research / Alan Aldridge and Ken Levine.
p. cm. -- (Understanding social research)
Includes bibliographical references and index.
ISBN 0-335-20241-1 -- ISBN 0-335-20240-3 (pbk.)
1. Social surveys--Methodology. 2. Questionnaires. I. Levine, Kenneth, 1945-. II. Title. III. Series.
HM538.A53 2001
300'.723--dc21 00-068921

Typeset by Type Study, Scarborough
Printed and bound in Great Britain by Marston Book Services Limited, Oxford
Contents

Series editor's foreword
Preface

1 Why survey?
  Our approach to this book
  What is a survey?
  Methods of data collection in surveys
  Surveys and other research strategies
  The success of the survey
  Critiques of surveys
  Response to the critiques of surveys
  The social context of surveys
  Why are people willing to take part in surveys?
  Why are people reluctant to take part in surveys?
  Research ethics
  An invitation to survey research
  Further reading

2 Theory into practice
  The components of the modern social survey
  The survey as a research strategy
  Types of survey design
  Relations between theory and research
  Incorporating a theoretical dimension into surveys
  Reliability and validity
  Further reading

3 Planning your project
  Reviewing your assets
  Setting the timetable
  Computing and software resources
  Gaining access to organizations
  Three methods of gathering data
  Email and interactive surveys
  Diaries
  Choosing a method of gathering data
  Combining methods of data gathering
  Further reading

4 Selecting samples
  Introduction
  Theoretical populations
  Probability sampling strategies
  Accuracy, precision and confidence intervals
  Sample size and sampling error
  Other types of error that affect surveys
  Sampling strategies: non-probability sampling
  Further reading

5 Collecting your data
  Doing it yourself
  Commissioned research
  Covering letters for postal questionnaires
  Approaching respondents for an interview
  Piloting
  Distribution and return of questionnaires
  Further reading

6 Designing the questions: what, when, where, why, how much and how often?
  The sociological imagination
  Understanding what matters to respondents
  Recognizing differences between respondents
  Using unambiguous language sensitively
  The role of open-ended questions
  Tackling the social desirability problem
  Questions about respondents' knowledge
  Avoiding overlapping categories
  Asking about age
  Avoiding double-barrelled questions
  Avoiding negatives, double negatives and worse
  The main things that go wrong in designing questions, and how to prevent them
  The most frequently raised problems, and our answers
  Questionnaire layout
  Designing interview schedules
  Setting up for coding
  Further reading

7 Processing responses
  Introduction
  Manual, semi-automated and automated data input
  Data file formats and data types
  Constructing the codebook
  Levels of measurement
  Pre-coding and post-coding
  Missing data
  Multiple responses
  Checking and cleaning the data
  Further reading

8 Strategies for analysis
  Introduction
  Dimensions of analysis
  Analysis of open-ended responses
  Examining single variables
  Measures of central tendency, dispersion, spread and shape
  Standardizing variables
  Statistical inference and sampling error
  Cross-tabulation
  Testing hypotheses and statistical significance
  Measures of association for nominal variables
  Measures of association for ordinal variables
  Measures of association for ratio variables - correlation
  Multivariate analysis
  Further reading

9 Presenting your findings
  Writing for an audience
  Characteristics of the classic research report
  Use of tables, figures and diagrams
  Reporting on the research methods
  Advantages of the classic approach
  Disadvantages of the classic approach
  Writing up
  Writing
  Writing up and writing
  Further reading

Glossary
Appendix 1: The Travel Survey questionnaires
Appendix 2: Websites of professional associations
References
Index
Series editor's foreword
This Understanding Social Research series is designed to help students to understand how social research is carried out and to appreciate a variety of issues in social research methodology. It is designed to address the needs of students taking degree programmes in areas such as sociology, social policy, psychology, communication studies, cultural studies, human geography, political science, criminology and organization studies and who are required to take modules in social research methods. It is also designed to meet the needs of students who need to carry out a research project as part of their degree requirements. Postgraduate research students and novice researchers will find the books equally helpful.

The series is concerned to help readers to 'understand' social research methods and issues. This will mean developing an appreciation of the pleasures and frustrations of social research, an understanding of how to implement certain techniques, and an awareness of key areas of debate. The relative emphasis on these different features will vary from book to book, but in each one the aim will be to see the method or issue from the position of a practising researcher and not simply to present a manual of 'how to' steps. In the process, the series will contain coverage of the major methods of social research and will address a variety of issues and debates. Each book in the series is written by a practising researcher who has experience of the techniques or debates that he or she is addressing. Authors are encouraged to draw on their own experiences and inside knowledge.
This new book on surveys by Alan Aldridge and Ken Levine is very much in tune with the aims of the series. It is concerned to bring out not just the principles that are involved in survey research but also a host of practical issues. However, in survey research there are different contexts to what might be meant by a term like 'practical issues'. Quite rightly, Aldridge and Levine refer quite often to large, frequently complex exercises in survey research to illustrate some of their main points. But for many, if not most, readers of this book such a context is very far from the reality they will be facing if they wish to carry out a social survey. It is this second scenario with which this book is largely concerned. Students, whether undergraduate or postgraduate, are likely to have limited resources and invariably limited time at their disposal. Texts on survey research that focus primarily on large, lavishly funded national surveys are hardly pertinent to such a situation. Aldridge and Levine's book is full of advice on how to devise survey research in the kind of environment that typically confronts a student: namely, having a fairly tightly focused set of research questions that are to be answered using a survey approach, but with limited resources. Aldridge and Levine bring their experience of conducting a relatively small-scale survey on a highly focused topic - travel to work decisions and behaviour of staff and students at their university - to put some flesh on the bones of the principles of survey research.

They bring out the kinds of issue that need to be taken into account when conducting such research. In the process, they identify crucial decisions about the conduct of surveys: what kind of sample to select, whether to interview or to use a self-completion questionnaire, how to design survey questions, and so on. In addition, they address various hardware and software issues and provide a helpful overview of approaches to quantitative data analysis. But it is the sense of being in on the reality of what it is like to do a survey that distinguishes this book from others on the survey approach and that will prove indispensable to future survey researchers. Social surveys are rarely if ever perfect. However, there are numerous traps that can ensnare the unwary and this book will alert readers to ways of avoiding them, as well as introducing the realities of survey research.

Alan Bryman
Preface
In an era in which 'social inclusion', 'active citizenship' and 'customer-centred' feature among the popular buzz words, it is not surprising that an increasing number of individuals and institutions are attracted to the social survey as a way of consulting interest groups, audiences and clients. Surveys abound, but many of the people who will carry them out lack any formal training in social research methods and need guidance about principles as well as practical know-how.

There is no shortage of existing textbooks that deal with social surveys and some of them have established worthy reputations. However, for the purposes of the non-professional people mentioned above and also for students being introduced systematically to the method for the first time, many existing works have one or both of two drawbacks. First, they fail to distinguish between the possibilities open to an individual or small group conducting a modest survey on a limited budget, and what is possible for a research centre commanding a sizeable team sustained on the basis of a substantial research grant. A variety of strategies and techniques are ruled out if the resources and staff to implement them are lacking, and we have tried to signal throughout what is feasible in small-scale and solo projects.

Second, some textbooks make the successful completion of a social survey appear extraordinarily unlikely. There is a tendency to counsel perfection and to appeal to ideals without practical workarounds being offered. There seem to be so many traps, hazards and obstacles that only an Indiana Jones, propelled by massive determination and superhuman powers of foresight, could overcome them all. While it is true that there are a variety of factors that have to be entertained, we set out to reassure readers that surveys can indeed be conducted by ordinary mortals. We have not neglected the problems and pitfalls but we have tried to offer alternatives and remedies wherever possible. Beyond that, we have sought to strike a positive note and to offer reassurance at the few points it is likely to be needed.

Part of the editorial brief we were given was to avoid a heavily statistical approach. The analysis of survey data, even for small-scale investigations, necessarily involves the selection and use of statistical tools, so this is not an easy task. We have concentrated on the general role played within surveys by descriptive and inferential statistics, seeking to maintain a focus on how they fit in with the other dimensions of survey analysis and referring readers to other sources for the step-by-step detail of procedures.

Both of the authors have been associated with the Survey Unit at the University of Nottingham, UK, and one of the investigations it conducted, the Travel Survey, is used as a running example throughout the book. We would like to take this opportunity to thank the present and past staff of the Unit, Jan Wagstaff, Beth Rogers, Nerys Anthony, Dr Nicola Hendey, Helen Foster and Becky Nunn for their hard work and good humour in innumerable projects. The undergraduates from the School of Sociology and Social Policy (formerly the School of Social Studies), together with postgraduates from various departments taking the Quantitative Methods module, also deserve our thanks. They have confirmed once again that teaching and learning are always two-way processes. We acknowledge the contribution of Sue Parker in the School of Sociology and Social Policy, an ever-helpful source of support and encouragement to student learning in modules involving surveys and statistics. Our thanks are due to Paddy Riley of Academic Computing Services, University of Nottingham, for the benefit over many years of his expertise with SPSS and other computer packages. Finally, we are indebted to Professor Alan Bryman, the series editor, for his many helpful suggestions. All of the above remain entirely blameless for any errors of omission or commission.
1 Why survey?
Our approach in this book

Every book on social surveys is trying to be helpful. Despite the good intentions, it is all too easy to be unrealistic and off-putting. Why is this? We suggest the following reasons:

• Checklists of do's and don'ts: The don'ts always seem to outnumber the do's. Survey research sounds like a minefield.
• Counsels of perfection: Any failure to abide by the do's and don'ts appears to invalidate the whole survey. Many readers sense they will never match up to this ideal, so why bother trying?
• Too much technique, not enough imagination: The design and analysis of surveys involves technicalities - hence the do's and don'ts. But if that were all there is to it, it would be very dull. Luckily, it does not have to be like that. Successful surveys involve an exercise of the sociological imagination, as well as skilful use of techniques. Survey research is a craft, like throwing a pot, and brings much the same satisfactions (and frustrations).
• Statistics: Statistical analysis is a powerful instrument, and it is foolish to attack it. But statistics are tools, not an end in themselves. The hard part is usually not the statistics, but the sociological imagination.

If these are the problems with books on social surveys, how have we dealt with them?

Using an extended example

We use a recent and real life example of a survey, which we refer to throughout the book, to illustrate the practical and theoretical issues which arise at each stage. Thus throughout you will find discussions of the Travel Survey. The purpose is to examine the planning and execution of a single real survey that you can follow step by step to see how the different aspects and activities that make up a social survey fit together. Many chapters contain a box focusing on features of the Travel Survey relevant to the topics dealt with in that chapter. The Travel Survey questionnaires are reproduced in Appendix 1.
Box 1.1 The Travel Survey

The Travel Survey was commissioned from the Survey Unit at the University of Nottingham, UK, early in 1998 by the administrative department responsible for buildings, parking and transport facilities. They needed information on the commuting habits of students and staff, so that they could fulfil the commitments they had given to the local authority to minimize the traffic congestion likely to be caused by the construction of a new satellite campus about half a mile from the main site. They also wanted to preserve the parkland character of the main campus by encouraging 'environmentally friendly' forms of commuting such as buses and bicycling. The survey was intended to generate detailed data on commuting patterns and related attitudes among staff and students that would enable transport consultants to advise the university on a variety of 'green' policies. Thus its objective was primarily descriptive rather than analytic: the task was to describe variations in commuting patterns rather than to offer explanations of them.
The sociological imagination

Like all methods of social research, surveys call for an exercise of the sociological imagination. In surveys, as in fieldwork, we have to 'take the role of the other' (George Herbert Mead's phrase); that is, we make an imaginative leap into the roles of our respondents, trying to get inside their experiences, their private troubles, their joys and aspirations, and their ways of thought and expression. We have to be sensitive to nuances of language, to the wider culture, and often to the organizational and occupational setting. We have to avoid stereotypes and stereotyped thinking.

Box 1.2 The sociological imagination: sensitive topics

In the 1960s, a team of sociologists at the University of Cambridge, UK conducted an investigation into the values, beliefs and social activities of relatively well paid working-class people: the Affluent Worker studies. As part of their survey, they asked a sample of respondents to keep a diary logging their weekly social and leisure activities. Some respondents were embarrassed that most of their leisure time was spent on everyday activities like mowing the lawn, cleaning the car and going shopping. They were worried that the researchers would think their lives were dull - an example of the social desirability* problem. This example reminds us to be imaginative about what the potentially sensitive topics are likely to be. Looking at it positively, sensitive issues also tend to be the most interesting sociologically and the most important socially.

* The first use of a term included in the glossary is printed in bold.

Being realistic

Every researcher knows that compromises have to be made and desirable things left undone. We often have the simple choice: make the best of it, or do nothing.

Box 1.3 Being realistic: no time to do a pilot

In 1993, Aldridge was approached by a senior administrator at the University of Nottingham, UK. The university's Management Group was debating whether or not to build a day nursery on campus for the children of students and staff. They were not sure what the level and pattern of demand would be. Could Aldridge help by conducting a survey of staff and mature students? This was half-way through October. The Vice-Chancellor wanted a report and recommendations by mid-December. After discussion, it was agreed that this could be put back to early January at the latest.

Strictly, Aldridge did not have the time and resources to do the survey 'properly', in textbook fashion. But it seemed a very important project. Better that the university have some objective information to go on than none at all. Aldridge therefore went ahead, but had to make some compromises. He decided that there was no time to conduct a pilot survey to test the question wording for all the problems that can arise. All surveys are supposed to be carefully piloted; to omit this is risky. (It is not, despite the impression sometimes given, unethical.) Aldridge decided to do the following:

• undertake crash reading about nursery provision, to identify the key issues (Aldridge knew very little about the topic);
• show the draft questionnaire to a few friends and colleagues, asking them to be extremely critical and pull no punches;
• keep the questionnaire as simple as possible, covering only the key issues and avoiding anything fancy;
• spend a lot of time on the covering letter, to try to ensure that the questionnaire would be well received by a very diverse group of respondents: not just academic staff and students, but secretaries, porters, cleaners, ground staff and so on; not just people with infant children, but childless people, childfree people, and people who would have been desperate for a nursery but for whom it was too late because their children were grown up;
• dispense with a follow-up (reminder) letter, even though it would certainly have boosted the response rate;
• keep the analysis straightforward and the final report short and to the point.

Happily, it turned out well. The response rate was reasonable, respondents were very cooperative, and the report was written on time. The Vice-Chancellor was pleased. And the university decided to build the nursery.
Our readers' experience and resources

We are writing mainly for readers who have had very little experience if any of doing a survey. Some readers will have taken part in a survey as a respondent - which may or may not have been a stimulating experience. We are also assuming that, in most cases, the sort of survey the reader will be likely to undertake, at least to begin with, will be a relatively small-scale one with limited resources. These can be very worthwhile - size is not the most important thing. The reader may well be working solo or, if not, in a small team. The reader may be a student, or someone wanting to do a survey on behalf of an organization. Although our book does sometimes refer to large-scale surveys like the General Household Survey or the Census, we are not primarily writing about those. After all, if you are working on such a survey you will receive training and be told what to do!

Hints and examples

Each survey is unique. Therefore, lists of do's and don'ts are too inflexible. A solution in one survey may not work in another. We provide general hints, not inflexible rules. We also give real examples of how we have tried to solve problems in our own research. We use our Travel Survey for the University of Nottingham as an extended example running throughout the book.

Statistics

Our aim is to introduce the broad principles of statistical analysis, to clarify which statistics are appropriate when, and to indicate what statistics can and cannot do. We provide suggestions for further reading on the technical aspects.
What is a survey?

A social survey is a type of research strategy. By this we mean that it involves an overall decision - a strategic decision - about the way to set about gathering and analysing data. The strategy involved in a survey is that we collect the same information about all the cases in a sample. Usually, the cases are individual people, and among other things we ask all of them the same questions. This is the type of survey we concentrate on in this book.

The items of information we gather from our respondents are the variables. Variables can be classified into three broad types, depending on the type of information they provide:

• attributes - that is, characteristics such as age, sex, marital status, previous education
• behaviour - questions such as what? when? how often? (if at all)
• opinions, beliefs, preferences, attitudes - questions on these four characteristics are probing the respondent's point of view.

We shall examine the nature of variables more fully in Chapter 2. For the moment, the key point is that a survey aims to gather standard information in respect of the same variables for everyone in the sample.
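One way to picture this strategy is as a rectangular table, with one row per case and one column per variable. The sketch below is purely illustrative: the variable names (`age`, `sex`, `main_mode`, `days_per_week`, `supports_bus_subsidy`) and the three respondents are invented for the example and are not taken from the Travel Survey.

```python
# Hypothetical variables, grouped by the three broad types described above.
VARIABLES = {
    "attributes": ["age", "sex"],
    "behaviour": ["main_mode", "days_per_week"],
    "opinions": ["supports_bus_subsidy"],
}

# Invented cases: one dictionary per respondent, same keys for everyone.
cases = [
    {"age": 19, "sex": "F", "main_mode": "bus", "days_per_week": 5,
     "supports_bus_subsidy": "agree"},
    {"age": 42, "sex": "M", "main_mode": "car", "days_per_week": 4,
     "supports_bus_subsidy": "disagree"},
    {"age": 33, "sex": "F", "main_mode": "bicycle", "days_per_week": 3,
     "supports_bus_subsidy": None},  # question left unanswered
]


def all_variable_names(variables):
    """Flatten the grouped variable names into one list."""
    return [name for group in variables.values() for name in group]


def is_standardized(cases, variables):
    """Return True if every case records exactly the same set of variables."""
    expected = set(all_variable_names(variables))
    return all(set(case) == expected for case in cases)


def tabulate(cases, variable):
    """Count how many cases give each answer to one variable."""
    counts = {}
    for case in cases:
        answer = case[variable]
        counts[answer] = counts.get(answer, 0) + 1
    return counts


print(is_standardized(cases, VARIABLES))
print(tabulate(cases, "main_mode"))
```

Note that an unanswered question is still recorded (here as `None`), so the table stays rectangular and every case can be compared on every variable.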
Methods of data collection in surveys

Social surveys employ a variety of methods to gather information, such as questionnaires, face-to-face interviews, telephone interviews and observation.

Questionnaires

These are forms containing sets of questions which the respondent completes and returns to the researcher. One main type is the postal (/mail) questionnaire, which is sent and returned through the post. Questionnaires may also be completed and returned on the spot, for example in a classroom or dentist's waiting-room. The rapid growth of email has opened up another interesting possibility for the distribution and return of questionnaires.
Face-to-face interviews

In this book we do not refer to questionnaires when talking about interviews. Rather, we say that the interviewer has an interview schedule (for use in a structured interview) or an interview guide (for use in an unstructured or semi-structured interview). (Some sociologists use 'questionnaire' more broadly, to include interview schedules. When they want to make the distinction, they use the term self-completion questionnaire.)

Face-to-face interviews can be classified into three types: structured, unstructured, and semi-structured.

1 In a structured interview, the questions and the question order are pre-set. The interviewer aims to be in control of the interaction, and the respondent is just that - someone who responds to questions that are put to him or her. The interview schedule is like a questionnaire, except it is read out and filled in by the interviewer.
2 In unstructured interviews, neither the questions nor the question order are predetermined. Unstructured interviews are exploratory, and in principle non-directive: the interview is more like a focused conversation. The aim is to enable people to express themselves in their own words, highlighting their own feelings, preferences and priorities rather than those of the researcher. Although there is no interview schedule the interviewer may well have an interview guide, consisting of a set of prompts to remind them what main topics need to be covered.
3 A semi-structured interview is one which aims to have the best of both worlds. Parts of the interview are structured, with a set of questions directed in sequence to the respondent, while other parts of the interview are relatively unstructured explorations of particular or general issues.

Unstructured interviews are widely used in therapy and counselling. They clearly do not meet the requirement, intrinsic to the survey method, that standardized information is gathered systematically from all respondents. A survey, by definition, cannot be wholly based on unstructured interviews. This does not mean that survey researchers and non-directive interviewers have to be at loggerheads. Throughout this book we point to the advantages of multi-method research strategies. Questionnaires, unstructured interviews, focus groups, participant observation, diaries: all these methods, and others besides, can be combined in imaginative and innovative ways.

Telephone interviews

The nature of telephone interactions with strangers implies that telephone interviews are invariably of the structured variety.

Observation

Examples of surveys based on observation are traffic censuses, and studies of pedestrian flows through city centres (very useful commercially to anyone wanting to know where to set up shop).
Surveys and other research strategies
The survey is one of the three broad research strategies available in social research. The others are the experiment and the case study.
The experiment
Within the social sciences, experiments have tended to be conducted almost exclusively by psychologists. Some experiments are carried out in the laboratory, others in natural settings, 'in the field' - though it is far harder for field experiments in the social sciences to match the ideal type of the so-called 'true' experimental design. Amid the wide variety of types of experimental design we can distinguish the following key features.
• Experiments are usually designed to test hypotheses (tentative explanations and predictions) about the causal relations between variables.
• The researcher carefully controls the independent variables (the potential causes) in order to measure their impact on the dependent variables, the effects.
• The people taking part in the experiment, the subjects, are divided into two or more groups, on the basis of a random assignment of individuals to groups. These groups are exposed to different experimental treatments.
• Statistical tests are used to determine the extent to which any differences in the measurement of outcomes (dependent variables) are due to each independent variable.
In an experiment, the researcher deliberately introduces a difference between the people taking part. For example, take a clinical trial in which some subjects receive a new drug designed to relieve headaches, while others receive no treatment at all. Very often, no treatment means being given a placebo - a harmless preparation which has no medical value or pharmacological effects. At the outset, each subject stands an equal chance of being in the experimental group receiving the drug or in a control group receiving no treatment or the placebo. Clearly, it would be hopeless if all the men were put into one group and all the women into the other, because then we would not know whether differences in outcome were due to the drug or to the sex of the participant. Random assignment of individuals to groups is a statistically derived technique for addressing the problem that other independent variables, in this example the subject's sex, might be causing the differences in the dependent variables, in this example headache relief.
Equally clearly, it would be no good if subjects knew whether they were receiving a drug or a placebo. If they did know, it might well affect their response to the treatment, thereby invalidating the experiment. For this reason, active placebos are sometimes used - that is, placebos which mimic the side-effects of the drug but without its hypothesized therapeutic benefits. Very often, it is also desirable that the researchers themselves do not know, at the time they are administering a treatment, whether it is a drug or a placebo. If they did know, they might unintentionally communicate their feelings and expectations to their subjects, subtly implying that the drug would work whereas the placebo would not. An experiment or clinical trial in which neither the subjects nor the researchers know, during the experiment, to which group the subjects have been assigned, is known as a double blind procedure.
In surveys, by contrast, the researcher is dealing with differences between respondents that are given, not experimentally created. Men and women, smokers, people who have given up smoking and people who have never smoked, car drivers, motorcyclists, cyclists and pedestrians: we do not experimentally create these differences, our respondents present them to us.
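The random assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject identifiers, the group labels 'drug' and 'placebo', and the function names `assign_groups` and `tabulate` are all hypothetical, and a real trial would use a more elaborate allocation procedure.

```python
import random
from collections import Counter


def assign_groups(subject_ids, groups=("drug", "placebo"), seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn.

    Every subject has the same chance of landing in each group, and the
    group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: groups[i % len(groups)] for i, sid in enumerate(shuffled)}


def tabulate(assignment, sex_by_id):
    """Count how many subjects of each sex ended up in each group."""
    return Counter((assignment[sid], sex_by_id[sid]) for sid in assignment)


# Hypothetical subjects: 20 women (S00-S19) and 20 men (S20-S39).
sex_by_id = {f"S{n:02d}": ("F" if n < 20 else "M") for n in range(40)}
assignment = assign_groups(sex_by_id, seed=1)

for (group, sex), count in sorted(tabulate(assignment, sex_by_id).items()):
    print(group, sex, count)
```

Because the allocation ignores sex altogether, men and women should be spread across both groups in roughly equal proportions, which is the point of the technique. Nothing guarantees an exact balance in any single run.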
The case study
As the name implies, a case study involves an in-depth investigation into a particular example of a social phenomenon or institution. Two areas of sociology in which case studies have played a prominent part are the sociology of education, where detailed work has focused on social interactions in classrooms, staffrooms, playgrounds and so on, and the sociology of religion, where studies have focused on minority religious movements such as the Moonies (Barker 1984), examining, for example, the relationship between Moon and his followers, and probing the question, have members exercised choice or are they brainwashed?
Case studies typically involve a wide range of research techniques, including observation, participant observation, interviews, documentary analysis, and asking people to keep a diary. They may also involve some survey work - case studies and surveys are not incompatible.
The success of the survey
Modern survey research is the fruit of a long and complex history of social, scientific and philosophical development. We tend to take surveys for granted, but viewed historically they are an achievement. Survey research today is underpinned by discoveries in sampling theory, multivariate analysis and scaling methods. Readily obtainable computer packages make sophisticated analytical tools widely available. Fundamental ideas such as the concept of the respondent - a person who is both the object of enquiry and an informant - were very slow to develop (Marsh 1982: 19). These advances took place in a range of disciplines - sociology, psychology, demography, geography, marketing, organization research, statistics - and this contributed to their success, since no one discipline had a monopoly on the survey.
In First World countries, surveys are found everywhere, and are conducted by all manner of organizations, both large and small, from government agencies through large business corporations to small voluntary organizations. If surveys were as hopeless as some of their more extreme critics suggest, it is hard to explain why they are so widespread and so enduring. Box 1.4 gives an example of a survey whose impact has been incalculable.
Box 1.4
Smoking and lung cancer
In the first half of the twentieth century, lung cancer death rates increased sharply in several countries. By the 1950s, there was evidence from both laboratory work and studies of hospital patient records that appeared to implicate smoking as a factor. However, the tobacco companies and some doctors remained unconvinced, arguing that atmospheric pollution and improved diagnosis were plausible alternatives and that the causal processes underlying respiratory cancers had not been identified.
In 1951, Richard Doll and A. Bradford Hill (with the later collaboration of Richard Peto) embarked on a major epidemiological study of smoking and cancer. They arranged for the British Medical Association (BMA) to send questionnaires about smoking behaviour to every doctor on the Medical Register in Britain at four points over the next 21 years (eliciting responses from over 34,000 individuals). They also traced and analysed the death certificates of 10,072 doctors who died over the period.
The results (see, for example, Doll and Hill 1952 and Doll and Peto 1976) showed that the lung cancer death rate of those doctors under 70 years who smoked was twice that of lifelong non-smokers of comparable age, with increased death rates for other respiratory tract conditions and degenerative heart disease. Although the research did not attempt to explain what it was about smoking that caused lung cancer and the other associated conditions, it did provide large-scale evidence of a link between smoking and ill health. This evidence was hard to refute and impossible to ignore. The report's publication marked a significant turning point in official and public awareness of the dangers of tobacco smoking.
Some other noteworthy features of the study are listed below.
• Doctors were selected not because of any especially high or low levels of smoking or any suspected special susceptibility to cancer, but mainly because they were a population likely to be interested in the research, motivated to cooperate and capable of reporting their smoking accurately and honestly.
• Another reason for choosing doctors was the existence of an accurate and ready-made sampling frame, the Medical Register, which meant there would be less difficulty in tracing doctors than a sample from the general population.
• The study was able to show that the risks of death increased steadily with the number of cigarettes smoked.
• It also revealed significant reductions in the death rate of the group of doctors who gave up smoking compared to those who continued to smoke. The overall death rate from lung cancer declined over the course of the study as many doctors gave up smoking, while other non-respiratory cancer rates remained stable.
• The consistency of the lung cancer death rate among doctors across different areas cast doubt on both atmospheric pollution and diagnostic improvement as major contributory factors. If these two had been operating, a differential between rural and urban rates would have been detected.
• The study put the onus on those sceptical of the smoking-cancer link to find a factor that varied simultaneously with the incidence of smoking (the independent variable) and disease rates (the dependent variables).
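The two kinds of comparison reported in Box 1.4, a death rate in one group expressed as a multiple of the rate in another, and a risk that rises steadily with exposure, can be sketched as follows. The counts below are invented purely for illustration and are not the study's figures; the group labels and the function names `death_rate`, `rate_ratios` and `rises_steadily` are likewise hypothetical.

```python
def death_rate(deaths, person_years, per=1000):
    """Deaths per `per` person-years of observation."""
    return deaths / person_years * per


def rate_ratios(rates, baseline):
    """Each group's rate divided by the baseline group's rate."""
    return {group: rate / rates[baseline] for group, rate in rates.items()}


def rises_steadily(values):
    """True if every value is strictly larger than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


# HYPOTHETICAL counts (deaths, person-years), ordered from least to most
# exposed. Invented for illustration; NOT taken from the doctors study.
observed = {
    "non-smokers": (10, 100_000),
    "light smokers": (20, 100_000),
    "heavy smokers": (40, 100_000),
}

rates = {group: death_rate(d, py) for group, (d, py) in observed.items()}
ratios = rate_ratios(rates, baseline="non-smokers")

for group in observed:
    print(f"{group}: rate {rates[group]:.2f} per 1000, ratio {ratios[group]:.1f}")
print("rates rise with exposure:", rises_steadily(list(rates.values())))
```

A ratio of 2 for a group means its death rate is twice that of the baseline group. A steady rise across ordered exposure groups is the kind of pattern the third bullet point in the box describes.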
Among the most potentially important but also problematic surveys are those which involve international comparisons. One example is discussed in Box 1.5.
Box 1.5
An international survey of adult literacy
Comparative survey research may seek the collection of data from respondents who belong to different ethnic groups, cultures or nation states. The design of such studies requires both methodological and administrative problems to be addressed. Without sacrificing a standardized approach, the data collection instruments may need to be translated into different languages and to use quite different forms of expression to reflect divergent cultural perspectives. Practical considerations can rule out the use of self-completion questionnaires (rural postal services may be inadequate). The idea that certain low status social categories (children, unmarried women) will give their opinions freely to strangers in private interviews may be locally unfamiliar or unacceptable.
The International Adult Literacy Survey, conducted on behalf of the Organization for Economic Cooperation and Development in 1994, was an attempt to establish the comparative levels of adult literacy and numeracy in eight developed societies (Canada, Germany, the Netherlands, Poland, Sweden, Switzerland, the United States and Eire) using a suite of common tests and schedules (OECD 1995). A research team from each nation conducted a probability sample that was designed to be representative of its non-institutionalized population aged 16-65. In total, well over 20,000 individuals were involved. In Canada, respondents were given a choice of English or French test materials; in Switzerland, the sample was restricted to French-speaking and German-speaking cantons with respondents required to use the corresponding language. Respondents completed a test booklet and their demographic and employment details were gathered in an interview lasting, on average, one hour, conducted in their homes. People with very low levels of literacy were screened out of the samples by initial test questions.
Among the general findings were marked national differences: for example, Sweden had large proportions at the top levels of the numeracy and literacy scales with small proportions at the lowest level, while the position in Poland was the reverse. There were also strong links in all countries between being currently unemployed and low levels of literacy. However, the significance and implications of the findings of such complex surveys are open to challenge. The measurement procedures used rest on assumptions that can be contested, while the nature of the causal links lying behind the observed differences, in this case links between individual skills, employment and economic development, may be contentious. For a critique of the IALS, see Levine (1999).
Critiques of surveys
Over the years there have been numerous criticisms of social surveys as a research strategy. We can classify them into two broad, diametrically opposed types: scientific critiques and humanistic critiques.
Scientific critiques of surveys
Critiques of this kind are often mounted by people for whom the experimental method is the only valid means of arriving at scientific findings. According to them, surveys may display some of the trappings of science - reliance on statistical analysis, use of jargon, the appearance of objectivity - but all this is superficial.
The charge is that surveys cannot be scientific because the variables are not properly controlled. In experiments, the researcher makes strenuous efforts to control for the possible effects of extraneous independent variables. In a randomized clinical test, for example, everything is geared to measuring the effects of the drug. That and that alone is what interests us. Experiments are designed to isolate a very small number of key variables so as to measure the causal relations between them. Surveys, in contrast, are sprawling constructions, typically involving a large number of variables covering a respondent's attributes, behaviour and opinions.
No valid causal inferences can be drawn from survey research, it is said. If we find a correlation between, say, respondents' religious affiliation and their level of educational attainment, we have no way of knowing what the causal mechanisms are or even, in many cases, in which direction the causality runs. The most that can be hoped for from a survey is some descriptive material that may suggest hypotheses which can be scientifically tested through an experimental design.
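The force of this objection can be seen in a small simulation. The sketch below is an illustration only and rests on invented assumptions: the variable names are hypothetical and no real survey data are used. It generates two measured variables that never influence each other but share an unmeasured background factor. The two are nonetheless correlated, and the correlation disappears once the background factor is removed.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=5000, seed=0):
    """Simulate n respondents whose two measured variables share a hidden cause."""
    rng = random.Random(seed)
    background = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured third variable
    # Each measured variable = shared background + its own independent noise.
    var_a = [b + rng.gauss(0, 1) for b in background]
    var_b = [b + rng.gauss(0, 1) for b in background]
    return background, var_a, var_b


background, var_a, var_b = simulate()
raw = pearson(var_a, var_b)
# Subtract the shared background from both variables, then correlate what is
# left. This works here only because the simulation adds it with weight 1.
adjusted = pearson(
    [a - z for a, z in zip(var_a, background)],
    [b - z for b, z in zip(var_b, background)],
)
print(f"correlation as observed: {raw:.2f}")
print(f"correlation net of background: {adjusted:.2f}")
```

In a real survey the background factor would have to be measured before it could be taken into account, and the critic's point is precisely that we cannot be sure we have measured every such factor.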
Humanistic critiques of surveys
In this perspective, the problem with surveys is not that they fail to be scientific, but that the aim to be scientific is misconceived. This critique has a number of dimensions.
One major objection is that surveys are atomistic: they treat society and culture as no more than the sum of the individuals within it. The sociology of religion provides an example. Can we really measure the religiosity of a society by asking a sample of the population about their own religious beliefs and behaviour? Arguably, we should assess the social and cultural importance of religion by examining the influence of religion on the education system, on the law, on the political process, and on the commercial decisions of business corporations (Aldridge 2000). If our investigation shows that religion has little influence, then society has been secularized - religion has lost social significance - even if our surveys show that the majority of people say they believe in God.
Paradoxically, although surveys are atomistic they are not really concerned with individuals at all. The thrust is to produce aggregate data: 80 per cent are this, 55 per cent think that, 2 per cent do the other. The language of survey research betrays its lack of concern with the individual: respondents, samples, cases. And what are statistics, if not a means of analysing aggregate data? Focus on the individual human being, and the statistician is silent.
One strand in the humanistic critique of survey research has been trenchantly expressed by Blumer (1956) in his attack on the limitations of 'variable analysis' - by which he means the reduction of social processes to the correlation between variables. These variables, he argues, are not generic: they do not stand for abstract categories, and so cannot be generalized beyond the specific context of the survey. They are locked into what Blumer calls the 'here and now', which, we may note, soon becomes the 'there and then'. The depressing conclusion is that variable analysis results in knowledge which is neither generalizable nor cumulative. Nor does it offer any insight into the interpretive processes through which social reality is constructed.
According to the humanistic critique, surveys are only marginally less artificial than experiments.
Surveys cannot overcome the problem of the reactivity of research instruments, because they are by their very nature a crashing intrusion into the normal flow of social life. Respondents are self-consciously behaving as respondents. One obvious and ineradicable expression of this is the problem of social desirability. Respondents' answers are influenced by their desire to be helpful and to live up to their own self-image or to an ideal which they think will look good to the researcher. Respondents will therefore over-report their virtuous acts and play down or ignore their failings and foibles. They will also try to appear consistent, with the result that their opinions and beliefs will seem more coherent than they really are.
Part of the artificiality of surveys, according to critics, is that they are driven by the concerns of the researcher rather than the respondent. The essence of a social survey is to put questions to respondents. Whatever efforts we make to allow respondents to express themselves in their own words, we cannot go very far. It is simply not possible in a survey design to have a large number of open-ended questions, where respondents are free to answer in whatever words they choose. Most of our questionnaire or interview will inevitably consist of closed questions, where we present a series of choices from which respondents are asked to choose.
It follows from this, say the critics, that we shall find it almost impossible to gauge the salience of issues to our respondents. It is we, after all, who are raising the issues in the first place.
We can of course ask respondents how important given issues are for them. Even so, this is hardly a solution to the social desirability problem. Admitting to a researcher that you have no interest in issues apparently deemed to be important is a difficult thing to do.
Some critics conclude from all this that the only valid use of surveys is to gather basic factual information, as in national censuses. Market researchers can use surveys to find out what products we buy and what possessions we own. This, however, is hardly the stuff of a vibrant social science. It is mundane, untheorized fact-grubbing - what C. Wright Mills (1970) called abstracted empiricism. It shows, what is more, that the basic function of social surveys is to provide useful information to people who have power over us. Some draw the devastating conclusion that surveys are an instrument used by Orwellian Big Brothers to keep tabs on the proles.
Response to the critiques of surveys
Few sociologists nowadays see sociology as a hard science on a par with nuclear physics or microbiology. Most people agree. For that reason, the scientific critique of surveys is less pressing than the humanistic critique.
Despite the scientific critique, we believe that surveys have a part to play in establishing causal relations, as we shall explain. But causality is always complex because society is complex. Decades of research into the effects of the mass media, including a host of true experiments, have produced very little hard evidence. The truth is - though pressure groups find it impossible to accept - we simply do not know much about the effects of the mass media, and perhaps we never will.
For us, for our students, and we suspect for most of our readers, it is the humanistic critique that is potentially the more damaging. Our response to it, developed throughout the book, is, in essence, this.
Poor surveys
It is sadly true that too many surveys are poorly designed, badly executed and incorrectly analysed. They yield nothing of value. Clearly, though, exactly the same is true of ill-conceived experiments and botched fieldwork. 'Rubbish in, rubbish out' applies to all research strategies. Our aim is to promote the cause of good surveys.
A multi-method approach
Surveys can be fruitfully combined, in all sorts of imaginative ways, with unstructured interviews, observational fieldwork, documentary analysis, focus groups and so on. Using more than one research strategy enables us to triangulate data, that is, to use a variety of methods to test the validity and reliability of our findings. We give examples in Chapter 3. We do not accept that surveys cannot address sensitive and subtle issues. In our view, it is disastrous to erect a sectarian barrier between surveys and fieldwork, quantitative and qualitative methods. As Oakley has argued (1998), one danger is the creation of a gendered hierarchy of knowledge in which quantitative research is represented as objective, hard-edged and masculine, while qualitative research is subjective, sensitive and feminine. The apparently sharp opposition between quantitative and qualitative research is a social construct that perpetuates patriarchy; upon serious examination, all social research turns out to have quantitative and qualitative elements.
The role of social theory
Social surveys can play a significant part in the development and testing of sociological theory. Surveys do not have to be fact-grubbing. It is worth adding that in many cases so little is known about a topic that a few facts would not go amiss.
Servants of power
It is true that survey research is useful to commercial organizations and to the state. On the other hand, survey research can give a voice to the general public, to consumers, and to disadvantaged and disprivileged groups. This brings us to the social context of surveys.
The social context of surveys
Social surveys as we understand them are a modern phenomenon. They developed during the period of industrialization, and came to full fruition in the twentieth century. The British Census began in 1801, and has been carried out every ten years since that date with the exception of 1941, at the height of the Second World War. Similarly, the decennial (ten yearly) Census of Population in the United States began in 1790. Two of the most important surveys ever carried out in the UK were Charles Booth's Life and Labour of the People in London (published 1889-1902 in seventeen volumes) and Seebohm Rowntree's study of York, Poverty: A Study of Town Life (1902). A number of influential surveys were carried out by Mass Observation, which was founded in 1936 and which had a keen sense of a mission to inform the general public about the state of the nation. In our own times, major national surveys include the General Household Survey and the Labour Force Survey. In addition to these large-scale affairs, there are countless small surveys taking place every week of the year (though they slacken off during major holidays). There is every sign that surveys will continue to flourish in the twenty-first century and beyond.
Social surveys are also a feature of First World societies. They depend upon strong central institutions and advanced communications infrastructures. The Third World cannot always afford them, and the command economies of the communist societies had less need of them. At the time of writing, the former communist countries are experiencing profound conflicts in their transition to a market economy, a liberal-democratic polity and a consumer society. Their problems are not just economic but social and cultural. The neglect of serious survey work was characteristic of their lack of responsiveness to consumer interests.
Reading texts on social survey design and analysis written in the 1960s and 1970s can be revealing. At times it seems almost another world. The social scientist typically comes across as an authority demanding cooperation from respondents. A cultural gulf lies between us and them. These respondents are incompetent. They will misunderstand our interview questions and mess up our questionnaires unless we give precise instructions and spell everything out in minute detail. Similarly, the survey director, a man, will have to give lengthy training and detailed instructions to his hired-hand interviewers, who are women. These women, like the respondents, tend to get things wrong unless their work is closely monitored. The alienated labour of the mass production assembly-line is thus reflected in the closely controlled routines of the hired-hand researcher. The advice given in these texts is not so much poor as out of date.
The language of this era carries over into contemporary research, and much of it can feel uncomfortable. The people who take part in our research are conventionally called 'respondents', which may suggest a stimulus-response model of our relationship with them. Some researchers think it would be better to speak of 'informants', as the social anthropologists do, to acknowledge the point that people are supplying us with information which they have and we want. At least, unlike many psychologists and medical researchers, we do not refer to people as 'subjects'.
Another unfortunate word is 'instructions'. When we ask people to fill out questionnaires, we need to give them guidance on what we are looking for, as well as some explanation of the rationale underlying our questions. This guidance is conventionally termed 'instructions', even though we do not have the power of command that the term appears to imply. People can refuse to be interviewed, or put the phone down on us, or throw our questionnaire into the waste paper basket. They can also complain to us, our sponsors or our employers, as we have both discovered.
In some ways, this language of 'respondents' and 'instructions' does not matter. After all, we do not use it in talking to the people we are surveying; it is an occupational discourse we employ among ourselves to talk about them. The point is, we need constantly to remind ourselves about our relationship with our respondents.
Just as a commercial firm which treats its customers as 'punters' is likely to lose business, so a survey which sees respondents as ignorant dimwits is a survey scarcely worth doing.
Today, more than ever before, people are uneasy about the way in which social surveys use aggregate data. In surveys we are typically comparing men with women, smokers with non-smokers, and car drivers with cyclists and pedestrians. Individuals are submerged into a category - which we may find objectionable. People have complained for years about sociologists' alleged obsession with social class. In the words of Number 6, the lead character in the 1960s cult TV series The Prisoner: 'I am not a number, I am a human being!' The more we can persuade our respondents that their own individual experience and opinions count for something, the better.
Reflecting on the socio-cultural context of surveys can help us to identify the reasons why people are willing to take part in them and the main sticking-points. From these reflections, we can draw some broad conclusions about basic principles which can guide us in designing our research.
Why are people willing to take part in surveys?
Helping the researcher
This has always been one of the most powerful motives for filling out questionnaires and agreeing to be interviewed. People want to be helpful. Unfortunately, their help may take the form of telling us what they think we want to hear. This is another example of the social desirability problem, one of the main challenges that confront the social researcher.
Altruism
As well as helping the researcher, respondents are often motivated by the hope that the research will promote social progress. People volunteer for all sorts of social activities in order to make the world a better place. Richard Titmuss's classic work (1970), The Gift Relationship, uses the UK's voluntary blood donor system as a case study of the power of altruism.
Citizenship
Responding to surveys can be a way of expressing one's democratic right as a citizen to have a voice in public affairs. This is probably the main reason why people turn out to vote in elections. Even if our vote is unlikely to make any difference to the outcome, we may still hold it important to have our say in the democratic process. So it is with surveys. This implies that people's motivation to take part in a survey will be strengthened if they believe that their expressions of opinion will count for something.
Let's talk about us
Often, we survey not the general public but a particular group within it: students, clergy, people with literacy problems, members of an ethnic minority and so on. In such surveys it is normally clear to respondents that the reason they have been selected is that they are members of a particular group or stratum in society. Our survey gives them the chance to be representatives of their group. Taking part in survey research is one way a group of people can gain a hearing for their opinions, experiences and ideas. This motive can be very powerful when a group feels a sense of grievance that its point of view has been misunderstood and its problems ignored. Luckily for us, most groups feel that way.
Let's talk about me
Given appropriate safeguards, people like talking about themselves. It may not always be the noblest motive, but if we expect our respondents to be saints we should consider an alternative career to survey research.
In the next section, we discuss some of the reasons why people may be reluctant or unwilling to take part in surveys. Very often, motives are mixed. People may have strong reasons both to participate and not to do so. Box 1.6 gives one instance of this ambivalence.
Box 1.6
Establishing the 'dark figure' of unrecorded crimes
In the late 1960s, the US government sought to expand its intelligence about crime and criminals beyond the information available in the Uniform Crime Reports (UCR), the standard format in which official police and court statistics are presented in the USA (of similar status to the Home Office's Criminal Statistics in the UK). One objective was to estimate the size of the dark figure, the volume of crimes that had actually been committed but which, for various reasons, went unrecorded in the UCR. Researchers adopted an ambitious design which included surveys of selected US cities and of businesses in order to gather information on levels of white collar crimes like fraud. However, the most influential strand was a survey of the general public, based on eliciting details from a representative sample of US households about the occasions on which household members had been the victims of eight major types of crime, defined in the same way as they were in the UCR (Coleman and Moynihan 1996: 71-2).
The first US nationwide victimization survey in 1972 suggested crime rates three to five times those of the UCR. Despite a variety of methodological problems, including doubts about the accuracy of respondent recall and the capacity of the researchers to translate the respondents' common sense definitions of offences into the legalistic framework of the UCR, the findings commanded considerable public and official attention. Although the use of victimization surveys spread quickly beyond the USA, the British government waited until 1981 before commissioning the first national UK survey which was conducted in 1983. Over time, the methodology used has been refined and the research objectives expanded to cover the processes underlying the non-reporting of incidents and the variety of roles victims can play in precipitating crimes. Victimization surveys are now generally accepted as an important complement to orthodox criminal and judicial statistics, though the initial claim that they could establish the 'true' prevalence of crime is now regarded sceptically.
Self-report surveys, in which samples of the population are asked about their own offending, are a further way in which the orthodox official statistics can be supplemented. Inviting respondents to admit they have broken the law, or even that they have engaged in lesser kinds of deviant conduct, necessitates very careful question design and interviewing technique. Self-report surveys have been used extensively in connection with so-called victimless crimes like drug abuse, and also with schoolchildren regarding smoking, substance abuse and under-age alcohol consumption.
Why are people reluctant to take part in surveys?
Decline of deference
It used to be said popularly that Britain was a class-conscious society, riven by class distinction and snobbery. In sociological terms, what was being referred to was not class but status: looking up to or down on people depending on their social background, occupation, education and style of life compared to your own. Profound social and cultural changes are eroding these status-conscious patterns of thought and behaviour, as demonstrated by the Affluent Worker studies of the 1960s (Goldthorpe et al.). One consequence is that social researchers can no longer expect deferential cooperation from 'ordinary' people. What is true of Britain is true elsewhere.
Scepticism about experts
Linked to the decline of deference is a growing scepticism about the expertise of scientists and professionals. Highly publicized scandals have reduced public confidence in the pronouncements of experts. The BSE crisis in the UK is one dramatic example. One practical consequence is that simply mentioning a university or scientific affiliation in a covering letter is no longer accepted as a guarantee of honourable intentions in the way it once was.
Consumerism
The rise of consumer society is one of the key issues in contemporary sociology. Consumerism implies choice, including the choice of exit - in this case, refusal to take part. It may also imply an orientation towards cost-benefit analysis: why should I take part? What will I gain, and what will it cost me in time, effort, or frustration?
Competition from market research and salespeople
Social scientists are not the only people conducting surveys. The fact that commercial market research relies on surveys as a principal source of information is a sign of the power of the survey method. Salespeople sometimes pretend that they are conducting a survey when they are really trying to sell us something (a tribute vice pays to virtue). Telephone sales pitches routinely begin with the false assurance, 'Don't worry, Mr Aldridge, I'm not trying to sell you anything'. We need to distinguish our own research from these other activities.
Survey fatigue
Arguably, there are just too many surveys going on. People get fed up (technically, survey fatigue), and are not willing to take part in yet another survey unless it is well designed and seems especially worthwhile.
Intensification of social life
The society of leisure, once predicted in the 1960s, has not yet arrived. Many people feel under increasing pressure at work. All sorts of therapies are available to help people cope with the stress of modern living. Our telephone call, our ring on the doorbell, our questionnaire on the doormat, may seem like yet another intrusion into people's precious free time. Our surveys need to come across as part of the solution, not part of the problem.
Dislike of form-filling
One source of stress is filling out official forms. We have yet to meet anyone who enjoys doing their tax return. A questionnaire which feels like an official form is probably not one that will achieve a high response rate. So, too, an interview that is experienced as an interrogation is unlikely to yield rich information or deep insights.
Privacy
The concept of 'the information society' has received a lot of attention from sociologists (Webster 1995). People are concerned about the data that commercial and public agencies hold on them; hence many societies have passed laws on data protection and freedom of information. In general, people are nowadays far more suspicious about the uses to which data are put than they were in the past. This means that any guarantee of confidentiality has to be seen to be watertight. If the researcher can guarantee anonymity, so much the better, even though it can raise problems for the researcher. The nature of the guarantee of confidentiality or anonymity should be realistic and crystal clear.
Box 1.7
Encouraging people to take part in surveys: general lessons
• Value of the research: We presumably think that our work will be valuable. The more we can convince respondents that this is true, the better. Wherever possible, we should find ways of feeding the main findings back to our respondents and to people like them.
• Value of respondents' contribution: Why should a respondent bother to answer our questions, when they have plenty of other things to do? What difference will their participation make to the value of our research? Some respondents are worried that they have nothing original and interesting to say, or that they don't know much about the topic. We need to convince people that their own individual response is important.
• Being explicit: In modern societies, respondents are increasingly sophisticated and critical. They are familiar with surveys, and alert to deceptive techniques of persuasion. Many people are concerned about the researchers' hidden agenda, their sources of funding, and the uses to which the findings will be put. We need to make the rationale of our research as explicit as possible.
• A humanistic approach: Many respondents, and most sociologists, do not believe that sociology is a hard science like nuclear physics or inorganic chemistry. If our style of research - for example, rigidly structured questionnaires and interviews, with little opportunity for respondents to express their own views in their own words - suggests that we are treating people as the objects of scientific research, we are likely to encounter resistance. People should have the opportunity to express their views in their own words.
Research ethics
Professional research ethics can be seen in the context of the wider cultural factors we have just been reviewing. The fundamental principles of research ethics flow from the nature of the social relationship between researcher and respondent, a relationship which is necessarily embedded in a set of cultural values, norms and codes of conduct. All of the major professional bodies such as the British Sociological Association (BSA), the British Psychological Society (BPS) and the American Sociological Association publish guidelines on research ethics to which their members are expected to adhere. Appendix 2 suggests a few web addresses where these may be viewed. The general principles of research ethics impact somewhat differently, depending on the research strategy chosen. Despite its different applications, the core of research ethics is due respect for the integrity of people participating in our research.
Respect for our respondents can be broken down into three key components: informed consent, confidentiality and sensitivity.
Informed consent
Compared to fieldwork observations, one potential virtue of surveys is that they are relatively overt. The problems of covert research are far less pressing for the survey researcher than they may be for the ethnographer working in the field. Even so, as survey researchers we need to be as open as we reasonably can be about the purposes of our research, the sources of funding, and the potential audiences for and uses of our findings. We should make it easy for respondents to raise any queries they may have. In some cases it may be desirable to give the name of a responsible person whom they can contact if they want to verify who we are and the nature of our research. In interviews, we should have proof of our identity readily available. It may also be desirable to indicate that our research has the approval or support of a relevant person or body - a trade union, say, or a charity. We also need to consider ways in which we can make a summary of our findings available to our respondents, so that informed consent comes to fruition in an informed outcome.
Confidentiality
Respondents are usually offered an assurance of confidentiality. In some cases this extends further to anonymity, which is the stronger guarantee that not even the researchers will be able to identify who the respondent is - something which is only easily achieved in the case of self-completion questionnaires. Our assurances need to be as clear as possible, so that people are not misled. We also need to be aware that, in some cases, it is all too possible for a knowledgeable reader to identify a respondent even if we have given them a pseudonym and apparently concealed their identity. This is a particular problem when the researcher is surveying the members of an organization: there may be very few women or members of ethnic minorities, particularly in senior positions. How are we going to represent their responses while concealing their identity from their fellow workers and their bosses?
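One way to spot this identification risk before reporting is to count how many respondents share each combination of characteristics. The sketch below is only an illustration under our own assumptions: the function name `small_cells`, the threshold of five and the organizational data are all hypothetical, and none of them is a standard laid down by any professional body.

```python
from collections import Counter

def small_cells(respondents, keys, threshold=5):
    """Return attribute combinations shared by fewer than `threshold` respondents.

    The threshold is an arbitrary illustrative choice, not an official rule.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in respondents)
    return {cell: n for cell, n in counts.items() if n < threshold}

# Hypothetical organization of 40 respondents with only two senior women
respondents = (
    [{"sex": "male", "grade": "senior"}] * 10
    + [{"sex": "female", "grade": "senior"}] * 2
    + [{"sex": "male", "grade": "junior"}] * 14
    + [{"sex": "female", "grade": "junior"}] * 14
)

# Combinations of sex and grade that could point to identifiable individuals
risky = small_cells(respondents, keys=("sex", "grade"))
print(risky)
```

Any combination flagged in this way is a signal that results broken down by those characteristics might identify individuals, so the breakdown may need to be reported more coarsely or not at all.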
Sensitivity
One important area in which sensitivity needs to be exercised is in the use of language, particularly as regards 'race' and ethnicity, sex and gender, age, and disability. Examples of good practice can be found in other people's published work. A particularly useful source is The Question Bank, a resource centre funded by the ESRC (Economic and Social Research Council) and run by the National Centre for Social Research, and the Universities of Southampton and Surrey in the UK. Its internet address is: http://www.natcen.ac.uk/cass/. Language evolves, and varies cross-culturally in the English-speaking world, so it is important to keep up to date on acceptable usage in the culture in question.
While encouraging respondents to take part in a survey is entirely appropriate, attempting to put pressure on them is not. As citizens they have the right to refuse (except when there is a legal requirement to respond, as in the decennial Census). The potential for undue pressure is greatest in an organizational context, where people may feel that they will be judged 'uncooperative' if they decline to participate.
An invitation to survey research
Our book is an invitation to survey research. The word invitation implies joining in something worthwhile and enjoyable. Surveys can be both, we believe. There are, of course, problems and frustrations, and we have tried to be open about them. At the same time, we are positive. The problems are there to be overcome, and a successful survey can contribute to understanding social life in the hope of making things better.
Key summary points
• Surveys are a for
• They involve the
• They have to caf
• They can be con
Points for reflection
Further reading
Marsh (1982) The Survey Method: The Contribution of Surveys to Sociological Explanation is an excellent account of the survey as a research strategy. It examines the major critiques and vigorously defends surveys against them. For a comprehensive account of the survey method see Babbie (2001) The Practice of Social Research.
2 Theory into practice
The components of the modern social survey
In Chapter 1, the social survey was defined as a strategy in which the same information was collected from all the cases in a sample (or for the whole population of interest). This definition now requires some elaboration. Because it is broad, it covers a great many of the exercises conducted throughout history by agents of ruling elites in order to establish, for example, the population numbers of key ethnic, religious or occupational groups, the scale of enterprises for tax collection purposes, or the manpower available for military service. In this book, however, we are primarily concerned with the survey in its contemporary forms. The modern survey is a synthesis of certain ideas and methodological innovations that were available to be used together only by the middle of the twentieth century. These components are discussed below and a partial explanation is offered to a question that may have arisen in the mind of some readers - why is the systematic social survey such a recent development?
Respondent/informant orientation
As suggested in Chapter 1, the idea that the provider of the information in a survey is a respondent or an informant is an important conceptual development, in itself reflecting changing ideas of citizenship and social participation. Informants deserve to be treated with respect as knowledgeable and, within limits, reliable; their cooperation has to be carefully sought and their rights acknowledged (for example, to have the information they provide treated confidentially). There are exceptions to the definition of respondent offered in Chapter 1: some are proxies for the real subjects of the inquiry (parents for small children, members for the organizational teams of which they are part); some censuses and other official surveys do impose legal sanctions for non-cooperation. Nevertheless, the modern survey revolves around identifying strategic informants, persuading them to cooperate, and painstakingly constructing questionnaires and interview schedules containing questions that will be meaningful to them. Information collection is the principal aim with other objectives set aside or made subsidiary (these might include promoting awareness of goods and services or recruiting potential followers to a cause or interest group). In contrast, pre-modern surveys were often exercises in compulsion conducted on groups unable to refuse.
In some cases, such as British nineteenth-century research on the poor, direct contact with the group itself was often minimized and the principal resort was to testimony from various expert intermediaries such as inspectors, the police or employers. The aims of such exercises were often very blurred and concern for the interests of participants was rarely paramount.
Standardized data collection instruments
The questions posed in pre-modern surveys were often ad hoc and ill considered, unselfcritically reflecting the social worlds, assumptions and language of the authors (or the bureaucracies in which they worked) rather than the informants. There were additional obstacles to be overcome. Long after the inauguration of nation-wide mail deliveries, the use of postal questionnaires with samples of the general public was hampered by low levels of mass literacy and by widespread suspicion that the information being requested was an extension of surveillance by agencies of social control. Structured questionnaires and interview schedules with standardized wording and explicit definitions, wherever possible tested in pilot exercises on small groups of respondents, could only become an integral part of the modern social survey when the social infrastructure facilitated their use. A significant part of the effort in designing contemporary instruments is devoted to making the resulting schedule or questionnaire comprehensive, applicable to every respondent or situation that might be encountered. At the same time, user-friendliness is a major consideration: respondents need to feel at ease during interviews, while response rates for self-completion questionnaires are promoted if the forms are made as straightforward as possible by clear layout and helpful graphics.
Systematic selection procedures
The presumption that scientific thoroughness obliged social researchers to collect information from every member of a community or every household in the parish was an enduring one and probably delayed the social science application of the statistical ideas underpinning probability (random) sampling, which had been in circulation since the middle of the nineteenth century. The practical application of sampling theory to social surveys took place in the first decades of the twentieth century and the five towns inquiry by Bowley and Burnett-Hurst, published in 1915, was possibly the first British sociological study to include estimates of the reliability of findings based on samples (Marsh 1982: 26). Among other things, the advent of sample surveys solved the problem of having to secure the resources to fund large teams to process massive amounts of information from large numbers of respondents. This had previously locked surveying into fields where charitable or state support was forthcoming: sampling procedures helped to open the door for small groups and individuals to use the survey as a tool.
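The idea of estimating the reliability of a sample-based finding can be sketched in a few lines of code. The example below is a minimal illustration under stated assumptions, not a reconstruction of Bowley and Burnett-Hurst's procedure: the town of 10,000 households, the 30 per cent poverty rate and the function name `estimate_proportion` are all hypothetical, and the sample is a simple random one.

```python
import math
import random

def estimate_proportion(population, sample_size, seed=0):
    """Draw a simple random sample and estimate a proportion with its standard error."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    p_hat = sum(sample) / sample_size
    # Finite population correction: sampling a sizeable share of a small
    # population reduces the uncertainty of the estimate.
    fpc = math.sqrt((len(population) - sample_size) / (len(population) - 1))
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size) * fpc
    return p_hat, se

# Hypothetical town: 10,000 households, 3,000 of them (coded 1) in poverty
population = [1] * 3000 + [0] * 7000
p_hat, se = estimate_proportion(population, sample_size=500)

# Approximate 95 per cent interval using the normal approximation
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"estimate = {p_hat:.3f}, interval = ({low:.3f}, {high:.3f})")
```

Here 500 households stand in for 10,000, and the standard error states how far the estimate is likely to stray from the figure a complete enumeration would give. That is the economy which opened the survey to small teams.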
Multivariate analysis
The final core component of the modern survey and the most recent to be integrated with the others is multivariate analysis; that is, statistical procedures for analysing the relations between sets of variables whose values are varying simultaneously. Adequate descriptive statistics dealing with numerical observations have long been available. However, a key task in surveys with explanatory goals is to unravel causal processes after they have operated in the real worlds of respondents and the cultures and social structures in which they are located. Multivariate techniques can introduce statistical controls that eliminate complicating variables and enable answers to 'what if?' questions to be formulated. These techniques are crucial for unravelling complex issues where many factors are in play - how different social classes pass on differential advantages to their offspring over generations, the precise extent to which academic achievement is the product of home background, individual ability and school characteristics.
The widespread use of multivariate statistics for the analysis of survey data only developed after the Second World War. One of the factors that arrested its diffusion was the time and drudgery taken up by the elaborate statistical calculations involved. The advent of first mainframe then later desktop computers running software applications specifically designed for social surveys represented major advances. Investigators now possess an unprecedented capacity to explore survey data thoroughly using exploratory and analytic statistical tools.
Listing these four components may present an idealized picture of the modern social survey. Not every instance employs systematic sampling, while the objectives in some contemporary descriptive surveys render multivariate analysis superfluous. The point, however, is that the components are available for use in those research situations able to exploit them, and their joint use creates a tool of social inquiry of exceptional potential. However, effective synthesis does not take place by itself. Each component has a slightly different logic and requirements all of which require harmonization. The task in designing surveys is to make these four components fit together as seamlessly as possible.
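The notion of a statistical control, mentioned under multivariate analysis, can be illustrated with the simplest possible case: comparing groups within levels of a third variable. The sketch below uses invented figures for two hypothetical schools, A and B. Stratifying by home background is only one elementary form of control, and analyses of real survey data would normally rely on more elaborate techniques.

```python
from collections import defaultdict

# Hypothetical records: (home_background, school, n_pupils, n_passed)
records = [
    ("advantaged", "A", 80, 64),
    ("advantaged", "B", 20, 16),
    ("disadvantaged", "A", 20, 8),
    ("disadvantaged", "B", 80, 32),
]

def pass_rates(rows):
    """Pass rate per school, pooling whichever rows are supplied."""
    pupils = defaultdict(int)
    passed = defaultdict(int)
    for _, school, n, k in rows:
        pupils[school] += n
        passed[school] += k
    return {school: passed[school] / pupils[school] for school in pupils}

# Uncontrolled comparison: school A appears to do much better than school B
raw = pass_rates(records)
raw_gap = raw["A"] - raw["B"]

# Controlled comparison: compare the schools within each home background
controlled_gaps = {}
for background in ("advantaged", "disadvantaged"):
    stratum = [row for row in records if row[0] == background]
    rates = pass_rates(stratum)
    controlled_gaps[background] = rates["A"] - rates["B"]

print(raw_gap, controlled_gaps)
```

In these invented data the apparent advantage of school A disappears once pupils from similar homes are compared, because school A simply recruits more advantaged pupils. This is the kind of 'what if?' question that statistical controls allow us to pose.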
The survey as a research strategy
This section considers some of the main characteristics of the survey as a research strategy, taking account both of its strengths and its limitations. As well as offering guidance on the choice between strategies, this discussion aims to help the reader assess how a survey could be linked to and coordinated with other types of research strategy so as to circumvent the limitations of each.
Extensive/intensive
Surveys are the prime example of an extensive research technique in the social sciences, one capable of gathering comparable information from respondents across a wide range of different social groups. One frequently-used tactic is to employ a survey in the first phase of a project to establish what the general outlines of the researchable problem are and then to use the data collected to design a more intensive second phase using case studies or other intensive approaches. For example, a hypothetical investigation into homeworking might use a survey to map the sectors of business and industry in which it was most prevalent and to establish what types of employee were working from home with what general levels of success and satisfaction. A second stage could then adopt a narrower but more detailed focus. On the basis of the results of phase one, it would probably be possible to locate organizations that use homeworking intensively and also some that have tried and abandoned it, and to gather more information about the arrangements in these contrasting instances. Alternatively, it might be possible to set up a programme of in-depth interviews with individuals who have homeworking experience that would expand the data available on matters such as domestic management problems and communication with work colleagues. This is merely an illustration of one major strength of the survey: it is also entirely possible, provided the topic and setting are suited, to design a survey that is aimed at a relatively small group of respondents and which collects detailed data from them in a single collection operation.
Naturalness/artificiality/intrusiveness
Surveys stand in an intermediate position between a highly naturalistic strategy such as participant observation and the clearly artificial laboratory experiment. A well-conducted interview has some of the character and familiarity of a normal conversation (and it may take place in the respondent's home, workplace or other familiar setting) but it is nevertheless a conversation with an interviewer who is normally a stranger. Although there are advantages in speaking to respondents on their home 'ground', interviewers are usually entering an environment that has, to a greater or lesser extent, been prepared to receive them, so the situation cannot be regarded as entirely authentic. Indeed, if there is limited space and privacy, an entirely 'normal' and authentic situation can undermine the possibility of conducting any kind of interview because of interruptions from other family members, colleagues or the telephone. Street interviewing is self-evidently highly intrusive and refusals are common. Self-completion instruments such as email or postal questionnaires are low on naturalism but they have the balancing advantage of allowing the respondent to select the time for their completion, an option that significantly reduces their intrusiveness.
Qualitative/quantitative
Surveys are often characterized as a pre-eminently quantitative research strategy but this is a misperception. A prime advantage of surveys is precisely that they allow the simultaneous collection of both types of data. Open-ended questions are not simply devices to deal with the cases not covered by the closed categories offered in a previous question. The material they elicit can open up important insights into respondent motivation and perceptions. There are, of course, limits on the extent to which the respondents to postal questionnaires can be expected to write lengthy essays on
their views or preferences, and these are situations in which personal interviews may be preferable. In general, the capacity of surveys to deliver mutually supporting qualitative and quantitative data should not be neglected.
Causal inference
In some textbooks, the social survey is compared unfavourably with the experiment and is portrayed as a poor and logically deficient relation. As noted on page 8, this is largely because the classical laboratory experimenter has the advantage of being able to manipulate the key independent variables in 'real time'. In addition, confounding factors can be controlled through the laboratory isolation of the subjects and their random allocation to the experimental and control groups. In terms of making causal inferences from data, the laboratory experiment appears to have a major advantage over the survey which, as we have seen, has to reconstruct naturally-occurring causal processes after they have taken place (ex post facto) through statistical manipulation of the data. One of the several problems this creates is ambiguity about the precise sequence of changes and thus potential uncertainty over whether the data implies causation or only co-variation. There are two points to be made here. The first is that a properly designed survey should be able to reconstruct causal relationships, but it requires careful design and it necessitates a sample large enough to permit the use of sufficiently sophisticated statistical tools. Second, the analysis above neglects the role of theory: without an adequate theoretical framework in play, neither the experimenter nor the surveyor is in a position to identify which are the salient variables to include in the design or to make appropriate inferences about any patterns in the observed data.
Flexibility/rigidity
A frequently overlooked difference between research strategies is the different degrees of flexibility they permit the researcher. Some essentially qualitative research strategies allow a preliminary analysis of the first wave of data so that the outcome can be used to determine the venues and the topics to be pursued in the second wave, and so on. This alternation between data collection and analysis is especially useful in preliminary research where there are many parallel avenues that could be explored, in the light of which the researcher wants as many options left open as possible. Surveys do not lend themselves to such a rolling strategy. They may be thought of as 'front loaded' in the sense that a series of major interlocking decisions covering all the main components mentioned under the previous key heading need to be made before any data collection can begin. Once testing and piloting have been completed, the standardizing logic of surveys prohibits changes to the definition of the target population, the sample design or the contents of the
questionnaire/schedule. It is not possible to change tack if early responses do not live up to expectations. This rigidity reinforces the emphasis that should be placed on thorough preparation and pre-testing.
Types of survey design
There are three basic designs for surveys that reflect the main directions of comparison that will be made at the data analysis stage.

Cross-classificatory (cross-sectional)
In some senses, this is the fundamental survey design. There is a single stage of data collection (sometimes referred to as 'single shot') and the unit of analysis is a case with all of its characteristics (variables). Although a case is frequently equivalent to a respondent, this is not inevitable (the case might actually be a household and the respondent simply a member providing information about its expenditure patterns or leisure activities). The main focus is the comparison of aggregate groups of cases characterized by different values on key variables rather than the profile of characteristics possessed by any particular case. The objective is to see if groups of cases have co-varying values on other, dependent, variables. The Travel Survey is an example of a cross-sectional survey. The aim is to see what characteristics go with the choice of different modes of travel for the journey to work and what attitudes are associated with, for example, car use as against the use of bicycles or public transport. The analysis of stand-alone cross-classificatory surveys revolves around the construction and comparison of such subgroups. Part of this design's strength lies in the way an analyst can chop up a sample into many quite different sub-groups to explore the separate dimensions of the research topic.
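The comparison of subgroups described above can be illustrated with a small cross-tabulation. What follows is only a minimal sketch: the cases, the variable names `mode` and `has_car`, and the helper functions `crosstab` and `row_percentages` are invented for illustration and are not drawn from the Travel Survey.

```python
from collections import Counter

# Hypothetical cases: each dict is one case with its variables.
# The values are invented for illustration, not Travel Survey data.
cases = [
    {"mode": "car", "has_car": True},
    {"mode": "car", "has_car": True},
    {"mode": "bus", "has_car": False},
    {"mode": "bicycle", "has_car": True},
    {"mode": "bus", "has_car": True},
    {"mode": "bicycle", "has_car": False},
]


def crosstab(cases, row_var, col_var):
    """Count the cases falling into each combination of two variables."""
    return Counter((case[row_var], case[col_var]) for case in cases)


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    row_totals = Counter()
    for (row, _col), n in table.items():
        row_totals[row] += n
    return {cell: 100.0 * n / row_totals[cell[0]] for cell, n in table.items()}


# Compare the travel modes of the two subgroups (car available or not).
table = crosstab(cases, "has_car", "mode")
percentages = row_percentages(table)

for (has_car, mode), pct in sorted(percentages.items()):
    print(f"car available={has_car!s:5}  mode={mode:8}  {pct:.0f}% of subgroup")
```

Percentaging within each subgroup, rather than reporting raw counts, is what allows subgroups of different sizes to be compared directly.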
Longitudinal and panel studies
In a longitudinal survey, data collection takes place repeatedly in order to monitor the operation of social processes over time (the data generated are known as time series). In the special case of a panel study, the same respondents are involved at each stage (allowing for drop-outs). The British Household Panel Survey (see Box 2.1) is a large-scale example of a panel study. The presence of time as an explicit dimension in these research designs makes certain kinds of causal inference much easier. There are various statistical techniques tailored to the requirements of longitudinal studies: these include those based on ARIMA (AutoRegressive Integrated Moving Average, also known as Box-Jenkins models) and actuarial techniques dealing with the differential survival of cases in a population.
Hierarchical
In this more complex design, the main line of comparison is between the characteristics of a case and the characteristics of a collectivity in which the case is a unit or member. One of the principal research aims in hierarchical designs is to trace the influence of the collectivity on its members. An illustration of the application of hierarchical analysis would be a study of the recidivism - relapsing into crime - of the ex-inmates of a prison. The research question might be whether recidivism was closely connected to the characteristics of the prison such as its physical location, staff characteristics and the average length of sentences, and whether any of these factors interacted with features of the personal biography of the individual inmates (such as the number and type of offences they had previously committed) so as to increase the chances of further offending. Notice that there are two logically different kinds of variable involved in this design: prisons have some properties as institutions that cannot be inferred from the aggregation of the characteristics of individual inmates (and vice versa). Multilevel statistical models are available to facilitate the analysis of this kind of two-level data.

As well as showing that the nominated independent variables co-vary with the nominated dependent variables, all three types of design must include the capacity to detect the operation of rival independent variables that could lead to misinterpretation of the findings. The identity of these rival variables may be self-evident, or alternatively the main candidates will have been proposed in the research literature. One example of a potentially confounding variable is differential exposure to atmospheric pollutants as an alternative cause of cancer in the study discussed in Box 1.4. This factor was controlled in this study by the ability of the investigators to demonstrate that there was a higher cancer prevalence for smoking doctors than non-smoking doctors across both rural and urban locations, where atmospheric pollutants would be absent and present respectively. A further example is the need to control for individual ability while studying the impact of class size and teaching styles on pupil academic achievement (see Bennett et al. 1976). Confounding variables need to be measured and statistically controlled, or alternatively be excluded entirely from a design (for example, by defining the target population narrowly).
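The logic of controlling a confounding variable by comparing groups within each of its categories can be sketched as follows. The counts are invented purely for illustration and are not the figures from the study discussed in Box 1.4; the names `counts`, `prevalence` and `stratified_differences` are likewise hypothetical.

```python
# Hypothetical counts, invented for illustration only:
# (location, smoker) -> (cases_with_disease, group_size)
counts = {
    ("rural", True): (30, 1000),
    ("rural", False): (10, 1000),
    ("urban", True): (45, 1000),
    ("urban", False): (15, 1000),
}


def prevalence(cases_with_disease, group_size):
    """Proportion of a group that has the disease."""
    return cases_with_disease / group_size


def stratified_differences(counts):
    """Within each location, smokers' prevalence minus non-smokers' prevalence."""
    differences = {}
    for location in sorted({loc for loc, _smoker in counts}):
        p_smokers = prevalence(*counts[(location, True)])
        p_non_smokers = prevalence(*counts[(location, False)])
        differences[location] = p_smokers - p_non_smokers
    return differences


differences = stratified_differences(counts)

# If the gap persists inside every stratum, the rival variable (location,
# standing in for pollution exposure) cannot account for it on its own.
gap_persists = all(diff > 0 for diff in differences.values())

for location, diff in differences.items():
    print(f"{location}: smokers exceed non-smokers by {diff:.3f}")
print("gap persists in every stratum:", gap_persists)
```

A comparison of this kind holds the rival variable constant by design; with more strata or several rival variables at once, a statistical model would normally take over the same job.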
Box 2.1 Panel studies
Some of the largest scale surveys use a panel design. One of the most important in the UK is the British Household Panel Survey (BHPS) administered by the Institute of Social and Economic Research at the University of Essex. In surveys of this type, a designated group of households are monitored over periods as long as a decade with repeated waves of data collection. This enables their response to general shifts that have occurred in the economic and social environment to be examined. Such research also provides insights into the manner in which the impact of macro factors (for instance, changes in the labour market opportunities for women) depends on micro factors such as the age and generational composition of a household. Some studies attempt to interview all the adult members in the household, while others use a key informant to supply information on behalf of themselves and the others. It is common for a core set of basic questions to be used with every household, complemented by a selection from additional sets put only to households of a particular type (low income, single parent, gross income above a threshold). In some studies, the second generation households set up by the children from first generation families are tracked and incorporated into the research as they are formed. Such activities require large teams and substantial budgets. For further information on the BHPS, see http://www.iser.essex.ac.uk/bhps/doc/index.htm

Relations between theory and research
The natural way to launch this topic is to define 'theory', but this must be done in a preliminary fashion so as to avoid discussions that would go beyond the scope of this book.
• There is a wide measure of agreement throughout the social and natural sciences that theories are the most important and the most intellectually rigorous means of producing explanations of phenomena (and the most satisfactory basis for predictions). Observations that lack a theoretical underpinning cannot provide a basis for explanation or prediction.
• There is no consensus on what the precise technical specifications for a theory should be and only partial agreement over what constitutes an adequate explanation.
• Every systematic discipline possesses a changing set of concepts that organizes knowledge within that field and identifies the entities in the world with which inquiry is concerned. At least some of these concepts have an abstract and idealized character - their existence cannot be directly substantiated but must be inferred from their effects on what is observable. Some social science examples, more or less at random, include perfect competition, the self, the Schumpeterian workfare state, governmentality.
• There is an on-going debate between two main philosophical camps over the status of theoretical entities: realists believe they represent mechanisms and processes that do exist in the world, instrumentalists see them as
simplifying devices, helpful to make sense of research data but not to be credited with an independent existence (Chalmers 1999).
• A theory is a linguistic construction that, critically among a range of functions, states the existence of a general law-like relationship between two or more abstract concepts. Theories are frequently developed discursively by their authors so that the presentation is mixed up with extraneous illustration, comment and criticism: a schematic reconstruction of a theory strips it down to the essential propositions and it is these propositions that could be incorporated into a research design or might be invoked to interpret research findings.
• The orthodox perspective on explanation (the covering-law model) portrays it as having the form of a deductive argument, that is, one in which the truth of the premises guarantees the truth of the conclusion. The phenomenon to be explained is described within a statement that forms the conclusion of the argument: at least one of the premises has to be a statement formulating a law-like generalization borrowed from a theory (for example, 'revolutions occur during periods of rising mass expectations', 'suicide rates are positively associated with the degree of individualism in society'). The other premise states a list of 'initial conditions' or limiting states of affairs that have to be satisfied for the theoretical generalization to hold.
• All empirical research rests on some theoretical assumptions about what entities exist and are capable of being investigated. In some cases, the assumptions are made explicit and the perspectives acknowledged, in others they are implicit and need to be teased out. For example, even a thoroughly descriptive inquiry like the Travel Survey rests on a variety of theoretical premises. By asking about home addresses, number of cars in the household and work patterns, the assumption is being made that the selection of a mode of commuting is an essentially rational choice based largely on individual assessments of time, cost and convenience. However, choice of mode of transport could arguably rest on quite different and less calculative considerations. A preference for car use might reflect, at least in part, a desire for privacy and the feelings of invulnerability that some individuals associate with motoring. For others, car use is likely to carry strongly negative connotations because it is perceived as environmentally destructive. By omitting questions that could tap these considerations, the Travel Survey is effectively incorporating one theoretical perspective in preference to others. In summary, some kind of theoretical thinking is always an input to any empirical research.
• At the same time, research may have a theoretical output. The analysis of survey data, as Chapter 8 will argue, is never simply a matter of identifying statistically significant associations. The substantive and theoretical significance of such associations has to be established in the light of the theoretical perspectives built into the research design, or those available beyond it in the discipline or disciplines parenting the research.
• Textbooks on research methods tend to highlight the special and quite rare instances in which a research project revolves around the testing of specific theoretical hypotheses. It is more realistic, however, to acknowledge that a great deal of empirical research is eclectic, drawing on whatever bodies of theoretical thinking seem relevant. Rather than setting out to test a theory, most survey research is either exploratory or developmental in that it seeks tentatively to establish, or modestly advance, theoretical thinking on some topic. Such advance can be achieved in a variety of ways: one is by elaborating key concepts so as to allow a more refined application of a crudely articulated theory: another is to develop the measurement apparatus associated with an existing theory and thereby extend its scope into new areas of application. It is true that a great deal of research has exclusively descriptive goals, though even here secondary analysis can draw out explanatory possibilities unrecognized by the original investigators.
Incorporating a theoretical dimension into surveys
In the light of the discussion in the previous section, it should be apparent that creating a survey that incorporates theoretical thinking is an exercise that demands some creative thought. However, a checklist always helps!
1 Will the designated target population have the appropriate attributes for exploring or testing the theoretical perspective(s) of interest? (see Chapter 4, page 63)
2 Will the chosen survey design permit the relevant logical comparisons to be made that exploring or testing the theories requires? Will the temporal order of changes in the values of variables be clear? (see the previous section)
3 Will the questions posed enable the derivation of all the variables that are central to a theory? Will the key theoretical concepts be adequately operationalized?
4 Even though they probably do not belong to the theoretical perspectives on which a survey is based, will there be data available on potentially confounding variables? (see the previous section)
Most of the items on this checklist are dealt with elsewhere in the book so the rest of this section concentrates on the issues raised in point 3 about how to operationalize theoretical concepts within surveys. The task is to find adequate indicators and measurable effects for theoretical mechanisms and processes which are not themselves directly observable or measurable. A leading American innovator in social science research methods, Paul Lazarsfeld, developed a strategy for breaking general theoretical constructs down into their measurable dimensions that has been influential (Lazarsfeld 1958), but this is a challenging task even for the experienced investigator.
There are three main types of measuring device that are used regularly in social surveys: derived measures; ready-made indicators; and psychometric, educational and other tests and scales.
Derived measures
These are simple indicators devised by investigators themselves and built from the responses to a series of questions posed in the interview or questionnaire. A derived variable will usually be constructed in the first stages of data analysis by some form of mathematical summation from several other variables contained in the codebook (see Chapter 7, pages 128-9). Demographic characteristics such as age, sex and family and household size are exceptional in that relatively few other variables of theoretical consequence lend themselves to measurement via the responses to a single direct question. Box 2.2 provides an example of derived variables used in the Travel Survey.
Ready-made indicators
There is a vast range of social, economic, health, educational, social psychological and other types of indicator that have been used in social science research. Some have established positions as standard measures: although not necessarily flawless, their widespread use in the past guarantees further use in the future as new projects seek direct comparability with older ones. The following examples are chosen more or less at random:
• The proportion of pupils in a school eligible for free school dinners is widely used in Britain as a rough and ready comparative measure of the level of social deprivation in the school's catchment area.
• The proportion of households not owning a car, available from British Census data, is similarly used as a simple indicator of the socio-economic character of urban neighbourhoods.
• The Retail Prices Index (RPI) is a key measure of inflation in the UK economy as it affects consumers.
• Deaths per million passenger miles travelled is used in studies of accident risk that wish to compare different modes of transport (planes versus cars) or different environments (motorways versus other trunk roads).
• Box 2.3 examines the use of social class as an indicator.
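The arithmetic behind a rate-based indicator such as deaths per million passenger miles can be sketched briefly. The figures and the labels `mode_a` and `mode_b` below are invented for illustration and do not describe any real mode of transport; the function name is also hypothetical.

```python
def deaths_per_million_passenger_miles(deaths, passenger_miles):
    """Standardize a death count by exposure, in units of a million passenger miles."""
    if passenger_miles <= 0:
        raise ValueError("passenger_miles must be positive")
    return deaths / (passenger_miles / 1_000_000)


# Hypothetical figures, invented for illustration only:
# mode -> (deaths, passenger miles travelled)
modes = {
    "mode_a": (12, 4_000_000_000),
    "mode_b": (3, 6_000_000_000),
}

rates = {
    name: deaths_per_million_passenger_miles(deaths, miles)
    for name, (deaths, miles) in modes.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.4f} deaths per million passenger miles")
```

Dividing by exposure is what makes the modes comparable: a raw count of deaths would largely reflect how much each mode is used rather than how risky it is.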
Psychometric, educational and other tests and scales
There is a huge variety of pencil and paper tests dealing with personality characteristics, social attitudes, social psychological factors related to groups and team memberships, and other topics that could potentially be included in interview schedules or questionnaires. Selecting a single topic
from the many possibilities, there are several measures of anxiety, depression and suicidal ideation available including the Hospital Anxiety and Depression questionnaire (HAD) (Zigmond and Snaith 1983) and the Beck Anxiety Inventory (Beck et al. 1988). The inclusion of such tests in research may pose a variety of difficulties. With the illustrations cited, there could be ethical problems in asking respondents whether they feel depressed or suicidal if there was any possibility that this could actually trigger such feelings. There are also practical considerations - will completing a lengthy test be boring or take up too much of the respondent's time? Will a fixed test procedure fit in with the rest of the interview or questionnaire? Finally, there are the issues of the adequacy of the measures themselves that are discussed in the next section.
Box 2.2 Derived variables in the Travel Survey
Respondents indicated a large variety of different combinations of main (question 3) and alternative (question 4) modes of commuting, partly because question 4 allowed multiple responses. There was a need to classify commuters into a small number of commuting groups for the analysis. This was done partly on the basis of whether and how the car (including motor bikes) featured in an individual's commuting pattern. Eight derived variables were created, each representing a mode of commuting that reflected a distinctive combination of responses to the two questions:
• Group 1 Exclusive car users: respondents who ticked one from boxes 5, 6, 7 and 8 in response to question 3 and also either 1 or any of 6, 7, 8 or 9 for question 4.
• Group 2 Car users with some public transport: ticked one from 5, 6, 7 and 8 for question 3 and also 3 and/or 4 from question 4.
• Group 3 Exclusive users of foot or pedal bike: ticked 1 or 2 for question 3 and also 1 or 2 and/or 3 for question 4.
• Group 4 Exclusive users of public transport: ticked 3 or 4 for question 3 and also 1 or 4 and/or 5 for question 4.
• Group 5 Car users with some foot or pedal bike: ticked one from 5, 6, 7 and 8 for question 3 and also 2 and/or 3 for question 4.
• Group 6 Foot or pedal bike with some public transport: ticked 1 or 2 for question 3 and also ticked 4 and/or 5 in question 4.
• Group 7 Public transport with some foot or pedal bike: ticked 3 or 4 for question 3 and also 2 and/or 3 for question 4.
• Group 8 Foot or pedal bike with some car use: ticked 1 or 2 for question 3 and also ticked any of 6, 7, 8 or 9 for question 4.
Only a handful of cases did not fit into any of these eight groups. No case could belong to more than one.
The resulting groups did reveal interesting associations with other variables including, fairly obviously, the availability of cars to a household (question 22). Although the Travel Survey did not gather respondents' views about the environment, this might be an additional direction in which associations could be found. This is a very simple example of how, generally, classifications based on derived variables can play a role in bridging the gap between the response to a specific question and constructs which are closer to the realm of theory.
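The general idea of deriving a single classifying variable from the answers to two questions can be sketched as below. This is a deliberately simplified illustration and not the Travel Survey's actual coding scheme: the mode categories ("car", "public", "active"), the function `commuting_group` and its rules are all assumptions made for the example, and the sketch does not reproduce the eight groups listed in Box 2.2 exactly.

```python
# Hypothetical mode categories, assumed for this sketch only:
# "active" stands for foot or pedal bike, "public" for public transport.
MODES = {"car", "public", "active"}


def commuting_group(main, alternatives):
    """Derive one commuting group from a main mode and any alternative modes.

    main         -- the single main mode (answer to a 'main mode' question)
    alternatives -- iterable of alternative modes (a multiple-response question)
    """
    if main not in MODES:
        raise ValueError(f"unknown main mode: {main!r}")
    others = set(alternatives) - {main}
    if not others <= MODES:
        raise ValueError(f"unknown alternative mode(s): {sorted(others - MODES)}")
    if not others:
        return f"exclusive {main}"
    if len(others) == 1:
        (other,) = others
        return f"{main} with some {other}"
    # More than one other mode: left unclassified, so that no case
    # can fall into more than one group.
    return "unclassified"


# Hypothetical respondents: (main mode, alternative modes ticked)
respondents = [
    ("car", []),
    ("car", ["public"]),
    ("active", ["active"]),
    ("public", ["active"]),
    ("active", ["car", "public"]),
]

groups = [commuting_group(main, alts) for main, alts in respondents]
print(groups)
```

Because each respondent passes through a single chain of rules and receives exactly one label, the derived variable is mutually exclusive by construction, which mirrors the requirement in Box 2.2 that no case belong to more than one group.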
Box 2.3 Measuring social class
Social class is a key concept in both classical and contemporary social theory. There is a voluminous debate, on the one hand over how the abstract conception should be formulated, and on the other about what empirical indicators are appropriate. Since 1911, the Registrar General's classification of what were originally called 'social grades', based on industrial group, occupation and level of skill, has been used in the United Kingdom as one of the principal empirical indicators of social class, particularly in officially-sponsored research (Rose and O'Reilly 1997: 1). This classification was originally devised by a medical statistician to examine differentials in mortality and fertility rates. It was renamed 'social class based on occupation' (SC) in 1990 by which point it had been used in innumerable research studies. The categories are as follows (OPCS 1991: 12):
I Professional occupations
II Managerial and technical (formerly 'Intermediate')
III Skilled occupations: (N) Non-manual; (M) Manual
IV Partly-skilled occupations
V Unskilled occupations
Over time, SC became increasingly incongruent with the prevailing theoretical approaches to social class and was also criticized for lacking reliability and validity (see page 39). A variety of alternatives, such as the twenty category classification by socio-economic group (SEG - OPCS 1991: 13-14), the Goldthorpe classification based on employment relations (Rose and O'Reilly 1997: 40-8), the Institute of Practitioners in Advertising's Social Grade Scheme (A, B, C1, etc.), widely used in market research, and Erik Olin Wright's schema based on Marxian class theory (Wright 1985), have been devised and applied in empirical research.
In 1994, the Office for National Statistics (ONS), the UK government agency responsible for SC and SEG, commissioned a review of existing class classifications with the intention of producing a revised scheme. The 'collapsed', eight category, interim version of the revised socio-economic classification (SEC) which resulted, based largely on the Goldthorpe approach, is set out below:
1 Higher professionals/senior managers
2 Associate professionals/junior managers
3 Other administrative and clerical workers
4 Own account non-professional
5 Supervisors, technicians and related workers
6 Intermediate workers
7 Other workers
8 Never worked/other inactive
Unlike SC, SEC has been subjected to extensive testing to establish its validity using data collected by ONS from its Omnibus Survey and the Labour Force Survey. SEC also has much more explicit links with theory than the SC. A version was used in the 2001 Census in Britain. In order to use the SEC in a survey, questions will be needed that elicit the following three characteristics from a respondent:
• occupation
• size of employing establishment (if any)
• employment status (employer, employee, self-employed, not active)
Reliability and validity
Using pre-developed indicators and tests gives prominence to the issues of reliability and validity. Reliability is a measure of the extent to which the results of an indicator or test are consistent over time. This consistency can itself be measured in the form of a statistical coefficient of reproducibility, often Cronbach's alpha, which is similar to a correlation coefficient (see page 152). There are several different comparisons that can be made to examine reliability:
• Test-retest: respondents complete the same instrument on different occasions.
• Internal consistency: if a psychometric or other pencil and paper test consists of many items tapping the same underlying concept, split-half methods can be used to compare the consistency of results between (say) odd and even numbered items.
• Inter-observer reliability: will different interviewers using the same schedule produce equivalent responses from the interviewee?

A rule of thumb often quoted is that reliability coefficients should be at least 0.7 though, as with many rules of thumb, it is precise but arbitrary.

Validity raises the question of whether a measuring device is actually connected adequately to the theoretical mechanism, process or construct it was intended to capture. Do, for example, high scores on the HAD questionnaire correlate with cases of clinically-definable depression? Once again there is a variety of approaches to justifying an instrument:

• Content validity: this is decided by a panel of experts who review whether a measure does everything it should: it is clearly a pretty flimsy test and raises 'who validates the validator' questions!
• Concurrent validity: this measures a construct's validity against an unimpeachable standard, another form of measurement which has itself demonstrable validity but which may be complex, expensive or have other restrictions on its use: such a standard is obviously not always available.
• Predictive validity: can the measure successfully identify outcomes and consequences? Do respondents scoring highly on HAD but without symptoms at the time of testing subsequently get diagnosed as clinically depressed? Because many factors may intervene after testing to prevent or delay outcomes, predictive validity is often hard to establish with certainty.
• Construct validity: this looks back at the performance of a measure over time, preferably covering a wide range of studies, to see if it has produced fruitful findings. Thus, it would be possible to review the use of (say) the SC measure of social class to see whether it was, and remains, an effective means of explaining health differentials, voting patterns and other phenomena which are theoretically linked to class membership. In fact, such a review of construct validity was conducted for the SC measure by an academic panel on behalf of the ONS. The panel concluded that SC was not valid and recommended its replacement by SEC (see Box 2.3).

One of the significant advantages of using established measuring instruments is that the burden of establishing reliability and validity has already fallen on another's shoulders.
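For readers who want to see how the internal-consistency checks described above might be calculated, the following is a minimal sketch rather than a definitive procedure. It computes Cronbach's alpha and a split-half coefficient (odd against even numbered items) for a small set of invented scores. The data and function names are hypothetical, and the Spearman-Brown step-up applied to the split-half correlation is a common convention that we have assumed here; it is not mentioned in the text above.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])                          # number of items
    items = list(zip(*rows))                  # scores regrouped by item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)


def split_half(rows):
    """Correlate odd- and even-numbered item totals, then apply Spearman-Brown."""
    odd = [sum(r[0::2]) for r in rows]        # items 1, 3, 5, ...
    even = [sum(r[1::2]) for r in rows]       # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)


# Invented scores (hypothetical): five respondents, four items on a 1-5 scale.
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(scores)
halves = split_half(scores)
print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Split-half (Spearman-Brown): {halves:.2f}")
print("Meets the 0.7 rule of thumb:", alpha >= 0.7)
```

With real data, each row would be one respondent's answers to the items of a single scale. Five respondents is far too few for a meaningful coefficient and is used here only to keep the example short.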
Key s u m m a i The key comj:
Theory into practice
2 The use of standardized questionnaires and interview schedules designed with an emphasis on respondents' understanding and convenience
Further reading

An excellent guide to the evolution of social surveys and also a defence of their utility, which is also referred to in the further reading for Chapter 1, is Marsh (1982) The Survey Method. The first three chapters of Hughes (1976) Sociological Analysis: Methods of Discovery are a good introduction to the connections between theory and research. Hughes and Sharrock (1997) The Philosophy of Social Research, 3rd edition, is a guide to different styles of research and the kind of philosophical underpinnings each has. There are several excellent introductions to the philosophy of science including Chalmers (1999) What is This Thing Called Science?, 3rd edition. Miller's (1991) Handbook of Research Design and Social Measurement, 5th edition and Murphy et al. (1994) Tests in Print (IV) are among the guides to published tests and scales. Litwin (1995) How to Measure Survey Reliability and Validity provides a brief guide to the topic indicated by the title. (This latter volume is part of Fink, ed. (1995) The Survey Handbook, which contains small guides on most aspects of surveying.)
Planning your project
Reviewing your assets

In Chapter 2 (page 30), it was suggested that surveys were less flexible than some other research strategies because the need for standardization leaves little scope for modifications to be introduced in mid-course. Careful advance planning is therefore essential. A realistic initial appraisal of
the resources that you can bring to bear to carry out the survey will help to confirm that the operation envisaged is viable. It will also provide a basis for constructing an outline timetable that covers the whole of the project.
Background information and library resources

Even for small-scale community and in-house organizational surveys, preliminary research is desirable to confirm that the information being sought has not already been collected in other exercises. This will mean approaching personnel in the appropriate organizations and it will probably be necessary to reveal your intentions, at least in outline, in order to secure their cooperation. Some library preparation is important for nearly all surveys:

• to establish what information is already available about the proposed target population;
• for surveys with an explicit theoretical dimension, to investigate how other research has applied the concepts and perspectives in which you are interested;
• to track down comparable previous research, especially previous surveys, even though they may have taken a different tack;
• to check any indicators or scales that you propose to employ.

Only the largest public libraries in major cities will be able to offer much help with some of these. Smaller public libraries in the UK should have Social Trends, an annual digest of official statistics published by the Office for National Statistics, and this is a very useful starting point in the search for statistical information published by and on behalf of government. The National Statistics website is at http://www.statistics.gov.uk/.
For US information, the following are good starting points: http://www.whitehouse.gov/news/fsbr.html for links to statistical reports from US Federal agencies; http://www.gla.ac.uk/Library/Depts/MOPS/, a guide to the US Government statistics that are available on the Internet; http://www.fedstats.gov/ for links to over one hundred Federal agencies; http://www.census.gov/, US Bureau of the Census site. For Australia, see http://www.abs.gov.au, Australian Bureau of Statistics information service. For Canada, see http://www.statcan.ca, Statistics Canada site.

Some of the background material will be in academic journals to which only university and research institute libraries subscribe. Inter-library loans can be arranged but they can be slow and a cover charge will be applied for each item. The findings and the instrumentation for many surveys are never published in an orthodox fashion for a variety of reasons: the documentation for them may have to be obtained by contacting the institutional sponsors of the research.
The University of Essex at Colchester, UK, has several resources of potential value to investigators planning surveys.

• The Data Archive is a collection of over 4000 datasets from surveys conducted in Britain and other countries, especially Europe and North America. These datasets are available to academic researchers as computer database files. The URL is: http://www.data-archive.ac.uk/
• Qualidata is the Economic and Social Research Council's archive of data from qualitative research projects which is itself searchable on-line. The URL is: http://www.essex.ac.uk/qualidata/
• The Institute for Social and Economic Research (ISER) is responsible for running several large scale surveys and much of the material related to them, including instrumentation and findings, is available on-line. The URL is: http://www.iser.essex.ac.uk/
Human resources

Even if you are conducting a survey in a solo capacity, there may be additional help you can call on for some of the labour-intensive tasks. Table 3.1 identifies suitable tasks that can be allocated to volunteers or casually-paid helpers. Some tasks like interviewing and coding should be given only to suitably trained and responsible assistants capable of working independently. Even an essentially routine task like preparing a mailshot can require supervision to deal with any complications. Some kinds of error that could occur at this early stage are not easily reversible and could jeopardize the entire survey. Always be cautious in your estimates of the productivity of inexperienced volunteers or paid helpers: they may face steep learning curves and they may not be as committed as you to quality and sustained performance.

Another kind of human resource is assistance from experts. Two kinds are mentioned later in the book. The first is the 'insider', a member or associate of the group that you are researching, or an employee within an organization in which you are carrying out data collection. Insiders can play an invaluable role particularly if no one in the research team has first-hand knowledge about the research groups or the venue. This kind of associate can help you before research begins with your request for access and your data collection instruments, and again after data collection with the interpretation and presentation of findings.
The second kind of expert is a statistician, or anyone with substantial experience of research design and data handling, whom you may possibly need to consult regarding unusual or complex sample design and analysis problems.

One way to make the most of limited human resources is to buy in commercial research services for parts of a project. In principle, almost any element of the research process can be purchased from suitable agencies: in practice, costs are considerable and your budget, if it exists, may not stretch (although inquiries about approximate costs will do no harm). The most cost-effective element to sub-contract may well be data-entry: the keyboarders employed by specialized 'data capture' agencies can transfer all the responses from questionnaires or interviewer-completed schedules into a computer data file, offering the additional benefit of validated data entry (see Chapter 7). The charges are usually calculated from the total number of key depressions plus an initial 'set-up' fee. For instruments without a large number of open-ended responses, the costs can be very reasonable and the time-saving considerable.

Table 3.1 Delegable tasks in surveying

Task                                   Skills required
Checking databases, labels             Clerical, basic computer
Stuffing and unstuffing mailshots      Manual/clerical
Data entry                             Basic computer
Telephone follow-ups                   Basic communication skills
Transcription of taped interviews      Audio-typing
Financial resources

Many people without experience of social research express shock and dismay when they are given a quotation for conducting what they consider to be a small-scale research project. There is an enduring belief that social research (unlike 'real' scientific research) can be conducted on a shoestring. Unfortunately, the fear that lay sponsors will baulk at the costs sometimes leads first-time researchers, anxious for support for a cherished project, to produce 'optimistic' estimates that underestimate the true costs and simply help to perpetuate belief in the shoestring. The list below gives some rough estimates of the costs, current at the time of publication, of selected elements of conducting a survey.

• It will cost at least £700 for an audio-typist to produce transcripts from twenty 45-minute taped personal interviews.
• Ignoring the design process, to distribute 2000 copies of a two-sided A4 postal questionnaire together with a one-sided A4 covering letter, and get back 1000 responses, allowing for photocopying, stationery, paying stuffers, and second class postage out and back, will cost around £1000.
• It will cost about £450 to have a data capture agency process the 1000 questionnaires, assuming there was an average of 250 key depressions per response.
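To make the arithmetic behind these quotations explicit, the short sketch below converts the three figures in the list into unit costs. The totals are those quoted above; the variable names are our own, and the per-key-depression rate assumes that any 'set-up' fee is folded into the £450 total, since the text does not give the fee separately. The combined total is meaningful only for a hypothetical project that needed all three elements.

```python
# Rough totals quoted in the list above (pounds sterling).
TRANSCRIPTION_COST = 700      # twenty 45-minute taped interviews
INTERVIEWS = 20
INTERVIEW_MINUTES = 45

POSTAL_COST = 1000            # 2000 questionnaires out, 1000 back
QUESTIONNAIRES_SENT = 2000
RESPONSES = 1000

DATA_CAPTURE_COST = 450       # 1000 responses, 250 key depressions each
KEYS_PER_RESPONSE = 250


def unit_costs():
    """Return the per-unit costs implied by the quoted totals."""
    audio_hours = INTERVIEWS * INTERVIEW_MINUTES / 60
    total_keys = RESPONSES * KEYS_PER_RESPONSE
    return {
        "transcription per interview": TRANSCRIPTION_COST / INTERVIEWS,
        "transcription per hour of tape": TRANSCRIPTION_COST / audio_hours,
        "postal cost per questionnaire sent": POSTAL_COST / QUESTIONNAIRES_SENT,
        "postal cost per response received": POSTAL_COST / RESPONSES,
        "data capture per response": DATA_CAPTURE_COST / RESPONSES,
        # Assumption: the set-up fee is included in the quoted total.
        "data capture per 1000 key depressions":
            DATA_CAPTURE_COST / total_keys * 1000,
    }


for name, value in unit_costs().items():
    print(f"{name}: GBP {value:.2f}")

# Hypothetical project needing all three elements.
total = TRANSCRIPTION_COST + POSTAL_COST + DATA_CAPTURE_COST
print(f"combined total: GBP {total}")
```

Note that the postal figure doubles when expressed per response received rather than per questionnaire sent, which is why the expected response rate matters when budgeting.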
Technical resources

Small-scale survey research fortunately rarely requires sophisticated equipment (beyond computers which are dealt with on page 47). One major exception is that projects based on tape-recorded interviews need to borrow or invest in a good quality audio-cassette recorder (preferably one with an auto-reverse on record facility so that the interviewer does not have to manually stop proceedings and turn the cassette over after 45 minutes). If a large number of interviews have to be transcribed, a dedicated cassette transcriber is preferable to an ordinary tape-recorder. These machines have a 'go back' facility that rewinds the tape a little and replays so that the typist can listen again to an inaudible phrase: they also come with a foot control and earphones.
Setting the timetable

Precise answers about how long any phase of a survey should take to conduct are difficult in the absence of hard information regarding the scale of the research and the resources available. Nevertheless, it is possible to provide some general pointers.

• You clearly need to work backwards from immovable deadlines (the delivery of the project report fixed by a sponsor, or a limited window of opportunity for data collection set by holidays or other restrictions on your access to the target population).
• It is usual to give respondents a minimum of two weeks to complete postal questionnaires.
• An extensive programme of personal interviews can be very time-consuming, especially if travel is involved: the overhead of arrangements (and re-arrangements) can be difficult to manage at the same time as actually conducting other interviews (can someone back at base do this?). Do not over-estimate the total number of interviews that a small team is capable of conducting, nor under-estimate the time it will take to complete them.
• Transcribing tape-recorded interviews is also time-consuming: an experienced audio-typist under optimum conditions might take four times the length of the interview for a full transcription of a personal interview. Focus groups are considerably more difficult because of multiple and overlapping speakers.
• Analysis expands to fill the time allotted for its completion: in most instances, it is best to settle in advance on a finite period for the analysis, together with a clear understanding of the minimum you hope to achieve within these limits.
• The overheads for collaborating can be considerable. Keeping each other
informed, getting people back on track after problems and a variety of other coordination tasks can eat into the productivity gains of having several hands. A team of three people rarely produces three times the output of one member.
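The pointers above lend themselves to a simple back-of-envelope calculation. The sketch below is one possible way of doing it: it works backwards from a fixed report deadline and estimates typing time from the 'four times the length of the interview' figure. The deadline date, the phase names and every phase duration except the two-week minimum for postal returns are invented for illustration.

```python
from datetime import date, timedelta

TRANSCRIPTION_RATIO = 4  # typing hours per hour of recorded interview


def transcription_hours(n_interviews, minutes_each, ratio=TRANSCRIPTION_RATIO):
    """Estimated typing time for full transcripts of personal interviews."""
    return n_interviews * minutes_each / 60 * ratio


def work_backwards(deadline, phases):
    """Schedule phases so that the last one ends on the deadline.

    phases: list of (name, duration_in_days) in chronological order.
    Returns a list of (name, start_date, end_date) in the same order.
    """
    schedule = []
    end = deadline
    for name, days in reversed(phases):
        start = end - timedelta(days=days)
        schedule.append((name, start, end))
        end = start
    return list(reversed(schedule))


# Hypothetical project: only the 14-day postal window comes from the text.
phases = [
    ("design and pilot", 21),
    ("postal questionnaire in the field", 14),
    ("data entry", 7),
    ("analysis (fixed period)", 21),
    ("report writing", 14),
]
deadline = date(2001, 12, 14)  # invented report deadline

for name, start, end in work_backwards(deadline, phases):
    print(f"{start} to {end}: {name}")

print("typing hours for twenty 45-minute interviews:",
      transcription_hours(20, 45))
```

Fixing the analysis phase as a block of known length, rather than leaving it open-ended, reflects the advice above that analysis expands to fill the time allotted to it.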
Computing and software resources

Although it is possible to conduct a small-scale survey without computer assistance, it would be perverse to attempt it. Access to a desktop or laptop computer and suitable software is almost certainly the single most important resource that can aid the investigator in a survey project. A computer not only takes the drudgery out of numerical and textual data analysis but with suitable software it can help to design a professional looking questionnaire and ensure accurate data entry. Even a basic machine, equipped only with word processing and spreadsheet applications, can at the minimum help with project record keeping and correspondence. If there are suitable levels of computer literacy in play, such a machine can play a useful role in all phases of a survey.
Hardware

How ancient and limited a machine you can get by on depends very much on the scale of the project and the kind of software you envisage employing. Unless a machine has a hard disc and a central processor (CPU) of roughly equivalent power to an Intel 80486, it can probably be ruled out. (This is not meant to suggest that the computer must be a PC with an Intel or similar processor or running a version of Microsoft's Windows operating system.) The more sophisticated survey analysis packages tend to be demanding in terms of both processor power and memory requirements. Older machines with poorer specifications will certainly struggle to run them and may perform so slowly as to be unusable.

If there are several investigators and/or hired hands collaborating over a project, it is preferable to keep all the data files and other information in one location, invariably on a networked system, where access to them can be maximized. Equally important, there are grave dangers of generating unsynchronized and diverging versions of the same key files when different members of a team store their work in progress in different places.
Software

There are three main types to consider: standard business applications, one-stop applications, and dedicated applications.
Standard business applications

As implied above, a standard word processing application can be useful to create covering letters and to execute mailmerges for address labels. Self-completion questionnaires can be designed on word processors but it requires a little trial and error to get finely tuned results and there is rarely any help on how to get some of the unusual layouts that may be required. A standard office spreadsheet like Microsoft Excel is also extremely useful and can be used to store the data file (see Chapter 7) as well as to produce tables and charts. Although there are fewer ready-made solutions and less convenience than with a specialized survey package, it is nevertheless possible to carry out quite advanced data analysis and to produce excellent charts from a spreadsheet. Business databases such as Microsoft Access can also store the data file but they are often less intuitive to use and do not always have as extensive a range of graphical presentation facilities.
One-stop applications

A one-stop application offers a comprehensive and integrated suite of facilities for conducting surveys including questionnaire design, data entry, data analysis and presentation tables and graphics. There is clearly an advantage in terms of convenience in only having to cope with the interface for a single application, while there may also be cost savings to be had. At the same time, the statistical procedures available in one-stop packages tend to be more limited than those in the leading dedicated packages, though this may not be a problem for a small-scale project. Examples of one-stop packages are:

• PinPoint and its successor KeyPoint, both published by Longman Software (the URL is http://www.longman.net/keypoint/). The latter incorporates facilities for conducting web surveys.
• Sphinx Survey, published by Sage Publications Software and distributed by Scolari (the URL is http://www.scolari.co.uk/). Some versions support the lexical analysis of open-ended responses.
Dedicated applications

A dedicated application concentrates on supporting a specific aspect of survey research, particularly data analysis. As a result, there tend to be more user options and a higher degree of sophistication than in a one-stop package. This can present problems for first time users who can get lost in a sea of difficult choices. On the other hand, there is a very good chance that exactly the facility you are seeking will be available 'off the shelf'. Dedicated software also exists for data entry and charting, but the choice in the data analysis field is especially extensive. Examples of data analysis packages include:
• SPSS (Statistical Package for the Social Sciences), published by SPSS Inc (http://www.spss.com/). The grandfather of survey analysis software with a track record stretching back over 35 years, a very large international user base and associated with a wide range of linked products and services. Supported by extensive documentation and several textbooks (see further reading for Chapter 8).
• GB Stat (distributed by Scolari): offers good import facilities, graphing and export to word processors.

There are several general issues relevant to the selection of software packages for use in survey projects.

• Is it one-stop or dedicated? Can it deliver all the forms of analysis you expect to employ?
• Does it have good import and export facilities? In other words, can it recognize data files that have been constructed in other programmes and computer environments? If it lacks a particular facility, you may want to export some or all of the data to another package that does have it.
• Is the package well supported? This covers on-screen support, technical help-lines, web sites, and paper documentation. Does any of this help come at a price you cannot afford?
• Can it deal with the number of variables and all the types of data you expect to collect? In particular, if you have used open-ended questions, what facilities does it offer for processing the responses?
• Does it provide good presentation facilities (especially tables and charts) that can be imported directly into the word processor you are going to use to produce the report?
• Are there licence restrictions on the way it can be used which will conflict with the way you are planning to run the research? Is there an educational or charity price tariff?
Gaining access to organizations

In Chapter 5 (pages 86 and 90) we consider ways of approaching individual respondents and encouraging their participation. How we introduce and characterize our research on the telephone or in person, or in a covering letter accompanying a questionnaire, is crucial in securing participation and establishing rapport. In many cases, however, our research is conducted in an organizational context. Our survey is not of the general public but of members of an organization, whether a commercial company, a union or professional association, a religious movement or a voluntary agency. In such cases the organization mediates our research relationship with our respondents.
On the face of it, the problem we face is this: gaining access to the organization so that we can survey its members. Whose authorization do we need, and what do we have to say and do to get in? This, though, is only one aspect of access. Access is not a one-off achievement but an ongoing process, with four dimensions (Buchanan et al. 1988): getting in, getting on, getting out and getting back. As Hornsby-Smith (1993) remarks, it can be helpful if we think of our involvement with the organization as an 'access career'.

Getting on points to the need to secure willing cooperation once we have been granted formal rights of access to respondents. Just because senior people have authorized our research does not necessarily mean that the rank-and-file will be enthusiastic. (When Aldridge arrived for one interview with a clergyman, he was immediately asked whether he was a spy from the bishop.) We need to win people over. In a complex organization we may well need to negotiate with a wide range of gatekeepers who can facilitate or hinder our research.

Getting out is also a sensitive process, despite the insensitive language. Getting out is not escape. Once we have gathered our data, we should not simply rush out of the organization breathing a sigh of relief. We have built up obligations to our respondents and to the people who have helped us carry the research forward. We will probably be feeding back our findings to them in one way or another, so we need to maintain our links to them.
One reason for handling getting out sensitively is the need we or others may have to get back. We may, for example, want to conduct follow-up interviews with key informants, or gather more information from company archives. The organization may itself ask us to carry out further work. Even if we have no intention of conducting further work in the organization, others may, so we owe it to them not to leave a sour atmosphere behind.

There cannot be a set formula for securing access in this fourfold meaning of the term. Patience and quiet persistence are often needed, as is the willingness to seize opportunities when they arise - they are seldom predictable. Nor is it always easy to predict which organizations will be the most difficult to access for the purpose of conducting a survey, nor what barriers will need to be overcome. In virtually all cases, confidentiality will have to be negotiated carefully. The researchers may well need to demonstrate their competence to sceptics, since not everyone is convinced of the value of social surveys or the expertise of social scientists. Even when they are not sceptical, many organizations are so time-pressured that they need to be persuaded that their members can spare the time to participate. In some cases, ascribed qualities of the researcher such as gender, age, social status, ethnicity, religious affiliation can be a formidable and even impenetrable barrier.
One advantage of using self-completion questionnaires (see Box 3.1) is their impersonality: the problem of interviewer effects does not arise.
Three methods of gathering data

We can distinguish three vehicles for gathering data from respondents:

1 The self-completion questionnaire. In this method, respondents fill out the questionnaire themselves. It may be a postal (mail) questionnaire, which they complete and return by post. It may be a questionnaire handed to them, for example by a teacher in class or a receptionist in a waiting room, which they are asked to complete on the spot and hand in. Or it may be an email questionnaire which they complete and return electronically.
2 The face-to-face interview. Here the researcher interviews the respondent in person, either in the respondent's home, or in the researcher's office, or in some 'neutral' location. In social research, most interviews are one-to-one, though group interviews are also possible - interviewing the adult partners in a household, for example.
3 The telephone interview. Here the interview is conducted over the telephone - or, in future, the videophone.

We set out in the boxes below the main advantages and disadvantages of each method, before considering the key choices to be made.
Box 3.1 Self-completion questionnaires: pros and cons

Advantages

Cost: The cost of reproducing and distributing questionnaires is relatively low.
Time to collect data: Questionnaires can be distributed and returned quickly.
Large samples: Because costs are low and data collection is fast, it is feasible to survey large samples of the population. The method benefits from economies of scale.
Geographical distribution: Since the researcher is not present, the sample can be drawn from a wide geographical area.
No interviewer bias: There is no interviewer to introduce unauthorized comments about the research, the questions or the respondent.
No interviewer effects: Respondents do not have to relate to characteristics of the researcher such as their age, sex, ethnicity, dress or accent.
Handling sensitive topics: Since the researcher is not present, respondents may find it easier to handle sensitive questions, particularly if their responses are anonymous.

Disadvantages

Questionnaire length: Self-completion questionnaires need to be short and also look short, or the response rate will be low.
Simple questions: Complex questions are cumbersome to ask and take too long to answer.
Few open questions: Since written answers to open questions can take a long time, only a few such questions can be asked.
Response rate: Even with good design, response rates can be low unless respondents have strong reasons to participate. Response rates will be underestimated if questionnaires have been sent to people who are not part of the target population or who have moved address. Unless they let us know, we shall count them as refusals when they are not.
Control of context of response: The researcher often has no control over who fills out the questionnaire, nor the spirit in which they do so. Respondents can scan the whole questionnaire first, rather than follow the desired sequence of questions.
Response bias: People who experience literacy problems, or whose mobility is restricted, will be less likely to respond.
Salience: Gauging the salience of items to the respondent can be difficult.
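The point in Box 3.1 about underestimated response rates can be shown with a little arithmetic. The sketch below is ours, not part of any standard package: the function name `response_rate` and all the figures are hypothetical, chosen only to show how removing ineligible recipients from the base changes the rate.

```python
def response_rate(returned: int, sent: int, ineligible: int = 0) -> float:
    """Completed questionnaires as a share of eligible recipients."""
    eligible = sent - ineligible
    if eligible <= 0:
        raise ValueError("no eligible recipients")
    return returned / eligible


# Hypothetical figures: 1,000 questionnaires posted and 400 returned;
# 200 later turn out to have gone to people outside the target
# population or to out-of-date addresses.
crude = response_rate(400, 1000)
adjusted = response_rate(400, 1000, ineligible=200)
print(f"crude: {crude:.0%}, adjusted: {adjusted:.0%}")
```

In practice we seldom know how many recipients were ineligible, which is why the crude figure tends to understate the true response rate.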
Box 3.2 Face-to-face interviews: pros and cons

Advantages

Length of interview schedule: Because responses are verbal, it is possible to ask more questions than in a self-completion questionnaire. The appearance of the interview schedule is not relevant to the interviewee.
Complex questions: The presence of the interviewer enables complex questions to be explained, if needed, to the interviewee.
Question skips: As long as they are clear to the interviewer, question skips raise no problems for the respondent.
Open questions: Since respondents do not have to write their answer, open questions can be used more freely.
Salience: The use of open questions, and non-verbal cues from the respondent, enable the interviewer to gauge which items are salient to the respondent and which are of no concern.
Visual aids: Show cards can be used to help respondents frame their answer.
Ranking and rating questions: Relatively complex ranking and rating exercises are possible. For example, occupational titles can be written on cards, and respondents asked to rank them, or sort them into categories, based on criteria such as social status.
Control over context of response: In contrast to self-completion questionnaires, the researcher has control over who responds to questions and the sequence of questions. By establishing good rapport the researcher can ensure that questions are taken seriously.
Rapport: The interviewer's success in achieving a good relationship with the respondent will improve the quality of the answers.
Group interviews: Sometimes we would like responses from more than one person, for example, from the adult members of a household. This is only feasible in a face-to-face interview.

Disadvantages

Cost: Interviews are costly in money and time.
Sample size: Because of the time and money involved, one interviewer can conduct a limited number of interviews each day. There are no economies of scale.
Geographical restrictions: The cost of travel and the time it takes may limit the geographical reach of surveys carried out by interviews.
Time to collect data: Given that interviewing can be taxing for the interviewer, especially when interviews are not wholly structured, any one researcher can only undertake a few interviews each day - often four is the maximum.
Interviewer bias: Interviewers can introduce bias by offering unauthorized comments on the questions, the research or the interviewee, which can lead the respondent in a particular direction.
Interviewer effects: Personal characteristics of the interviewer - such as age, sex, ethnicity, dress or accent - can affect the way in which the interviewee responds.
Leading questions: Even without interviewer bias, leading questions can easily be introduced unwittingly into the less structured part of an interview.
Social desirability: The presence of the interviewer makes it even more likely that the respondent will seek to give socially desirable answers.
Anonymity: Although confidentiality can be guaranteed, anonymity clearly cannot.
Safety: Attention needs to be given to the physical safety of the interviewer, especially if interviews are conducted by one interviewer in the respondent's home.
Box 3.3 Telephone interviews: pros and cons

Advantages

Cost: Costs are far lower than with face-to-face interviews.
Large samples: Because costs are low and data collection is fast, it is more feasible to survey larger samples of the population than if interviews are face-to-face.
Geographical distribution: Since the researcher is not present, the sample can be drawn from a wide geographical area.
Time to collect data: There is no travel time, and the respondent's agreement to participate is quickly obtained.
Question skips: As in face-to-face interviews, provided they are clear to the interviewer, question skips raise no problems for the respondent.
Fewer interviewer effects: Although personal characteristics of the interviewer - such as age, sex, ethnicity, or social class - may be inferred, they are less obvious and intrusive than in face-to-face interviews.
Safety: The physical safety of the interviewer is not an issue.

Disadvantages

Simple questions: Because strong rapport is hard to achieve, and because show cards are not possible, complex questions have to be avoided.
Response bias: Unless great care is taken, socially disadvantaged groups will be underrepresented.
Sensitive questions: Telephone conversations are an unsuitable medium for asking sensitive questions.
Open questions: Open questions are less effective than in face-to-face interviews. On the telephone, respondents' answers are usually brief, and probes have a limited effect.
Limited response categories: Respondents cannot be expected to memorize a long list of response categories. Visual aids such as show cards cannot be used to help respondents frame their answer, as they can in face-to-face interviews.
Contamination by telesales: The telephone is widely used for selling goods and services, often with an initial pretence of conducting research. Genuine research is not always easily distinguished from these other activities.
Cold call: The telephone call comes 'out of the blue'; not having been prepared, respondents may be less likely to agree to take part.
Anonymity: Although confidentiality can be guaranteed, anonymity clearly cannot.
Self-completion questionnaires, face-to-face interviews and telephone interviews are the three main methods of gathering data in social surveys. We may add a fourth, as mentioned already in Chapter 1: observation. We should also note two variants on self-completion questionnaires: email and interactive surveys, and diaries.
Email and interactive surveys

These are electronic variants of the postal questionnaire, and they offer advantages along with some problems. In an email survey, the questionnaire is distributed and returned electronically. This has advantages over paper questionnaires:

• we can pre-program the order of questions, so that respondents progress through the questionnaire in the sequence we desire without skipping ahead or going back;
• because of this, the problem of question skips does not arise - the program automatically moves to the next relevant question;
• the program can prompt respondents, and alert them to the fact that they have made a mistake - for example, if they try to tick several boxes where only one is required;
• there is no intermediate stage of inputting data; the data are available for immediate analysis;
• there are no problems about how to arrange for the questionnaires to be returned, and no intermediaries to intervene in the process of distribution and return;
• we can gain access through newsgroups to minorities who are hard to reach by other means.

There are, though, some major drawbacks:
• there is a sampling bias towards affluent, well educated, young, white, male citizens of First World countries - though this will become less acute as more people gain access to the internet;
• we need programming skills, or a programmer, to design the questionnaire;
• respondents need familiarity with and access to a computer equipped with the necessary software;
• respondents may lack confidence in the security of data sent over the internet and stored on a remote server or computer;
• anonymity cannot convincingly be guaranteed.
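To make the first three advantages concrete, here is a minimal sketch of how an interactive questionnaire might pre-program question order, handle skips automatically, and prompt a respondent who ticks more than one box. It is an illustration only: the questions, the names `Question`, `check` and `administer`, and the routing rules are all our own assumptions, not a description of any particular survey package.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    text: str
    options: list[str]
    # Routing rule (hypothetical): given the answer, return the key of
    # the next relevant question, or None when the questionnaire ends.
    next_key: Callable[[str], Optional[str]]


QUESTIONS = {
    "owns_car": Question("Do you own a car?", ["yes", "no"],
                         lambda a: "car_use" if a == "yes" else "bus_use"),
    "car_use": Question("How often do you drive?",
                        ["daily", "weekly", "rarely"], lambda a: "bus_use"),
    "bus_use": Question("How often do you take the bus?",
                        ["daily", "weekly", "rarely"], lambda a: None),
}


def check(question: Question, ticked: list[str]) -> Optional[str]:
    """Return a prompt for the respondent, or None if the answer is valid."""
    if len(ticked) != 1:
        return f"Please tick exactly one box for: {question.text}"
    if ticked[0] not in question.options:
        return f"Please choose one of {question.options}"
    return None


def administer(responses: dict[str, list[str]]) -> dict[str, str]:
    """Follow the programmed sequence, skipping irrelevant questions."""
    recorded: dict[str, str] = {}
    key: Optional[str] = "owns_car"
    while key is not None:
        question = QUESTIONS[key]
        ticked = responses.get(key, [])
        problem = check(question, ticked)
        if problem:
            raise ValueError(problem)  # a real program would re-ask instead
        recorded[key] = ticked[0]
        key = question.next_key(ticked[0])
    return recorded


# A respondent without a car is never asked about driving.
print(administer({"owns_car": ["no"], "bus_use": ["weekly"]}))
```

Because the answers are recorded as they are given, there is no separate data-entry stage. The sketch also hints at the second drawback listed above: someone has to write and test the routing rules.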
Diaries

Asking respondents to keep a daily record of their actions can be a useful part of a social survey. Potentially, it enables us to gather a large amount of data, much of which would be hard to obtain by other means. It can stand in for observation where observation is not possible.

We point out in Chapter 6 (see Box 6.6) that one of the most difficult problems facing the survey researcher is asking about periodic behaviour. Suppose we are interested in respondents' cinema-going. While a minority of them may have extremely regular behaviour - they go to the cinema every Saturday, as part of a regular night out - most respondents will not be like this. Their cinema attendance will be far more variable and hard to summarize.

One answer to the problem is to ask respondents to keep a diary of their activities day by day. We shall not have to depend on the respondents' fallible memory. Nor will we face the social desirability problem of respondents over-reporting socially approved behaviour and under-reporting socially stigmatized activities. A diary will provide us with a reliable record of their actual behaviour.

Or will it? Diaries are not a panacea for social desirability effects. Not only may respondents misreport their behaviour, they may alter their behaviour to conform with social norms. Keeping a diary makes people more self-conscious, and so may itself affect their actions.
It may be tempting to think of the diary as a low cost way of gathering data. The respondent does all the work and we reap the benefit. Of course it is not as simple as this. Diaries are difficult to design and to analyse.

In survey research the diary is a form of self-completion questionnaire, one renewed every day. We should design it as such. Respondents need to know what to record and when to do so. As with questionnaires and interview schedules, we have to be selective, and as helpful as possible in explaining to the respondent what we are asking them to do and why. We need to be clear about the use to which the data will be put, including the issue of anonymity or confidentiality and any feedback we intend to give on the results of our analysis. We should contact the respondents in person or by telephone during the course of their record-keeping, to thank them for their participation, answer any queries, and encourage them to carry on. We should also, ideally, arrange to call to collect the diary in person. Diaries are, then, labour-intensive for the researcher too.

Sometimes, to keep the data manageable, the researcher asks only a subsample of respondents to keep a diary. One problem here is response bias: perhaps certain kinds of respondent will be more willing to do so than others.
Choosing a method of gathering data

In surveys conducted by a solo researcher or a small team, practical considerations often limit the options open to us. If a large sample is required, or if respondents are geographically scattered, face-to-face interviews are normally impossible because they consume too much time and money. If we need to ask a lot of questions, and if the format is complex, with multiple question skips, then a self-completion questionnaire is unsuitable, unless it can be distributed electronically. The more questions there are, the more a face-to-face interview becomes appropriate. If we need to ask a lot of open questions, face-to-face interviews are to be preferred (see Boxes 3.1, 3.2 and 3.3).
Combining methods of data gathering

The methods of data-gathering we have been discussing are not mutually exclusive. It is often possible to combine them, with beneficial results.
Using a questionnaire to generate follow-up interviews

The rationale is that the questionnaire will provide basic information about the sample from which generalizations can be made to the whole population. Interviewing is a tool which will allow us to probe more deeply into the respondents' feelings, attitudes, orientations, hopes and fears. Interviews yield rich evidence that complements the generalizable but thin data from a questionnaire. If the questionnaire has produced unexpected or puzzling findings, we can explore them in depth through interviews.

Using a questionnaire to identify a subset of respondents for interview can raise problems over anonymity of findings. If the questionnaire is anonymous, how can we maintain our guarantee while identifying people who are willing to be interviewed? There are two possibilities.

First, we can breach the anonymity while offering reassurances. At the end of the questionnaire, after we have thanked them for their participation, we can include a brief statement that we intend to conduct follow-up interviews, and ask whether they would be willing to participate. If so, we will need to ask for a name and address or telephone number. Although this is relatively straightforward, it obviously means that our questionnaire is no longer anonymous.

A variation of this is illustrated by the Travel Survey. As an incentive to complete the questionnaire, we offered respondents the opportunity to enter a prize draw for a bicycle, which we thought an appropriate reward! The questionnaire had a tear-off strip, on which respondents were asked to give their name and details of how we could contact them if they had won. We reassured them that the slip would be detached from the questionnaire as soon as we had received it, and that all data would remain confidential.

Second, if we decide that the questionnaire must remain anonymous, we shall have to supply respondents with two envelopes, one for their questionnaire and one for their personal details.

If the questionnaire is not anonymous, we are able to tie up each interview with the corresponding questionnaire. Should we do so? Should we refer back, in the interview, to the responses made on the questionnaire? At times, this can be fruitful, but care must be taken. We risk provoking socially desirable responses. If we continually remind respondents what they said before, they may adjust their replies to be consistent with that.
Using interviews, focus groups or diaries to suggest items for a questionnaire

Here the interviews, focus groups or diaries are being used less for their own sake than to help us formulate salient, meaningful questions for use in the questionnaire.

Triangulation

In the literature on research methods, triangulation refers to the use of a variety of research strategies, or of data from a variety of sources, to test an hypothesis. The term triangulation comes from surveying. We calculate the position of an object, C, by taking bearings on it from two positions, A and B. If we measure the distance between A and B, we know the length of one side of the triangle defined by points A, B, and C. We use an instrument such as a theodolite to measure the angle at apexes A and B. From this we can calculate the exact location of C. The point is, we need to make two or more independent measurements to do so.
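For readers who would like to see the surveyor's calculation spelled out, the sketch below works it through using the law of sines. The coordinate convention (A at the origin, B along the x-axis, C above the baseline), the function name `locate` and the example figures are our assumptions for illustration.

```python
import math


def locate(baseline: float, angle_a_deg: float, angle_b_deg: float) -> tuple[float, float]:
    """Locate C from the length of AB and the angles measured at A and B.

    Assumes A is at (0, 0), B is at (baseline, 0) and C lies above AB.
    """
    alpha = math.radians(angle_a_deg)   # angle between AB and AC
    beta = math.radians(angle_b_deg)    # angle between BA and BC
    gamma = math.pi - alpha - beta      # remaining angle, at C
    if gamma <= 0:
        raise ValueError("the two sight lines never meet")
    # Law of sines: AC / sin(beta) = AB / sin(gamma)
    ac = baseline * math.sin(beta) / math.sin(gamma)
    return ac * math.cos(alpha), ac * math.sin(alpha)


# Hypothetical survey: a 100 m baseline with 45-degree angles at each end.
x, y = locate(100.0, 45.0, 45.0)
print(round(x, 2), round(y, 2))
```

Neither measurement alone fixes C: the angle at A places it somewhere along a line, and only the second, independent angle at B picks out a single point on that line.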
But does this analogy stand up when dealing with social data? If we regard a questionnaire as a measurement from point A, and an interview as a measurement from point B, can we now determine point C - namely, data about a respondent? Are social data like that? If we ask a respondent about her opinions on field sports in an interview as well as a questionnaire, have we determined her opinions as definitively as we can calculate the location of an object in the landscape?

Perhaps not. One facet of social data, like quantum mechanics, is that the act of measurement affects the thing being measured. The accounts that people give in questionnaires, interviews, focus groups or diaries are just that: accounts. Whether it is possible to construct one objective, definitive statement out of these varying accounts is a contested philosophical question. Returning to the practical matters, we may simply conclude that an open-minded use of a variety of methods will do no harm, and will tend to enrich our understanding of the social world.
Further reading

Devine and Heath (1999) Sociological Research Methods in Context is particularly useful in discussing the ways in which different research methods can be combined. Hornsby-Smith's chapter in Gilbert's (1993a) collection, Researching Social Life, provides a good account of the problems of access. Czaja and Blair (1995) Designing Surveys: A Guide to Decisions and Procedures is readable and wide ranging.
4
Selecting samples
Introduction

Sampling is the process of choosing in a systematic fashion a sub-set of cases from which data will be collected from the pool of all those potentially relevant to the research being conducted. The sub-set selected is the sample; the pool is the target population. This terminology is used whatever the cases in question are - they will often be human individuals, but other possibilities in social science research include collectivities (households within a defined area, the stores within a retail chain), relationships (couples in the process of divorcing, doctors with patients who have a particular condition), events (inmate releases from prisons, patient 'episodes' in hospitals), or slices of space-time (urban intersections monitored over a period of time for possible accidents).

The need to make any type of selection always reflects researchers' limited resources. In an ideal world, data could be collected from every case in a target population (a situation sometimes referred to as complete enumeration). This is the objective, though never the achieved result, in some censuses conducted by nation states into the condition of their human populations. One hundred percent coverage of small target populations may be practical, but time constraints and finite budgets frequently render complete enumeration of large target populations out of the question. The researcher is then obliged to introduce some sort of selection of the cases to be included within the study.

There is a fundamental choice to be made between the two major types of selection procedure. If data from the selected cases are to be used as the basis for generalizations about an entire target population then probability (or random) methods of sampling should be employed using the principles set out in the section devoted to this, below (page 62).
If, on the other hand, data from the selected cases can stand in their own right and there is no requirement to generalize from them, the procedures set out in the section on non-probability sampling (page 79) will be adequate.

Whatever selection procedures are adopted, they need to be consistent with the overall project research design and should be developed in conjunction with the latter. Specifically, where the research design requires a comparison to be made between particular groups or time periods, then adequate quantities of cases with the appropriate attributes must be made available for the analysis stage by any selection procedures adopted.

It is wise not to lose sight of the fact that even the most sophisticated sample design represents no more than an attempt to reach a rational compromise between rigour on the one hand and an effective application of time and money on the other. A further point is that even when 'scientific' sampling procedures are used under optimum conditions, all resulting generalizations derived from sample data are inevitably subject to a degree of error. The key advantage that probability sampling possesses over the alternative selection procedures is that it allows the likely size of this error to be calculated.

Sampling in a systematic fashion rests on relatively straightforward principles whose application in simple surveys is usually unproblematic. However, where a research problem seems to necessitate the construction of a complex sample design, the implications for the data analysis stage need to be checked out in advance with a statistician familiar with social surveys.
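The claim that probability sampling lets us calculate the likely size of the error can be illustrated with a small simulation. Everything in the sketch is assumed for the purpose of illustration: the population of 10,000 weekly incomes is invented, the samples are simple random samples, and the standard error is estimated with the textbook formula s/√n, ignoring the finite population correction (which is negligible when the sample is 1 per cent of the population).

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so that the run is repeatable

# Invented target population: 10,000 weekly incomes.
population = [rng.gauss(300, 80) for _ in range(10_000)]
n = 100

# What a researcher actually has: one random sample.
one_sample = rng.sample(population, n)
estimated_se = statistics.stdev(one_sample) / n ** 0.5

# What a researcher never sees: the means of many possible samples.
means = [statistics.mean(rng.sample(population, n)) for _ in range(500)]
observed_spread = statistics.stdev(means)

print(f"standard error estimated from one sample: {estimated_se:.1f}")
print(f"spread of 500 sample means:               {observed_spread:.1f}")
```

The two figures should be of similar size. That is the practical pay-off: from a single probability sample we can gauge how much our estimate would have varied had the lottery turned out differently.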
Theoretical populations

A consideration which is sometimes glossed over in methods texts but which needs to be given early attention in any but the purely descriptive survey concerns the definition of the target population. If a research project sets out to test a theoretical hypothesis or even, more modestly, to apply and explore theoretical concepts, there is then a need to consider what kinds of target population are relevant to the particular hypothesis or concepts. To ensure the adequate exploration of a theory, the empirical target population selected by the researcher needs to be included within the theoretical population, the usually infinite domain of empirical populations which any general theory addresses. This is essentially a conceptual consideration which needs to be dealt with at the research design stage.
Box 4.1 Ensuring a 'theoretically relevant' target population

In Delinquent Boys: the Culture of the Gang (1955), Cohen argued that gang delinquency was a response to the problems encountered by (largely) working class adolescents adjusting to a system of status evaluation operating in American society through which it was impossible for them to earn self-respect. The delinquent subculture represents an alternative status system built on an inversion of key middle class values, particularly respect for legitimately acquired possessions. The 'cavalier misappropriation or destruction of property' (Cohen 1955: 134) that it is argued is characteristic of much juvenile delinquency is interpreted as a rejection of middle class acquisition through diligence, self-discipline and sobriety and a celebration of their opposites.

A piece of research setting out to explore Cohen's ideas would need to begin by identifying a target population that is theoretically relevant to Cohen's thesis. A researcher based in Britain might be tempted to employ a target population of juveniles convicted for the offence of criminal damage and then move on to analyse the perpetrators' membership of peer groups and their class and school backgrounds. However, consideration would have to be given to whether the legal definition of criminal damage in the UK is sufficiently close to the 'cavalier misappropriation or destruction of property' for this population to be appropriate. An alternative approach might be to use the records of schools from a Local Education Authority area or areas to identify a target population of pupils who had committed acts of so-called 'mindless vandalism' in connection with school premises and property.

In either case, an additional issue is whether Cohen's theory was formulated in intrinsically culture-specific terms so that any fair test would have to be on an American target population. In general, the cases making up a target population should demonstrably possess the characteristics appropriate to the theory under scrutiny.
The discussion in Box 4.1 raises the general issue of the extent to which it is possible to test a theoretical hypothesis or explore theoretical concepts in a survey (or other research) which is not specifically designed for the task. Sometimes a theoretical framework can give rise to broad predictions or have corollaries that are sufficiently general to be confirmed or disconfirmed by the findings from general purpose survey or similar research. It could be the case in the example from Box 4.1 that data on the distribution of juvenile property crime across classes derived from existing official crime statistics are found to be at odds with Cohen's propositions. However, it is also possible that the outcome of this form of empirical test will be uncertain and contested because of (say) debate over whether the definitions of 'class' and 'property crime' that underpin the data are congruent with those offered by the theory. Specifically designed research generally offers the best chance of a decisive and rigorous test of a theory.
Probability sampling strategies

The main task in the remainder of this chapter is to outline the variety of strategies available for selecting a sample from the target population. Probability strategies will be dealt with first, followed by non-probability strategies.

Probability or random sampling is an integral feature of modern 'scientific' survey research (indeed, scientific sampling was an alternative name for it, though its use has now waned). Probability sampling is designed mainly to assist the accurate estimation of the values of characteristics of populations (population parameters) based on data obtained from a sample. Examples of typical parameters in which social researchers might be interested in relation to particular populations are the proportion of households that own a motor vehicle, and the average gross weekly income of economically active individuals.

The term 'probability sampling' is a reference to the adoption of selection procedures which allow the use of inferences derived from the mathematical theory of probability. The link with this theory gives all probability selection strategies some unique features not enjoyed by any other ways of choosing the cases to supply data, though this is not to say it is universally appropriate or superior. Its special features are as follows:

• Probability selection cannot offer any cast-iron guarantee that a particular sample selected will be 'representative' of the mix of cases in the target population. Instead, it offers the researcher the possibility of calculating the level of likely sampling error associated with an estimate of a population value. Sampling error, which is discussed further below (page 76), can be thought of as the variability between every logically possible potential sample of a given size and type: the researcher (normally) selects just one.
• If the researcher can specify a desired level of accuracy for estimates of a key population value, the minimum necessary sample size to achieve it can be calculated. In other words, if the key population value the researcher wishes to know is (for example) the mean size of households within plus or minus 1 person, it is possible to work out how large the sample must be to deliver an estimate of this precision.
• It is a necessary condition for the use of a wide variety of statistical tests and measures at the data analysis stage.

Probability sampling requires the researcher to organize a lottery to determine which cases in the target population will make up the sample. There are, in fact, a few elaborations and qualifications on this but the essence of probability sampling is that it is designed to rule out the handpicking of individual cases. 'Lottery' may well conjure up in the reader's mind images of flashing lights and a mechanical apparatus with revolving drums containing numbered celluloid balls. However, its use in connection with surveys is partly metaphorical: it indicates that the logical selection conditions present in a fair lottery must be simulated. No equipment is required and only a very limited form of lottery actually takes place.

As it applies in probability sampling, the selection lottery has to meet two general requirements. The first, and the more binding, states that every case in the target population should have a calculable and finite chance of inclusion in the sample. This implies that no case in the target population should be completely excluded and no case can be guaranteed inclusion in the sample in advance of selection. 'Finite' means here more than zero but less than 1: any case excluded from the outset would have a zero probability of inclusion in the sample (because an event that cannot happen has a probability of zero), while a case with a pre-set place in the sample would have a probability of 1, which in probability theory is the value associated with inevitable events. Surprise is sometimes expressed that the first principle is not more demanding. It does not state, as intuition expects, that every case be given an equal chance in the lottery. Why this is unnecessary is covered below, in the section on stratified random sampling. (Sample designs in which every case in the population has an equal chance of inclusion in the sample are, in fact, a special case and are known as epsem samples, an abbreviation of the phrase 'equal probability selection method'.)

The second principle requires that the selection of any case or group of cases takes place independently of the selection of any other individual case or group of cases. Drawing case number 128 in the target population for inclusion in the sample should have no bearing on the chances of number 252 (or any other case) getting selected subsequently. This requirement is less stringent than the first, and probability theory as it applies to social surveys is sufficiently robust to allow it to be cautiously violated in tried and tested ways.
Two components are required to implement the selection lottery. The first is a sampling frame, which is a listing of all the cases in the relevant target population. Sometimes suitable listings are available ready-compiled; in other instances the compilation must be carried out by the researcher. One example of a pre-compiled sampling frame which has often been used to sample the 'general public' in the UK is the Electoral Register, which is a list of eligible voters listed by house number within each street and ward of a parliamentary constituency. It is prepared by local authority officials from returns submitted by householders and is held in each town hall. A second example is the PAF (Postcode Address File) which lists all the addresses in the United Kingdom by postcode and is produced by the Post Office. Since this is available in digital format, it has the advantage of being computer-searchable. Special variants of the PAF cover private households and institutions separately (although more detailed listings of business enterprises, specifying characteristics such as location, number of employees and sector of activity, can be purchased from business information brokers like Dun & Bradstreet Ltd). Other pre-compiled but more specialized sampling frames include the lists of registered members controlled by professional bodies, trade unions and enthusiast groups of various kinds.

The physical format of the sampling frame listing is incidental. In many cases it can be a 'virtual' list (thus the numbers 1 to 10 could stand for the ten administrative divisions of a state or region when listed in alphabetical order). The key requirements for any sampling frame are that it is comprehensive, accurate and up to date. However, despite the apparently straightforward character of a sampling frame, it needs to be emphasized that the task of creating one from scratch can be considerable for particular target populations: consider, for example, the problem of getting a national sample of discharged bankrupts or manufacturing organizations with exceptionally high degrees of workforce absenteeism. Sometimes the obstacles will be insuperable, in which case the research will have to be re-cast to employ a different target population, or it will have to be re-designed using a non-survey methodology.

The second requirement for implementing the lottery is a procedure for drawing the cases from the sampling frame. The procedure which corresponds most closely to probability theory is to use tables of random numbers. These are frequently included in the back of statistics texts and are simply collections of random digits where any digit is as likely to occur as any other and all combinations of digits are also equally likely. Computers can also generate random numbers for these purposes and most statistical packages include a convenient facility that allows you to specify a range within which you require the numbers to fall.
The cases in the listing need to be numbered consecutively from 1 up to the total in the target population. Box 4.2 explains how to use printed random number tables.

Box 4.2 Using random number tables

Tables of random numbers are often presented as blocks of digits with intervening gaps to assist identification, but there is no significance to the number of digits included in the rows and columns. If the numbered version of the sampling frame list contains (say) 550 cases, the task will be to select as many instances of three consecutive digits between 001 and 550 inclusive as the sample requires. (You might actually need to select some 'spare' cases to cover contingencies such as refusals, failure to contact, and errors in the sampling frame.) The consecutive digits do not all have to be within a block of numbers. Note that the numbers assigned to each case in the sampling frame should actually or notionally be padded with leading zeros because within the random number tables, case 6 will be represented by the combination 006. So, in the example above, a three digit combination 044 in the table would be within range and the case assigned this number would be selected for the sample, but the combination 611 would be ignored as out of range and you would move on to the next set of three digits. You should nominate a starting position randomly (that is, without knowing what the first combinations of digits are). Then you can work through the random 'numbers' backwards or forwards by page, up or down blocks, including adjacent numbers or leaving gaps in any way that takes your fancy (provided you proceed consistently and accept and reject mechanically).

Systematic selection is an alternative way to draw cases for the sample. If you require a sample of 500 cases from a population of 10,000, you would with this method simply select every twentieth case from the sampling frame. (The gap between selected cases is termed the sampling interval, which is the inverse of the sampling fraction: if the size of the population is represented by N and the size of the sample is represented by n, then the sampling fraction is n/N and the sampling interval k = N/n. In the example cited, k = 10000/500 = 20.)

There are some additional considerations that surround the use of systematic selection. Your initial case should be selected at random (by, for example, using the random number generator of a computer statistics package with the range set to the sampling interval). If your sample is a large one (say, over 1000 cases), you should occasionally stop and make a new start with a case randomly chosen in the same manner as the first case. The reason for this is that a sampling frame could embody a concealed period or cycle within the listing of cases which coincides with your sampling interval. Suppose, for example, you are conducting a survey of residents on a housing estate and are sampling addresses. Unknown to you, every tenth address is associated with a corner plot containing a much larger house than the others. A sampling interval of ten could consistently catch every corner
house and the resulting sample would significantly exaggerate the average income and overall affluence of households on the estate. A random starting point and periodic re-starts are designed to restrict the extent of any synchronization between the interval and the list. More generally, whenever a pre-compiled sampling frame is employed, it is important to know what principle has governed the ordering of cases on the list. Alphabetical order is usually 'neutral' for most research purposes (that is, unlikely to be connected with the key variables in a study), but the implications of chronological, geographical and 'exotic' ordering criteria need to be carefully examined.

If random re-starts are not used, systematic selection breaches the second (independence) principle of the selection lottery in that the choice of the first case effectively determines the identity of all the rest of the cases that make up a particular sample. However, it is often the most convenient procedure to implement and, provided that the sampling frame is ordered on a neutral principle and periodic re-starts are employed, selection of cases by sampling interval approximates reasonably closely to the rigour of full random selection.

In the sub-sections that follow, particular probability sample designs and refinements are examined in sufficient detail to assist the reader to recognize the appropriateness of each variant to different kinds of research situation. Details of more comprehensive treatments of sample design are given in the further reading section at the end of the chapter.
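The systematic selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name systematic_sample and the frame of 10,000 numbered cases are hypothetical, the interval is taken as the whole-number part of N/n, and the periodic random re-starts recommended for large samples are left out for brevity.

```python
import random

def systematic_sample(frame, n, rng=None):
    """Select n cases from an ordered sampling frame by sampling interval."""
    rng = rng or random.Random()
    k = len(frame) // n        # sampling interval k = N/n (whole-number part)
    start = rng.randrange(k)   # random starting position within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame: cases numbered 1 to 10000, as in the example in the text.
frame = list(range(1, 10001))
sample = systematic_sample(frame, 500, random.Random(42))
```

With N = 10,000 and n = 500 the interval is 20, so the selected case numbers are spaced exactly 20 apart from a randomly chosen first case between 1 and 20.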
Simple random sampling (SRS)

In this design, the cases that will make up the sample are chosen in a single process of selection from the sampling frame that covers the entire target population. If the cases are numbered from 1 to N in advance, selection can be based on random number tables. The question of how to determine the appropriate sample size is dealt with on page 77.
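Where a computer replaces the printed tables, the single process of selection might look like the following minimal sketch. The function name simple_random_sample and the figures (a frame of 1000 cases, a sample of 50) are assumptions made purely for illustration. Python's standard random.sample draws without replacement, so no case number can be selected twice.

```python
import random

def simple_random_sample(population_size, sample_size, rng=None):
    """Draw case numbers between 1 and N, each case at most once."""
    rng = rng or random.Random()
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical figures: a frame of 1000 numbered cases and a sample of 50.
case_numbers = simple_random_sample(1000, 50, random.Random(1))
```

The case numbers returned are then matched against the numbered sampling frame to identify the cases to be approached.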
Box 4.3 A simple random sample of customers

A utility company decides to survey its 16,400 private customers in one of its operating regions to establish the effectiveness of its customer relations. It is decided that an SRS sample of roughly 2000 will be adequate. The comprehensive list of customer names for the region is extracted from the computer database used for billing and the names are transferred to a spreadsheet and allocated consecutive numbers starting from 00001 and ending at 16400. A random sample of 2000 numbers falling in the range between 1 and 16400 inclusive is generated by the spreadsheet and the corresponding customers are selected.

This sample of customers will provide a systematic basis for making generalizations about all the customers in the region. However, there will inevitably be variation between the very large number of different samples of 2000 cases that could be generated by this selection procedure. As a result, generalizations based on the sample data will be subject to a probable degree of error that can be calculated.

Conducting an SRS of all the customers offers no guarantee, as was made clear on page 64, that the resulting sample will be representative of particular types that exist in the target population. It is unlikely to contain (for instance) exactly the same proportion of female customers, new customers, small volume users, or late payers as there are in the region. If it is important to ensure that the sample reflects the proportions of cases with key characteristics in the target population, or that it contains a minimum number of such cases, refinements to the sample design are needed.

An additional point with large target populations is that paper listings will be bulky and difficult to handle, so some form of 'virtual' computer listing will usually be preferable.

Stratified random sampling

Stratification is an important refinement of the selection process which, operating under suitable conditions, can improve upon the effectiveness of SRS. It requires the identification of a 'stratifying variable' (occasionally several variables) on the basis of whose values the target population can be divided into distinctive 'strata' or groupings. A separate random sample is then taken from each stratum. Normally, it has to be possible to determine the value of every case in the target population on the stratifying variable(s) before selecting the sample. Thus, stratification requires the researcher to acquire some information about the target population under study in advance of data collection (for instance, from previous research) and it exploits this additional information to build a sample that mirrors the composition of the target population on the chosen characteristic(s).

In choosing a stratifying variable, the essential requirement is that cases in each 'stratum' (that is, with the same value of the stratifying variable) should have similar relationships with the dependent variable(s) of interest in the study, but that as much variability as possible should exist between strata in respect of the dependent variable(s). The gains from introducing stratification increase to the extent that this requirement is met.

Consider an example. In an organizational study of job satisfaction (the dependent variable), job grade would be an appropriate stratifying variable provided that membership of different job grades significantly affected levels of job satisfaction. (There might be evidence on this in the literature.) If the values of job grade were 'Management', 'Technical and supervisory staff' and 'Shop floor staff', the target population would in this case be divided into
these three strata. Lists of the members in the groupings could be obtained from the Personnel Department and would constitute three sampling frames from each of which cases would be chosen in turn by a random procedure. Some variation between the levels of job satisfaction of individuals within the shop floor stratum and within the other strata would, of course, remain: if there was none left at all, there would be little for the research to reveal! The requirement for stratification to work beneficially is simply that there should be less within-stratum variation than between-stratum variation.

The next matter that needs to be settled is what sampling fraction should be used for each stratum. In the simplest case, where the design is referred to as proportionate stratified sampling (PSS), a uniform fraction is used for all the strata. This results in a sample which is a mirror of the target population with respect to the stratifying variable. Each grouping of the stratifying variable constitutes the same proportion of the sample as it does of the population.

The alternative, disproportionate stratified sampling (DSS), varies the sampling fraction for different strata. There are several reasons for doing this. One is simply to increase the representation in the final sample of small strata which otherwise might contribute only a handful of cases from which no sound inferences could be drawn. Another reason is to deliberately divert extra research resources to strata known in advance to have highly variable relationships with respect to the key dependent variable, at the expense of strata known in advance to have relatively homogeneous relationships with this variable. In the research on job satisfaction, for example, it might be known from a previous survey that technical and supervisory staff were, as a group, characterized by especially variable and fluctuating levels of job satisfaction in comparison to shop floor staff where the levels were more consistent between individuals and over time. Directing extra sample cases to the most variable stratum in a sample design from less variable strata can be an extremely effective way of reducing overall sampling error.

The DSS sample in column (4) of Table 4.1 now provides a better basis for making generalizations about Management and Technical and supervisory staff than the SRS design in column (2) or the PSS design in column (3) without using a larger sample. As it stands, however, the composition of the DSS sample in terms of strata might now seem to be in danger of giving a distorted picture of the target population as a whole. The remedy is to give each case in the sample a numerical weighting (based on the sampling fraction) which returns both over-sampled strata and under-sampled strata to their true proportions in the population: this weighting can be used throughout the data analysis stage. This capacity to re-weight data explains why the lottery principles do not need to insist on giving all the cases (and groups of cases) in the target population equal chances of inclusion in the sample. The choice between feeding weighted or unweighted data into analyses is an option provided in most computer survey packages.
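The re-weighting idea can be illustrated with a short Python sketch. The stratum sizes and DSS sample sizes are those given in Table 4.1; everything else is an assumption made for illustration, in particular the function names and the mean satisfaction scores, which are invented. Each sampled case is given a weight equal to the inverse of its stratum's sampling fraction (N/n for the stratum), which is one common way of constructing such a weighting.

```python
# Stratum sizes (N) and DSS sample sizes (n) from Table 4.1.
strata = {
    "Management": {"N": 250, "n": 30},
    "Technical": {"N": 500, "n": 40},
    "Shop floor": {"N": 4250, "n": 30},
}

def design_weights(strata):
    """Weight per sampled case: the inverse of the stratum sampling fraction."""
    return {name: s["N"] / s["n"] for name, s in strata.items()}

def weighted_mean(stratum_means, strata):
    """Combine stratum sample means into an estimate for the whole population."""
    weights = design_weights(strata)
    total = sum(weights[h] * strata[h]["n"] for h in strata)  # adds up to N
    return sum(weights[h] * strata[h]["n"] * stratum_means[h] for h in strata) / total

# Invented mean job satisfaction scores per stratum (illustrative only).
means = {"Management": 7.0, "Technical": 5.0, "Shop floor": 6.0}
estimate = weighted_mean(means, strata)
unweighted = sum(strata[h]["n"] * means[h] for h in strata) / 100
```

With these invented scores the weighted estimate is 5.95, against 5.9 for the unweighted sample mean: the weighting restores shop floor staff to their 85 per cent share of the population.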
Box 4.4 A comparison of SRS, PSS and DSS

This box explores sampling designs for the organizational study of job satisfaction introduced on page 69. A study of job satisfaction seeks to conduct interviews with a cross-section of all the personnel in an organization with the available resources keeping the limit to 100 interviews.

Column (1) in Table 4.1 below shows the size of each stratum and the target population.

Column (2) indicates the situation using a simple random sample (SRS) with a sampling fraction of 1 in 50. The question marks indicate that the strata play no role in the SRS selection process and indicate the uncertainty over exactly how many cases there will actually be in the sample from each category.

Column (3) shows the results of using a proportionate stratified sample (PSS) with the same 1 in 50 sampling fraction as column (2) but selected from each stratum separately. One obvious problem with adopting a uniform sampling fraction in this instance is that it would provide too few cases for study from the smaller strata.

Column (4) shows the results from a disproportionate stratified sample (DSS) that varies the sampling fraction for the different strata. The objective in this instance would be partly to increase the numbers in the final sample from both the management and technical staff strata, but also to deliberately over-sample technical staff where job satisfaction is likely to be highly variable. The shop floor staff are 'under-sampled' in comparison to their proportion in the population while the other two groups are 'over-sampled' at the expense of the former. Provided the assumptions about variations in job satisfaction are correct, the sampling error for (4) should be less than that for either (2) or (3) despite the sample size remaining constant.

Table 4.1 Comparison of SRS, PSS and DSS designs

Stratum       (1) Population   (2) SRS 1:50   (3) PSS 1:50   (4) DSS
              N                n              n              proportion    n
Management    250              ?              5              1:8.3         30
Technical     500              ?              10             1:12.5        40
Shop floor    4250             ?              85             1:141.6       30
Total         5000             100            100            1:50          100
Part of the importance of stratification in probability sampling is that it is the principal means by which the researcher can engineer what is popularly thought of as a 'representative' sample. Unless there is stratification by gender, household size, or average rent etc., then there is no control in probability sampling of the manner in which specific characteristics and attributes of cases appear in the sample. It follows that for practical reasons representativeness is something that can only be achieved for a small number of specific characteristics and not globally for the sample as a whole.
Box 4.5 Combining stratification and systematic selection

A straightforward way to implement stratification in conjunction with systematic selection is to sort the cases in the sampling frame into order by ascending or descending value of the stratifying variable (or variables). The table below shows an extract from a sampling frame adapted from the UK Post Office's Postcode Address File.

Sampling unit              % owner occupiers
Region 1
  Postal sector 1356       64
  Postal sector 1456       60
  Postal sector 1567       56
  etc. ...
Region 2
  Postal sector 2345       48
  Postal sector 2456       47
  Postal sector 2567       47
  etc. ...

Postal sectors are areas made up of adjacent postal codes. In order to achieve national coverage, this sample is stratifying by region. In order to achieve a range of socio-economic backgrounds, it is also stratifying by percentage of owner occupier households, a figure available for small areas from census data. Provided a suitable sampling interval is chosen, selection of sectors from each region and a range of socio-economic backgrounds can be guaranteed. This box can be read in conjunction with Box 4.6.
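The approach in Box 4.5 can be sketched as follows. This is an illustration only: the function name sorted_systematic_sample is hypothetical, and the frame of two regions with 50 postal sectors each, together with its ownership percentages, is invented rather than taken from the Postcode Address File.

```python
import random

def sorted_systematic_sample(frame, sort_key, n, rng=None):
    """Sort the frame by the stratifying variables, then take every k-th case."""
    rng = rng or random.Random()
    ordered = sorted(frame, key=sort_key)
    k = len(ordered) // n        # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return [ordered[start + i * k] for i in range(n)]

# Invented frame: 2 regions x 50 postal sectors with made-up ownership figures.
rng = random.Random(7)
frame = [
    {"region": region, "sector": f"{region}{i:03d}", "owner_pct": rng.randint(20, 90)}
    for region in (1, 2)
    for i in range(50)
]

# Order by region, then by descending percentage of owner occupiers.
sample = sorted_systematic_sample(
    frame, lambda s: (s["region"], -s["owner_pct"]), n=10, rng=rng
)
per_region = {r: sum(1 for s in sample if s["region"] == r) for r in (1, 2)}
```

Because the interval of 10 divides each region's 50 sectors evenly, every possible random start yields five sectors from each region, spread across the range of owner occupation within it.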
Multi-stage sampling

A second major refinement of SRS designs entails the possibility of conducting several stages of selection in sequence. An example will simplify the explanation. A research project may necessitate data collection from a very large, perhaps national, target population. It may well be the case that no national sampling frame exists (and even if one did, it would be extremely time-consuming to set up an SRS design in conjunction with it). It is also possible, however, that adequate sampling frames may be available for localities.

Large scale populations are invariably organized into a variety of hierarchical units. In the case of a nation state like the United Kingdom, one set of hierarchical units could be administrative region, parliamentary constituency, borough, ward, street address, and household. The units are hierarchical in as much as lower units are 'nested' within higher ones: every street address 'belongs to' a ward, and every ward fits into a borough, and so on. Sampling procedures are able to make use of this arrangement by moving down the hierarchy making selections from each unit in turn until they are able to exploit existing sampling frames or it is feasible to create them.

In the example given, starting at the 'top' of the system, the primary sampling unit or PSU would be the administrative region. A sub-set of regions would be chosen using random procedures and the parliamentary constituencies they contained would be listed to make up the sampling frame for the second stage (so that the secondary sampling unit or SSU would be the constituency). Random selection would again take place and the boroughs in the chosen constituencies would be listed to form the sampling frame for the third stage, and so on. An important 'economy' in the procedure is that only selected units are passed on from each stage, reducing the size of the next sampling frame that has to be constructed. There is no theoretical limit to the number of stages in a design although each process of selection adds cumulatively to the overall sampling error. It is possible to combine multi-stage sampling with stratification and other refinements to produce sophisticated designs.
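A two-stage version of this procedure might be sketched as below. The hierarchy of six regions, each containing eight constituencies, is entirely hypothetical, as is the function name two_stage_sample; further stages would repeat the same pattern one level down.

```python
import random

def two_stage_sample(hierarchy, n_psu, n_per_psu, rng=None):
    """hierarchy maps each PSU to the list of units nested within it."""
    rng = rng or random.Random()
    # Stage 1: choose a sub-set of primary sampling units at random.
    chosen_psus = rng.sample(sorted(hierarchy), n_psu)
    selected = []
    for psu in chosen_psus:
        # Stage 2: a frame is only consulted for the PSUs that were selected.
        frame = hierarchy[psu]
        take = min(n_per_psu, len(frame))
        selected.extend((psu, unit) for unit in rng.sample(frame, take))
    return selected

# Hypothetical hierarchy: 6 regions, each containing 8 constituencies.
hierarchy = {
    f"Region {r}": [f"Constituency {r}.{c}" for c in range(1, 9)]
    for r in range(1, 7)
}
selected = two_stage_sample(hierarchy, n_psu=3, n_per_psu=2, rng=random.Random(3))
```

The 'economy' noted in the text shows up in the loop: the second-stage listing is touched only for the three regions chosen at the first stage.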
Box 4.6 The Family Expenditure Survey (FES): a complex national sampling design

The FES is an annual sample survey of private households' spending and saving that has been conducted in the UK by the Office of Population Censuses and Surveys (OPCS) and the Office for National Statistics (ONS) since 1957. The achieved sample size is about 7000 and respondents complete expenditure diaries as well as taking part in interviews. The key features from a sampling viewpoint are as follows:

• the design is a two-stage, stratified and clustered, random sample;
• the sampling frame is the Post Office's small users' Postcode Address File;
• there are various exclusions including offshore islands (owing to the expense of collection), members of the US armed forces, Roman Catholic priests living in parish accommodation, and households containing members of the diplomatic service of other countries, though non-British households are not generally excluded: Northern Ireland is covered but the sampling arrangements differ from those described here;
• the PSU is a postal sector, a ward-sized area which provides the clustering element in the design: 672 sectors are selected at stage 1, 10,000 addresses are selected at stage two;
• sectors are stratified by (1) Government Office region to give a geographical spread, (2) whether an area is officially classified as urban or not to cover urban-rural differences, (3) the proportion of owner occupiers and the proportion of renters according to the last census to ensure a socio-economic spread.

For further details, see ONS (annually) Family Spending. The URL is: http://www.statistics.gov.uk/products/
Cluster samples are a specialized a d a p t a t i o n o f multi-stage designs. A m a j o r c o m p o n e n t o f the expense o f surveys based o n h o u s e h o l d or w o r k based interviews is the t r a v e l costs i n c u r r e d c o n v e y i n g interviewers t o respondents, especially where the target p o p u l a t i o n is w i d e l y d i s t r i b u t e d geographically. A procedure f o r r e d u c i n g costs by c o n c e n t r a t i n g the data collection o p e r a t i o n is t o divide the area covered b y the target p o p u l a t i o n i n t o a n u m b e r o f clusters o f adjacent cases, perhaps circumscribed by d i s t r i c t or other geographical boundaries f o r w h i c h s a m p l i n g frames exist. The list o f clusters is t h e n sampled r a n d o m l y a n d the chosen clusters are translated i n t o the final s a m p l i n g u n i t , usually addresses. A l l the cases i n chosen clusters (or, at a m i n i m u m , a substantial p r o p o r t i o n o f t h e m ) are i n c l u d e d i n the sample. W i t h the addresses o f selected i n d i v i d u a l s or households b u n c h e d together, field w o r k e r s need t o t r a v e l t o fewer disparate locations a n d can c o n d u c t m o r e interviews per t r i p . T h e logic o f cluster s a m p l i n g means t h a t it w o r k s best w h e r e each cluster is as heterogeneous as possible. Ideally, each cluster w o u l d a p p r o a c h the diversity o f the target p o p u l a t i o n as a w h o l e a n d i n this respect desirable characteristics f o r a cluster are the opposite o f those i n s t r a t i f i c a t i o n w h e r e there is a p r e m i u m o n h o m o g e n e i t y w i t h i n a s t r a t u m . 
Clustering is an example of a refinement that is designed primarily to reduce costs in a way that minimizes the impact on sampling errors, but it will still normally be the case that sampling error is higher in a cluster sample than in an SRS of the same size.

A potential problem with multi-stage sampling is that it can lead to difficulties producing a sample of the desired size owing to potential differences in the size of the PSUs. A useful device that is commonly employed to deal with this is to arrange for the selection of PSUs with probability proportional to size (PPS). The chance of selection for each PSU is adjusted so that it is proportional to the number of cases in the target population that each PSU contains. Further details are given in Box 4.7.
Box 4.7 Implementing probability proportional to size for a multi-stage design
A convenient way to apply PPS is to construct a table for all the PSUs in the target population with the cumulative number of cases they represent. In this example, the PSUs are nine regions with the following populations:
PSU              Size    Cumulative size
Region 1    1 100 000          1 100 000
Region 2      250 000          1 350 000
Region 3      980 000          2 330 000
Region 4      190 000          2 520 000
Region 5      490 000          3 010 000
Region 6    3 600 000          6 610 000
Region 7    1 300 000          7 910 000
Region 8    1 000 000          8 910 000
Region 9    1 090 000         10 000 000
Assuming random number tables are to be used and four regions need to be selected for stage 2, the task becomes one of drawing four lots of four digits ranged between 0001 and 1000 (the final four zeros in the cumulative size column can be ignored). Any set of four digits drawn from the tables between 0001 and 0110 inclusive will select Region 1, between 0111 and 0135 inclusive will select Region 2, between 0136 and 0233 inclusive will select Region 3, etc. Region 3 is represented by 98 sets of digits, as against the 49 sets available for Region 5 which has half its population. Thus the number of sets of digits that correspond to a region (and therefore the chances of any region being selected) is proportional to the size of its population. Note that care needs to be taken to ensure that the selection units and procedures employed in subsequent stages do not 'undo' the proportionality created in stage 1.
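The cumulative-total procedure in Box 4.7 can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the function names `region_for` and `draw_regions` are hypothetical, a pseudo-random generator stands in for the random number tables, and the box does not say what to do if the same region is drawn twice, so repeated regions are simply reported as drawn.

```python
import bisect
import random
from itertools import accumulate

# Region sizes from Box 4.7.
sizes = {
    "Region 1": 1_100_000, "Region 2": 250_000, "Region 3": 980_000,
    "Region 4": 190_000, "Region 5": 490_000, "Region 6": 3_600_000,
    "Region 7": 1_300_000, "Region 8": 1_000_000, "Region 9": 1_090_000,
}
labels = list(sizes)
cumulative = list(accumulate(sizes.values()))      # last entry is 10 000 000
# Drop the final four zeros, as the box suggests: 110, 135, 233, ..., 1000.
upper_bounds = [total // 10_000 for total in cumulative]

def region_for(draw):
    """Map a four-digit draw (0001 to 1000) to the region whose band contains it."""
    return labels[bisect.bisect_left(upper_bounds, draw)]

def draw_regions(k, seed=None):
    """Make k independent draws; a region may appear more than once."""
    rng = random.Random(seed)
    return [region_for(rng.randint(1, 1000)) for _ in range(k)]

print(region_for(110), region_for(111), region_for(136))
print(draw_regions(4, seed=2001))
```

Counting the draws that map to each region reproduces the figures in the box: 98 values select Region 3 and 49 select Region 5.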
Accuracy, precision and confidence intervals
Little mention has so far been made of sample size. The question of how large a particular sample needs to be cannot always yield a simple direct answer. To be clear about the issues surrounding size, it is necessary to return again to the probability basis of sampling and, as a preliminary, to introduce
the distinction between accuracy and precision. As we have seen, random selection procedures generate samples the data from which can be used to estimate the value of selected population characteristics. It is not possible, however, to establish exactly the accuracy of an estimate, that is, how closely a specific estimate based on an executed sample coincides with the true population value. What the statistics of probability sampling do instead is to make it possible to calculate the precision of an estimate. Precision indicates how closely the estimates derived from all the samples of a given size and design that could possibly be selected from the target population cluster around the population value being predicted. Precision is measured by the family of standard error statistics, one of which exists for every individual estimator (thus, there is a standard error of the mean, a standard error of a proportion, etc.).

Calculations of precision take the form of confidence intervals. The interval element is a range of values centred on the sample estimate within which the population value is predicted to fall: the confidence element refers to a level of certainty attached to the prediction, conventionally 95 per cent or 99 per cent. If the researcher wishes to be 99 per cent confident in the prediction, the range of values the interval covers will be larger (and therefore the estimate will be less precise) than if he or she settles for the 95 per cent level. Several of the textbooks cited at the end of the chapter set out the statistics for calculating confidence intervals in detail.
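For orientation, the standard textbook forms of the two standard errors mentioned above, and of the interval built from them, can be written out as below. They are given here as a sketch under assumptions not stated in the chapter: a simple random sample, a sample large enough for the normal approximation, and no finite population correction. The symbols (n for sample size, s for the sample standard deviation, p for the sample proportion, z for the normal multiplier) are introduced only for this illustration.

```latex
\[
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}},
\qquad
\mathrm{SE}(p) = \sqrt{\frac{p(1-p)}{n}}
\]
\[
\text{confidence interval} = \text{estimate} \pm z \times \mathrm{SE},
\qquad
z \approx 1.96 \ (95\ \text{per cent}),
\quad
z \approx 2.58 \ (99\ \text{per cent})
\]
```

The larger multiplier at the 99 per cent level is what widens the interval and makes the estimate less precise.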
Sample size and sampling error
It has been noted (page 64) that sampling error was a measure of the overall variability between every possible sample of a particular size and design that could be selected from a target population. The greater the sampling error associated with a sample, the lower the precision of the estimates produced from it. Increasing the sample size represents one way of reducing sampling error and improving precision, but it is rather inefficient because sampling error varies inversely with the square root of sample size (in SRS samples). In other words, in order to halve the level of sampling error, the sample size must be increased four times. Modification of the overall research design and/or refinement of the sample design (through, for example, stratification) may be preferable to conducting a large and potentially expensive sampling and data gathering operation.

It is possible to obtain a concrete indication of the scale of sampling error for a particular type of sample design. Consider a dichotomous (yes/no) variable such as 'households who have spent a holiday abroad within the last five years' which available evidence might suggest is split about 50-50 in the population, the worst case from a sampling viewpoint. The sampling error for this attribute would be just over ±3 per cent in SRS samples of 1000 at the 95 per cent level of confidence. This means that if there were 52 per cent
in the sample who had taken foreign holidays, we could be 95 per cent sure that there were between 49 per cent and 55 per cent in the population. If the sample size could be increased to 2500, the sampling error would fall to 2 per cent and the confidence interval would shrink to between 50 per cent and 54 per cent.

How can the appropriate size for a sample be determined? If a project has as a key objective the estimation of the value of a particular population parameter with a particular level of precision and confidence (say, for example, the average household income in a particular target population plus or minus £10 at 95 per cent confidence), then it is relatively straightforward in SRS samples to work out exactly how large the sample needs to be to produce this (see, for example, the calculations in Kalton 1966, pp. 24-5, or Moser and Kalton 1971, section 7.1). In many cases (as above with foreign holidays), such calculations require a preliminary estimate of the variability of the key parameter within the population. However, because most projects have diffuse objectives, the appropriate level of precision to set and the variables to prioritize are not always self-evident. Some analysis will invariably be based on selected sub-groups where the numbers will be smaller and the sampling errors will be higher than when the sample as a whole is under consideration.
The conventional strategy is to err on the side of caution and base sampling error considerations on the least favourable variables, those likely to have the highest variability in the target population. A rule of thumb also sometimes offered is not to permit the size of any subgroup which will be the basis of analysis to fall below 50.
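The figures quoted in this section for the 50-50 holiday example can be checked with a short calculation. The sketch below assumes a simple random sample, the normal approximation and a 95 per cent multiplier of 1.96; the function names `margin_of_error` and `required_sample_size` are hypothetical and are not taken from the texts cited.

```python
import math

Z_95 = 1.96  # normal multiplier for the 95 per cent level (assumed)

def margin_of_error(p, n, z=Z_95):
    """Half-width of the confidence interval for a proportion p in an SRS of n."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(p, margin, z=Z_95):
    """Smallest SRS size giving at most the stated margin for a proportion p."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Worst case p = 0.5: about 3.1 points at n = 1000 and about 2 points at n = 2500.
for n in (1000, 2500, 4000):
    print(n, round(100 * margin_of_error(0.5, n), 2))

# Sample size needed for a margin of 3 percentage points in the worst case.
print(required_sample_size(0.5, 0.03))
```

Quadrupling the sample from 1000 to 4000 halves the margin, which is the inefficiency described above.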
Other types of error that affect surveys
A distinction is often drawn between the different types of inaccuracy or error that can affect surveys depending on their source because what needs to be done about them varies. As their name implies, selection errors originate in the selection process itself while non-selection errors are the residual category which come from anywhere else in survey methodology (and which will not be discussed further here). The most serious types of selection and non-selection errors are listed in Table 4.2 together with likely responses to them (though the optimum response depends on the point at which a problem comes to light).

Sampling error was discussed in the last section. Considering the other types of selection error in Table 4.2, a degree of sampling frame inaccuracy may be inevitable and where a list is compiled by an external agency, a researcher may have no option other than to accept its limitations and take them into account in the research design. The incorrect inclusion of cases which are not properly a genuine part of the target population may subsequently come to light and be self-correcting (for instance, on contact with
Table 4.2 Types of error in surveys

Selection problems                      Possible responses
Sampling error too large                Increase sample size, refine sample design
Sampling frame flaws                    Checks to establish extent of problems
Non-response                            Reminders (postal surveys) or recalls (telephone and household interviews)

Non-selection problems                  Possible responses
Use of incorrect or biased estimator    Consult statistician
Interviewer mistakes                    Simplify interview schedule, re-instruct interviewers
Coding errors                           Use computer-verified data entry; revise coding schemes
respondents), whereas the omission from the frame of cases which should have been included is more serious because it is less likely to be discovered automatically.

Non-response is a fundamental problem that affects most surveys to a greater or lesser extent. It refers to the failure of research efforts to gather data from all the cases that genuinely belong in the sample. Reasons for non-response include the refusal or inability of respondents to participate and cases which turn out to be uncontactable (on account of the death or relocation of individuals, their change of status, or the closure of businesses, etc.). If non-response reaches high levels, it can threaten the statistical validity of survey findings. It is quite distinct from sampling error: by definition, sampling error is a random variation between possible samples, but non-response is highly unlikely to be random. Respondent refusals to participate in surveys, for example, are likely to come disproportionately from certain social groups. These include those who have unorthodox views on the topics of the research (or who simply believe their views to be unorthodox) and who can be especially reluctant to reveal them. Individuals who are socially excluded or have conflictual relationships with agencies of social control are especially likely to refuse irrespective of the nature of the research in question. This element of self-selection means that the responders/achieved cases cannot be taken as representative of the non-responders/non-achieved cases.
In much the same way, individuals who prove difficult to contact will possibly have occupations and life-styles substantially different to those of respondents.

The classic remedy for dealing with the non-contact element of non-response in postal questionnaires is postal (or telephone) reminders. In household interviewing, the procedures can require a fixed number of callbacks to an address at different times to the original visit. Refusals can be dealt with by a variety of methods including incentive payments or other
rewards and careful prior attention to the design of covering letters and preparatory information. Another tack that can be adopted when persuasion to participate has failed is to try to get at least one piece of non-contentious information from refusers (such as age) so that it is possible to compare their profile with that of respondents on a variable common to both.

Non-selection errors are included in Table 4.2 for completeness and are discussed in the sections on data collection and coding.

The variety of sources of error discussed above underlines the reluctance of experienced survey researchers to rely wholly on large samples to deliver high precision estimates since increasing sample size reduces only sampling error but does not deal with the other sources.
Box 4.8 The Travel Survey: sample design
The Travel Survey addressed two different target populations of commuters, students and staff. The student target population was restricted to those in the second or subsequent year of their courses since a very high proportion of first years lived in halls of residence on the campus itself. Only staff working on the main campus were included. It was decided to use a DSS approach: the sampling fraction for most of the staff was set to 1:4 as against 1:5 for students because staff commuting was a more critical problem. Staff from three departments scheduled to move to a new campus were oversampled with a 1:3 fraction. The designated target populations were 4763 staff and 7995 students. The designated sample size was 1220 staff and 1998 students. The achieved sample sizes were 590 staff (48%) and 282 students (14%).
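The percentages at the end of Box 4.8 are response rates, that is, achieved cases as a share of the designated sample. A small sketch of the arithmetic follows; the dictionary and function names are illustrative only, and the figures are those given in the box.

```python
# Designated and achieved sample sizes from Box 4.8.
designated = {"staff": 1220, "students": 1998}
achieved = {"staff": 590, "students": 282}

def response_rate(achieved_n, designated_n):
    """Achieved cases as a proportion of the designated sample."""
    return achieved_n / designated_n

for group in designated:
    rate = response_rate(achieved[group], designated[group])
    print(f"{group}: {rate:.0%}")
```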
Sampling strategies: non-probability sampling
Non-probability selection methods do not implement a random selection lottery. They cannot therefore make use of inferences from probability theory and, in consequence, they do not provide equivalent guarantees of precision to the procedures discussed in the previous sections. They nevertheless may have a specialist role to play at particular stages of the survey process.

Convenience sampling
A convenience sample, as the name implies, is based on a selection of cases which are easily accessible to the researcher for the expenditure of relatively
little effort. Examples of high accessibility are households located in neighbourhoods close to the researcher's residence, or students in the researcher's own classes. The element of deliberate selection by the researcher and the fact of his or her association with the chosen cases seriously compromises these types of selection. Even where the cases are neither local nor personally known to the researcher, the 'convenience' of selecting them may be connected to the fact that they are celebrated or long-established instances of their class and, in these respects, atypical of the target population as a whole.

The use of convenience samples should properly be restricted to feasibility studies and pilot research. Even here, their utility is problematic unless they are made up of cases with similar attributes to the target population. If not, the information they produce will be of little use even to test out the suitability of survey arrangements or instrumentation.
Snowball sampling
In this variant, the researcher relies on each case to supply details of the location of further cases, so that the sample grows steadily in extent (metaphorically, like a snowball rolled along the snowy ground). It is appropriate in somewhat specialized circumstances which may be summarized as follows:
• no sampling frame exists;
• cases are rare and are geographically widely distributed;
• cases are likely to know of each other;
• cases are willing to supply information about each other.

Circumstances in which snowball sampling might prove useful are where there is a need to gather together a collection of organizations offering a very new service or product (and which are likely to be aware of the competitors), or patients (or possibly relatives of patients) suffering from rare medical conditions who may be in contact with fellow sufferers. In situations where the condition or characteristic is socially undesirable, however, referrals may not be forthcoming. Snowball samples suffer from the same main limitations as convenience samples and their use is generally limited to exploratory studies.
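The way a snowball sample grows from referrals can be illustrated with a short sketch. Everything in it is hypothetical: the cases labelled A to E, the referral lists and the function name `snowball`. It simply follows referrals in the order they are received until no new cases are named or a ceiling is reached.

```python
from collections import deque

def snowball(seeds, referrals, max_cases):
    """Grow a sample by following the referrals supplied by each recruited case."""
    sample = []
    queue = deque(seeds)       # cases waiting to be approached
    seen = set(seeds)          # cases already named by someone
    while queue and len(sample) < max_cases:
        case = queue.popleft()
        sample.append(case)
        # Each recruited case may name further cases it knows of.
        for contact in referrals.get(case, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral lists: who names whom.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["A"]}
print(snowball(["A"], referrals, max_cases=10))
```

The sample can only ever reach cases connected to the starting point, which is one way of seeing why such samples cannot be treated as representative.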
Purposive sampling
Purposive sampling is employed mainly in exploratory and in qualitative research. The logic of this kind of selection is not based on typicality but on locating cases with attributes of particular interest to the researcher. An implementation of purposive sampling is contained within the 'grounded theory' approach discussed in Glaser and Strauss (1967), Strauss (1987) and Strauss and Corbin (1993). Together with some of the other purposive
selection procedures, grounded theory is preoccupied with the creation of explanatory categories and, through them, with building theoretical systems, rather than with demonstrating that cases are representative of their empirical populations. In order to construct such categories, the researcher seeks a collection of paradigmatic or 'ideal' instances, extreme examples, recent or old instances, instances where x occurs with y or in the absence of z, etc. From the viewpoint of selection, the two key elements in grounded theory are (i) theoretical sampling, '... whereby the analyst decides on analytic grounds what data to collect next and where to find them' (Strauss 1987: 38); and (ii) sampling to saturation, where data from cases with the desired attributes are collected at a research site up to the point at which no new insights or further information is uncovered. Such an approach allows data gathered early to be analysed in time to influence subsequent data selection and gathering strategies.

The objectives of purposive sampling are radically different from those of probability sampling and it cannot be judged by the same criteria. Clearly, however, like convenience and snowball techniques, it does not allow the calculation of levels of precision.
Quota sampling
Quota sampling is widely used in market research and opinion polling in circumstances where probability sampling would also be appropriate. It requires researchers to be able to estimate in advance how key variables (usually demographic attributes like age and sex) are distributed in the target population. From this information, quotas of interlocking attributes that respondents must satisfy are devised and given to interviewers. The accumulated totals of all the quotas reflect the proportions of the characteristics in the population. The interviewers then have some discretion about finding suitable respondents to fulfil the quota within their allocated neighbourhoods. Some assignments allow for street interviewing, others for household interviewing only.

An example quota that could be assigned to an interviewer is given in Table 4.3. In this example, the interviewer needs to find (among others) three women who are in the 45-64 age bracket and are all in the lower occupational class. If household interviews were being conducted, some restrictions like one person per household, no adjacent addresses, might be applied.

A quota sample can be regarded as an attempt to combine the advantages of stratification (introduced via the interlocking characteristics) with a degree of clustering (the result of each interviewer operating within a particular location or neighbourhood).
Their attraction to commercial organizations is that they are relatively easy to set up quickly with the clustering offering savings on overheads like travel and subsistence. Comparisons of probability and quota samples suggest that the latter can, in knowledgeable
Table 4.3 Interlocking sex, age and occupational class characteristics of respondents
Class
Age
Totals
20-29 30-34 45-64 65+
Lower
Totals
Prof/Managerial
Intermediate
Male
Female
Male
Female
Male
Female
1
_
—
-
1
-1
1 1 1
1 3 2 1 7
1 1 3 2 7
--
-
1
1
1
-3
4 6 7 3 20
hands, offer equivalent accuracy despite the fact that the interviewer's greater discretion in quota sampling is an additional potential source of errors. However, quota samples do not permit sampling errors to be calculated in the same way they are for probability samples and, overall, the technique is better suited to team research conducted by experienced practitioners than it is to solo or novice surveyors.
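An interviewer's quota sheet of interlocking characteristics can be thought of as a set of cells, each with a target that must not be exceeded. The sketch below is a hypothetical illustration of that bookkeeping: the class name `QuotaSheet` is invented, only the cell for three women aged 45-64 in the lower occupational class comes from the example in the text, and the second cell is made up to show how a sheet holds several cells.

```python
from collections import Counter

class QuotaSheet:
    """Track how many respondents have been found for each interlocking cell."""

    def __init__(self, targets):
        self.targets = dict(targets)   # (sex, age band, class) -> number required
        self.filled = Counter()

    def accept(self, sex, age_band, occ_class):
        """Record a respondent if their cell still has room; otherwise refuse."""
        cell = (sex, age_band, occ_class)
        if self.filled[cell] >= self.targets.get(cell, 0):
            return False
        self.filled[cell] += 1
        return True

    def complete(self):
        """True once every cell has reached its target."""
        return all(self.filled[cell] >= n for cell, n in self.targets.items())

sheet = QuotaSheet({
    ("Female", "45-64", "Lower"): 3,   # the cell mentioned in the text
    ("Male", "65+", "Lower"): 1,       # illustrative cell, assumed for this sketch
})
results = [sheet.accept("Female", "45-64", "Lower") for _ in range(4)]
print(results, sheet.complete())
```

The fourth woman in the same cell is turned away, which is where the interviewer's discretion ends and the quota takes over.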
Further reading Chapters 5 and 6 of Moser and Kalton (1971) Survey Methods in Social Investigation (2nd edn) are written at an introductory level and include only the minimum of statistical theory. Kalton's (1983) Introduction to Survey Sampling (out of print but still available in academic libraries) offers a compact, intermediate level, treatment. A more recent alternative to Kalton is Barnett (1991). Kish (1965) offers an advanced theoretical handbook on sampling principles.
Collecting your data
Doing it yourself
In large-scale surveys, data collection is typically contracted out to an agency which employs hired hands. They conduct the interviews, if there are any. They code the responses and enter them into the computer. In contrast to this supposed drudgery, the creative work of design and analysis is done by the researchers.

This book is addressed to people who are collecting the data themselves, either individually or as a member of a small research team. Doing it yourself has a number of advantages, particularly for interviewing, as Saunders (1990: 383) argues in his survey of home owners.
First, doing it yourself gives you a far better 'feel' for the data than if a hired interviewer had simply delivered the findings to you. You will know not only what was said but how it was said. You will have an insight into what areas respondents found sensitive, and why this was so. You will also be better able to judge which items were the most salient to respondents.

Second, hired interviewers are not necessarily interested in the research. Why should they be, especially if we have defined them as the muscle and ourselves as the brains? The pay is poor, and it is piecework - so the quicker they can get through an interview the better it will be for them. To make the interview go smoothly, they may say things which the researchers certainly would not have sanctioned. Aldridge recently had the experience of being interviewed by someone who complimented him on his taste in classical music, which she deduced from the CDs on display in his sitting room. The interviewer proceeded to agree with some of his answers! Flattering, perhaps, but very damaging to validity. Whether or not hired interviewers deviate from our script, their situation encourages an instrumental and calculative approach to the interviews. Because of this, researchers on large-scale surveys have to spend days on the recruitment and training of interviewers, and on making the interview schedule's instructions watertight.

Third, using hired interviewers is feasible only where interviews are highly structured.
If we want to ask searching open-ended questions, it is better to do so ourselves.

We would add two more points, implicit in what Saunders says. Doing it yourself is deeply satisfying. It is also more conducive to the exercise of the sociological imagination.
Commissioned research
Many readers will be carrying out research for someone else: an employer, a voluntary association, a church or charity. Even though you are doing it yourself, and whether or not you are being paid, you do not have a free hand. It may be the sponsor's view that they have the aim and the vision, while you have the technical know-how. Paradoxically, however, sponsors are usually surprisingly vague about what they want to know. Nor do they necessarily have a clear strategy for the research - they simply want to commission 'a survey'.

This means that you will be involved in discussion with your sponsors to establish not just the practical details of the survey but its objectives. Your role is to help the sponsors clarify what it is they want to know. Sponsors usually recognize this soon after negotiations begin. At this stage, they are open to your proposals about how to define and achieve the objectives of the research.
Difficulties with sponsors tend to arise later. The survey method can be a victim of its own virtue, its openness to public scrutiny. Sponsors will ask to see your draft questionnaire or interview schedule. Two things tend to happen. They will ask you to modify or omit some questions as too sensitive, and they will present you with questions they want you to include. These requests may come very late, and just at the time when you are ready to launch the survey having completed your piloting.

If you are asked to modify or omit questions, it shows that the sponsor is probably afraid of the answers. This is a sign that we should be asking exactly those questions. Even though sponsors say their objective is to improve their service to their clients, they may be frightened by the prospect of a barrage of criticism. In a large organization, one section - the catering department, say - may feel it is being unduly exposed to criticism. They may well say that they have done their own 'survey' already, and know all they need to know.

When you are asked to include questions, the problem is that the sponsor usually expects you to include them word for word. This is what they want to know, and this is how they want you to ask it. If the questions are well designed there will be no problem; but what chance is there of that? One or two badly worded questions can seriously damage the overall quality of the responses and the response rate itself.

How to handle this? Clearly, the answer depends on the precise situation. We suggest one principle: emphasize the technicalities.
You have been asked to carry out a survey because you have expertise. Even if you are a beginner, reading this book will give you far more knowledge about surveys than your sponsors have. They have asked for your advice, and you should not be apologetic about it. If you helped them in the early stages to clarify their objectives, they are likely to follow your advice now.

Questionnaires and structured interview schedules are documents that sponsors can ask to vet. Such scrutiny of the details is not so easy in the case of unstructured interviews and focus groups. This is one reason for including them in our research strategy: they are less vulnerable to the sponsor's attentions.
Covering letters for postal questionnaires
A postal questionnaire must be accompanied by a covering letter. There is no formula for such letters - as ever, the sociological imagination comes into play. How the letter is worded will depend on the topic of the research, the respondents, your relationship with the respondents, and what you are able and willing to promise as regards feedback. Box 5.1 offers some guidelines. If the resources are available, it is worth sending a reminder letter to those who have not responded. Doing so invariably generates a significant number
Collecting your data
Box 5.1
Guidelines for a covering letter
Style: The letter should be clear, straightforward, businesslike and fairly formal, but not pompous. Headed writing paper is helpful. An informal chatty style will be off-putting to some respondents, who will read it as frivolous. On the other hand, the days are happily gone when we could address respondents self-importantly, as though they were obliged to take part.
Spelling and grammar: Like it or not, many people interpret mistakes in spelling and grammar as signs that the writer is careless, ill-educated or unintelligent. However unfair, these judgements will be made, to the detriment of the response rate and the quality of responses. Do proof read carefully, and ask for advice if you need to.
Purposes of the research: We need to say as much as we can about this - but as briefly as possible - in order to persuade respondents that participation is worthwhile. We may need to mention sponsorship or funding, and should also give a concise statement about our position as researchers.
How the respondent was selected: Unless it is obvious from the context of the research, we should explain briefly and non-technically how the respondent was selected for inclusion in the study.
Why the respondent can help: Sometimes respondents fear they cannot help us because they are not experts and do not know enough about our research topic. We may need to reassure them about this, for example by saying that we are interested in their opinions and experience, and that we wish to have a broad coverage of all shades of opinion.
Confidentiality and anonymity: We need to be clear about what guarantees we are giving, and to be alert to the problem that some respondents may take confidentiality to mean anonymity. If the questionnaire has a serial number, we should explain its significance.
Feedback: We may decide to offer feedback individually to respondents, though this can be costly. Alternatively, we may indicate to them where our findings will be published. One possibility is to use the world wide web.
Answering queries: It may be desirable to give a telephone number or email address which respondents can use if they have any queries.
Thanks: We should thank the respondent in anticipation of their taking part.
of extra responses. It works more easily when the questionnaires are not anonymous, since we can target the reminders to non-respondents. As with a covering letter, there is no formula for a reminder. Clearly, people are entitled to refuse, so we cannot be accusatory in tone. What we should do is refer to the value of the research and of the respondent's participation.
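The targeting of reminders can be sketched in a few lines of Python. This is only a minimal illustration, assuming that each questionnaire carries a serial number linked to the mailing list; the names `mailing_list`, `returned_serials` and `reminder_targets`, and the data, are hypothetical and not taken from any survey described here.

```python
def reminder_targets(mailing_list, returned_serials):
    """Return the people whose questionnaires have not yet come back.

    mailing_list: dict mapping serial number -> recipient (hypothetical data).
    returned_serials: serial numbers found on returned questionnaires.
    """
    returned = set(returned_serials)
    # Keep only those recipients whose serial number has not been logged.
    return {serial: person
            for serial, person in mailing_list.items()
            if serial not in returned}


# Hypothetical mailing list of three respondents, one of whom has replied.
mailing_list = {101: "A. Respondent", 102: "B. Respondent", 103: "C. Respondent"}
returned_serials = [102]
targets = reminder_targets(mailing_list, returned_serials)
```

With anonymous questionnaires no such link exists, so a reminder would have to go to everyone on the list, including those who have already replied.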
Box 5.2
Guidelines for a reminder letter
Keep it short: A reminder letter should be even shorter than the original covering letter.
Content: Refer to the value of the respondent's participation.
Facilitation: Enclose another copy of the questionnaire and another stamped addressed envelope.
Appreciation: Acknowledge that the respondent's reply may be in the post, and thank the respondent.
Box 5.3 shows the text of the covering letter used in the Travel Survey. As is normal in real life research, the outcome was a compromise. Although drafted by the Survey Unit it went out over the signature of the senior academic responsible for traffic on campus. Given the context of the research, it was thought essential to explain the reasons why the university was conducting a survey.
Box 5.3
An example of a covering letter
Professor David Greenaway
Pro-Vice-Chancellor and Professor of Economics
Department of Economics
University Park
Nottingham NG7 2RD
Dear Colleague/Student,
Travel to Work Survey 1998
Thank you for finding time to complete this questionnaire.
Why are we doing it?
The University is committed to traffic management policies aimed at reducing vehicle dependency, encouraging the use of public transport and managing vehicular movements within and between our campuses. This commitment is part of a wider environmental strategy for the University. The survey will identify the travel patterns of staff and students that will assist us in formulating policies to:
• tackle the problems of increasing demand for vehicular access and parking;
• make recommendations for developing and supporting viable and accessible transport alternatives.
The findings of the survey will be reported to Transport Consultants who will advise us in producing a Commuter Plan for the University, which is expected in October 1998.
About the survey
The survey is confidential. If you complete and return the questionnaire accompanied by the slip below you can be entered into a prize draw to win a bicycle. There will be two prizes, one each for staff and students. The questionnaires and the slip, which will be separated immediately on receipt, should be returned by 12.5.98 in the envelope provided. Winners will be announced on 20.5.98.
Many thanks in anticipation of your cooperation.
Yours faithfully,
Professor D. Greenaway
BICYCLE PRIZE
If you wish to be entered for the draw, please fill in the details below and tear off this slip. Enclose the slip with your completed questionnaire in the envelope provided and return it through the internal mail. The slip will be separated from the questionnaire immediately on receipt. All personal details will remain absolutely confidential.
Name
Department
Contact telephone number
Please tear off and return with the questionnaire
Approaching respondents for an interview
It is helpful, if possible, to send potential respondents a letter first, including the same kinds of points that would be in a covering letter for a postal questionnaire. Here the letter is functioning as a kind of letter of introduction. Then, when contacting the person by telephone or in person, we can refer back to the letter. We should not assume that respondents will remember the detailed contents of the letter or even having received it - for someone else in the household may have opened it and not mentioned it. The point is, it serves as an introduction and shows our good faith in wishing to elicit informed and willing cooperation.
If calling in person, we will obviously want to look respectable, and not be mistaken for a salesperson or evangelist. We should carry identification with a photograph, and an official letter explaining who we are and giving a contact address and telephone number for verification. As with a covering letter, we need to explain the purposes of the research, how the findings will be used, whether any summary report will be available, how the respondent was chosen, and our guarantees of confidentiality.
One awkward problem is when one member of a household acts as a gatekeeper and tries to refuse participation on behalf of another ('My wife won't want to take part in your survey'). We should do our best, politely, to try to speak to the potential respondent in person.
Piloting
Piloting is essential, but is often skimped and hurried. In our experience sponsors rarely allow for it; they want you to get on with the survey and produce the results.
A pilot survey is a dummy run of the survey proper, in which we aim to test all the key aspects of the survey, including access to respondents, design of the research instrument, and gathering the data. The pilot survey may be preceded by one or more pretests, in which we investigate particular aspects of our survey, such as a specific set of questions we consider problematic. The pretests and the pilot survey are all part of the overall process of piloting.
Textbooks on surveys often propose an elaborate and costly programme of pretests, followed by a large-scale pilot survey. It is an ideal impossible to live up to in most research carried out on a limited budget by a solo researcher or a small research team. The answer is not to despair, but to focus on the essentials. We suggest the following guidelines:
• Try to get it right first time. A pilot survey should be as good as you can make it. Piloting enables us to refine our survey, not to transform a hopeless mess into a perfect instrument. Perfection is not attainable anyway.
• Quality is more important than quantity. Small-scale but intensive piloting is far better than large-scale crude piloting.
• Imaginative use of small-scale pretests is very productive. It enables us to get detailed comments and suggestions about how to improve our research instrument.
• Make the pilot survey as similar as possible to the survey proper. In principle, we should be testing the effectiveness of all aspects of the research design.
• Using your judgement about the target population, choose a representative range of respondents for the piloting. Friends or colleagues will not be representative of the target population, and if we rely on them ambiguous, sensitive or offensive questions may not be picked up.
In the piloting process, we need to be attuned to the signs that warn us that something is wrong - as set out in Boxes 5.4 and 5.5.
Box 5.4
Warning signs in pilot self-completion questionnaires
Giving several answers to a question where only one was required: This means we need to make our instructions clearer - for example, please ✓ one only.
Giving one answer to a question where several were possible: Again, the instructions need to be clarified - for example, please ✓ all that apply.
Failure to answer the question: This may mean that the question is awkward or offensive. Alternatively, something may have gone wrong with our question skips.
Open questions are left blank: If hardly anyone answers them, do they have any value?
A question asking respondents to rank items is not completed properly: Ranking is a complex task. We should simplify it, usually by reducing the number of items to be ranked.
Respondents write comments in the margins: This is a straightforward sign that something is amiss.
The questionnaire takes a long time to complete: Even if people do not complain, this is a clear warning. Participants in a pilot survey may be more generous with their time than respondents to the survey proper will be.
Almost everyone gives the same answer: This is a warning sign of possible social desirability effects.
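Two of the warning signs in Box 5.4, unanswered questions and near-unanimous answers, lend themselves to a simple automated check once the pilot questionnaires have been keyed in. The following is a rough sketch only: the function name `pilot_warnings`, the data layout and both thresholds (30 per cent blank, 90 per cent giving the same answer) are assumptions chosen for illustration, not rules from the text.

```python
from collections import Counter


def pilot_warnings(responses, blank_threshold=0.3, same_answer_threshold=0.9):
    """Flag questions that show two of the Box 5.4 warning signs.

    responses: dict mapping question id -> list of answers, with None
    standing for a question left blank. Both thresholds are illustrative
    assumptions, not established cut-offs.
    """
    warnings = {}
    for question, answers in responses.items():
        flags = []
        given = [a for a in answers if a is not None]
        # Share of pilot respondents who left the question blank.
        if answers and (len(answers) - len(given)) / len(answers) >= blank_threshold:
            flags.append("high non-response")
        # Share of those answering who chose the single most common answer.
        if given:
            top_count = Counter(given).most_common(1)[0][1]
            if top_count / len(given) >= same_answer_threshold:
                flags.append("almost everyone gives the same answer")
        if flags:
            warnings[question] = flags
    return warnings


# Hypothetical pilot data from ten respondents.
pilot = {
    "Q1": ["yes"] * 10,
    "Q2": ["a", None, None, "b", None, "c", None, "a", "b", None],
    "Q3": ["a", "b", "c", "a", "b", "c", "a", "b", "c", None],
}
flagged = pilot_warnings(pilot)
```

A check of this kind only points to questions worth a second look; deciding whether a blank answer reflects an offensive question or a faulty skip instruction still calls for judgement.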
Box 5.5
Warning signs in pilot interviews
The interviewer has to clarify or expand on a question: Presumably the question is unclear, and needs to be reworded.
The interviewer has to apologize for a question: This is an extreme form of the first point. In our experience of being interviewed, it is common in hired hand research. We should never have to apologize for a question.
Interviewees appear reluctant or embarrassed: Something clearly is wrong. Some questions may be more sensitive than we realized, or perhaps our self-presentation is unintentionally inhibiting.
The interviews are significantly longer than expected: The simple remedy is to cut the number of questions, and perhaps to reduce the amount of probing.
The interviews are significantly shorter than expected: This is a warning that rapport may not have been achieved, that respondents have doubts about the research, or that question probes are not operating properly.
There are items where respondents want to say more than we expected: This is a sign that these items are salient to the respondent. We should consider asking more questions about them, perhaps with deeper probes.
Respondents have difficulty with response categories: We may need to use show cards.
Interviewers have difficulty with instructions: Interviewers have problems too. Instructions, especially question skips, are often hard to follow.
Distribution and return of questionnaires
If we are sending questionnaires through the post, or through the internal mail of an organization, we need to make sure that they reach the right people. First, we need up to date addresses. Second, we must make it easy for respondents to return their questionnaire to us. If they are to use the post, then it is desirable to supply them with a stamped addressed envelope, which appears less impersonal than a business reply envelope. In some cases, it may be more convenient for respondents to use the internal mail of their organization, provided it is efficient and provided that respondents are confident it is secure.
Sometimes, questionnaires are handed out personally by an intermediary - for example, by a teacher in a classroom or a receptionist in a waiting area. Relying on intermediaries is, however, very dangerous. Unless they have been fully briefed, and unless we can be quite confident that they will do as
instructed, it will probably not turn out well. Good intentions are not enough. Intermediaries may not be fully aware of the nature and purposes of the research, nor are they necessarily knowledgeable about the research process. Sometimes they may go too far, implying that participation is required and refusal not an option. They may give inaccurate instructions, or put an inappropriate gloss on the purposes of the research. In other cases, intermediaries may not pursue the matter at all vigorously, but will simply leave questionnaires lying around for people to complete if they feel inclined. Few will do so.
Self-completion questionnaires are often used in audience research in theatres and like venues. A common problem is that the respondent is given no instructions whatever about where to return the questionnaire. People may be reluctant to leave it behind on their seat. So we see people leaving the theatre at the end of the performance, clutching a questionnaire while vainly looking for the box to put it in or an official to hand it to. Most of these questionnaires finish their life in a litter bin or the gutter. The fault lies with the researchers, for having given no thought to how the questionnaires are to be returned.
Further reading
To pursue these questions in more depth, we suggest that the best way is to read about how researchers have tackled them in their own work. Devine and Heath (1999) provide a good starting point in their Sociological Research Methods in Context. Some of the articles in Hammond's (1964) classic collection, Sociologists at Work, deal with researchers adapting survey methods to particular settings and problems (see especially the chapters by Lipset, Coleman and Davis).
Designing the questions: what, when, where, why, how much and how often?
Key elements in this chapter * Asking meaningful questions
The sociological imagination
Formulating the questions to include in a questionnaire or interview schedule, designing the layout of questionnaires and planning the sequence of questions: all these lie at the heart of survey work and are one of its most enjoyable aspects. There are technicalities to be taken into account and pitfalls to be avoided, as we explain. But the technicalities stem from something more fundamental, the sociological imagination.
Professional sociologists do not have a monopoly on the sociological imagination. It is grounded in social life - above all, in the lives of our respondents. We use our sociological imagination to try to identify the links between public issues and private concerns, between the great issues of our society such as poverty and social exclusion, disability, job insecurity, and the personal experiences of people engaged with them. We ground our imagination by preliminary work such as reading about the topic, talking to people, observing them, piloting our questions and so on.
Question design calls our sociological imagination into play in a number of ways. We need to frame questions that are meaningful, sensitive, precise, searching, and salient to our respondents. We need to construct the questions in such a way that respondents will want to answer them as fully and truthfully as they can.
Understanding what matters to respondents
Surveys are often criticized for being driven entirely by the interests of the researcher. How do we know that what interests us also interests our respondents? This is the problem of salience. Respondents' helpful cooperation does not necessarily show that we have engaged with their real concerns.
Box 6.1
Gauging salience
Open-ended questions: We examine the significance of open-ended questions later in this chapter. For the moment, we simply say that two of the most productive questions the Survey Unit has asked of first-year undergraduate students at the University of Nottingham, UK, are the following:
What would you say you have most liked about being an undergraduate student at the University of Nottingham?
and
What would you say you have most disliked about being an undergraduate student at the University of Nottingham?
Ranking questions: These are closed versions of the open-ended questions given above. We present our respondents with a list of alternatives, and ask them to choose a small number that are the most important to them. Sometimes we ask respondents to rank their selection in order of importance. This technique can be revealing, though it will be very cumbersome if a ranking is required and the list is long. It can also seem somewhat artificial.
Direct questions on salience: We present respondents with a list, asking them to indicate for each item how important it is to them. This approach is blunt, but can be effective. One very common approach is through a Likert scale, thus:

                                        Strongly agree   Agree   Neutral   Disagree   Strongly disagree
Catering on campus is excellent
Halls of residence are well equipped

and so on. An alternative way of presenting the response categories is like this:

Strongly agree   1   2   3   4   5   Strongly disagree

For each item, respondents are asked to put a ring round the appropriate number. We suggest later (page 112) that in most cases it is desirable to have an odd numbered scale, normally with five categories, so that there is a middle category. This middle category may be labelled 'neutral', or 'uncertain', or 'neither agree nor disagree'.
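To make the five-point scale concrete, the sketch below shows one way the answers to a single Likert item might be coded and tallied. The coding (1 for 'Strongly agree' through 5 for 'Strongly disagree') follows the numbered layout shown in the box; the names `LIKERT_CODES` and `summarize_item`, and the five answers, are invented for illustration.

```python
from collections import Counter

# Codes follow the numbered layout in the box: 1 = Strongly agree ... 5 = Strongly disagree.
LIKERT_CODES = {
    "Strongly agree": 1,
    "Agree": 2,
    "Neutral": 3,
    "Disagree": 4,
    "Strongly disagree": 5,
}


def summarize_item(answers):
    """Return the count for each code and the share of respondents who agree."""
    codes = [LIKERT_CODES[answer] for answer in answers]
    counts = Counter(codes)
    # Report every category, including those nobody chose.
    distribution = {code: counts.get(code, 0) for code in sorted(LIKERT_CODES.values())}
    agree_share = (counts[1] + counts[2]) / len(codes)
    return distribution, agree_share


# Hypothetical answers to 'Catering on campus is excellent'.
catering = ["Agree", "Disagree", "Strongly disagree", "Neutral", "Disagree"]
distribution, agree_share = summarize_item(catering)
```

Reporting the full distribution, rather than a single average, keeps the middle category visible, which matters if it is being used by respondents who are uncertain rather than genuinely neutral.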
Recognizing differences between respondents
An essential reason for doing a survey is to draw comparisons between respondents. If they all thought and acted alike there would clearly be no point in a survey, since we could simply take one case and generalize from it. Variations between respondents can cause technical difficulties, as we illustrate through the Travel Survey, but they are what make a survey worthwhile. In our experience, making false or dubious assumptions about respondents is one of the most common problems to be overcome.
Box 6.2
Avoiding unjustified presuppositions and false assumptions
Assumptions and presuppositions are similar but not quite the same.
False assumptions: By an 'assumption', we mean something that is taken for granted. All arguments are built on assumptions, but assumptions can be false. For example, in a postal questionnaire sent to a sample of Church of England clergymen (this was before the church ordained women priests), Aldridge asked respondents the following question:
Is the fundamentalist approach to the Bible valid today?
• Yes
• No
• Uncertain
A very significant minority of respondents objected to this question, on the grounds that the term 'fundamentalist' was not only ambiguous but offensive. Aldridge had falsely assumed that the term was clear and neutral!
Another example known to us is a questionnaire on cremation and burial, which was delivered by post without even a covering letter and which caused distress to many respondents, not least to people who had been recently bereaved. The researchers presumably assumed, falsely, that the topic was not particularly sensitive, and that it could be treated as an unproblematic area of academic enquiry.
Unwarranted presuppositions: By a presupposition, we mean taking the existence of something for granted. The standard philosophical example of this is the question:
Is the present King of France bald?
The point is, of course, that since France is a republic there is no King of France. It is not true that the present king is bald, but nor is it false. In order to ask fruitful questions in our surveys, we need to know what there is and what there isn't in the social world in question.
Which of the following posts exist at the University of Nottingham, UK?
Deputy Pro-Vice-Chancellor
Director of Finance
Dean of the Medical School
Proctor
Answer: the second and third exist, the first and last do not. A well-placed member of the university would know this and could have told you if you had asked. Finding out what exists out there is a vital component of all social
research. We can enquire about the King of France's hair, or the Proctor's policy on student discipline, only after we have established that these beings actually exist.
In seeking to eliminate false assumptions and unwarranted presuppositions there are no easy answers and no simple tactics. We are at the heart of the sociological imagination. Knowing about the social and organizational context is critically important, and piloting plays a key role. Here is one possibility:
Consulting key informants: We have to be clear what we are doing here. Oppenheim (1992: 62-3) warns us against relying on 'experts'. If our questions are sloppy and ill thought-through, it would not take an expert to tell us so, and the expert probably would not waste her or his precious time trying to rescue us from disaster. No 'expert' knows everything. Experts in survey design can help us with technicalities, as Oppenheim says, but they cannot do our thinking for us. Instead of relying on experts, we should think in terms of making use of key informants. By this we mean people who can help in alerting us to false assumptions and unwarranted presuppositions. They can also warn us about problems in the use of language.
Using unambiguous language sensitively
Obviously, we want questions that are meaningful, clear, unambiguous, sensitive and revealing. Given the variation between respondents, this is not so easy. There are well-known and not so well-known differences in language use depending on social factors such as age, region, and social class. An issue that has to be dealt with is the social standing of different usages. In contemporary British usage, supposedly 'correct' usages include these:
• the midday meal is lunch, not dinner
• the room in which the family gathers (if it does!) is the sitting room, not the living room or the lounge
• a magazine should never be called a book
• the loo or lavatory is not a toilet
Many people are highly sensitive to these variations in usage, regarding some of them as impolite, vulgar, or incorrect. How do we avoid ambiguity without patronizing our respondents or 'correcting' their use of English? Some answers are given in Box 6.3.
Box 6.3
Tactics for dealing with ambiguous or unclear terms
Avoidance: The most commonly used terms for a midday meal are lunch or dinner, and for an evening meal dinner, supper, or tea. One tactic is to use alternatives such as midday meal, main evening meal, or main meal of the day.
Glossing: Another possibility is to gloss the term, that is, to give a brief explanation of what we mean by it. Here are two interview questions [asked only of those respondents who think their soul will live on after death] taken from the Religion and Politics Survey, 1996, conducted by Princeton Survey Research Associates and accessible on the American Religious Data Archive: http://www.arda.tm
Do you think there is a heaven, where people who have led good lives are eternally rewarded?
Yes (Believe in heaven)
No (Don't believe in heaven)
Don't know/Refused (Don't know if believe in heaven)
Do you think there is a hell, where people who have led bad lives and die without being sorry are eternally damned?
Yes (Believe in hell)
No (Don't believe in hell)
Don't know/Refused (Don't know if believe in hell)
These two questions gloss the meaning of heaven and hell. After all, other meanings are common in western culture. For example, some people believe that hell does not entail eternal damnation, others that damnation not only sounds spiteful but also fails to convey the desolation of being cut off from God.
Glossing involves an imposition of a meaning. Hence, it is desirable to convey that we are simply saying what we mean by the term, not what the term means. We do not want to give the impression that we are instructing our respondents in Standard English. Our glossary to this book provides an example. We use questionnaire exclusively to refer to a form completed by the respondent; other writers use it inclusively to cover interview schedules as well. We are not claiming that our usage is correct or better, but are simply glossing our use of the term to deal with the ambiguity.
Clarification: This is a form of glossing in which we explicitly clarify potential ambiguity. Here is an example from the Travel Survey:
On occasions when you travel to the campus by car, where do you park?
Not applicable
In the Science City area
In the central area (including Highfields House, West Drive and Education)
On the periphery (including Halls, History and the Sports Centre)
In this case, the location in our categories of Highfields House and the other examples is explicitly clarified. We are in effect glossing what we mean by the central area and the periphery.
Giving examples: In a Survey Unit questionnaire sent to Pre-Registration House Officers in England - people in their final year of basic medical training - respondents were asked about specific formal educational sessions, and then were asked:
Have any other formal educational meetings been arranged (for example, lectures, journal club, X-ray meetings, etc.)?
Giving examples is far more friendly than issuing instructions, but carries the danger of suggesting some answers while possibly distracting attention from others. It is best used when we know that the examples either cover all the main possibilities or send an unambiguous message about what we have in mind. (Incidentally, the question breaks a rule we were taught at school: you do not say 'etc.' if you have already said 'for example'. However, we think that being clear and helpful is more important than being formally 'correct'. On the other hand, many respondents will be shocked to see misspellings, so it is important to check spelling carefully, running a spellcheck program if possible.)
Indirectly eliminating unwanted meanings: This is sometimes possible, though perhaps risky. It depends on respondents picking up cues from the context. Consider the following example:
Over the past seven days, have you bought any of the following? Please tick all that apply.
A comic paper
A newspaper
A magazine
A book
In this example, book and magazine are listed separately, with magazine appearing before book. The researcher expects the reader to infer from this that book is used exclusively of magazine.
The role of open-ended questions
Some books warn against using open-ended questions at all in surveys, while others say that open-ended questions should be kept to a strict minimum. Why is this? Three main reasons are given.
1 Open-ended questions are more difficult to answer, because respondents or interviewees are called upon to think through (or think up) their answer from scratch, without help from the researcher. This is particularly problematic with questionnaires, since writing an answer requires more time and effort than giving it verbally. If respondents suspect that the reason for open-ended questions is that the researcher has not taken the trouble to think about response categories, this may well affect the response rate and the quality of responses.
2 The responses to open-ended questions are more difficult to code, unlike closed questions, where the response categories are pre-coded.
3 The responses to open-ended questions are harder to analyse. Partly, this is because of coding problems. In addition, a number of respondents will simply skip over open-ended questions. Open-ended questions typically have a higher rate of non-response than closed questions do.
Despite these real difficulties, open-ended questions can play an important part in survey work, both in questionnaires and interviews. They can be used for a number of purposes.
To introduce variety
Questionnaires and interviews which rely on a very small number of types of question and response - a Yes/No/Don't Know format, for example - may be straightforward, but are also likely to be seen as tedious. One way of introducing variety is through the careful use of open-ended questions.
To tap salience
As discussed above (Box 6.1), open-ended questions can be very useful in helping us to assess the salience of an issue to a respondent.

To show a humanistic approach
Surveys are sometimes thought to be inevitably mundane, boring and insensitive. By using open-ended questions as well as closed ones, we are able to send a clear signal that we approach our research in a humanistic spirit. Our respondents are informants, with their own individual points of view, which they are quite capable of expressing in their own words.
To acknowledge that researchers are not omniscient
In some cases, we have so little idea of what answers might be forthcoming, or the possibilities are so vast, that it is simply not possible to provide respondents with a sensible list of the main alternatives. In the Travel Survey, people who cycle to work were asked the open-ended question: How do you think facilities for cyclists could be improved? Members of the Survey Unit are not cyclists themselves, and could not easily anticipate what the answers from cyclists would be.
To generate quotations
A few well-chosen quotations from our respondents can convey the flavour of responses far better than any other rhetorical device. We are delivering our promise to give people a voice. If our survey is being undertaken on behalf of a sponsor, direct quotation from respondents - who may be customers - can have an immediate impact. There will be good news as well as bad. First, the good news, from postgraduate students:

What would you say you have most liked about being a postgraduate student at the University of Nottingham?
'High quality - the experienced teachers, good courses.'
'My lovely friends from all around the world, UK, Taiwan, Turkey, Germany, Greece, Spain, Denmark . . .'
'The safe and beautiful campus.'

The bad news:

What would you say you have most disliked about being a postgraduate student at the University of Nottingham?
'Catering is grossly over-priced, especially sandwiches/hot drinks. The overall feel is of a monopolized market.'
'Lack of proper union facilities, emphasis on halls to exclusion of postgraduates.'

Used judiciously, direct quotations can bring home to readers the salient issues for respondents - an important aspect of the writing of a research report, which is covered in chapter nine.

Occasionally, an open-ended question can produce an unexpected response which can set the researcher thinking more deeply about the issue. A startling example is the following, from a programme of interviews in Islington, London, in 1968 (Abercrombie et al. 1970):

Do you believe in God?
'Yes.'
Do you believe in a God who can change the course of events on earth?
'No, just the ordinary one.'

This is the only survey question we know of that has given rise to a poem: Donald Davie's 'Ordinary God' (Davie 1988).
Box 6.4
Making the best use of open-ended questions
Use them sparingly
Open-ended questions require more time and effort on the part of the respondent, particularly in self-completion questionnaires. They are also more difficult to code. As Oppenheim warns (1992: 113), open-ended questions 'are often easy to ask, difficult to answer, and still more difficult to analyse'.

Do not begin with them
It is usually desirable to begin with closed questions, so that the respondent is drawn into the study and rapport is established before the more difficult open-ended format is introduced.

Use them to probe the respondents' view of salient issues
In the survey of postgraduates cited above, two open-ended questions were used to tap into students' best and worst experiences of the university.

Allow an appropriate space for the response
As a general guide, we suggest a space equivalent to three or four lines. Any less, and respondents may conclude that their opinions are not really being taken seriously. Any more, and respondents may feel intimidated or annoyed that an unreasonable effort is being required of them.
Tackling the social desirability problem

A major challenge for all overt forms of social research is the social desirability problem. Respondents tend to give socially approved answers to our questions, to over-report their virtuous actions and under-report their vices, and to engage in socially approved behaviour when they know we are observing them. The problem of social desirability has a number of dimensions. Respondents may be trying to do one or more of the following things:

• being helpful and cooperative to the researcher by giving the answer they think the researcher wants;
• giving answers that appear to show that they are cultivated people, morally decent, and good citizens;
• demonstrating that they are rational by giving answers that are logical and consistent.
Box 6.5
Tactics for dealing with social desirability effects
• Be specific, asking neither about hypothetical behaviour (what would you do if?), nor about regular behaviour (how often do you?), but about a specific time period (what did you do in the last seven days?).
• Ask indirect questions instead of addressing a sensitive issue head-on.
• Avoid leading questions.
• Make clear - for example in a covering letter - that our research is scientific and ethically neutral.
• Consider using self-completion questionnaires that are completely anonymous and that do not involve personal interaction with a researcher.
Questions about respondents' knowledge

Surveys frequently include questions which tap a respondent's knowledge about a given issue - food hygiene, say, or the effects of smoking on health. This is not the same as asking people for their opinions. In a democratic society, a range of opinion is to be expected, but lack of knowledge equates to ignorance, which is socially undesirable. If respondents feel that they are facing some kind of test designed to expose their ignorance, they may be unwilling to participate. In any case, science is always advancing, so we can never be sure we have the complete truth about these questions, and the line between knowledge and opinion is often less clear than we may like to think.

One way of dealing with the potentially intimidating character of knowledge questions is to present them as questions about respondents' opinions. Phrases such as 'in your opinion', 'in your view' and 'from your own experience' may be used to signal this. We can also provide respondents with a 'Don't know' category.
Avoiding overlapping categories

Very often, we ask respondents to indicate where they fall in a particular range. This can help to soften questions that might otherwise be too sensitive, such as asking about the respondent's age or level of income. Take this example:

Please state your age last birthday:
Under 20
20-30
30-40
40-50
50-60
Over 60

The problem here, obvious once mentioned, is that a respondent aged 30 falls into two categories, namely 20-30 and 30-40. The same problem applies to respondents aged 40 and 50. The categories overlap. The response categories need to be reformulated. For example:

Please state your age last birthday:
Under 20
20-29
30-39
40-49
50-59
60 and over
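
When response categories are drafted for a numerical range, a quick mechanical check can catch overlaps or gaps before the questionnaire is printed. The following is a minimal sketch only, not part of the original survey work: the function name `find_band_problems`, the way bands are written as tuples, and the age range tested (16 to 90) are all our own assumptions for illustration.

```python
def find_band_problems(bands, lowest, highest):
    """Return the ages in [lowest, highest] that fit no band or several bands.

    bands: list of (label, low, high) with inclusive whole-number bounds.
    None stands for an open-ended bound, as in 'Under 20' or '60 and over'.
    """
    problems = {}
    for age in range(lowest, highest + 1):
        matches = [
            label
            for label, low, high in bands
            if (low is None or age >= low) and (high is None or age <= high)
        ]
        # Exactly one matching category is what we want for every age.
        if len(matches) != 1:
            problems[age] = matches
    return problems


# The overlapping categories from the first example in the text.
overlapping = [
    ("Under 20", None, 19),
    ("20-30", 20, 30),
    ("30-40", 30, 40),
    ("40-50", 40, 50),
    ("50-60", 50, 60),
    ("Over 60", 61, None),
]

# The reformulated categories.
corrected = [
    ("Under 20", None, 19),
    ("20-29", 20, 29),
    ("30-39", 30, 39),
    ("40-49", 40, 49),
    ("50-59", 50, 59),
    ("60 and over", 60, None),
]

print(find_band_problems(overlapping, 16, 90))
print(find_band_problems(corrected, 16, 90))
```

For the first list the check reports ages 30, 40 and 50, each of which falls into two categories; for the reformulated list it reports nothing.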
Asking about age

Presenting respondents with a set of categories, as above, is one common way in which we can minimize people's sensitivities about age. Instead of asking them exactly how old they are, we ask them to indicate into which age range they fall. For many purposes, this will be all we need.

If, however, we need to know respondents' age more precisely, there are some technical problems to overcome. Consider a child aged 5 years 9 months. How many years old is he? Most respondents will say 5 years, but some, a significant minority, will round the age up to 6. This is a problem when asking about the age of children, but it applies to adults too.

One possibility is to ask respondents to state their age in years and months. This may work reasonably well when asking about young children, though even here there is a small technical problem, in that respondents may round to the nearest month. In any case, we often do not need such precision, and adults typically do not think in these terms about their own age.

Another possibility is to ask for date of birth. This is very precise, but can sound excessively bureaucratic and official. A more common solution to avoid the ambiguity is to ask people for their 'age last birthday'.
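
Where date of birth has been collected, 'age last birthday' can be derived at the analysis stage rather than asked for. The sketch below is one way of doing this, on the assumption that both dates are available as calendar dates; the function name `age_last_birthday` and the example dates are invented for illustration.

```python
from datetime import date


def age_last_birthday(born, on):
    """Completed years of age on the date `on`, never rounded up."""
    years = on.year - born.year
    # If this year's birthday has not yet arrived, the last birthday
    # was the one in the previous year.
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years


# A child aged 5 years 9 months counts as 5, not 6.
print(age_last_birthday(date(2015, 2, 10), date(2020, 11, 10)))
```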
Avoiding double-barrelled questions

A double-barrelled (or worse, multiple-barrelled) question is one where more than one question is being asked at the same time. For example: 'Do you own a camcorder or video recorder?' is asking about two separate items. A more subtle example is: 'How often are you in contact with your parents?' - here, two people are involved, and the respondents' relations with them may be very different. One tactic for detecting this problem is to look for the tell-tale words 'and' and 'or', and the use of the slash, as in cinema/theatre.

In general, the problem of double-barrelled questions is more likely to occur in informal interviews than in structured interviews and self-completion questionnaires. The good news is that such questions are also less of a problem in an informal interview, since any difficulty they cause can easily be repaired. Even so, they are better avoided.

Consider the following question:

Do you know if your employer has an equal opportunities policy?

If a respondent says 'Yes', we are right to infer that, unless they are being facetious, they mean that the employer does have such a policy. If they are being facetious, their 'Yes' may mean 'Yes I know the answer, but I'm not going to tell you what it is until you ask the question properly'. So much for facetiousness. What, then, if the respondent replies 'No'? Here there is a serious doubt: does the respondent mean he does not know, or is he telling us that his employer does not have an equal opportunities policy?

In conversation, phrases such as 'do you know if?' are used to allow room for people not to know the answer to a question without any implication that they are ignorant and should know. In social research, even in informal interviews, we should find other ways of making it easy for respondents to say that they don't know.
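
The tactic of looking for the tell-tale words 'and' and 'or' and the slash can be applied mechanically to a draft list of questions. What follows is a rough sketch under our own assumptions (the function name `flag_possible_double_barrel` is invented); it can only raise a flag for a human reader to consider, and it will miss subtle cases such as the question about 'your parents'.

```python
import re

# Whole words 'and' / 'or', or a slash, anywhere in the question.
WARNING_PATTERN = re.compile(r"\b(and|or)\b|/", re.IGNORECASE)


def flag_possible_double_barrel(question):
    """Return the tell-tale words or slashes found in a question."""
    return [m.group(0).lower() for m in WARNING_PATTERN.finditer(question)]


draft = [
    "Do you own a camcorder or video recorder?",
    "How often do you go to the cinema/theatre?",
    "How often are you in contact with your parents?",
]
for question in draft:
    print(question, flag_possible_double_barrel(question))
```

The third question passes unflagged even though it is double-barrelled, which is why a check of this kind supplements careful reading and piloting rather than replacing them.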
Avoiding negatives, double-negatives and worse

Suppose you are opposed to the policy that students should contribute to their university tuition fees. What is your response to the following question?

Tuition fees should not be abolished
Strongly agree    Agree    Neutral    Disagree    Strongly disagree

Working out your own position on a negative statement such as this can be perplexing. It is a particularly acute problem for people who disagree with the negative; in this example, they do not agree that tuition fees should not be abolished - a double negative. It is far simpler to present respondents with a positive statement, such as:

Tuition fees should be abolished
Strongly agree    Agree    Neutral    Disagree    Strongly disagree
The main things that go wrong in designing questions, and how to prevent them

Questionnaires that are too long
We should resist the temptation to ask questions out of idle curiosity. Other things being equal, the longer a questionnaire is the lower the response rate will be. The Travel Survey is quite short, with a total of 23 questions for staff and 21 for students - and even here, the maximum number of questions any respondent has to answer is only 19. As well as being as concise as possible, the questionnaire needs to be laid out in such a way that it looks manageable.

The same point applies to interviews, which should not be prolonged unnecessarily. In arranging an interview, it is normal to provide respondents with an estimate of how long it is expected to take. Common examples are: around three-quarters of an hour; no more than an hour; between an hour and an hour and a half. It is necessary to provide such estimates so that respondents can set aside enough time for them.

But do we need to provide a similar estimate for a self-completion questionnaire? Should we say things like: 'this questionnaire will take around ten minutes to complete'? If we do, we need to make sure that our estimate is accurate, or our false reassurance will be counterproductive. In any case, whatever we say, our respondents will judge for themselves whether or not the questionnaire looks worth their time and trouble. On balance, therefore, we think it is normally better to avoid such promises.
Ranking questions that are too complicated
Ranking questions appear to offer an excellent means for gauging the relative salience of items to the individual respondent. It appears very attractive to offer respondents a list of items, asking them to rank them according to their importance. Surely this will yield a rich body of data for analysis?

Suppose we wish to ask a sample of postgraduate students from other countries about their orientation to their studies in Britain. We might decide to ask a question such as the following:

Students tend to have priorities in what they hope to gain from postgraduate study. Judging by what you feel at the moment, please rank the following factors in order of importance to you, putting a 1 next to the factor which is most important and so on down to 9 for the factor which is least important.

To be able to cultivate a wide range of interests
To experience a different culture
To interact with different kinds of people
To develop intellectually
To acquire knowledge and skills to base your career on
To have a full social life
To make new friendships
To develop your sporting abilities
To develop your language skills

What could possibly go wrong with this? Hard experience suggests that a lot can, and probably will:

• Many respondents will not rank all nine items. Instead, they will rank a few - perhaps three or four - and leave the rest blank.
• Some respondents will want to have tied items, and it is very hard to stop them. For example, they may decide that 'to interact with different kinds of people' and 'to develop intellectually' rank equal second. How will you analyse their response?
• Some respondents will not treat it as a ranking exercise. Instead, they will place an X or a ✓ against the items that matter to them, leaving all the rest blank.
• Some respondents will write in 'all of them'.

There are two ways of dealing with the problems of ranking. One possibility is to simplify the task. In the example above, it would be more straightforward to ask respondents to put a ✓ (a tick or a check mark) against the three items that are most important to them. Even so, they would have a long list of complex items to contend with. As another possibility, we could produce a much shorter list - three items, say - and invite respondents to rank them 1, 2, 3. We recommend that five is the maximum number of items that respondents be asked to rank.

Alternatively, we can change the ranking into a rating. The Survey Unit presented the items as follows:

Students tend to have priorities in what they hope to gain from postgraduate study. Judging by what you feel at the moment, please rate how important the following are to you.

                                                   Very        Fairly      Not
                                                   important   important   important
To be able to cultivate a wide range of interests
To experience a different culture
. . . and so on.
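
If a ranking question is used despite these warnings, the returned questionnaires have to be screened before analysis for exactly the problems listed above. The following sketch shows one way this might be done; the function name `check_ranking`, the problem labels and the coding of blanks as `None` are our own assumptions, not a procedure used by the Survey Unit.

```python
def check_ranking(response, n_items):
    """List the problems in one respondent's answer to a ranking question.

    response: dict mapping each item to what the respondent wrote,
    with None for an item left blank.
    """
    marks = [m for m in response.values() if m is not None]
    # Only whole numbers from 1 to n_items count as genuine ranks.
    ranks = [m for m in marks if isinstance(m, int) and 1 <= m <= n_items]

    problems = []
    if len(ranks) < len(marks):
        problems.append("non-rank marks")    # ticks, crosses, 'all of them'
    if len(set(ranks)) < len(ranks):
        problems.append("tied ranks")        # e.g. two items ranked second
    if len(marks) < n_items:
        problems.append("items left blank")  # only a few items ranked
    return problems


# Hypothetical answers to a three-item version of the question.
print(check_ranking({"culture": 1, "people": 2, "career": 3}, 3))
print(check_ranking({"culture": 1, "people": 2, "career": 2}, 3))
print(check_ranking({"culture": "X", "people": None, "career": "X"}, 3))
```

A check of this kind tells us how many responses are unusable as rankings; it does not tell us what to do with them, which remains a judgement for the researcher.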
Lack of variety
A very common failing in questionnaire design is to adopt the same format for all or most of the responses. Often, this takes the form of a long series of statements to each of which the response categories are: strongly agree - agree - neutral - disagree - strongly disagree. The layout of such questionnaires may be neat and tidy, but they run the risk of being tedious to complete. A bored respondent is seldom a good informant.
Vague questions about frequency of actions
It is very common, in all types of social survey, to gather information about periodical actions. We want to know how often respondents do things. For example, how often do they go to the theatre? We might envisage the following:

Do you go to the theatre?
Often    Sometimes    Rarely    Never

But what does this tell us? Suppose a respondent goes to the theatre roughly once a week. Is that often, or sometimes? If they go once a month, is that often, sometimes or rarely? The problem is, of course, that different respondents will interpret the categories differently, so we shall have only the vaguest idea of the frequency of attendance among our respondents.

Because the response categories are vague, the danger of social desirability effects is particularly acute. Going to the theatre is a relatively high-status activity, suggesting an active interest in the arts and the intellectual life. Over-reporting may be a problem. In the case of more socially dubious activities - going to the dogs, perhaps? - under-reporting is more likely.

One way of dealing with periodical behaviour is to offer more specific categories of response, such as:

How often on average do you go to the theatre?
More than once a week    Once a week    Once a month    Once a year    Never

A difficulty with this is that the response categories, though commonsensical, are not exhaustive. What about someone who goes to the theatre on average every other week (that is, twice a month), or someone who goes six times a year? We have no category for her, and for others whose periodicity does not fit into our categories.

Another problem with this approach is that it assumes that the behaviour in question is regular, an assumption which may be false. Some people go to the theatre several times during holiday periods, but not at all at other times. We have introduced the phrase 'on average' into the question, to try to deal with this difficulty, but a difficulty it remains.
Perhaps we should tighten up the response categories. Thus, for example:

How often do you go to the theatre?
Never    1-5 times a year    6-10 times a year    11-20 times a year    Over 20 times a year

The gain in precision has been bought at the cost of extreme artificiality.

A wholly different approach is to ask people about their behaviour over a specified time period. We might ask them how often they have been over the last week, or the last month, or the last year. This has the advantage of being specific. There are, however, a number of problems to be tackled if this approach is adopted.

To start with, there is considerable ambiguity in asking about weeks or months or years. Imagine a respondent filling out a questionnaire on Friday 18 November. What will she or he understand by the phrase, 'over the last week'? Does it mean the period since Sunday 13 November (Sunday being the first day of the Christian week)? Does it mean the period since Monday 14 November (Monday being for many people the first day of the working week)? Or does it mean the seven days since Saturday 12 November? In many cases, researchers probably mean seven days - in which case we need to say so.

If we use the phrase 'over the last month', this might mean the period since the beginning of the month, or the last 30/31 days, or, more roughly, the last four weeks.

As for years, 'over the last year' might mean the period since the beginning of the year, or the last 365/366 days. In some situations, it will not be clear whether the year referred to is the calendar year beginning 1 January or some other year, such as the financial year or the academic year. In the case of 'over the last twelve months', the fact that we might be a few days short of a full year is unlikely to matter - the period is long enough for it to be a trivial issue.

In contrast, asking people what they did 'yesterday' is not ambiguous. It minimizes the problems of memory recall. The longer the time period the greater the opportunity for memory to be coloured by self-image. A short time period therefore helps to combat social desirability effects.

One potential problem with asking about behaviour 'yesterday' is that it may have been an unusual day. A respondent who, say, has two glasses of wine every day may not have had a drink on that particular day for some special and not often to be repeated reason. In some cases this will not matter. If we have a large sample of respondents, and if we are interested in aggregate data rather than in individuals, these variations will be very minor and will probably be cancelled out (another respondent will, unusually for her, have drunk two glasses of wine on a special occasion).

What will matter is if the time period is exceptional for a significant number of respondents. If we interview people on 2 January about their eating and drinking over the last seven days, we have chosen a period which in many societies is a major feast, and not typical of the rest of the year. Unless over-indulgence at Christmas and New Year is the object of our research, we should choose another period. This is an obvious example, but there are many others where we need to be careful: holidays, the holy days of faiths other than our own, and the beginning and end of cycles such as academic terms. Selecting a sensible and meaningful time frame is not impossible, but requires some thought and often a little research.

The most effective way of asking about periodical actions will vary from case to case. Given that point, Box 6.6 lists some general guidelines, all based on the need to be as specific and unambiguous as possible.
Box 6.6
Asking about periodical actions
• Avoid 'often - sometimes - occasionally - never' and variants on the theme. Such terms are vague, and mean different things to different people.
• Do not ask about 'the last week', ask about 'the last seven days'.
• If asking about a year, be clear what period is meant - for example, 'since 1 January', 'since the start of the academic year', 'over the last twelve months'.
• Keep the time period as short as you sensibly can, to minimize problems of memory recall and social desirability effects.
• Make sure the time period is meaningful, and sensibly matches the periodicity of the behaviour in question.
• Make sure the time frame is not an unusual one - unless that is the point of the research.
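
To see what 'the last seven days' commits us to, it can help to work out the exact dates the phrase covers for a given completion date. The sketch below does this under stated assumptions: the function name `seven_day_window` is invented, the day of completion is counted as the seventh day (as in the Friday 18 November example), and the year 2022 is chosen only because 18 November fell on a Friday in that year.

```python
from datetime import date, timedelta


def seven_day_window(completed_on, include_today=True):
    """Return (first_day, last_day) of the seven days being asked about."""
    last = completed_on if include_today else completed_on - timedelta(days=1)
    first = last - timedelta(days=6)  # seven days counted inclusively
    return first, last


# A questionnaire filled out on Friday 18 November (year assumed).
first, last = seven_day_window(date(2022, 11, 18))
print(first.isoformat(), "to", last.isoformat())

# The same check helps to spot an unusual period, such as an
# interview on 2 January that reaches back over the New Year.
print(seven_day_window(date(2023, 1, 2)))
```

Whether the day of completion should count as one of the seven is itself a decision that needs to be made, and stated, when the question is worded.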
Lack of clarity about confidentiality and anonymity
If we tell respondents that our questionnaire is anonymous, it means that we have no way of identifying which questionnaire belongs to which respondent. This is a strong reassurance, and obviously impossible in interview situations. Even in a self-completion questionnaire, anonymity can be problematic, as discussed on page 23. For example, consider a survey of university staff that asks respondents to state their sex, their rank, and their academic department. Clearly, the researchers could more confidently guarantee anonymity to a male lecturer in a large engineering department than they could to a female professor in a small department of economics. Since anonymity is an absolute categorical guarantee, we need to be sure we can genuinely deliver it.
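
One way of testing whether a promise of anonymity can be kept is to count how many respondents share each combination of the background details we ask for. The sketch below illustrates the idea with invented data; the function name `small_cells` and the threshold of five are our own assumptions, and what counts as 'too few' is a matter of judgement in each study.

```python
from collections import Counter


def small_cells(respondents, keys, threshold=5):
    """Return combinations of answers shared by fewer than `threshold` people."""
    counts = Counter(tuple(r[k] for k in keys) for r in respondents)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Invented staff survey: twelve male lecturers in engineering and
# one female professor in economics.
staff = [
    {"sex": "male", "rank": "lecturer", "department": "engineering"}
    for _ in range(12)
] + [{"sex": "female", "rank": "professor", "department": "economics"}]

print(small_cells(staff, ["sex", "rank", "department"]))
```

Here the only combination reported is the female professor of economics, who could be identified from her answers alone. If such cases appear, we should ask for less detail or stop short of promising anonymity.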
If we offer our respondents confidentiality, we need to be clear what is involved. The researchers, after all, know who has said what. Confidentiality means that we will not disclose this information to anyone else. Guarantees of confidentiality typically involve the following:

• the use of pseudonyms to disguise the names of respondents, places and organizations;
• changing minor and irrelevant details in order to disguise these names;
• keeping the data securely;
• not allowing access to the data to anyone outside the research team;
• destroying the data at the end of the project, or anonymising it and placing it in an archive.
The most frequently raised problems, and our answers

In our experience, there are a number of issues that repeatedly trouble people when designing questions. The issues most commonly raised with us, and our responses to them, are these.
Should I include a middle category?
For example, in asking respondents about their level of agreement or disagreement with a statement, which of the following Likert scales is preferable?

Strongly agree    Agree    Disagree    Strongly disagree

Strongly agree    Agree    Neutral    Disagree    Strongly disagree
Some researchers worry that if they include the middle category it will be too attractive. Respondents will simply duck the question and take the easy way out. Therefore, so the argument runs, it is better to force respondents into giving either a positive or a negative answer.

Against this is the point that respondents may legitimately be neutral. Forcing them into either the pro- or the anti-camp is artificial, and can be extremely annoying to people who are genuinely neutral on an issue.

Another version of this problem arises in the use of semantic differential scales, where respondents are asked to rate their views on a bipolar numerical scale, with opposing adjectives at each pole. Here is an example, taken from the Survey Unit's omnibus questionnaires. Students are presented with a series of terms describing the university, and asked to put a ring round the number which comes closest to their own view, thus:

lively      1  2  3  4  5    dull
friendly    1  2  3  4  5    unfriendly
When using such scales, we recommend having either five or seven ratings - more is too complicated and adds very little. With an odd number of response categories there is a middle position - here, response category 3. But if we presented respondents with an even number of categories, thus:

lively      1  2  3  4  5  6    dull
friendly    1  2  3  4  5  6    unfriendly

we would deny them the middle option. This way of approaching the problem is a little less obvious than the earlier example, but even so it is artificial and potentially annoying.

In short, we recommend that a middle category be provided unless there are compelling reasons for not doing so. If questionnaires are well designed, respondents will not give a neutral answer merely because they are bored or intimidated. If respondents are neutral or indifferent to an item, that is a worthwhile finding.

Should I include a 'don't know' category?
This is a similar problem with a similar answer. Whenever respondents can sensibly be thought not to know about an item, or to be uncertain about it, we should allow them to express their doubt or uncertainty. Suppressing the possibility of legitimate 'don't knows' and 'uncertains' merely distorts the social reality, and may be very off-putting to respondents. If a substantial percentage of our respondents say they don't know about an item, that is not a problem but a finding.
Should it be sex, or gender?
Among the basic information we gather in a survey, we usually want to know whether our respondents are male or female. In an interview we do not need to ask, and it would be absurd to do so. But what about a self-completion questionnaire? Should we label this variable sex, or gender?

This is an extremely complex issue. At first sight, the answer is clear-cut: it should be sex. Until recently, sex was invariably the label used on questionnaires. The distinction drawn by sociologists is between sex as biologically given (male and female), and gender as socially constructed (masculine and feminine). But this raises a host of theoretical, philosophical and ideological issues. Sociologists have become increasingly concerned about the implication of biological determinism that is frequently read into the term 'sex'. The same holds true of 'race', a term which is now usually placed inside inverted commas to show that we repudiate all bogus theories of race and racial superiority, recognizing instead our common humanity. Nowadays we rightly ask about ethnicity, not 'race'. In a similar fashion, rightly
or wrongly, 'sex' is coming to be a suspect term as far as questionnaires are concerned. Some people prefer 'gender', but others dismiss this as misplaced 'political correctness'.

The answer is definitely not to duck the question completely. So much of social life is structured by gendered inequalities between the sexes that to fail to record the sex of our respondents is to capitulate to ignorance. Fortunately, we can deal with the problem by formulating our question as follows:

Are you:    Female □    Male □
Questionnaire layout
As well as framing individual questions that are as accurate, searching and sensitive as we can make them, we also need to ensure that the overall layout of a questionnaire (and the structure of an interview schedule) is clear, coherent and sensitive. A few simple guidelines should help. In Box 6.7, we comment on how we applied these in the Travel Survey.

Introduction
A few sentences briefly introducing the questionnaire, including any overall guidance on its completion, are a must. It does not matter if they repeat some of the matters dealt with in the covering letter.

Instructions
As well as any overall guidance on the completion of the questionnaire, we need to make clear to respondents how individual questions are to be answered. In the Travel Survey, you will see that we have given instructions such as 'Please ✓ one box' or 'Please ✓ all that apply'.
Sections
It is often helpful to respondents, particularly if the questionnaire is fairly long, to divide it into sections, each with a brief introduction to set the scene. Our omnibus survey of postgraduate international students contained the following sections:

• Section A - Before you came
In this first section we are interested in your reasons for choosing the University of Nottingham and your particular postgraduate research/course.

• Section B - Now you are here
In this section we are interested in your opinions and experiences of the University as a postgraduate.

• Section C - Use of the Internet
Increasing numbers of students are now using the Internet and the University of Nottingham's Web site for information. In the next few questions we want to ask you about your use of this technology.

• Section D - Research students only
This section is for students registered for MPhil and PhD degrees. If you are studying for any other postgraduate qualifications please go directly to section E.

• Section E - Background details
In this section we ask a few questions about yourself and your degree course/research.
Use of columns
If possible, it is a good idea to divide each sheet of the questionnaire into two columns. This uses the space efficiently: it cuts down on unnecessary blank space, and it prevents questions from straggling across the page. Questions should be numbered vertically down the columns, as in the Travel Survey:

like this:
1   4
2   5
3   6

not like this:
1   2
3   4
5   6
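Where a questionnaire is laid out by software, the down-the-column numbering shown above can be generated mechanically. The following is only a minimal sketch: `column_major_layout` is a hypothetical helper written for illustration, not a function of any survey package, and it simply returns the question number that would be printed in each row and column of a sheet.

```python
def column_major_layout(n_questions, n_columns=2):
    """Return rows of question numbers, numbered down each column in turn.

    Hypothetical helper for illustration; None marks an empty cell.
    """
    rows = -(-n_questions // n_columns)  # ceiling division: rows needed per column
    grid = []
    for r in range(rows):
        row = []
        for c in range(n_columns):
            number = c * rows + r + 1
            row.append(number if number <= n_questions else None)
        grid.append(row)
    return grid


# Six questions in two columns: each printed row pairs a question from
# the left-hand column with one from the right-hand column.
for row in column_major_layout(6):
    print(row)
```

With six questions the first printed row holds questions 1 and 4, matching the recommended layout; an odd number of questions leaves the last cell of the right-hand column empty.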
Question numbering
Numbering the questions is essential in order to avoid confusion. Some writers suggest that questions can have subletters (3a, 3b, 3c and so on). Their main reason for this is that it makes the total number of questions seem less than it is. We believe, however, that it is possibly confusing and tends to look fussy, so we recommend that questions are numbered 1, 2, 3 and so on without any sublettering. Where an overall question has a number of particular examples - as in questions 13 and 14 on the Travel Survey - there is no need for separate numbering; they can all be presented under the one question.

Sequence of questions
If possible, we begin with relatively straightforward questions that will be easy to answer. These will 'break the ice', building up the respondent's confidence in the survey. More complex and subtle questions are introduced later. If we wish to ask questions about personal matters such as age, sex,
ethnicity, income, and marital status, these are normally placed at the end, by which time we hope to have gained the respondent's full confidence. An alternative is to put these questions at the very beginning, which is why they are sometimes known as 'face-sheet data'. Although bureaucratically neat, in that it 'gets them out of the way', we believe that starting this way is potentially off-putting, and we do not recommend it.
Question skips
It is often impossible to devise a questionnaire in such a way that all the questions are appropriate for everyone to answer. We may well need to have filter questions: depending on your answer to a filter question, you either go on to the next question or skip to a later one. Too many question skips can be extremely confusing, and can make a questionnaire look cluttered. We therefore try to keep the number of filter questions to the absolute minimum, and to be as clear as possible about where we expect respondents to skip to. This is one of the reasons why questions should be numbered.

Filter questions often take the following form. Question 3 asks respondents if they play a musical instrument. If yes, they go on to answer questions 4, 5 and 6, which ask which instruments they play, how often they do so, and how much they enjoy it. If the answer to question 3 is no, then clearly questions 4 to 6 are irrelevant, so we need to instruct respondents: 'If no, please go to question 7'. Too much of this can be confusing, and sometimes rather annoying: if respondents have to keep skipping questions, it may seem as if the questionnaire is not really designed for them at all.

One way to minimize the confusion and potential annoyance is to include the skip within the question. Here is an example from the omnibus survey of international postgraduates:

Have you received any information about postgraduate training courses since you began your degree?
Yes □    No □    Can't remember □

If yes, who was offering to provide the courses advertised?
Training body outside the University □
The University Graduate School □
Faculty/Department □
Unsure who the provider was □
Other (please specify the provider) □
Another possibility is to require respondents to skip to a whole new section, as a way of eliminating confusion.

If a questionnaire has more than one or two question skips, it may well be a sign that something is wrong. Probably we are trying to survey too many different groups of people about too many different things. Perhaps we can cut out some of the questions? If not, then perhaps we can send different questionnaires to different groups? For example, the Survey Unit devised two versions of the Travel Survey questionnaire, one for students and one for staff. The main questions were exactly the same; all that differed were questions about the respondents' work.

Sending different questionnaires to different categories of respondent can obviously only be done if we can identify in advance the category to which a respondent belongs. What if we cannot do so? One possibility is to abandon the idea that we can conduct the research by self-completion questionnaire, which is simply too inflexible an instrument for our purposes. We should consider interviewing respondents. One advantage of the interview format is that it is the interviewer, not the respondent, who has to do the skipping.
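The logic of a filter question can be written down explicitly, which is one way of checking that every skip leads somewhere sensible. The sketch below encodes only the musical-instrument example given above (question 3, with 'no' leading to question 7). The names `SKIP_RULES`, `next_question` and `route` are hypothetical, and the questionnaire length of eight questions is an assumption made purely for illustration.

```python
from typing import Optional

# Hypothetical skip rules: (question number, answer) -> question to go to next.
# Only the musical-instrument filter from the text is encoded here.
SKIP_RULES = {(3, "no"): 7}


def next_question(current: int, answer: str, last_question: int) -> Optional[int]:
    """Return the next question to ask, or None when the questionnaire is finished."""
    target = SKIP_RULES.get((current, answer.strip().lower()), current + 1)
    return target if target <= last_question else None


def route(answers: dict, last_question: int) -> list:
    """List the questions a respondent should answer, given their answers."""
    path, current = [], 1
    while current is not None:
        path.append(current)
        current = next_question(current, answers.get(current, ""), last_question)
    return path


print(route({3: "No"}, last_question=8))   # questions 4 to 6 are skipped
print(route({3: "Yes"}, last_question=8))  # every question is asked
```

On a self-completion questionnaire the respondent has to follow this routing unaided; in an interview it is the interviewer who applies it, which is the advantage noted above.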
Conclusion
Here we thank our respondents for their cooperation. We may invite them to offer any further comments at the end of the questionnaire or on an additional sheet. We must also remember to let them know or remind them how to return the questionnaire to us. For example:

Thank you for taking the time to complete this questionnaire. If you would like to make any further comments please attach a piece of paper. Please return the questionnaire in the FREEPOST envelope provided either in the internal or external mail.
Box 6.7 The layout of the Travel Survey

The Travel Survey illustrates these principles in action. Each survey has its own particular difficulties to be overcome. We have already mentioned, on this page, the problem of designing a questionnaire suitable for both students and staff. In the end, a separate questionnaire was sent to each.

The sequence of questions was straightforward. We began with factual questions about respondents' journey to work, such as the distance from home to work, the time taken and the means of transport used. We then moved to questions which ask respondents for suggestions about how facilities could be improved. These questions require a little more reflection, but they may also appeal to respondents since they give them an
opportunity to have an influence on improving the University's provision. Finally, we ask a series of more personal questions. We hope that, having completed the earlier parts of our questionnaire, respondents will have confidence in us and our research. We offer the guarantee that 'under no circumstances will attempts be made to identify individuals'. Even though the questionnaire is anonymous this reassurance is still necessary, particularly so for people in a minority - a woman technician, for example.

If the sequence of questions was easy to decide, a far more troublesome issue was the sheer complexity of many people's journeys to work. People do not necessarily use the same means of transport every day of the week or every week of the year. Some variations follow a regular pattern, others are unpredictable. A car driver may use the bus on Fridays, when her partner has the car to visit his parents. A pedestrian or cyclist may take the bus or a taxi if it is raining heavily. A member of staff may use the car during school term time, in order to drop children off at school; outside school term, the parent may cycle to work. Some people have long and complicated journeys to work: they drive or walk to the railway station in Derby, depending on the weather, catch the train to Nottingham (or Beeston, if that particular train stops there), and then take a bus or a taxi to the university depending on the time and the state of their finances. The Survey Unit's questions had to be sensitive to all these possibilities, while keeping the questionnaire short and straightforward. Hence, for example, question 3: What mode of transport do you use most often for the longest stage of your journey to campus?

We also faced the problem of question skips. We wanted to ask people for their suggestions about how facilities could be improved. However, we did not want to ask car drivers to speculate about the needs of cyclists, or pedestrians to address what they guessed might be the concerns of people using public transport. Instead, we wished to ask people about problems with which they were themselves familiar. Hence the format of questions 8 to 14. We used a combination of visual and verbal signals (the instructions, the shading, and enclosing questions 8 to 14 within a border) to indicate who should answer which questions. The outcome, which appeared to be successful, is deceptively simple - but it took time to get it right.
Designing interview schedules
So far, we have concentrated on the design of self-completion questionnaires. What about interview schedules?

Essentially, the same principles apply. Even though the interview has the advantage that the researcher can explain any unexpected difficulties and try
to smooth over any sensitive items, this is no excuse for poor design. As much thought needs to go into an interview schedule as into a self-completion questionnaire. This is true even when the interview is informal and unstructured. The interviewer needs to become very familiar with the interview schedule or guide, so that the interview can proceed smoothly without the distraction of the interviewer fumbling for the next item.

The principles governing the sequence of items are the same as for self-completion questionnaires. Questions about personal details are usually held back until the end. As with questionnaires, it is helpful to indicate to interviewees any significant changes of topic within the interview. Here is the way in which Saunders (1990) introduced the various sections of his interviews with home owners in the UK:

I would like to begin by asking a few questions about your past and present housing.
I'm interested in getting some idea of how you spend your spare time.
I would now like to ask a few questions about your household's income and outgoings.
Finally, returning to the theme of your house and home . . .

Among the specific issues arising in the planning and execution of interviews, and not covered by our discussion of self-completion questionnaires, the most important are: using probes; using show cards, including prompts; recording the responses; and responding to interviewees' queries. We deal with each of these in turn.
Probes
Probes may be classified into two types: probes seeking more detailed factual information, and probes designed to encourage respondents to elaborate on their opinions or accounts of their own experience.

In a structured interview, we may need to probe respondents for fuller or more detailed information. In order to decide on what probes to use, we need to know exactly what information we require.

Unless an interview is entirely structured, there are likely to be occasions when we want to draw our interviewees out, asking them not merely for more information but also to expand on their thoughts, feelings and experiences. We do not, however, wish our interview to seem like an interrogation or inquisition. Box 6.8 gives a list of ways in which we can probe for a fuller response. We list them in order of intrusiveness, with the least intrusive first.
Box 6.8 Some sample probes for eliciting a fuller response

1 An expectant pause.
2 An encouraging sound: 'mm hmm', 'uh-huh'.
3 Repeating part or all of the interviewee's reply: 'So, you switched to sociology after your first year at university?'
4 Summarizing their response: 'So, your reason for switching to sociology was that you were aiming for a career in market research?'
5 Asking for an example: 'Can you give me an example of the problems you had with economics?'
6 Asking for clarification: 'I'm not quite sure I've understood why you were unhappy with economics. Could you tell me a little more?'
It is desirable to make this kind of probe as unthreatening as possible. An expectant pause is often enough. Silences, if they appear in danger of becoming embarrassing, can be filled with 'mm hmm's and 'uh-huh's, perhaps nodding the head to indicate encouragement. Repeating part or all of what the interviewee has said is often very effective in moving the interviewee to elaborate their earlier response. These ways of probing are typically more effective than bluntly asking for 'more details'. Although it is tempting to probe by asking, 'Is there anything more you would like to say?' or 'Is there anything you would like to add?', these are as much ways of bringing a topic to an end as they are of opening it up. They invite responses such as, 'No, that's about it', or 'I can't think of anything else, no'.
Show cards
We often want to ask a series of questions which have the same response categories. One way of handling this would be to ask: 'Would you say that you are very satisfied, satisfied, neutral, dissatisfied or very dissatisfied with the following?', followed by reading out each item in turn. This can be awkward, because it relies on the respondent's remembering what the response categories were. The longer the list, the greater the problem is likely to be. Respondents may say things like: 'I'm not very satisfied, no.' The problem will be that we do not know whether to record this as 'dissatisfied' or 'very dissatisfied', so we will have to ask, 'Does that mean that you are dissatisfied or very dissatisfied?' The respondent may feel that he is being corrected in some way for having failed to remember what the appropriate response categories were.

This difficulty is avoided by having the response categories written out on a show card, which we hand to the respondent. In this example, the response card is acting as a prompt or reminder to the respondent.

As well as acting as prompts, show cards can be used to present lists of
items to respondents, for example asking respondents to indicate which items they possess from a list of consumer goods. Similarly, a list of age bands or income brackets is typically given to respondents on a show card.

Show cards are also used to present material to respondents for comment. Vignettes are typically presented in this way. Interesting examples of vignettes may be found in Finch and Mason (1993) Negotiating Family Responsibilities.

Responding to interviewees' queries
In an interview, it is not uncommon for respondents to raise queries and questions. If these are points of clarification, or reassurance about confidentiality, we should be in a position to reply to them openly and straightforwardly.

In some cases, particularly in less structured interviews, respondents may ask questions about the interviewers' own beliefs and experiences. For example, in his interviews with clergy Aldridge was asked about his own religious beliefs or lack of them. His response was to say that he would be very happy to talk about those issues at the end of the interview, but would like for the moment to concentrate on the interviewee's own beliefs and experiences. Respondents were invariably happy to proceed in that way.
Recording the responses
In structured and semi-structured interviews, the researcher will typically be recording the responses as she goes along. Just as self-completion questionnaires have to be easy for the respondent to complete, so interview schedules have to be clear and straightforward for the interviewer. Things will be made easier if there is a clear visual distinction between the questions to be asked of the respondent, and instructions to the interviewer about question skips and probes. Very often, the instructions to the interviewer are printed in bold capital letters.

Unstructured interviews will usually be tape recorded, with the interviewees' agreement, whereas there is no point in taping a fully structured interview. What about semi-structured interviews? If there is a large number of open-ended questions, it may be worth recording the interview and transcribing the relevant parts of it. Full transcription is time-consuming: even with a transcription machine, for every hour's worth of recording you should allow five hours for transcription.
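The five-to-one rule of thumb makes it easy to budget transcription time before deciding how much to record. The following is a rough sketch only: the function name `transcription_hours` and the project figures (20 interviews of 45 minutes each) are hypothetical, chosen to show the arithmetic.

```python
# Hours of transcription per hour of recording (the rule of thumb in the text).
TRANSCRIPTION_RATIO = 5


def transcription_hours(n_interviews, minutes_each, ratio=TRANSCRIPTION_RATIO):
    """Estimate the hours needed to transcribe a set of recorded interviews in full."""
    recorded_hours = n_interviews * minutes_each / 60
    return recorded_hours * ratio


# Hypothetical project: 20 interviews of 45 minutes each,
# i.e. 15 hours of recording to be transcribed.
print(transcription_hours(20, 45))
```

An estimate of this kind shows quickly why it may be worth transcribing only the relevant parts of semi-structured interviews.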
Setting up for coding
When designing a questionnaire or schedule, we need to look ahead to the stage at which we will be analysing the data. All but the simplest surveys will call for inputting the data into a computer.
In self-completion questionnaires and structured interview schedules, all or most of the items will be pre-coded questions. We decide in advance what all the categories of response will be. In order to simplify and speed up the process of entering data into the computer, it is helpful to have numbers, in clear but unobtrusive typeface, by the side of each of the response categories. The Travel Survey shows how this is done. Question 3, for example, looks like this:

3 Which mode of transport do you use most often for the longest stage of your journey to work? Please ✓ all that apply

Walk □
Bicycle □
Rail □
Bus □
Car as driver □
Car as passenger □
Motorbike as driver □
Motorbike as passenger □
Even where respondents are presented with an open-ended question, we sometimes precode the responses. This is only possible where we have a reasonably clear idea of what the responses are likely to be. Where we have little idea, or where the range of potential answers is unmanageably large, then precoding is not possible. The open-ended questions on the Travel Survey (questions 8, 9, 12, and the last part of 13) were not precoded.

With open-ended questions, the problem of coding can be acute. Respondents express themselves in their own unique way, and we have to classify their responses under a predetermined heading. Where more than one person is involved in coding, consistency is even more difficult to achieve. Consistency is a problem, but the need to use our imagination and insight to interpret respondents' answers is not a problem but essential to the sociological imagination.
… have designed or taking part in the interview whose schedule …
• Are there any questions you are asking merely …
Further reading
Oppenheim (1992) Questionnaire Design, Interviewing and Attitude Measurement (new edition) is a clear and thorough guide. Devine and Heath (1999) Sociological Research Methods in Context examines eight major sociological studies, six of which used a survey as one part of their research strategy.
7 Processing responses
Key elements in th…
• Manual and autom…
• Formats for data fi…
Introduction
The information collected from respondents will normally need to be translated into a digital format in preparation for subsequent analysis by computer, a process that can be termed response processing. For large samples and lengthy questionnaires, this translation will involve an extensive amount of routine work, but it also requires some decisions to be taken that will shape the way data analysis can be conducted. In more detail, response processing entails the following elements:
1 The selection of a format for the digital data file within which cases and responses can be recorded, checked, analysed and, if necessary, transformed. The chosen format determines the framework within which elements 2 and 4 will be carried out.
2 The construction of a codebook - a paper or computerized list identifying and labelling the set of variables that the researcher decides to derive from the questions and the responses, together with a sequence of (normally) numeric codes and textual labels that represents all the possible types of response and non-response for each variable.
3 In conjunction with the format of specific questions, the codes chosen in element 2 will determine the level of measurement of each variable: this consideration has major implications for data analysis and is discussed on page 129.
4 Coding - the selection of an appropriate code from the codebook for each case/question and its entry (keyboarding, data input) onto the computer data file: although it is better reserved for this specific process, coding is sometimes used as an equivalent to response processing as a whole.
5 Checking and cleaning the data file.

Some of the methods that can be adopted to handle the response-processing phase deal with particular elements in this list automatically, but it is important for the researcher to appreciate what is happening 'behind the scenes' to prevent unwanted options being adopted by default.

The following sections cover each of the five listed items, starting with data input because the scale of this labour-intensive task in medium and large projects can overshadow the other elements in response processing.
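To make elements 2 and 4 concrete, a codebook entry and the coding step can be sketched as follows. This is an illustration under stated assumptions rather than the format of any particular package: the `Variable` class, the variable name `mode`, the code numbers 1 to 8 and the non-response code 99 are all hypothetical, and the transport question is treated as a single-response item for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One codebook entry: a variable derived from a question."""
    name: str    # short variable name used in the data file
    label: str   # textual label describing the variable
    level: str   # level of measurement, e.g. "nominal"
    codes: dict = field(default_factory=dict)    # numeric code -> response label
    missing: dict = field(default_factory=dict)  # numeric code -> type of non-response


# Hypothetical entry for the transport question. The code numbers are
# assumptions; the numbers printed on the real questionnaire are not given here.
MODE = Variable(
    name="mode",
    label="Mode of transport used most often for longest stage of journey",
    level="nominal",
    codes={1: "Walk", 2: "Bicycle", 3: "Rail", 4: "Bus", 5: "Car as driver",
           6: "Car as passenger", 7: "Motorbike as driver",
           8: "Motorbike as passenger"},
    missing={99: "No answer"},
)


def code_response(variable: Variable, response: str) -> int:
    """Coding step: look up the numeric code for a ticked response label."""
    for code, text in variable.codes.items():
        if text.lower() == response.strip().lower():
            return code
    # Anything unrecognized falls back to the first non-response code.
    return next(iter(variable.missing))


print(code_response(MODE, "Bus"))
```

Keeping the non-response codes in the same entry as the substantive codes reflects the point made in element 2: the codebook has to represent every possible type of response and non-response for each variable.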
Manual, semi-automated and automated data input
The use of a computer to assist response processing and subsequent phases has been presumed principally because manual data analysis is slow, inflexible, limited to basics and error-prone. Very few surveys are now analysed by hand (the exceptions are mainly informal, 'in-house' exercises based on handfuls of respondents and pilot projects where the substantive results may be irrelevant). Computer packages in general are discussed in Chapter 3 but here it is worth noting the variety of possibilities for computer-assisted response processing, starting with the highest levels of automation.

Electronic surveys
Email surveys, web surveys, and surveys conducted using some computer packages (for example, KeyPoint) allow respondents to generate and complete an electronic questionnaire or form on a computer screen. The codebook, codes and levels of measurement are all determined at the point the
form is designed so that when a case is returned via email or disk, the responses are already in a digital format ready for analysis. A major attraction of this option to researchers is that the labour of data input is performed by the respondent! A downside is that all respondents require computer access.
Optical Mark Readers (OMR)
An intermediate degree of automation of data entry is possible through the use of OMR equipment. This technique employs custom-printed paper questionnaires in which pen marks made by respondents in pre-designated areas are detected in a scan of the form. The responses that are detected are electronically compiled into a data file. OMR comes in two variants, very expensive dedicated machines that are designed for heavy duty and high throughput (these will be beyond the budgets of most one-off projects), and software applications (such as Remark OMR, published by Principia Products) that run on an ordinary desktop computer and read the input from a conventional flatbed scanner linked to the PC. In both cases, the codebook, codes and levels of measurement are all determined at the point the questionnaire is designed.

A relatively simple cost-benefit calculation should indicate whether OMR is an attractive option. In addition to the PC, scanner and software package, the costs need to include printing the custom questionnaires, whose layout is constrained by the requirements of the software. In addition, there are some less obvious costs and limitations. Forms need to be fed into the scanner by hand (time-consuming) or mechanically (an additional expense); open-ended textual responses cannot be dealt with via OMR and have to be filtered out to a different technology; there is always a finite error rate in the scanning process caused by ambiguous respondent marks which have to be resolved by human inspection; the process of scanning OMR forms generally monopolizes the capacity of a desktop computer which cannot be used simultaneously for other purposes.
Specialized data entry software

Computer-assisted telephone interviewing (CATI) and computer-assisted personal interviewing (CAPI) software in which the appropriate questions and prompts appear on an interviewer's screen, dedicated data entry programmes (such as SPSS Data Entry II) and many general survey packages offer facilities designed to accompany and assist the manual entry of data via a computer keyboard. The functions available include:

• Data-entry screens: Some software (particularly the dedicated packages) allow the design of customized computer screen displays for data entry (a choice may be offered between ticking boxes on an on-screen facsimile of the questionnaire/interview schedule, or entry of values into the cells of a spreadsheet where each row represents a case and each column a variable). Supplementary information to assist coding decisions can be supplied to inputters via windows or boxes. Such devices are particularly useful where data input is sub-contracted to hired hands.
• Constrained entry fields: Software can prevent too few or too many characters being entered for a specific response.
• Double entry: To ensure input accuracy, each data item is typed twice with the software testing for consistency.
• Routing: When a response to a question with a filter takes a branch that skips ahead, the software can ensure that the point of entry for inputting the next response automatically jumps ahead also, minimizing input errors.
• Bounds checking: Numerical limits can be set to ensure that impossible values are not entered as a result of keyboard errors (for instance, in a survey with a target population of employees, attempts to enter a respondent age outside the range 16-65 inclusive could be prohibited).
• Consistency tests: These ensure that empirically unlikely or inconsistent combinations of characteristics within a case are automatically detected and highlighted (for example, a household which appears to own 20 cars or an individual whose responses to different questions suggest a current status of both employed and unemployed). Such cases could be the result of interviewer mistakes or respondents misunderstanding questions.
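The checks listed above can be sketched in a few lines of Python. This is only an illustration of the ideas, not the behaviour of any particular package: the function names, the field names ("employed", "unemployed", "cars") and the ceiling of 10 cars are assumptions made for the example, while the 16-65 age range comes from the text.

```python
def check_length(value, min_len, max_len):
    """Constrained entry field: accept only min_len..max_len characters."""
    return min_len <= len(value) <= max_len


def check_bounds(value, low, high):
    """Bounds checking: accept only values in the inclusive range low..high."""
    return low <= value <= high


def double_entry_mismatches(first_pass, second_pass):
    """Double entry: return the names of items whose two keyings differ."""
    return [name for name in first_pass
            if first_pass[name] != second_pass.get(name)]


def consistency_problems(case, max_cars=10):
    """Consistency tests: flag unlikely or contradictory combinations.

    The field names and the max_cars threshold are hypothetical.
    """
    problems = []
    if case.get("employed") == 1 and case.get("unemployed") == 1:
        problems.append("both employed and unemployed")
    if case.get("cars", 0) > max_cars:
        problems.append("implausible number of cars")
    return problems


# A respondent age of 70 falls outside the 16-65 range used in the text.
age_ok = check_bounds(70, 16, 65)

# The second keying of 'age' disagrees with the first.
mismatches = double_entry_mismatches({"age": 34, "cars": 1},
                                     {"age": 43, "cars": 1})

# A case that trips both consistency tests.
flags = consistency_problems({"employed": 1, "unemployed": 1, "cars": 20})
```

Flagged cases are reported rather than rejected outright, since the text notes that they may reflect interviewer mistakes or misunderstood questions and so need human inspection.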
Standard office application software

If dedicated software is unavailable, a standard office spreadsheet or database application will provide a more than adequate manual data entry environment that may offer several of the facilities mentioned above. Failing these, a word processor or text editing application will suffice, but some manual way of checking the integrity of the data entered (or a sample of it) should be considered.
Data file formats and data types

If an integrated survey software package has been selected for use in a project, this choice will probably have determined the file format in which the input data is held for analysis and presentation purposes. Such packages tend to employ their own proprietary file formats that are very unlikely to be directly compatible with any of the others. However, many packages allow a range of additional formats to be recognized and imported semi-automatically (and also, to a lesser degree, to be exported).
If data input has to take place before a computer analysis package has been earmarked, the safest strategy is to input the data using a widely recognized file format such as CSV (comma separated value) or TSV (tab separated value). These are very simple text-only files containing a series of data items (the numeric value or series of alphanumeric characters representing each response) separated by a comma or tab character acting as a delimiter (or 'punctuation') between each item. A carriage return character punctuates each respondent's data (each case). A CSV file will consist entirely of printable characters (including spaces) plus carriage returns, and a TSV file printable characters plus tabs and carriage returns. Such files can be created easily in word processors and spreadsheets (exporting in text only mode) and text editors. A useful option in CSV files is to include as the first 'row' of the file (up to the first carriage return) a list of the names (in quotation marks and separated by commas) of each variable in turn that the data items represent. Any application capable of importing the data file should automatically recognize these as variable names and display them appropriately.

Survey computer packages are capable of recognizing a finite range of data types (that is, variable values). It is obviously crucial to ascertain that the package you select can handle all the types that are relevant to the project in hand. Nearly all packages can handle integers (whole numbers), floating points (decimals), strings (or alphanumerics, with values made up of letters, numbers and spaces) and dates. Some have additional data types for currency and freetext (continuous prose). Packages also differ over matters such as limits on the total number of variables, the maximum and minimum permitted values of integer and floating point variables, the maximum number of decimal places that can be processed, the maximum length of strings, and the characters that can be included within strings (such as punctuation marks and accented characters). A crucial consideration if a questionnaire contains open-ended questions is whether a package can handle lengthy comments. Text handling in several packages is restricted to string variables with a maximum of 255 characters. If this is the case, lengthy responses will either need to be split between several variables, or open-ended comments may have to be directed to an alternative package for analysis.
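As a minimal sketch of the file layout described above, the following Python uses the standard csv module to write a small CSV file with a quoted first row of variable names and then read it back. The variable names and the two cases are invented for illustration, and the helper split_text is a hypothetical way of dividing a long comment across several 255-character string variables.

```python
import csv
import io

# Two illustrative cases; the variable names and values are assumptions.
cases = [
    {"case_id": 1, "age": 34, "mode": "Rail"},
    {"case_id": 2, "age": 51, "mode": "Bicycle"},
]
fieldnames = ["case_id", "age", "mode"]


def to_csv(rows, names):
    """Write rows as CSV text, with the variable names as the first row."""
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC puts quotation marks round names and string values.
    writer = csv.DictWriter(buffer, fieldnames=names,
                            quoting=csv.QUOTE_NONNUMERIC)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Read CSV text back; the first row supplies the variable names."""
    return list(csv.DictReader(io.StringIO(text)))


def split_text(text, width=255):
    """Split a long comment into pieces of at most `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


csv_text = to_csv(cases, fieldnames)
first_row = csv_text.splitlines()[0]
recovered = from_csv(csv_text)
pieces = split_text("x" * 600)
```

Values read back from a text-only file arrive as strings, so a numeric variable such as age has to be converted (for example with int) before analysis; this is the job a survey package does when it is told each variable's type.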
Constructing the codebook

Before packages for the survey process became available for desktop computers, a codebook listing the sequence of variables with their code assignments had to be written out by hand. Each software package now has its own procedures for eliciting the codebook information which enables it to find every variable in the data file, store the values internally and display the data in a helpful format on screen (providing the names of the variables in a CSV file, as described above, is one way of communicating a vital part of the codebook). Here is a list of items of information that a package might require to construct its database:

• Variable location: In some types of file format, the computer package needs to know where within the columns of data in the data file the values for a particular variable begin and end.
• Variable type: As indicated above, survey software may recognize a range of types and require them to be distinguished.
• Numeric format: The character of numeric variables may need to be further specified - for example, as integer, specific floating (decimal) point format, scientific notation.
• Variable name/label: Packages may operate with internal reference names that are constrained in length and starting character, but may permit more intelligible variable labels for display purposes.
• Display formats: It may be possible to control the way the value of a variable is displayed on screen independently of the format in which it is stored within the database.
• Value labels: These supplement the actual values for display purposes with explanations of the meaning of each kind of response/code.

Although some of this information will be mandatory, other elements such as the labels are normally optional. However, it is usually well worth the time involved to label fully any database file. It is particularly important if there is a team conducting the analysis or reviewing the results in order to prevent misunderstandings over how variables or codes have been constructed. Even solo researchers returning to a file after a long interval can forget the detail of decisions taken months or years previously: a fully labelled database file reduces the researcher's dependence on memory.
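One way to picture a codebook is as a small data structure holding, for each variable, the items listed above. The sketch below is an assumption about how such a structure might look in Python, not the internal format of any survey package; the class name Variable, the variable names q11 and q14 and the label wording are hypothetical, although the codes 1 to 3 for question 14 follow the coding discussed in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One codebook entry: name, label, type and optional value labels."""
    name: str                      # short internal reference name
    label: str                     # more intelligible label for display
    var_type: str                  # e.g. 'integer', 'float', 'string'
    value_labels: dict = field(default_factory=dict)

    def display(self, value):
        """Show the value label if one exists, otherwise the raw value."""
        return self.value_labels.get(value, str(value))


# A two-variable codebook; the label wording is illustrative only.
codebook = {
    "q14": Variable(
        name="q14",
        label="Likelihood of changing commuting pattern",
        var_type="integer",
        value_labels={1: "Very likely", 2: "Possibly", 3: "Unlikely"},
    ),
    "q11": Variable(
        name="q11",
        label="Public transport cost (pounds sterling)",
        var_type="float",
    ),
}

labelled = codebook["q14"].display(2)
unlabelled = codebook["q11"].display(3.5)
```

Keeping the labels alongside the codes in one place is what lets a colleague, or the same researcher months later, read a value of 2 on q14 as 'Possibly' without relying on memory.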
Levels of measurement

As Chapter 2 made clear, quantitative analysis of any kind requires the key theoretical concepts in an area of inquiry to be operationalized, that is, to be associated with empirical observations or measurements. In survey research, the question design, the selection of categories to classify the responses and the choice of codes to represent the categories jointly determine the exact manner in which a concept will be transformed into one or more corresponding variables. The choice of categories to classify the responses is fundamental within operationalization because it is the relations that exist between the different categories that set limits on what sort of measurement will be possible and how sophisticated any statistical analysis of the resulting data will be. Four types of relations, or levels of measurement, are conventionally distinguished. They are presented below in ascending order of the sophistication of measurement that can be conducted.
Nominal level

Where the set of categories for a variable possesses no intrinsic order or scale, classification rather than measurement proper applies. Consider, as an illustration, question 20 on the Travel Survey questionnaire. There is no intrinsic order for the different working areas within the University. 'Libraries' could have been listed first and coded '1' with the Arts Faculty listed last and given code '10' without anything being upset. Alphabetical orderings of categories are conventional, not intrinsic, orders, so '1' is merely being assigned here as a convenient numerical label (the categories could have been assigned any set of numeric or non-numeric codes as long as each was different - starting at 1 and counting up is simply a matter of convention). Where codes are arbitrary, they are not numbers in mathematical terms and it is not legitimate to carry out arithmetic operations on them (in the above example, there is no sense in which a case in the Arts Faculty is nine more or less than one in the Libraries). As a result, the types of statistical analysis that are possible on nominal variables are severely restricted.

Ordinal level

In a variable at an ordinal level of measurement, the categories do have an intrinsic order. Consider Travel Survey question 14. There is clearly a scale of likelihood of change within the categories so that responses in the 'very likely' categories (coded 1) represent greater likelihood of change in commuting patterns than those in the 'possibly' boxes (coded 2), and the latter stand for greater degrees of likely change than those in the 'unlikely' boxes (coded 3). Ideally, 'very likely' should be coded 3 and 'unlikely' 1, so that higher codes stand for greater likelihood; a scheme that ignores the order, such as 1 for 'possibly', 2 for 'unlikely' and 3 for 'very likely', is intuitively wrong. Even though the numbers assigned reflect the ordinal nature of the categories, there are still restrictions on how the codes can be manipulated arithmetically, although a wider range of statistics can be used at the ordinal than at the nominal level of measurement.
Interval level

There are very few variables with interval scales in common use in the social sciences, although temperature measured in Fahrenheit provides a natural science example. A unit of measurement is introduced into the picture that enables the 'interval' or quantitative difference between any two measured cases to be established. Because the bottom of a temperature scale like Fahrenheit is not anchored in the real-world constant, absolute zero, but at an arbitrary (but convenient) point, it is not legitimate to multiply or divide temperatures, only to add and subtract them to find the difference between them. Thus, if Manchester is 80° F and Montevideo is 40° F, you can say it is 40° F hotter in Manchester, but you cannot say that it is twice as hot.
Ratio level

In the highest level of measurement, the bottom of the scale is grounded in a 'non-arbitrary zero' point and there are fewer restrictions on the ways values can be legitimately transformed. Social scientific examples include age, income and population density. Consider also question 11 on the Travel Survey. Each respondent who travels by public transport enters a sum in pounds sterling that becomes the value for that case on that variable. It is legitimate to carry out any basic arithmetic operation on these values, so if one case has a value of £3.50 then it truly represents twice the expenditure of another with a value of £1.75.

To summarize, in nominal variables, classification rather than measurement has taken place (although it is still possible to make some quantitative statements about the distribution of cases between the categories). In ordinal variables (such as those with Likert-type categories), only the positions of cases relative to each other on the variable have been measured. In ratio variables, a number of units have been assigned to each case in a way that establishes their absolute positions on the dimension being measured.
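The practical consequence of the four levels can be illustrated with a short Python sketch. The table of permitted summaries below follows a common textbook convention (mode for nominal data, median added at the ordinal level, mean added at the interval and ratio levels) and is an assumption of this example rather than something laid down in the text; the sample values are invented. The temperature conversion shows why 80° F is not 'twice as hot' as 40° F once both are expressed on a scale with a true zero (kelvin).

```python
from statistics import mean, median_low, mode

# Conventional rule of thumb for which summaries suit which level.
PERMITTED = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}

SUMMARY_FUNCTIONS = {"mode": mode, "median": median_low, "mean": mean}


def summarize(values, level):
    """Return only the summaries that are legitimate at this level."""
    return {name: SUMMARY_FUNCTIONS[name](values)
            for name in PERMITTED[level]}


def fahrenheit_to_kelvin(deg_f):
    """Convert to kelvin, a temperature scale with a non-arbitrary zero."""
    return (deg_f - 32) * 5 / 9 + 273.15


# Nominal: working areas can be counted but not averaged.
nominal_summary = summarize(["Arts", "Libraries", "Arts"], "nominal")

# Ordinal: codes 1-3 have an order, so a median is also meaningful.
ordinal_summary = summarize([1, 2, 2, 3, 3, 3, 1], "ordinal")

# Interval: the ratio of kelvin temperatures is close to 1.08, not 2.
temperature_ratio = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)

# Ratio: expenditure has a true zero, so this ratio is meaningful.
spending_ratio = 3.50 / 1.75
```

median_low is used so that the ordinal median is always one of the actual category codes rather than an average of two neighbouring codes.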
Pre-coding and post-coding

It should be clear from the last section that the value a case or respondent possesses on a variable measured at the ordinal level or above is not strictly speaking a code at all but a non-arbitrary measurement that has to be respected. This is true whether the measurement has been carried out by the respondent themselves, as in question 11 on the Travel Survey, or by (or on behalf of) the researcher as, for example, where psychological or clinical tests are administered and a score or measure is extracted by the researcher. Of course, it may be necessary and entirely legitimate to transform the original values in ratio variables by, for instance, converting imperial to metric units or dates to a unit of elapsed time. Variables will frequently also need to be linked and integrated in various ways and this is another respect in which computer survey packages can assist and make the researcher's life a lot simpler.

For nominal and ordinal variables, it is useful to assist data entry by printing the codes to be assigned beside the appropriate categories, choices and options on the questionnaire. This should be done discreetly in a small typeface so that it does not distract respondents. If the responses to a question are finite and all of them can be anticipated in advance (as in the question, 'Which of the following products have you bought in the last month?: Product A, Product B, Product C') then the question can be pre-coded - that is, all the codes for each response can be determined and printed on the questionnaire in advance of distribution. In other cases, respondents will return some values that cannot be predicted in advance, or there may be so many possibilities that they cannot be listed. (A question eliciting the titles of recently viewed films would present this problem.) A set of responses that cannot be anticipated must be post-coded after data collection. Thus, in the film example above, once the titles have been supplied, it is then possible to code them in any appropriate way, for example, into genres (action films, romances, comedies, and so on). A common question format (see the Travel Survey, question 20) is to present a set of pre-coded categories intended to cover the majority of respondents, followed by a residual 'Other' category (which will be post-coded) for a minority of special cases. All open-ended questions, by definition, must be post-coded.
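Post-coding the film example might be sketched as follows. The film titles, the genre codes and the residual code are all hypothetical placeholders invented for this illustration; in practice the coding frame is drawn up by the researcher only after the responses have been inspected.

```python
# Codes for the genres decided on after data collection (hypothetical).
GENRE_CODES = {"action": 1, "romance": 2, "comedy": 3}
OTHER_CODE = 4  # residual code for titles not yet classified

# Coding frame built after reading the returned titles (placeholder titles).
coding_frame = {
    "example action film": "action",
    "example romance film": "romance",
    "example comedy film": "comedy",
}


def post_code(title, frame):
    """Return the genre code for a free-text film title."""
    genre = frame.get(title.strip().lower())
    return GENRE_CODES.get(genre, OTHER_CODE)


responses = ["Example Comedy Film", " example action film ", "Unlisted Title"]
coded = [post_code(title, coding_frame) for title in responses]
```

Normalizing the text (trimming spaces and ignoring case) before the lookup avoids treating trivial differences in how respondents wrote the same title as separate categories.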
Missing data

A general principle in response processing is to avoid using blanks in the data file to stand for cases/questions where there are no values to enter. This is because it is hard to know whether a blank is a deliberate coding decision or an accidental coding slip. It is better practice to reserve dedicated codes for missing data values, chosen to be distinct from possible substantive values. Conventionally, codes such as '99' and '-1' are used for these purposes and some computer survey packages can include or exclude them from statistical processing according to the user's preference. In some situations, it may be desirable to identify the reasons, where they are known, for data being missing. In an interview-based inquiry, for instance, it could be helpful to use three different codes to distinguish between data that is missing because the question was 'not applicable' to the respondent, because of 'respondent refusal', or because the 'question [was] not put'.
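A brief sketch of dedicated missing-data codes in Python is given below. The specific codes 97, 98 and 99 and the sample ages are assumptions chosen for the example; they work here only because no employee in a 16-65 age range could have them as a genuine value.

```python
from collections import Counter
from statistics import mean

# Hypothetical missing-data codes, distinct from any valid age (16-65).
MISSING_CODES = {
    97: "not applicable",
    98: "respondent refusal",
    99: "question not put",
}


def valid_values(values):
    """Drop the missing-data codes before statistical processing."""
    return [v for v in values if v not in MISSING_CODES]


def missing_breakdown(values):
    """Count how often each reason for missing data occurs."""
    return Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)


ages = [34, 99, 52, 97, 28, 98, 99]

mean_excluding_missing = mean(valid_values(ages))
mean_including_missing = mean(ages)  # misleading: treats codes as ages
reasons = missing_breakdown(ages)
```

Comparing the two means shows why the codes have to be excluded from calculations: left in, they inflate the average age from 38 to well over 70.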
Multiple responses

In some question formats, such as question 4 on the Travel Survey, a respondent can legitimately 'tick' several different response categories. There are three alternative methods of assigning codes to these multiple response questions. Each method preserves all the information supplied by the respondent and which one is most convenient depends mainly on how the data is going to be analysed.

Multiple dichotomies

The researcher creates several dichotomous variables in the codebook, one for each tick box a question contains. Each variable has one of its values (by convention, '1') representing the situation where a respondent has ticked that box, and the other value (by convention, '0') representing the situation where the respondent has not selected the category. For question 3, eight dichotomous variables will be created (Walk, Bicycle, Rail, etc.), one for each tick box. For analysis, the multiple dichotomies usually need to be recombined in some manner into more inclusive variables.
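The multiple dichotomies method can be sketched like this. Only three of the eight tick boxes are named in the text, so the list of options below is deliberately incomplete and the respondent's answer is invented; the count of ticked boxes is just one hypothetical way of recombining the dichotomies into a more inclusive variable.

```python
# Three of the tick boxes named in the text; the full list is not given.
OPTIONS = ["Walk", "Bicycle", "Rail"]


def to_dichotomies(ticked, options):
    """One 0/1 variable per tick box: 1 if ticked, 0 if not."""
    return {option: int(option in ticked) for option in options}


def count_ticked(dichotomies):
    """Recombine the dichotomies into a single count of selections."""
    return sum(dichotomies.values())


respondent = to_dichotomies({"Walk", "Rail"}, OPTIONS)
number_of_modes = count_ticked(respondent)
```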
Ordinal choices

Suppose, for illustration, that the respondent can make three selections within the same question. The researcher creates three variables in the codebook, called say Choice1, Choice2 and Choice3, representing the first, second and third respondent selections respectively. Each of these variables has a set of values corresponding to all the tick boxes. Choice1 may or may not represent a first preference depending on the question wording.
Binary coding

Only one variable is created in this solution. Each tick box within a question is given in turn a numerical code in the sequence 1, 2, 4, 8, 16, 32 . . . The variable as a whole is assigned a value equivalent to the arithmetic total of all the codes corresponding to ticked boxes. Any total represents a distinct collection of ticked choices: if this system was used in question 4 of the Travel Survey, a code 'total' of 14 for the question would mean that the respondent had checked 'Walk', 'Bicycle' and 'Rail' as alternative modes of transport.
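The arithmetic behind binary coding can be checked with a short Python sketch. The function names are assumptions made for this example. Tick boxes are identified here by their position on the questionnaire (1 for the first box, 2 for the second, and so on), so a total of 14 decodes as 2 + 4 + 8, that is, the second, third and fourth boxes; which response categories those are depends on the order in which the questionnaire lists them.

```python
def encode(ticked_positions):
    """Sum the codes 1, 2, 4, 8 ... for the ticked box positions."""
    return sum(2 ** (position - 1) for position in set(ticked_positions))


def decode(total):
    """Recover the ticked box positions from a binary-coded total."""
    positions = []
    position = 1
    while total > 0:
        if total & 1:          # lowest binary digit set: this box was ticked
            positions.append(position)
        total >>= 1            # move on to the next box
        position += 1
    return positions


total_for_boxes_2_3_4 = encode([2, 3, 4])
boxes_for_total_14 = decode(14)
```

Because every box has its own power of two, no two different sets of ticks can produce the same total, which is why a single variable can hold the whole pattern of responses.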
Checking and cleaning the data

There are two key points within response processing where checking and cleaning the data is advisable. One is immediately after data collection. Particularly in cases where a postal survey has been carried out and the data entry is being done by hired hands or an agency, it is worth going through the forms before they are passed on, looking for problem cases. These may be ones where key fields lack responses (is it worth processing these cases?) or where there are data input dilemmas - perhaps respondents have made comments which were not catered for, or have provided multiple responses where they were not anticipated, or answers which are simply puzzling. It is best to assess the scale of such problems early on and to resolve them before they can affect the confidence and productivity of any personnel hired to carry out data input and coding.

The second strategic point for checking is when the computer survey package has read all of the data available. A few simple checks here can confirm that there are no serious codebook errors. Does the number of cases the software sees match what you anticipated? A frequency count (see page 139) of some key variables will show whether there are any 'impossible' values in key variables which could not correspond to valid responses.
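The two checks suggested for the second checking point can be sketched in Python as follows. The data, the variable name q14 and the expected number of cases are invented for the example; the valid codes 1 to 3 follow the coding of question 14 described earlier in the chapter.

```python
from collections import Counter


def frequency_count(cases, variable):
    """Count how many cases take each value on one variable."""
    return Counter(case[variable] for case in cases)


def impossible_values(cases, variable, valid_codes):
    """Return values (with counts) that match no valid response code."""
    counts = frequency_count(cases, variable)
    return {value: n for value, n in counts.items()
            if value not in valid_codes}


def case_count_matches(cases, expected):
    """Does the number of cases read match the number anticipated?"""
    return len(cases) == expected


# Five illustrative cases; the value 7 is a keying error.
cases = [{"q14": 1}, {"q14": 3}, {"q14": 7}, {"q14": 2}, {"q14": 3}]

counts = frequency_count(cases, "q14")
suspect = impossible_values(cases, "q14", valid_codes={1, 2, 3})
count_ok = case_count_matches(cases, expected=5)
```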
Key summary Unless you are c< fui of respondent
file and possibilit y that m
Further reading

Although levels of measurement are dealt with in nearly all survey and statistics texts, response processing gets relatively little coverage. Exceptions are Rose and Sullivan (1996), Chapter 3, de Vaus (1991), Chapter 14, and Bourque and Fielder (1995), section 5.
Strategies for analysis
Key elements in this chapter
Introduction

Stated in broad terms, the main goals in survey analysis are the creation of illuminating accounts, persuasive narratives and plausible explanations, grounded in the survey findings, concerning the social structures, groups, and processes under investigation. The statistical summaries of aggregates and sub-groups and the creation of statistical maps of the inter-relationships between key variables are intermediate steps towards these goals. Some textbook accounts of survey analysis give an almost exclusive emphasis to statistical description and inference, but the imaginative analysis of survey data necessarily transcends statistical reasoning. Moving between statistics and substance demands skill and intellectual agility, with technical statistical knowledge playing an important but not a directing role because statistical methods are a tool not an end. This chapter gives an appropriate emphasis to statistical analysis, but it refers the reader to some of the many texts that are available for more detailed treatments of statistical topics.

A prime objective in what follows is to assist inexperienced surveyors to avoid the danger of getting trapped in a mass of data and losing the plot, for themselves and for the readers of any report based on the survey. This is more of a danger in descriptive than analytic surveys because in the latter the initial questions which fed into the research design provide a landmark from which analysts should be able to get their bearings. However, the strategy sometimes suggested for analytic surveys of organizing analysis exclusively around the empirical testing of a set of specific propositions carries its own dangers. A narrow focus on particular relationships identified in advance can make analysts unreceptive to novelty and the unexpected within the data. For descriptive surveys, statistical strategies for exploring data in the absence of explicit hypotheses are available (see the further reading section at the end of the chapter).
Dimensions of analysis

There are various ways in which the different aspects of survey analysis can be dissected. The three dimensions identified below are conceived to be potentially present in the analysis of any survey although they may not be equally exploited. Unsurprisingly, the descriptive dimension tends to dominate in primarily descriptive surveys. The analytic and contextual aspects will be more pronounced in analytic surveys, but the art of analysis is to promote the development of all three so that the research potential of a survey is fully realized. Ensuring the survey report gives adequate emphasis to each of these dimensions will facilitate the production of the rounded accounts referred to above.
Descriptive

The analyst needs to provide the audience for the survey with a description of the survey findings using such devices as frequency counts for the main variables, statistical summary measures, graphs and charts, and direct quotation from respondent comments. However, a descriptive tour through a questionnaire or interview schedule, in question order, should be avoided for several reasons. First, the structure of the questionnaire is not normally devised with the presentation of results in mind. Second, in order to construct a narrative of the kind mentioned above, the commentary needs to interweave the descriptive with the analytic and contextual dimensions and not present the descriptive material in a block. Third, such an unselective and unstructured approach will quickly lose readers' interest. Description ordered by themes derived from the contextual dimension is nearly always preferable to a question by question catalogue.
Analytical

The analyst will be seeking (to a greater degree in analytic and to a lesser degree in descriptive surveys) to establish links between different questions/variables/sub-groups, to create measures and indicators out of existing variables to represent theoretical concepts, and to develop statistical models or other frameworks that explain aspects of the data. Cross-tabulation, tests of significance, measures of difference, the apparatus of statistical inference which allows generalization from sample data back to the target population, multivariate analysis, model building and testing procedures, all these serve the analytic dimension.
Contextual

The narrower context is the set of hypotheses or problems that were highlighted in the research design, and possibly the theoretical concepts and frameworks which lie behind them. If a statistical model has been developed from the survey data, it will require re-interpretation in terms of 'real world' social structures and interactions. The analyst will need to relate the empirical findings back to the starting points of the research, and at the same time these starting points can supply the themes which determine what statistical description is retained for the final report and how it is organized. Along the way, the analyst will need to highlight the areas in which there are decisive empirical outcomes and summarize those where the findings are ambiguous or contradictory.

There is, in addition, the broader context. In the majority of surveys, both analytical and descriptive, the researcher will be keen to locate the outcomes against the backdrop of the relevant setting. In academic or policy spheres, this may entail an evaluation of the significance of the data for rival approaches or previously published findings. In an institutional setting, the implications of key findings for organizational decision-making or community policy may be relevant.
Analysis of open-ended responses

The remainder of this chapter is concerned with statistical procedures but the responses to open-ended questions require a radically different approach that often necessitates the development of a classificatory scheme, so this topic will be dealt with first.

Open-ended material can be generated in a range of circumstances that determine what kind of analysis is appropriate. At the simplest end of the continuum is the 'Other' space provided after a closed question on a questionnaire or in an interview for cases that belong to categories that could not be anticipated in advance. Normally, the respondent is expected to supply just a word or short phrase and the analysis of such material merely requires new codebook categories to be created for the relevant variable (see Chapter 7).

A closed question may, however, be followed by a larger 'space' in which the respondent is invited to expand on or explain a previous response. For example, in the Travel Survey, question 13 was a closed question that offered respondents six considerations that could be influencing their use of a car for commuting, with the instruction to tick one of three categories of importance. The open-ended question that followed was, 'If there are any other important considerations for you not mentioned above, please describe them here'. This and similar open questions are effectively an invitation to the respondent to add for themselves further rows to the previous question and then respond to them. The framework of the closed question is still operative for both the respondent and the analyst: the responses will be in parallel to, or in extension of, the previous answers and analysis of the additional material should be straightforward.
The most complex open-ended questions to analyse are those that invite the respondent to make a very general or summary response to a topic with the designer deliberately withholding the context for the way the answers are to be framed. An example is, 'What do you think is the most serious problem facing the current Government?' The respondent could approach this in a variety of ways. It could lead some respondents to consider short-term, party political difficulties, while others might be prompted to think about much more abstract issues such as the loss of community or the decline of the public sphere. In addition to lying at different levels of concreteness/abstraction, the material generated may also require the introduction of conceptual distinctions. Thus, some responses will be based on generally recognized perceptions of the gravity of 'established' social problems, while others will stem from a much more personal and localized sense of political grievance and alienation. If there are no more than two separate dimensions underlying the responses, a classification can probably be developed manually and can be applied by sorting the text of the responses into the categories to which they belong, possibly in a word processor. More complex examples will require the use of a qualitative computer package into which all the response text can be transferred for manipulation and analysis.
Among the packages that offer appropriate facilities are Ethnograph, Nvivo, Nud*ist, HyperRESEARCH and Atlas.ti (all distributed in the UK and North America by Scolari - Sage Publications Software).
Table 8.1  Main mode of travel to work (q3)

Valid                Frequency  Percent  Valid percent  Cumulative percent
Walk                        50      8.5            8.5                 8.5
Bike                        67     11.4           11.4                19.9
Rail                         1       .2             .2                20.0
Bus                         63     10.7           10.7                30.7
Car driver                 373     63.3           63.3                94.1
Car passenger               26      4.4            4.4                98.5
Motorbike driver             8      1.4            1.4                99.8
Motorbike passenger          1       .2             .2               100.0
Total                      589    100.0          100.0
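The percentage columns of a frequency table such as Table 8.1 follow mechanically from the counts. The Python sketch below recomputes them from the frequencies printed in the table. It is an illustration only, not the SPSS procedure itself: the function name `frequency_table` and its layout are assumptions made for the example, and the cumulative column is taken here to be based on the valid (non-missing) cases.

```python
# Frequencies copied from Table 8.1 (main mode of travel to work, q3).
frequencies = {
    "Walk": 50, "Bike": 67, "Rail": 1, "Bus": 63, "Car driver": 373,
    "Car passenger": 26, "Motorbike driver": 8, "Motorbike passenger": 1,
}
missing = 0  # the text reports no missing responses on this variable


def frequency_table(counts, missing=0):
    """Return rows of (category, frequency, percent, valid percent, cumulative percent)."""
    valid_total = sum(counts.values())   # cases with a usable answer
    total = valid_total + missing        # all cases, including missing ones
    rows, running = [], 0
    for category, n in counts.items():
        running += n                     # cases in this and all previous categories
        rows.append((
            category,
            n,
            round(100 * n / total, 1),              # percent of all cases
            round(100 * n / valid_total, 1),        # percent net of missing cases
            round(100 * running / valid_total, 1),  # cumulative percent
        ))
    return rows


table = frequency_table(frequencies, missing)
for category, n, pct, valid_pct, cum_pct in table:
    print(f"{category:<20}{n:>5}{pct:>7}{valid_pct:>7}{cum_pct:>7}")
```

Because no responses are missing, the percent and valid percent columns coincide, as they do in Table 8.1.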
Examining single variables

The statistical analysis of survey data will often involve a minimum of two stages. In the first, the analyst produces a variety of exploratory statistical products (counts, charts, summary measures, tables, tests). Some of these explorations will turn out to be dead ends and will have to be abandoned; others will produce findings which have some interest but which are peripheral or are perhaps too detailed for inclusion in a final report (but which nevertheless contribute a little to an emerging picture of the findings). In a second stage, a carefully chosen selection is made of the products that bear on the central research arguments or on the main contextual themes. These products may need to be modified or re-worked to ensure they are mutually consistent and complement one another. The accompanying interpretative commentary is best written at this later stage. It is inevitable that some of the data analysis will need to be abandoned at stage two: there is no obligation in survey research to 'show all the steps in your working' to the audience.

Even in a survey far along the continuum towards the descriptive pole, some variables can be earmarked as being of central interest. On the basis of the research design, common sense or previous inquiries, some principal candidates for the roles of independent, intervening or dependent variables should present themselves. Establishing the distribution of values on these variables is a useful initial step in analysis.
The type and level of sophistication of the analysis is limited in situations where nearly all the cases are located in one value category or are heavily clustered around the high or low end of the value distribution - most statistical analysis depends on the existence of variation in values between cases.

Table 8.1 shows a frequency count generated by the SPSS computer package for the Travel Survey responses to question 3 - the main mode of travel to work, which clearly has a nominal level of measurement. A table of this kind is useful provided the variable has a limited number of categories (as a rule of thumb, under 20). The 'Valid Percent' column shows the percentages calculated net of any missing responses (none with this variable, so it is identical to the 'Percent' column). The 'Cumulative Percent' column, which takes the frequency of cases falling in a category, adds the frequencies falling in all the previous categories and expresses the sum as a percentage of the total number of cases, gives a further indication of the overall shape of the distribution.

The mode by which commuters chose to travel to work is one of the Travel Survey's most important variables. An initial consideration of the numbers in each category (using Table 8.1) shows that car use overshadows all the alternatives. This might lead an analyst operating with a brief concerned with the reduction of car use to decide that a way forward could be to explore whether some current car users had realistic alternative travel modes. For example, was there a substantial sub-group of car users who lived sufficiently close to campus for walking or cycling to be at least feasible? The very uneven distribution of cases on travel mode also rules out some other potential strategies. For example, there are too few cases to permit any very detailed examination of those using motorbikes.

The information in the frequency column of Table 8.1 could be represented equally if not more effectively in a graphic format.
A bar chart in which the length of each bar is proportional to the number of cases recorded in that category of the variable is appropriate for nominal level variables like mode of travel. A similar graphical method, the histogram, can be used for continuous variables, in which case the order of the categories on the chart axis is necessarily fixed and the width of a bar needs to reflect the class interval if these are unequal (as with age bands 0-16, 17-29, 30-49, 50-64, 65+). Figure 8.1 reproduces an SPSS histogram for the ratio level daily return fare variable (Travel Survey question 11) measured in pence. This indicates a clear peak of cases with fares between 125 and 150 pence, but a tail of higher fares over three pounds. The few cases under a pound are probably people using a bus to or from work only.

A further possibility for displaying frequency data in a graphical format is the stem and leaf plot. These are equivalents for the frequency table, bar chart or histogram, but they are especially useful if there are too many categories to be displayed conveniently as bars. Stem and leaf plots are combinations of the information in a table of frequencies with an adaptation of the graphical principle from bar charts (the length of the 'branches' corresponds to the number of cases in a particular category or denomination). Figure 8.2 is a stem and leaf plot corresponding to the same data depicted in Figure 8.1. The 'stem' in this case represents 100 pence denominations, a convenient 'large' bundle of the units of measurement of the variable.
Each branch corresponds to a 50 pence interval within each pound, 50-99, 100-149, 150-199, 200-249, etc. There is no 00-49 branch because there are no cases with these values. One 'leaf' is placed on the appropriate branch for every recorded case within the interval. Thus the three 6s, five 8s and three 9s that form a branch at zero on the stem stand for the 11 cases of fares between 50 and 99 pence inclusive. Extreme values are often 'trimmed' to reduce the height of the stem, and the bottom of the stem in Figure 8.2 indicates that two leaves and their branches, corresponding to fares of at least £4.20, are not displayed.

Figure 8.1  Daily return fare (q11)
[Histogram: frequency (axis 0 to 30) against fare in pence (axis 50 to 325). Std. Dev = 70.30, Mean = 161.6, N = 82.]

Q11 Stem-and-Leaf Plot

 Frequency    Stem &  Leaf
    11.00        0 .  66688888999
    35.00        1 .  00002222222222222222222333334444444
    13.00        1 .  5555555566778
    11.00        2 .  11122334444
     7.00        2 .  6677888
     5.00        3 .  22233
     2.00 Extremes    (>=420)

 Stem width:  100.00
 Each leaf:   1 case(s)

Figure 8.2  Daily return fare (q11)

This figure conveys a considerable amount of information about daily return fare and offers an excellent visualization of the shape made up by the distribution of case values. If the 'typical' fare is reckoned to be roughly £1.30, there are clearly more cases of higher than lower fares. Together with the information from Table 8.1 on choices of transport mode, it seems reasonable to surmise that the relatively small numbers of commuters with lower fares reflect a widespread preference among those having shorter journeys to work to use cycles, go on foot or use a car rather than go by bus.

By shifting briefly into the contextual dimension, further possibilities of data exploration become apparent. Given that one of the aims underlying the Travel Survey was to estimate the extent to which university staff could be induced to reduce their reliance on cars as a means of commuting, the next exploratory step might be to look at the proportions of commuters using different transport modes broken down by the distance between home and work (question 1). With the aid of information from question 4 on the alternative modes that were used at least once a week by respondents, the outlines of a mini-narrative emerge. Among other things, this could examine whether the minority of staff who live close to the campus, but who nevertheless use cars predominantly, was sufficiently large to be worth targeting as a group whose commuting behaviour could be open to influence and change. (As question 14 implies, one option open to the University authorities was the introduction of charges for campus car parking.) Even though more exclusive car users might be found among those living (say) two or more miles away, it is those car users with shorter journeys to work that have more alternative modes of travel open to them.
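The construction of a stem and leaf plot described earlier in this section (stems of 100 pence, branches of 50 pence, one leaf per case, extreme values trimmed) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reproduction of the SPSS routine: the function `stem_and_leaf`, its parameters and the fare values are all hypothetical, invented for the example.

```python
from collections import defaultdict


def stem_and_leaf(values, stem_width=100, branch_width=50, leaf_unit=10,
                  extreme_from=None):
    """Return the text rows of a stem and leaf plot, one row per branch."""
    branches = defaultdict(list)
    extremes = 0
    for value in sorted(values):  # sorting keeps the leaves in ascending order
        if extreme_from is not None and value >= extreme_from:
            extremes += 1         # trimmed: counted but not drawn as a leaf
            continue
        stem = value // stem_width                     # whole pounds
        branch = (value % stem_width) // branch_width  # 0 = lower half, 1 = upper half
        leaf = (value % stem_width) // leaf_unit       # tens digit of the pence
        branches[(stem, branch)].append(str(leaf))
    rows = [
        f"{len(leaves):>5}  {stem} . {''.join(leaves)}"
        for (stem, _), leaves in sorted(branches.items())
    ]
    if extremes:
        rows.append(f"{extremes:>5}  Extremes (>= {extreme_from})")
    return rows


# Hypothetical fares in pence, invented for illustration only.
fares = [60, 65, 85, 99, 100, 120, 125, 130, 145, 150, 155, 170,
         210, 240, 260, 320, 450]

rows = stem_and_leaf(fares, extreme_from=420)
for row in rows:
    print(row)
```

Each printed row gives the number of cases on the branch, the stem, and one leaf per case, so the length of a row doubles as the bar of a histogram.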
Measures of central tendency, dispersion, spread and shape

A variety of statistical measures formalize the description of a distribution by selecting a summary aspect. Measures that summarize by reporting a typical or average value are known as measures of central tendency, the most common instances of which are the mode, the arithmetic mean and the median. Measures of dispersion (such as the range, the standard deviation and the interquartile range) report the degree of variety or spread within a distribution. Finally, measures of symmetry (such as skew and kurtosis) provide an indication of the overall shape of a distribution. All these measures are particularly appropriate as aids in examining and comparing distributions of cases on continuous, ratio level, variables, where the amount of numerical detail available could otherwise be overwhelming.

A useful bare summary of a frequency distribution can be supplied by offering a measure of central tendency alongside one of dispersion. For example, in the case of the Travel Survey variable that featured in Figures 8.1 and 8.2, daily return fare, reporting that the mean is 165 pence and the standard deviation is 70.3 pence is much more effective than presenting either one alone because their relative size provides a fairly good snapshot of the overall distribution. Note that the value of the mean differs from that in the stem and leaf plot because some extreme high values were excluded from the plot but included in the calculations in this paragraph. They represent staff who were commuting long distances, probably on a temporary basis while moving jobs or houses, and they arguably distort the overall picture.

A general consideration with measures of both central tendency and dispersion is that a particular statistic may only be used with variables at or above a given level of measurement. As an illustration, the only appropriate summary of central tendency with nominal variables is the mode, whereas ordinal variables can use either the median or the mode, while ratio variables can use both of these as well as the arithmetic mean.
Basic statistics and data analysis texts like Healey (1990) and Marsh (1988) provide more detail of the definitions, formulae and applicability of descriptive measures (see the further reading section at the end of the chapter).
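As an illustration of how these summary measures sit alongside one another, the Python sketch below computes the common measures of central tendency and dispersion for a small set of fares. The fare values are invented for the purpose and are not the Travel Survey data; the variable names are likewise assumptions made for the example.

```python
import statistics

# Hypothetical daily return fares in pence (invented, not the Travel Survey data).
fares = [60, 85, 120, 120, 125, 130, 130, 130, 145, 155, 170, 210, 240, 320, 450]

# Measures of central tendency
mode = statistics.mode(fares)      # most frequent value; usable even with nominal data
median = statistics.median(fares)  # middle value; needs at least ordinal data
mean = statistics.mean(fares)      # arithmetic mean; appropriate for ratio level data

# Measures of dispersion
value_range = max(fares) - min(fares)
q1, _, q3 = statistics.quantiles(fares, n=4)  # quartile cut points
interquartile_range = q3 - q1
standard_deviation = statistics.stdev(fares)  # sample standard deviation

print(f"mode={mode} median={median} mean={mean:.1f}")
print(f"range={value_range} IQR={interquartile_range:.1f} SD={standard_deviation:.1f}")
```

With a tail of high fares in the invented data, the mean is pulled above the median, which is one reason for reporting a measure of dispersion alongside the central value.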
Standardizing variables

Standardization is a procedure that assists the task of comparing variables that have been measured in different units. It facilitates the construction of complex indicators out of measures based on different scales and it is a necessary preliminary to the use of various multivariate statistical methods. The calculation of standardized variables typically entails subtracting the mean (as a measure of a typical value) from each recorded value and then dividing the remainder by the standard deviation (as a measure of spread). Whatever variables are fed in, the output variables all have a mean of 0 and a standard deviation of 1. The transformed values, known as Z scores, preserve the original scale positions of every case, so no information has been lost.
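The calculation just described can be sketched in Python as follows. The function name `z_scores` and the two sets of measurements are hypothetical, chosen only to show two variables in different units being brought onto a common scale; the sample standard deviation is used here as an assumption.

```python
import statistics


def z_scores(values):
    """Standardize values: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(value - mean) / sd for value in values]


# Hypothetical measurements for the same five respondents, in different units.
fare_pence = [90, 130, 150, 210, 320]
distance_miles = [1.5, 3.0, 4.0, 7.5, 12.0]

fare_z = z_scores(fare_pence)
distance_z = z_scores(distance_miles)

# Both standardized variables now have mean 0 and standard deviation 1,
# so the two columns can be compared case by case.
for f, d in zip(fare_z, distance_z):
    print(f"{f:+.2f}  {d:+.2f}")
```

The ordering of the cases is unchanged by the transformation, which is the sense in which no information about relative position is lost.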
Statistical inference and sampling error

The descriptive methods discussed above deal adequately with situations of complete enumeration (see page 62), but they do not take account of the consequences of sample selection. Because of variations in the cases making up random samples, it is not safe simply to use sample values to stand directly for target population values. Provided probability sampling has taken place, however, it is possible to use statistical inference to estimate the values of population characteristics from the sample data. The outlines of this procedure were presented on pages 76-7.

Estimates of any population characteristic can be calculated to a desired level of precision by using a member of the standard error family of statistics that takes into account the size of the sampling error. Thus, the standard error of the mean formula is used for estimating the population mean. The result of the calculation is a confidence interval, a range of values surrounding a calculated sample statistic within which the population value will probably fall, with the likelihood of error fixed by the researcher. For example, the mean daily return bus fare in the Travel Survey, including the extreme 'outliers', is £1.65. The value of the standard error of the mean statistic for the fares variable is 8.3 pence, giving the lower limit for the range of values that makes up the 95 per cent confidence interval for the mean as £1.48, and the upper limit £1.81. The researcher can assume with a 5 per cent risk of error that the mean fare for the population of all staff will lie somewhere within this range of 33 pence.
It is possible to reduce the risk of error to 1 per cent by calculating the 99 per cent interval, but in this case the range of values increases to 44 pence, giving the estimate less precision. See the further reading section at the end of the chapter for texts covering this topic in more detail.
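The arithmetic behind these intervals can be sketched in Python using the mean (165 pence) and standard error (8.3 pence) quoted above. The sketch assumes a normal approximation, which the text does not specify; the function name `confidence_interval` is hypothetical.

```python
from statistics import NormalDist


def confidence_interval(mean, standard_error, level=0.95):
    """Return (lower, upper) limits using the normal approximation (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.58 for 99%
    margin = z * standard_error
    return mean - margin, mean + margin


# Mean fare and standard error of the mean, in pence, as quoted in the text.
low95, high95 = confidence_interval(165, 8.3, level=0.95)
low99, high99 = confidence_interval(165, 8.3, level=0.99)

print(f"95%: {low95:.1f} to {high95:.1f} pence (width {high95 - low95:.1f})")
print(f"99%: {low99:.1f} to {high99:.1f} pence (width {high99 - low99:.1f})")
```

Under this approximation the 95 per cent interval runs from about 149 to 181 pence, close to the limits quoted above. The 99 per cent interval comes out at roughly 43 pence wide, slightly narrower than the 44 pence quoted, which may reflect the use of the t distribution in the original calculation.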
Cross-tabulation

Tabular formats for presenting findings are an important bridge in survey analysis between the descriptive and the analytic dimensions mentioned on pages 136-7. Cross-tabulations or contingency tables are so-called because they display each case in a grid of columns and rows in a way that is conditional on at least two of its observed or recorded attributes. By presenting the joint frequency distributions of two or more variables, cross-tabulations combine a great deal of descriptive information with the possibility of making statements about the relationships between the variables concerned.

In the familiar, two variable (bivariate) cross-tabulation, all the cases from each category of one of the variables are sorted into each of the categories of the other variable. A third variable can be included, in which case one bivariate sub-table is created for every category of the third variable. The third variable may well be a control variable, introduced to explore its suspected selective impact on a known independent-dependent relationship. Although in principle the process can be continued for an indefinite number of variables, there are practical difficulties in displaying and interpreting tables containing more than about four variables simultaneously.
Variables with large numbers of categories present similar problems and adjacent categories need to be collapsed before constructing the final table.

Cross-tabulations are simple but powerful statistical devices and inevitably there are important considerations that must be observed in their construction and presentation.

1 The cells in tables display percentages because it is difficult to make comparisons on the basis of raw numbers: the categories of one of the variables are selected to provide the bases (the denominators) for the percentage calculations.
2 Percentaging is governed by a firm rule - percentage in the direction of the independent (explanatory) variable totals: in other words, a 100 per cent total for each category of the independent variable is created and the frequencies these represent are used in turn as the bases for calculating the cell percentages in the categories of the dependent (response) variable. If there is no clear candidate for explanatory status, then the percentages can be based on either/any variable, but care must then be taken that the table is interpreted in a manner that is consistent with the percentaging.
3 As well as a column (or row) of percentages for each individual category of the independent variable, there needs to be an additional 'All' column (or row), the marginal totals, consisting of the total percentages of the independent variable for each category of the dependent variable. This column (or row) of marginal sub-totals should also sum to 100 per cent.
4 If the independent variable has been given the column position in a cross-tabulation then, as a result of the percentaging rule in point 2, the comparisons will tend to be across the rows, between percentages of the dependent variable in two (or more) categories of the independent variable including the marginal total.
5 Although some texts prefer to place the independent variable in the column and the dependent variable in the row position, this is largely a matter of convenience: from a layout viewpoint, variables with lengthy labels are best located in the row position.
6 Conventionally, the raw numbers (or Ns) for each percentage base are always present in a table so that the original cell frequencies can be calculated by the readers if they wish. The rows and columns which represent numbers need to be clearly labelled to distinguish them from those representing percentages.
7 Tables are numbered in sequence and have conventional titles in the format, 'Table <number>: <dependent/response variable name> by <independent/explanatory variable name(s)>': notes and definitions required to understand the table fully are supplied as footnotes linked to specific headings or cells in the table.
8 Because of rounding, some percentage totals may be 99 or 101. Normally it is best to display the totals as 100 and to explain the rounding error disparities in a footnote in the first table.
9 Tables which have been reproduced from an existing publication always need a precise source: a reference that includes a page and/or table number is essential.
10 Uncluttered tables work best: all superfluous material should be suppressed (unnecessary decimal places, redundant totals): rules (lines) between columns are not essential although they can aid clarity in complex tables.

Table 8.2 shows a cross-tabulation from the Travel Survey data from the responses to questions 19 and 22. The direction of percentaging implies that staff group is the independent variable and the number of cars available to the household dependent. (Clearly, how many cars you owned could not be the determinant of which staff category you belong to.) The title reinforces this by naming the dependent variable first - see point 7 in the list of cross-tabulation conventions above.

Table 8.2 reveals only modest percentage differences between the categories of staff. However, there are three interesting features that stand out. First, although the academic and academic related groups are on salary scales that go to higher levels than the other groups, they do not appear to have any greater access to cars. Second, for no obvious reasons a smaller proportion of technical staff is in car-less households than the other groups; and third, more secretarial, clerical and junior administrators are in multi-car owning households than the other groups.
Since each staff group contains individuals at very different levels of seniority and stages of their work careers, these findings need to be treated with caution. For example, some members of the secretarial group are young employees who will be in households that possess a car or cars but may not have the use for commuting of vehicles owned by siblings or parents.

There are two simple but important lessons about survey analysis to be learned at this point. The first is that it is hard to make sense of survey findings as an analyst if all you know about the social groups or situations you are investigating is the findings themselves. If this is the case, you need a collaborator or participant to work with you who can provide contextual insights - somebody, for example, who could explain the significance of the different staff groups in the Travel Survey and their respective earnings. The second lesson is that it is relatively unusual, especially in descriptive investigations like the Travel Survey, for any one finding to be of shattering significance or to stand alone as the finding in the project. More commonly, you will be pursuing leads and hunches and hopefully gradually piecing together a developing picture of the slice of the social world that is of interest.
Table 8.2  Cars available to household (q22) by staff group (q19)

                 Staff group
                 Academic    Academic-    Secretarial,    Technical    Manual and
                             related      clerical,                    ancillary
                                          junior admin
                 N     %     N     %      N     %         N     %      N     %
None             17    10    13    10     18    14        3     4      13    18
One              65    40    60    45     42    32        35    50     40    54
More than one    80    49    59    45     70    54        32    46     21    28
Total            162   100   132   100    130   100       70    100    74    100
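The percentaging rule (point 2 in the list of conventions) can be checked against Table 8.2 with a short Python sketch that recomputes the cell percentages from the raw numbers, using each staff group total as the base. The function name `column_percentages` is hypothetical, and the 'All' marginal column computed at the end is derived here for illustration; it is not printed in Table 8.2.

```python
# Raw cell counts from Table 8.2: cars available to household, by staff group.
categories = ["None", "One", "More than one"]
counts = {
    "Academic": [17, 65, 80],
    "Academic-related": [13, 60, 59],
    "Secretarial, clerical, junior admin": [18, 42, 70],
    "Technical": [3, 35, 32],
    "Manual and ancillary": [13, 40, 21],
}


def column_percentages(counts):
    """Percentage within each category of the independent variable (staff group)."""
    table = {}
    for group, cells in counts.items():
        base = sum(cells)  # the N on which this group's percentages are based
        table[group] = [round(100 * n / base) for n in cells]
    return table


percentages = column_percentages(counts)

# Marginal 'All' column: every staff group pooled (derived, not shown in the table).
pooled = [sum(cells[i] for cells in counts.values()) for i in range(len(categories))]
percentages["All"] = [round(100 * n / sum(pooled)) for n in pooled]

for group, cells in percentages.items():
    print(f"{group:<38}" + "".join(f"{p:>5}" for p in cells))
```

The rounded percentages for the academic group sum to 99 rather than 100, which is the rounding disparity described in point 8.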
Testing hypotheses and statistical significance

In addition to the technique for estimating population values, discussed on pages 75-7, inferential statistics provides a capacity to establish whether differences observed between sub-groups within a sample on a particular variable (differences, for instance, between means or proportions) are likely to indicate equivalent differences in the population. The same procedures can be adapted to cover situations where the means or proportions to be compared belong to cases that come from different random samples.

The central element in the procedures is the testing of a null hypothesis, so-called because it always asserts that there is no difference between a sample and a population or between one sample and another. Normally, the researcher hopes to reject the null hypothesis and by so doing to confirm the existence of empirical differences. If it cannot be rejected, the implication is that the evidence is insufficiently strong to infer any real world differences or relationships.

The test of a null hypothesis calculates the probability that the recorded differences in the sample(s) could have occurred entirely by chance on an assumption of no real world relationship between the variables concerned. (Chance, in this context, means simply sampling error, and the tests rely on sampling distributions that are theoretical plots of the results from every sample that could possibly be obtained.) If the observed differences are substantial, it will be extremely unlikely that they have occurred by chance.
If the observed differences are less substantial, then sampling error becomes more plausible as an explanation and it may not be possible to reject the null hypothesis. By convention, a probability level of .05 (5 per cent) is generally taken as the minimum threshold of statistical significance. In other words, if
there is a more than 5 in 100 chance that sampling error could account for the observed difference, the researcher is obliged to accept the null hypothesis. Occasionally, where there is reason to be more stringent, a .01 (1 per cent) level is also employed. Thus, the outcome of the test of a hypothesis involves two key components, a measure of difference in the form of the calculated value of one member of a large family of test statistics, and a probability statement (usually indicated in summaries of results as p < .05 or p < .01) about the criterion level at which the null hypothesis has been rejected (if the finding is significant) or NS (for not significant) if the finding has a p > .05.
Which sampling distributions are appropriate as the bases for making inferences from different kinds of data is a relatively technical matter about which statistics and data analysis texts give guidance. In general terms, tests which make no, or only easily-satisfied, assumptions about the form of the underlying distribution are known as non-parametric and sources describing them can be found at the end of the chapter.
A test of significance commonly employed with bivariate tables containing two nominal variables is the chi square test for independence. Symbolized by the Greek character chi, $\chi$ (rhyming with 'try' and written as $\chi^2$), it is named after the theoretical distribution it employs.
The point of the test is to establish that a case's membership of a particular category on one of the variables has a bearing on which category of the other variable it will be in (and that this effect is distinct from sampling error). The test produces a value that is then compared to a table of chi square distribution values (often reproduced in the back of statistics textbooks). The null hypothesis is rejected if the calculated chi square value is greater than the threshold value in the table for a particular level of significance, which takes into account the number of cells in the table. An above-threshold value for the chi square statistic may also be interpreted as a sign that there is an association between the variables concerned. The chi square value is not, however, proportional to the strength of an association (see the next section).
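The mechanics of the chi square test can be sketched in a few lines of Python. The sketch is illustrative only: the counts are invented, the function name `chi_square` is a label chosen here, and the short dictionary of .05 threshold values stands in for the fuller table found in a statistics textbook. Expected counts are those that would arise if the two variables were independent, and the degrees of freedom, (rows - 1) × (columns - 1), are the way in which the number of cells enters the comparison.

```python
# Threshold (critical) values of the chi square distribution at the .05 level,
# keyed by degrees of freedom; a textbook table would cover many more rows.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square(observed):
    """Return (chi square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts: rows are two sub-groups, columns are two answer categories.
table = [[30, 20],
         [20, 30]]
statistic, df = chi_square(table)
reject_null = statistic > CRITICAL_05[df]
print(f"chi square = {statistic:.2f}, df = {df}, reject null at .05: {reject_null}")
```

With these invented counts the statistic is 4.00 on one degree of freedom, just above the .05 threshold of 3.841, so the null hypothesis of independence would be rejected.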
The same broad logic applies to the analysis of variance test of significance (often abbreviated to ANOVA). This is frequently applied to situations where there is a ratio level dependent variable, such as a test score, size or income measure, and a categorical independent variable, such as gender, religious affiliation or occupation (although it can also handle continuous independent variables). In either case, analysis of variance addresses the issue of whether differences between the averages of the dependent variable within the different categories of the independent variable are greater than sampling error would suggest. It does so by comparing the total amount of variation between the categories of the independent variable with the total amount within categories. The greater the amount by which the between-category variation exceeds the within-category variation, the more likely it is that the null hypothesis can be rejected and an association between
independent and dependent variables inferred. Analysis of variance uses the F ratio as its test statistic and the F distribution as its sampling distribution. There are constraints affecting its application where the numbers in the categories of the independent variable differ greatly.
Statistical texts discuss more fully which test statistics and sampling distributions are appropriate for particular kinds of data and set out in detail the steps in the testing procedures. It is worth emphasizing here, however, two general aspects of testing. Since the logic of calculating statistical significance involves the attempt to discount the effects of sampling error, there is no point in testing the significance of data that is not derived from probability sampling. Secondly, statistical significance is not the same thing as substantive or research significance. It is possible to establish any number of statistically significant results that have no bearing on the objectives of the research (or indeed on the field of inquiry as a whole). The reverse is also possible. A real-world difference or relationship may fail to achieve statistical significance (the sample size may have been too small to permit sampling error to be dismissed). This second point highlights a general limitation of significance tests: outcomes are sensitive to sample size as well as the size of recorded differences.
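The between-category and within-category comparison made by analysis of variance can be illustrated with a short Python sketch. It assumes the standard one-way form of the F ratio, in which each sum of squared deviations is divided by its degrees of freedom before the two are compared; the scores and the function name `f_ratio` are invented for the illustration.

```python
def mean(values):
    return sum(values) / len(values)


def f_ratio(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    # Variation of the category means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual cases around their own category mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical test scores for three categories of an independent variable.
scores = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f, df_between, df_within = f_ratio(scores)
print(f"F = {f:.1f} on {df_between} and {df_within} degrees of freedom")
```

The resulting F value would then be compared with the tabulated F distribution value for the same pair of degrees of freedom, in the same way as a chi square value is compared with its threshold.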
Measures of association for nominal variables
The calculation and inspection of table percentages is a mainstay of survey analysis in basic surveys. In the case of simple tables, this approach has the advantage of being straightforward from the viewpoint of both production and consumption. However, the comparison of percentages between categories has a number of limitations that are not adequately dealt with by the responses to complexity suggested on pages 144-5 (collapsing categories or examining a series of bivariate sub-tables if there is more than one independent variable). A severe limitation of all the bivariate procedures reviewed so far is that they do not provide a straightforward measure of the strength of the relationship between the variables. Measures of association address this deficiency.
For nominal variables, there are two main families of simple measures of association. The first family is based on the chi square statistic and derives additional measures from it, the most common of which are phi (for 2 × 2 tables, written as $\phi$) and its close relation, Cramér's V. Both of these (and many other indices of association) take values between zero (for no association) and 1.00 (for a perfect association). Both are relative indices in the sense that the closer to zero the weaker and the closer to 1.00 the stronger the association. However, neither measure indicates the proportion of the overall variation within a table that is attributable to the association.
Both chi square-based measures are normally accompanied by an indication of the outcome of the test of a null hypothesis.
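A brief Python sketch may help to show how a chi square-based measure is derived and why it is needed. It assumes the usual definition of Cramér's V, the square root of chi square divided by the product of the sample size and the smaller of (rows - 1) and (columns - 1); for a 2 × 2 table this gives the same value as phi. The counts and function names are invented for the illustration.

```python
import math


def chi_square_statistic(observed):
    """Chi square statistic for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (count - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, count in enumerate(row)
    )


def cramers_v(observed):
    """Cramer's V; identical to phi when the table is 2 x 2."""
    n = sum(sum(row) for row in observed)
    smaller_dimension = min(len(observed), len(observed[0])) - 1
    return math.sqrt(chi_square_statistic(observed) / (n * smaller_dimension))


# Hypothetical 2 x 2 table, and the same pattern with ten times as many cases.
small = [[30, 20], [20, 30]]
large = [[300, 200], [200, 300]]
print(chi_square_statistic(small), cramers_v(small))
print(chi_square_statistic(large), cramers_v(large))
```

The chi square value grows tenfold with the larger sample while the index of association stays at .20, which is one way of seeing that chi square itself is not a measure of strength.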
Table 8.3 Staff group (q19) by main mode of commuting (q3)
                               Academic   Academic-   Secretarial, clerical,   Technical   Manual and
                                          related     junior administrator                 ancillary
                               %          %           %                        %           %
Walk                           11         7           8                        4           13
Bike                           15         8           7                        16          8
Bus or rail                    8          6           18                       7           14
Car (driver/passenger)         65         77          68                       69          61
Motorbike (driver/passenger)   1          2           0                        4           4
Total                          100        100         100                      100         100
The second family, proportional reduction in error (PRE) measures, such as lambda (written as $\lambda$), are based on a comparison between two attempts at prediction. In the first attempt, the value of the dependent variable is predicted in a state of ignorance about the values of the independent variable; in the second, it is based on the values of the independent variable. If there is a sufficient improvement in the accuracy of prediction between the first and the second attempt, an association between the two variables can be inferred. The results of a PRE test are also presented as an index between zero and 1.00, but an important advantage of a PRE index over a chi square-based equivalent is that it has a more intuitively meaningful interpretation. A lambda of .25 is an indication that knowing the values of the independent variable reduces the error factor in predicting dependent values by 25 per cent. As a result of lambda being an asymmetrical measure, results will be different depending on which variable is identified as independent. Under some circumstances, lambda is ultra-conservative and produces results indicating no association when other tests result in positive outcomes.
To see the application of a PRE measure, consider the Travel Survey issue of whether there is a relationship between membership of a staff group and main mode of travel to work. The percentages in Table 8.3 are not easy to interpret. The manual and ancillary group appears slightly less dependent on cars than the others and it contains a larger proportion of walkers and a smaller one of cyclists.
There are also slight divergencies from the 'average' among the other groups, but it is difficult to judge whether there is an overall association between the two variables. Here the measures of association can play a valuable role. Table 8.4 reports the results of two measures of association conducted by the SPSS package for the data contained in Table 8.3.
[Table 8.4: SPSS output reporting lambda and Goodman and Kruskal's tau for staff group (q19) by main mode of commuting (q3)]
Because lambda can be symmetric or asymmetric, the table shows three results: one for mode of commuting dependent, one for staff group dependent, and 'symmetric' for neither dependent. The second part of Table 8.4 reports the results for another PRE measure, Goodman and Kruskal's tau, which is sensitive to some patterns of association not detected by lambda. There are two rows of results for tau because it only exists in an asymmetric form. As far as lambda is concerned, with main mode of commuting dependent the result is zero association. The picture with tau is effectively similar. The index of association at .015 is very close to zero and there are no grounds on the basis of these results for assuming a relationship between these two variables. Where the measures of association are so close to zero, the existence of statistical significance is irrelevant.
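The two attempts at prediction that underlie lambda can be sketched in Python. The sketch assumes the usual asymmetric definition, in which the best guess is always the modal category, and uses invented counts rather than the Travel Survey data; the function name is a label chosen here. The first table shows why lambda can be zero even though the percentages differ between groups: when the same category is modal in every group (as car is in every staff group in Table 8.3), knowing the group never changes the prediction.

```python
def asymmetric_lambda(table):
    """Lambda with the row variable dependent and the column variable independent.

    table[i][j] is the count of cases in dependent category i and
    independent category j.
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # First attempt: always guess the overall modal category of the dependent variable.
    errors_without = n - max(row_totals)
    # Second attempt: guess the modal category within each independent category.
    errors_with = sum(sum(col) - max(col) for col in zip(*table))
    return (errors_without - errors_with) / errors_without


# Hypothetical counts. Rows: walk, bus, car. Columns: two staff groups.
same_mode = [[10, 30],
             [20, 10],
             [70, 60]]
# Hypothetical counts in which the modal row differs between the columns.
different_mode = [[60, 20],
                  [40, 80]]

print(asymmetric_lambda(same_mode))
print(asymmetric_lambda(different_mode))
```

The first table gives a lambda of zero and the second a lambda of .25, that is, a 25 per cent reduction in prediction errors.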
Measures of association for ordinal variables
A consideration that applies to this category is whether the variable concerned is continuous or collapsed. The variable containing the score resulting from a series of linked questions employing a Likert scale will be continuous - many score values for a case will be possible. Spearman's rank order correlation coefficient, known as rho ($r_s$), is appropriate here (see the next section for more on correlation coefficients). Values vary between zero and ±1.00 and have no direct interpretation, though the square of rho, $r_s^2$, can be interpreted in the same manner as lambda.
If the number of values of an ordinal variable has been reduced via the way the raw scores are processed, or as a result of a subsequent re-coding operation, then gamma (G) is a relevant test statistic. Gamma values also lend themselves directly to PRE-type interpretation. A minor complication that affects measures of association for ordinal variables is that they each treat differently pairs of cases that have the same score/rank on the independent or the dependent variables (or both).
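The treatment of tied pairs can be made concrete with a sketch of gamma. It assumes the standard definition, in which every pair of cases is classed as concordant (ranked the same way on both variables) or discordant (ranked in opposite ways), and pairs tied on either variable are simply left out. The scores and the function name are invented for the illustration.

```python
def gamma(cases):
    """Gamma for a list of (x, y) ordinal scores, ignoring tied pairs."""
    concordant = discordant = 0
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            dx = cases[i][0] - cases[j][0]
            dy = cases[i][1] - cases[j][1]
            if dx * dy > 0:
                concordant += 1   # same ordering on both variables
            elif dx * dy < 0:
                discordant += 1   # opposite ordering
            # dx * dy == 0: tied on at least one variable, so not counted
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical cases scored 1 to 3 on two collapsed ordinal variables.
cases = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 1)]
g = gamma(cases)
print(round(g, 2))
```

Of the fifteen possible pairs here, six are concordant, three discordant and six tied, so gamma rests on only nine of them. Other ordinal measures differ mainly in how they bring the tied pairs back into the calculation.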
Measures of association for ratio variables - correlation
Association between two ratio level variables (or between pairs within a larger matrix of continuous variables) can be measured by Pearson's correlation coefficient (r) that has a similar form to the ordinal coefficient in that it takes up values between zero and ±1. Coefficients with minus values indicate a negative (inverse) association with values moving in opposite directions - as the values of one variable decrease the values of the other increase. Positive coefficients indicate that as one variable increases (or decreases) in value, the other moves in a similar direction. A zero coefficient and the values surrounding it imply no association while values close to 1 approximate to a perfect (positive or negative) association. Values intermediate between zero and 1 do not have a direct interpretation but, as with the ordinal coefficient, squaring the r value produces a statistic that offers greater intelligibility. Called the coefficient of determination ($r^2$), an $r^2$ of .25 would indicate that 25 per cent of the total variability of the two variables is accounted for by their joint variation and that 75 per cent comes from other (unknown) sources.
Correlation is a symmetrical measure: if variable A is strongly correlated with variable B then necessarily B is with A to the same degree, and there is no facility in basic correlation for stipulating or inferring which is dependent and which independent. It follows from this that evidence of strong correlation coefficients is, in the absence of other analytic or contextual information, never adequate to establish causal connections.
In the case of the Travel Survey, it is relatively safe, given the intervals and the number of cases, to convert the responses to questions 1 and 2 from the bands (class intervals) used on the questionnaire into the units of new ratio variables (miles and minutes) by assuming that all responses lie at the midpoints of the bands. The resulting textual output from the SPSS computer package is in Table 8.5.
Table 8.5 Miles (q1) and time (q2) correlation
                               Miles     Time
Miles   Pearson Correlation    1.000     .701**
        Sig. (2-tailed)                  .000
        N                      588       587
Time    Pearson Correlation    .701**    1.000
        Sig. (2-tailed)        .000
        N                      587       588
** Correlation is significant at the .01 level (2-tailed).
The table presents the information twice, once from the 'viewpoint' of each variable though the views are necessarily identical. All variables correlate perfectly with themselves, which accounts for the two cells with coefficients of 1.000. As could be anticipated, the actual miles-time correlation of .701 is strong, and its coefficient of determination (not reported in the table but .49) indicates that co-variation accounts for just under half of the total variability. The software automatically conducts a test of significance the outcome of which is significant at the .01 level, suggesting that it is very likely that the association would also hold for the population of all staff.
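For readers who wish to see what lies behind the coefficients, a minimal Python sketch of Pearson's r and the coefficient of determination follows. The paired values are invented (they are not the Travel Survey responses) and the function name is a label chosen here; only the final line uses a figure from Table 8.5.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equally long lists of values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Joint variation of the two variables around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical journeys: distance in miles and time in minutes.
miles = [1, 2, 3, 4, 5]
minutes = [10, 15, 12, 25, 30]

r = pearson_r(miles, minutes)
r_reversed = pearson_r(minutes, miles)   # correlation is symmetrical
r_squared = r ** 2                       # coefficient of determination

# The coefficient of determination for the correlation reported in Table 8.5.
table_r_squared = round(0.701 ** 2, 2)
print(table_r_squared)
```

Swapping the two variables leaves r unchanged, which is the sense in which the measure cannot say which variable is dependent.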
Because of the mathematically superior information contained within pairs of variables measured at the ratio level, it is possible to go beyond a measure of association and to analyse how a quantitative change in an independent variable is translated into a quantitative change in a dependent variable. The kind of argument used in the PRE measures can be reapplied to this situation. In the absence of any additional information, the best guess (prediction) for the value of a case in the population on a dependent variable (Y) - in Travel Survey terms again minutes - would be the mean value of all the sample cases on minutes. However, if there is an independent variable (X) that has a linear correlation with minutes, like miles, then knowledge about this additional relationship can be used to improve the accuracy of the prediction. A linear relationship implies that a unit increase or decrease in miles will result in the same amount of change in minutes across the whole range of values of miles.
The scattergram provides an opportunity to graphically plot the linear relationship between these or any two continuous variables. A single straight summary line, the least squares regression line, can be drawn by a computer through the scatter of coordinates that represent the joint distribution of cases. It is so-called because it minimizes the total sum of the squared distances between each coordinate and the line itself. Its slope is a direct measure of the linear relationship between the two variables.
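What the computer does in drawing the least squares line can be sketched as follows. The example assumes the standard formulas for the slope and for the point at which the line crosses the vertical axis, and uses invented miles and minutes rather than the survey data; the function and variable names are labels chosen for the illustration. It also compares the squared prediction errors of the line with those of the 'best guess' based on the mean alone.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical journeys: distance in miles (X) and time in minutes (Y).
miles = [1, 2, 3, 4, 5]
minutes = [8, 13, 16, 23, 25]

intercept, slope = least_squares(miles, minutes)
mean_minutes = sum(minutes) / len(minutes)

# Squared errors when predicting from the mean alone and from the line.
errors_from_mean = sum((y - mean_minutes) ** 2 for y in minutes)
errors_from_line = sum((y - (intercept + slope * x)) ** 2
                       for x, y in zip(miles, minutes))

print(f"slope = {slope:.1f} minutes per mile, intercept = {intercept:.1f}")
print(f"squared error from mean = {errors_from_mean:.1f}, "
      f"from line = {errors_from_line:.1f}")
```

In this invented example each extra mile adds 4.4 minutes, and using the line rather than the mean cuts the total squared error from 198 to 4.4.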
Reading along the axis representing the independent variable and taking a vertical line to intercept the regression line gives a prediction of the value of a case on the dependent variable. The existence of a correlation will result in a prediction superior to one based on the mean of all the sample values on the dependent variable. The graphical plot of regression can be shown to represent an equation of the form
$Y = a + bX + e$
where Y is the predicted value of the dependent variable, a is the intercept, the hypothetical value of Y for which X is 0, and b is a coefficient reflecting the slope - the effect on Y of a unit change in X. Regression equations being used for prediction also need to incorporate a residual error term e that covers the variation that remains unexplained by the linear association between the variables.
Some general points on testing significance and measuring association are:
• hypothesis testing is only applicable and meaningful in respect of data derived from probability samples, but measures of association are relevant more generally (for example where data is the product of complete enumeration);
• where two variables have different levels of measurement it is generally safe to use measures of association appropriate to the lower level;
• redefining or collapsing the categories of nominal variables in order to conduct repeated tests in the hope of finding statistically significant differences is bad practice;
• there are hidden dangers in conducting a series of significance tests on differences between pairs of categories taken from the same independent variable; the risk of mistakenly rejecting a null hypothesis over a series of such tests is higher than the level of significance adopted for the individual tests.
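The last of these points can be illustrated with some simple arithmetic. The sketch below makes the simplifying assumption that the tests in a series are independent of one another, which will not be exactly true of comparisons drawn from the same variable, so the figures are indicative only. The function name is a label chosen here.

```python
def chance_of_false_rejection(alpha, number_of_tests):
    """Chance of at least one mistaken rejection over a series of tests,
    assuming (for illustration) that the tests are independent."""
    return 1 - (1 - alpha) ** number_of_tests


for number_of_tests in (1, 3, 10):
    risk = chance_of_false_rejection(0.05, number_of_tests)
    print(number_of_tests, round(risk, 3))
```

On this assumption, ten tests each conducted at the .05 level carry a risk of about .40 of at least one mistaken rejection.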
Multivariate analysis
All of the techniques discussed above are primarily designed for analysing associations in a pair of variables at a time. Several of them, including percentaged tables, can be extended to handle a limited number of variables simultaneously. Multivariate analysis is a generic name for a large number of different statistical techniques all of which are designed to represent the relationship between families of variables.
In order to establish the effects of a control (intervening) variable on an independent/dependent pair, the approach suggested on page 144 was to construct and compare a set of bivariate sub-tables, one for each category of the control variable. Such a procedure is termed elaboration and there are three possible outcomes from inspecting the sub-tables (or partials).
1 Confirmation of the original relationship: each sub-table reproduces the original relationship essentially unchanged.
2 A spurious or an intervening relationship: the relationship in each of the partial tables is much weaker or disappears completely: this is an indication that the control variable is possibly a determinant of both the original independent and dependent variables. Alternatively, the original independent variable may affect the control variable and this in turn affects the dependent variable. Which of these possibilities applies has to be established by considering the variables and the temporal links involved rather than statistically.
3 An interaction: the original relationship is stronger in some partials than others: this indicates that specific values of the control variable enhance the relationship between the original independent and dependent variable, while others attenuate or suppress it.
Clearly, each outcome has different implications for the direction to be taken in subsequent analysis. The mechanics of elaboration are described in greater detail in, for example, de Vaus (1991), Chapter 12 or Healey (1990), Chapter 17.
While elaboration has the merit of being relatively intuitive, it has limitations. The procedure lacks an intrinsic measure of association; some kinds of interactions between the non-dependent variables may be difficult to
detect simply by inspection; where the independent variables have many categories, the number of sub-tables can quickly become unmanageable. It is possible, in principle, to collapse categories, but this will be at the expense of a loss of information, and collapsing categories may be hard to justify with nominal level variables in which each category is fundamentally different from the others. The most effective response to a situation in which an analyst wishes to examine the simultaneous mutual influence of several independent variables on one (or more) dependent variables is to use one of the more advanced multivariate techniques.
Central to most of these types of analysis is the notion of a process in which the investigator attempts to fit a statistical model to empirical data. A simple version of fitting was described in the previous section where prediction based on linear regression between two ratio level variables was under scrutiny. More generally, fitting a model can be understood in terms of a simple equation,
observed data value = value predicted by statistical model + residual term (unexplained variation)
The relationship between a statistical model and a substantive theory raises technical considerations that go beyond the scope of this book. From a practical viewpoint, fitting a statistical model to observed data means producing an effective summary description of the way the values in a set of variables are inter-related. Effective implies 'simple' and 'accounting for a satisfactory degree of variation'.
Process is an apt term because there may be a series of models to be tried in succession, and also because fitting any one of them can entail an iterative or stepwise procedure in which particular variables are added to or subtracted from consideration in order to improve the degree of fit. (Which variables are included or removed is a matter on which both contextual and theoretical considerations are relevant.)
A key statistical model for survey analysis is the general linear model (GLM) which underpins some of the most widely used multivariate techniques including both simple linear regression and multiple linear regression. In the case of the latter, the model can be expressed along the lines of an extended version of the equation set out on page 154. On one side of the equation is the population value of the dependent variable to be predicted. The other side is made up of the independent variables included in the model, each one of which is accompanied by a statistical coefficient (or multiplier), plus a residual for the unexplained variation. The equation for two independent variables looks like this (the way in which further variables would be incorporated is self-evident):
$Y = a + b_1 X_1 + b_2 X_2 + e$
This kind of equation does not lend itself to representation in a two-dimensional plot so the notion of slope is no longer appropriate.
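A small Python sketch can show how the coefficients in an equation with two independent variables are estimated. It assumes ordinary least squares and works in the original units of the variables (the coefficients are not standardized); the data are invented and constructed so that Y is exactly 2 + 3 X1 + 0.5 X2 with no residual, which makes it easy to check that the procedure recovers the coefficients. The function name is a label chosen here.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares estimates (a, b1, b2) for Y = a + b1*X1 + b2*X2 + e."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of deviations from the means.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    # The divisor shrinks towards zero as X1 and X2 become more highly correlated.
    divisor = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / divisor
    b2 = (s11 * s2y - s12 * s1y) / divisor
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Hypothetical data built from Y = 2 + 3*X1 + 0.5*X2 (no residual variation).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [6.0, 8.5, 13.0, 15.5, 20.0]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 3), round(b1, 3), round(b2, 3))
```

Each coefficient is estimated with the other independent variable taken into account, which is why the cross-product of X1 and X2 appears in both formulas.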
The 'b's indicate the standardized regression coefficients (or multipliers) for each of the independent variables in the equation and the use of this symbol reflects the fact that they are also known as beta coefficients. Although their precise statistical form varies with the exact multivariate procedure in question, regression coefficients have a standard interpretation. They indicate the amount of change in the dependent variable that results from a single unit change in the particular independent variable with which the coefficient is paired, with the effects of all the other independent variables in the equation suppressed (that is, controlled).
There is no suitable ratio level dependent variable on which to conduct a multiple regression analysis in the Travel Survey. However, if sufficient detailed data could be collected, such a technique could come into play to attempt to predict, for example, the distance away from the site of work at which staff choose to reside. Such an equation might feature the following kinds of variable:
Distance from home to work = Cost per mile of return journey + Journey time + Number of buses/trains available + Number of cars available
However, it is very likely that such a model would still leave a good deal of variance in the location of homes unexplained. The comments in response to the open-ended question 13 in the Travel Survey made it clear that an important attraction of car use was that it enabled the commuter to do other household tasks alongside getting to work. Dropping children off to school, giving other family members a lift, and picking up shopping are some of the tasks that were mentioned. In analysis terms, the implication of this is that the variable journey time, which might be thought to capture the superior convenience of a door to door car journey as against the walk from the bus stop or station, may in fact seriously underestimate the flexibility of the car as a mode of commuting in the eyes of its users.
Multiple regression is an extremely powerful and flexible family of techniques and, as is invariably the case, its use comes with a series of preconditions, limitations and restrictions. It is generally necessary with all advanced statistical techniques to anticipate their requirements at the research design stage so that a suitable sample size and variables with appropriate data types are available. The main preconditions and restrictions, together with some ways of handling them, are:
• Predominantly ratio variables: there is scope for converting categorical variables to a series of 'dummy' dichotomous variables which can then be included in regression equations.
• Simple random sampling is presumed though other probability designs may be acceptable.
• A normal distribution of cases on the dependent variable for any value of the independent variables: if this is violated, statistical transformation of the values of the dependent variable may be possible.
• There is a basic assumption that the relationships between all the independent variables and the dependent variable are linear (see page 154). Scattergrams (scatterplots) can give an indication of non-linearity: dealing with it requires the conversion of the data, using a logarithmic transformation, into a format in which linear relations can be detected.
• The regression techniques described above do not cope well with situations in which there is a substantial degree of interaction or correlation between the independent variables to be included within a regression equation: again, this obstacle can be handled by the application of more sophisticated statistical techniques though the cost is greater complexity in interpreting the results.
• The residuals (or standardized residuals) in regression equations need to meet assumptions of independence, linearity, normality of distribution and constant variance: statistics texts discuss the tests available for each of these conditions and the possible antidotes to their violation.
• Exceptional cases (outliers) can distort interpretation leading to the detection of spurious relationships on the one hand and the suppression of real ones on the other: there are techniques for detecting outliers and regression equations can be calculated omitting them in order to evaluate the scale of their impact.
• P r e d i c t i n g the value o f a dependent variable b e y o n d the range o f values f o r w h i c h there is observed data is n o t advisable. • Unless there is a definite t e m p o r a l o r l o g i c a l order a m o n g variables, regression equations i n d i c a t i n g h i g h levels o f association w i l l still leave the issue o f causation o p e n . Because o f the large n u m b e r o f o p t i o n s i n c o n d u c t i n g regression analysis a n d because there are m u l t i p l e statistical criteria f o r the data t o be analysed t o satisfy, a detailed guide t o a p p l y i n g the procedures s h o u l d be consulted (see the f u r t h e r reading section at the end o f the chapter).
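The advice above about outliers, calculating the regression equation with and without the exceptional cases to judge their impact, can be illustrated with a minimal sketch. The data are invented for illustration, the function name `fit_line` is hypothetical, and only a simple one-predictor least-squares line is fitted; a real analysis would normally use a statistics package.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: the first five cases lie exactly on y = 2x,
# and the sixth case is an exceptional one (an outlier).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 40]

# Fit once with every case, and once omitting the outlier.
slope_all, intercept_all = fit_line(xs, ys)
slope_trimmed, intercept_trimmed = fit_line(xs[:-1], ys[:-1])

print(f"All cases:       slope={slope_all:.2f}, intercept={intercept_all:.2f}")
print(f"Outlier omitted: slope={slope_trimmed:.2f}, intercept={intercept_trimmed:.2f}")
```

In this invented example a single case triples the estimated slope, which is the kind of distortion the comparison is meant to reveal. Whether an outlier should then be dropped from the final analysis remains a substantive judgement, not a mechanical one.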
Box 8.1
Further important techniques of data analysis
Partial correlation
An extension of basic correlation techniques that allows the effects of selected independent variables that may affect a bivariate relationship to be controlled. Whereas $r_{xy}$ symbolizes the standard or zero-order correlation coefficient between X and Y with no controls, $r_{xy.z}$ (a first order partial correlation coefficient) measures the association between X and Y discounting the effects of Z, while $r_{xy.ab}$ (a second order partial correlation coefficient) simultaneously discounts two variables, A and B. Partial correlation uses a recursive procedure in which zero order coefficients are fed into a formula to generate first order coefficients, which, if desired, can then be fed back into the same formula to produce second order coefficients, and so on.

Factor analysis
This is often applied at an intermediate stage in the analysis process for either of two main purposes. The first is to reduce large correlation matrices containing many independent variables to much smaller sets of 'hybrid' factors which contain elements of all the original variance-predicting variables combined together and weighted in such a way that each explains as much of the variation in the dependent variable(s) as possible. None of the factors that are produced will be correlated at all with each other. One problem with this application is to find a meaningful theoretical interpretation for the hybrid variables that emerge out of the analysis. A second use of factor analysis is to demonstrate from the scores of subjects that the different items included within a measuring instrument such as a pencil and paper test actually tap the same entity or dimension.

Path analysis
A form of causal modelling, the most sophisticated level of analysis, which requires an initial set of variables for which the temporal relationships are already established. By means of regression analyses, measures of the associations between all the variables are calculated and tested for significance. Weak linkages can then be ignored and dropped from further consideration. The end result is a graph made up of arrows associated with indexes that measure the precise influence each variable has on the others. Unlike some other multivariate techniques, the links in path models have directionality and are thus one step closer to real world mechanisms and processes.

Log-linear analysis
An approach, based on the GLM, to explaining associations in categorical data (cross-tabulations that are made up of nominal or ordinal level variables). The technique considers all the variables in the table to be independent and treats the observed frequency in a cell or cells that is to be predicted for the population as a function of all of them. In order to produce an effective and simple model, an iterative procedure based on a goodness of fit test is used to eliminate variables and statistical artefacts that do not contribute to accurate prediction. In logit variants, a dichotomous dependent variable is presumed.
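The recursive procedure described under partial correlation in Box 8.1 can be sketched in a few lines. The formula used is the standard textbook one for a first order partial coefficient; the function names, the helper `zero_order` and the correlation values are invented for illustration and are not taken from the book.

```python
from math import sqrt


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation between X and Y, discounting the effects of Z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def zero_order(r, i, j):
    """Look up a zero order coefficient; the order of the pair is irrelevant."""
    return r[frozenset((i, j))]


def second_order_partial(r, x, y, a, b):
    """Correlation between x and y, discounting a and b simultaneously.

    First order coefficients (controlling for a) are computed and then
    fed back into the same formula, as the recursive procedure describes.
    """
    r_xy_a = first_order_partial(zero_order(r, x, y), zero_order(r, x, a), zero_order(r, y, a))
    r_xb_a = first_order_partial(zero_order(r, x, b), zero_order(r, x, a), zero_order(r, b, a))
    r_yb_a = first_order_partial(zero_order(r, y, b), zero_order(r, y, a), zero_order(r, b, a))
    return first_order_partial(r_xy_a, r_xb_a, r_yb_a)


# Invented zero order correlations among four variables.
correlations = {
    frozenset(("X", "Y")): 0.6,
    frozenset(("X", "A")): 0.5,
    frozenset(("Y", "A")): 0.4,
    frozenset(("X", "B")): 0.3,
    frozenset(("Y", "B")): 0.2,
    frozenset(("A", "B")): 0.1,
}

r_xy_a = first_order_partial(0.6, 0.5, 0.4)
r_xy_ab = second_order_partial(correlations, "X", "Y", "A", "B")
print(f"first order r_xy.a = {r_xy_a:.3f}")
print(f"second order r_xy.ab = {r_xy_ab:.3f}")
```

One check on such a sketch is that the second order coefficient should not depend on which control variable is partialled out first.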
Key summary points
• The objective in survey analysis is to construct a coherent picture of a piece of the social world [...] tools to help you achieve these objectives.
• Survey analysis necess[...] unpromising avenues of [...]
Further reading
There are many general textbooks on data analysis and statistics for the social sciences. Some introductory British ones are: Rose and Sullivan (1996) Introducing Data Analysis for Social Scientists, second edition, which comes with a dataset on floppy disc; Bryman and Cramer (1990) Quantitative Data Analysis for Social Scientists, which contains many worked examples of the analysis of survey data including multiple regression; and Fielding and Gilbert (1999) Understanding Social Statistics, which is linked to data sets, exercises and a glossary available from a web site. A useful, purely statistical text is Healey (1990) Statistics: A Tool for Social Research, second edition, which covers descriptive, inferential and multivariate procedures. Detailed coverage of descriptive statistics can be found in Loether and McTavish (1974) Descriptive Statistics for Sociologists. Marsh (1988) Exploring Data is an excellent guide to exploratory methods of data analysis in surveys, although it does not cover inferential statistics. An intermediate level treatment of statistical tests commonly used in the social sciences is Siegel and Castellan Jr. (1988) Non-parametric Statistics for the Behavioural Sciences. Norusis Guide to Data Analysis (various SPSS releases) is helpful for users of the SPSS survey software package (the use of this package is also covered in Rose and Sullivan, Fielding and Gilbert, and Bryman and Cramer). Gilbert (1993b) Analysing Tabular Data: Loglinear and Logistic Models for Social Researchers is a useful guide to this increasingly popular technique. Factor analysis and regression are dealt with in Everitt and Dunn (1991) Applied Multivariate Analysis, and also in Bryman and Cramer.
9
Presenting your findings
Writing for an audience
In 1995, the Economic and Social Research Council published a booklet entitled Writing for Business. It contained the following extract from a research team's summary of their work:

The research will present a structuralist informed challenge to both positivistic and humanistic/post-structuralist approaches to the study of the environmental crisis, and in particular to the neo-classical environmental economics paradigm.
(ESRC 1995: 13)
Far from being a parody, the example is real and typical. What lesson can we draw from it?

To the academic community, the key words in the extract are: 'structuralist', 'positivistic', 'humanist/post-structuralist' and 'neo-classical environmental economics paradigm'. It is not that they are wrong, since these are key theoretical ideas. Academics understand such code words, and know where the researchers are coming from. It is just that, to everybody else, the one arresting phrase is 'the environmental crisis'. The problem is not one of substance, but of communicating to an audience.

The ESRC booklet also gives some of the reactions of business people to research reports. For example:

If I can't get to the point of a report quickly enough, it goes in the bin.
(Jan Buckingham, Public Affairs Manager, Allied Domecq plc, 1995: 4)

I simply haven't the time to read everything on the off chance that there's something of interest.
(Dr Tony White, Head of Corporate Strategy, National Grid plc, 1995: 1)

And, more positively but also more poignantly:

I've got on my desk now a very important piece of research - I think.
(Sam Porter, Chief Economist, Boots plc, 1995: iv)

Problems in communication tend to lead to mutual recriminations. The researchers may accuse the readers of ignorance and narrow-mindedness: they are only interested in short-term profits, the bottom line. The audience may accuse the researchers of arrogance and hiding behind jargon. Yet we can also see, in the examples above, a recognition that research findings are often very important, if only we could understand them.

It may be helpful to begin by considering what an audience of non-specialists will be looking for in a report of the findings of a social survey. We suggest three basic points:

• Non-specialist readers want to know what the survey found, whether the findings are important, and if so how they can be put to use.
• Non-specialist readers will often expect the researchers to produce not just findings but also recommendations.
• Non-specialist readers are less interested in the methodological details and the complex statistics.

The classic approach to constructing a research report recognizes that these are the readers' basic requirements. Findings and recommendations are highlighted, while methodological complexities are given in appendices for those who are interested or need to know.
Writing for a specialist audience - as in an article for a specialist journal (Box 9.2) - involves giving more attention to reviewing the existing literature, discussing theory and concepts, and providing rich details of methodology and findings.

Characteristics of the classic research report

Overt macrostructure
The overall structure of the classic research report is clear for all readers to see. It will be divided into a number of easily identified chapters or sections. The Travel Survey shows one example of this (see Box 9.1).
Box 9.1
Travel Survey - macrostructure

Title
Contents page
Key Findings
Section 1: Background
Section 2: Staff Survey
Section 3: Student Survey
Section 4: Commercial Vehicle and Visitors Census
Section 5: Open-ended Responses
Summary
Appendix 1: Definition of commuting modes
Appendix 2: Staff and student questionnaires
Appendix 3: Covering letter
Appendix 4: Commercial and visitors census forms
The Travel Survey was a report prepared for a sponsor who wanted hard data to inform rational policies. The structure of the report reflects this. Other situations call for different approaches. Box 9.2 shows a typical structure of a research report in an academic journal. The rules are not hard and fast, but vary from context to context. The underlying principle of the classic research report remains the same: the macrostructure of the report should be clear and readily identified.

Frequent signposts
To ensure that the macrostructure is clear, the classic research report contains plenty of signposts. Readers are reminded where they are in the argument. We do not want our readers to get lost - hence the signposts.
Box 9.2
A paper in an academic social science journal - macrostructure

Title
Author
Abstract
Keywords
Theoretical introduction and literature review
Research methods
Findings
Conclusions
References
Acknowledgements
Biographical note
Academic writing is full of signposts: 'it was shown in Chapter 2 that . . .', 'it is first necessary to consider . . .', 'I shall argue that . . .'. Signposts can be overdone, and often are; but better too many than not enough.
Threefold structure
Implied in the idea of frequent signposts is a threefold structure. The readers are told where they have been, where they are now and where they are going. People often say a piece of writing should have 'a beginning, a middle and an end'. Or, as the standard advice to teachers has it: you tell them what you're going to tell them, then you tell them, and then you tell them what you told them.

The Travel Survey displays this threefold structure. The key findings present essential material in seven bullet points (see Box 9.3). The body of the report, sections 1 to 5, presents these findings in more detail. The report ends with a summary (Box 9.4). Characteristically, the summary not only repeats the key findings, it also amplifies and adds to them. It also suggests, indirectly, what some of the policy implications of the survey might be. The threefold structure is more than mere repetition.

Box 9.3
The Travel Survey - Key findings

• A large proportion of University staff rely exclusively on cars for commuting: they value the flexibility and convenience of the car and may be resistant to changing their existing travel patterns.
• The lengthy travel times for those travelling to work by car reflect the fact that the journey to work can often involve one or more stops en route.
• Staff currently combining car use with alternative modes of travel appear to be more open to changing their travel arrangements.
• The possibility of working from home was considered attractive by a substantial number of staff.
• There was widespread demand for improved cycle facilities, both in numbers of comments and the range of improvements suggested.
• Students mostly walk or cycle and live less than two miles away from the campus.
• Many requests were made for more frequent buses, more direct bus routes and subsidized fares.
Box 9.4
Travel Survey - summary

Staff depend heavily on cars to get to work and many tend to be 'exclusive car users' who appear to have permanent access to a car for commuting. Students mostly walk or bike and live less than two miles away from the campus. The main attraction of the car for staff and students is its flexibility and convenience.

Staff who travel to work by car take longer than those using other modes of transport and the open-ended data suggest that this is because a journey to work often involves one or more stops en route. Dropping children to nursery and school, spouses to work and shopping were some of the tasks done before arriving at the campus. It might be expected for these reasons that exclusive car users will be reluctant to alter their travel patterns. From the survey responses, the most likely influence that might alter the way this group travels to work is the prospect of more direct bus routes. Although few students use a car to get to the campus, those who do tend to rely on it exclusively and may also be hard to deter from car use.

Staff combining car travel with alternative modes of transport appear more flexible and likely to change. Higher proportions of this group than exclusive car users were 'very likely' to alter their travel patterns for each of the developments we listed in question 14 (subsidized travel cards, more direct bus routes, campus car park charging, improved shower facilities and improved cycle security).

The option of working from home was popular among staff and is one way campus traffic might be reduced. High proportions of academic, academic-related and secretarial staff find the prospect of working from home 'very attractive'.

Daily bus fares are mostly under £2: the median fare for staff was £1.40 and for students £1.09. Frequent requests were made by both staff and students who currently use this mode of transport (or would like to) for more frequent buses, more direct buses and subsidized fares.

There was widespread demand, both in numbers and range of suggestions, for improved cycle facilities. In particular, respondents asked for improved storage and security, more and better cycle paths, and facilities for showering.

With respect to the census of commercial traffic, visits to the campus build to a mid-morning peak, slowing down in the late afternoon. The vehicle used is typically a van and will usually make a single delivery on campus most weekdays. The census held in the Social Sciences car park suggests that visitors are often the sole occupants of cars and come to the campus fairly frequently.
Passive voice
The classic research report avoids the personal pronouns 'I' and 'we', nor does it address the reader as 'you'. Instead, extensive use is made of the passive voice, as in this sentence itself. So, instead of 'I interviewed twenty respondents', the researcher says, 'Twenty respondents were interviewed'. One problem with using the passive voice is that we can easily be led into verbiage, so that 'Twenty respondents were interviewed' becomes 'Interviews were carried out with twenty respondents'.

Past tense
Although few writers comment on it, the use of the past tense is a significant feature of the classic research report. The report is retrospective: these were the hypotheses, this was the research method, the responses were these. Only in the final section, if at all, is the writer likely to switch to the present tense: these are the conclusions, and further research is needed.
Use of tables, figures and diagrams
In the Travel Survey, we presented our findings graphically in the form of tables, as for example Table 9.1.

Table 9.1 Regular mode of transport to the campus among students

Mode of transport        N       %
Walk                   133    47.2
Bicycle                 82    29.1
Car as driver           47    16.7
Bus                     13     4.6
Car as passenger         5     1.8
Motorbike as driver      2     0.7
Total                  282   100
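As a small worked illustration, the percentage column of Table 9.1 can be reproduced from the counts in the N column. The counts are those printed in the table; the variable names are our own and the rounding to one decimal place is assumed from the printed figures.

```python
# Counts of students by regular mode of transport, as printed in Table 9.1.
counts = {
    "Walk": 133,
    "Bicycle": 82,
    "Car as driver": 47,
    "Bus": 13,
    "Car as passenger": 5,
    "Motorbike as driver": 2,
}

# The base for the percentages is the total number of cases.
total = sum(counts.values())

# Each percentage is the category count over the total, to one decimal place.
percentages = {mode: round(100 * n / total, 1) for mode, n in counts.items()}

for mode, n in counts.items():
    print(f"{mode:<20} {n:>4} {percentages[mode]:>6.1f}")
print(f"{'Total':<20} {total:>4}")
```

Because each percentage is rounded separately, the rounded figures need not add up to exactly 100.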
The great advantage of tables is that we can use them to present a wealth of data economically. Readers are able to see all the information, to form their judgements about it and to perform further analysis if they wish.

The problem with tables is that the main points may not come out very clearly. Some readers will not know what to make of a table - how to 'read' it. Tables lack visual impact, even if we use borders, bold type or colour.

Instead of tables, we can use various ways of representing our data to greater effect (see also pages 139-42). These are particularly useful when communicating with a non-specialist audience. Their drawbacks are that they usually involve some simplification of the data, and that readers cannot easily perform their own analysis of the data since they may not have all the information they need. A specialist audience will normally expect this detail - perhaps in an appendix - in addition to the figures and diagrams.

Where we wish to illustrate the proportions in which a variable is divided into its different values, a pie chart (also called a pie diagram) is appropriate. In Figure 9.1, the data are taken from Table 9.1 and presented as a pie chart showing how students divide up in terms of how they normally get to the campus. Readers are probably struck immediately by how many (47 per cent) walk to campus. The eye might then be drawn to the second largest proportion, the 29 per cent who cycle to work. Readers might also notice how few use a motorbike (1 per cent), or travel as a car passenger (2 per cent) or even by bus (5 per cent).

Pie charts are less effective when there are many segments, or when some segments are very small. In Figure 9.1, travelling by motorbike (1 per cent) is a sliver and travelling as a car passenger (2 per cent) is only a tiny segment. If the pie chart is small, the slivers may disappear altogether. One possibility is to combine them into an 'other' category, which preserves the visual impact, though at the cost of a loss of information.

Where we are reporting on frequencies, bar charts and histograms are a straightforward way of presenting basic information clearly. Figure 9.2, drawn from the Survey Unit's survey of international students, presents data on the number of international students whose relatives have themselves studied in the UK. The data are broken down into different categories of relative. In addition, the figure shows the numbers who studied at Nottingham compared to other UK universities.

[Figure 9.2 International students with relatives who studied in the UK. Bar chart by category of relative (parents/guardians, brothers/sisters, other relatives), distinguishing relatives who studied at Nottingham from those who studied elsewhere in the UK.]
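The suggestion made earlier of combining the slivers of a pie chart into an 'other' category can be sketched as follows, using the counts from Table 9.1. The function name and the 3 per cent cut-off are assumptions chosen for illustration; the text does not prescribe a threshold.

```python
def combine_small_segments(counts, min_share=0.03, label="Other"):
    """Merge categories whose share of the total falls below min_share.

    min_share is an illustrative cut-off (3 per cent), not a fixed rule.
    """
    total = sum(counts.values())
    combined = {}
    other = 0
    for category, n in counts.items():
        if n / total < min_share:
            other += n          # too small to show as its own segment
        else:
            combined[category] = n
    if other:
        combined[label] = other
    return combined


# Counts from Table 9.1.
modes = {
    "Walk": 133,
    "Bicycle": 82,
    "Car as driver": 47,
    "Bus": 13,
    "Car as passenger": 5,
    "Motorbike as driver": 2,
}

segments = combine_small_segments(modes)
print(segments)
```

With this cut-off, car passengers and motorbike riders are merged into one segment of seven cases. The total is preserved, but the reader can no longer tell the two modes apart, which is the loss of information the text warns about.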
[Figure 9.3 Mode of travel by main staff group. Stacked bar chart showing main mode of travel (motor vehicle, bus, bike, walk) as a percentage within each staff group: Academic (N=162), Administrative (N=263), Technical and ancillary (N=146).]
A way of bringing out the differences in the joint distribution of two variables is by a stacked bar chart. Figure 9.3, based on Travel Survey data, shows vividly the general dependence by all staff groups on commuting by car, but it also highlights the more widespread use of both bicycles and walking by academics compared to administrators. In this particular figure, visual comparisons of the relative importance of different modes of travel have been assisted by making the total number of observations for each group form the basis of the percentages for each stack.

Line graphs are useful for presenting data on social trends. Figure 9.4, based on invented data, shows comparative trends in car ownership among staff and students. The line graph can be effective in highlighting overall trends, differences between categories, and peaks and troughs in the data.

[Figure 9.4 Staff and student car ownership 1991-99. Line graph with one line for staff and one for students; vertical axis from 0 to 100, horizontal axis showing years.]

In all these ways of presenting data, we should provide readers with sufficient information to enable them to interpret the data correctly. Depending on the nature of the data and the format chosen, such tables and figures should have:

• a title;
• appropriate labelling of the columns and rows of a table;
• appropriate labelling of the x (horizontal) axis and y (vertical) axis of a diagram (if one variable is treated as an independent variable, it is conventionally assigned to the horizontal axis);
• a legend identifying the segments of a pie chart, or the elements of a bar chart, or the lines on a line diagram;
• an indication of the units of measurement (for example, litres, miles, thousands of £s);
• an indication of the number of cases involved;
• an acknowledgement of the source of the data if we have not generated the data ourselves.
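The stacked bar chart of Figure 9.3 rests on percentages calculated within each staff group, with the number of cases in each group reported alongside, as the checklist above recommends. A minimal sketch of that calculation follows. The group totals (162, 263 and 146) are those given with Figure 9.3, but the counts for each mode of travel are invented for illustration, since the underlying figures are not reproduced in the text.

```python
# Invented cell counts: only the group totals match those in Figure 9.3.
travel_by_group = {
    "Academic": {"Motor vehicle": 97, "Bus": 13, "Bike": 26, "Walk": 26},
    "Administrative": {"Motor vehicle": 200, "Bus": 32, "Bike": 10, "Walk": 21},
    "Technical and ancillary": {"Motor vehicle": 105, "Bus": 22, "Bike": 9, "Walk": 10},
}


def within_group_percentages(table):
    """Return, for each group, its number of cases and the percentage in each category."""
    result = {}
    for group, cells in table.items():
        n = sum(cells.values())   # the base for this group's stack
        shares = {mode: 100 * count / n for mode, count in cells.items()}
        result[group] = (n, shares)
    return result


stacks = within_group_percentages(travel_by_group)
for group, (n, shares) in stacks.items():
    summary = ", ".join(f"{mode} {share:.0f}%" for mode, share in shares.items())
    print(f"{group} (N={n}): {summary}")
```

Using each group's own total as the base makes the stacks the same height, so groups of different sizes can be compared directly; reporting N for each group lets readers judge how much weight each stack can bear.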
Reporting on the research methods
In all social science research we need to make our research methods transparent to the reader. Science thrives on criticism, and criticism thrives on information. Readers should be given all the information they need in order to evaluate our research and to replicate it if they wish.

How we provide the information varies according to the audience for our research report. In the case of an article in an academic journal, we will be expected (see Box 9.2) to include a section giving details of our research methods. An academic thesis will typically have a large section or whole chapter devoted to methodology. A research report (see Box 9.1) normally presents this material in one or more appendices.

The details we need to provide cover three aspects of our research methods. Depending on the context, we should consider including the following.

1 Information about the sample:
• sample size
• sampling frame
• sampling procedure
• demographic characteristics of the sample
• response rate
• discussion of the representativeness of the sample

2 Information about our research instruments:
• reasons for choosing the method of data collection
• piloting procedures
• reliability and validity
• a copy of the questionnaires or interview schedules
• who gathered the data
• when and where the data were gathered
• techniques of data analysis

3 Discussion of research ethics:
• access to respondents and documents
• consent and overtness/covertness of research
• confidentiality or anonymity
• information and feedback provided to respondents
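Of the items listed under information about the sample, the response rate is the one most easily reduced to a calculation. The sketch below uses one common, simple definition (completed responses as a percentage of the eligible sample); other definitions exist, and the function name and all the figures are invented for illustration.

```python
def response_rate(completed, sampled, ineligible=0):
    """Completed responses as a percentage of the eligible sample.

    Assumes a simple definition: cases found to be ineligible
    (for example, people who have left the population) are removed
    from the base before the rate is calculated.
    """
    eligible = sampled - ineligible
    if eligible <= 0:
        raise ValueError("the eligible sample must be greater than zero")
    return 100 * completed / eligible


# Invented figures: 400 questionnaires issued, 282 usable returns.
rate = response_rate(completed=282, sampled=400)
print(f"Response rate: {rate:.1f}%")

# The same returns, if 24 of those sampled turned out to be ineligible.
adjusted = response_rate(completed=282, sampled=400, ineligible=24)
print(f"Adjusted response rate: {adjusted:.1f}%")
```

Whichever definition is used, the report should state it, so that readers can evaluate the figure and compare it with other surveys.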
Advantages of the classic approach
1 Clarity. The emphasis on signposting, the use of tables and charts, and all the repetition and redundancy should mean that information is conveyed clearly and unambiguously.
2 Ease of scanning. The prominence of the macrostructure makes it easy for a reader to scan the report quickly in order to grasp the key points. Beginning with key findings - in business circles, an 'executive summary' - enables the reader to get the gist in minutes, even seconds. The time-pressed corporate executives cited at the beginning of this chapter expect to find an executive summary. The key findings of the Travel Survey, the outcome of weeks of work by the Survey Unit, are condensed into precisely one hundred and fifty words.
3 Scientific register. Avoiding personal pronouns in favour of the passive voice contributes to the flavour of a scientific report. Impersonality conveys objectivity. The structure of the report is geared to transmitting information as efficiently as possible. Other considerations - a racy style, an intriguing narrative, witty asides - would be seen as irrelevant distractions from scientific reporting of hard data.
4 Credibility. The impersonal scientific register is designed to elicit credibility. Conforming to the canons of the orthodox research report is less risky than departing from them. In commissioned research, this is what the sponsor is paying us for.
Disadvantages of the classic approach
One of the main challenges to the survey method, discussed in Chapter 1, page 12, is the humanistic critique. Social surveys are said to be a misguided attempt to imitate the natural sciences. Consistent with this critique, the style and structure of the classic research report are seen as pseudo-scientific. They betray, it is alleged, the anti-humanistic nature of the survey method. This basic challenge is reflected in a number of specific points:

1 Dullness. Clarity, repetition and redundancy, impersonality, information transfer: the report may be at best dry and at worst boring, but in either case it will be devoid of human interest.
2 No surprises. A strength of the classic research report, its standardization, can also be a weakness. Predictability can be represented as a problem: our findings were predictable, and no survey was needed to report the obvious. Our readers knew all this already.
3 Misrepresentation of the research process. The classic research report provides an untrue account of the research process. It implies that the researchers began by formulating a clear and distinct set of hypotheses, which they then tested using a carefully designed set of instruments. Findings are then written up efficiently and objectively. It is not a natural history of the research process but an idealized artificial construct that misleads readers about the nature of scientific enquiry. Far from being a murder mystery where the dénouement is saved up to the very end, the classic research report announces its findings in a summary at the very beginning. The report suppresses narrative: we are not to tell the story of the research process as it unfolds, but to reconstruct it in an idealized form. Readers inevitably lose all sense of the thrill of the chase.

Anyone who has ever engaged in research knows that this idealized account is fiction. Research is messy, chaotic and unpredictable. Hypotheses are typically vague ideas. Clarity emerges during the research process rather than preceding it.

We can consider the problem of misrepresentation by examining the distinction between writing and writing up.
Writing up I n the classic research r e p o r t , a n d i n the n a t u r a l sciences, p r e p a r a t i o n o f the r e p o r t is c u s t o m a r i l y referred t o as ' w r i t i n g u p ' . Similarly, the final year o f a research degree such as an M P h i l or P h D is o f f i c i a l l y described as 'the w r i t i n g u p stage'. W h a t does this phrase i m p l y ? W r i t i n g u p is: • the final stage o f the research process; • a retrospective account i n the past tense; • an objective r e p o r t o f findings; • i n w h i c h c l a r i t y is the o n l y relevant l i t e r a r y value; • i n w h i c h r e w r i t i n g is m a i n l y p o l i s h i n g : r e m o v i n g g r a m m a t i c a l a n d spelling mistakes and other infelicities; • a n d i n w h i c h the emphasis is o n o r g a n i z a t i o n : w h i c h bits t o p u t w h e r e , a n d w h i c h bits t o leave o u t .
Writing
As we have seen, the classic research report does not aim to give a natural history of the actual research process. Instead, it is a selective and idealized construct conforming to a set of conventions whose goal is to transmit information efficiently.
The idea that writing begins only after all the other stages of the research process have been completed is untrue to the way in which many projects are actually prepared. Leaving the writing to the writing up stage is to ignore the point, emphasized by Alasuutari (1995), that writing is a tool of thinking. It is not that we have clear and distinct ideas in our minds which we then commit to paper. Rewriting is more than polishing and elimination of spelling and grammatical mistakes, more too than adding literary embellishments. Our ideas are refined in and by the processes of writing and rewriting.
In most research projects the writing process begins early on. This is true of social surveys as well as field work and ethnography. Survey researchers often keep a field diary, in which they record their impressions, their ideas and their intellectual puzzles. Alternatively, they may use a tape recorder. Noting our ideas and impressions is obviously invaluable at the pretesting and piloting stages, but it is also useful in the main survey. For example, one of the things interviewers would do well to record is their impressions of the way respondents reacted to key questions. This can be particularly important in identifying the issues that are salient to respondents, and in exploring the nuances of responses to complex or sensitive issues. Here again is an advantage of gathering the data yourself, since all such insights, however penetrating, are lost in hired hand research.
We should also consider the question of grammatical tense. As already noted, the classic research report is written in the past tense. In contrast, field work in sociology and social/cultural anthropology is typically written in the present tense - the so-called 'ethnographic present'. Readers are told that the Azande believe this, that the Dinka do that, and that among the Dogon this is the kinship system. The ethnographic present tends to stimulate the reader's interest, while the past tense tends to slacken it.
It is not necessarily a question of choosing one tense or the other. Referring to the summary of the Travel Survey shows that past and present tenses are used side by side. There is an underlying pattern:
• The present tense is used to present facts about staff and students, for example: 'Staff who travel to work by car take longer than those using other modes of transport and the open-ended data suggest that this is because a journey to work often involves one or more stops en route'.
• The past tense is used to report responses, for example: 'There was widespread demand, both in numbers and range of suggestions, for improved cycle facilities'.
• The present tense is used for recommendations, for example: 'The option of working from home was popular among staff and is one way campus traffic might be reduced'.
The past tense distances the researchers from the respondents' opinions: this is what they told us, and we are just the messenger. If we use the present tense, it may convey some sense, however subtly, that we endorse their point of view. Consider the difference between:
'There was widespread demand, both in numbers and range of suggestions, for improved cycle facilities'
and
'There is widespread demand, both in numbers and range of suggestions, for improved cycle facilities'.
The first statement is agnostic about the cycle facilities and the demand, while the second statement may carry a hint that this demand is not going to be satisfied unless the inadequate facilities are improved.
Writing up and writing
It is usually possible to combine the virtues of writing and writing up. The classic, written up research report does not have to be badly written. Even if the summary of key findings/executive summary and the formal conclusions are presented in staccato, bullet point fashion, there is likely to be scope for a different approach in the main body of the report. Whereas the typical article in an academic journal demands to be read from beginning to end, one advantage of the research report format is that it makes it easy to read the report in many different ways and sequences. Its apparently rigid structure is actually very flexible.
It is no disparagement to sociology and the other social sciences to say that they are, among other things, forms of literature. As such, they should be a pleasure both to read and to write.
Further reading
Gilbert's edited collection (1993a) Researching Social Life contains a useful final chapter by Gilbert himself on writing about social research. Alasuutari (1995) Researching Culture is insightful about the writing process, even though he is sceptical about survey research. To sharpen one's style, Evans (2000) Essential English for Journalists, Editors and Writers and Cutts (1996) The Plain English Guide offer excellent advice.
Glossary
Accuracy: In the context of sampling, how closely an estimate from a sample coincides with the population parameter it seeks to predict.
Alphanumeric (field): A variable in a computer database that contains a combination of number, letter and punctuation characters.
Analysis of variance (ANOVA): A form of statistical analysis often used where there is a metric dependent variable and categorical independent variables.
Anonymity: Respondents have anonymity if it is not possible for anyone, not even the researchers, to trace their responses back to them; see also confidentiality.
Bar chart: The graphical representation of a nominal or ordinal variable in which each bar represents a category: the length of each bar is proportional to the observed frequency in that category.
Bivariate: Any statistical device or measure that deals with two variables at a time.
Case: The ultimate unit in relation to which information in a survey is collected (for example, an individual or a household).
Case study: A research strategy involving the close examination of one social setting, in contrast to the survey and the experiment.
Categorical (discrete) variable: One in which the categories are logically separate and form no natural order (for example, the categories of a religious faith variable might be Hindu, Muslim, Greek Orthodox, etc.): see also continuous variable.
Census: A survey, often of human individuals or households, in which the objective is complete enumeration of the target population.
Central tendency: Statistical measures (such as the mode or arithmetic mean) that summarize a frequency distribution in terms of its typical value.
Chi square (χ²): A test of statistical significance often employed with two nominal variables to demonstrate that they are not independent.
Closed questions: Questions in which respondents are given a set of alternative responses from which to choose; also called closed-ended or fixed-alternative questions.
Cluster sample: A multi-stage sample design in which the target population is divided up into a large number of areas containing clusters of geographically adjacent cases: the final stage selects a number of these clusters at random, collecting data either from every case or from a large proportion of the cases within the chosen clusters.
Codebook: A list, now usually computerized, of the variables that will be derived from a questionnaire or interview schedule with information including the codes assigned to different responses and their labels.
Coding: The selection of an appropriate code for a response to a question in a questionnaire or interview schedule and its entry into a computer data file (sometimes used to refer to all aspects of processing responses).
Complete enumeration: A survey which does not employ any selection procedure but which attempts to collect data from every case in the target population.
Computer-assisted personal interview (CAPI): A laptop computer and specialized software are used to generate on screen the questions and prompts for the interviewer and to record the responses by the interviewee.
Computer-assisted telephone interview (CATI): Similar to CAPI but using a desktop computer to conduct telephone interviews.
Confidence interval: An estimate of any population characteristic from sample data in the form of a band of values that reflects the researcher's acceptance of a fixed likelihood of error; thus, in a 95 per cent confidence interval for a population mean, there will be a 5 per cent chance that the actual population mean falls outside the band.
Confidentiality: Responses are confidential if the researchers guarantee that no one outside the research team will be able to trace any response back to the respondent who gave it; see also anonymity.
Contingency table: See cross-tabulation.
Continuous variable: A metric variable, such as weight or time, made up of many ordered categories, in which the unit of measurement is in principle infinitely divisible: see also categorical variable.
Control group: In an experimental design, a group which, unlike the experimental group, is not exposed to the independent variables under investigation; thus, in a clinical trial of a drug, control groups receive a placebo rather than the drug itself.
Control variable: A variable introduced to examine its impact on an existing association between an independent variable and a dependent variable.
Convenience sampling: A sample acquired with a minimum expenditure of resources, usually as part of a pilot survey.
Correlation: An index of the extent to which the values of two variables vary together, with positive or negative values indicating the direction of the variation.
Covering letter: A letter that accompanies a questionnaire, setting out the nature and objectives of the research and designed to motivate the recipient to respond; also called a cover letter.
Covert research: Research involving deception and not based on the informed consent of its subjects.
Cross-tabulation: A joint frequency distribution of two or more variables in a row and column format (also called a contingency table).
Data file: The computer entity in which all the data collected in a survey are stored.
Data input: The entry of the appropriate code for a response from the codebook into a data file.
Dedicated application: Computer software designed to carry out a specialized function in connection with surveys (such as data analysis); see also one-stop application.
Dependent variable: A variable whose values are influenced by one or more independent variables.
Dichotomous: A variable which can have just two possible values, such as yes/no, true/false, 0/1.
Dispersion: Statistical measures (such as the range and standard deviation) that summarize a frequency distribution in terms of its variety or spread.
Disproportionate stratified sampling (DSS): A stratified random sample design in which a different sampling fraction is used for different strata.
Elaboration: A statistical procedure that examines the relationship between (usually) an independent, a dependent and a control variable by comparing the relationships between the first two in a series of partial sub-tables, one for each category of the control variable.
Empirical: Based on, or concerned with, observation and measurement.
Epidemiological (study, research): A type of research design, based on survey findings or the analysis of official statistics, that seeks to explain the distribution of death, illness, crime or deviance in a human population: the smoking and lung cancer research discussed in Box 1.4 is an example of epidemiology.
Epsem sample: A special sampling arrangement in which every case and every group of cases in the population stand an equal chance of inclusion in the sample.
Experiment: A research strategy in which the researcher aims to test hypotheses through a research design in which independent variables can be controlled and manipulated in order to measure their impact on a dependent variable or variables.
Experimental group: In an experimental design, a group which is exposed to the independent variables under investigation; thus, in a clinical trial of a drug, experimental groups receive the drug itself, not a placebo.
Frequency distribution: A table or chart summarizing the number of cases observed in each category of a variable.
General linear model (GLM): A statistical model on which several techniques of multivariate analysis are based.
Histogram: The graphical depiction of a frequency distribution in which bars represent the categories of a continuous variable with an interval or ratio level of measurement: the area of a bar is made proportional to the relative frequency of that category in the distribution.
Independent variable: A variable whose values have an effect on one or more dependent variables.
Interval level of measurement: See levels of measurement.
Interview guide: A list reminding the interviewer of the topics to be covered in an unstructured interview.
Interview schedule: A list of the questions to be asked by the interviewer, including instructions about procedures such as use of prompts and handling of queries and non-response.
Levels of measurement: The nature of the logical relations that exist between the categories employed in a variable, which dictates the statistical operations that can be performed on case values; progressively more sophisticated measurement can take place at each ascending level: in the lowest, nominal, level, the categories have no intrinsic ordering; in the ordinal level, they form a ranking; at the interval level, a unit of measurement enables the relative distance between cases in different categories to be established, while the ratio level identifies the absolute position of cases on the dimension concerned.
Likert scale: A question format, named after the American social psychologist, Rensis Likert, in which respondents indicate their agreement or disagreement with statements by choosing one of a set of categories ranging from, for example, strongly agree to strongly disagree.
Line (graph chart): A graphical representation of trends in the values of a variable or variables, plotting a time sequence from left to right on the x-axis of a graph.
Longitudinal survey: One which extends over a period of time, collecting the same data for different time periods; see also panel study.
Measures of association: Statistics designed to provide an indication of the strength of the relationship between two variables.
Multiple regression: A technique of multivariate analysis.
Multi-stage sampling: Designs in which the selection process is repeated several times with different sampling units, the cases selected at an earlier stage making up the sampling frame at the next stage.
Multivariate analysis: Statistical procedures concerned to examine the mutual influences that exist within a set of variables.
Nominal level of measurement: See levels of measurement.
Non-parametric statistics: Tests and measures which rest on relatively few assumptions about the population from which a sample is drawn.
Non-selection errors: Errors that affect surveys that have their origin outside the selection procedures (for example, mistakes by interviewers).
Null hypothesis: A statement that no difference exists between two samples or distributions that an investigator normally seeks to refute by conducting statistical tests on collected data.
One-stop (integrated) application: A software package that includes questionnaire design, data entry, data analysis and data presentation facilities.
Open-ended questions: Questions in which respondents are not offered alternative responses from which to choose, but are invited to give their answers in their own words; also called open questions.
Operationalization: The process of constructing empirical observations or measures that correspond to theoretical concepts.
Optical mark reader (OMR): An electronic method of processing respondents' answers to closed questions by scanning a pre-printed form for ticks or marks in designated areas.
Ordinal level of measurement: See levels of measurement.
Overt research: Research based on the informed consent of its subjects.
Panel study: Research in which similar data is repeatedly collected from the same respondents; see also longitudinal survey.
Partial: One of the sub-tables representing a value of a control variable created in the elaboration procedure.
Participant observation: A research technique, either overt or covert, in which the researcher observes a social collectivity of which she or he is a member, where membership is often only for the purposes of the research.
Pie chart: The graphical representation of a variable in which each segment of a circle represents a category; the area of each segment is proportional to the observed frequency and is usually labelled as a percentage; also called a pie diagram.
Piloting: Inclusive term for pretests and pilot surveys.
Pilot survey (project): A small-scale rehearsal of the survey proper, designed to test key features such as access to respondents, question wording, questionnaire layout, and arrangements for distribution and return of questionnaires.
Placebo: A harmless preparation that has no medical value or pharmacological effects, as given to the members of a control group in a clinical trial.
Population parameter: The value of a characteristic of the target population which the researcher may try to estimate using data derived from a sample.
Postal/mail questionnaire: A self-completion questionnaire sent to respondents by post or internal mail.
Post-coded: Variables, typically derived from open-ended questions, which are assigned categories and codes only after responses have been collected.
Precision: The average size of the difference between a population parameter and the estimates of it derived from all the possible samples of a given size and design selected from the target population.
Pre-coded: Variables, typically derived from closed questions, which are assigned codes prior to obtaining responses.
Pretest: A dummy run of part of the research instrument as an element in the piloting.
Primary sampling unit (PSU): The type of case that makes up the sampling frame in the first stage of a multi-stage sampling design.
Probability proportional to size (PPS): A technique in multi-stage sampling which seeks to control sample size by arranging for the chances of selection of sampling units to be proportional to their size.
Probability sampling: A method of drawing a sample, based on the mathematical theory of probability, that permits inferences to be made from it to the target population: a synonym for random sampling.
Probe: Any technique used by an interviewer to elicit a fuller response from an interviewee.
Prompt: A reminder to the respondent about the possible categories of response.
Proportional reduction in error measures (PRE): Measures based on the difference between attempting to predict the value of a dependent variable in complete ignorance and predicting it on the basis of the value of an observed independent variable.
Proportionate stratified sampling (PSS): A stratified random sample design in which the same (uniform) sampling fraction is used for all the strata.
Questionnaire: A research instrument consisting of a set of questions on a form which respondents fill in themselves; sometimes, though not in this book, used to include interview schedules, in which case the term self-completion questionnaire is normally used to distinguish it from the interview schedule.
Quota sampling: A technique in which interviewers are provided with a list of interlocking interviewee attributes, such as age and sex, which they can satisfy by selecting appropriate respondents.
Random assignment: In an experimental design, the random allocation of subjects to a control group or an experimental group.
Random sampling: See probability sampling.
Ratio level of measurement: See levels of measurement.
Reactivity of research instruments: The tendency for a measure to produce different results solely because it is being used by different researchers or in a different context; see also reliability.
Regression: An important family of multivariate techniques based on constructing equations with the dependent variable on one side and independent variables, coefficients of association and an error term on the other.
Reliability: The extent to which a measuring instrument, such as a test or indicator, consistently produces the same results when used in the same conditions; see also validity.
Residual: A term in a regression equation that represents the variation in the dependent variable that remains unaccounted for.
Response bias: A predisposition amongst respondents to answer questions in a particular fashion irrespective of their content (for example, an acquiescence bias is a general tendency to agree with statements and answer yes to items): see also social desirability.
Response processing: The operations involved in converting respondents' answers into a digital format that can be handled by a computer survey analysis package.
Response rate: The proportion of respondents who produce a usable set of responses out of the total number in the sample.
Salience: The importance of an issue to a respondent.
Sample: The sub-set of cases selected from the target population from which an attempt is made to collect data.
Sampling distribution: Theoretical distributions which correspond to the results from random samples repeatedly drawn from populations: the normal distribution and Student's t are two widely used sampling distributions.
Sampling error: Different random samples of the same size and design will contain different cases and thus generate different estimates of population parameters: sampling error is the average spread between such estimates, and is measured by the standard error family of statistics.
Sampling frame: A listing of every case in the target population.
Sampling interval (sampling fraction): The interval between the cases selected for the sample on a sampling frame (calculated by dividing the size of the population by the size of the desired sample).
Sampling unit: An element in terms of which a target population is organized for multi-stage sampling: the general public in the UK could be thought of as organized into administrative regions, parliamentary constituencies, boroughs, wards, street addresses, and households: the first element in a multi-stage design is the primary sampling unit.
Scientific sampling: An alternative term for probability sampling.
Selection errors: Errors in surveys that have their origin in some aspect of the sample selection procedures (for example, sampling error).
Self-completion questionnaire: A form filled in by the respondent in contrast to an interview schedule completed by an interviewer.
Semantic differential: A question format in which respondents are asked to rate items on a bipolar scale, where the poles (extremes) of the scale are described by a pair of adjectives such as warm/cold, strong/weak and so on.
Semi-structured interview: An interviewer employs a list of topics, possibly in a prescribed order, but not a complete script.
Show card: A list of response categories shown to a respondent; a form of prompt.
Simple random sampling (SRS): A basic sampling design based on a single stage of selection from a sampling frame representing the whole of the target population.
Snowball sampling: A procedure in which a few early cases located by researchers are used as the source of all the further cases in the sample.
Social desirability: A reactivity problem, specifically a response bias produced by the tendency of respondents to give socially approved answers, or to engage in socially approved behaviour when observed by a researcher.
Standard error: A statistical measure of the level of precision of the values generated in the estimation process: each kind of descriptive and summary statistic has its own measure of precision, for example, standard error of the mean, standard error of the percentage, etc.
Standardized variables (Z scores): The transformation of the observed values of a variable so that it is numerically comparable to other variables: the resulting values have a mean of zero and a standard deviation of 1.
Statistical inference: The attempt to predict population values based on sample data.
Statistical Package for the Social Sciences (SPSS): A widely used, dedicated computer application for the analysis of survey data.
Statistical significance: A measure of how likely it is that an observed difference is due entirely to sampling variations.
Stem and leaf plot: A graphical way of displaying a frequency distribution.
Stratified sampling: A refinement to simple random sampling in which the cases in the target population are divided into separate strata or groupings on the basis of a characteristic relevant to the research, with subsequent selection taking place from each stratum separately.
Structured interview: An interview in which the precise wording and sequence of questions are predetermined by the researcher.
Survey: A research strategy in which the same information about all the cases in a sample is systematically collected in a standardized form.
Survey fatigue: A general tendency for response rates to decline as respondents grow weary of repeated social surveys.
Symmetry: Statistical measures (such as skew and kurtosis) that summarize a frequency distribution in terms of its overall shape.
Systematic selection: A procedure for selecting cases for the sample at a chosen period or sampling interval within the sampling frame, for example, choosing every tenth case.
Taking the role of the other: A conscious attempt by a researcher or other actor to recognize the situation and outlook of another individual.
Target population: The pool of cases relevant to a research topic from which a sample is selected.
Theoretical: Based on abstraction and idealization and concerned with explanation and prediction.
Theoretical population: The infinite set of populations to which any general theory relates.
Theoretical sampling: A component of grounded theory strategies in which cases are selected for their theoretically significant attributes rather than for their typicality.
Time-series: Data derived from a longitudinal survey or panel study.
Triangulation: Use of a variety of research strategies, or of data from a variety of sources, to test an hypothesis.
Unstructured interview: An interview in which neither the precise wording nor the sequence of questions are predetermined by the researcher.
Validity: Whether a measuring instrument, such as a test or indicator, succeeds in measuring what it was designed for; see also reliability.
Variable: A characteristic that is fixed at any one point of time for any case but varies between cases.
Vignette: A constructed story presented to respondents in order to elicit an account of what they think should be done or what they would do themselves in such a situation.
Z scores: See standardized variables.
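Several of the glossary entries describe simple calculations: the sampling interval, systematic selection, the response rate, standardized variables (Z scores) and the confidence interval. For readers who find a worked illustration helpful, the following is a minimal sketch in Python. It is not taken from the book or from any survey package: the function names are hypothetical, and the confidence interval uses the 1.96 multiplier, which assumes a simple random sample large enough for the normal distribution to be a reasonable approximation.

```python
import math
import random
import statistics


def sampling_interval(population_size, sample_size):
    """Sampling interval: population size divided by desired sample size."""
    return population_size // sample_size


def systematic_selection(frame, sample_size, rng=random):
    """Select every k-th case from the sampling frame after a random start.

    Assumes sample_size is no larger than the sampling frame.
    """
    k = sampling_interval(len(frame), sample_size)
    start = rng.randrange(k)  # random start within the first interval
    return frame[start::k][:sample_size]


def response_rate(usable_responses, sample_size):
    """Proportion of the sample producing a usable set of responses."""
    return usable_responses / sample_size


def z_scores(values):
    """Standardize values so that they have mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def confidence_interval_95(values):
    """Approximate 95 per cent confidence interval for a population mean."""
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    margin = 1.96 * standard_error  # normal approximation (an assumption)
    return mean - margin, mean + margin
```

With a sampling frame of 1,000 cases and a desired sample of 100, `sampling_interval` gives 10, so `systematic_selection` picks every tenth case, as in the glossary's own example.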
Appendix I: The Travel Survey questionnaires
Staff Campus Travel Survey 1998
This questionnaire has been sent to a sample selected at random from all main campus staff. It is entirely anonymous. Further details of the objectives and the prize draw for respondents are supplied on the covering letter. Please reply using the internal post to the Survey Unit, C51 Portland Building, by May 12th.
1 What is the approximate distance from your home to your workplace? Please tick one box
□ 1 Less than one mile
□ 2 1 or 2 miles
□ 3 3 or 4 miles
□ 4 5 or 6 miles
□ 5 7 to 9 miles
□ 6 10 to 19 miles
□ 7 20 or more miles
2 Approximately how long does it take you on average to travel to work? Please tick one box
□ 1 Less than 15 minutes
□ 2 15-29 minutes
□ 3 30-44 minutes
□ 4 45-59 minutes
□ 5 60-89 minutes
□ 6 90 minutes or more
3 Which mode of transport do you use most often for the longest stage of your journey to work? Please tick one box
□ 1 Walk
□ 2 Bicycle
□ 3 Rail
□ 4 Bus
□ 5 Car as driver
□ 6 Car as passenger
□ 7 Motorbike as driver
□ 8 Motorbike as passenger
4 Do you use once a week, or more frequently, any alternative modes of transport for part or all of your journey to work? Please tick all that apply
□ 1 No regular alternatives
□ 2 Walk
□ 3 Bicycle
□ 4 Rail
□ 5 Bus
□ 6 Car as driver
□ 7 Car as passenger
□ 8 Motorbike as driver
□ 9 Motorbike as passenger
5 On occasions when you travel to the campus by [illegible], where do you park? Please tick one box
□ 1 Not applicable
□ 2 In the Science City area
□ 3 In the central area (including Highfields House, West Drive and Education)
□ 4 On the periphery (including Halls, History and the Sports Centre)
6 How often do you travel off the campus for work-related purposes? Please tick one box
□ 1 5 days a week or more
□ 2 3 or 4 days a week
□ 3 Once or twice a week
□ 4 Less than once a week
□ 5 Never (please go to Question 8)
7 What mode of travel do you use most frequently for the journeys covered by question 6? Please tick one box
□ 1 Walk
□ 2 Bicycle
□ 3 Rail
□ 4 Bus
□ 5 Car as driver
□ 6 Car as passenger
□ 7 Motorbike as driver
□ 8 Motorbike as passenger
Question 8 is for those who WALK part or all of their journey to work once a week or more
8 How do you think facilities for pedestrians could be improved?
Question 9 is for those who CYCLE for part or all of their journey to work once a week or more
9 How do you think facilities for cyclists could be improved?
Questions 10, 11 & 12 are for those who use PUBLIC TRANSPORT for part or all of their journey to work once a week or more
10 Do you purchase season tickets for bus/rail travel to the campus?
□ 1 Yes
□ 2 No
11 Please indicate your daily return fare in the boxes. If you have a season ticket please apportion the total cost on a daily basis.
□ □ (£) □ □ (p)
12 How do you think facilities for public transport users could be improved?
Q u e s t i o n s 13 & 14 are f o r t h o s e w h o use a CAR for their j o u r n e y f r o m h o m e to w o r k o n c e a w e e k or m o r e . If y o u do not use a car, please go t o Q u e s t i o n 15.
13 How important are the following considerations in your decision to use a car to travel to work?
15
An important consideration
Of some importance
Not important
Speed of car journeys
• i
• 2
• 3
Flexibility and convenience
•
• 2
• 3
l
This final section seeks background information necessary for us to interpret your other responses. May we reassure you that under no circumstances will attempts be made to identify individuals.
High cost of public transport
At what time do you usually arrive at work?
Before 06.00 • 1 06.00-07.59 • 3 08.00-08.29 • 5 08.30-08.59 • 7 09.00-09.29 • 9
09.30-12.29 12.30-17:30
16
Do you work shif
rota system?
J J
Do you have a part-time contract?
•
l
• 2
• 3
No direct bus service to campus
•
l
• 2
• 3
18
How many days per week do you visit the campus?
2
•
4
After 17.30 • a No regular • a time of arrival
Yes
Safety and security
•
[11
Yes n t
Lack of cycle facilities
No
[J2
NO
n
19 To which one of the following staff categories do you belong?
If there are any other important considerations for you not mentioned above, please describe them here. Secretarial, Clerical &
•
1
Academic related
•
2
•
3
Technical
•
4
Junior Administrative Other (please specify)
Manual & Ancillary
20
In what part of the University system do you work?
Arts
•
01
Registrar's
•
02
Law & Social Sciences
•
03
Bursar's
•
04
•
06
•
w
Education
•
05
Libraries
Science
•
07
Computing Centre
Engineering
•
(»
Other (please specify)
Medicine
•
10
21
What is the postcode of your home address?
22 What is the total number of cars and motorbikes that are available to your household? None D 1
14 How likely would the following developments be to alter the way you travel to work? Very likely Subsidised travel cards for public transport More direct bus routes to campus
•
1
Possibly •
2 2
One •
2
More than one Q 3
23 How attractive for you personally is the prospect of working from home for at least part of the week?
Unlikely •
3
•
1
•
•
3
Campus car park charging
•
1
•
2
n
Improved shower facilities in your building
•
1
•
2
• 3
Improved cycle security
•
1
•
2
n
3
None of my job could be done from home
• 1
Don't know / I would need to think about it
• 2
An unattractive prospect
• 3
A fairly attractive prospect
• 4
A very attractive prospect
• 5
I already work from home
• 6
Thank you for taking the time to complete this questionnaire.
Please return it in the INTERNAL mail using the envelope provided.
2
Appendix 1: The Travel Survey questionnaires
Student Campus Travel Survey 1998 This questionnaire has been sent to a sample selected at random from all main campus students. It is entirely anonymous. Further details of the objectives and the prize draw for respondents are supplied on the covering letter. Please reply using the internal post to the Survey Unit, C51 Portland Building, by May 12th. 6 How often do you travel off the campus and return later on the same day?
1 What is the approximate distance from your Nottingham address to where your teaching and/or research normally takes place? Please ✓ one Less than one mile 1 or 2 miles
• •
2
3 or 4 miles
•
3
5 or 6 miles
•
4
7 to 9 miles
•
5
10 to 19 miles
• e
box
Please
i
20 or more miles Approximately how long does it take you on average to
J o n e
•
3 or 4 days a week
• a
Once or twice a week
•
3
Less than once a week
•
4
Never
• a
box
Bicycle
•
2
Rail
•
3
Bus
•
4 s
Less than 15 minutes
•
1
Car as driver
•
15-29 minutes
•
2
Car as passenger
• a
Motorbike as driver
• ?
30-44 minutes
•
3
45-59 minutes
•
4
60-89 minutes
•
s
90 minutes or more 3 Which mode of transport do you use most often for the longest stage of your journey to the campus?
Motorbike as passenger
J
Walk
•
1
Bicycle
•
2 3
one
box
Please go to Question 8 box
••
Question 8 is for those who WALK part or all of their journey to the campus once a week or more
• a Please
J
1
7 What mode of travel do you use most frequently for the journeys covered by question 6? Please ✓ one Walk • 1
teaching or research normally takes place? Please
5 days a week or more
one
box
8 How do you think facilities for pedestrians could be improved?
Rail
•
Bus
•
Car as driver
•
s
Car as passenger
•
6
Ul
Question 9 is for those who CYCLE for part or all of their journey to the campus once a week or more
Motorbike as driver Motorbike as passenger
•
a
9
4
How do you think facilities for cyclists could be improved?
4 Do you use once a week, or more frequently, any alternative modes of transport for part or all of your journey to the campus? Please
✓ all that
No regular alternatives Walk
•
2
Bicycle
D 3
Rail
•
Bus
apply
D1
Questions 10, 11 & 12 are for those who use PUBLIC TRANSPORT for part or all of their journey to the campus once a week or more
4
D s
Car as driver
Q 6
Car as passenger
D 7
Motorbike as driver
•
a
Motorbike as passenger
•
9
5 On occasions when you travel to the campus by car, where do you park? Please ✓ one Not applicable
•
1
In the Science City area
•
2
In the central area (including Highfields House, West Drive and Education)
•
3
On the periphery (Including Halls, History and the Sports Centre)
•
4
• a
10 Do you purchase season tickets for bus/rail travel to the campus?
Yes • 1
No • 2
11 Please indicate your daily return fare in the boxes. If you have a season ticket please apportion the total cost on a daily basis. £ £ p p box
12 How do you think facilities for public transport users could be improved?
Questions 13 & 14 are for those who use a CAR for their journey from their term time address to the campus once a week or more. If you do not use a car, please go to Question 15. 13 How important are the following considerations in your decision to use a car to travel to the campus? An important consideration
Of some importance
Speed of car journeys
• i
• 2
Flexibility and convenience
•
l
This final section seeks background information necessary for us to interpret your other responses. May we reassure you that under no circumstances will attempts be made to identify individuals. 15
Please
• a
•
•
2
3
Safety and security
•
l
•
2
•
3
No direct bus service to campus
• i
•
2
•
3
09.30- 12.29 12.30-17:30 After 17.30
Before 06.00 • 1 06.00-07.59 • 3 08.00-08.29 • 5 08.30-08.59 • 7 09.00-09.29 • 9 16
High cost of public transport
At what time do you usually arrive on campus? J
one
box
Not important •
2
•
4
• e No regular • a time of arrival
How many days per week do you visit the campus?
Which of the following are you?
Lack of cycle facilities
Full-time undergraduate
•
1
Full-time postgraduate
•
2
Part-time undergraduate
•
3
Part-time postgraduate
•
4
Which Faculty are you in? If there are any other important considerations for you not mentioned above, please describe them here.
19
Arts
•
1
Law & Social Science
•
2
Education
•
3
Science
•
4
Engineering
•
s
Medicine
•
e
Please ✓ one box
What is the postcode of your Nottingham address?
20 What is the total number of cars and motorbikes that are available for your use as a driver or passenger within your household? None •
1 One •
2
More than one •
3
21 The following space is available for any other comments you would like to make about travel to, across and from the campus.
14 How likely would the following developments be to alter the way you travel to the campus? Very likely
Possibly Unlikely
Subsidised travel cards for public transport
•
1
•
2
•
More direct bus routes to campus
•
1
•
2
•
3
Campus car park charging
•
1
•
2
•
3
Improved shower facilities in your building
•
1
•
2
•
3
Improved cycle security
•
1
•
2
•
3
3
Thank you for taking the time to complete this questionnaire.
Please return it in the INTERNAL mail using the envelope provided.
Appendix 2: Websites of professional associations
Many professional associations of academic researchers and teachers publish codes of professional conduct, including ethical guidance on the conduct of research. These guidelines are accessible through the world wide web. Here is a selection of some that are particularly relevant to survey research.
British Sociological Association
Home page: www.britsoc.org.uk/
American Sociological Association
Home page: www.asanet.org/
British Psychological Society
Home page: www.bps.org.uk/
The Australian Sociological Association
Home page: www.newcastle.edu.au/department/so/tasa/
Sociological Association of Aotearoa (New Zealand)
Home page: saanz.science.org.nz/
Canadian Sociology and Anthropology Association
Home page: www.arts.ubc.ca/csaa/
References
Abercrombie, N., Baker, J., Brett, S. and Foster, J. (1970) Superstition and religion: the God of the gaps, in D. Martin and M. Hill (eds) A Sociological Yearbook of Religion in Britain, Volume 3. London: SCM.
Alasuutari, P. (1995) Researching Culture: Qualitative Method and Cultural Studies. London: Sage.
Aldridge, A. E. (2000) Religion in the Contemporary World: A Sociological Introduction. Cambridge: Polity.
Babbie, E. R. (2001) The Practice of Social Research, 9th edn. Belmont, CA: Wadsworth.
Barker, E. (1984) The Making of a Moonie: Choice or Brainwashing? Oxford: Blackwell.
Barnett, V. (1991) Sample Survey Principles and Methods. London: Edward Arnold.
Beck, A. T., Brown, G., Epstein, N. and Steer, R. A. (1988) An inventory for measuring clinical anxiety: psychometric properties, Journal of Consulting and Clinical Psychology, 56: 893-7.
Bennett, N. with Jordan, J., Long, G. and Wade, B. (1976) Teaching Styles and Pupil Progress. London: Open Books Publishing.
Blumer, H. (1956) Sociological analysis and the 'variable', American Sociological Review, 21: 683-90.
Booth, C. (1889-1902) Life and Labour of the People in London. London: Macmillan.
Bourque, L. B. and Fielder, E. P. (1995) How to Conduct Self-administered and Mail Surveys. Thousand Oaks, CA: Sage.
Bowley, A. L. and Burnett-Hurst, A. (1915) Livelihood and Poverty. London: Ratan Tata Foundation, University of London.
Bryman, A. and Cramer, D. (1990) Quantitative Data Analysis for Social Scientists. London: Routledge.
Buchanan, D., Boddy, D. and McCalman, J. (1988) Getting in, getting on, getting out, and getting back, in A. Bryman (ed.) Doing Research in Organizations. London: Routledge.
Chalmers, A. F. (1999) What is This Thing called Science?, 3rd edn. Buckingham: Open University Press.
Cohen, A. K. (1955) Delinquent Boys: The Culture of the Gang. New York: Free Press.
Coleman, C. and Moynihan, J. (1996) Understanding Crime Data. Buckingham: Open University Press.
Cutts, M. (1996) The Plain English Guide. Oxford: Oxford University Press.
Czaja, R. and Blair, J. (1995) Designing Surveys: A Guide to Decisions and Procedures. Thousand Oaks, CA: Sage.
Davie, D. (1988) To Scorch or Freeze: Poems About the Sacred. Manchester: Carcanet.
de Vaus, D. A. (1991) Surveys in Social Research, 3rd edn. London: UCL Press.
Devine, F. and Heath, S. (1999) Sociological Research Methods in Context. London: Macmillan.
Doll, R. and Hill, A. B. (1952) A study of the aetiology of carcinoma of the lung, British Medical Journal, 13 December: 1271-85.
Doll, R. and Peto, R. (1976) Mortality in relation to smoking: 20 years' observations of male British doctors, British Medical Journal, 25 December: 1525-36.
ESRC (Economic and Social Research Council) (1995) Writing for Business: How to Write Reports that Capture the Attention of Businesses. Swindon: ESRC.
Evans, H. (2000) Essential English for Journalists, Editors and Writers. London: Pimlico.
Everitt, B. S. and Dunn, G. (1991) Applied Multivariate Analysis. London: Edward Arnold.
Fielding, J. and Gilbert, N. (1999) Understanding Social Statistics. London: Sage.
Finch, J. and Mason, J. (1993) Negotiating Family Responsibilities. London: Routledge.
Fink, A. (ed.) (1995) The Survey Handbook. Thousand Oaks, CA: Sage.
Gilbert, N. (1993a) Writing about social research, in N. Gilbert (ed.) Researching Social Life. London: Sage.
Gilbert, N. (1993b) Analysing Tabular Data: Loglinear and Logistic Models for Social Researchers. London: UCL Press.
Glaser, B. G. and Strauss, A. L. (1967) The Discovery of Grounded Theory: Strategies for Qualitative Research. New York: Aldine.
Goldthorpe, J. H., Lockwood, D., Bechhofer, F. and Platt, J. (1969) The Affluent Worker in the Class Structure. Cambridge: Cambridge University Press.
Hammond, P. E. (ed.) (1964) Sociologists at Work: Essays on the Craft of Social Research. New York: Basic Books.
Healey, J. F. (1990) Statistics: A Tool for Social Research. Belmont, CA: Wadsworth.
Hornsby-Smith, M. (1993) Gaining access, in N. Gilbert (ed.) Researching Social Life. London: Sage.
Hughes, J. A. (1976) Sociological Analysis: Methods of Discovery. London: Nelson.
Hughes, J. A. and Sharrock, W. (1997) The Philosophy of Social Research, 3rd edn. London: Longman.
Kalton, G. (1966) An Introduction to Statistical Ideas for Social Scientists. London: Chapman and Hall.
Kalton, G. (1983) Introduction to Survey Sampling. London: Sage.
Kish, L. (1965) Survey Sampling. New York: Wiley.
Lazarsfeld, P. (1958) Evidence and inference in social research, Daedalus, 87: 99-130.
Levine, K. (1999) Definitional and methodological problems in the cross-national measurement of adult literacy, Written Language and Literacy, 1(1): 41-62.
Litwin, M. S. (1995) How to Measure Survey Reliability and Validity. Thousand Oaks, CA: Sage.
Loether, H. J. and McTavish, D. G. (1974) Descriptive Statistics for Sociologists. Boston: Allyn and Bacon.
Marsh, C. (1982) The Survey Method: The Contribution of Surveys to Sociological Explanation. London: George Allen and Unwin.
Marsh, C. (1988) Exploring Data: An Introduction to Data Analysis for Social Scientists. Cambridge: Polity.
Miller, D. E. (1991) Handbook of Research Design and Social Measurement, 5th edn. London: Longman.
Mills, C. Wright (1970) The Sociological Imagination. Harmondsworth: Penguin.
Moser, C. A. and Kalton, G. (1971) Survey Methods in Social Investigation, 2nd edn. London: Heinemann Educational Books.
Murphy, L. L., Conoley, J. C. and Impara, J. C. (eds) (1994) Tests in Print (IV): An Index to Tests, Test Reviews and the Literature on Specific Tests. Lincoln, Nebraska: Buros Institute of Mental Measurements.
Norusis, M. J. (1995) SPSS 6.1: Guide to Data Analysis. Englewood Cliffs, NJ: Prentice Hall.
Oakley, A. (1998) Gender, methodology and people's ways of knowing: some problems with feminism and the paradigm debate in social science, Sociology, 32(4): 707-31.
OECD (1995) Literacy, Economy and Society: Results of the First International Adult Literacy Survey. Paris: Organization for Economic Co-operation and Development.
OPCS (1991) Standard Occupational Classification, Volume 3. London: HMSO.
Oppenheim, A. N. (1992) Questionnaire Design, Interviewing and Attitude Measurement. London: Pinter.
Robson, C. (1993) Real World Research. Oxford: Blackwell.
Rose, D. and Sullivan, O. (1996) Introducing Data Analysis for Social Scientists, 2nd edn. Buckingham: Open University Press.
Rose, D. and O'Reilly, K. (eds) (1997) Constructing Classes: Towards a New Social Classification for the UK. Swindon: ESRC/ONS.
Rowntree, B. Seebohm (1902) Poverty: A Study of Town Life. London: Macmillan.
Saunders, P. (1990) A Nation of Home Owners. London: Unwin Hyman.
Siegel, S. and Castellan, N. J. Jr. (1988) Non-parametric Statistics for the Behavioural Sciences, 2nd edn. New York: McGraw-Hill.
Strauss, A. L. (1987) Qualitative Analysis for Social Scientists. Cambridge: Cambridge University Press.
Strauss, A. L. and Corbin, J. (1993) Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park: Sage.
Titmuss, R. M. (1970) The Gift Relationship: From Human Blood to Social Policy. London: Allen & Unwin.
Webster, F. (1995) Theories of the Information Society. London: Routledge.
Wright, E. O. (1985) Classes. London: Verso.
Zigmond, A. and Snaith, R. (1983) The Hospital Anxiety and Depression Scale, Acta Psychiatrica Scandinavica, 67: 361-70.
Index
anonymity, 21, 22-3, 50, 54, 56-9, 88, 104, 111-12, 171, 176, 177
averages, see central tendency, measures of
bar charts, 140, 168-70, 176
British Household Panel Survey, 31
causal relationships, 7, 9-10, 12, 14, 27-8, 30, 153, 159
census, 15, 23, 26, 36, 62, 176
central tendency, measures of, 142-3, 176
chi square test, 148-50, 177
closed questions, 95, 103, 177
  see also coding; open-ended questions
codebooks, 124-6, 128-9, 132-4, 138, 177
coding, 44, 78, 101, 121-2, 125, 127, 132-4
confidence intervals, 75-6, 144, 177
confidentiality, 177
  see also anonymity
correlation, 12-13, 152-5, 158-9, 177
costs, 45-9, 50-5, 57-8, 90, 126
covering letters, 4, 19-20, 45, 48, 49, 86-90, 104, 178
crime statistics, 18-19, 64
cross-tabulation, 144-7, 159, 178
diaries, 3, 57-8, 59, 173
email surveys, 56-7, 125-6
factor analysis, 159
Family Expenditure Survey (FES), 73-4
focus groups, 46, 59, 86
General Household Survey (GHS), 15
histograms, 140-1, 178
inferential statistics, 75-6, 143-4, 147, 160, 182
International Adult Literacy Survey (IALS), 11
interview schedules, design of, 118-21
interviewer bias, 52, 53
interviewer effects, 52, 54, 55
Labour Force Survey, 15
levels of measurement, 125, 126, 129-31, 154, 179
Likert scales, 96, 112, 131, 152, 179
line graphs, 169-70, 179
log-linear analysis, 159
longitudinal studies, 31, 179
missing data, 132-4, 140
multiple regression, 155-60, 179
  see also regression
non-parametric statistics, 148-9, 160, 179
null hypothesis, 147-9, 155, 179
Office of National Statistics (ONS), 38-9, 40, 73-4
Office of Population Census and Surveys (OPCS), 38-9, 73-4
open-ended questions
  analysis of, 49, 126, 128-9, 137-8
  coding of, 122, 132
  definition of, 179
  role of, 29-30, 85, 95, 101-3
panel studies, 31, 180
path analysis, 159
pie charts, 167-8, 170, 180
piloting, 3-4, 26-7, 90-2, 173, 180
PRE (proportional reduction in error) measures, 150-2, 180
pretests, 90, 173, 180
  see also piloting
Question Bank, 23
questionnaires
  advantages and disadvantages of, 11, 22-3, 29-30, 51-2, 56-8, 86
  definition of, 6, 181
  distribution and return of, 92-3
  layout of, 109, 114-18
questions
  ambiguity in, 98-100, 105, 110
  double barrelled, 106
  negative and double-negative, 106-7
  overlapping categories, 104-5
  ranking, 54, 91, 95-6, 107-8
  rating, 54, 108
quota sampling, see sampling, types of, quota sampling
random number tables, 66-7, 75
random sampling, see sampling, types of, probability sampling
rapport, 49-50, 54, 92, 103
Registrar General's classification of social class, 38-9
regression, 155-60, 181
reliability, 14, 38, 39-40, 181
Religion and Politics Survey, 99
report writing, 102, 135-7, 161-75
research ethics, 22-3, 36-7, 171
response bias, 52, 55, 58, 181
  see also social desirability
response rates, 27, 51, 52, 86-8, 101, 107
salience, 13-14, 53, 95-6, 101, 107, 181
sampling, types of
  convenience sampling, 79-80
  non-probability sampling, 79-82
  probability sampling, 27, 64-79, 143-4, 149, 154, 180
  purposive sampling, 80-1
  quota sampling, 81-2, 181
  snowball sampling, 79-80
sampling bias, 57
sampling error, 64, 70, 76-9, 81-2, 143-4, 147-8, 181
scattergrams, 154
semantic differential, 112-13, 182
significance tests, see statistical significance
social class, measurement of, 38-9, 40
social desirability, 3, 13-14, 54, 57, 91, 103-4, 109-11, 182
statistical significance, 34, 147-9, 152, 155, 177, 182
survey, definition of, 5
tape recording, 46-7, 121, 173
target population, 30-1, 32, 35, 43, 46, 52, 61-83, 91, 127, 143-4, 183
telephone interviews, 5, 51, 55-6, 126-7
theory, role of, 15, 30, 32-41, 63-4, 80-1, 137, 159, 163
transcription, 45, 46, 121
triangulation, 14, 59-60, 183
validity, 14, 38, 39-40, 78, 85, 183
variables
  confounding, 32, 35, 41
  definition of, 5, 183
  dependent, 7, 10, 32, 41, 69-70, 178
  derived, 37-8
  independent, 7, 10, 32, 41, 179
  interval, 130-1
  nominal, 130
  ordinal, 130
  ratio, 131
  standardized, 143, 182
  stratifying, 68-9, 72
  see also levels of measurement
vignettes, 120-1, 183
Surveying the Social World
Principles and practice in survey research
• What are the strengths and limitations of social surveys?
• How can the principles of surveying be put into practice?
• How are findings analysed and results presented?
The survey has become a widely used technique for gathering information and opinions from individuals, organizations, and other groups. In Surveying the Social World, Aldridge and Levine begin by examining the contemporary state of surveys within society and social science methodology, explaining the potential of the survey method and the ways it can be used effectively when resources are limited. They then take the reader systematically through the process of conducting survey research, covering in turn: the role of theory; the planning and design of projects; pilot work; access to informants; ethical issues; sampling methods; the preparation of questionnaires; interviewing; the use of computer packages; processing responses; statistical methods of data analysis; and the presentation of findings.
Unlike some rival texts that stress complications and difficulties of conducting social surveys, this book adopts a consciously 'can-do' approach, emphasizing strategies and practical tips. Written in a direct style with a clear structure, each chapter begins with a list of key elements and concludes with summary points, points for reflection and suggestions for further reading. As well as examples of techniques and good practice from a variety of surveys, the authors use their own Travel Survey throughout the book to illustrate the decisions that need to be taken at each stage of the survey process. For the technical topics, there is a glossary containing over 130 technical terms that are highlighted in the text.
The result is an essential guide to conducting social surveys for students in the social sciences, and for others who need to carry out a community or organizational survey but who may have no previous training in social research methods or experience of survey work.
Alan Aldridge is Senior Lecturer in Sociology at the University of Nottingham. He is the author of Religion in the Contemporary World (2000) and Consumption (forthcoming). Ken Levine is Lecturer in Sociology and Director of the Survey Unit at the University of Nottingham. He is the author of The Social Context of Literacy (1986).
www.openup.co.uk ISBN 0-335-20240-3