A Step-by-Step Guide to Analysis and Interpretation
Brian C. Cronk
[Flowchart: Choosing the Appropriate Statistical Test. The decision tree branches on the number of independent variables, the number of levels of the independent variable, and whether the question concerns differences or proportions, and points to the section covering each test.]

NOTE: Relevant section numbers are given in parentheses. For instance, "(6.9)" refers you to Section 6.9 in Chapter 6.
Notice: SPSS is a registered trademark of SPSS, Inc. Screen images © by SPSS, Inc. and Microsoft Corporation. Used with permission. This book is not approved or sponsored by SPSS.

"Pyrczak Publishing" is an imprint of Fred Pyrczak, Publisher, A California Corporation.

Although the author and publisher have made every effort to ensure the accuracy and completeness of information contained in this book, we assume no responsibility for errors, inaccuracies, omissions, or any inconsistency herein. Any slights of people, places, or organizations are unintentional.

Project Director: Monica Lopez.
Consulting Editors: George Burruss, Jose L. Galvan, Matthew Giblin, Deborah M. Oh, Jack Petit, and Richard Rasor.
Editorial assistance provided by Cheryl Alcorn, Randall R. Bruce, Karen M. Disner, Brenda Koplin, Erica Simmons, and Sharon Young.
Cover design by Robert Kibler and Larry Nichols.
Printed in the United States of America by Malloy, Inc.
Copyright © 2008, 2006, 2004, 2002, 1999 by Fred Pyrczak, Publisher. All rights reserved. No portion of this book may be reproduced or transmitted in any form or by any means without the prior written permission of the publisher.
ISBN 1-884585-79-5
Table of Contents

Introduction to the Fifth Edition
  What's New?
  Audience
  Organization
  SPSS Versions
  Availability of SPSS
  Conventions
  Screenshots
  Practice Exercises
  Acknowledgments

Chapter 1  Getting Started
  1.1 Starting SPSS
  1.2 Entering Data
  1.3 Defining Variables
  1.4 Loading and Saving Data Files
  1.5 Running Your First Analysis
  1.6 Examining and Printing Output Files
  1.7 Modifying Data Files

Chapter 2  Entering and Modifying Data
  2.1 Variables and Data Representation
  2.2 Transformation and Selection of Data

Chapter 3  Descriptive Statistics
  3.1 Frequency Distributions and Percentile Ranks for a Single Variable
  3.2 Frequency Distributions and Percentile Ranks for Multiple Variables
  3.3 Measures of Central Tendency and Measures of Dispersion for a Single Group
  3.4 Measures of Central Tendency and Measures of Dispersion for Multiple Groups
  3.5 Standard Scores

Chapter 4  Graphing Data
  4.1 Graphing Basics
  4.2 The New SPSS Chart Builder
  4.3 Bar Charts, Pie Charts, and Histograms
  4.4 Scatterplots
  4.5 Advanced Bar Charts
  4.6 Editing SPSS Graphs

Chapter 5  Prediction and Association
  5.1 Pearson Correlation Coefficient
  5.2 Spearman Correlation Coefficient
  5.3 Simple Linear Regression
  5.4 Multiple Linear Regression

Chapter 6  Parametric Inferential Statistics
  6.1 Review of Basic Hypothesis Testing
  6.2 Single-Sample t Test
  6.3 Independent-Samples t Test
  6.4 Paired-Samples t Test
  6.5 One-Way ANOVA
  6.6 Factorial ANOVA
  6.7 Repeated-Measures ANOVA
  6.8 Mixed-Design ANOVA
  6.9 Analysis of Covariance
  6.10 Multivariate Analysis of Variance (MANOVA)

Chapter 7  Nonparametric Inferential Statistics
  7.1 Chi-Square Goodness of Fit
  7.2 Chi-Square Test of Independence
  7.3 Mann-Whitney U Test
  7.4 Wilcoxon Test
  7.5 Kruskal-Wallis H Test
  7.6 Friedman Test

Chapter 8  Test Construction
  8.1 Item-Total Analysis
  8.2 Cronbach's Alpha
  8.3 Test-Retest Reliability
  8.4 Criterion-Related Validity

Appendix A  Effect Size
Appendix B  Practice Exercise Data Sets
  Practice Data Set 1
  Practice Data Set 2
  Practice Data Set 3
Appendix C  Glossary
Appendix D  Sample Data Files Used in Text
  COINS.sav
  GRADES.sav
  HEIGHT.sav
  QUESTIONS.sav
  RACE.sav
  SAMPLE.sav
  SAT.sav
  Other Files
Appendix E  Information for Users of Earlier Versions of SPSS
Appendix F  Graphing Data with SPSS 13.0 and 14.0
Chapter 1

Getting Started

Section 1.1 Starting SPSS
[Screenshot: the SPSS startup dialog box.]
Startup procedures for SPSS will differ slightly, depending on the exact configuration of the machine on which it is installed. On most computers, you can start SPSS by clicking on Start, then clicking on Programs, then on SPSS. On many installations, there will be an SPSS icon on the desktop that you can double-click to start the program. When SPSS is started, you may be presented with the dialog box to the left, depending on the options your system administrator selected for your version of the program. If you have the dialog box, click Type in data and OK, which will present a blank data window.¹ If you were not presented with the dialog box to the left, SPSS should open automatically with a blank data window. The data window and the output window provide the basic interface for SPSS. A blank data window is shown below.
Section 1.2 Entering Data

One of the keys to success with SPSS is knowing how it stores and uses your data. To illustrate the basics of data entry with SPSS, we will use Example 1.2.1.

Example 1.2.1
A survey was given to several students from four different classes (Tues/Thurs mornings, Tues/Thurs afternoons, Mon/Wed/Fri mornings, and Mon/Wed/Fri afternoons). The students were asked whether or not they were "morning people" and whether or not they worked. This survey also asked for their final grade in the class (100% being the highest grade possible). The response sheets from two students are presented below:

¹ Items that appear in the glossary are presented in bold. Italics are used to indicate menu items.
Response Sheet 1
ID: 4593
Day of class:                MWF [ ]   TTh [X]
Class time:                  Morning [ ]   Afternoon [X]
Are you a morning person?    Yes [ ]   No [X]
Final grade in class:        85%
Do you work outside school?  Full-time [ ]   Part-time [ ]   No [X]

Response Sheet 2
ID: 1901
Day of class:                MWF [X]   TTh [ ]
Class time:                  Morning [X]   Afternoon [ ]
Are you a morning person?    Yes [X]   No [ ]
Final grade in class:        83%
Do you work outside school?  Full-time [ ]   Part-time [X]   No [ ]
Our goal is to enter the data from the two students into SPSS for use in future analyses. The first step is to determine the variables that need to be entered. Any information that can vary among participants is a variable that needs to be considered. Example 1.2.2 lists the variables we will use.

Example 1.2.2
ID
Day of class
Class time
Morning person
Final grade
Whether or not the student works outside school

In the SPSS data window, columns represent variables and rows represent participants. Therefore, we will be creating a data file with six columns (variables) and two rows (students/participants).

Section 1.3 Defining Variables

Before we can enter any data, we must first enter some basic information about each variable into SPSS. For instance, variables must first be given names that:
o begin with a letter; and
o do not contain a space.
Thus, the variable name "Q7" is acceptable, while the variable name "7Q" is not. Similarly, the variable name "PRE_TEST" is acceptable, but the variable name "PRE TEST" is not. Capitalization does not matter, but variable names are capitalized in this text to make it clear when we are referring to a variable name, even if the variable name is not necessarily capitalized in screenshots.

To define a variable, click on the Variable View tab at the bottom of the main screen. This will show you the Variable View window. To return to the Data View window, click on the Data View tab.
From the Variable View screen, SPSS allows you to create and edit all of the variables in your data file. Each column represents some property of a variable, and each row represents a variable. All variables must be given a name. To do that, click on the first empty cell in the Name column and type a valid SPSS variable name. The program will then fill in default values for most of the other properties.

One useful function of SPSS is the ability to define variable and value labels. Variable labels allow you to associate a description with each variable. These descriptions can describe the variables themselves or the values of the variables.

Value labels allow you to associate a description with each value of a variable. For example, for most procedures, SPSS requires numerical values. Thus, for data such as the day of the class (i.e., Mon/Wed/Fri and Tues/Thurs), we need to first code the values as numbers. We can assign the number 1 to Mon/Wed/Fri and the number 2 to Tues/Thurs. To help us keep track of the numbers we have assigned to the values, we use value labels.

To assign value labels, click in the cell you want to assign values to in the Values column. This will bring up a small gray button (see arrow, below at left). Click on that button to bring up the Value Labels dialog box. When you enter a value label, you must click Add after each entry. This will move the value and its associated label into the bottom section of the window. When all labels have been added, click OK to return to the Variable View window.
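If you later work from an SPSS syntax window, the same labels can be assigned with the VALUE LABELS command. The lines below are only a sketch using the DAY coding from this chapter; the point-and-click steps above are the method used throughout this book.

  * Assign value labels to the DAY variable (1 = Mon/Wed/Fri, 2 = Tues/Thurs).
  VALUE LABELS day
    1 'Mon/Wed/Fri'
    2 'Tues/Thurs'.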
In addition to naming and labeling the variable, you have the option of defining the variable type. To do so, simply click on the Type, Width, or Decimals columns in the Variable View window. The default value is a numeric field that is eight digits wide with two decimal places displayed. If your data are more than eight digits to the left of the decimal place, they will be displayed in scientific notation (e.g., the number 2,000,000,000 will be displayed as 2.00E+09).² SPSS maintains accuracy beyond two decimal places, but all output will be rounded to two decimal places unless otherwise indicated in the Decimals column. In our example, we will be using numeric variables with all of the default values.

Practice Exercise

Create a data file for the six variables and two sample students presented in Example 1.2.1. Name your variables: ID, DAY, TIME, MORNING, GRADE, and WORK. You should code DAY as 1 = Mon/Wed/Fri, 2 = Tues/Thurs. Code TIME as 1 = morning, 2 = afternoon. Code MORNING as 0 = No, 1 = Yes. Code WORK as 0 = No, 1 = Part-Time, 2 = Full-Time. Be sure you enter value labels for the different variables. Note that because value labels are not appropriate for ID and GRADE, these are not coded. When done, your Variable View window should look like the screenshot below:
Click on the Data View tab to open the data-entry screen. Enter data horizontally, beginning with the first student's ID number. Enter the code for each variable in the appropriate column; to enter the GRADE variable value, enter the student's class grade.
² Depending upon your version of SPSS, it may be displayed as 2.0E+009.
The previous data window can be changed to look instead like the screenshot below by clicking on the Value Labels icon (see arrow). In this case, the cells display value labels rather than the corresponding codes. If data is entered in this mode, it is not necessary to enter codes, as clicking the button that appears in each cell as the cell is selected will present a drop-down list of the predefined labels. You may use either method, according to your preference.
Instead of clicking the Value Labels icon, you may optionally toggle between views by clicking Value Labels under the View menu.
Section 1.4 Loading and Saving Data Files

Once you have entered your data, you will need to save it with a unique name for later use so that you can retrieve it when necessary. Loading and saving SPSS data files works in the same way as most Windows-based software. Under the File menu, there are Open, Save, and Save As commands. SPSS data files have a ".sav" extension, which is added by default to the end of the filename. This tells Windows that the file is an SPSS data file.
Save Your Data

When you save your data file (by clicking File, then clicking Save or Save As to specify a unique name), pay special attention to where you save it. Most systems default to a location on the computer's hard drive. You will probably want to save your data on a floppy disk, CD-R, or removable USB drive so that you can take the file with you.
Load Your Data

When you load your data (by clicking File, then clicking Open, then Data, or by clicking the open file folder icon), you get a similar window. This window lists all files with the ".sav" extension. If you have trouble locating your saved file, make sure you are looking in the right directory.
Practice Exercise

To be sure that you have mastered saving and opening data files, name your sample data file "SAMPLE" and save it to a removable storage medium. Once it is saved, SPSS will display the name of the file at the top of the data window. It is wise to save your work frequently, in case of computer crashes. Note that filenames may be upper- or lowercase. In this text, uppercase is used for clarity.

After you have saved your data, exit SPSS (by clicking File, then Exit). Restart SPSS and load your data by selecting the "SAMPLE.sav" file you just created.
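The same saving and loading operations can also be performed from a syntax window with the SAVE and GET commands. This is only a sketch; the drive letter shown is hypothetical and should be replaced with the location of your own removable storage medium.

  * Save the active data file (hypothetical path).
  SAVE OUTFILE='E:\SAMPLE.sav'.

  * Load a previously saved data file.
  GET FILE='E:\SAMPLE.sav'.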
Section 1.5 Running Your First Analysis

Any time you open a data window, you can run any of the analyses available. To get started, we will calculate the students' average grade. (With only two students, you can easily check your answer by hand, but imagine a data file with 10,000 student records.) The majority of the available statistical tests are under the Analyze menu. This menu displays all the options available for your version of the SPSS program (the menus in this book were created with SPSS Student Version 15.0). Other versions may have slightly different sets of options.
To calculate a mean (average), we are asking the computer to summarize our data set. Therefore, we run the command by clicking Analyze, then Descriptive Statistics, then Descriptives. This brings up the Descriptives dialog box. Note that the left side of the box contains a list of all the variables in our data file. On the right is an area labeled Variable(s), where we can specify the variables we would like to use in this particular analysis.
We want to compute the mean for the variable called GRADE. Thus, we need to select the variable name in the left window (by clicking on it). To transfer it to the right window, click on the right arrow between the two windows. The arrow always points to the window opposite the highlighted item and can be used to transfer selected variables in either direction. Note that double-clicking on the variable name will also transfer the variable to the opposite window. Standard Windows conventions of "Shift" clicking or "Ctrl" clicking to select multiple variables can be used as well.

When we click on the OK button, the analysis will be conducted, and we will be ready to examine our output.
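If you are curious about what SPSS does behind the scenes, most dialog boxes also have a Paste button that writes the equivalent syntax to a syntax window instead of running the analysis. For this example, the syntax would look roughly like the sketch below; the exact subcommands may vary slightly by version.

  * Descriptive statistics for GRADE (default statistics).
  DESCRIPTIVES VARIABLES=grade
    /STATISTICS=MEAN STDDEV MIN MAX.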
Section 1.6 Examining and Printing Output Files

After an analysis is performed, the output is placed in the output window, and the output window becomes the active window. If this is the first analysis you have conducted since starting SPSS, then a new output window will be created. If you have run previous analyses and saved them, your output is added to the end of your previous output.

To switch back and forth between the data window and the output window, select the desired window from the Window menu bar (see arrow, below).

The output window is split into two sections. The left section is an outline of the output (SPSS refers to this as the "outline view"). The right section is the output itself.
[Output screenshot: the Descriptives results for GRADE, with N = 2, Minimum = 83.00, Maximum = 85.00, Mean = 84.0000, Std. Deviation = 1.41421.]
The section on the left of the output window provides an outline of the entire output window. All of the analyses are listed in the order in which they were conducted. Note that this outline can be used to quickly locate a section of the output. Simply click on the section you would like to see, and the right window will jump to the appropriate place.
Clicking on a statistical procedure also selects all of the output for that command. By pressing the Delete key, that output can be deleted from the output window. This is a quick way to be sure that the output window contains only the desired output. Output can also be selected and pasted into a word processor by clicking Edit, then Copy Objects to copy the output. You can then switch to your word processor and click Edit, then Paste.

To print your output, simply click File, then Print, or click on the printer icon on the toolbar. You will have the option of printing all of your output or just the currently selected section. Be careful when printing! Each time you run a command, the output is added to the end of your previous output. Thus, you could be printing a very large output file containing information you may not want or need.

One way to ensure that your output window contains only the results of the current command is to create a new output window just before running the command. To do this, click File, then New, then Output. All your subsequent commands will go into your new output window.

Practice Exercise

Load the sample data file you created earlier (SAMPLE.sav). Run the Descriptives command for the variable GRADE and print the output. Your output should look like the example shown earlier in this section. Next, select the data window and print it.
Section 1.7 Modifying Data Files

Once you have created a data file, it is really quite simple to add additional cases (rows/participants) or additional variables (columns). Consider Example 1.7.1.
Example 1.7.1
Two more students provide you with surveys. Their information is:

Response Sheet 3
ID: 8734
Day of class:                MWF [ ]   TTh [X]
Class time:                  Morning [X]   Afternoon [ ]
Are you a morning person?    Yes [ ]   No [X]
Final grade in class:        80%
Do you work outside school?  Full-time [ ]   Part-time [ ]   No [X]

Response Sheet 4
ID: 1909
Day of class:                MWF [X]   TTh [ ]
Class time:                  Morning [X]   Afternoon [ ]
Are you a morning person?    Yes [X]   No [ ]
Final grade in class:        73%
Do you work outside school?  Full-time [ ]   Part-time [X]   No [ ]
To add these data, simply place two additional rows in the Data View window (after loading your sample data). Notice that as new participants are added, the row numbers become bold. When done, the screen should look like the screenshot here.
ID        DAY           TIME        MORNING   GRADE   WORK
4593.00   Tue/Thu       afternoon   No        85.00   No
1901.00   Mon/Wed/Fri   morning     Yes       83.00   Part-Time
8734.00   Tue/Thu       morning     No        80.00   No
1909.00   Mon/Wed/Fri   morning     Yes       73.00   Part-Time
New variables can also be added. For example, if the first two participants were given special training on time management, and the two new participants were not, the data file can be changed to reflect this additional information. The new variable could be called TRAINING (whether or not the participant received training), and it would be coded so that 0 = No and 1 = Yes. Thus, the first two participants would be assigned a "1" and the last two participants a "0." To do this, switch to the Variable View window, then add the TRAINING variable to the bottom of the list. Then switch back to the Data View window to update the data.
ID        DAY           TIME        MORNING   GRADE   WORK        TRAINING
4593.00   Tue/Thu       afternoon   No        85.00   No          Yes
1901.00   Mon/Wed/Fri   morning     Yes       83.00   Part-Time   Yes
8734.00   Tue/Thu       morning     No        80.00   No          No
1909.00   Mon/Wed/Fri   morning     Yes       73.00   Part-Time   No
Adding data and adding variables are just logical extensions of the procedures we used to originally create the data file. Save this new data file. We will be using it again later in the book.
Practice Exercise

Follow the example above (where TRAINING is the new variable). Make the modifications to your SAMPLE.sav data file and save it.
Chapter 2

Entering and Modifying Data

In Chapter 1, we learned how to create a simple data file, save it, perform a basic analysis, and examine the output. In this section, we will go into more detail about variables and data.
Section 2.1 Variables and Data Representation

In SPSS, variables are represented as columns in the data file. Participants are represented as rows. Thus, if we collect 4 pieces of information from 100 participants, we will have a data file with 4 columns and 100 rows.

Measurement Scales

There are four types of measurement scales: nominal, ordinal, interval, and ratio. While the measurement scale will determine which statistical technique is appropriate for a given set of data, SPSS generally does not discriminate. Thus, we start this section with this warning: If you ask it to, SPSS may conduct an analysis that is not appropriate for your data. For a more complete description of these four measurement scales, consult your statistics text or the glossary in Appendix C.

Newer versions of SPSS allow you to indicate which types of data you have when you define your variable. You do this using the Measure column. You can indicate Nominal, Ordinal, or Scale (SPSS does not distinguish between interval and ratio scales).

Look at the sample data file we created in Chapter 1. We calculated a mean for the variable GRADE. GRADE was measured on a ratio scale, and the mean is an acceptable summary statistic (assuming that the distribution is normal).

We could have had SPSS calculate a mean for the variable TIME instead of GRADE. If we did, we would get the output presented here. The output indicates that the average TIME was 1.25. Remember that TIME was coded as an ordinal variable (1 = morning class, 2 = afternoon class). Thus, the mean is not an appropriate statistic for an ordinal scale, but SPSS calculated it anyway. The importance of considering the type of data cannot be overemphasized. Just because SPSS will compute a statistic for you does not mean that you should
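For reference, measurement levels can also be declared in syntax with the VARIABLE LEVEL command. The sketch below uses the sample file's variables and the coding described in this section; it is an illustration, not a required step.

  * Declare measurement levels for the sample data file.
  VARIABLE LEVEL grade (SCALE).
  VARIABLE LEVEL time (ORDINAL).
  VARIABLE LEVEL day morning work (NOMINAL).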
use it. Later in the text, when specific statistical procedures are discussed, the conditions under which they are appropriate will be addressed.

Missing Data

Often, participants do not provide complete data. For some students, you may have a pretest score but not a posttest score. Perhaps one student left one question blank on a survey, or perhaps she did not state her age. Missing data can weaken any analysis. Often, a single missing question can eliminate a subject from all analyses.

If you have missing data in your data set, leave that cell blank. In the example below, the fourth subject did not complete Question 2. Note that the total score (which is calculated from both questions) is also blank because of the missing data for Question 2. SPSS represents missing data in the data window with a period (although you should not enter a period; just leave it blank).

Q1     Q2     TOTAL
2.00   2.00   4.00
3.00   1.00   4.00
4.00   3.00   7.00
2.00   .      .
1.00   2.00   3.00
Section 2.2 Transformation and Selection of Data

We often have more data in a data file than we want to include in a specific analysis. For example, our sample data file contains data from four participants, two of whom received special training and two of whom did not. If we wanted to conduct an analysis using only the two participants who did not receive the training, we would need to specify the appropriate subset.

Selecting a Subset
We can use the Select Cases command to specify a subset of our data. The Select Cases command is located under the Data menu. When you select this command, the dialog box below will appear.
You can specify which cases (participants) you want to select by using the selection criteria, which appear on the right side of the Select Cases dialog box.
By default, All cases will be selected. The most common way to select a subset is to click If condition is satisfied, then click on the button labeled If. This will bring up a new dialog box that allows you to indicate which cases you would like to use.

You can enter the logic used to select the subset in the upper section. If the logical statement is true for a given case, then that case will be selected. If the logical statement is false, that case will not be selected. For example, you can select all cases that were coded as Mon/Wed/Fri by entering the formula DAY = 1 in the upper-right part of the window. If DAY is 1, then the statement will be true, and SPSS will select the case. If DAY is anything other than 1, the statement will be false, and the case will not be selected. Once you have entered the logical statement, click Continue to return to the Select Cases dialog box. Then, click OK to return to the data window.

After you have selected the cases, the data window will change slightly. The cases that were not selected will be marked with a diagonal line through the case number. For example, for our sample data, the first and third cases are not selected. Only the second and fourth cases are selected for this subset.
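Clicking Paste instead of OK in the Select Cases dialog box shows the syntax SPSS uses to carry out this filtering. For the DAY = 1 example, it looks roughly like the sketch below; the labels SPSS writes may differ slightly by version.

  USE ALL.
  COMPUTE filter_$=(day = 1).
  VARIABLE LABELS filter_$ 'day = 1 (FILTER)'.
  VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
  FORMATS filter_$ (f1.0).
  FILTER BY filter_$.
  EXECUTE.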
An additional variable will also be created in your data file. The new variable is called FILTER_$ and indicates whether a case was selected or not. If we calculate a mean GRADE using the subset we just selected, we will receive the output at right (N = 2, Minimum = 73.00, Maximum = 83.00, Mean = 78.0000, Std. Deviation = 7.07107). Notice that we now have a mean of 78.00 with a sample size (N) of 2 instead of 4.
Be careful when you select subsets. The subset remains in effect until you run the command again and select all cases. You can tell if you have a subset selected because the bottom of the data window will indicate that a filter is on. In addition, when you examine your output, N will be less than the total number of records in your data set if a subset is selected. The diagonal lines through some cases will also be evident when a subset is selected. Be careful not to save your data file with a subset selected, as this can cause considerable confusion later.

Computing a New Variable

SPSS can also be used to compute a new variable or manipulate your existing variables. To illustrate this, we will create a new data file. This file will contain data for four participants and three variables (Q1, Q2, and Q3). The variables represent the number of points each participant received on three different questions. Now enter the data shown on the screen to the right. When done, save this data file as "QUESTIONS.sav." We will be using it again in later chapters.
Rersdeinto5ameVariable*,,, intoDffferant Varlables. Racodo ,, Ar*omSicRarode,,. Vlsual 8inrfrg,..
Now you will calculate the total score for each subject. We could do this manually, but if the data file were large, or if there were a lot of questions, this would take a long time. It is more efficient (and more accurate) to have SPSS compute the totals for you. To do this, click Transform and then click Compute Variable.

After clicking the Compute Variable command, we get the dialog box at right. The blank field marked Target Variable is where we enter the name of the new variable we want to create. In this example, we are creating a variable called TOTAL, so type the word "total." Notice that there is an equals sign between the Target Variable blank and the Numeric Expression blank. These two blank areas are the
two sides of an equation that SPSS will calculate. For example, total = q1 + q2 + q3 is the equation that is entered in the sample presented here (screenshot at left). Note that it is possible to create any equation here simply by using the number and operational keypad at the bottom of the dialog box. When we click OK, SPSS will create a new variable called TOTAL and make it equal to the sum of the three questions. Save your data file again so that the new variable will be available for future sessions.
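The equivalent syntax for this computation (approximately what the Paste button produces) is sketched below.

  * Create TOTAL as the sum of the three question scores.
  COMPUTE total = q1 + q2 + q3.
  EXECUTE.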
[Screenshot: the QUESTIONS.sav data window after the computation, showing Q1, Q2, Q3, and the new TOTAL column; each TOTAL is the sum of that participant's three question scores.]
Recoding a Variable - Different Variable
SPSS can create a new variable based upon data from another variable. Say we want to split our participants on the basis of their total score. We want to create a variable called GROUP, which is coded 1 if the total score is low (less than or equal to 8) or 2 if the total score is high (9 or larger). To do this, we click Transform, then Recode into Different Variables.
This will bring up the Recode into Different Variables dialog box shown here. Transfer the variable TOTAL to the middle blank. Type "group" in the Name field under Output Variable. Click Change, and the middle blank will show that TOTAL is becoming GROUP, as shown below.
To help keep track of variables that have been recoded, it's a good idea to open the Variable View and enter "Recoded" in the Label column in the TOTAL row. This is especially useful with large datasets, which may include many recoded variables.

Click Old and New Values. This will bring up the Recode dialog box. In this example, we have entered a 9 in the Range, value through HIGHEST field and a 2 in the Value field under New Value. When we click Add, the blank on the right displays the recoding formula. Now enter an 8 on the left in the Range, LOWEST through value blank and a 1 in the Value field under New Value. Click Add, then Continue. Click OK. You will be redirected to the data window. A new variable (GROUP) will have been added and coded as 1 or 2, based on TOTAL.
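In syntax form, the recode described above would look roughly like the sketch below; the VARIABLE LABELS line corresponds to the optional labeling step.

  * Recode TOTAL into GROUP: 1 = low (8 or below), 2 = high (9 or above).
  RECODE total (LOWEST THRU 8=1) (9 THRU HIGHEST=2) INTO group.
  VARIABLE LABELS total 'Recoded'.
  EXECUTE.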
Chapter 3

Descriptive Statistics

In Chapter 2, we discussed many of the options available in SPSS for dealing with our data. Now we will discuss ways to summarize our data. The procedures used to describe and summarize data are called descriptive statistics.

Section 3.1 Frequency Distributions and Percentile Ranks for a Single Variable

Description

The Frequencies command produces frequency distributions for the specified variables. The output includes the number of occurrences, percentages, valid percentages, and cumulative percentages. The valid percentages and the cumulative percentages comprise only the data that are not designated as missing.

The Frequencies command is useful for describing samples where the mean is not useful (e.g., nominal or ordinal scales). It is also useful as a method of getting the feel of your data. It provides more information than just a mean and standard deviation and can be useful in determining skew and identifying outliers. A special feature of the command is its ability to determine percentile ranks.

Assumptions

Cumulative percentages and percentiles are valid only for data that are measured on at least an ordinal scale. Because the output contains one line for each value of a variable, this command works best on variables with a relatively small number of values.

Drawing Conclusions

The Frequencies command produces output that indicates both the number of cases in the sample of a particular value and the percentage of cases with that value. Thus, conclusions drawn should relate only to describing the numbers or percentages of cases in the sample. If the data are at least ordinal in nature, conclusions regarding the cumulative percentage and/or percentiles can be drawn.

SPSS Data Format

The SPSS data file for obtaining frequency distributions requires only one variable, and that variable can be of any type.
Creating a Frequency Distribution

To run the Frequencies command, click Analyze, then Descriptive Statistics, then Frequencies. (This example uses the CARS.sav data file that comes with SPSS. It is typically located in the directory where SPSS is installed.) This will bring up the main dialog box. Transfer the variable for which you would like a frequency distribution into the Variable(s) blank to the right. Be sure that the Display frequency tables option is checked. Click OK to receive your output. Note that the dialog boxes in newer versions of SPSS show both the type of variable (the icon immediately left of the variable name) and the variable labels if they are entered. Thus, the variable YEAR shows up in the dialog box as Model Year (modulo 100).

Output for a Frequency Distribution

The output consists of two sections. The first section indicates the number of records with valid data for each variable selected. Records with a blank score are listed as missing. In this example, the data file contained 406 records. Notice that the variable label is Model Year (modulo 100).

The second section of the output contains a cumulative frequency distribution for each variable selected. At the top of the section, the variable label is given. The output itself consists of five columns. The first column lists the values of the variable in sorted order. There is a row for each value of your variable, and additional rows are added at the bottom for the Total and Missing data. The second column gives the frequency of each value, including missing values. The third column gives the percentage of all records (including records with missing data) for each value. The fourth column, labeled Valid Percent, gives the percentage of records (without including records with missing data) for each value. If there were any missing values, these values would be larger than the values in column three because the total number of records would have been reduced by the number of records with missing values. The final column gives cumulative percentages. Cumulative percentages indicate the percentage of records with a score equal to or smaller than the current value. Thus, the last value is always 100%. These values are equivalent to percentile ranks for the values listed.

[Output: the frequency table for Model Year (modulo 100), with one row per model year from 70 to 82 and columns for Frequency, Percent, Valid Percent, and Cumulative Percent; 405 valid cases, 1 missing, 406 total.]
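A minimal syntax equivalent for this frequency distribution is sketched below; it assumes the model-year variable in CARS.sav is named YEAR, as shown in the dialog box.

  * Frequency distribution for model year.
  FREQUENCIES VARIABLES=year
    /ORDER=ANALYSIS.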
Determining Percentile Ranks

The Frequencies command can be used to provide a number of descriptive statistics, as well as a variety of percentile values (including quartiles, cut points, and scores corresponding to a specific percentile rank). To obtain either the descriptive or percentile functions of the Frequencies command, click the Statistics button at the bottom of the main dialog box. Note that the Central Tendency and Dispersion sections of this box are useful for calculating values, such as the Median or Mode, which cannot be calculated with the Descriptives command (see Section 3.3).
This brings up the Frequencies: Statistics dialog box. Check any additional desired statistic by clicking on the blank next to it. For percentiles, enter the desired percentile rank in the blank to the right of the Percentile(s) label. Then, click Add to add it to the list of percentiles requested. Once you have selected all your required statistics, click Continue to return to the main dialog box. Click OK.
Output for Percentile Ranks
The Statistics dialog box adds on to the previous output from the Frequencies command. The new section of the output is shown below. The output contains a row for each piece of information you requested. In the example above, we checked Quartiles and asked for the 80th percentile. Thus, the output contains rows for the 25th, 50th, 75th, and 80th percentiles.

Statistics: Model Year (modulo 100)
N            Valid     405
             Missing     1
Percentiles  25        73.00
             50        76.00
             75        79.00
             80        80.00
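The corresponding syntax sketch for requesting quartiles and the 80th percentile is shown below (again assuming the variable is named YEAR).

  * Quartiles plus the 80th percentile for model year.
  FREQUENCIES VARIABLES=year
    /NTILES=4
    /PERCENTILES=80
    /ORDER=ANALYSIS.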
Practice Exercise

Using Practice Data Set 1 in Appendix B, create a frequency distribution table for the mathematics skills scores. Determine the mathematics skills score at which the 60th percentile lies.

Section 3.2 Frequency Distributions and Percentile Ranks for Multiple Variables

Description

The Crosstabs command produces frequency distributions for multiple variables. The output includes the number of occurrences of each combination of levels of each variable. It is possible to have the command give percentages for any or all variables.

The Crosstabs command is useful for describing samples where the mean is not useful (e.g., nominal or ordinal scales). It is also useful as a method for getting a feel for your data.

Assumptions

Because the output contains a row or column for each value of a variable, this command works best on variables with a relatively small number of values.

SPSS Data Format

The SPSS data file for the Crosstabs command requires two or more variables. Those variables can be of any type.

Running the Crosstabs Command
This example uses the SAMPLE.sav data file, which you created in Chapter 1. To run the procedure, click Analyze, then Descriptive Statistics, then Crosstabs. This will bring up the main Crosstabs dialog box, below.
The dialog box initially lists all variables on the left and contains two blanks labeled Row(s) and Column(s). Enter one variable (TRAINING) in the Row(s) box. Enter the second (WORK) in the Column(s) box. To analyze more than two variables, you would enter the third, fourth, etc., in the unlabeled area (just under the Layer indicator).
The Cells button allows you to specify percentages and other information to be generated for each combination of values. Click Cells, and you will get the box at right. For the example presented here, check Row, Column, and Total percentages. Then click Continue. This will return you to the Crosstabs dialog box. Click OK to run the analysis.
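The pasted syntax for this crosstabulation, using the variable names from the sample data file, would look roughly like the sketch below.

  * TRAINING by WORK contingency table with row, column, and total percentages.
  CROSSTABS
    /TABLES=training BY work
    /CELLS=COUNT ROW COLUMN TOTAL.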
[Output: the TRAINING * WORK crosstabulation. Each TRAINING row (Yes and No) contains one participant who does not work and one who works part-time; each cell also reports % within TRAINING, % within WORK, and % of Total (25.0% per cell), with row and column totals of 2 and a grand total of 4.]
Interpreting Crosstabs Output

The output consists of a contingency table. Each level of WORK is given a column. Each level of TRAINING is given a row. In addition, a row is added for total, and a column is added for total.

Each cell contains the number of participants (e.g., one participant received no training and does not work; two participants received no training, regardless of employment status). The percentages for each cell are also shown. Row percentages add up to 100% horizontally. Column percentages add up to 100% vertically. For example, of all the individuals who had no training, 50% did not work and 50% worked part-time (using the "% within TRAINING" row). Of the individuals who did not work, 50% had no training and 50% had training (using the "% within WORK" column).

Practice Exercise

Using Practice Data Set 1 in Appendix B, create a contingency table using the Crosstabs command. Determine the number of participants in each combination of the variables SEX and MARITAL. What percentage of participants is married? What percentage of participants is male and married?

Section 3.3 Measures of Central Tendency and Measures of Dispersion for a Single Group

Description

Measures of central tendency are values that represent a typical member of the sample or population. The three primary types are the mean, median, and mode. Measures of dispersion tell you the variability of your scores. The primary types are the range and the standard deviation. Together, a measure of central tendency and a measure of dispersion provide a great deal of information about the entire data set.
We will discuss these measures of central tendency and measures of dispersion in the context of the Descriptives command. Note that many of these statistics can also be calculated with several other commands (e.g., the Frequencies or Compare Means commands are required to compute the mode or median; the Statistics option for the Frequencies command is shown here).
Assumptions

Each measure of central tendency and measure of dispersion has different assumptions associated with it. The mean is the most powerful measure of central tendency, and it has the most assumptions. For example, to calculate a mean, the data must be measured on an interval or ratio scale. In addition, the distribution should be normally distributed or, at least, not highly skewed. The median requires at least ordinal data. Because the median indicates only the middle score (when scores are arranged in order), there are no assumptions about the shape of the distribution. The mode is the weakest measure of central tendency. There are no assumptions for the mode.

The standard deviation is the most powerful measure of dispersion, but it, too, has several requirements. It is a mathematical transformation of the variance (the standard deviation is the square root of the variance). Thus, if one is appropriate, the other is also. The standard deviation requires data measured on an interval or ratio scale. In addition, the distribution should be normal. The range is the weakest measure of dispersion. To calculate a range, the variable must be at least ordinal. For nominal scale data, the entire frequency distribution should be presented as a measure of dispersion.

Drawing Conclusions

A measure of central tendency should be accompanied by a measure of dispersion. Thus, when reporting a mean, you should also report a standard deviation. When presenting a median, you should also state the range or interquartile range.

SPSS Data Format

Only one variable is required.
Running the Command

The Descriptives command will be the command you will most likely use for obtaining measures of central tendency and measures of dispersion. This example uses the SAMPLE.sav data file we have used in the previous chapters.
To run the command, click Analyze, then Descriptive Statistics, then Descriptives. This will bring up the main dialog box for the Descriptives command. Any variables you would like information about can be placed in the right blank by double-clicking them or by selecting them, then clicking on the arrow.
By default, you will receive the N (number of cases/participants), the minimum value, the maximum value, the mean, and the standard deviation. Note that some of these may not be appropriate for the type of data you have selected. If you would like to change the default statistics that are given, click Options in the main dialog box. You will be given the Options dialog box presented here.
Reading the Output

The output for the Descriptives command is quite straightforward. Each type of output requested is presented in a column, and each variable is given in a row. The output presented here is for the sample data file. It shows that we have one variable (GRADE) and that we obtained the N, minimum, maximum, mean, and standard deviation for this variable.

Descriptive Statistics
                     N   Minimum   Maximum   Mean      Std. Deviation
GRADE                4   73.00     85.00     80.2500   5.25198
Valid N (listwise)   4
Practice Exercise

Using Practice Data Set 1 in Appendix B, obtain the descriptive statistics for the age of the participants. What is the mean? The median? The mode? What is the standard deviation? Minimum? Maximum? The range?

Section 3.4 Measures of Central Tendency and Measures of Dispersion for Multiple Groups

Description

The measures of central tendency discussed earlier are often needed not only for the entire data set, but also for several subsets. One way to obtain these values for subsets would be to use the data-selection techniques discussed in Chapter 2 and apply the Descriptives command to each subset. An easier way to perform this task is to use the Means command. The Means command is designed to provide descriptive statistics for subsets of your data.

Assumptions

The assumptions discussed in the section on Measures of Central Tendency and Measures of Dispersion for a Single Group (Section 3.3) also apply to multiple groups.

Drawing Conclusions

A measure of central tendency should be accompanied by a measure of dispersion. Thus, when giving a mean, you should also report a standard deviation. When presenting a median, you should also state the range or interquartile range.

SPSS Data Format

Two variables in the SPSS data file are required. One represents the dependent variable and will be the variable for which you receive the descriptive statistics. The other is the independent variable and will be used in creating the subsets. Note that while SPSS calls this variable an independent variable, it may not meet the strict criteria that define a true independent variable (e.g., treatment manipulation). Thus, some SPSS procedures refer to it as the grouping variable.
Runningthe Command This example uses the SAMPLE.sav data file you created in Chapterl. The Means commandis run by clicking Analyze, then Compare Means, thenMeans. This will bring up the main dialog box for the Means command. Place the selectedvariablein the blank field labeled DependentList.
Ona-Sarnple f feft. Independent-Samdes T Te T Test,,, Falred-SarnplEs Ons-Way*|iJOVA,,,
Place the grouping variable in the box labeled Independent List. In this example, through use of the SAMPLE.sav data file, measures of central tendency and measures of dispersion for the variable GRADE will be given for each level of the variable MORNING.
I
ryl
List Dependant
:I tu
€ arv
r T ril
,du**
ll".i I
/wqrk €tr"ining
lLayarl al1*-
I :'r:rrt| I i
lr-, r
l*i.rl
Heset I Cancel I
Ii ..!'l?It.
l"rpI
Independent Li$: r:ffi
tffi, I
By default, the mean, number of cases, and standard deviation are given. If you would like additional measures, click Options and you will be presented with the dialog box at right. You can opt to include any number of measures.

Reading the Output

The output for the Means command is split into two sections. The first section, called a case processing summary, gives information about the data used. In our sample data file, there are four students (cases), all of whom were included in the analysis.

Case Processing Summary
                  Included       Excluded       Total
                  N   Percent    N   Percent    N   Percent
grade * morning   4   100.0%     0   .0%        4   100.0%
I x,r I
c""d I
GaseProcessingSummary Cases Excluded N Percent
lncluded N
grade - morning
4
Percent 100.0%
0
25
.OYo
Total N
4 |
Percent 100.0%
The second section of the output is the report from the Means command.

Report (GRADE)
MORNING   Mean      N   Std. Deviation
No        82.5000   2   3.53553
Yes       78.0000   2   7.07107
Total     80.2500   4   5.25198

This report lists the name of the dependent variable at the top (GRADE). Every level of the independent variable (MORNING) is shown in a row in the table. In this example, the levels are 0 and 1, labeled No and Yes. Note that if a variable is labeled, the labels will be used instead of the raw values.

The summary statistics given in the report correspond to the data, where the level of the independent variable is equal to the row heading (e.g., No, Yes). Thus, two participants were included in each row. An additional row is added, named Total. That row contains the combined data, and the values are the same as they would be if we had run the Descriptives command for the variable GRADE.

Extension to More Than One Independent Variable

If you have more than one independent variable, SPSS can break down the output even further. Rather than adding more variables to the Independent List section of the dialog box, you need to add them in a different layer. Note that SPSS indicates with which layer you are working.
If you click Next, you will be presented with Layer 2 of 2, and you can select a second independent variable (e.g., TRAINING). Now, when you run the command (by clicking OK), you will be given summary statistics for the variable GRADE by each level of MORNING and TRAINING. Your output will look like the output at right. You now have two main sections (No and Yes), along with the Total. Now, however, each main section is broken down into subsections (No, Yes, and Total). The variable you used in Level 1 (MORNING) is the first one listed, and it defines the main sections. The variable you had in Level 2 (TRAINING) is listed second.
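In syntax form, the two-layer breakdown described above can be requested with the MEANS command; the sketch below uses the sample file's variable names and the default cell statistics.

  * GRADE broken down by MORNING and, within it, TRAINING.
  MEANS TABLES=grade BY morning BY training
    /CELLS=MEAN COUNT STDDEV.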
Report (GRADE)
MORNING   TRAINING   Mean      N   Std. Deviation
No        Yes        85.0000   1
          No         80.0000   1
          Total      82.5000   2   3.53553
Yes       Yes        83.0000   1
          No         73.0000   1
          Total      78.0000   2   7.07107
Total     Yes        84.0000   2   1.41421
          No         76.5000   2   4.94975
          Total      80.2500   4   5.25198
4
7.07107 1. 41421 4.54575 5. ?5198
Thus, the first row represents those participants who were not morning people and who received training. The second row represents participants who were not morning people and did not receive training. The third row represents the total for all participants who were not morning people.

Notice that standard deviations are not given for all of the rows. This is because there is only one participant per cell in this example. One problem with using many subsets is that it increases the number of participants required to obtain meaningful results. See a research design text or your instructor for more details.

Practice Exercise

Using Practice Data Set 1 in Appendix B, compute the mean and standard deviation of ages for each value of marital status. What is the average age of the married participants? The single participants? The divorced participants?

Section 3.5 Standard Scores

Description

Standard scores allow the comparison of different scales by transforming the scores into a common scale. The most common standard score is the z-score. A z-score is based on a standard normal distribution (e.g., a mean of 0 and a standard deviation of 1). A z-score, therefore, represents the number of standard deviations above or below the mean (e.g., a z-score of -1.5 represents a score 1 1/2 standard deviations below the mean).

Assumptions

Z-scores are based on the standard normal distribution. Therefore, the distributions that are converted to z-scores should be normally distributed, and the scales should be either interval or ratio.

Drawing Conclusions

Conclusions based on z-scores consist of the number of standard deviations above or below the mean. For example, a student scores 85 on a mathematics exam in a class that has a mean of 70 and a standard deviation of 5. The student's test score is 15 points above the class mean (85 - 70 = 15). The student's z-score is 3 because she scored 3 standard deviations above the mean (15 / 5 = 3). If the same student scores 90 on a reading exam, with a class mean of 80 and a standard deviation of 10, the z-score will be 1.0 because she is one standard deviation above the mean. Thus, even though her raw score was higher on the reading test, she actually did better in relation to other students on the mathematics test because her z-score was higher on that test.

SPSS Data Format

Calculating z-scores requires only a single variable in SPSS. That variable must be numerical.
Running the Command

Computing z-scores is a component of the Descriptives command. To access it, click Analyze, then Descriptive Statistics, then Descriptives. This example uses the sample data file (SAMPLE.sav) created in Chapters 1 and 2.
This will bring up the standard dialog box for the Descriptives command. Notice the checkbox in the bottom-left corner labeled Save standardized values as variables. Check this box and move the variable GRADE into the right-hand blank. Then click OK to complete the analysis. You will be presented with the standard output from the Descriptives command. Notice that the z-scores are not listed. They were inserted into the data window as a new variable. Switch to the Data View window and examine your data file. Notice that a new variable, called ZGRADE, has been added. When you asked SPSS to save standardized values, it created a new variable with the same name as your old variable preceded by a Z. The z-score is computed for each case and placed in the new variable.
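The syntax equivalent of checking Save standardized values as variables is the /SAVE subcommand of the DESCRIPTIVES command, sketched below; it adds the new ZGRADE variable to the data file just as described above.

  * Compute descriptives for GRADE and save z-scores as ZGRADE.
  DESCRIPTIVES VARIABLES=grade
    /SAVE
    /STATISTICS=MEAN STDDEV MIN MAX.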
Reading the Output
After you conducted your analysis, the new variable was created. You can perform any number of subsequent analyses on the new variable.

Practice Exercise
Using Practice Data Set 2 in Appendix B, determine the z-score that corresponds to each employee's salary. Determine the mean z-scores for salaries of male employees and female employees. Determine the mean z-score for salaries of the total sample.
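The z-score calculation itself is simple enough to verify by hand or outside SPSS. The following is a minimal Python sketch (not part of SPSS; the four GRADE values are inferred from the subgroup table in Section 3.4 and should be treated as an assumption) showing the same computation SPSS performs when it saves standardized values.

# Minimal sketch (not SPSS): computing z-scores the way the
# Descriptives command does when "Save standardized values as
# variables" is checked. The GRADE values below are assumed from
# the subgroup means shown in Section 3.4.
import numpy as np
from scipy import stats

grades = np.array([85, 80, 83, 73], dtype=float)

# SPSS standardizes with the sample standard deviation (n - 1),
# so ddof=1 is needed to reproduce the ZGRADE column.
zgrade = stats.zscore(grades, ddof=1)

print(zgrade)               # z-score for each case
print(zgrade.mean())        # approximately 0 by construction
print(zgrade.std(ddof=1))   # approximately 1 by construction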
Chapter 4

Graphing Data

Section 4.1 Graphing Basics
In addition to the frequency distributions, the measures of central tendency and measures of dispersion discussed in Chapter 3, graphing is a useful way to summarize, organize, and reduce your data. It has been said that a picture is worth a thousand words. In the case of complicated data sets, this is certainly true.
With Version 15.0 of SPSS, it is now possible to make publication-quality graphs using only SPSS. One important advantage of using SPSS to create your graphs instead of other software (e.g., Excel or SigmaPlot) is that the data have already been entered. Thus, duplication is eliminated, and the chance of making a transcription error is reduced.
Section 4.2 The New SPSS Chart Builder

Data Set
For the graphing examples, we will use a new set of data. Enter the data below by defining the three subject variables in the Variable View window: HEIGHT (in inches), WEIGHT (in pounds), and SEX (1 = male, 2 = female). When you create the variables, designate HEIGHT and WEIGHT as Scale measures and SEX as a Nominal measure (in the far-right column of the Variable View). Switch to the Data View to enter the data values for the 16 participants. Now use the Save As command to save the file, naming it HEIGHT.sav.

HEIGHT   WEIGHT   SEX
66       150      1
69       155      1
73       160      1
72       160      1
68       150      1
63       140      1
74       165      1
70       150      1
66       110      2
64       100      2
60       95       2
67       110      2
64       105      2
63       100      2
67       110      2
65       105      2
Make sure you have entered the data correctly by calculating a mean for each of the three variables (click Analyze, then Descriptive Statistics, then Descriptives). Compare your results with those in the table below.

Descriptive Statistics

                     N    Minimum   Maximum   Mean       Std. Deviation
HEIGHT               16   60.00     74.00     66.9375    3.90677
WEIGHT               16   95.00     165.00    129.0625   26.3451
SEX                  16   1.00      2.00      1.5000     .5164
Valid N (listwise)   16
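If you want to double-check the data entry outside SPSS, the same means are easy to reproduce. The sketch below (Python, not part of SPSS) hard-codes the 16 values listed above; the means and standard deviations printed should match the Descriptives output.

# Minimal check (outside SPSS) that the HEIGHT.sav values were
# entered correctly: the means should be 66.9375, 129.0625, 1.5000.
import numpy as np

height = np.array([66, 69, 73, 72, 68, 63, 74, 70,
                   66, 64, 60, 67, 64, 63, 67, 65], dtype=float)
weight = np.array([150, 155, 160, 160, 150, 140, 165, 150,
                   110, 100, 95, 110, 105, 100, 110, 105], dtype=float)
sex = np.array([1]*8 + [2]*8, dtype=float)

for name, values in [("HEIGHT", height), ("WEIGHT", weight), ("SEX", sex)]:
    # SPSS reports the sample standard deviation (n - 1), hence ddof=1.
    print(name, values.mean(), values.std(ddof=1))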
Chart Builder Basics
Make sure that the HEIGHT.sav data file you created above is open. In order to use the Chart Builder, you must have a data file open.
New with Version 15.0 of SPSS is the Chart Builder command. This command is accessed using Graphs, then Chart Builder in the submenu. This is a very versatile new command that can make graphs of excellent quality. When you first run the Chart Builder command, you will probably be presented with the following dialog box:
This dialog box is asking you to ensure that your variables are properly defined. Refer to Sections 1.3 and 2.1 if you had difficulty defining the variables used in creating the data set for this example, or to refresh your knowledge of this topic. Click OK.
The Chart Builder allows you to make any kind of graph that is normally used in publication or presentation, and much of it is beyond the scope of this text. This text, however, will go over the basics of the Chart Builder so that you can understand its mechanics.
On the left side of the Chart Builder window are the four main tabs that let you control the graphs you are making. The first one is the Gallery tab. The Gallery tab allows you to choose the basic format of your graph.
For example, the screenshot here shows the different kinds of bar charts that the Chart Builder can create.
After you have selected the basic form of graph that you want using the Gallery tab, you simply drag the image from the bottom right of the window up to the main window at the top (where it reads, "Drag a Gallery chart here to use it as your starting point").
Alternatively, you can use the Basic Elements tab to drag a coordinate system (labeled Choose Axes) to the top window, then drag variables and elements into the window.
The other tabs (Groups/Point ID and Titles/Footnotes) can be used for adding other standard elements to your graphs.
The examples in this text will cover some of the basic types of graphs you can make with the Chart Builder. After a little experimentation on your own, once you have mastered the examples in the chapter, you will soon gain a full understanding of the Chart Builder.
Section 4.3 Bar Charts, Pie Charts, and Histograms

Description
Bar charts, pie charts, and histograms represent the number of times each score occurs through the varying heights of bars or sizes of pie pieces. They are graphical representations of the frequency distributions discussed in Chapter 3.

Drawing Conclusions
The Frequencies command produces output that indicates both the number of cases in the sample with a particular value and the percentage of cases with that value. Thus, conclusions drawn should relate only to describing the numbers or percentages for the sample. If the data are at least ordinal in nature, conclusions regarding the cumulative percentages and/or percentiles can also be drawn.

SPSS Data Format
You need only one variable to use this command.
Running the Command
The Frequencies command will produce graphical frequency distributions. Click Analyze, then Descriptive Statistics, then Frequencies. You will be presented with the main dialog box for the Frequencies command, where you can enter the variables for which you would like to create graphs or charts. (See Chapter 3 for other options with this command.)
Click the Charts button at the bottom to produce frequency distributions. This will give you the Charts dialog box.
There are three types of charts available with this command: Bar charts, Pie charts, and Histograms. For each type, the Y axis can be either a frequency count or a percentage (selected with the Chart Values option).
You will receive the charts for any variables selected in the main Frequencies command dialog box.
Output
The bar chart consists of a Y axis, representing the frequency, and an X axis, representing each score. Note that the only values represented on the X axis are those values with nonzero frequencies (61, 62, and 71 are not represented).
The Histogram command creates a grouped frequency distribution. The range of scores is split into evenly spaced groups. The midpoint of each group is plotted on the X axis, and the Y axis represents the number of scores for each group.
If you select With Normal Curve, a normal curve will be superimposed over the distribution. This is very useful in determining if the distribution you have is approximately normal. The distribution represented here is clearly not normal due to the asymmetry of the values.
The pie chart shows the percentage of the whole that is represented by each value.
Practice Exercise
Use Practice Data Set 1 in Appendix B. After you have entered the data, construct a histogram that represents the mathematics skills scores and displays a normal curve, and a bar chart that represents the frequencies for the variable AGE.
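For readers who want to reproduce this kind of chart outside SPSS, a rough Python/matplotlib equivalent of a histogram with a superimposed normal curve is sketched below. This is not an SPSS feature; the HEIGHT values are the ones entered in Section 4.2, and the bin count is chosen arbitrarily for illustration.

# Rough equivalent (outside SPSS) of a histogram with a normal
# curve superimposed, using the HEIGHT values from Section 4.2.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

height = np.array([66, 69, 73, 72, 68, 63, 74, 70,
                   66, 64, 60, 67, 64, 63, 67, 65], dtype=float)

plt.hist(height, bins=7, density=True, edgecolor="black")

# Superimpose a normal curve with the sample mean and sd.
x = np.linspace(height.min() - 2, height.max() + 2, 200)
plt.plot(x, stats.norm.pdf(x, height.mean(), height.std(ddof=1)))

plt.xlabel("height")
plt.ylabel("density")
plt.title("Histogram of HEIGHT with normal curve")
plt.show()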
Section 4.4 Scatterplots

Description
Scatterplots (also called scattergrams or scatter diagrams) display two values for each case with a mark on the graph. The X axis represents the value for one variable. The Y axis represents the value for the second variable.
Assumptions
Both variables should be interval or ratio scales. If nominal or ordinal data are used, be cautious about your interpretation of the scattergram.

SPSS Data Format
You need two variables to perform this command.

Running the Command
You can produce scatterplots by clicking Graphs, then Chart Builder. (Note: You can also use the Legacy Dialogs. For this method, please see Appendix F.)
In the Gallery tab's Choose from list, select Scatter/Dot. Then drag the Simple Scatter icon (top left) up to the main chart area as shown in the screenshot at left. Disregard the Element Properties window that pops up by choosing Close.
Next, drag the HEIGHT variable to the X-Axis area, and the WEIGHT variable to the Y-Axis area (remember that standard graphing conventions indicate that dependent variables should be Y and independent variables should be X. This would mean that we are trying to predict weights from heights). At this point, your screen should look like the example below. Note that your actual data are not shown, just a set of dummy values.
Click OK. You should get your new graph (next page) as Output.
Output
The output will consist of a mark for each participant at the appropriate X and Y levels.
Adding a Third Variable
Even though the scatterplot is a two-dimensional graph, it can plot a third variable. To make it do so, select the Groups/Point ID tab in the Chart Builder. Click the Grouping/stacking variable option. Again, disregard the Element Properties window that pops up. Next, drag the variable SEX into the upper-right corner where it indicates Set Color. When this is done, your screen should look like the image at right.
If you are not able to drag the variable SEX, it may be because it is not identified as nominal or ordinal in the Variable View window.
Click OK to have SPSS produce the graph.
Now our output will have two different sets of marks. One set represents the male participants, and the second set represents the female participants. These two sets will appear in two different colors on your screen. You can use the SPSS chart editor (see Section 4.6) to make them different shapes, as shown in the example below.
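A comparable plot can be produced outside SPSS as well. The sketch below (Python/matplotlib, not an SPSS feature) marks male and female participants from the HEIGHT.sav data in different colors, much like dragging SEX to Set Color.

# Sketch (outside SPSS) of a scatterplot of WEIGHT against HEIGHT
# with a third, grouping variable (SEX) shown by color.
import numpy as np
import matplotlib.pyplot as plt

height = np.array([66, 69, 73, 72, 68, 63, 74, 70,
                   66, 64, 60, 67, 64, 63, 67, 65], dtype=float)
weight = np.array([150, 155, 160, 160, 150, 140, 165, 150,
                   110, 100, 95, 110, 105, 100, 110, 105], dtype=float)
sex = np.array([1]*8 + [2]*8)

for code, label in [(1, "male"), (2, "female")]:
    mask = sex == code
    plt.scatter(height[mask], weight[mask], label=label)

plt.xlabel("height")
plt.ylabel("weight")
plt.legend(title="sex")
plt.show()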
Practice Exercise
Use Practice Data Set 2 in Appendix B. Construct a scatterplot to examine the relationship between SALARY and EDUCATION.

Section 4.5 Advanced Bar Charts

Description
Bar charts can be produced with the Frequencies command (see Section 4.3). Sometimes, however, we are interested in a bar chart where the Y axis is not a frequency. To produce such a chart, we need to use the Bar charts command.

SPSS Data Format
You need at least two variables to perform this command. There are two basic kinds of bar charts: those for between-subjects designs and those for repeated-measures designs. Use the between-subjects method if one variable is the independent variable and the other is the dependent variable. Use the repeated-measures method if you have a dependent variable for each value of the independent variable (e.g., you would have three
variables for a design with three values of the independent variable). This normally occurs when you make multiple observations over time.
This example uses the GRADES.sav data file, which will be created in Chapter 6. Please see Section 6.4 for the data if you would like to follow along.

Running the Command
Open the Chart Builder by clicking Graphs, then Chart Builder. In the Gallery tab, select Bar. If you had only one independent variable, you would select the Simple Bar chart example (top left corner). If you have more than one independent variable (as in this example), select the Clustered Bar Chart example from the middle of the top row.
Drag the example to the top working area. Once you do, the working area should look like the screenshot below. (Note that you will need to open the data file you would like to graph in order to run this command.)
If you are using a repeated-measures design like our example here using GRADES.sav from Chapter 6 (three different variables representing the Y values that we want), you need to select all three variables (you can Ctrl-click them to select multiple variables) and then drag all three variable names to the Y-Axis area. When you do, you will be given the warning message above. Click OK.
Next, you will need to drag the INSTRUCT variable to the top right in the Cluster: set color area (see screenshot at left).
Note: The Chart Builder pays attention to the types of variables that you ask it to graph. If you are getting error messages or unusual results, be sure that your categorical variables are properly designated as Nominal in the Variable View tab (see Chapter 2, Section 2.1).
Output
Practice Exercise
Use Practice Data Set 1 in Appendix B. Construct a clustered bar graph examining the relationship between MATHEMATICS SKILLS scores (as the dependent variable) and MARITAL STATUS and SEX (as independent variables). Make sure you classify both SEX and MARITAL STATUS as nominal variables.
Section 4.6 Editing SPSS Graphs
Whatever command you use to create your graph, you will probably want to do some editing to make it appear exactly as you want it to look. In SPSS, you do this in much the same way that you edit graphs in other software programs (e.g., Excel). After your graph is made, in the output window, select your graph (this will create handles around the outside of the entire object) and right-click. Then, click SPSS Chart Object, and click Open. Alternatively, you can double-click on the graph to open it for editing.
When you open the graph, the Chart Editor window and the corresponding Properties window will appear.
Once the Chart Editor is open, you can easily edit each element of the graph. To select an element, just click on the relevant spot on the graph. For example, if you have added a title to your graph ("Histogram" in the example that follows), you may select the element representing the title of the graph by clicking anywhere on the title.
Once you have selected an element, you can tell whether the correct element is selected because it will have handles around it.
If the item you have selected is a text element (e.g., the title of the graph), a cursor will be present and you can edit the text as you would in a word processing program. If you would like to change another attribute of the element (e.g., the color or font size), use the Properties box. (Text properties are shown below.)
With a little practice, you can make excellent graphs using SPSS. Once your graph is formatted the way you want it, simply select File, Save, then Close.
Chapter 5

Prediction and Association

Section 5.1 Pearson Correlation Coefficient

Description
The Pearson product-moment correlation coefficient (sometimes called the Pearson correlation coefficient or simply the Pearson r) determines the strength of the linear relationship between two variables.

Assumptions
Both variables should be measured on interval or ratio scales (or a dichotomous nominal variable). If a relationship exists between them, that relationship should be linear. Because the Pearson correlation coefficient is computed with z-scores, both variables should also be normally distributed. If your data do not meet these assumptions, consider using the Spearman rho correlation coefficient instead.

SPSS Data Format
Two variables are required in your SPSS data file. Each subject must have data for both variables.

Running the Command
To select the Pearson correlation coefficient, click Analyze, then Correlate, then Bivariate (bivariate refers to two variables). This will bring up the Bivariate Correlations dialog box. This example uses the HEIGHT.sav data file entered at the start of Chapter 4.
Move at least two variables from the box at left into the box at right by using the transfer arrow (or by double-clicking each variable). Make sure that a check is in the Pearson box under Correlation Coefficients. It is acceptable to move more than two variables.
For our example, we will move all three variables over and click OK.
nql
The output consists of a :rydl !4 1 correlation matrix. Every variableyou entered in the command is represented as both a row and a column. We entered three variables in our command. I Tc* d $lrfmma*--*=*-*:-**-*l Therefore,we have a 3 x 3 table. There are also three rows in each cell-the 17 Flag{flbrrcorda&rn correlation,the significance level, and the N. If a correlation is signifiCorrelations cant at less than the .05 level, a single * will appear next to the heioht weioht sex netgnt Pearsonuorrelalron 1 .806' -.644' correlation.If it is significant at Sig. (2-tailed) .000 .007 the .01 level or lower, ** will apN 16 16 16 pear next to the correlation. For weight PearsonCorrelation .806' .968' example, the correlation in the Sig. (2-tailed) .000 .000 output at right has a significance N 16 16 16 PearsonCorrelation 1 -.644' -.968' level of < .001, so it is flagged sex Sig. (2-tailed) .007 .000 with ** to indicatethat it is less N 16 16 16 than.0 1 . ". Correlationis significantat the 0.01 levet(2-tailed). To read the correlations. select a row and a column. For example,the correlationbetweenheight and weight is determinedthrough selectionof the WEIGHT row and the HEIGHT column (.806).We get the sameanswerby selectingthe HEIGHT row and the WEIGHT column. The correlationbetween a variable and itself is always l, so thereis a diagonalsetof I s.
l_i::x-
.--i
I
Drawing Conclusions
The correlation coefficient will be between -1.0 and +1.0. Coefficients close to 0.0 represent a weak relationship. Coefficients close to 1.0 or -1.0 represent a strong relationship. Generally, correlations greater than 0.7 are considered strong. Correlations less than 0.3 are considered weak. Correlations between 0.3 and 0.7 are considered moderate.
Significant correlations are flagged with asterisks. A significant correlation indicates a reliable relationship, but not necessarily a strong correlation. With enough participants, a very small correlation can be significant. Please see Appendix A for a discussion of effect sizes for correlations.
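The correlation, its two-tailed significance value, and the N - 2 degrees of freedom are easy to verify outside SPSS. The following is a minimal Python/scipy sketch (not part of SPSS; it assumes the HEIGHT.sav values from Chapter 4).

# Sketch (outside SPSS): Pearson r between HEIGHT and WEIGHT,
# with the two-tailed p value and the df reported as N - 2.
import numpy as np
from scipy import stats

height = np.array([66, 69, 73, 72, 68, 63, 74, 70,
                   66, 64, 60, 67, 64, 63, 67, 65], dtype=float)
weight = np.array([150, 155, 160, 160, 150, 140, 165, 150,
                   110, 100, 95, 110, 105, 100, 110, 105], dtype=float)

r, p = stats.pearsonr(height, weight)
df = len(height) - 2

print(f"r({df}) = {r:.3f}, p = {p:.4f}")   # approximately r(14) = .806, p < .001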
Phrasing a Significant Result
In the example above, we obtained a correlation of .806 between HEIGHT and WEIGHT. A correlation of .806 is a strong positive correlation, and it is significant at the .001 level. Thus, we could state the following in a results section:
A Pearson correlation coefficient was calculated for the relationship between participants' height and weight. A strong positive correlation was found (r(14) = .806, p < .001), indicating a significant linear relationship between the two variables. Taller participants tend to weigh more.

The conclusion states the direction (positive), strength (strong), value (.806), degrees of freedom (14), and significance level (< .001) of the correlation. In addition, a statement of direction is included (taller is heavier).
Note that the degrees of freedom given in parentheses is 14. The output indicates an N of 16. While most SPSS procedures give degrees of freedom, the correlation command gives only the N (the number of pairs). For a correlation, the degrees of freedom is N - 2.

Phrasing Results That Are Not Significant
Using our SAMPLE.sav data set from the previous chapters, we could calculate a correlation between ID and GRADE. If so, we get the output below. The correlation has a significance level of .783. Thus, we could write the following in a results section (note that the degrees of freedom is N - 2):
Correlations

                              ID       GRADE
ID      Pearson Correlation   1.000    .217
        Sig. (2-tailed)                .783
        N                     4        4
GRADE   Pearson Correlation   .217     1.000
        Sig. (2-tailed)       .783
        N                     4        4
A Pearson correlation was calculated examining the relationship between participants' ID numbers and grades. A weak correlation that was not significant was found (r(2) = .217, p > .05). ID number is not related to grade in the course.

Practice Exercise
Use Practice Data Set 2 in Appendix B. Determine the value of the Pearson correlation coefficient for the relationship between SALARY and YEARS OF EDUCATION.
Section 5.2 Spearman Correlation Coefficient

Description
The Spearman correlation coefficient determines the strength of the relationship between two variables. It is a nonparametric procedure. Therefore, it is weaker than the Pearson correlation coefficient, but it can be used in more situations.

Assumptions
Because the Spearman correlation coefficient functions on the basis of the ranks of data, it requires ordinal (or interval or ratio) data for both variables. They do not need to be normally distributed.
SPSS Data Format
Two variables are required in your SPSS data file. Each subject must provide data for both variables.

Running the Command
Click Analyze, then Correlate, then Bivariate. This will bring up the main dialog box for Bivariate Correlations (just like the Pearson correlation). About halfway down the dialog box, there is a section for indicating the type of correlation you will compute. You can select as many correlations as you want. For our example, remove the check in the Pearson box (by clicking on it) and click on the Spearman box.
Use the variables HEIGHT and WEIGHT from our HEIGHT.sav data file (Chapter 4). This is also one of the few commands that allows you to choose a one-tailed test, if desired.
Reading the Output
The output is essentially the same as for the Pearson correlation. Each pair of variables has its correlation coefficient indicated twice. The Spearman rho can range from -1.0 to +1.0, just like the Pearson r.

Correlations

                                                   HEIGHT   WEIGHT
Spearman's rho   HEIGHT   Correlation Coefficient  1.000    .883**
                          Sig. (2-tailed)          .        .000
                          N                        16       16
                 WEIGHT   Correlation Coefficient  .883**   1.000
                          Sig. (2-tailed)          .000     .
                          N                        16       16
**. Correlation is significant at the .01 level (2-tailed).

The output listed above indicates a correlation of .883 between HEIGHT and WEIGHT. Note the significance level of .000, shown in the "Sig. (2-tailed)" row. This is, in fact, a significance level of < .001. The actual alpha level rounds out to .000, but it is not zero.
Drawing Conclusions
The correlation will be between -1.0 and +1.0. Scores close to 0.0 represent a weak relationship. Scores close to 1.0 or -1.0 represent a strong relationship. Significant correlations are flagged with asterisks. A significant correlation indicates a reliable relationship, but not necessarily a strong correlation. With enough participants, a very small correlation can be significant. Generally, correlations greater than 0.7 are considered strong. Correlations less than 0.3 are considered weak. Correlations between 0.3 and 0.7 are considered moderate.
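As with the Pearson example, the Spearman rho can be checked outside SPSS. The following is a minimal Python/scipy sketch (not part of SPSS, using the same HEIGHT.sav values as an assumption).

# Sketch (outside SPSS): Spearman rho between HEIGHT and WEIGHT.
import numpy as np
from scipy import stats

height = np.array([66, 69, 73, 72, 68, 63, 74, 70,
                   66, 64, 60, 67, 64, 63, 67, 65], dtype=float)
weight = np.array([150, 155, 160, 160, 150, 140, 165, 150,
                   110, 100, 95, 110, 105, 100, 110, 105], dtype=float)

rho, p = stats.spearmanr(height, weight)
print(f"rho(14) = {rho:.3f}, p = {p:.4f}")   # approximately .883, p < .001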
Phrasing Results That Are Significant
In the example above, we obtained a correlation of .883 between HEIGHT and WEIGHT. A correlation of .883 is a strong positive correlation, and it is significant at the .001 level. Thus, we could state the following in a results section:

A Spearman rho correlation coefficient was calculated for the relationship between participants' height and weight. A strong positive correlation was found (rho(14) = .883, p < .001), indicating a significant relationship between the two variables. Taller participants tend to weigh more.

The conclusion states the direction (positive), strength (strong), value (.883), degrees of freedom (14), and significance level (< .001) of the correlation. In addition, a statement of direction is included (taller is heavier). Note that the degrees of freedom given in parentheses is 14. The output indicates an N of 16. For a correlation, the degrees of freedom is N - 2.

Phrasing Results That Are Not Significant
Using our SAMPLE.sav data set from the previous chapters, we could calculate a Spearman rho correlation between ID and GRADE. If so, we would get the output below. The correlation coefficient equals .000 and has a significance level of 1.000. Note that though this value is rounded and is not, in fact, exactly 1.000, we could state the following in a results section:

Correlations

                                                  ID      GRADE
Spearman's rho   ID      Correlation Coefficient  1.000   .000
                         Sig. (2-tailed)          .       1.000
                         N                        4       4
                 GRADE   Correlation Coefficient  .000    1.000
                         Sig. (2-tailed)          1.000   .
                         N                        4       4
A Spearman rho correlation coefficient was calculated for the relationship between a subject's ID number and grade. An extremely weak correlation that was not significant was found (rho(2) = .000, p > .05). ID number is not related to grade in the course.

Practice Exercise
Use Practice Data Set 2 in Appendix B. Determine the strength of the relationship between salary and job classification by calculating the Spearman rho correlation.

Section 5.3 Simple Linear Regression

Description
Simple linear regression allows the prediction of one variable from another.

Assumptions
Simple linear regression assumes that both variables are interval- or ratio-scaled. In addition, the dependent variable should be normally distributed around the prediction line. This, of course, assumes that the variables are related to each other linearly.
Typically, both variables should be normally distributed. Dichotomous variables (variables with only two levels) are also acceptable as independent variables.

SPSS Data Format
Two variables are required in the SPSS data file. Each subject must contribute to both values.

Running the Command
Click Analyze, then Regression, then Linear. This will bring up the main dialog box for Linear Regression. On the left side of the dialog box is a list of the variables in your data file (we are using the HEIGHT.sav data file from the start of this section). On the right are blocks for the dependent variable (the variable you are trying to predict), and the independent variable (the variable from which we are predicting).
We are interested in predicting someone's weight on the basis of his or her height. Thus, we should place the variable WEIGHT in the dependent variable block and the variable HEIGHT in the independent variable block. Then we can click OK to run the analysis.
Reading the Output
For simple linear regressions, we are interested in three components of the output. The first is called the Model Summary, and it occurs after the Variables Entered/Removed section. For our example, you should see this output.

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .806   .649       .624                16.14801
a. Predictors: (Constant), height

R Square (called the coefficient of determination) gives you the proportion of the variance of your dependent variable (WEIGHT) that can be explained by variation in your independent variable (HEIGHT). Thus, 64.9% of the variation in weight can be explained by differences in height (taller individuals weigh more). The standard error of estimate gives you a measure of dispersion for your prediction equation. When the prediction equation is used, 68% of the data will fall within
one standard error of estimate (predicted) value. Just over 95% will fall within two standard errors. Thus, in the previous example, 95% of the time, our estimated weight will be within 32.296 pounds of being correct (i.e., 2 x 16.148 = 32.296).

ANOVA(b)

Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   6760.323         1    6760.323      25.926   .000(a)
    Residual     3650.614         14   260.758
    Total        10410.938        15
a. Predictors: (Constant), HEIGHT
b. Dependent Variable: WEIGHT

The second part of the output that we are interested in is the ANOVA summary table, as shown above. The important number here is the significance level in the rightmost column. If that value is less than .05, then we have a significant linear regression. If it is larger than .05, we do not.
The final section of the output is the table of coefficients. This is where the actual prediction equation can be found.

Coefficients(a)

                  Unstandardized Coefficients   Standardized Coefficients
Model             B          Std. Error         Beta                        t        Sig.
1   (Constant)    -234.681   71.552                                         -3.280   .005
    height        5.434      1.067              .806                        5.092    .000
a. Dependent Variable: weight
In most texts, you learn that Y' = a + bX is the regression equation. Y' (pronounced "Y prime") is your dependent variable (primes are normally predicted values or dependent variables), and X is your independent variable. In SPSS output, the values of both a and b are found in the B column. The first value, -234.681, is the value of a (labeled Constant). The second value, 5.434, is the value of b (labeled with the name of the independent variable). Thus, our prediction equation for the example above is WEIGHT' = -234.681 + 5.434(HEIGHT). In other words, the average subject who is an inch taller than another subject weighs 5.434 pounds more. A person who is 60 inches tall should weigh -234.681 + 5.434(60) = 91.359 pounds. Given our earlier discussion of standard error of estimate, 95% of individuals who are 60 inches tall will weigh between 59.063 (91.359 - 32.296 = 59.063) and 123.655 (91.359 + 32.296 = 123.655) pounds.
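The slope, intercept, and the worked prediction above can also be reproduced outside SPSS. Below is a minimal Python/scipy sketch (not part of SPSS; it assumes the HEIGHT.sav values from Chapter 4).

# Sketch (outside SPSS): simple linear regression predicting
# WEIGHT from HEIGHT, reproducing the coefficients table and the
# worked prediction for a 60-inch-tall person.
import numpy as np
from scipy import stats

height = np.array([66, 69, 73, 72, 68, 63, 74, 70,
                   66, 64, 60, 67, 64, 63, 67, 65], dtype=float)
weight = np.array([150, 155, 160, 160, 150, 140, 165, 150,
                   110, 100, 95, 110, 105, 100, 110, 105], dtype=float)

result = stats.linregress(height, weight)
print(result.intercept, result.slope)   # approximately -234.681 and 5.434
print(result.rvalue ** 2)               # R Square, approximately .649

predicted = result.intercept + result.slope * 60
print(predicted)                        # approximately 91.4 pounds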
Drawing Conclusions
Conclusions from regression analyses indicate (a) whether or not a significant prediction equation was obtained, (b) the direction of the relationship, and (c) the equation itself.

Phrasing Results That Are Significant
In the examples above, we obtained an R Square of .649 and a regression equation of WEIGHT' = -234.681 + 5.434(HEIGHT). The ANOVA resulted in F = 25.926 with 1 and 14 degrees of freedom. The F is significant at the less than .001 level. Thus, we could state the following in a results section:

A simple linear regression was calculated predicting participants' weight based on their height. A significant regression equation was found (F(1,14) = 25.926, p < .001), with an R² of .649. Participants' predicted weight is equal to -234.68 + 5.43 (HEIGHT) pounds when height is measured in inches. Participants' average weight increased 5.43 pounds for each inch of height.

The conclusion states the direction (increase), strength (.649), value (25.926), degrees of freedom (1,14), and significance level (< .001) of the regression. In addition, a statement of the equation itself is included.

Phrasing Results That Are Not Significant
If the ANOVA is not significant (e.g., see the output below), the section of the output labeled Sig. for the ANOVA will be greater than .05, and the regression equation is not significant. A results section might include the following statement:

A simple linear regression was calculated predicting participants' ACT scores based on their height. The regression equation was not significant (F(1,14) = 4.12, p > .05) with an R² of .227. Height is not a significant predictor of ACT scores.
Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .477   .227       .172                3.06696
a. Predictors: (Constant), height

ANOVA(b)

Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   38.750           1    38.750        4.120   .062(a)
    Residual     131.688          14   9.406
    Total        170.438          15
a. Predictors: (Constant), height
b. Dependent Variable: act
Note that for resultsthat are not significant,the ANOVA results and R2resultsare given,but the regressionequationis not. Practice Exercise Use PracticeData Set 2 in Appendix B. If we want to predict salary from years of education,what salary would you predict for someonewith l2 years of education?What salarywould you predict for someonewith a collegeeducation(16 years)?
Section 5.4 Multiple Linear Regression

Description
The multiple linear regression analysis allows the prediction of one variable from several other variables.
Assumptions
Multiple linear regression assumes that all variables are interval- or ratio-scaled. In addition, the dependent variable should be normally distributed around the prediction line. This, of course, assumes that the variables are related to each other linearly. All variables should be normally distributed. Dichotomous variables are also acceptable as independent variables.

SPSS Data Format
At least three variables are required in the SPSS data file. Each subject must contribute to all values.
Running the Command
Click Analyze, then Regression, then Linear. This will bring up the main dialog box for Linear Regression. On the left side of the dialog box is a list of the variables in your data file (we are using the HEIGHT.sav data file from the start of this chapter). On the right side of the dialog box are blanks for the dependent variable (the variable you are trying to predict) and the independent variables (the variables from which you are predicting).
We are interested in predicting someone's weight based on his or her height and sex. We believe that both sex and height influence weight. Thus, we should place the dependent variable WEIGHT in the Dependent block and the independent variables HEIGHT and SEX in the Independent(s) block. Enter both in Block 1. This will perform an analysis to determine if WEIGHT can be predicted from SEX and/or HEIGHT.
There are several methods SPSS can use to conduct this analysis. These can be selected with the Method box. Method Enter, the most widely
used, puts all variables in the equation, whether they are significant or not. The other methods use various means to enter only those variables that are significant predictors. Click OK to run the analysis.
Reading the Output
For multiple linear regression, there are three components of the output in which we are interested. The first is called the Model Summary, which is found after the Variables Entered/Removed section. For our example, you should get the output below.

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .997   .993       .992                2.29571
a. Predictors: (Constant), sex, height

R Square (called the coefficient of determination) tells you the proportion of the variance in the dependent variable (WEIGHT) that can be explained by variation in the independent variables (HEIGHT and SEX, in this case). Thus, 99.3% of the variation in weight can be explained by differences in height and sex (taller individuals weigh more, and men weigh more). Note that when a second variable is added, our R Square goes up from .649 to .993. The .649 was obtained using the Simple Linear Regression example in Section 5.3.
The Standard Error of the Estimate gives you a margin of error for the prediction equation. Using the prediction equation, 68% of the data will fall within one standard error of estimate (predicted) value. Just over 95% will fall within two standard errors of estimates. Thus, in the example above, 95% of the time, our estimated weight will be within 4.591 (2.296 x 2) pounds of being correct. In our Simple Linear Regression example in Section 5.3, this number was 32.296. Note the higher degree of accuracy.
The second part of the output that we are interested in is the ANOVA summary table. For more information on reading ANOVA tables, refer to the sections on ANOVA in Chapter 6. For now, the important number is the significance in the rightmost column. If that value is less than .05, we have a significant linear regression. If it is larger than .05, we do not.
ANOVA(b)

Model            Sum of Squares   df   Mean Square   F         Sig.
1   Regression   10342.424        2    5171.212      981.202   .000(a)
    Residual     68.514           13   5.270
    Total        10410.938        15
a. Predictors: (Constant), sex, height
b. Dependent Variable: weight

The final section of output we are interested in is the table of coefficients. This is where the actual prediction equation can be found.
Coefficients(a)

                  Unstandardized Coefficients   Standardized Coefficients
Model             B         Std. Error          Beta                        t         Sig.
1   (Constant)    47.138    14.843                                          3.176     .007
    height        2.101     .198                .312                        10.588    .000
    sex           -39.133   1.501               -.767                       -26.071   .000
a. Dependent Variable: weight
In most texts, you learn that Y' = a + bX is the regression equation. For multiple regression, our equation changes to Y' = B0 + B1X1 + B2X2 + ... + BzXz (where z is the number of independent variables). Y' is your dependent variable, and the Xs are your independent variables. The Bs are listed in a column. Thus, our prediction equation for the example above is WEIGHT' = 47.138 - 39.133(SEX) + 2.101(HEIGHT) (where SEX is coded as 1 = Male, 2 = Female, and HEIGHT is in inches). In other words, the average difference in weight for participants who differ by one inch in height is 2.101 pounds. Males tend to weigh 39.133 pounds more than females. A female who is 60 inches tall should weigh 47.138 - 39.133(2) + 2.101(60) = 94.932 pounds. Given our earlier discussion of the standard error of estimate, 95% of females who are 60 inches tall will weigh between 90.341 (94.932 - 4.591 = 90.341) and 99.523 (94.932 + 4.591 = 99.523) pounds.

Drawing Conclusions
Conclusions from regression analyses indicate (a) whether or not a significant prediction equation was obtained, (b) the direction of the relationship, and (c) the equation itself. Multiple regression is generally much more powerful than simple linear regression. Compare our two examples.
With multiple regression, you must also consider the significance level of each independent variable. In the example above, the significance level of both independent variables is less than .001.
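The coefficients and the worked prediction for a 60-inch-tall female can also be reproduced outside SPSS. The following is a minimal Python/numpy sketch (not part of SPSS; it assumes the HEIGHT.sav values and fits the model by ordinary least squares).

# Sketch (outside SPSS): multiple regression predicting WEIGHT
# from HEIGHT and SEX via ordinary least squares.
import numpy as np

height = np.array([66, 69, 73, 72, 68, 63, 74, 70,
                   66, 64, 60, 67, 64, 63, 67, 65], dtype=float)
weight = np.array([150, 155, 160, 160, 150, 140, 165, 150,
                   110, 100, 95, 110, 105, 100, 110, 105], dtype=float)
sex = np.array([1]*8 + [2]*8, dtype=float)   # 1 = male, 2 = female

# Design matrix with a column of 1s for the constant term.
X = np.column_stack([np.ones_like(height), height, sex])
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)
constant, b_height, b_sex = coef
print(constant, b_height, b_sex)   # approximately 47.138, 2.101, -39.133

# Predicted weight for a 60-inch-tall female (SEX = 2).
print(constant + b_height * 60 + b_sex * 2)   # approximately 94.9 pounds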
Phrasing Results That Are Significant
In our example, we obtained an R Square of .993 and a regression equation of WEIGHT' = 47.138 - 39.133(SEX) + 2.101(HEIGHT). The ANOVA resulted in F = 981.202 with 2 and 13 degrees of freedom. F is significant at the less than .001 level. Thus, we could state the following in a results section:
A multiple linear regression was calculated to predict participants' weight based on their height and sex. A significant regression equation was found (F(2,13) = 981.202, p < .001), with an R² of .993. Participants' predicted weight is equal to 47.138 - 39.133(SEX) + 2.101(HEIGHT), where SEX is coded as 1 = Male, 2 = Female, and HEIGHT is measured in inches. Participants increased 2.101 pounds for each inch of height, and males weighed 39.133 pounds more than females. Both sex and height were significant predictors.

The conclusion states the direction (increase), strength (.993), value (981.20), degrees of freedom (2,13), and significance level (< .001) of the regression. In addition, a statement of the equation itself is included. Because there are multiple independent variables, we have noted whether or not each is significant.

Phrasing Results That Are Not Significant
If the ANOVA does not find a significant relationship, the Sig. section of the output will be greater than .05, and the regression equation is not significant. A results section for this output might include the following statement:
Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .528   .279       .168                3.07525
a. Predictors: (Constant), sex, height

A multiple linear regression was calculated predicting participants' ACT scores based on their height and sex. The regression equation was not significant (F(2,13) = 2.511, p > .05) with an R² of .279. Neither height nor sex is a significant predictor of ACT scores.

Note that for results that are not significant, the ANOVA results and R² results are given, but the regression equation is not.
Practice Exercise
Use Practice Data Set 2 in Appendix B. Determine the prediction equation for predicting salary based on education, years of service, and sex. Which variables are significant predictors? If you believe that men were paid more than women were, what would you conclude after conducting this analysis?
Chapter 6

Parametric Inferential Statistics
Parametric statistical procedures allow you to draw inferences about populations based on samples of those populations. To make these inferences, you must be able to make certain assumptions about the shape of the distributions of the population samples.
Section 6.1 Review of Basic Hypothesis Testing

The Null Hypothesis
In hypothesis testing, we create two hypotheses that are mutually exclusive (i.e., both cannot be true at the same time) and all inclusive (i.e., one of them must be true). We refer to those two hypotheses as the null hypothesis and the alternative hypothesis. The null hypothesis generally states that any difference we observe is caused by random error. The alternative hypothesis generally states that any difference we observe is caused by a systematic difference between groups.
Type I and Type II Errors
All hypothesis testing attempts to draw conclusions about the real world based on the results of a test (a statistical test, in this case). There are four possible combinations of results (see the figure below).

                                            REAL WORLD
                                 Null Hypothesis True   Null Hypothesis False
Reject null hypothesis           Type I Error           No Error
Do not reject null hypothesis    No Error               Type II Error

Two of the possible results are correct test results. The other two results are errors. A Type I error occurs when we reject a null hypothesis that is, in fact, true, while a Type II error occurs when we fail to reject the null hypothesis that is, in fact, false.
Significance tests determine the probability of making a Type I error. In other words, after performing a series of calculations, we obtain a probability that the null hypothesis is true. If there is a low probability, such as 5 or less in 100 (.05), by convention, we reject the null hypothesis. In other words, we typically use the .05 level (or less) as the maximum Type I error rate we are willing to accept.
When there is a low probability of a Type I error, such as .05, we can state that the significance test has led us to "reject the null hypothesis." This is synonymous with saying that a difference is "statistically significant." For example, on a reading test, suppose you found that a random sample of girls from a school district scored higher than a random
sample of boys. This result may have been obtained merely because the chance errors associated with random sampling created the observed difference (this is what the null hypothesis asserts). If there is a sufficiently low probability that random errors were the cause (as determined by a significance test), we can state that the difference between boys and girls is statistically significant.
Significance Levels vs. Critical Values
Most statistics textbooks present hypothesis testing by using the concept of a critical value. With such an approach, we obtain a value for a test statistic and compare it to a critical value we look up in a table. If the obtained value is larger than the critical value, we reject the null hypothesis and conclude that we have found a significant difference (or relationship). If the obtained value is less than the critical value, we fail to reject the null hypothesis and conclude that there is not a significant difference.
The critical-value approach is well suited to hand calculations. Tables that give critical values for alpha levels of .001, .01, .05, etc., can be created. It is not practical to create a table for every possible alpha level. On the other hand, SPSS can determine the exact alpha level associated with any value of a test statistic. Thus, looking up a critical value in a table is not necessary. This, however, does change the basic procedure for determining whether or not to reject the null hypothesis.
The section of SPSS output labeled Sig. (sometimes p or alpha) indicates the likelihood of making a Type I error if we reject the null hypothesis. A value of .05 or less indicates that we should reject the null hypothesis (assuming an alpha level of .05). A value greater than .05 indicates that we should fail to reject the null hypothesis.
In other words, when using SPSS, we normally reject the null hypothesis if the output value under Sig. is equal to or smaller than .05, and we fail to reject the null hypothesis if the output value is larger than .05.

One-Tailed vs. Two-Tailed Tests
SPSS output generally includes a two-tailed alpha level (normally labeled Sig. in the output). A two-tailed hypothesis attempts to determine whether any difference (either positive or negative) exists. Thus, you have an opportunity to make a Type I error on either of the two tails of the normal distribution. A one-tailed test examines a difference in a specific direction. Thus, we can make a Type I error on only one side (tail) of the distribution. If we have a one-tailed hypothesis, but our SPSS output gives a two-tailed significance result, we can take the significance level in the output and divide it by two. Thus, if our difference is in the right direction, and if our output indicates a significance level of .084 (two-tailed), but we have a one-tailed hypothesis, we can report a significance level of .042 (one-tailed).

Phrasing Results
Results of hypothesis testing can be stated in different ways, depending on the conventions specified by your institution. The following examples illustrate some of these differences.
Degrees of Freedom
Sometimes the degrees of freedom are given in parentheses immediately after the symbol representing the test, as in this example:

t(3) = 7.00, p < .01

Other times, the degrees of freedom are given within the statement of results, as in this example:

t = 7.00, df = 3, p < .01

Significance Level
When you obtain results that are significant, they can be described in different ways. For example, if you obtained a significance level of .006 on a t test, you could describe it in any of the following three ways:

t(3) = 7.00, p < .05
t(3) = 7.00, p < .01
t(3) = 7.00, p = .006

Notice that because the exact probability is .006, both .05 and .01 are also correct. There are also various ways of describing results that are not significant. For example, if you obtained a significance level of .505, any of the following three statements could be used:

t(2) = .805, ns
t(2) = .805, p > .05
t(2) = .805, p = .505

Statement of Results
Sometimes the results will be stated in terms of the null hypothesis, as in the following example:

The null hypothesis was rejected (t = 7.00, df = 3, p = .006).

Other times, the results are stated in terms of their level of significance, as in the following example:

A statistically significant difference was found: t(3) = 7.00, p < .01.

Statistical Symbols
Generally, statistical symbols are presented in italics. Prior to the widespread use of computers and desktop publishing, statistical symbols were underlined. Underlining is a signal to a printer that the underlined text should be set in italics. Institutions vary on their requirements for student work, so you are advised to consult your instructor about this.

Section 6.2 Single-Sample t Test

Description
The single-sample t test compares the mean of a single sample to a known population mean. It is useful for determining if the current set of data has changed from a
long-term value (e.g., comparing the current year's temperatures to a historical average to determine if global warming is occurring).
Assumptions
The distributions from which the scores are taken should be normally distributed. However, the t test is robust and can handle violations of the assumption of a normal distribution. The dependent variable must be measured on an interval or ratio scale.

SPSS Data Format
The SPSS data file for the single-sample t test requires a single variable in SPSS. That variable represents the set of scores in the sample that we will compare to the population mean.
Readingthe Output The output for the single-samplet test consistsof two sections.The first section lists the samplevariable and somebasic descriptive statistics (N, mean, standard deviation, and standarderror).
56
T-Test

One-Sample Statistics

          N    Mean      Std. Deviation   Std. Error Mean
LENGTH    10   35.9000   1.1972           .3786

One-Sample Test

                                    Test Value = 35
                                                                95% Confidence Interval
                                                                   of the Difference
          t       df   Sig. (2-tailed)   Mean Difference       Lower        Upper
LENGTH    2.377   9    .041              .9000                 4.356E-02    1.7564
The second section of output contains the results of the t test. The example presented here indicates a t value of 2.377, with 9 degrees of freedom and a significance level of .041. The mean difference of .9000 is the difference between the sample average (35.90) and the population average we entered in the dialog box to conduct the test (35.00).

Drawing Conclusions
The t test assumes an equality of means. Therefore, a significant result indicates that the sample mean is not equivalent to the population mean (hence the term "significantly different"). A result that is not significant means that there is not a significant difference between the means. It does not mean that they are equal. Refer to your statistics text for the section on failure to reject the null hypothesis.
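The same kind of test is easy to run outside SPSS as a check. The sketch below (Python/scipy, not part of SPSS) uses ten hypothetical LENGTH scores, since the book does not list the raw data, so the printed t and p are illustrative only.

# Sketch (outside SPSS): a single-sample t test against a known
# population mean of 35. The LENGTH scores below are hypothetical
# placeholders, so the printed values will not exactly match the
# output shown above.
import numpy as np
from scipy import stats

length = np.array([35, 36, 37, 34, 36, 38, 35, 37, 36, 35], dtype=float)

t, p = stats.ttest_1samp(length, popmean=35)
df = len(length) - 1
print(f"t({df}) = {t:.3f}, two-tailed p = {p:.3f}")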
Phrasing Results That Are Significant
The above example found a significant difference between the population mean and the sample mean. Thus, we could state the following:

A single-sample t test compared the mean length of the sample to a population value of 35.00. A significant difference was found (t(9) = 2.377, p < .05). The sample mean of 35.90 (sd = 1.197) was significantly greater than the population mean.
Phrasing Results That Are Not Significant
If the significance level had been greater than .05, the difference would not be significant. For example, if we received the output presented here, we could state the following:
One-Sample Statistics

              N   Mean      Std. Deviation   Std. Error Mean
temperature   9   68.6667   9.11013          3.03681

One-Sample Test

                                       Test Value = 67.4
                                                                   95% Confidence Interval
                                                                      of the Difference
              t      df   Sig. (2-tailed)   Mean Difference        Lower       Upper
temperature   .417   8    .688              1.26667                -5.7362     8.2696

A single-sample t test compared the mean temperature over the past year to the long-term average. The difference was not significant (t(8) = .417, p > .05). The mean temperature over the past year was 68.67 (sd = 9.11) compared to the long-term average of 67.4.
Section6.3 Independent-samples I Test Description The independent-samples I test comparesthe meansof two samples.The two samples are normally from randomly assignedgroups. Assumptions The two groupsbeing comparedshouldbe independentof eachother. Observations are independentwhen information about one is unrelated to the other. Normally, this meansthat one group of participantsprovidesdata for one sampleand a different group of participants provides data for the other sample (and individuals in one group are not matchedwith individualsin the other group).One way to accomplishthis is throughrandom assignmentto form two groups. The scoresshould be normally distributed,but the I test is robust and can handle violationsof the assumptionof a normal distribution. The dependent variable must be measuredon an interval or ratio scale.The independentvariable shouldhaveonly two discretelevels. SP,S,S Data Format The SPSSdata file for the independentI test requirestwo variables.One variable, grouping the variable, representsthe value of the independentvariable. The grouping variable should have two distinct values (e.g., 0 for a control group and I for an experi-
mental group). The second variable represents the dependent variable, such as scores on a test.

Conducting an Independent-Samples t Test
For our example, we will use the SAMPLE.sav data file. Click Analyze, then Compare Means, then Independent-Samples T Test. This will bring up the main dialog box. Transfer the dependent variable(s) into the Test Variable(s) blank. For our example, we will use
the variable GRADE. Transfer the independent variable into the Grouping Variable section. For our example, we will use the variable MORNING.
Next, click Define Groups and enter the values of the two levels of the independent variable. Independent t tests are capable of comparing only two levels at a time. Click Continue, then click OK to run the analysis.
Output from the Independent-Samples t Test
The output will have a section labeled "Group Statistics." This section provides the basic descriptive statistics for the dependent variable(s) for each value of the independent variable. It should look like the output below.

Group Statistics

        morning   N   Mean      Std. Deviation   Std. Error Mean
grade   No        2   82.5000   3.53553          2.50000
        Yes       2   78.0000   7.07107          5.00000
Next, there will be a section with the results of the t test. It should look like the output below.
ltest for Eoualitv of Msans
df Variencac
Sio . graoe
Sio. (2{ailed}
df
t
tqual vanances assum6d
.805
Equal variances not assumed
.805
2 1.471
Mean Diff6rence
Sld. Error Ditforenc6
95% Confidenco lntsrual of the Diffor€nc6
Uooer
Lower
505
4.50000
5.59017 -19.55256 28.55256
530
4.50000
5.59017 -30.09261 39.09261
The columns labeledt, df, and Sig.(2-tailed) provide the standard"answer" for the I test.They provide the value of t, the degreesof freedom(number of participants,minus 2, in this case),and the significancelevel (oftencalledp). Normally, we usethe "Equal variancesassumed"row. Drawing Conclusions Recall from the previous section that the , test assumesan equality of means. Therefore,a significant result indicatesthat the means are not equivalent.When drawing conclusionsabouta I test,you must statethe directionof the difference(i.e., which mean was larger than the other). You should also include information about the value of t, the degreesof freedom,the significancelevel,and the meansand standard deviationsfor the two groups. Phrasing Results That Are Significant For a significant / test (for example,the output below), you might statethe following:
6adO slil|llk!
'.do,.ilhrh! sc0r9
0r0uD conlfol €rDoflmontal
x€an al 0000
N
Sld. Odiation
a.7a25a
2osr67
3
Sld Eror xaan 2 121J2 | 20185
hlapdrlra
S,uplcr lcsl
Lmno's T€sl for l l a sl fo r Fo u e l l to o t l n a n s
95(f, Contld9n.e Inl€ryalofthe
xsan dT
Si d
sc0re
EqualvarlencSs assum€d Equalvaiancos nol assumgd
5 058
071
5
31tl
Std Error
Sio a2-laile(n
a 53l
U Jb
029
7 66667
2 70391 2 138't2
71605
I r. 6 17 2 8
I 20060
r r. r 3 2 7 3
An independent-samplesI test comparing the mean scores of the experimental and control groups found a significant difference between the means of the two groups ((5) = 2.835, p < .05). The mean of the experimentalgroup was significantlylower (m = 33.333,sd:2.08) than the meanof the control group (m: 41.000,sd : 4.24).
60
Chapter6 ParametricInferentialStatistics
Phrasing Results That Are Not Signifcant In our example at the start of the section,we comparedthe scoresof the morning peopleto the scoresof the nonmorningpeople.We did not find a significant difference,so we could statethe following: t test was calculatedcomparingthe mean score of An independent-samples participantswho identified themselvesas morning people to the mean score of participants who did not identi$ themselvesas morning people. No significant difference was found (t(2) : .805, p > .05). The mean of the morning people (m : 78.00,sd = 7.07) was not significantly different from the meanof nonmomingpeople(m = 82.50,sd = 3.54). Practice Exercise Use PracticeData Set I (AppendixB) to solvethis problem.We believethat young individualshave lower mathematicsskills than older individuals.We would test this hypothesisby comparingparticipants25 or younger(the "young" group) with participants26 or older (the "old" group). Hint: You may need to createa new variable that represents which age group they are in. SeeChapter2 for help.
/ Test Section6.4 Paired-Samples Description The paired-samplesI test (also called a dependent/ test) comparesthe means of two scoresfrom relatedsamples.For example,comparinga pretestand a posttestscorefor I test. a group of participantswould requirea paired-samples Assumptions The paired-samplesI test assumesthat both variablesare at the interval or ratio levels and are normally distributed.The two variablesshould also be measuredwith the samescale.If the scalesare different. the scoresshouldbe convertedto z-scoresbefore the , testis conducted. .SPSSData Format Two variablesin the SPSSdata file are required.These variablesshould represent two measurements from eachparticipant. Running the Command We will createa new data file containing five variables:PRETEST, MIDTERM, FINAL, INSTRUCT, and REQUIRED. INSTRUCT representsthree different instructors for a course.REQUIRED representswhether the coursewas required or was an elective (0 - elective, I : required).The other threevariablesrepresentexam scores(100 being the highestscorepossible).
6l
Chapter6 ParametricInferentialStatistics
PRETEST 56 79 68 59 64 t.+ t)
47 78 6l 68 64 53 7l 6l 57 49 7l 6l 58 58
MIDTERM 64 9l 77 69 77 88 85 64 98 77 86 77 67 85 79 77 65 93 83 75 74
Enter the data and save it as GRADES.sav.You can check your data entry by computinga mean for each instructor using the Means command (see Chapter 3 for more information). Use INSTRUCT as the independentvariable and enter PRETEST, MIDTERM, ANd FINAL as your dependent variables. Once you have entered the data,conducta paired-samplesI test comparing pretest scores and final scores. Click Analyze, then Compare Means, then Paired-SamplesT ?nesr.This will bring up the main dialog box. i;vr*; '
6r5db trl*lct
Bsi*i l)06ctftlv.s44q
l t
@Cdnrdur*Modd
t
tilkrdo|
FINAL 69 89 8l 7l
REQUIRED 0 0 0
INSTRUCT I I I
IJ
I I 2 2 2 2 2 2 2
86 86 69 100 85 93 87 76 95 97 89 83 100 94 92 92
0 0 0
0 0 0
J J J J J t J J
Report INSTRUCT Mean N
1.00
std. Deviation 2.00
3.00
Total
tbb
q8-sdndo T Tart...
i c*@ i lryt'sn i'w
62
Mean N std. Deviation Mean N std. Deviation Mean N std. Deviation
PRETEST MIDTERM 67.5714 (6.t'l 4J 7 7
FINAL
79.5714 7
8.3837
9.9451
7.9552
63.1429 7
79.1429 7
86.4286 7
10.6055
11.7108
10. 9218
59.2857 7
78.0000 7
92.4286 7
6.5502
8.62',t7
5.5032
63.3333 21
78.6190 21
86. 1429 21
8.9294
9.6617
9.6348
Chapter6 ParametricInferentialStatistics
You must select pairs of variables to compare. As you select them, they are placed in the Current Selections area. Click once on PRETEST. then once on FINAL. Both variableswill be moved into the Cnrrent Selections area.Click on the right anow to transfer the pair to the Paired Variablessection.Click OK to conductthe test.
m |y:tl
ry{l .ry1 : CundSdacti:n: I VcirHal: Vri{!ilcz
"l*;"J xl
y.pllarq Cf;tndcim *,n*f.n n
PaildVaiibblt
.!K I
ril',n*
"ttr.I
atrirdftEl d1;llqi.d
-L*rd .t
fc
i
Reading the Output The output for PairedSamplesStatistics paired-samples the t test std. Std.Error consists of three comMean N Deviation Mean ponents. The first part Pair PRETEST 63.3333 21 8.9294 1.9485 givesyou basicdescrip1 rtNet 86.1429 21 9.6348 2.1025 tive statistics for the pair of variables.The PRETESTaveragewas 63.3,with a standard deviation of 8.93.The FINAL averagewas 86.14,with a standarddeviationof 9.63. PairedSamplesGorrelations Correlation
N PaIT1
PRETEST & F IN A L
21
.535
S i o.
.013
The secondpart ofthe output is a Pearson correlation coefficientfor the pair of variables.
Within the third part of the output (on the next page),the sectioncalled paired Differences contains information about the differencesbetweenthe two variables.you miy have learnedin your statisticsclassthat the paired-samples / test is essentiallya singlesamplet testcalculatedon the differencesbetweenthe scores.The final threecolumnscontain the value of /, the degreesof freedom,and the probability level. In the examplepresentedhere,we obtaineda I of -11.646,with 20 degreesof freedomand a significance level of lessthan.00l. Note that this is a two-tailedsignificancelevel. Seethe startof this chapterfor more detailson computinga one-tailedtest.
63
I
Chapter6 ParametricInferentialStatistics
Paired Samples Test
Paired Differences
95%Confidence Inlervalof the
srd, Mean
Pair1
PRETEST -22.8095 . FINAL
l)iffcrcncc
Std.Errot
Dcvirl
Lower
Maan
8.9756
sig U D oer
1.9586 -26.8952 -18.7239
df
I
'|1.646
l)Jailatl\
20
.000
Drawing Conclusions Paired-samplesI testsdeterminewhetheror not two scoresare significantly different from each other. Significant values indicate that the two scoresare different. Values that are not significant indicatethat the scoresare not significantly different.
PhrasingResultsThatAre Significant When statingthe resultsof a paired-samples I test,you shouldgive the value of t, the degreesof freedom,and the significancelevel. You should also give the mean and standard deviation for each variable.as well as a statementof results that indicates whetheryou conducteda one- or two-tailedtest.Our exampleabovewas significant,so we could statethe following: A paired-samples I test was calculatedto comparethe mean pretestscoreto the mean final exam score.The mean on the pretestwas 63.33 (sd : 8.93), and the mean on the posttestwas 86.14(sd:9.63). A significantincrease from pretestto final was found (t(20) : -ll .646,p < .001). Phrasing Results That Are Not Significant If the significancelevel had beengreaterthan .05 (or greaterthan .10 if you were conductinga one-tailedtest),the resultwould not have beensignificant.For example,the hypotheticaloutput below representsa nonsignificantdifference.For this output, we could state: P.l crl Sdr{ror St.rtlstkr Sld.Etrof x6an Patr 1
midlsrm nnal
N
r B t1a3 79.571 1
I 9.509 7 95523
I
P.rl c(l Sxtlro3
3 75889 3.00680
Cq I olrlqt6
N
ts4trI
Sl d D d a l l o n
7
mtdt€rm&flnal
I
Cotrelali 96S
5 to
000
P.*od Silrlta!
Tcrl
ParredOrflerences
x s an . 8571a
Std Oryiation
2 96808
Std.Eror X€an
I 12183
64
95%ConidBncs lnlsNrlotth€ DilYerenca u008r Lmt 1 88788
tl - 76a
b
Sio (2-lailad) a7a
Chapter6 ParametricInferentialStatistics
A paired-samples t testwascalculatedto comparethe meanmidtermscoreto themeanfinal examscore.Themeanon the midtermwas78.71(sd: 9.95), difference andthe meanon the final was79.57(sd:7.96). No significant = >.05). from midtermto finalwasfound((6) -.764,p Practice Exercise datafile, andcomputea paired-samples Usethe sameGRADES.sav / testto determineif scoresincreased from midtermto final. Section6.5 One-Wav ANOVA Description Analysisof variance(ANOVA) is a procedurethat determines the proportionof variabilityattributedto eachof severalcomponents. It is oneof the mostusefulandadaptablestatistical techniques available. the meansof two or moregroupsof participants The one-wayANOVA compares that vary on a singleindependentvariable (thus,the one-waydesignation). When we havethreegroups,we couldusea I testto determinedifferencesbetweengroups,but we wouldhaveto conductthreet tests(GroupI comparedto Group2, Group I comparedto Group3, andGroup2 compared to Group3). Whenwe conductmultipleI tests,we inflate the Type I error rate and increaseour chanceof drawingan inappropriate conclusion. gives ANOVA compensates multiple comparisons and for these us a singleanswerthat tellsus if anyof thegroupsis differentfrom anyof theothergroups. Assumptions The one-wayANOVA requiresa singledependentvariable and a singleindependentvariable. Which groupparticipants belongto is determinedby the valueof the independentvariable. Groupsshouldbe independent of eachother.If our participants belong to more than one group each,we will have to conducta repeated-measures ANOVA. If we havemorethanone independentvariable,we would conducta factorial ANOVA. ANOVA alsoassumes thatthedependentvariableis at the intervalor ratio levis elsand normallydistributed. SP,S,S Data Format Two variablesare requiredin the SPSSdatafile. One variableservesas the dependentvariable andtheotherasthe independentvariable.Eachparticipantshouldprovideonly onescorefor thedependentvariable. Running the Command datafile we createdin the previous For this example,we will usetheGRADES.sav section.
65
Chapter6 ParametricInferentialStatistics
To conduct a one-way ANOVA, click Analyze, then Compare Means, then One-I7ay ANOVA. This will bring up the main dialog box for the Onelltay ANOVA command.
i An$tea &ryhr l.{lith; wr$|f ) R;pa*tr I DrrsFfivsft#l|'
f,'',I
xrded
f *wm dtin"t
R*g I c*"d I
S instt.,ct / rcqulad
Hdp I
miqq|4^__ !qt5,l
fqiryt,,,l odiors"' I
]
orn-5dtFh
R!E'l'd6al
I )
l|1|J46tddg.5arylarT16t,., P*G+tunpbs I IcJt..,
cl#y
rffillEil
lhcd
Csnalda
l ':l i rl :.,. :J
$alp
friodd
f f64...
You should place the independent variable in the Factor box. For our example, INSTRUCT representsthree different instructors.and it will be used as our independentvariable. Our dependentvariable will be FINAL. This test will allow us to determine if the instructor has any effect on final sradesin the course.
-rl ToKl
pela$t
/n*Jterm d 'cqfuda
f- Fncdrdr*dcndlscls
l* Honroga&olvairre,* j *_It
Ro"t I
J
c*ll H& l
mrffi,.*-
Pqyg*-I odi"*.'.I "?s'qt,.,l Click on the Options box to get the Options dialog box. Click Descriptive. This will give you meansfor the dependentvariable at each level of the independentvariable. Checkingthis box preventsus from having to run a separatemeans command. Click Continue to return to the main dialog box. Next, click Post Hoc to bring up the Post Hoc Multiple Comparisonsdialog box. Click Tukev.then Continue.
, Eqiolvdirnccs NotfudfiEd
.
r TY's,T2
Sigflber!
l,?."*:,_:
t or*ol:: _l6{''6:+l.'od
j
h,v!t tffi-
rc"'*i.'I r"ry{-l" H* |
Post-hoctests are necessaryin the event of a significant ANOVA. The ANOVA only indicatesif any group is different from any other group. If it is significant,we needto determinewhich groupsare different from which other groups.We could do I teststo determine that, but we would have the sameproblem as before with inflating the Type I error rate. There are a variety of post-hoccomparisonsthat correct for the multiple comparisons.The most widely used is Tukey's HSD. SPSSwill calculatea variety of post-hoc testsfor you. Consult an advancedstatisticstext for a discussionof the differencesbetween thesevarioustests.Now click OK to run the analvsis.
66
Chapter6 ParametricInferentialStatistics
Readingthe Output (i.e.,levelof the independDescriptivestatisticswill be givenfor eachinstructor ent variable)andthe total.For example,in hisftrerclass,InstructorI hadan averagefinal examscoreof 79.57. Descriptives final
N UU
2.00 3.00 Total
7 7 2'l
95% ConfidenceIntervalfor Mean Std. Deviation Std.Enor LowerBound UooerBound Minimum Maximum Mean 72.2',t41 6Y.UU tiv.uu 86 9288 7.95523 3.00680 79.5714 100.00 69.00 76.3276 96.5296 1 0 .9 2180 4.12805 86.4286 100.00 83.00 87.3389 97.5182 5.50325 2.08003 92.4286 100.00 69.00 81.7572 90.5285 9.63476 2.10248 86. 1429
The next section of the output is the ANOVA sourcetable. This is where 579.429 the various componentsof 1277. 143 the variance have been 1856.571 listed,along with their relative sizes. For a one-way ANOVA, there are two componentsto the variance: Between Groups (which representsthe differencesdue to our independent variable) and Within differenceswithin eachlevel of our independentvariable). For Groups(which represents our example,the BetweenGroups variancerepresentsdifferencesdue to different instructors. The Within Groupsvariancerepresentsindividual differencesin students. The primary answeris F. F is a ratio of explainedvariance to unexplainedvariance. Consult a statisticstext for more informationon how it is determined.The F has two different degreesof freedom,one for BetweenGroups(in this case,2 is the number of levels of our independentvariable [3 - l]), and anotherfor Within Groups(18 is the number of participantsminusthe numberof levelsof our independentvariable [2] - 3]). The next part of the output consistsof the results of our Tukey's ^EISDpost-hoc comparison. This table presentsus with every possiblecombinationof levels of our independent variable. The first row representsInstructor I comparedto Instructor 2. Next is Instructor I compared to Multlple Comparlsons Instructor
3. Next
is In-
structor 2 compared to Instructor l. (Note that this is redundantwith the first row.) Next is Instructor 2 comparedto Instructor 3, and so on. The column labeled Sig. representsthe Type I error (p) rate for the simple (2-level) comparison in that row. In our
F,NAL variabre: Dooendenr HS D 95% Confidence Mean (J) Difference ( t) Il-.tl INSTRUCT INSTRUCT 1 .0 0 2.OU €.8571
3.00 2.00
-12.8571' 6.8571
1.00
3.00 3.00
-6.0000 12.8571' 6.0000
1.00
2.00
Sld. Enor 4.502
4.502 4.502 4.502 4.502 4.502
'. The mean differ€nceis significantat the .05 level.
67
Sia JU4
.027 .304 .396 .027 .396
Lower B ound -16.J462
-24.3482 -4.6339 -17.4911 1.3661 -5.4911
Upper R6r rn.l
4.6339 -1.3 661 18.3 482 5,49 11 24.3482 17.49 11
Chapter6 ParametricInferentialStatistics
exampleabove,InstructorI is significantlydifferentfrom Instructor3, but InstructorI is not significantlydifferentfrom Instructor2, andInstructor2 is not significantlydifferent from Instructor3. Drawing Conclusions Drawing conclusionsfor ANOVA requiresthat we indicate the value of F, the degreesof freedom,and the significance level. A significantANOVA should be followed by the resultsof a post-hocanalysisand a verbal statementof the results. Phrasing Results That Are Significant In our exampleabove,we could statethe following: We computed a one-way ANOVA comparing the final exam scores of participantswho took a course from one of three different instructors.A significant difference was found among the instructors (F(2,18) : 4.08, p < .05). Tukey's f/SD was usedto determinethe natureof the differences between the instructors. This analysis revealed that students who had InstructorI scoredlower (m:79.57, sd:7.96) than studentswho had Instructor3 (m = 92.43,sd: 5.50).Studentswho had Instructor2 (m:86.43, sd : 10.92) were not significantly different from either of the other two groups. Phrasing Results That Are Not Significant If we had conductedthe analysis using PRETEST as our dependent variable instead of FINAL, we would have received the following output: The ANOVA was not significant. so there is no need to refer to the Multiple Comparisons table. Given this result, we may statethe following:
Ocrcr lriir
95$ Conlldanc0 Inloml Sld Ddaton
N
200 300 Tolel
1 21
63.1r 29 592857 633333
10.605r8 7 6.5501 I 92935
Sld Eiror
LMf
4.00819 2.at5t1 I ga€sa
Bo u n d
tol
Uooer gound
5333r a 53 2278 592587
7? 95r 3 55 3a36 67 3979
1700 r9.00 at 0 0
All:'Vl
sumof wrlhinOroups Tol.l
210.667 135a.000 t 594667
t8 20
120 333 15222
1 600
229
The pretest means of students who took a course from three different instructors were compared using a one-way ANOVA. No significant differencewas found (F'(2,18)= 1.60,p >.05).The studentsfrom the three different classesdid not differ significantlyat the start of the term. Students who had Instructor I had a mean scoreof 67.57 (sd : 8.38). Studentswho had Instructor2 had a mean scoreof 63.14(sd : 10.61).Studentswho had lnstructor3 had a meanscoreof 59.29(sd = 6.55).
68
78 00 71 00 79 00
Chapter6 ParametricInferentialStatistics
Practice Exercise UsingPracticeDataSet I in AppendixB, determineif the averagemathscoresof of are significantlydifferent.Write a statement single,married,and divorcedparticipants results. Section6.6 Factorial ANOVA Description The factorialANOVA is one in which thereis morethan one independentvariable.A 2 x 2 ANOVA, for example,hastwo independentvariables,eachwith two levels.A 3 x 2 x 2 ANOVA hasthreeindependentvariables.Onehasthreelevels,andthe othertwo havetwo levels.FactorialANOVA is very powerfulbecauseit allowsus to assesstheeffectsof eachindependentvariable,plustheeffectsof the interaction. Assumptions of one-wayANOVA (i.e.,the FactorialANOVA requiresall of the assumptions ratio or levels interval at the and normallydistributed).In dependentvariable mustbe of eachother. variablesshouldbe independent addition,theindependent .SPSSData Format SPSSrequiresonevariablefor the dependentvariable,andone variablefor each variablethatis represented asmultiindependent variable.If we haveanyindependent ple variables(e.g., PRETESTand POSTTEST),we must use the repeated-measures ANOVA. Runningthe Command This exampleuses the GRADES.savI Rnalyze Graphs USiUas llilindow Halp data file from earlierin this chapter.Click AnaRepsrts lyze, then GeneralLinear Model. then UnivariDescrbtiwStatFHcs ate.
!!{4*,l
R{'dsr! F.dclt}
?**" I -..fP:.J fi*"qrJ !t,, I 9*ll
This will bring up the main dialog box for Univariate ANOVA. Select the dependent variable and place it in the DependentVariableblank (use FINAL for this example). Select one of your independent variables (INSTRUCT, in this case) and place it in the Fixed Factor(s) box. Place the secondindependent variable (REQUIRED) in the Fixed Factor(s)
Chapter6 ParametricInferentialStatistics
box. Having defined the analysis, now click Options. When the Options dialog box comes up, move INSTRUCT, REQUIRED, and INSTRUCT x REQUIRED into the Display Means for blank. This will provide you with means for each main effect and interaction term. Click Continue. If you were to selectPost-Hoc,SPSSwould run post-hoc analysesfor the main effects but not for the interaction term. Click OK to run the analvsis.
Reading the Output
1 .IN S T R U C T
At the bottom of the output, you will find the means 95oloConfidencelnterval for each main effect and inLower Upper you selected with the Bound teraction Error Bound Mean Std. I NS T R U C T 1 .UU 72.240 86.926 79.583 3.445 Optionscommand. 2.00 78.865 93.551 3.445 86.208 three were There 3.00 99.426 3.445 84.740 92.083 instructors, so there is a mean FINAL for each instructor. We 2. REQUIRED also have means for the two values of REQUIRED. FinallY, ariable:FINAL Interval we have six means representing 95%Confidence the interaction of the two Upper Lower Rorrnd Bound Std. Error variables (this was a 3 x 2 REOUIRED Mean ,UU 91.076 78.257 3.007 84.667 design). 1.00 92.801 2.604 81.699 87.250 had Participants who Instructor I (for whom the class Studentswho had Instructor I 79.67. of was not required) had a mean final exam score (for whom it was required)had a mean final exam scoreof 79.50,and so on. The example we just 3. IN S TR U C T' R E QU IR E D ran is called a two-way Variable:FINAL ANOVA. This is becausewe Interval 95%Confidence had two independent variUpper Lower Bound Bound ables. With a two-way INSTRUCTREQUIRED Mean Std.Error 90.768 68.565 5.208 1.00 .00 79.667 ANOVA, we get three an89. 114 69.886 4.511 1.00 79.500 swers: a main effect for 95.768 73.565 5.208 .00 84.667 2.00 INSTRUCT, a main effect 78.136 97.364 4.511 1.00 87.750 for REQUIRED, and an in100.768 78.565 5.208 .00 89.667 3.00 for result teraction 104.114 84.886 4.511 1.00 94.500 INSTRUCT ' REQUIRED (seetop ofnext page). ariable:FINAL
70
Chapter6 ParametricInferentialStatistics
Tests of Between-SubjectsEffects Dependent Variable:FINAL
635.8214 5 Intercept 1s1998.893 1 151998.893 1867.691 INSTRUCT 536.357 2 268.179 3.295 REQUIRED 34.32'l 1 34.321 .422 INSTRUCT ' REQUIRED 22.071 2 11.036 .136 Error 1220.750 15 81.383 Total 157689.000 21 CorrectedTotal 1856.571 20 a. R Squared= .342(Adjusted = R Squared .123)
.000 .065 .526 .874
The source-table above gives us these three answers (in the INSTRUCT, REQUIRED, and INSTRUCT * REQUIRED rows). In the example,none of the main effects or interactions was significant.tn the statementsof results, you must indicate two ^F, degreesof freedom (effect and residual/error),the significance'level, and a verbal statement for each of the answers(three, in this case).uite that most statisticsbooks give a much simpler version of an ANOVA sourcetable where the CorrectedModel, Intercept, and CorrectedTotal rows are not included. Phrasing Results That Are Significant If we had obtainedsignificantresultsin this example,we could statethe following (Theseare fictitious results.For the resultsthat correspona to the example above, please seethe sectionon phrasingresultsthat are not significant): A 3 (instructor) x 2 (requiredcourse)between-subjects factorial ANOVA was calculatedcomparingthe final exam scoresfor participants who had one of three instructorsand who took the courseeither as a required courseor as an elective. A significant main effect for instructor was found (F(2,15) : l0.ll2,p < .05). studentswho had InstructorI had higher final .*u- r.o16 (m = 79.57,sd:7.96) thanstudents who had Instructor3 (m:92.43, sd: 5.50). Studentswho had Instructor2 (m : g6.43,sd :'10.92) weie not significantlydifferent from eitherof the other two groups.A significant main effect for whetheror not the coursewas requiredwas found (F(-1,15):3g.44, p < .01)'Studentswho took the coursebecauseit was required did better(z : 9l '69, sd : 7.68)thanstudentswho took thecourseas an electi (m : ve 77.13, sd:5.72). The interaction wasnot significant(F(2,15)= l.l5,p >.05). The effect of the instructorwas not influencedby whetheror not the studentstook the coursebecauseit was required. Note that in the above exampre,we would have had to conduct Tukey,s HSD to determinethe differencesfor INSTRUCT (using thePost-Hoc command).This is not nec-
7l
Chapter6 ParametricInferentialStatistics
Tests of Between-SubjectsEffects t Variable:FINAL
Typelll Sumof Source Souares uorrecleoMooel 635.8214 Intercept 151998.893 INSTRUCT 536.357 REQUIRED 34.321 INSTRUCT 22.071 - REQUIRED Error 1220.750 Total 157689.000 't856.571 CorrectedTotal
Mean Souare
df
5 1 2 1 2 1E
F
127.164 1.563 151998.893 1867.691 268.179 3.295 34.321 .422 11.036 .136 81.383
siq. .230 .000 .065 .526 .874
21 20
a. R Squared= .342(AdjustedR Squared= .123)
The source table above gives us these three answers (in the INSTRUCT, REQUIRED, and INSTRUCT * REQUIRED rows). In the example,none of the main effects or interactions was significant.In the statementsof results,you must indicatef', two degreesof freedom (effect and residual/error),the significance level, and a verbal statement for each of the answers(three,in this case).Note that most statisticsbooks give a much simpler versionof an ANOVA sourcetable where the CorrectedModel, Intercept, and CorrectedTotal rows are not included. Phrasing Results That Are Signifcant If we had obtainedsignificantresultsin this example,we could statethe following (Theseare fictitious results.For the resultsthat correspondto the exampleabove,please seethe sectionon phrasingresultsthat arenot significant): A 3 (instructor) x 2 (required course)between-subjects factorial ANOVA was calculatedcomparingthe final exam scoresfor participantswho had one of three instructorsand who took the courseeither as a requiredcourseor as an elective. A significant main effect for instructor was found (F(2,15\ : l0.l 12,p < .05). Studentswho had InstructorI had higher final exam scores (m : 79.57,sd : 7.96) than studentswho had Instructor3 (m : 92.43,sd : 5.50). Studentswho had Instructor 2 (m : 86.43, sd : 10.92) were not significantlydifferent from eitherof the othertwo groups.A significantmain effect for whetheror not the coursewas requiredwas found (F'(1,15):38.44, p < .01). Studentswho took the coursebecauseit was requireddid better(ln : 91.69,sd:7.68) thanstudentswho took the courseas an elective(m:77.13, sd : 5.72).The interactionwas not significant(,F(2,15): I . 15,p > .05).The effect of the instructorwas not influencedby whetheror not the studentstook the coursebecauseit was required. Note that in the above example,we would have had to conduct Tukey's HSD to determinethe differencesfor INSTRUCT (using the Post-Hoc command).This is not nec-
71
Chapter6 Parametriclnferential Statistics
essaryfor REQUIREDbecauseit hasonly two levels(andonemustbe differentfrom the other). Phrasing ResultsThatAre Not Significant Our actualresultswerenot significant,sowe canstatethe following: factorial ANOVA A 3 (instructor)x 2 (requiredcourse)between-subjects who hadone participants wascalculatedcomparingthe final examscoresfor of threeinstructorsand who took the courseas a requiredcourseor as an elective.The maineffectfor instructorwasnot significant(F(2,15): 3.30,p > .05).The maineffectfor whetheror not it wasa requiredcoursewasalso wasalsonot (F(1,15)= .42,p > .05).Finally,the interaction not significant = significant(F(2,15) .136,p > .05). Thus, it appearsthat neitherthe instructornor whetheror not thecourseis requiredhasany significanteffect on finalexamscores. Practice Exercise Using PracticeDataSet 2 in AppendixB, determineif salariesare influencedby Write a stateor an interactionbetweensexandjob classification. sex,job classification, mentof results. ANOVA Section6.7 Repeated-Measures Description ANOVA extendsthe basicANOVA procedureto a withinRepeated-measures providedatafor morethanonelevelof subjectsindependentvariable (whenparticipants / test when more thantwo an independentvariable).It functionslike a paired-samples levelsarebeingcompared. Assumptions on an interThe dependentvariableshouldbe normallydistributedandmeasured of the dependentvariable shouldbe from the val or ratio scale.Multiple measurements same(or related)participants. Data Format SP,S^S At leastthreevariablesarerequired.Eachvariablein the SPSSdatafile shouldrepvariable'Thus,an resenta singledependentvariableat a singlelevelof theindependent analysisof a designwith four levelsof an independentvariable would requirefour variablesin the SPSSdatafile. ANOVA effect,usethe Mixed-Design a between-subjects If any variablerepresents commandinstead.
72
Chapter6 ParametricInferentialStatistics
Runningthe Command sample ThisexampleusestheGRADES.sav three includes data set. Recall that GRADES.sav sets of grades-PRETEST, MIDTERM, and threedifferenttimesduring '@ FINAL-that represent This allowsus to analyzethe effects t{!idf-!de the semester. of time on the test performanceof our sample f,txrd*r gp{|ssriori population(hencethe within-groupscomparison). tSoar Click Analyze,then GeneralLinear Model, then
r )
!8*vxido"' $.dryrigp,,, garlrr! Cffiporur&'..
RepeatedMeasures. Note that this procedure requires an optional module. If you do not have this command, you do not have the proper module installed. This procedure is NOT included in the student version o/SPSS. After selectingthe command, you will be W,;, -xl presented with the Repeated Measures Define Factor Nanc ffi* ,,.,. I Factor(s)dialog box. This is where you identify the Ui**tSubioct qllcv6ls: Nr,nrbar l3* B"*, I within-subjectfactor (we will call it TIME). Enter 3 cr,ca I for the numberof levels (threeexams) and click Add. Now click Define. If we had more than one Hsl independent variable that had repeatedmeasures, we could enter its name and click Add. You will be presentedwith the Repeated Mea*lxaNsre: I Measures dialog box. Transfer PRETEST, MIDTERM, and FINAL to the lhthin-Subjects Variables section. The variable names should be orderedaccordingto when they occurredin time (i.e., tl the values of the independent variable that they represent).
--J
l
l'or I -Ff? l I -9v1"
h{lruc{
-ssl
U"a* | cro,r"*.I
nq... I pagoq..I S""a.. I geb*.. I
Click Options,and SPSSwill computethe meansfor the TIME effect (seeone-way ANOVA for more detailsabouthow to do this). Click OK to run the command.
/J
Chapter6 ParametricInferentialStatistics
Readingthe Output This procedureusesthe GLM command. fil sesso,.rtput Model GLM standsfor "GeneralLinearModel." It is a H &[ GeneralLinear Title very powerful command,and many sectionsof Notes outputarebeyondthe scopeofthis text (seeoutWlthin-Subject s Factors put outlineat right). But for the basicrepeatedMultivariate Tests measures ANOVA, we are interested only in the Mauchly'sTes{ol Sphericity Testsol Wthin-SubjectsEflecls Testsof l{ithin-SubjectsEffects. Note that the Contrasts Tesisol \ /lthin-Subjects SPSSoutputwill includemanyothersectionsof Tests ol Between-Subjects Ellec{s output,whichyou canignoreat thispoint. Tests of WrllrilFstil)iects Effecls
Measure: MEASURE 1 TypelllSum ofSouares time Sphericity Assumed 5673.746 Gre e n h o u s e -Ge i s se r 5673.746 Huynh-Feldt 5673.746 Lower-bound 5673.746 Error(time)SphericityAssumed 930.921 Greenhouse-Geisser 930.921 Huynh-Feldt 930.921 Lower-bound 930.921 Source
df
2 1.211 1.247 1.000 40 24.218 24.939 20.000
MeanSouare 2836.873 4685.594 4550.168 5673.746 23.273 38.439 37.328 46.546
F 121.895 121.895 121.895 121.895
Siq.
000 000 000 000
The TPesls of l{ithin-SubjectsEffectsoutput shouldlook very similar to the output from the otherANOVA commands. In the aboveexample,the effect of TIME has an F value of 121.90with 2 and 40 degreesof freedom(we use the line for SphericityAssumed).It is significantat less than the .001 level. When describingtheseresults,we shouldindicatethetypeof test,F value,degrees of freedom,andsignificancelevel. Phrasing ResultsThatAre Significant Because the ANOVA resultsweresignificant,we needto do somesortof post-hoc analysis.One of the main limitationsof SPSSis the difficulty in performingpost-hoc analysesfor within-subjects factors.With SPSS,the easiestsolutionto this problemis to protected conduct dependentt testswith repeated-measures ANOVA. Thereare more powerful(andmoreappropriate) post-hocanalyses, but SPSSwill not computethem for us.For moreinformation,consultyour instructoror a moreadvanced statisticstext. To conductthe protectedt tests,we will comparePRETESTto MIDTERM, MIDTERM to FINAL, andPRETESTto FINAL, usingpaired-samples we I tests.Because areconductingthreetestsand,therefore,inflatingour Type I error rate,we will usea significancelevelof .017(.05/3)instead of .05.
74
Chapter6 ParametricInferentialStatistics
Paired Samples Test
Mean Yatr 1
PairedDifferences 950/o Confidence Intervalof the l)ifferance Std.Error std. Dcvietior Lower UoDer Mean
sig {2-tailed)
df
l,KE I E5 |
-15.2857
3 .9 6 41
.8650
-17.0902 - 1 3 . 4 8 1 3
-'t7.670
20
.000
M IDT ERM Pai 2
PRETEST . F INAL
-22.8095
8.97s6
1 . 9 5 8 6 -26.8952
-18.7239
-11.646
20
.000
P a i r3
M IDT ERM . F INAL
-7.5238
6.5850
1 . 4 3 7 0 - 1 0 . 5 231
4.5264
-5.236
20
.000
The three comparisonseachhad a significancelevel of less than.017, so we can concludethat the scoresimproved from pretestto midterm and again from midterm to final. To generatethe descriptive statistics, we have to run the Descriptivescommand for eachvariable. Becausethe resultsof our exampleabovewere significant,we could statethe following: A one-way repeated-measures ANOVA was calculatedcomparingthe exam scoresof participantsat three different times: pretest,midterm, and final. A significant effect was found (F(2,40) : 121.90,p < .001). Follow-up protectedI testsrevealedthat scoresincreasedsignificantlyfrom pretest(rn : 63.33,sd: 8.93)to midterm(m:78.62, sd:9.66), andagainfrom midterm to final (m = 86.14,sd:9.63). Phrasing Results That Are Not Significant With resultsthat are not significant,we could statethe following (the F valueshere havebeenmadeup for purposesof illustration): ANOVA was calculatedcomparingthe exam A one-way repeated-measures scoresof participantsat threedifferent times: pretest,midterm, and final. No significant effect was found (F(2,40) = 1.90, p > .05). No significant differenceexistsamongpretest(m: 63.33,sd = 8.93),midterm(m:78.62, sd:9.66), andfinal(m:86.14, sd: 9.63)means. Practice Exercise Use PracticeData Set 3 in Appendix B. Determine if the anxiety level of participants changedover time (regardlessof which treatmentthey received) using a one-way repeated-measures ANOVA and protecteddependentI tests.Write a statementof results. Section 6.8 Mixed-Design ANOVA Description The mixed-designANOVA (sometimescalled a split-plot design) teststhe effects of more than one independentvariable. At leastone of the independentvariables must
75
Chapter6 ParametricInferentialStatistics
be within-subjects(repeatedmeasures). At leastone of the independentvariables must be between-subjects. Assumptions The dependent variable shouldbe normally distributedand measuredon an interval or ratio scale. SPSS Data Format The dependent variable shouldbe representedas one variable for each level of the within-subjectsindependentvariables.Anothervariableshouldbe presentin the datafile for each between-subjectsvariable. Thus, a 2 x 2 mixed-designANOVA would require three variables,two representingthe dependent variable (one at each level), and one representingthe between-subjects independentvariable. Running the Command The GeneralLinear Model commandruns the Mixed-Design ANOVA command. Click Analyze, then General Linear Model, then Repeated Measures. Note that this procedure requires an optional module. If you do not have this command, you do not have the proper module installed. This procedure is NOT included in the st u dent versi on o/,SPSS.
lAnqtr:l g{hs Udear Add-gnr Uhdorn Rqprto r oitc|hfiy. $rtBtlca i T.d*t Cocpa''UF46
@
)
ilrGdtilod6b
)
iifr.{*r &rfcrgon tAdhraf
} l i
The RepeatedMeasures command shouldbe used if any of the independent gfitsrf*|drvdbbht IK variables are repeatedmeasures(withinter I subjects). s*i I This example also uses the t*., GRADES.sav data file. Enter PRETEST, 34 1 MIDTERM, and FINAL in the lhthinSubjects Variables block. (See the RepeatedMeasuresANOVA command in Section 6.7 for an explanation.) This exampleis a 3 x 3 mixed-design.There are two independent variables (TIME and INSTRUCT), each with three levels. Uoo*..I cotrl... I 8q... I Podg*. I Sm.. | 8e4.. I We previously enteredthe information for TIME in the RepeatedMeasures Define Factorsdialog box. We needto transferINSTRUCT into the Between-Subjects Factor(s) block. Click Options and selectmeansfor all of the main effectsand the interaction (see one-way ANOVA in Section6.5 for more details about how to do this). Click OK to run the command.
mT-
76
Chapter6 ParametricInferentialStatistics
Readingthe Output
f
measures command,the GLM procedureprovidesa As with the standardrepeated ANOVA, we are interestedin two secwe will For mixed-design lot of output not use. a Effects. tions.The first is Testsof Within-Subjects Tests ot Wrtlrilr-SuuectsEflects M eas u re : Typelll Sum of Squares sphericity Assumed 6 5673.74 Gre e n h o u s e -Ge i s s el 5673.746 Huynh-Feldt 6 5673.74 Lower-bound 6 5673.74 time' instruct Sphericily Assumed 806.063 0 re e n h o u s e -Ge i s s e r 806.063 Huynh-Feldt 3 806.06 Lower-bound 806.063 Erro(time) Sphericity Assumed 124857 Gre e n h o u sGe e- isser 124.857 Huynh-Feldt 124.857 Lower-bound 124.857 S our c e time
df
2
1.181 1.356 1.000 4 2.363 2.f' t2 2.000 36 21.265 24.41'.| 18.000
MeanS ouare 2836.873 4802.586 4' t83.583 5673.746 201.516 341.149 297.179 403.032 3.468 5.871 5.115 6.S37
F
817.954 817.954 817.954 817.954 58.103 58.103 58.103 58.103
sis 000 000 000 000 .000 .000 .000 .000
This sectiongives two of the threeanswerswe need (the main effect for TIME and the interactionresult for TIME x INSTRUCTOR). The secondsectionof output is Testsof Between-subjectsEffects (sample output is below). Here, we get the answersthat do not contain any within-subjects effects. For our example, we get the main effect for INSTRUCT. Both of thesesectionsmust be combinedto oroduce the full answer for our analysis. Eltecls Testsot Betryeen-Sill)iects Me a s u re M:EA SU R1E ransformed Variable: Typelll Sum S o u rc e of Squares Intercepl 3 6 4 19 2 .0 6 3 inslrucl 18 .6 9 8 En o r 43 6 8 .5 7 1
df I
2 18
F MeanS ouare 364192.063 1500.595 9.349 .039 242.6S 8
Siq.
.000 .962
If we obtain significant effects, we must perform some sort of post-hoc analysis. Again, this is one of the limitations of SPSS.No easyway to perform the appropriatepost(within-subjects)factors is available. Ask your instructor hoc test for repeated-measures for assistance with this. When describingthe results,you should include F, the degreesof freedom,and the significancelevel for eachmain effect and interaction.In addition,somedescriptivestatistics must be included(eithergive meansor includea figure). Phrasing Results That Are Significant There are three answers(at least) for all mixed-designANOVAs. Pleasesee Section 6.6 on factorial ANOVA for more detailsabouthow to interpretand phrasethe results.
77
Chapter6 ParametricInferentialStatistics
For the aboveexample,we could statethe following in the resultssection(note that this assumesthat appropriatepost-hoctestshave beenconducted): A 3 x 3 mixed-designANOVA was calculatedto examinethe effectsof the instructor(Instructors1,2, and,3)and time (pretest,midterm, and final) on scores.A significant time x instructor interactionwas present(F(4,36) = 58.10, p <.001). In addition,the main effect for time was significant (F(2,36): 817.95,p < .001). The main effect for instructor was not significant(F(2,18): .039,p > .05).Upon examinationof the data,it appears that Instructor3 showedthe most improvementin scoresover time. With significant interactions, it is often helpful to provide a graph with the descriptive statistics.By selectingthe Plotsoption in the main dialog box, you can make graphsof the interaction like the one below. Interactionsadd considerable complexityto the interpretationof statisticalresults.Consult a researchmethodstext or ask your instructor for more help with interactions. ilme
-
E@rdAs
I, I lffi---_ sw&t.tc
f Crtt"-l
-l
cd.r_J
s.Cr&Plob:
L:J r---r
-3
x E i I
-==-
-
-z
o0
-
.
ia E
, ,,,:;',,1. "l ","11::::J
I !
-gr! I
E t
UT
2(Il
inrt?uct
Phrasing Results That Are Not SigniJicant If our resultshad not beensignificant,we could statethe following (note that the .F valuesare fictitious): A 3 x 3 mixed-designANOVA was calculatedto examinethe effectsof the instructor(lnstructors1,2, and 3) and time (pretest,midterm, and final) on scores.No significant main effects or interactionswere found. The time x instructor interaction(F(4,36) = 1.10,p > .05), the main effect for time (F(2,36)= I .95,p > .05),andthe main effectfor instructor(F(2,18): .039,p > .05) were all not significant.Exam scoreswere not influencedby either time or instructor. Practice Exercise Use PracticeData Set 3 in AppendixB. Determineif anxiety levels changedover time for each of the treatment(CONDITION) types. How did time changeanxiety levels for eachtreatment?Write a statementof results.
78
Chapter6 ParametricInferentialStatistics
Section6.9 Analysisof Covariance Description Analysis of covariance(ANCOVA) allows you to remove the effect of a known covariate. In this way, it becomesa statisticalmethod of control. With methodological controls (e.g., random assignment),internal validity is gained.When such methodological controlsare not possible,statisticalcontrolscan be used. ANCOVA can be performed by using the GLM command if you have repeatedmeasuresfactors. Becausethe GLM commandis not included in the Base Statisticsmodule, it is not includedhere.
Assumptions ANCOVA requiresthatthe covariatebe significantlycorrelatedwith the dependent variable.The dependentvariableandthecovariateshouldbe at theinterval or ratio levels.In addition.bothshouldbe normallydistributed. SPS,SData Format The SPSSdatafile mustcontainonevariablefor eachindependentvariable,one variablerepresenting thedependentvariable,andat leastonecovariate. Running the Command The Factorial ANOVA commandis tsr.b"" GrqphrUUnbr,wrndowl.lab usedto run ANCOVA. To run it, click AnaReports lyze, then General Linear Model, then UniDascrl$lveSt$stks variate. Follow the directionsdiscussedfor factorial ANOVA, using the HEIGHT.sav sampledata file. Placethe variableHEIGHT as your DependentVariable.Enter SEX as your Fixed Factor, then WEIGHT as the CoDogrdrfvdirt{c variate.This last stepdeterminesthe differt{od.L. I l-.:-r [7mencebetweenregularfactorialANOVA and c**-1 ANCOVA. Click OK to run theANCOVA. llr,.. I |'rii l.ri
till Crd{'r.t{r}
td f-;-'l wlswc0't l '|ffi
j.*.1,q*J*lgl
79
I
*l*:" I q&,, I
Chapter6 ParametricInferentialStatistics
Reading the Output The outputconsistsof onemain sourcetable(shownbelow).This tablegivesyou the main effects and interactionsyou would have receivedwith a normal factorial ANOVA. In addition,thereis a row for eachcovariate.In our example,we haveonemain effect(SEX)andonecovariate(WEIGHT).Normally,we examinethe covariateline only to confirmthatthecovariateis significantlyrelatedto thedependentvariable. Drawing Conclusions This sampleanalysiswas performedto determineif malesand femalesdiffer in height,afterweightis accounted for. We knowthatweightis relatedto height.Ratherthan matchparticipants or usemethodological controls,we canstatisticallyremovethe effectof weight. When giving the resultsof ANCOVA, we must give F, degreesof freedom,and significancelevelsfor all main effects,interactions,and covariates.If main effectsor interactions are significant,post-hoctests must be conducted.Descriptive statistics (meanandstandarddeviation)for eachlevelof theindependentvariable shouldalsobe given. Tests of Between-SubjectsEffects Va ri a b l e :H EIGH T Typelll Sumof Mean Source S orrare Souares df 9orrecreoMooel 215.0274 1U /.D 'tJ 2 lntercept 5.580 I 5.580 WEIGHT 1 1 9.964 1't9.964 1 SEX 66.367 66.367 1 Error 1 3.91 1 13 1.070 Total 7 1 9 1 9.000 16 CorrectedTotal 228.938 15 a. R Squared= .939(AdjustedR Squared= .930)
F
100.476 5.215 112.112 62.023
Siq.
.000 .040 .000 .000
PhrasingResultsThatAre Significant Theaboveexample obtained a significant result,sowecouldstatethefollowing: A one-waybetween-subjects ANCOVA wascalculatedto examinethe effect of sexon height,covaryingout theeffectof weight.Weightwassignificantly relatedto height(F(1,13): ll2.ll,p <.001). The maineffectfor sexwas significant (F(1,13): 62.02,p< .001),with malessignificantl!taler (rn: 69.38,sd:3.70) thanfemales (m = 64.50, sd = 2.33\. Phrasing ResultsThatAre Not Significant If the covariateis not significant,we needto repeatthe analysiswithoutincluding thecovariate(i.e.,run a norrnalANOVA). For ANCOVA resultsthat are not significant,you could statethe following (note thattheF valuesaremadeup for thisexample):
80
Chapter6 ParametricInferentialStatistics
ANCOVA was calculatedto examinethe effect A one-waybetween-subjects of sex on height,covaryingout the effect of weight. Weight was significantly relatedto height(F(1,13)= ll2.ll, p <.001). The main effectfor sexwas not significant(F'(1,13)= 2.02,p > .05),with malesnot beingsignificantlytaller (m : 69.38, sd : 3.70) than females (m : 64.50, sd : 2.33\, even after covaryingout the effect of weight. Practice Exercise Using Practice Data Set 2 in Appendix B, determine if salariesare different for males and females.Repeatthe analysis,statisticallycontrolling for years of service.Write a statementof resultsfor each.Compareand contrastyour two answers.
Section6.10 MultivariateAnalysisof Variance(MANOVA) Description Multivariatetestsare thosethat involve more than one dependentvariable. While it is possible to conduct severalunivariatetests (one for each dependent variable), this causesType I error inflation.Multivariatetestslook at all dependentvariables at once, in much the sameway that ANOVA looks at all levels of an independent variable at once.
Assumptions MANOVA assumesthat you havemultipledependentvariables that are relatedto each other. Each dependent variable should be normally distributedand measuredon an interval or ratio scale. SPSS Data Format The SPSSdatafile shouldhavea variablefor eachdependentvariable. One addiindependent variable. It is also postional variable is required for eachbetween-subjects MANOVA and a repeated-measures sible to do a MANCOVA, a repeated-measures additional variablesin the data file. MANCOVA as well. Theseextensionsrequire Running the Command Note that this procedure requires an optional module. If you do not have this command, you do not have the proper module installed. This procedure is NOT included in the student version o/SPSS. The following data representSAT and GRE scoresfor 18 participants.Six participants receivedno specialtraining, six receivedshort-termtraining before taking the tests, and six receivedlong-term training. GROUP is coded0 : no training, I = short-term,2 = lons-term.Enter the dataand savethem as SAT.sav.
8l
Chapter6 Parametriclnferential Statistics
SAT 580 520 500 410 650 480 500 640 500 500 580 490 520 620 550 500 540 600
GRE 600 520 510 400 630 480 490 650 480 510 570 500 520 630 560 510 560 600
GROUP 0 0 0 0 0 0 I I I I I I 2 2 2 2 2 2
gaphs t$ltt6t Ad$tfrs Locate the Multivariate commandby ffi; I RrW clickingAnalyze,thenGeneralLinearModel, ) : , Wq$nv*st*rus thenMultivariate. t i ,1$lie
(onddc
Bsrysrsn Lgdrcs
) ) ) ,
WNry
Iqkrcc cocporpntr'.,
This will bring up the main dialog box. Enter the dependentvariables (GRE and SAT, in this case)in theDependentVariables blank. Enter the independent variable(s)(GROUP,in this case)in the Fixed Factor(s)blank. Click OK to run the command.
Readingthe Output We are interested in two primarysectionsof output.The first onegivesthe results of the multivariatetests.The sectionlabeledGROUPis the one we want. This tells us whetherGROUPhadan effecton anyof our dependentvariables.Fourdifferenttypesof multivariatetestresultsaregiven.The mostwidely usedis Wilks'Lambda.Thus,the answerfor theMANOVA is a Lambdaof .828,with 4 and28 degrees of freedom.Thatvalue is not significant.
82
Chapter6 ParametricInferentialStatistics
hIIltiv.T i.ile Teslsc Efiecl Value F Hypothesis df Enordf sis. Intercept Pillai'sTrace 87. .988 569.1 2.000 4.000 000 Wilks'Lambda .012 569.1 97. 2.000 4.000 000 Hotelling's Trace 81.312 569.187: 2.000 4.000 000 Roy'sLargestRoot 81.31 2 569.1 87! 2.000 4.000 000 group Pillai's Trace 1t4 .713 4.000 30.000 590 Wilks'Lambda 828 .693: 1.000 28.000 603 Holelling's Trace .669 206 4.000 26.000 619 't.469b Roy'sLargestRool 196 2.000 15.000 261 a. Exactstatistic b. Theslatistic is an upperbound onFthatyields a lowerbound onthesignificance level. c. Design: Intercept+group
The secondsectionof output we want gives the resultsof the univariatetests (ANOVAs)for eachdependentvariable. Tests ot BelweerFs[l)iectsEflects Typelll Sum S our c e Dependen Vta ri a b l e of Squares df ConecledModel sat 3077.778' 2 gre 5200.000b 2 Intercept sat 5 2 05688.889 1 gre 5 2 48800.000 1 group sat 307t.t78 2 gre 5200.000 I Enor sal 64033.333 15 gre 66400.000 15 Total sat 5272800.000 18 g te 5320400.000 18 4' Corrected Total sat ta 67111.111 qre 71600.000 17 a. R S qu a re=d .0 4 6(A d j u s teRd S q u a re=d -.081)
Mean Suuare
F 1538.889 .360 2600.000 .587 5205688.889 1219.448 5248800.000 1185.723 1538.889 .360 2600.000 .597 4268.889 4426.667
Sio.
.703 .568 .000 .000 .703 .568
b. R S qu a re d = .0 7(Ad 3 j u s teRdSq u a re d=051) -
Drawing Conclusions We interpretthe resultsof the univariatetestsonly if the group Wilks'Lambdais significant.Our resultsare not significant,but we will first considerhow to interpretresultsthataresignificant.
83
Chapter6 ParametricInferentialStatistics
Phrasing ResultsThatAre Significant If we had received the following output instead, we would have had a significant MANOVA, and we could statethe following:
llldbl.Ielrc
Ertscl Inlercepl PillaIsTrace Wilks'Lambda Holsllln0's T€cs RoYsLargsslRoot group
piilai's Trec€ Wllks'Lambda
value
d]
.989 .0tI 91.912 91.912 .579 .123
HolellingbTrace
613.592. 6a3.592. 6t3.592. 613.592. 1 n{o
Frr6t
2.000 2.000 2 000 2.000 | 000 I 000 I 000 2.000
3.157. 1.t01 't0.t25r
dl
t.000 r.000 1.000 {.000 30.000 28.000 26.000 I 5.000
Roy's Largsst Rool t .350 a Exatlslalistic b. Thestaustic is an uppsrboundon F halyisldsa lffisr boundonthesiqnitcancB lml
Sr d
.000 .000 .000 .000 032 011 008 002
c. Desi0n: Intercepl+0roup
Ieir
oaSrtrocs|alEl.
Effoda
Typelll gum Sourca corected todet
htorc€pt gfoup
OsoandahlVa.i sat gf8
Sat ue sal ore
Erol
sat 0rs
T0lal ote
Cor€cled Tolal
xean
dl
62017.tt8' 963tt att! 5859605.556 5997338.889 620tt.t78 863aral a 61216 667 6S| |6 667 5985900.000 6152r00 000
sal
126291.444
0re
151761 1|1
2 2
2 15 15 18
31038.889 t.250 13'1t2.222 9.r 65 5859605.556 I 368.71 1 5997338.8S9I 314.885 |.250 31038.089 431t2.222 9.t65
si0 006 002 000 000 006 002
4281.11 ',l 1561.,|'rr
l7
A one-wayMANOVA wascalculated examiningthe effectof training(none, short-term, long-term)on MZand GREscores.A significanteffectwasfound = ,423,p: .014).Follow-upunivariateANOVAs indicated (Lambda(4,28) thatSATscoresweresignificantlyimprovedby training(F(2,15): 7.250,p : .006).GRE scoreswere also significantlyimprovedby training(F(2,15): 9'465,P = '002). Phrasing ResultsThat Are Not Significant The actualexamplepresented wasnot significant.Therefore, we couldstatethefollowingin theresultssection: A one-wayMANOVA wascalculated examiningtheeffectof training(none, short-term,or long-term)on SIZ andGREscores.No significanteffect was = .828,p > .05).NeitherSAT nor GRE scoreswere found (Lambda(4,28) significantlyinfluenced by training.
84
Chapter7
NonparametricInferentialStatistics procedure is inapparametric testsareusedwhenthecorresponding Nonparametric propriate.Normally,this is becausethe dependentvariable is not interval- or ratioscaled.It can alsobe becausethe dependentvariable is not normallydistributed.If the statisticsmayalsobe appropriate. dataof interestarefrequencycounts,nonparametric Section7.1 Chi-square Goodnessof Fit Description whetheror not sampleproportions of fit testdetermines The chi-squaregoodness values.For example,it couldbe usedto determineif a die is "loaded" matchthe theoretical It be or fair. couldalso usedto comparethe proportionof childrenborn with birth defects has a statistically to the populationvalue (e.g.,to determineif a certainneighborhood rateof birth defects). higher{han-normal Assumptions aboutthe shape Thereareno assumptions We needto makevery few assumptions. for eachcategoryshouldbe at leastl, andno of the distribution.The expectedfrequencies of lessthan5. shouldhaveexpectedfrequencies morethan20%o of thecategories Data Format SP,S,S onlya singlevariable. SPSSrequires Runningthe Command We will createthe following dataset and call it COINS.sav.The following data represent theflippingof eachof two coins20 times(H is codedasheads,T astails). C o i NI: H T H H T H H T H HHTTTHTHTTH C O i N 2:T T H H T H T H TTHHTH HTHTHH Namethe two variablesCOINI andCOIN2,andcodeH as I andT as 2. The data file thatyou createwill have20 rowsof dataandtwo columns,calledCOINI andCOIN2.
85
Chapter7 NonparametricInferentialStatistics
To run the Chi-Square comman{ ) click Analyze,then Nonparametric Tests, ,le*l 1-l RePsts I Dascri*hrcststistict then Chi-Square. This will bring up the :Y,:f l Cdmec lih€nt main dialog box for the Chi-SquareTest. t 6nsrdlhcir$,lo*l t Csrd*c Transfer the variable COINI into , RiErcrskn the Test Variable List. A "fair" coin has I d.$dfy an equal chance of coming up heads or , O*ERadwllon t ft.k tails. Therefore, we will leave the E'"r@tkfta 5*b, pected Values setto All categories equal. ) We could test a specific set of } Qudry{onbd ROCCt vi,,. proportions by entering the relative freTail quencies in the Expected Values area. Click OK to run the analysis. Haad
i
i***- xeio'
akrdoprdc* 5unphg.,. KMgerdant Sffrpbs,,, 2Rddldtudcs",, KRd*ad Ssrpbs,,.
.r...X
pKl n3nI iryd I
3Ll
' i E$cOrURutClc 6 Garrna*c ; r (. Usarp.dirdrarr0o l : i
: . i) iir ' l
I
i r t , t , i, t
I
Reading the Output The output consistsof two sections. The first sectiongives the frequencies (observed1f) of each value of the variable. The expectedvalue is given, along with the differenceof the observed from the expectedvalue (called the residual).In our example,with 20 flips of a coin, we shouldget l0 of eachvalue.
cotNl
Head Tail Total
Observed Expected Residual N N 1. 0 11 10.0 10.0 9 -1. 0 20
The second section of the output gives the resultsof the chisquaretest.
Test Statistics coinl unt-uquareo df
.200
Asymp. Sig.
.655
1
a. 0 cells(.0%)haveexpected frequencies lessthan 5. The minimumexpected cellfrequency is 10.0.
86
Chapter7 NonparametricInferentialStatistics
Drawing Conclusions A significantchi-square testindicatesthat the datavary from the expected values. A testthatis not significaniindicatesthatthedata areconsistent with theexpectedvalues. Phrasing ResultsThat Are Significant In describingtheresults,you shouldstatethe v.alue of chi-square (whosesymbolis the 1'1' degrees.of dt.aot, it.'rignin.ance level,anaa description of the results.For example'with a significantchi-square (for,a sample'different from the exampleabove,such asif we haduseda "loaded"die),we courdstati the foilowing: A chi-square goodness of fit testwascalculated comparingthe frequencyof occurrence of eachvalueof a die.It washypothesized thateachvaluewould occuran equalnumberof times.Significanideviation nor trtr rtypothesized valueswasfoundfXlSl :25.4g,p i.OS).Thedie appears to be.,loaded.,, Notethatthisexampleuseshypothetical values. Phrasing ResultsThatAre NotSignificant If the analysisproducesno significantdifference, as in the previousexample,we couldstatethefollowing: A chi-square goodness of fit testwascalculated comparingthe frequencyof occulrence of headsandtailson a coin.It washypothesized would occur an equalnumberof times.No significant thateachvalue deviationfrom the hypothesized valueswasfound(xre): .20,p rlOsl. The coin apfearsto be fair. Practice Exercise Use Practice Data Set 2 in Appendix B. In the entire population from which the sample was drawn , 20yo of employees are clerical, 50Yo are technical, and 30%oare profes-
sional.Determinewhetheror not thesampledrawnconformsto thesevalues.HINT: Enter therelative proportionsof thethreesamplesin order(20, 50, 30) in the"ExpectedValues" afea.
Section 7.2 Chi-Square Test of Independence Description The chi-squaretest of independencetests whether or not two variables are independent of each other. For example, flips of a coin should be independent events, so knowing the outcome of one coin toss should not tell us anything about the secondcoin toss.The chi-squaretest of independenceis essentiallya nonparametricversion of the interaction term in ANOVA.
87
Chapter7 NonparametricInferentialStatistics
Drawing Conclusions A significant chi-squaretest indicatesthat the data vary from the expectedvalues. A test that is not significantindicatesthat the dataare consistentwith the expectedvalues. Phrosing Results That Are Significant In describingthe results,you shouldstatethe value of chi-square(whosesymbolis th" degreesof freedom,the significance level, and a descriptionof the results.For ex1t;, ample, with a significant chi-square(for a sampledifferent from the example above,such as if we had useda "loaded"die), we could statethe following: A chi-squaregoodnessof fit test was calculatedcomparingthe frequencyof occunenceof eachvalue of a die. It was hypothesizedthat eachvalue would occur an equal numberof times. Significantdeviationfrom the hypothesized < .05).The die appearsto be "loaded." valueswas foundC6:25.48,p Note that this exampleuseshypotheticalvalues. Phrasing Results That Are Not Significant If the analysisproducesno significantdifference,as in the previousexample,we could statethe following: A chi-squaregoodnessof fit test was calculatedcomparingthe frequencyof occurrenceof headsand tails on a coin. It was hypothesizedthat eachvalue would occur an equal number of times. No significant deviation from the hypothesized valueswas found (tttl: .20,p > .05).The coin appearsto be fair. Practice Exercise Use PracticeData Set 2 in Appendix B. In the entire populationfrom which the samplewas drawn, 20o of employeesare clerical, 50Yoaretechnical, and30o/oare professional. Determinewhetheror not the sampledrawn conformsto thesevalues.HINT: Enter the relativeproportionsof the threesamplesin order(20, 50,30) in the "ExpectedValues" area.
Section 7.2 Chi-Square Test of Independence

Description

The chi-square test of independence tests whether or not two variables are independent of each other. For example, flips of a coin should be independent events, so knowing the outcome of one coin toss should not tell us anything about the second coin toss. The chi-square test of independence is essentially a nonparametric version of the interaction term in ANOVA.
Assumptions

Very few assumptions are needed. For example, we make no assumptions about the shape of the distribution. The expected frequencies for each category should be at least 1, and no more than 20% of the categories should have expected frequencies of less than 5.

SPSS Data Format

At least two variables are required.

Running the Command

The chi-square test of independence is a component of the Crosstabs command. For more details, see the section in Chapter 3 on frequency distributions for more than one variable.

This example uses the COINS.sav example. COIN1 is placed in the Row(s) blank, and COIN2 is placed in the Column(s) blank.
Click Statistics, then check the Chi-square box. Click Continue. You may also want to click Cells to select expected frequencies in addition to observed frequencies, as we will do on the next page. Click OK to run the analysis.
Reading the Output

The output consists of two parts. The first part gives you the counts. In this example, the actual and expected frequencies are shown because they were selected with the Cells option.
COIN1 * COIN2 Crosstabulation

                                    COIN2
                                Head    Tail    Total
COIN1   Head   Count               7       4       11
               Expected Count    6.1     5.0     11.0
        Tail   Count               4       5        9
               Expected Count    5.0     4.1      9.0
Total          Count              11       9       20
               Expected Count   11.0     9.0     20.0
Note that you can also use the Cells option to display the percentages of each variable that are each value. This is especially useful when your groups are different sizes.

The second part of the output gives the results of the chi-square test. The most commonly used value is the Pearson chi-square, shown in the first row (value of .737).
Chi-Square Tests

                                 Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              .737(b)    1      .391
Continuity Correction(a)        .165       1      .684
Likelihood Ratio                .740       1      .390
Fisher's Exact Test                                              .653         .342
Linear-by-Linear Association    .700       1      .403
N of Valid Cases                 20

a. Computed only for a 2x2 table
b. 3 cells (75.0%) have expected count less than 5. The minimum expected count is 4.05.
Drawing Conclusions

A significant chi-square test result indicates that the two variables are not independent. A value that is not significant indicates that the variables do not vary significantly from independence.
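As a cross-check outside SPSS, the same chi-square test of independence can be computed with Python's scipy library. The sketch below uses the cell counts reconstructed from the crosstab above; treat it as an illustration rather than part of the SPSS procedure.

```python
# Sketch: chi-square test of independence on the 2x2 coin crosstab shown above.
# correction=False reproduces the uncorrected Pearson chi-square that SPSS reports.
from scipy import stats

table = [[7, 4],   # COIN1 = Head: COIN2 Head, Tail
         [4, 5]]   # COIN1 = Tail: COIN2 Head, Tail

chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
print(f"chi-square({df}) = {chi2:.3f}, p = {p:.3f}")
print("expected counts:", expected.round(2))
```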
Phrasing Results That Are Significant

In describing the results, you should give the value of chi-square, the degrees of freedom, the significance level, and a description of the results. For example, with a significant chi-square (for a data set different from the one discussed above), we could state the following:

A chi-square test of independence was calculated comparing the frequency of heart disease in men and women. A significant interaction was found (χ²(1) = 23.80, p < .05). Men were more likely to get heart disease (68%) than were women (40%).

Note that this summary statement assumes that a test was run in which participants' sex, as well as whether or not they had heart disease, was coded.

Phrasing Results That Are Not Significant

A chi-square test that is not significant indicates that there is no significant dependence of one variable on the other. The coin example above was not significant. Therefore, we could state the following:

A chi-square test of independence was calculated comparing the result of flipping two coins. No significant relationship was found (χ²(1) = .737, p > .05). Flips of a coin appear to be independent events.

Practice Exercise

A researcher wants to know whether or not individuals are more likely to help in an emergency when they are indoors than when they are outdoors. Of 28 participants who were outdoors, 19 helped and 9 did not. Of 23 participants who were indoors, 8 helped and 15 did not. Enter these data, and find out if helping behavior is affected by the environment. The key to this problem is in the data entry. (Hint: How many participants were there, and what do you know about each participant?)

Section 7.3 Mann-Whitney U Test

Description

The Mann-Whitney U test is the nonparametric equivalent of the independent t test. It tests whether or not two independent samples are from the same distribution. The Mann-Whitney U test is weaker than the independent t test, and the t test should be used if you can meet its assumptions.

Assumptions

The Mann-Whitney U test uses the rankings of the data. Therefore, the data for the two samples must be at least ordinal. There are no assumptions about the shape of the distribution.
SPSS Data Format

This command requires a single variable representing the dependent variable and a second variable indicating group membership.

Running the Command

This example will use a new data file. It represents 12 participants in a series of races. There were long races, medium races, and short races. Participants either had a lot of experience (2), some experience (1), or no experience (0).

Enter the data from the figure at right in a new file, and save the data file as RACE.sav. The values for LONG, MEDIUM, and SHORT represent the results of the race, with 1 being first place and 12 being last.
To run the command, click Analyze, then Nonparametric Tests, then 2 Independent Samples. This will bring up the main dialog box.

Enter the dependent variable (LONG, for this example) in the Test Variable List blank. Enter the independent variable (EXPERIENCE) as the Grouping Variable. Make sure that Mann-Whitney U is checked. Click Define Groups to select which two groups you will compare. For this example, we will compare those runners with no experience (0) to those runners with a lot of experience (2). Click OK to run the analysis.
Reading the Output

The output consists of two sections. The first section gives descriptive statistics for the two samples. Because the data are only required to be ordinal, summaries relating to their ranks are used. Those participants who had no experience averaged 6.5 as their place in the race. Those participants with a lot of experience averaged 2.5 as their place in the race.

The second section of the output is the result of the Mann-Whitney U test itself. The value obtained was 0.0, with a significance level of .021.
Ranks

          experience    N    Mean Rank    Sum of Ranks
long      .00           4       6.50          26.00
          2.00          4       2.50          10.00
          Total         8

Test Statistics(b)

                                      long
Mann-Whitney U                        .000
Wilcoxon W                          10.000
Z                                   -2.309
Asymp. Sig. (2-tailed)                .021
Exact Sig. [2*(1-tailed Sig.)]        .029(a)

a. Not corrected for ties.
b. Grouping Variable: experience
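If you want to verify a Mann-Whitney U result outside SPSS, Python's scipy library offers the same test. The finishing places below are hypothetical (the RACE.sav values are not printed in the text); note that SPSS reports the smaller of the two U statistics, while scipy reports U for the first sample, so compare the p values rather than the statistics.

```python
# Sketch: Mann-Whitney U comparing two independent groups, as a cross-check
# outside SPSS. The finishing places below are hypothetical stand-ins.
from scipy import stats

no_experience   = [5, 6, 7, 8]   # hypothetical places, 0-experience runners
lots_experience = [1, 2, 3, 4]   # hypothetical places, 2-experience runners

u, p = stats.mannwhitneyu(no_experience, lots_experience, alternative="two-sided")
print(f"U = {u}, p = {p:.3f}")
```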
Drawing Conclusions

A significant Mann-Whitney U result indicates that the two samples are different in terms of their average ranks.

Phrasing Results That Are Significant

Our example above is significant, so we could state the following:

A Mann-Whitney U test was calculated examining the place that runners with varying levels of experience took in a long-distance race. Runners with no experience did significantly worse (m place = 6.50) than runners with a lot of experience (m place = 2.50; U = 0.00, p < .05).

Phrasing Results That Are Not Significant

If we conduct the analysis on the short-distance race instead of the long-distance race, we will get the following results, which are not significant.
Ranks

          experience    N    Mean Rank    Sum of Ranks
short     .00           4       4.63          18.50
          2.00          4       4.38          17.50
          Total         8

Test Statistics(b)

                                      short
Mann-Whitney U                        7.500
Wilcoxon W                           17.500
Z                                     -.145
Asymp. Sig. (2-tailed)                 .885
Exact Sig. [2*(1-tailed Sig.)]         .886(a)

a. Not corrected for ties.
b. Grouping Variable: experience
Therefore, we could state the following:

A Mann-Whitney U test was used to examine the difference in the race performance of runners with no experience and runners with a lot of experience in a short-distance race. No significant difference in the results of the race was found (U = 7.50, p > .05). Runners with no experience averaged a place of 4.63. Runners with a lot of experience averaged 4.38.

Practice Exercise

Assume that the mathematics scores in Practice Exercise 1 (Appendix B) are measured on an ordinal scale. Determine if younger participants (< 26) have significantly lower mathematics scores than older participants.
Section 7.4 Wilcoxon Test

Description

The Wilcoxon test is the nonparametric equivalent of the paired-samples (dependent) t test. It tests whether or not two related samples are from the same distribution. The Wilcoxon test is weaker than the t test, so the t test should be used if you can meet its assumptions.

Assumptions

The Wilcoxon test is based on the difference in rankings. The data for the two samples must be at least ordinal. There are no assumptions about the shape of the distribution.

SPSS Data Format

The test requires two variables. One variable represents the dependent variable at one level of the independent variable. The other variable represents the dependent variable at the second level of the independent variable.

Running the Command
Locate the command by clicking Analyze, then Nonparametric Tests, then 2 Related Samples. This example uses the RACE.sav data set.
This will bring up the dialog box for the Wilcoxon test. Note the similarity between it and the dialog box for the dependent t test. If you have trouble, refer to Section 6.4 on the dependent (paired-samples) t test in Chapter 6.
Transfer the variables LONG and MEDIUM as a pair and click OK to run the test. This will determine if the runners perform equivalently on long- and medium-distance races.

Reading the Output

The output consists of two parts. The first part gives summary statistics for the two variables. The second section contains the result of the Wilcoxon test (given as Z).
Ranks

                              N      Mean Rank    Sum of Ranks
MEDIUM -    Negative Ranks    4(a)      5.38          21.50
LONG        Positive Ranks    5(b)      4.70          23.50
            Ties              3(c)
            Total            12

a. MEDIUM < LONG
b. MEDIUM > LONG
c. LONG = MEDIUM

Test Statistics(b)

                          MEDIUM - LONG
Z                              -.121(a)
Asymp. Sig. (2-tailed)          .904

a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test
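A Wilcoxon signed-ranks test can also be reproduced outside SPSS with Python's scipy library. The paired places below are hypothetical stand-ins for the RACE.sav values, which are not printed in the text.

```python
# Sketch: Wilcoxon signed-ranks test on two related measurements, as a
# cross-check outside SPSS. The paired places below are hypothetical.
from scipy import stats

long_race   = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]   # hypothetical places
medium_race = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]   # hypothetical places

stat, p = stats.wilcoxon(long_race, medium_race)
print(f"W = {stat}, p = {p:.3f}")
```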
The example above shows that no significant difference was found between the results of the long-distance and medium-distance races.

Phrasing Results That Are Significant

A significant result means that a change has occurred between the two measurements. If that happened, we could state the following:

A Wilcoxon test examined the results of the medium-distance and long-distance races. A significant difference was found in the results (Z = 3.40, p < .05). Medium-distance results were better than long-distance results.

Note that these results are fictitious.

Phrasing Results That Are Not Significant

In fact, the results in the example above were not significant, so we could state the following:

A Wilcoxon test examined the results of the medium-distance and long-distance races. No significant difference was found in the results (Z = -0.121, p > .05). Medium-distance results were not significantly different from long-distance results.

Practice Exercise

Use the RACE.sav data file to determine whether or not the outcome of short-distance races is different from that of medium-distance races. Phrase your results.
Section 7.5 Kruskal-Wallis H Test

Description

The Kruskal-Wallis H test is the nonparametric equivalent of the one-way ANOVA. It tests whether or not several independent samples come from the same population.

Assumptions

Because the test is nonparametric, there are very few assumptions. However, the test does assume an ordinal level of measurement for the dependent variable. The independent variable should be nominal or ordinal.

SPSS Data Format

SPSS requires one variable to represent the dependent variable and another to represent the levels of the independent variable.

Running the Command

This example uses the RACE.sav data file. To run the command, click Analyze, then Nonparametric Tests, then K Independent Samples. This will bring up the main dialog box.

Enter the independent variable (EXPERIENCE) as the Grouping Variable, and click Define Range to define the lowest (0) and highest (2) values. Enter the dependent variable (LONG) in the Test Variable List, and click OK.
Reading the Output

The output consists of two parts. The first part gives summary statistics for each of the groups defined by the grouping (independent) variable.

Ranks

        experience    N    Mean Rank
long    .00           4      10.50
        1.00          4       6.50
        2.00          4       2.50
        Total        12
The second part of the output gives the results of the Kruskal-Wallis test (given as a chi-square value, but we will describe it as an H). The example here is a significant value of 9.846.
Test Statistics(a,b)

                long
Chi-Square     9.846
df                 2
Asymp. Sig.     .007

a. Kruskal Wallis Test
b. Grouping Variable: experience
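The Kruskal-Wallis H test can be checked outside SPSS with Python's scipy library. The finishing places below are hypothetical, but they were chosen to reproduce the mean ranks shown in the output above, so the H value matches as well.

```python
# Sketch: Kruskal-Wallis H test across three independent groups, as a
# cross-check outside SPSS. The places are hypothetical but give the same
# mean ranks (10.50, 6.50, 2.50) as the Ranks table above.
from scipy import stats

none = [9, 10, 11, 12]   # hypothetical places, no experience
some = [5, 6, 7, 8]      # hypothetical places, some experience
lots = [1, 2, 3, 4]      # hypothetical places, a lot of experience

h, p = stats.kruskal(none, some, lots)
print(f"H(2) = {h:.3f}, p = {p:.3f}")
```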
Drawing Conclusions

Like the one-way ANOVA, the Kruskal-Wallis test assumes that the groups are equal. Thus, a significant result indicates that at least one of the groups is different from at least one other group. Unlike the One-Way ANOVA command, however, there are no options available for post-hoc analysis.

Phrasing Results That Are Significant

The example above is significant, so we could state the following:

A Kruskal-Wallis test was conducted comparing the outcome of a long-distance race for runners with varying levels of experience. A significant result was found (H(2) = 9.85, p < .01), indicating that the groups differed from each other. Runners with no experience averaged a placement of 10.50, while runners with some experience averaged 6.50 and runners with a lot of experience averaged 2.50. The more experience the runners had, the better they performed.

Phrasing Results That Are Not Significant

If we conducted the analysis using the results of the short-distance race, we would get the following output, which is not significant.
Ranks

         experience    N    Mean Rank
short    .00           4       6.38
         1.00          4       7.25
         2.00          4       5.88
         Total        12

Test Statistics(a,b)

               short
Chi-Square      .299
df                 2
Asymp. Sig.     .861

a. Kruskal Wallis Test
b. Grouping Variable: experience
This result is not significant, so we could state the following:

A Kruskal-Wallis test was conducted comparing the outcome of a short-distance race for runners with varying levels of experience. No significant difference was found (H(2) = 0.299, p > .05), indicating that the groups did not differ significantly from each other. Runners with no experience averaged a placement of 6.38, while runners with some experience averaged 7.25 and runners with a lot of experience averaged 5.88. Experience did not seem to influence the results of the short-distance race.
Practice Exercise

Use Practice Data Set 2 in Appendix B. Job classification is ordinal (clerical < technical < professional). Determine if males and females have differing levels of job classifications. Phrase your results.

Section 7.6 Friedman Test

Description

The Friedman test is the nonparametric equivalent of a one-way repeated-measures ANOVA. It is used when you have more than two measurements from related participants.

Assumptions

The test uses the rankings of the variables, so the data must be at least ordinal. No other assumptions are required.

SPSS Data Format

SPSS requires at least three variables in the SPSS data file. Each variable represents the dependent variable at one of the levels of the independent variable.

Running the Command

Locate the command by clicking Analyze, then Nonparametric Tests, then K Related Samples. This will bring up the main dialog box.
Place all the variables representing the levels of the independent variable in the Test Variables area. For this example, use the RACE.sav data file and the variables LONG, MEDIUM, and SHORT. Click OK.
Reading the Output

The output consists of two sections. The first section gives you summary statistics for each of the variables. The second section of the output gives you the results of the test as a chi-square value. The example here has a value of 0.049 and is not significant (Asymp. Sig., otherwise known as p, is .976, which is greater than .05).
Ranks

           Mean Rank
LONG          2.00
MEDIUM        2.04
SHORT         1.96

Test Statistics(a)

N               12
Chi-Square    .049
df               2
Asymp. Sig.   .976

a. Friedman Test
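The Friedman test can likewise be run outside SPSS with Python's scipy library. The places below are hypothetical stand-ins for the RACE.sav values, which are not printed in the text.

```python
# Sketch: Friedman test across three related measurements, as a cross-check
# outside SPSS. The places below are hypothetical.
from scipy import stats

long_race   = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]   # hypothetical
medium_race = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]   # hypothetical
short_race  = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12]   # hypothetical

chi2, p = stats.friedmanchisquare(long_race, medium_race, short_race)
print(f"chi-square(2) = {chi2:.3f}, p = {p:.3f}")
```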
Drawing Conclusions

The Friedman test assumes that the three variables are from the same population. A significant value indicates that the variables are not equivalent.
Phrasing Results That Are Significant

If we obtained a significant result, we could state the following (these are hypothetical results):

A Friedman test was conducted comparing the average place in a race of runners for short-distance, medium-distance, and long-distance races. A significant difference was found (χ²(2) = 0.057, p < .05). The length of the race significantly affects the results of the race.
Phrasing Results That Are Not Significant

In fact, the example above was not significant, so we could state the following:

A Friedman test was conducted comparing the average place in a race of runners for short-distance, medium-distance, and long-distance races. No significant difference was found (χ²(2) = 0.049, p > .05). The length of the race did not significantly affect the results of the race.

Practice Exercise

Use the data in Practice Data Set 3 in Appendix B. If anxiety is measured on an ordinal scale, determine if anxiety levels changed over time. Phrase your results.
Chapter 8

Test Construction
Section 8.1 Item-Total Analysis

Description

Item-total analysis is a way to assess the internal consistency of a data set. As such, it is one of many tests of reliability. Item-total analysis comprises a number of items that make up a scale or test designed to measure a single construct (e.g., intelligence), and determines the degree to which all of the items measure the same construct. It does not tell you if it is measuring the correct construct (that is a question of validity). Before a test can be valid, however, it must first be reliable.

Assumptions

All the items in the scale should be measured on an interval or ratio scale. In addition, each item should be normally distributed. If your items are ordinal in nature, you can conduct the analysis using the Spearman rho correlation instead of the Pearson r correlation.

SPSS Data Format

SPSS requires one variable for each item (or question) in the scale. In addition, you must have a variable representing the total score for the scale.
Conducting the Test
Item-total analysis uses the Pearson Correlation command. To conduct it, open the QUESTIONS.sav data file you created in Chapter 2. Click Analyze, then Correlate, then Bivariate.

Place all questions and the total in the right-hand window, and click OK. (For more help on conducting correlations, see Chapter 5.) The total can be calculated with the techniques discussed in Chapter 2.
Reading the Output

The output consists of a correlation matrix containing all questions and the total. Use the column labeled TOTAL, and locate the correlation between the total score and each question. In the example below, Question 1 has a correlation of 0.873 with the total score, Question 2 has a correlation of -0.130 with the total, and Question 3 has a correlation of 0.926 with the total.
Correlations

                                   Q1        Q2        Q3      TOTAL
Q1      Pearson Correlation     1.000     -.447      .718       .873
        Sig. (2-tailed)                    .553      .282       .127
        N                           4         4         4          4
Q2      Pearson Correlation     -.447     1.000     -.229      -.130
        Sig. (2-tailed)          .553                .771       .870
        N                           4         4         4          4
Q3      Pearson Correlation      .718     -.229     1.000       .926
        Sig. (2-tailed)          .282      .771                 .074
        N                           4         4         4          4
TOTAL   Pearson Correlation      .873     -.130      .926      1.000
        Sig. (2-tailed)          .127      .870      .074
        N                           4         4         4          4
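Item-total correlations can also be computed outside SPSS; a minimal sketch with Python's pandas library follows. The item responses are hypothetical, not the QUESTIONS.sav data built in Chapter 2.

```python
# Sketch: item-total correlations computed with pandas.
# The responses below are hypothetical stand-ins for the scale items.
import pandas as pd

data = pd.DataFrame({
    "Q1": [4, 3, 5, 2],   # hypothetical responses, one row per participant
    "Q2": [2, 4, 1, 3],
    "Q3": [5, 3, 4, 2],
})
data["TOTAL"] = data.sum(axis=1)

# Correlation of each item with the total score
print(data.corr()["TOTAL"].drop("TOTAL"))
```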
Interpreting the Output

Item-total correlations should always be positive. If you obtain a negative correlation, that question should be removed from the scale (or you may consider whether it should be reverse-keyed).

Generally, item-total correlations of greater than 0.7 are considered desirable. Those of less than 0.3 are considered weak. Any questions with correlations of less than 0.3 should be removed from the scale.

Normally, the worst question is removed, then the total is recalculated. After the total is recalculated, the item-total analysis is repeated without the question that was removed. Then, if any questions have correlations of less than 0.3, the worst one is removed, and the process is repeated.

When all remaining correlations are greater than 0.3, the remaining items in the scale are considered to be those that are internally consistent.

Section 8.2 Cronbach's Alpha

Description

Cronbach's alpha is a measure of internal consistency. As such, it is one of many tests of reliability. Cronbach's alpha comprises a number of items that make up a scale designed to measure a single construct (e.g., intelligence), and determines the degree to which all the items are measuring the same construct. It does not tell you if it is measuring the correct construct (that is a question of validity). Before a test can be valid, however, it must first be reliable.

Assumptions

All the items in the scale should be measured on an interval or ratio scale. In addition, each item should be normally distributed.
SPSS Data Format

SPSS requires one variable for each item (or question) in the scale.

Running the Command

This example uses the QUESTIONS.sav data file we first created in Chapter 2. Click Analyze, then Scale, then Reliability Analysis. This will bring up the main dialog box for Reliability Analysis. Transfer the questions from your scale to the Items blank, and click OK. Do not transfer any variables representing total scores.
Note that when you change the options under Model, additional measures of internal consistency (e.g., split-half) can be calculated.
Reading the Output

In this example, the reliability coefficient is 0.407. Numbers close to 1.00 are very good, but numbers close to 0.00 represent poor internal consistency.

Reliability Statistics

Cronbach's Alpha    N of Items
      .407               3
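If you want to see what SPSS is computing, Cronbach's alpha can be calculated directly from its definition. The sketch below uses Python with hypothetical responses; it is not part of the SPSS procedure.

```python
# Sketch: Cronbach's alpha from its definition,
# alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total score).
# The responses are hypothetical stand-ins for the QUESTIONS.sav items.
import numpy as np

items = np.array([[4, 2, 5],    # one row per participant, one column per item
                  [3, 4, 3],
                  [5, 1, 4],
                  [2, 3, 2]])

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)
total_variance = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.3f}")
```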
Section 8.3 Test-Retest Reliability

Description

Test-retest reliability is a measure of temporal stability. As such, it is a measure of reliability. Unlike measures of internal consistency that tell you the extent to which all of the questions that make up a scale measure the same construct, measures of temporal stability tell you whether or not the instrument is consistent over time and/or over multiple administrations.

Assumptions

The total score for the scale should be an interval or ratio scale. The scale scores should be normally distributed.
SPSS Data Format

SPSS requires a variable representing the total score for the scale at the time of first administration. A second variable representing the total score for the same participants at a different time (normally two weeks later) is also required.

Running the Command

The test-retest reliability coefficient is simply a Pearson correlation coefficient for the relationship between the total scores for the two administrations. To compute the coefficient, follow the directions for computing a Pearson correlation coefficient (Chapter 5, Section 5.1). Use the two variables representing the two administrations of the test.

Reading the Output

The correlation between the two scores is the test-retest reliability coefficient. It should be positive. Strong reliability is indicated by values close to 1.00. Weak reliability is indicated by values close to 0.00.

Section 8.4 Criterion-Related Validity

Description

Criterion-related validity determines the extent to which the scale you are testing correlates with a criterion. For example, ACT scores should correlate highly with GPA. If they do, that is a measure of validity for ACT scores. If they do not, that indicates that ACT scores may not be valid for the intended purpose.

Assumptions

All of the same assumptions for the Pearson correlation coefficient apply to measures of criterion-related validity (interval or ratio scales, normal distribution, etc.).

SPSS Data Format

Two variables are required. One variable represents the total score for the scale you are testing. The other represents the criterion you are testing it against.

Running the Command

Calculating criterion-related validity involves determining the Pearson correlation value between the scale and the criterion. See Chapter 5, Section 5.1 for complete information.

Reading the Output

The correlation between the two scores is the criterion-related validity coefficient. It should be positive. Strong validity is indicated by values close to 1.00. Weak validity is indicated by values close to 0.00.
Appendix A

Effect Size

Many disciplines are placing increased emphasis on reporting effect size. While statistical hypothesis testing provides a way to tell the odds that differences are real, effect sizes provide a way to judge the relative importance of those differences. That is, they tell us the size of the difference or relationship. They are also critical if you would like to estimate necessary sample sizes, conduct a power analysis, or conduct a meta-analysis. Many professional organizations (e.g., the American Psychological Association) are now requiring or strongly suggesting that effect sizes be reported in addition to the results of hypothesis tests.

Because there are at least 41 different types of effect sizes,1 each with somewhat different properties, the purpose of this Appendix is not to be a comprehensive resource on effect size, but rather to show you how to calculate some of the most common measures of effect size using SPSS 15.0.

Cohen's d

One of the simplest and most popular measures of effect size is Cohen's d. Cohen's d is a member of a class of measurements called "standardized mean differences." In essence, d is the difference between the two means divided by the overall standard deviation.

It is not only a popular measure of effect size, but Cohen has also suggested a simple basis to interpret the value obtained. Cohen2 suggested that effect sizes of .2 are small, .5 are medium, and .8 are large.

We will discuss Cohen's d as the preferred measure of effect size for t tests. Unfortunately, SPSS does not calculate Cohen's d. However, this appendix will cover how to calculate it from the output that SPSS does produce.
Effect Size for Single-Sample t Tests

Although SPSS does not calculate effect size for the single-sample t test, calculating Cohen's d is a simple matter.
1 Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational & Psychological Measurement, 56, 746-759.
2 Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Cohen's d for a single-sample t test is equal to the mean difference over the standard deviation. If SPSS provides us with the following output, we calculate d as indicated here:

One-Sample Statistics (LENGTH): N = 16, Mean = 35.9000, Std. Deviation = 1.1972
One-Sample Test (Test Value = 35): Mean Difference = .90

d = Mean Difference / Std. Deviation
d = .90 / 1.1972
d = .752
In this example, using Cohen's guidelines to judge effect size, we would have an effect size between medium and large.
Effect Size for Independent-Samples t Tests

Calculating effect size from the independent t test output is a little more complex because SPSS does not provide us with the pooled standard deviation. The upper section of the output, however, does provide us with the information we need to calculate it. The output presented here is the same output we worked with in Chapter 6.
graoe
morntno No yes
S pooled =
N
z 2
Mean Std. Oeviation J.JJIDJ UZ.SUUU 7.07107 78.0000
Std.Error Mean Z,SUUUU
s.00000
(n ,-1 )s , 2+ (n , -l)s r2 n r+ n r-2
- tlr.Sr552+ (2 -t)7.07| (
rp o o te d-\ /12
2+ 21 S pool"d
=
Spooted= 5'59
Once we have calculatedthe pooled standard deviation (spooud), we can calculate Cohen'sd.
, x ,-x ,
Q=- -
S pooled
- 78.00 82.50 5.59 =.80 d , o =-
104
Appendix A
Effect Size
So, in this example,using Cohen's guidelinesfor the interpretationof d, we would have obtaineda large effect size.
EffectSizefor Paired-Samples t Tests As you have probably learned in your statisticsclass, a paired-samplesI test is just really I test.Therefore,the procedurefor calculata specialcaseof the single-sample ing Cohen's d is also the same.The SPSSoutput,however,looks a little different,so you will be taking your valuesfrom different areas. Pair€dSamplosTest Paired Differences 95% Confidence Intervalof the
std. Mean YAlt 1
|'KE ttss I . FINAL
f)eviatior
-22.809s
A OTqA
Std.Error Mean
f)iffcrcnne
Lower
sig Upper
1.9586 -26.8952 -18.7239
(2-tailed)
df 11.646
20
.000
,D
u --
JD
- 22.8095 8.9756 d = 2.54 Notice that in this example,we representthe effect size (d; as a positive number eventhoughit is negative.Effect sizesare alwayspositivenumbers.In this example,using Cohen'sguidelinesfor the interpretationof d, we have found a very large effect size. 12(Coefficient of Determination) While Cohen's d is the appropriatemeasureof effect size for / tests, correlation and regressioneffect sizes should be determinedby squaringthe correlation coefficient. This squaredcorrelationis called the coefficient of determination. Cohen' suggestedhere that correlationsof .5, .3, and.l corresponded to large,moderate,and small relationships. Thosevaluessquaredyield coefficientsof determination of .25,.09, and .01 respectively. It would appear,therefore,that Cohen is suggestingthat accounting for 25o/oof the variability representsa large effbct, 9oha moderateeffect, and lo/oa small effect.
EffectSizefor Correlation Nowhere is the effect of sample size on statisticalpower (and therefore significance) more apparentthan with correlations.Given a large enoughsample,any correlation can becomesignificant.Thus, effect size becomescritically important in the interpretation of correlations.
'Cohen, J. (1988). Statisticalpower analysisfor the behavioral sciences(2"d ed). New Jersey:Lawrence Erlbaum.
105
Appendix A
Effect Size
The standardmeasureof effect size for correlationsis the coefficient of determination (r2) discussedabove. The coefficient should be interpretedas the proportion of variance in the dependent variable that can be accountedfor by the relationshipbetween the independentand dependentvariables.While Cohenprovidedsomeusefulguidelines for interpretation,each problem should be interpretedin terms of its true practical significance. For example,if a treatmentis very expensiveto implement,or has significant side effects,then a larger correlationshould be required before the relationshipbecomes"important." For treatmentsthat are very inexpensive,a much smaller correlationcan be considered"important." To calculatethe coefficientof determination, simply take the r value that SPSS providesand squareit.
EffectSizefor Regression TheModelSummary section of the output reports R2 for you. The example output here shows a coefficient of determination of .649, meaningthat almost 65% (.649\ of the variability in the dependentvariable is accountedfor by the relationship betweenthe dependent and independentvariables.
ModelSummary Adjusted R R Souare R Souare .8064 .649 .624
Model
Std. Error of the Estimate 16.1 480
a. Predictors: (Constant), HEIGHT
Eta SquaredQtz) A third measureof effect size is Eta Squared (ry2).Eta Squared is usedfor Analysis of Variancemodels.The GLM (GeneralLinear Model) functionin SPSS(the function that runs the proceduresunderAnalyze-General Linear Model) will provide Eta Squared (tt'). Eta Squared has an interpretationsimilar to a squaredcorSS"ik", relationcoefficient1l1.lt represents the proportionof the variance tt n, -accountedfor by the effect.-Unlike ,2, ho*euer, which represents Sqr," . SS* onlylinearrelationships,,72canrepresentanytypeofrelationship.
Effect Sizefor Analysis of Variance For mostAnalysisof Varianceproblems,you shouldelectto reportEta Squared asyour effectsizemeasure. SPSSprovidesthis calculationfor you as part of the General LinearModel(GLIrl)command. To obtainEta Squared,you simplyclick on theOptionsbox in the maindialog box for the GLM commandyou arerunning(this worksfor Univariate,Multivariate,and RepeatedMeasuresversionsof the commandeventhoughonly the Univariateoption is presented here).
106
Appendix A Effect Size
Onceyou haveselectedOptions,a new dialog box will appear.One of the optionsin that box will be Estimatesof ffict sze. When you select that box, SPSSwill provide Eta Squaredvaluesaspartofyour output.
l-[qoral* |- $FdYr|dg f A!.|it?ld
rli*dr f'9n
drffat*l
Cr**l.|tniliivdrra5:
tl
l c,,*| n* | Testsot EetweerF$iltectsEftcct3
Dependenl Variable: score TypelllSum nf Sdila!es df tean Souare Source u0rrecleo M00el 1 0 .4 5 0 r 5.225 2 4 Inlercepl I s't.622 91.622 1 gr0up 5.225 1 0 .4 5 0 Enor 12 .27 4 3 .2 8 3 Total 15 1 0 5 .0 0 0 't4 Corrected Total 1 3 .7 3 3 a. R Squared = .761(Adiusied = .721) R Squared
PadialEta F
19.096 331.862 1S .096
si q, .000 .000 .000
Souared
761 965 761
In the examplehere,we obtainedan Eta Squaredof .761for our main effectfor groupmembership. Because we interpretEta Squaredusingthe sameguidelinesas,', we wouldconcludethatthisrepresents alargeeffectsizefor groupmembership.
107
Appendix B PracticeExerciseData Sets
PracticeData Set2 A survey of employeesis conducted.Each employeeprovides the following information: Salary (SALARY), Years of Service (YOS), Sex (SEX), Job Classification (CLASSIFY), and EducationLevel (EDUC). Note that you will haveto code SEX (Male: : 3). l, Female: 2) and CLASSIFY (Clerical: l, Technical: 2, Professional
ll*t:g - Jrs$q Numsric lNumedc
SALARY 35,000 18,000 20,000 50,000 38,000 20,000 75,000 40,000 30,000 22,000 23,000 45,000
YOS 8 4 I 20 6 6 17 4 8 l5 16 2
SEX Male Female Male Female Male Female Male Female Male Female Male Female
CLASSIFY EDUC 14 Technical l0 Clerical Professional l6 Professional l6 20 Professional 12 Clerical 20 Professional 12 Technical 14 Technical 12 Clerical 12 Clerical Professional l6
PracticeData Set3 Participants who havephobiasare given one of threetreatments(CONDITION). Their anxietylevel (1 to l0) is measured at threeintervals-beforetreatment(ANXPRE), one hour after treatment(ANXIHR), and againfour hoursafter treatment(ANX4HR). Notethatyou will haveto codethevariableCONDITION.
4lcondil 7l
,Numeric i''* '"
ll0
Appendix B Practice Exercise Data Sets
ANXPRE 8
t0 9 7 7 9 l0 9 8 6 8 6 9 l0 7
ANXIHR 7 l0 7 6 7 4 6 5 3 3 5 5 8 9 6
ANX4HR 7 l0 8 6 7 5 I 5 5 4 3 2 4 4 3
lll
CONDITION Placebo Placebo Placebo Placebo Placebo ValiumrM ValiumrM ValiumrM ValiumrM ValiumrM ExperimentalDrug ExperimentalDrug ExperimentalDrug ExperimentalDrug ExperimentalDrug
AppendixC
Glossary everypossibleoutcome. All Inclusive.A set of eventsthat encompasses Alternative Hypothesis. The oppositeof the null hypothesis,normally showing that there is a true difference. Generallv. this is the statementthat the researcherwould like to support. Case ProcessingSummary. A sectionof SPSSoutput that lists the number of subjects usedin the analysis. Coefficient of Determination. The value of the correlation, squared. It provides the proportionof varianceaccountedfor by the relationship. Cohen's d. A common and simple measureof effect size that standardizesthe difference betweengroups. Correlation Matrix. A section of SPSS output in which correlationcoefficientsare reportedfor allpairs of variables. Covariate. A variable known to be relatedto the dependentvariable,but not treatedas an independentvariable.Used in ANCOVA as a statisticalcontrol technique. Data Window. The SPSSwindow that containsthe data in a spreadsheetformat. This is the window usedfor running most commands. Dependent Variable. An outcome or response variable. The dependent variable is normally dependenton the independentvariable. Descriptive Statistics. Statisticalproceduresthat organizeand summarizedata. $ $
nialog Box. A window that allows you to enter information that SPSS will use in a command.
* il I 'f
Oichotomous with onlytwo levels(e.g.,gender). Variables.Variables
t
DiscreteVariable. A variablethat can have only certainvalues(i.e., valuesbetween *hich thereis no score,like A, B, C, D, F).
I'
i .
Effect Size. A measurethat allows one to judge the relative importanceof a differenceor relationshipby reportingthe size of a difference. Eta Squared (q2).A measureof effectsizeusedin Analysisof Variancemodels.
I 13
AppendixC Glossary
Grouping Variable. In SPSS,the variableusedto representgroup membership.SPSS often refersto independent variablesas groupingvariables;SPSSsometimesrefersto groupingvariablesasindependent variables. IndependentEvents.Two eventsareindependent if informationaboutoneeventgivesno informationaboutthe secondevent(e.g.,two flips of a coin). IndependentVariable.Thevariablewhoselevels(values)determinethe groupto whicha subjectbelongs.A true independentvariable is manipulatedby the researcher.See GroupingVariable. Inferential Statistics.Statisticalproceduresdesignedto allow the researcherto draw inferences abouta populationon thebasisof a sample. Interaction.With morethanone independent variable,an interactionoccurswhena level of oneindependent variableaffectstheinfluenceof anotherindependent variable. Internal Consistency.A reliabilitymeasurethat assesses the extentto which all of the itemsin an instrumentmeasure thesameconstruct. Interval Scale. A measurement scale where items are placed in mutually exclusive categories,with equal intervalsbetweenvalues.Appropriatetransformationsinclude counting,sorting,andaddition/subtraction. Levels.The valuesthata variablecanhave.A variablewith threelevelshasthreepossible values. Mean.A measure of centraltendencywherethesumof thedeviationscoresequalszero. Median.A measureof centraltendencyrepresenting the middleof a distributionwhenthe dataaresortedfrom low to high.Fifty percentof thecasesarebelowthe median. Mode. A measureof centraltendencyrepresenting the value (or values)with the most subjects(thescorewith thegreatestfrequency). Mutually Exclusive. Two events are mutually exclusivewhen they cannot occur simultaneously. Nominal Scale. A measurement scalewhere items are placed in mutually exclusive categories. Differentiationis by nameonly (e.g.,race,sex).Appropriatecategories include "same"or "different." Appropriatetransformations includecounting. NormalDistribution.A symmetric, unimodal, bell-shaped curve. Null Hypothesis.The hypothesisto be tested,normally in which there is no tnre difference.It is mutuallyexclusiveof thealternative hypothesis.
n4
Appendix C Glossary Ordinal Scale. A measurementscale where items are placed in mutually exclusive categories, in order. Appropriate categories include "same," "less," and "more." Appropriatetransformationsinclude countingand sorting. Outliers. Extreme scoresin a distribution. Scoresthat are very distant from the mean and the rest of the scoresin the distribution. Output Window. The SPSSwindow that containsthe resultsof an analysis.The left side summarizesthe resultsin an outline. The right side containsthe actual results. Percentiles(Percentile Ranks). A relative scorethat gives the percentageof subjectswho scoredat the samevalue or lower. Pooled Standard Deviation. A single value that representsthe standarddeviation of two groupsofscores. Protected Dependent / Tests. To preventthe inflation of a Type I error, the level needed to be significantis reducedwhen multipletestsare conducted. Quartiles. The points that define a distribution into four equal parts. The scoresat the 25th,50th,and 75th percentileranks. Random Assignment. A procedurefor assigningsubjectsto conditionsin which each subjecthasan equalchanceofbeing assignedto any condition. Range. A measureof dispersionrepresentingthe number of points from the highestscore through the lowest score. Ratio Scale. A measurementscale where items are placed in mutually exclusive categories, with equal intervals between values, and a true zero. Appropriate transformationsinclude counting,sorting,additiott/subtraction, and multiplication/division. Reliability. An indication of the consistencyof a scale. A reliable scale is intemally consistentand stableover time. Robust. A test is said to be robust if it continuesto provide accurateresultseven after the violationof someassumptions. Significance. A difference is said to be significant if the probability of making a Type I error is less than the acceptedlimit (normally 5%). If a difference is significant, the null hypothesisis rejected. Skew. The extent to which a distribution is not symmetrical.Positive skew has outliers on the positive(right) sideof the distribution.Negativeskew hasoutlierson the negative(left) sideof the distribution. Standard Deviation. A measureof dispersion representinga special type of average deviation from the mean.
I 15
AppendixC Glossary
Standard Error of Estimate.The equivalentof the standarddeviationfor a regression line with a standard line.The datapointswill be normallydistributedaroundthe regression deviationequalto the standarderrorof the estimate. StandardNormal Distribution. A normaldistributionwith a meanof 0.0 anda standard deviation of 1.0. String Variable. A stringvariablecancontainlettersandnumbers.Numericvariablescan containonly numbers.Most SPSScommands will not functionwith stringvariables. Temporal Stability. This is achievedwhen reliability measureshave determinedthat scoresremainstableovermultipleadministrations of the instrument. Tukey's HSD. A post-hoccomparisonpurportedto reveal an "honestly significant difference"(HSD). erroneouslyrejectsthe null Type I Error. A Type I error occurswhen the researcher hypothesis. fails to rejectthe erroneously Type II Error. A Type II erroroccurswhenthe researcher null hypothesis. Valid Data.DatathatSPSSwill usein its analyses. Validity. An indicationof theaccuracyof a scale. Variance.A measure deviation. of dispersion equalto thesquaredstandard
l16
AppendixD
SampleDataFilesUsedin Text A varietyof smalldatafiles areusedin examplesthroughoutthis text. Hereis a list of whereeachappears. COINS.sav Variables:
COINI COIN2
Enteredin Chapter7 GRADES.sav Variables:
PRETEST MIDTERM FINAL INSTRUCT REQUIRED
Enteredin Chapter6 HEIGHT.sav Variables:
HEIGHT WEIGHT SEX
Enteredin Chapter4 QUESTIONS.Sav Variables: Ql Q2 (recodedin Chapter2) Q3 TOTAL (addedin Chapter2) GROUP(addedin Chapter2) Enteredin Chapter2 Modifiedin Chapter2
tt7
AppendixD SampleDataFilesUsedin Text
RACE.sav Variables:
SHORT MEDIUM LONG EXPERIENCE
Enteredin Chapter7 SAMPLE.sav Variables:
ID DAY TIME MORNING GRADE WORK TRAINING (addedin Chapter1)
Enteredin ChapterI Modifiedin ChapterI SAT.sav Variables:
SAT GRE GROUP
Enteredin Chapter6 Other Files For somepracticeexercises, seeAppendixB for neededdatasetsthatarenot usedin any otherexamplesin thetext.
l18
AppendixE
Informationfor Users of EarlierVersionsof SPSS
r{ ril
Thereare a numberof differences betweenSPSS15.0and earlierversionsof the software.Fortunately,mostof themhavevery little impacton usersof this text. In fact, mostusersof earlierversionswill be ableto successfully usethis text withoutneedingto reference this appendix. Variable nameswere limited to eight characters. Versionsof SPSSolderthan 12.0arelimitedto eight-character variablenames.The othervariablenamerulesstill apply.If you areusingan olderversionof SPSS,you needto makesureyou useeightor fewerlettersfor yourvariablenames. The Data menu will look different. The screenshotsin the text where the Data menu is shown will look slightly different if you are using an older version of SPSS.These missing or renamedcommandsdo not have any effect on this text, but the menusmay look slightly different. If you are using a version of SPSS earlier than 10.0,the Analyzemenu will be called Statistics instead.
Dsta TlrBfqm
rn4aa€ getu
t
oe&reoacs,., r
hertYffd,e l|]lrffl Clsl*t
GobCse",,
Graphing functions. Prior to SPSS12.0,the graphingfunctionsof SPSSwerevery limited.If you are using a version of SPSSolder than version 12.0, third-partysoftwarelike Excel or of graphs.If you areusingVersion14.0of for theconstruction SigmaPlotis recommended graphing. to Chapter4, whichdiscusses thesoftware,useAppendixF asan alternative
l 19
Appendix E Information for Usersof Earlier Versionsof SPSS
Variableiconsindicatemeasurement type. In versions of SPSS earlier than 14.0,variableswere represented in dialog boxes with their variable label and an icon that represented whether the variable was string or numeric (the examplehere shows all variablesthat were numeric). Starting with Version 14.0, SPSS shows additional information about each variable. Icons now representnot only whether a variable is numeric or not, but also what type of measurementscale it is. Nominal variables are representedby the & icon. Ordinal variables are representedby the dfl i.on. Interval and ratio variables(SPSSrefers to them as scale variables) are represented bv the d i"on.
f Mlcrpafr&nlm /srsinoPtrplecnft / itsscpawei [hura /v*tctaweir**&r /TimanAccAolao 1c I $ Cur*ryU Orlgin ClNrmlcnu clrnaaJ dq$oc.t l"ylo.:J It liqtq'ftc$/droyte
sqFr*u.,| 4*r.-r-l qql{.,I SeveralSPSSdata filescan now be openat once. Versionsof SPSSolder than 14.0 could have only one data file open at a time. Copying data from one file to another entailed a tedious process of copying/opening files/pastingletc.Starting with version 14.0, multiple data files can be open at the same time. When multiple files are open,you can selectthe one you want to work with using the Windowcommand.
md*
Heb
t|sr $$imlzeA[ Whdows lCas,sav [Dda*z] - S55 DctoEdtry
t20
AppendixF
GraphingDatawithSPSS13.0and14.0 This appendixshouldbe usedas an alternativeto Chapter4 whenyou are using SPSS13.0or 14.0.Theseproceduresmay alsobe usedin SPSS15.0,if desired,by selectingLegacyDialogsinsteadof ChortBuilder. GraphingBasics In addition to the frequency distributions,the measuresof central tendency and measuresof dispersiondiscussedin Chapter3, graphing is a useful way to summarize,organize,and reduceyour data. It has been said that a picture is worth a thousandwords. In the caseof complicateddatasets,that is certainlytrue. With SPSSVersion 13.0 and later,it is now possibleto make publication-quality graphsusing only SPSS.One importantadvantageof using SPSSinsteadof other software to createyour graphs(e.9., Excel or SigmaPlot)is that the data have alreadybeen entered. Thus, duplicationis eliminated,and the chanceof making a transcriptionerror is reduced. Editing SP,S,SGraphs Whatever command you use to create your graph, you will probably want to do some editing to make it look exactly the way you want. In SPSS,you do this in much the sameway that you edit graphsin other software programs (e.9., Excel). In the output window, select your graph (thus creating handles around the outside of the entire object) and righrclick. Then, click SPSSChart Object, then click Open. Alternatively, you can double-click on the graph to open it for editing.
tl 9:.1tl 1bl rl .l @l kl D l ol rl : j
8L l
, l
t21
AppendixF GraphingDatawith SPSS13.0and 14.0
Whenyou openthe graphfor editing,theChartEditor HSGlHffry;'
window and the correspondingProperties window will appear.
I* -l
ql ryl
OnceChartEditoris open,you caneasilyedit eachelementof the graph.To select just click on therelevantspoton the graph.For example,to selectthe element an element, representing the title of the graph,click somewhere on the title (the word "Histogram"in theexamplebelow).
Once you have selectedan element,you can tell that the correctelementis selected becauseit will havehandlesaroundit. If the item you have selectedis a text element(e.g.,the title of the graph),a cursor will be presentand you can edit the text as you would in word processingprograms.If you would like to changeanotherattributeof the element(e.g., the color or font size), use the Propertiesbox (Text propertiesare shownabove).
t22
AppendixF GraphingDatawith SPSS13.0and 14.0
With a little practice,you can makeexcellentgraphs usingSPSS.Onceyour graphis formattedthe way you want it, simplyselectFile,thenClose. Data Set For the graphingexamples,we will usea new set of data.Enterthe databelowand savethe file as HEIGHT.sav. The data representparticipants' HEIGHT (in inches), WEIGHT(in pounds), andSEX(l : male,2= female). HEIGHT 66 69 73 72 68 63 74 70 66 64 60 67 64 63 67 65
WEIGHT 150 155 160 160 150 140 165 150 110 100 9 52 ll0 105 100 ll0 105
SEX r I I I l l I I 2 2 2 2 2 2 2
Checkthat you haveenteredthe datacorrectlyby calculatinga mean for eachof the threevariables(click Analyze,thenDescriptiveStatistics,thenDescriptives).Compare yourresultswith thosein thetablebelow. Descrlptlve Statlstlcs
std. Minimum Maximum
N ntrt(,n I
'to
OU.UU
WEIGHT SEX
16 16
95.00 1.00
Valid N (listwise)
16
123
Deviation Mean I1.U U oo.YJ/0 J .9U O/ 165.00 129.0625 26.3451 2.00 .5164 1.s000
AppendixF GraphingDatawith SPSS13.0and 14.0
Bar Charts,PieCharts,and Histograms Description Bar charts,pie charts,and histogramsrepresentthe number of times each scoreoccurs by varying the height of a bar or the size of a pie piece.They are graphicalrepresentations of the frequencydistributionsdiscussedin Chapter3. Drawing Conclusions The Frequenciescommandproducesoutput that indicatesboth the number of cases in the sample with a particular value and the percentageof caseswith that value. Thus, conclusionsdrawn should relate only to describing the numbers or percentagesfor the sample.If the dataare at leastordinal in nature,conclusionsregardingthe cumulativepercentages and/orpercentilescan alsobe drawn. SPSS Data Format You needonlv one variableto usethis command. Running the Command The Frequenciescommand will produce I gnahao Qr4hs $$ities graphical frequency distributions.Click Analyze, Regorts then Descriptive Statistics,then Frequencies.You @ will be presentedwith the main dialog box for the : lauor Comps?U6ar1s Frequenciescommand, where you can enter the ' Eener.lllnEarlrlodsl variables for which you would like to create r'txedModds Qorrelate graphsor charts.(SeeChapter3 for other options availablewith this command.)
tro ps8l 8"". I
i
BrsirbUvss,;. Explorc.,, 9or*abr,.,
) ) ;
&dtlo...
Click the Chartsbuttonat the bottom to produce frequencydistributions. Charts This will give you the Frequencies: dialogbox.
."-41
E
C hartTl p- " .' ---* ,
c[iffi
(e : q,l s r ur ' sachalts l* rfb chilk
. kj -i :
,i .,
1
,,,, I , j
r llrroclu* ,' There are three types of charts under this command: Bar charts, Pie charts, andHistograms.For each type, the f axis can be either a frequency count or a percentage(selectedthrough the Chart Valuesoption). You will receivethe chartsfor any variablesselectedin the main Frequenciescommanddialog box. 124
I C " ,tt "5l
.u-,,,,:, l- :+;:'.n
.
'1 i
rChatVdues' , 15 Erecpencias l -,
: f
Porcer*ag$
:
i - - ::- ,-
AppendixF GraphingDatawith SPSS13.0and 14.0
Output The bar chart consistsof a Y axis, representing the frequency, and an X axis, representing each score. Note that the only values representedon the X axis are those with nonzero frequencies(61, 62, and 7l are not represented).
hrleht
1.5
c a f s a L
1.0
66.00 67.00 68.00 h.llht
: ea t 61@ llil@ I 65@ os@ I r:@ o 4@ g i9@ o i0@ a i:@ B rro o :aa
ifr
The pie chart shows the percentageof the whole that is representedby eachvalue.
The Histogramcommandcreatesa groupedfrequencydistribution.The range of scores is split into evenly spaced groups.The midpoint of each group is plotted on the X axis, and the I axis represents the numberof scoresfor each group. If you selectLl/ithNormal Curve,a normal curve will be superimposed over the distribution.This is very useful for helping you determineif the distribution you haveis approximately normal.
M.Sffi td. 0.v. . 1.106?: ff.l$ h{ghl
Practice Exercise Use PracticeData Set I in Appendix B. After you have enteredthe data,constructa histogramthat representsthe mathematicsskills scoresand displaysa normal curve, and a bar chart that representsthe frequenciesfor the variableAGE.
t25
AppendixF GraphingDatawith SPSS13.0and 14.0
Scatterplots Description Scatterplots(also called scattergramsor scatterdiagrams)display two values for eachcasewith a mark on the graph.The Xaxis representsthe value for one variable.The / axis representsthe value for the secondvariable. Assumptions Both variablesshouldbe interval or ratio scales.If nominal or ordinal data are used,be cautiousaboutyour interpretationof the scattergram. .SP,t^tData Format You needtwo variablesto perform this command. Running the Command
I gr"pl,u Stllities Add-gns :
You can produce scatterplotsby clicking Graphs, then I Scatter/Dot.This will give you the first Scatterplotdialog box. i Selectthe desiredscatterplot(normally,you will select Simple ! Scatter),then click Define.
xl
*t
m m
Simpla Scattal 0veday Scattel
ffi ffi
Mahix Scattel
ffi sittpt" lelLl Oot
li Definell -
q"rylJ , Helo I
3,0 Scatter
ffil;|rlil',r
m s.fd
,-
9dMrk6by
,-
L*dqs.bx
| ,J l--
IJ T-P*lb
8off,
rl[I
ti ...:
C*trK
rI[I l*tCrtc
3*J qej *fl This will give you the main Scatterplot dialog box. Enter one of your variablesas the I axis and the secondas the X axis. For example, using the HEIGHT.sav data set, enter HEIGHT as the f axis and WEIGHT as the X axis. Click
oK.
- -' '
I'u;i** L":# .
126
AppendixF GraphingDatawith SPSS13.0and 14.0
Output X andI levels. Theoutputwill consistof a markfor eachsubjectat theappropriate 74.00
r20.m
Adding a Third Variable Even though the scatterplotis a twodimensionalgraph,it canplot a third variable. To makeit do so,enterthethird variablein the SetMarkers by field.In our example,we will enterthe variable SEX in the ^SelMarkers by space. Now our outputwill havetwo different sets of marks. One set representsthe male participants, the and the secondset represents femaleparticipants. Thesetwo setswill appear in differentcolorson your screen.You canuse the SPSSchart editor to make them different shapes, asin thegraphthatfollows.
t27
fa, I e- l at.l ccel
x*l
AppendixF GraphingDatawith SPSS13.0and 14.0
Graph sox aLm o 2.6 72.00
70.00
58.00
c .9 I
s
56.00
o 64.00
oo o
52.00
!00.00
120.m
110.@
r60.00
wrlght
Practice Exercise UsePracticeDataSet2 in AppendixB. Constructa scatterplot to examinethe relationshipbetweenSALARY andEDUCATION. Advanced Bar Charts Description You canproducebar chartswith theFrequencies command(seeChapter4, Section 4.3). Sometimes, however,we are interested in a bar chartwherethe I axis is not a frequency.To producesucha chart,we needto usetheBar Chartscommand. SPS,SData Format At leasttwo variablesare neededto performthis command.Thereare two basic kinds of bar charts-thosefor between-subjects designsand thosefor repeated-measures designs.Usethe between-subjects methodif onevariableis the independentvariableand the otheris the dependentvariable.Use the repeated-measures methodif you havea dependentvariable for eachvalueof the G"dr tJt$ths Mdsns €pkrv independentvariable(e.g.,you wouldhavethreevariablesfor a IrfCr$We designwith three valuesof the independentvariable). This ) Map normallyoccurswhenyou takemultipleobservations overtime.
128
]1
AppendixF GraphingDatawith SPSS13.0and 14.0
Running the Command Click Graphs,thenBar for eithertypeof bar chart. This will openthe Bar Chartsdialog box. If you haveone independentvariable, selectSimple.If you have more thanone,selectClustered. design,select If you are usinga between-subjects Summariesfor groups of cases.If you are using a repeated-measures design, select Summariesof separate variables. graph,you If you are creatinga repeatedmeasures will seethe dialog box below.Move eachvariableoverto theBars Representarea,and SPSSwill placeit insideparentheses followingMean.This will give you a graphlike the one below at right. Note that this exampleusesthe dataenteredin Section6.4 (Chapter6). GRADES.sav
lr# I t*ls o rhd
Practice Exercise Use PracticeDataSet I in AppendixB. Constructa bar graphexaminingthe relaskills scoresandmaritalstatus.Hint: In the BarsRepresent tionshipbetweenmathematics area.enterSKILL asthevariable.
t29