REVIEW ARTICLE
Sports Med 2010; 40 (7): 525-537 0112-1642/10/0007-0525/$49.95/0
© 2010 Adis Data Information BV. All rights reserved.
Qualitative Attributes and Measurement Properties of Physical Activity Questionnaires
A Checklist

Caroline B. Terwee,1 Lidwine B. Mokkink,1 Mireille N.M. van Poppel,2 Mai J.M. Chinapaw,2 Willem van Mechelen2 and Henrica C.W. de Vet1

1 Department of Epidemiology and Biostatistics and the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
2 Department of Public and Occupational Health and the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
Contents
Abstract
1. The Quality Assessment of Physical Activity Questionnaire (QAPAQ) Checklist
2. QAPAQ Part 1: Appraising the Qualitative Attributes of Physical Activity (PA) Questionnaires
2.1 Construct
2.2 Setting
2.3 Recall Period
2.4 Purpose
2.5 Target Population
2.6 Justification
2.7 Format
2.8 Interpretability
2.9 Ease of Use
3. QAPAQ Part 2: Appraising the Measurement Properties of PA Questionnaires
3.1 General Issues
3.2 Reliability
3.2.1 Parameters of Measurement Error
3.2.2 Reliability Coefficients
3.3 Validity
3.3.1 Face Validity
3.3.2 Content Validity
3.3.3 Floor or Ceiling Effects
3.3.4 Construct Validity
3.4 Responsiveness
4. Discussion
5. Future Recommendations
6. Conclusion
Abstract
The large number of available physical activity (PA) questionnaires makes it difficult to select the most appropriate questionnaire for a certain purpose. This choice is further hampered by incomplete reporting and unsatisfactory evaluation of the content and measurement properties of the questionnaires. We provide a checklist for appraising the qualitative attributes and measurement properties of PA questionnaires, as a tool for selecting the most appropriate PA questionnaire for a certain target population and purpose. The checklist is called the Quality Assessment of Physical Activity Questionnaire (QAPAQ). This review is one of a group of four reviews in this issue of Sports Medicine on the content and measurement properties of physical activity questionnaires. Part 1 of the checklist can be used to appraise the qualitative attributes of PA questionnaires, i.e. the construct to be measured by the questionnaire, the purpose and target population for which it was developed, the format, interpretability and ease of use. Part 2 of the checklist can be used to appraise the measurement properties of a PA questionnaire, i.e. reliability (parameters of measurement error and reliability coefficients), validity (face and content validity, criterion validity and construct validity) and responsiveness. The QAPAQ can be used to select the most appropriate PA questionnaire for a certain purpose, but it can also be used to design or report a study on measurement properties of PA questionnaires. Using such a checklist will contribute to improving the assessment, reporting and appraisal of the content and measurement properties of PA questionnaires.
This review is one of a group of four reviews in this issue of Sports Medicine on the content and measurement properties of physical activity (PA) questionnaires.[1-3] Accurate measurement of PA is important for all studies on PA, such as those identifying causal relations between PA and health outcomes, assessing the prevalence of and differences in PA between individuals, monitoring changes in PA after interventions, and formulating public health recommendations.[4] Accurate measurement means that PA instruments should be adequately designed and described and should have adequate measurement properties, i.e. reliability, validity and responsiveness. If the measurement properties are poor, the risk of misclassification and biased results is high.[5,6] Questionnaires are relatively inexpensive and can be self-administered, which makes them the most suitable method for assessment of PA in large populations.[7] Many different PA questionnaires exist.[1-3] Questionnaires differ in their qualitative attributes, i.e. the construct that is being measured
(e.g. energy expenditure), the setting, the recall period, the justification, the purpose and target population, the format, interpretability and ease of use. The variety of available questionnaires makes it difficult to select the most appropriate questionnaire for a specific purpose. This choice is further hampered by incomplete reporting of these qualitative attributes. In particular, the construct, purpose and format are often incompletely described. Inadequate reporting of the qualitative attributes of a questionnaire impedes an adequate appraisal of its validity and applicability. Selection of a questionnaire should also be based on its measurement properties. However, many questionnaires have only been partly tested for measurement properties, and some not at all.[1-3] The methods used to assess the measurement properties vary in content as well as in quality. Many studies have methodological limitations, such as small sample size, shortcomings in the design, or inappropriate statistical analyses. In addition, reporting of methods and statistical
analyses is often incomplete, which impedes a critical appraisal of the results.

The aim of this review is to provide a checklist for appraising the qualitative attributes and measurement properties of PA questionnaires. The checklist is called the Quality Assessment of Physical Activity Questionnaire (QAPAQ) and can be used as a tool for selecting the most appropriate PA questionnaire for a certain purpose.

1. The Quality Assessment of Physical Activity Questionnaire (QAPAQ) Checklist

The QAPAQ was developed using several sources. First, we used ideas from Feinstein on what he called ‘sensibility’, i.e. the face validity and other qualitative attributes of a questionnaire.[8] Second, we used a checklist developed by Terwee et al., containing criteria for adequate measurement properties of health status measures.[9] Third, we used preliminary results from the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) Delphi study, which aims to develop a checklist for assessing the methodological quality of studies on measurement properties of patient-reported outcomes.[10] Fourth, we used input from several previous publications on the measurement of physical activity.[4,7,11-15] Finally, we used our experiences in appraising the quality of PA questionnaires in three systematic reviews.[1-3]

2. QAPAQ Part 1: Appraising the Qualitative Attributes of Physical Activity (PA) Questionnaires

The choice of a suitable PA questionnaire depends, to a large extent, on its qualitative attributes. When choosing a PA questionnaire, it is important to have a clear description of what a questionnaire intends to measure and for what purpose and target population it was developed. This determines the content of a questionnaire in terms of type, frequency, duration and intensity of PA to be measured. Furthermore, it is useful to know what the questionnaire looks like and how it should be used and interpreted. Table I summarizes the qualitative attributes of PA questionnaires.

Table I. Quality Assessment of Physical Activity Questionnaire: Part 1. A checklist for the appraisal of the qualitative attributes of a physical activity (PA) questionnaire

| Property | Definition |
|---|---|
| 1. Construct | What is the construct that the questionnaire intends to measure (e.g. energy expenditure, mechanical loading, walking)? |
| 2. Setting | In what setting is PA measured (e.g. work, transport, leisure time)? |
| 3. Recall period | What is the recall period to which PA is referred (e.g. past week, usually)? |
| 4. Purpose | What is the purpose of the questionnaire (i.e. discriminative, evaluative or predictive)? |
| 5. Target population | For what kind of people was the questionnaire originally developed (e.g. age, sex, health status)? |
| 6. Justification | Why is this questionnaire needed and why is it superior to analogous questionnaires that may already exist? |
| 7. Format | Are the number of questions, the number and type of response categories and the scoring algorithm clearly described? |
| 8. Interpretability | Is there any information available on the interpretability of scores, e.g. are (mean/median and SD/range) scores and change scores available for relevant groups, e.g. age and sex groups from the general population? Is it known what the MIC in scores on the questionnaire is? |
| 9. Ease of use | Are the time and effort required to complete the questionnaire acceptable? Is it known how a full copy of the questionnaire can be obtained? Are clear instructions given for those who need to complete the questionnaire? |

MIC = minimal important change.

2.1 Construct
The construct refers to a description of what it is that a questionnaire intends to measure. A clearly defined and reported construct enables the evaluation of the validity of the questionnaire and facilitates choosing the most appropriate questionnaire for a specific purpose.[11] Many questionnaires intend to measure energy expenditure (e.g. the International Physical Activity Questionnaire [IPAQ][16] or the Arizona Activity Frequency questionnaire[17]). However, some questionnaires intend to measure other constructs, such as habitual physical activity,[18] mechanical loading[19] or walking.[20]

2.2 Setting
The setting refers to where PA is being measured. Some questionnaires intend to measure total PA – for example work/school, transport and leisure time[21] – while others only intend to measure leisure-time PA[22] or only occupational PA.[23]

2.3 Recall Period
The recall period refers to the time period to which the questions refer. There is no consensus on what the most appropriate recall period is. This depends on the construct to be measured and the purpose of the study. Many questionnaires refer to the ‘past week’. Other questionnaires refer to a ‘usual week’ instead of ‘past week’, to measure a more general PA pattern. The developers of the IPAQ tested two versions, one referring to the ‘past week’ and one referring to a ‘usual week’. They found that interpretation of a ‘usual week’ was sometimes problematic, as participants were not able to identify what is ‘usual’. Therefore, they concluded that the ‘past week’ version was better.[16] Other questionnaires, for example the modified Historical Leisure Activity Questionnaire,[24] refer to ‘lifetime’ PA.

2.4 Purpose
The purpose can be (i) discrimination (e.g. to classify people into [sufficiently] active or inactive groups or to assess prevalence and differences in PA patterns between individuals); (ii) evaluation (e.g. to monitor PA patterns over time or to evaluate the effect of interventions); or (iii) prediction of health outcomes (e.g. to predict bone health at old age). Different purposes may require different questions to be asked and may also require a different validation approach.[25] For example, for discrimination, reliability is important, while for the evaluation of the effect of PA interventions, responsiveness is more important.

2.5 Target Population
It is important to know for whom the questionnaire was developed (e.g. adults, children, the elderly, obese people, patients with cardiovascular disease or pregnant women). This determines the content and applicability of the questionnaire. For instance, a questionnaire developed for children may contain questions about playing outside, active transport to school and physical education lessons at school, which are not relevant for elderly people. Also, age, sex, level of education and cultural aspects should be taken into account.[7]

2.6 Justification
When a new questionnaire is published, it is useful to know why this questionnaire is needed and why it is superior to questionnaires that may already exist. The same holds for modifications of existing questionnaires that are already published. The many questionnaires (and versions of questionnaires) currently in use hamper the interpretation and comparison of study results.

2.7 Format
When a new questionnaire is published or a modification of a questionnaire is used, its format should be clearly described in terms of the number of questions, the number and type of response categories and the scoring algorithm. One could also refer to a website where this information can be found.

2.8 Interpretability
Interpretability has been defined as the degree to which one can assign qualitative meaning to an instrument’s quantitative scores;[26] specifically, clinical or commonly understood connotations. Although many PA scores have intuitive meaning, because they are expressed in MET per week or minutes of PA per week, some scores are more difficult to interpret. For instance, the Baecke questionnaire is scored from 1.0 to 5.0 points.[18] On such a scale it is not directly clear what a score of, for example, 4.1 points means. Difficult interpretation hampers the suitability of a questionnaire. Interpretation is facilitated by reports of means and standard deviations, medians and ranges or proportions with confidence intervals of scores of the populations in which a questionnaire is evaluated or used. Especially helpful are scores of different age and sex groups or scores from groups of people who differ in PA patterns, which can serve as reference values against which other scores are compared. For example, data on PA patterns in diseased populations would be easier to interpret if they could be compared with data from a general population.

In addition, it is useful to know what the minimal change in score is over time that constitutes a meaningful change in PA, namely, the minimal important change (MIC). This is helpful for the interpretation of intervention studies and for sample size calculations of studies on PA interventions. A question to be answered is “what amount of increase in physical activity could be valued as relevant or important?” The MIC of PA scores might be different for the different constructs of PA. The MIC may be defined based on evidence of the amounts of PA needed to obtain a certain health effect. Studies on dose-response relationships are therefore of great importance.

2.9 Ease of Use
Ease of use refers to the amount of time and effort required from the person who completes the questionnaire. Clear instructions, such as definitions of light, moderate and vigorous activities and guidance on how to use lists of example activities, are useful. Altschuler et al.[5] identified several problems in the interpretation of PA questionnaires. They showed, for example, that people tend to interpret the intensity of PA in different ways.[5] Misinterpretation of questions may lead to misclassification, decreasing reliability and validity. Finally, it is recommended to indicate how a full copy of the questionnaire can be obtained.
3. QAPAQ Part 2: Appraising the Measurement Properties of PA Questionnaires

Selection of a questionnaire should also be based on its measurement properties. Table II summarizes the measurement properties of a PA questionnaire.

3.1 General Issues
For the appraisal of measurement properties it is important to know that the study was adequately performed. Therefore, a clear description is needed of the following aspects of the study: (a) study population (age, sex, country); (b) design of the study (e.g. sample size, version of the questionnaire that was used, time interval between administrations); (c) mode of administration (e.g. self-report, telephone, or interview-administered); (d) other instruments that were used for assessing validity, with a reference to their measurement properties; (e) statistical analyses performed.

The study population needs to be representative of the population in which the questionnaire is going to be used in the future.[7,11] The sample size of the study on measurement properties should be sufficient. As a rule of thumb, we consider a sample size of at least 50 subjects adequate, based on a general guideline by Altman.[27] Others have suggested sample sizes of 100–200 subjects.[7] For reliability studies, a sample size calculation can be performed.[28] For judging the adequacy of the sample size, it is helpful to report confidence intervals around, for example, reliability coefficients.

A clear description of the study population also helps to establish to what population and setting the results can be generalized. Measurement properties differ between populations and settings. This means that if a questionnaire has good reliability when administered as a self-report, it cannot be assumed that the same questionnaire will also have good reliability when administered in an interview. Similarly, it cannot be assumed that measurement properties can be generalized from one country to another or from a general population to a diseased population, etc.
Table II. Quality Assessment of Physical Activity Questionnaire: Part 2. A checklist for the appraisal of the measurement properties of physical activity (PA) questionnaires

| Property | Definition | Preferred method | Quality criteria |
|---|---|---|---|
| 1. General | | | Clear description of study population (age, sex, country); design (version of the questionnaire that was used, time between the measurements, etc.); administration form (self-report or interview-administered, completed with or without assistance); other instruments that were used for assessing validity, with a reference to their measurement properties; statistical analyses. Adequate sample size: n ≥ 50 |
| 2. Reliability | The degree to which the measurement is free from measurement error | Design: at least two measurements; independent measurements; similar measurement conditions; appropriate time interval | For past/usual wk, past y PA: 1 day to 3 mo; for lifetime PA: 1 day to 1 y |
| 2a measurement error | The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured | Statistical method: LOA, SEM, SDC (SDC = 1.96 × √2 × SEM) | MIC outside the LOA; SDC < MIC |
| 2b reliability | The proportion of the total variance in the measurements which is due to ‘true’ differences between patients | Statistical method: ICC; Kappa | ICC ≥ 0.70; Kappa ≥ 0.70 |
| 3. Validity | The degree to which an instrument truly measures the construct it purports to measure | | |
| 3a face validity | The degree to which the items of an instrument indeed look as though they are an adequate reflection of the construct to be measured | Is the information being asked in a way that will evoke an accurate answer? Does the combination of items into scores make sense? Are the items comprehensible? | |
| 3b content validity | The degree to which the content of an instrument is an adequate reflection of the construct to be measured | Is the questionnaire comprehensive, i.e. are all relevant activities included? Are frequency, duration and intensity being addressed? Is the amount of detail regarding frequency, duration, intensity and the recall period appropriate for the chosen setting, construct, purpose and the kind of subjects for whom the questionnaire is developed? Is a justification provided for the choices? Are there any important questions missing? Are any unsuitable questions included? Are the scales used to score the questions not too coarse or too fine? Are the relative weights assigned to different questions in the calculation of a total score sensible? | |
| 3c floor and ceiling effects | The number of respondents who achieved the lowest or highest possible score | | ≤15% of the respondents achieved the highest or lowest possible scores |
| 3d construct validity | | Expected correlation (r) with other instruments that measure closely related constructs, e.g.: | |
| | | total energy expenditure: doubly labelled water | r ≥ 0.70 |
| | | total PA: accelerometer total counts | r ≥ 0.50 |
| | | vigorous PA: accelerometer vigorous activity time | r ≥ 0.50 |
| | | moderate PA: accelerometer moderate activity time | r ≥ 0.50 |
| | | walking: pedometer or accelerometer walking time | r ≥ 0.70 |
| | | leisure-time PA: accelerometer leisure-time activity | r ≥ 0.50 |
| | | occupational PA: direct observation | r ≥ 0.60 |
| 4. Responsiveness | The ability of an instrument to detect change over time in the construct to be measured | Correlation of changes in scores on the questionnaire with changes in scores on other instruments that measure closely related constructs; as for construct validity | |

ICC = intraclass correlation coefficient; LOA = limits of agreement; MIC = minimal important change; SDC = smallest detectable change; SEM = standard error of measurement.

3.2 Reliability
Reliability is the degree to which the measurement is free from measurement error.[29] This refers to the extent to which scores for patients who have not changed are the same for repeated measurements. Reliability should be assessed using at least two administrations, either collected from the same person (test-retest), from the same interviewer (intra-rater) or from a different interviewer (inter-rater). The administrations should be independent from each other and performed under the same conditions. The time interval between the test and retest should be long enough to prevent recall of previous answers, though short enough to ensure that PA patterns have not changed. The optimal time interval depends on the construct to be measured and the recall period of the questionnaire. For example, for measuring PA during the past week, usual week or past year, a time interval of 1 day to 3 months may be considered appropriate.[7] However, when seasonal variation is expected, a time interval of 3 months might be too long. For measuring lifetime PA a longer time interval, up to 1 year, might be appropriate. It might be argued that one should ask about the same recall period (i.e. exact same days) twice because PA patterns vary from day to day. However, one could also argue that this natural variation should be included in the measurement error because it will also affect the measurement of change, for example after an intervention. Below we discuss two useful types of reliability parameters: parameters of measurement error and reliability coefficients.[30]

3.2.1 Parameters of Measurement Error
Measurement error is the systematic and random error of a subject’s score that is not attributed to true changes in the construct to be measured. Parameters of measurement error assess how close the scores on repeated administrations are, expressed in the unit of the questionnaire.[30] One useful parameter of measurement error is the limits of agreement (LOA), described by Bland and Altman.[31] The LOA are defined as the mean change in scores of repeated measurements (or the mean difference in scores between raters) ± 1.96 × SD of this change (or difference; SDchange). The LOA indicate that if a person completes a questionnaire twice, the second score could fall anywhere within these limits below or above the first score purely because of measurement error. Thus, only changes (or differences) larger than the LOA can be considered ‘true’ changes (or differences). Wilbur et al.[32] determined the LOA of their quantitative survey measuring energy expenditure in midlife women; for example, the LOA were −0.05 ± 2.25 MJ/day for leisure-time energy expenditure. Measurement error can also be expressed as the standard error of measurement (SEM) or the smallest detectable change (SDC) [see Appendix].[30] The measurement error should be smaller than the MIC to be able to measure changes in PA over time. Thus, the MIC should lie outside the LOA. Users of PA questionnaires should make their own judgement about the acceptability of the size of the measurement error of a PA questionnaire, considering the purpose of their measurements and the MIC.

3.2.2 Reliability Coefficients
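Before turning to reliability coefficients, the parameters of measurement error from Section 3.2.1 can be made concrete with a small numerical sketch. The Python snippet below is illustrative only: the test and retest scores are invented, the function names `measurement_error` and `mic_is_detectable` are hypothetical, and the SEM is derived as SDchange/√2, which is one common convention and an assumption here (the article defers the exact definition to its Appendix). The SDC follows the formula given in Table II, SDC = 1.96 × √2 × SEM.

```python
import math
import statistics


def measurement_error(test, retest):
    """Return mean change, limits of agreement, SEM and SDC for paired scores."""
    diffs = [second - first for first, second in zip(test, retest)]
    mean_change = statistics.mean(diffs)
    sd_change = statistics.stdev(diffs)  # sample SD of the change scores
    # Limits of agreement: mean change +/- 1.96 * SD of the change
    loa = (mean_change - 1.96 * sd_change, mean_change + 1.96 * sd_change)
    # Assumption: SEM derived from the SD of the change scores
    sem = sd_change / math.sqrt(2)
    # Table II: SDC = 1.96 * sqrt(2) * SEM
    sdc = 1.96 * math.sqrt(2) * sem
    return {"mean_change": mean_change, "loa": loa, "sem": sem, "sdc": sdc}


def mic_is_detectable(sdc, mic):
    """Quality criterion from Table II: the SDC should be smaller than the MIC."""
    return sdc < mic


# Hypothetical test and retest scores (e.g. hours of PA per week), invented for illustration
test = [10.0, 12.5, 8.0, 15.0, 9.5, 11.0]
retest = [11.0, 12.0, 9.0, 14.0, 10.5, 11.5]

result = measurement_error(test, retest)
print(result)
```

Under these conventions the SDC equals 1.96 × SDchange, i.e. the half-width of the LOA, so the criteria "SDC < MIC" and "MIC outside the LOA" lead to closely related comparisons whenever the mean change is near zero.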
A reliability coefficient reflects the proportion of the total variance in the measurements that is due to ‘true’ (i.e. consistent) differences between subjects (see Appendix). It concerns the degree to which subjects can be distinguished from each other, despite measurement error.[33] A high reliability is especially important for questionnaires that are used for discriminative purposes. The intraclass correlation coefficient (ICC) is the most adequate reliability parameter for continuous measures.[33] Many different ICCs can be calculated; therefore, it needs to be described which ICC was calculated (a two-way ANOVA model is preferred).[34] In addition, confidence intervals should be presented. As an example, Matton et al.[35] used a one-way ANOVA to calculate ICCs and 95% confidence intervals for assessing test-retest reliability of the Flemish physical activity computerized questionnaire. The Pearson correlation coefficient does not take systematic differences between the two measurements into account and therefore often overestimates reliability.[33] Skewed data should be transformed or categorized. For ordinal measures, the weighted Cohen’s Kappa coefficient should be used; the absolute percentage of agreement is inadequate, because it does not adjust for the agreement attributable to chance.[36] Often 0.70 is recommended as a minimum standard for reliability coefficients.[37]

3.3 Validity
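Before moving on to validity, the contrast drawn in Section 3.2.2 between the ICC and the Pearson correlation can be illustrated with a short sketch. The code below computes an ICC from a two-way ANOVA decomposition. The specific variant chosen, ICC(2,1) in Shrout and Fleiss's labelling (two-way random effects, absolute agreement, single measurement), is an assumption of this example rather than a recommendation taken from the article, and the scores are invented. The retest scores are simply the test scores plus 3, so the two administrations are perfectly correlated but differ systematically.

```python
import math


def icc_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one row per subject, one column per administration.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of administrations
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_admin = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_admin

    # Mean squares
    ms_subjects = ss_subjects / (n - 1)
    ms_admin = ss_admin / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_admin - ms_error) / n
    )


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: the retest is systematically 3 units higher than the test
test = [10.0, 12.0, 8.0, 15.0, 9.0, 11.0]
retest = [t + 3.0 for t in test]

r = pearson(test, retest)
icc = icc_agreement([[a, b] for a, b in zip(test, retest)])
print(f"Pearson r = {r:.3f}, ICC(2,1) = {icc:.3f}")
```

In this constructed case the Pearson correlation is perfect while the agreement ICC falls below the 0.70 standard mentioned above, because the systematic shift between administrations counts against agreement but not against correlation.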
Validity is the degree to which an instrument truly measures the construct(s) it purports to measure. Different aspects of validity can be distinguished, which have different design requirements and statistical approaches.

3.3.1 Face Validity
Face validity is the degree to which the items of an instrument indeed look as though they are an adequate reflection of the construct to be measured. Face validity often gets little attention, because it is a rather subjective and not transparent judgement, and cannot be measured statistically. Nevertheless, it is often the most important measurement property of a questionnaire. Important questions to answer are (i) is the information asked in a way that will evoke an accurate answer; (ii) does the combination of items into scores make sense; and (iii) are the items comprehensible?[8] The formulation of the questions should be as simple and transparent as possible. For example, the question “During the last 7 days, on how many days did you walk for at least 10 minutes at a time?” may be easier to answer than the question “How much time did you usually spend walking on the last 7 days?” Simple questions will increase reliability and validity. An indication of adequate face validity can be obtained by interviewing respondents, or asking them to think aloud while completing the questionnaire, to examine how well they understand the questions.

3.3.2 Content Validity
Content validity refers to the degree to which the content of an instrument is an adequate reflection of the construct to be measured. While face validity refers to the suitability of the overt features of a questionnaire, content validity refers to the suitability of the included individual questions.[8]
Content validity refers to comprehensiveness and relevance of the questions, specifically, whether all relevant questions are being asked and whether all questions that are being asked are relevant. It also refers to the degree to which all relevant activities are included in sufficient detail. The amount of detail regarding frequency, duration, intensity and the included activities should be appropriate for the chosen setting, construct, recall period, purpose and target population.[8] For example, when the purpose is to measure total energy expenditure, the type, frequency, duration and intensity of physical activity should be measured.[7] When the purpose is to classify patients into ‘active’ or ‘inactive’, less detail may be required. A justification from the developers of their choices regarding the inclusion of items can help users to appraise the comprehensiveness of a questionnaire. Other questions one could ask when appraising content validity are “Are there any unsuitable questions included?”, “Are the response options used to score the questions suitable and not too coarse or too fine?” or “Are the relative weights assigned to different questions in the calculation of a total score sensible?” When reviewing a questionnaire, it is also helpful to know how the questions were developed. For example, was a focus group conducted with patients to determine relevant aspects of PA, and was an expert panel used?

3.3.3 Floor or Ceiling Effects
Floor or ceiling effects are considered to be present if >15% of people have the lowest or highest possible score, respectively.[38] Floor or ceiling effects can, for example, affect PA questionnaires that are expressed as an ordinal activity score or expressed in hours per week. For instance, in the Flemish Physical Activity Computerized Questionnaire, the subscale ‘Tatransl’ describes the time (hours per week) spent in leisure-time active transportation (cycling and walking). In the study of Matton et al.[35] this score had a mean value of 1.73 with a standard deviation of 1.74. A score of 0 therefore lies about one standard deviation below the mean, so that, assuming an approximately normal distribution, about 16% of the respondents score 0 on this scale, indicating a floor effect. If many people have the same lowest or highest score they cannot be distinguished from each other, thus reliability is reduced. Responsiveness is also limited because a change in PA cannot be detected in people who already have the lowest or highest score. The distribution of ordinal activity scores gives insight into the presence of floor or ceiling effects.
3.3.4 Construct Validity
The highest level of evidence for validity would be obtained by comparing the PA questionnaire with a gold standard, i.e. an instrument that measures the same construct and has perfect reliability and validity (criterion validity). For PA there is no perfect gold standard.[7,39] Doubly labelled water (DLW) is often considered a gold standard for assessing total daily energy expenditure.[40] However, DLW is not a perfect gold standard because total daily energy expenditure as measured by the DLW technique reflects not only PA, but also the basal metabolic rate and the thermic effect of food. Furthermore, the DLW technique is not perfectly reliable and valid and it cannot distinguish between type, frequency and duration of activities. Therefore, one has to rely on assessing ‘construct validity’. This could be done by comparing the PA questionnaire with other (validated) instruments that measure closely related constructs, e.g. accelerometers, and by testing predefined hypotheses about expected relationships between the measures. In table II the most suitable comparison instruments are described for a number of PA constructs. The more similar the constructs that are being compared, the more evidence is provided for validity.[11] For example, for a questionnaire that aims to measure total PA, the best available validation design is currently to compare the total score with total counts per day of an accelerometer. For a PA questionnaire that aims to measure only vigorous activities, a comparison with total counts per day is less suitable. In this case, it is more appropriate to compare with daily minutes of vigorous activity derived from accelerometer data using a cut-off point of a certain number of counts per minute. This was done in the study by Brown et al.,[41] who compared self-reported total weekly minutes spent in vigorous leisure activity with weekly minutes spent in vigorous leisure activity as measured with an accelerometer during the same week, based on published cut-off values for vigorous counts. For questionnaires that aim to measure occupational PA, comparison with observations at the workplace is considered the best method. For questionnaires that aim to measure walking, a comparison with a pedometer or with walking counts of an accelerometer is considered best. There is no consensus on how high correlations should be to demonstrate adequate validity.[7] In table II we provide some rules of thumb that we used in our systematic reviews.[1-3] Higher correlations are to be expected when the constructs that are being compared are more similar. It is therefore important that hypotheses about expected correlations are defined in advance when testing validity.
3.4 Responsiveness
Responsiveness is the ability of an instrument to detect change over time in the construct to be measured. Responsiveness is an important aspect of validity, in a longitudinal context. While validity refers to the validity of a single score, responsiveness refers to the validity of a change score.[42] Responsiveness should be assessed by two administrations of the PA questionnaire. Between the administrations at least some of the subjects should have changed their PA to a relevant degree. Analogous to construct validity, responsiveness can be assessed by comparing changes in the PA questionnaire with changes in other instruments that measure closely related constructs and by testing hypotheses about expected correlations. The approach is the same as that used to assess validity, except that change scores are compared instead of single scores. Another possible approach is to examine how well the questionnaire can distinguish between people who have changed and people who have not changed, based on some external criterion; for example, by comparing a training group with a control group. This can be examined by drawing a receiver operating characteristics (ROC) curve, in which sensitivity is plotted against 1 − specificity for each possible cut-off value on the PA questionnaire. The area under the ROC curve (AUC) is a useful measure of the ability of the questionnaire to distinguish people who have changed from those who have not changed.[43] An AUC of at least 0.70 is often considered adequate.
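The ROC approach to responsiveness described above can be illustrated with a minimal Python sketch. It is not taken from the QAPAQ checklist: the function names and the change scores for the training and control groups are hypothetical, and the trapezoidal rule is only one common way to obtain the AUC.

```python
def roc_points(changed, unchanged):
    """Return (1 - specificity, sensitivity) pairs, one per possible cut-off.

    A person is classified as 'changed' when the change score on the
    PA questionnaire is at or above the cut-off.
    """
    cutoffs = sorted(set(changed) | set(unchanged), reverse=True)
    points = [(0.0, 0.0)]  # cut-off above every observed score
    for cutoff in cutoffs:
        sensitivity = sum(s >= cutoff for s in changed) / len(changed)
        one_minus_specificity = sum(s >= cutoff for s in unchanged) / len(unchanged)
        points.append((one_minus_specificity, sensitivity))
    return points


def area_under_curve(points):
    """Area under the ROC points, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical change scores (minutes of PA per week, follow-up minus baseline).
training_change = [30, 45, 10, 60, 25]   # external criterion: changed
control_change = [5, -10, 20, 0, 15]     # external criterion: not changed

points = roc_points(training_change, control_change)
auc = area_under_curve(points)
print(f"AUC = {auc:.2f}; adequate by the 0.70 rule of thumb: {auc >= 0.70}")
```

In practice the external criterion (here, group membership) has to be credible in its own right; the sketch only shows how the curve and its area follow from the two sets of change scores.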
4. Discussion
In this review we present the QAPAQ as a tool for selecting a PA questionnaire for a certain purpose, by appraising the qualitative attributes and measurement properties of PA questionnaires. There is no single most appropriate questionnaire to measure PA.[44] The choice of a suitable PA questionnaire depends on the construct of interest, the purpose and target population, and the qualitative attributes and measurement properties of the available questionnaires. Sometimes it can be difficult to decide whether the qualitative attributes are ‘adequately’ or ‘clearly’ described, since no concrete criteria are available. This should be decided by the user of the checklist. The advantage of using a checklist is that all relevant aspects are being considered. Not all studies may have assessed all measurement properties, and therefore sometimes not all parts of the checklist can be completed. Using the checklist may indicate lack of research on certain measurement properties that could or should be performed in the future. For instance, using the checklist in our systematic reviews[1-3] showed that responsiveness of PA questionnaires was rarely studied. The checklist can also be used as an aid in designing or reporting a study on measurement properties of PA questionnaires. Furthermore, it can be used by reviewers or editors to appraise the conduct and reporting of studies on the measurement properties of PA questionnaires.
Internal consistency was not included in the checklist. Internal consistency is the degree of the inter-relatedness among the items of a questionnaire, generally assessed by Cronbach’s alpha.[45] It is an important measurement property for uni-dimensional scales, consisting of homogeneous items that all reflect the construct to be measured.[46,47] These items should be highly correlated. Internal consistency is not relevant for PA questionnaires because items refer to different aspects of the construct, for example, duration versus frequency or sports versus work. These items do not need to be highly correlated.
There is increasing consensus on the methodology of assessing measurement properties of measurement instruments.[48] However, there is less consensus on criteria for what constitutes adequate measurement properties. The criteria that we presented here should therefore be considered as useful rules of thumb, but researchers may want to make their own choices.
Systematic reviews show that the number of available PA questionnaires is large. We found 61 (versions of) questionnaires for children, 83 for adults and 13 for the elderly.[1-3] Many of them have been developed for a specific study and have been used and evaluated only once or were modified thereafter. Most of these questionnaires have only partly been tested for their measurement properties, and some not at all. Measurement properties were often unsatisfactory. PA studies are hampered by lack of consistency with respect to the approach used to measure PA and lack of knowledge about the measurement properties of the questionnaires. Comparison between studies is difficult because many different questionnaires are used. More effort should be put into the improvement and validation of the most promising questionnaires. These are the questionnaires with a clearly defined construct, purpose and target population and with good content validity. Less effort should be put into the development of new questionnaires.
5. Future Recommendations
We recommend:
• Consideration regarding the description of the constructs being measured with PA questionnaires. In addition, when a new questionnaire is developed, it needs to be justified why this questionnaire is needed and why it is superior to questionnaires that may already exist.
• More attention to the content validity of PA questionnaires, i.e. the relevance and comprehensiveness of the content for the construct, purpose and target population of interest.
• High quality studies with larger sample sizes on the measurement properties of PA questionnaires. Study samples need to be representative of the populations in which the questionnaire is going to be used in the future.
• In assessing construct validity, the use of comparison instruments that measure similar constructs and have adequate measurement properties. Specific hypotheses should be defined and tested about expected correlations or differences between groups.
• More studies on responsiveness and interpretation of PA questionnaires.
• Discouraging researchers and journal editors from developing and publishing new (versions of) PA questionnaires without evidence that they are more appropriate than existing ones.
6. Conclusion
The QAPAQ is a tool for selecting a PA questionnaire for a certain purpose, by appraising the qualitative attributes and measurement properties of PA questionnaires. It can also be used to design or report a study on measurement properties of PA questionnaires. Using such a checklist may contribute to improving assessment, reporting and appraisal of the content and measurement properties of PA questionnaires.
Acknowledgements
The authors received no funding for the conduct of this study or the writing of this review. The authors have no conflicts of interest that are directly relevant to the content of this review.
Appendix
Parameters of Measurement Error and Reliability

Reliability
Intraclass Correlation Coefficient (ICC)

$$\mathrm{ICC} = \frac{\mathrm{var}_p}{\mathrm{var}_p + \mathrm{var}_t + \mathrm{var}_e} \qquad \text{(Eq. 1)}$$

Where $\mathrm{var}_p$ = variance between people; $\mathrm{var}_t$ = variance between time points; and $\mathrm{var}_e$ = random error. Equation 1 is a general formula for the intraclass correlation coefficient (ICC). Many different ICCs can be calculated. For test-retest reliability, a two-way random effects model is preferred. For more information about different ICCs see McGraw and Wong.[34]

Measurement Error
Standard Error of Measurement (SEM)

$$\mathrm{SEM} = \sqrt{\mathrm{var}_t + \mathrm{var}_e} \quad \text{or} \quad \mathrm{SEM} = \sqrt{\mathrm{var}_e} \qquad \text{(Eq. 2)}$$

In equation 2, SEM is an indication of the error of one single score, and can be used to calculate a confidence interval around a single score. Some people prefer to include the variance between time points ($\mathrm{var}_t$) in the SEM because they consider this variance part of the measurement error,[30] while others do not. Using equation 3, the SEM can be converted into the smallest detectable change (SDC):

$$\mathrm{SDC} = 1.96 \times \sqrt{2} \times \mathrm{SEM} \qquad \text{(Eq. 3)}$$

SDC reflects the smallest change in score in one person that can be interpreted as a ‘true’ change, i.e. beyond measurement error.[30] The SDC reflects the confidence interval around a single change score, thus a change score of one individual.[33] In research, where the interest is in mean changes in groups of people, the measurement error is reduced by a factor $\sqrt{n}$ (where n is the sample size). $\mathrm{SDC}_{\mathrm{group}}$ reflects the smallest mean change score in a group that can be interpreted as a ‘true’ change, beyond measurement error.[30] In equation 4, the $\mathrm{SDC}_{\mathrm{group}}$ reflects the confidence interval around a mean change score in a group.

$$\mathrm{SDC}_{\mathrm{group}} = \frac{1.96 \times \sqrt{2} \times \mathrm{SEM}}{\sqrt{n}} \qquad \text{(Eq. 4)}$$

Equation 5 presents limits of agreement (LOA).

$$\mathrm{LOA} = 1.96 \times \mathrm{SD}_{\mathrm{change}} \qquad \text{(Eq. 5)}$$

The LOA and SDC are the same because $\mathrm{SD}_{\mathrm{change}} = \sqrt{2} \times \mathrm{SEM}$ if $\mathrm{SEM} = \sqrt{\mathrm{var}_e}$.
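As an illustration of equations 1 to 5, the following Python sketch computes the ICC, SEM, SDC and SDC for groups from a small test-retest table. Several choices here are assumptions made for the example and are not specified in the appendix: the variance components are estimated from the mean squares of a two-way analysis of variance without replication, negative estimates are truncated at zero, and the data and function names are hypothetical.

```python
import math
import statistics


def variance_components(scores):
    """Estimate (var_p, var_t, var_e) from a people x time-points table.

    Assumption of this sketch: components come from two-way ANOVA mean
    squares (one score per person per time point); negative estimates
    are truncated at zero.
    """
    n = len(scores)       # number of people
    k = len(scores[0])    # number of time points (administrations)
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    time_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_people = k * sum((m - grand) ** 2 for m in person_means)
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_people - ss_time

    ms_people = ss_people / (n - 1)
    ms_time = ss_time / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    var_e = ms_error
    var_p = max((ms_people - ms_error) / k, 0.0)
    var_t = max((ms_time - ms_error) / n, 0.0)
    return var_p, var_t, var_e


def reliability_summary(scores, include_time_variance=True):
    """Apply equations 1 to 4 to a people x time-points table."""
    var_p, var_t, var_e = variance_components(scores)
    n = len(scores)
    icc = var_p / (var_p + var_t + var_e)                    # Eq. 1
    if include_time_variance:
        sem = math.sqrt(var_t + var_e)                       # Eq. 2, first form
    else:
        sem = math.sqrt(var_e)                               # Eq. 2, second form
    sdc = 1.96 * math.sqrt(2) * sem                          # Eq. 3
    sdc_group = sdc / math.sqrt(n)                           # Eq. 4
    return {"icc": icc, "sem": sem, "sdc": sdc, "sdc_group": sdc_group}


# Hypothetical test-retest scores: one row per person, [test, retest].
scores = [[1, 3], [2, 2], [4, 6], [5, 5]]
summary = reliability_summary(scores)
print(summary)

# Numerical check of the closing statement: SD_change = sqrt(2) x SEM
# when SEM = sqrt(var_e), so LOA (Eq. 5) and SDC (Eq. 3) coincide.
changes = [retest - test for test, retest in scores]
sd_change = statistics.stdev(changes)
sem_error_only = reliability_summary(scores, include_time_variance=False)["sem"]
print(1.96 * sd_change, 1.96 * math.sqrt(2) * sem_error_only)
```

With only two administrations the identity in the last check holds exactly, because the error mean square equals half the variance of the change scores; whether the variance between time points belongs in the SEM remains the judgement call described above.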
References
1. Chinapaw MJM, Mokkink LB, van Poppel MNM, et al. Physical activity questionnaires for youth: a systematic review of measurement properties. Sports Med 2010; 40 (7): 539-63
2. Forsen L, Waaler Loland N, Vuillemin A, et al. Self-administered physical activity questionnaires for elderly: a systematic review of measurement properties. Sports Med 2010; 40 (7): 601-23
3. van Poppel MNM, Chinapaw MJM, Mokkink LB, et al. Physical activity questionnaires for adults: a systematic review of measurement properties. Sports Med 2010; 40 (7): 565-600
4. Lagerros YT, Lagiou P. Assessment of physical activity and energy expenditure in epidemiological research of chronic diseases. Eur J Epidemiol 2007; 22 (6): 353-62
5. Altschuler A, Picchi T, Nelson M, et al. Physical activity questionnaire comprehension: lessons from cognitive interviews. Med Sci Sports Exerc 2009; 41 (2): 336-43
6. Lagerros YT. Physical activity: the more we measure, the more we know how to measure. Eur J Epidemiol 2009; 24 (3): 119-22
7. Pols MA, Peeters PH, Kemper HC, et al. Methodological aspects of physical activity assessment in epidemiological studies. Eur J Epidemiol 1998; 14 (1): 63-70
8. Feinstein AR. Clinimetrics. New Haven (CT): Yale University Press, 1987
9. Terwee CB, Bot SDM, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007; 60: 34-42
10. Mokkink LB, Terwee CB, Knol DL, et al. Protocol of the COSMIN study: COnsensus-based Standards for the selection of health Measurement INstruments. BMC Med Res Methodol 2006; 6: 2
11. Rennie KL, Wareham NJ. The validation of physical activity instruments for measuring energy expenditure: problems and pitfalls. Public Health Nutr 1998; 1 (4): 265-71
12. Martinez SM, Ainsworth BE, Elder JP. A review of physical activity measures used among US Latinos: guidelines for developing culturally appropriate measures. Ann Behav Med 2008; 36 (2): 195-207
13. Wareham NJ, Rennie KL. The assessment of physical activity in individuals and populations: why try to be more precise about how physical activity is assessed? Int J Obes Relat Metab Disord 1998; 22 Suppl. 2: S30-8
14. Vanhees L, Lefevre J, Philippaerts R, et al. How to assess physical activity? How to assess physical fitness? Eur J Cardiovasc Prev Rehabil 2005; 12 (2): 102-14
15. Ainsworth BE. How do I measure physical activity in my patients? Questionnaires and objective methods. Br J Sports Med 2009; 43 (1): 6-9
16. Craig CL, Marshall AL, Sjostrom M, et al. International physical activity questionnaire: 12-country reliability and validity. Med Sci Sports Exerc 2003; 35 (8): 1381-95
17. Staten LK, Taren DL, Howell WH, et al. Validation of the Arizona Activity Frequency Questionnaire using doubly labeled water. Med Sci Sports Exerc 2001; 33 (11): 1959-67
18. Baecke JA, Burema J, Frijters JE. A short questionnaire for the measurement of habitual physical activity in epidemiological studies. Am J Clin Nutr 1982; 36 (5): 936-42
19. Dolan SH, Williams DP, Ainsworth BE, et al. Development and reproducibility of the bone loading history questionnaire. Med Sci Sports Exerc 2006; 38 (6): 1121-31
20. Tsubono Y, Tsuji I, Fujita K, et al. Validation of walking questionnaire for population-based prospective studies in Japan: comparison with pedometer. J Epidemiol 2002; 12 (4): 305-9
21. Ainsworth BE, Sternfeld B, Richardson MT, et al. Evaluation of the Kaiser physical activity survey in women. Med Sci Sports Exerc 2000; 32 (7): 1327-38
22. Gionet NJ, Godin G. Self-reported exercise behavior of employees: a validity study. J Occup Med 1989; 31 (12): 969-73
23. Ainsworth BE, Jacobs Jr DR, Leon AS, et al. Assessment of the accuracy of physical activity questionnaire occupational data. J Occup Med 1993; 35 (10): 1017-27
24. Chasan-Taber L, Erickson JB, McBride JW, et al. Reproducibility of a self-administered lifetime physical activity questionnaire among female college alumnae. Am J Epidemiol 2002; 155 (3): 282-9
25. Kirschner B, Guyatt G. A methodological framework for assessing health indices. J Chron Dis 1985; 38: 27-36
26. Lohr KN, Aaronson NK, Alonso J, et al. Evaluating quality of life and health status instruments: development of scientific review criteria. Clin Ther 1996; 18 (5): 979-92
27. Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991
28. Giraudeau B, Mary JY. Planning a reproducibility study: how many subjects and how many replicates per subject for an expected width of the 95 per cent confidence interval of the intraclass correlation coefficient. Stat Med 2001; 20: 3205-14
29. Mokkink LB, Terwee CB, Patrick DL, et al. International consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes: results of the COSMIN study. J Clin Epidemiol. In press
30. de Vet HCW, Terwee CB, Knol DL, et al. When to use agreement versus reliability measures. J Clin Epidemiol 2006; 59: 1033-9
31. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1 (8476): 307-10
32. Wilbur J, Holm K, Dan A. A quantitative survey to measure energy expenditure in midlife women. J Nurs Meas 1993; 1 (1): 29-40
33. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. New York: Oxford University Press, 2003
34. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods 1996; 1: 30-46
35. Matton L, Wijndaele K, Duvigneaud N, et al. Reliability and validity of the Flemish Physical Activity Computerized Questionnaire in adults. Res Q Exerc Sport 2007; 78 (4): 293-306
36. Rigby AS. Statistical methods in epidemiology: v. Towards an understanding of the kappa coefficient. Disabil Rehabil 2000; 22 (8): 339-44
37. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill, 1994
38. McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res 1995; 4: 293-307
39. Patterson P. Reliability, validity, and methodological response to the assessment of physical activity via self-report. Res Q Exerc Sport 2000; 71 (2 Suppl.): S15-20
40. Plasqui G, Westerterp KR. Physical activity assessment with accelerometers: an evaluation against doubly labeled water. Obesity (Silver Spring) 2007; 15 (10): 2371-9
41. Brown WJ, Burton NW, Marshall AL, et al. Reliability and validity of a modified self-administered version of the Active Australia physical activity survey in a sample of mid-age women. Aust N Z J Public Health 2008; 32 (6): 535-41
42. Terwee CB, Dekker FW, Wiersinga WM, et al. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res 2003; 12 (4): 349-62
43. Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chron Dis 1986; 39: 897-906
44. Troiano RP. Can there be a single best measure of reported physical activity? Am J Clin Nutr 2009; 89 (3): 736-7
45. Cortina JM. What is coefficient alpha? An examination of theory and applications. J Appl Psychol 1993; 78: 98-104
46. Fayers PM, Hand DJ. Causal variables, indicator variables and measurement scales: an example from quality of life. J R Statist Soc A 2002; 165: 233-61
47. Streiner DL. Being inconsistent about consistency: when coefficient alpha does and doesn’t matter. J Pers Assess 2003; 80 (3): 217-22
48. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010; 19: 539-49
Correspondence: Dr Caroline B. Terwee, Department of Epidemiology and Biostatistics, EMGO institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, the Netherlands. E-mail:
[email protected]
Sports Med 2010; 40 (7): 539-563 0112-1642/10/0007-0539/$49.95/0
REVIEW ARTICLE
ª 2010 Adis Data Information BV. All rights reserved.
Physical Activity Questionnaires for Youth A Systematic Review of Measurement Properties Mai J.M. Chinapaw,1 Lidwine B. Mokkink,2 Mireille N.M. van Poppel,1 Willem van Mechelen1 and Caroline B. Terwee2 1 Department of Public and Occupational Health, the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands 2 Department of Epidemiology and Biostatistics, the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
Contents
Abstract
1. Methods
1.1 Literature Search
1.2 Eligibility Criteria
1.3 Selection of Papers
1.4 Data Extraction
1.4.1 Description of Questionnaires
1.4.2 Reliability
1.4.3 Construct Validity
1.4.4 Responsiveness
2. Results
2.1 Description of Questionnaires
2.2 Reliability
2.3 Construct Validity
2.4 Responsiveness
3. Discussion
3.1 Reliability
3.2 Validity
3.3 Comparison Measures
3.4 Recommendations Regarding Future Studies
4. Conclusions
Abstract
Because of the diversity in available questionnaires, it is not easy for researchers to decide which instrument is most suitable for their specific demands. Therefore, we systematically summarized and appraised studies examining measurement properties of self-administered and proxy-reported physical activity (PA) questionnaires in youth. Literature was identified through searching electronic databases (PubMed, EMBASE using ‘EMBASE only’ and SportDiscus) until May 2009. Studies were included if they reported on the measurement properties of self-administered and proxy-reported PA questionnaires in youth (mean age <18 years) and were published in the English language. Methodological quality and results of included studies were appraised using a standardized checklist (qualitative attributes and measurement properties of PA questionnaires [QAPAQ]). We included 54 manuscripts examining 61 versions of questionnaires. None of the included questionnaires showed both acceptable reliability and validity. Only seven questionnaires received a positive rating for reliability. Reported validity varied, with correlations between PA questionnaires and accelerometers ranging from very low to high (previous day PA recall: correlation coefficient [r] = 0.77). In general, PA questionnaires for adolescents correlated better with accelerometer scores than did those for children. From this systematic review, we conclude that no questionnaires were available with both acceptable reliability and validity. Considerably more high-quality research is required to examine the validity and reliability of promising PA questionnaires for youth.
Physical activity (PA) is an important behaviour related to a number of health outcomes in children and adolescents.[1,2] Accurate assessment of PA levels is important, not only to understand the association between PA and health, but also to monitor secular trends in behaviour and to evaluate the effectiveness of interventions.[3] Therefore, valid, reliable and responsive instruments that measure PA are needed. Questionnaires are a commonly used method to estimate (change in) total amount of daily or weekly PA.[4,5] Other popular PA measures include movement counters and heart-rate monitoring. PA questionnaires are easy to administer, relatively inexpensive and acceptable to study participants.[6] Furthermore, in some situations, self-reports may be the only feasible method to be used in large-scale population surveys due to available resources. While objective methods such as heart-rate monitoring and accelerometry may better capture the duration and intensity of PA, they provide no information about the type of PA behaviour or in what context and where the activity was performed (e.g. active transport, sports, school). In past decades, numerous questionnaires have been developed for different populations, including children and adolescents, with major differences in length, type of activities and recall period used. Recalling PA is a highly complex cognitive task: questionnaires request information about PA performed at some point in the past, with recall periods varying from 1 day to 1 week or ‘a usual week’. Youth are less likely to make accurate self-report assessments than adults because of developmental differences, especially in the ability to think abstractly and perform detailed recall.[7,8] In addition, youth have an activity pattern that is much more variable and intermittent than that of adults.[9] Therefore, PA questionnaires may suffer from recall bias, especially in youth. Selection of an appropriate PA questionnaire depends not only on the specific purpose of the study (e.g. discrimination, evaluation, prediction), but also on the characteristics of the population and the outcome of interest. Other critical considerations in the choice of a questionnaire are the relative importance of practical issues such as study size and budget, as well as reliability, validity and responsiveness. Because of the diversity in available questionnaires, it is not easy for researchers to decide which instrument is most suitable for their specific demands. Therefore, the aim of the present review is to summarize primary studies on measurement properties (reliability, construct validity and responsiveness) of self-report questionnaires that have been developed or modified for assessing PA in children and adolescents. This is one of a series of articles on measurement properties of PA questionnaires published in Sports Medicine.
1. Methods
1.1 Literature Search
Literature searches were performed in PubMed, EMBASE (using ‘EMBASE only’) and in SportDiscus (complete databases up until May 2009) on the topic of self-report questionnaires of PA. The full search strategy in PubMed is presented as follows: ‘exercise’[mesh] OR ‘physical activity’[tiab] OR ‘motor activity’[mesh] AND ‘questionnaire’[mesh] OR ‘questionnaire*’[tiab]. Limits: ‘humans’. In EMBASE and SportDiscus, ‘physical activity’ and ‘questionnaire’ were used as free-text words, and in EMBASE this was complemented with the EMTREE term ‘exercise’.
1.2 Eligibility Criteria
We used the following inclusion criteria: (i) the aim of the study should be to evaluate the measurement properties of a self-report questionnaire; (ii) the aim of the questionnaire should be to measure PA in youth (average age of the study population <18 years) [PA was defined as any bodily movement produced by skeletal muscles that results in energy expenditure above resting level]; (iii) the questionnaire could be used to measure PA in youth in the general population; (iv) the article was published in the English language; and (v) information should be provided on at least one of the measurement properties of the self-report questionnaire in a sample of youth. We included information on measurement properties only if it was intentionally collected or calculated to assess the measurement properties of the particular self-report questionnaire. We included proxy-report questionnaires but excluded PA interviews or diaries. We also excluded studies that evaluated the measurement properties of a self-report questionnaire in a specific population, such as patients or obese youth.
1.3 Selection of Papers
Two independent reviewers (MC and LM) performed abstract selection, selection of full-text articles, data extraction and quality assessment. Disagreements were discussed and resolved. We retrieved the full-text paper of all abstracts that fulfilled the inclusion criteria and of abstracts that did not contain measurement properties but indicated that they were presented in the full-text paper.
1.4 Data Extraction
1.4.1 Description of Questionnaires
We extracted data from the included papers using a standardized data-extraction form based on a standard checklist for appraising the qualitative attributes and measurement properties of PA questionnaires (QAPAQ).[10] The following data were extracted: (i) the target population for which the questionnaire was developed; (ii) the construct(s) that the questionnaire intends to measure (e.g. habitual PA); (iii) the dimensions of PA that the questionnaire is measuring (e.g. frequency, duration and intensity); (iv) the type of activities that the questionnaire is measuring (e.g. sport, recreational, transport, school, household activities and other); (v) the number of questions; (vi) the recall period that the questions refer to; and (vii) the scoring algorithm (which includes the type and number of scores that were calculated, e.g. total energy expenditure or minutes of activity per day). In addition, we extracted and rated the methods and results based on the QAPAQ. Reliability, validity and responsiveness depend on the setting and the population in which they are assessed. Therefore, a clear description of the design of each individual primary study, including characteristics of the study population, design issues such as time interval, sample size and data analysis, was required in order to receive a positive rating. Furthermore, if any methodological weakness in the design or execution of the primary study was found (e.g. small sample size, inadequate time interval between test and retest), the evaluated measurement property was rated as ‘indeterminate’.

1.4.2 Reliability
Reliability was rated as positive (+), negative (-) or indeterminate (?), depending on the methods and results of the primary studies. According to the QAPAQ checklist, the preferred method is the intraclass correlation coefficient (ICC), or Kappa for dichotomous data or weighted Kappa for ordinal data. An ICC of >0.70 is considered acceptable.[11] The use of Pearson correlation coefficients is considered inadequate because systematic errors are neglected;[12,13] however, most studies included in this review did calculate Pearson correlation coefficients. We considered it too conservative to rate all these studies as ‘indeterminate’, as Pearson correlations >0.80 would likely result in ICCs >0.70. Therefore, we decided to rate studies that reported a Pearson correlation >0.80 as positive. Pearson correlations <0.70 would never result in ICCs ≥0.70, and were consequently rated as ‘negative’. The time interval between the test and retest should have been described and should be short enough to ensure that subjects had not changed their PA levels, but long enough to prevent recalling the previous answers. We defined an adequate time interval as follows: >1 day but <3 months for questionnaires recalling a usual week; >1 day but <2 weeks for questionnaires recalling the previous week; >1 day but <1 week for questionnaires recalling the previous day, assuming that the two tests recall the exact same day. A positive score was given if the study population consisted of at least 50 participants; the ICC or Kappa or Pearson correlation was above the specified cut-off point (ICC >0.70; Kappa >0.70; Pearson >0.80) and the time interval between test and retest was adequate. If the correlation was below the specified cut-off point, a negative score was given. If the sample size was <50 participants or the time interval inadequate, the score was rated as ‘indeterminate’. We sorted the questionnaires based on (i) outcome measures (ICC, Kappa, correlation) – highest to lowest; and (ii) sample size ≥50 and <50 (table I).

1.4.3 Construct Validity
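The reliability rating rules described in section 1.4.2 can be summarized as a small decision function. The Python sketch below is illustrative only and is not part of the QAPAQ checklist: the names `ReliabilityStudy`, `interval_is_adequate` and `rate_reliability` are hypothetical, ‘3 months’ is approximated as 91 days, and Pearson correlations between 0.70 and 0.80, which the text does not assign unambiguously, are treated as indeterminate by assumption.

```python
from dataclasses import dataclass

# Upper bounds (in days) on the test-retest interval per recall period.
# "3 months" is approximated as 91 days; this conversion is an assumption.
MAX_INTERVAL_DAYS = {"usual_week": 91, "previous_week": 14, "previous_day": 7}

# Cut-off points for a positive rating, as stated in the text.
POSITIVE_CUTOFF = {"icc": 0.70, "kappa": 0.70, "pearson": 0.80}


@dataclass
class ReliabilityStudy:
    n: int                # number of participants
    statistic: str        # "icc", "kappa" or "pearson"
    value: float          # reported coefficient
    interval_days: float  # time between test and retest
    recall_period: str    # key of MAX_INTERVAL_DAYS


def interval_is_adequate(study: ReliabilityStudy) -> bool:
    """Adequate means more than 1 day and less than the recall-specific maximum."""
    return 1 < study.interval_days < MAX_INTERVAL_DAYS[study.recall_period]


def rate_reliability(study: ReliabilityStudy) -> str:
    """Return '+', '-' or '?' (indeterminate)."""
    # Design weaknesses lead to an indeterminate rating regardless of the result.
    if study.n < 50 or not interval_is_adequate(study):
        return "?"
    if study.value > POSITIVE_CUTOFF[study.statistic]:
        return "+"
    # Assumption: Pearson values from 0.70 up to 0.80 are neither clearly
    # positive (>0.80) nor clearly negative (<0.70), so they stay indeterminate.
    if study.statistic == "pearson" and study.value >= 0.70:
        return "?"
    return "-"


example = ReliabilityStudy(n=68, statistic="icc", value=0.82,
                           interval_days=4, recall_period="usual_week")
print(rate_reliability(example))
```

Checking the design criteria before the cut-off reflects the statement that any methodological weakness results in an ‘indeterminate’ rating, whatever the size of the coefficient.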
We initially intended to use the preferred method for assessing construct validity from the QAPAQ, stating that hypotheses about expected correlations between the questionnaire under study and other measures, or about expected differences in scores on the questionnaire between specific groups of subjects, should be defined in advance when testing validity. Almost none of the studies included in this review formulated hypotheses a priori. The ‘best’ method to use for comparison depends on what the questionnaire is aiming to measure. Instead of rating all questionnaires as ‘indeterminate’, we did not rate the questionnaires but instead sorted the studies based on (i) the comparison instrument (accelerometer, doubly labelled water, direct observation, pedometer, heart rate monitor, other); (ii) the outcome measures – highest to lowest; and (iii) the sample size (table II).

1.4.4 Responsiveness
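The three-level sorting of construct validity studies described in section 1.4.3 can be sketched as a composite sort key. This Python example is only an illustration under stated assumptions: the function name `sort_validity_studies`, the dictionary fields and the study labels Q1 to Q4 are hypothetical; the instrument order is taken to be the order in which the instruments are listed in the text; each study is reduced to a single correlation; and larger samples are placed first, a direction the text does not specify.

```python
# Comparison instruments in the order listed in the text; treating this
# listing as the sort order is an assumption.
INSTRUMENT_ORDER = [
    "accelerometer",
    "doubly labelled water",
    "direct observation",
    "pedometer",
    "heart rate monitor",
    "other",
]


def sort_validity_studies(studies):
    """Sort by instrument, then correlation (high to low), then sample size."""
    rank = {name: position for position, name in enumerate(INSTRUMENT_ORDER)}

    def sort_key(study):
        # Unknown instruments are grouped under "other".
        instrument_rank = rank.get(study["instrument"], rank["other"])
        # Negating the values yields descending order within an ascending sort.
        return (instrument_rank, -study["correlation"], -study["n"])

    return sorted(studies, key=sort_key)


# Hypothetical studies, for illustration only.
studies = [
    {"name": "Q1", "instrument": "pedometer", "correlation": 0.25, "n": 49},
    {"name": "Q2", "instrument": "accelerometer", "correlation": 0.40, "n": 47},
    {"name": "Q3", "instrument": "accelerometer", "correlation": 0.53, "n": 62},
    {"name": "Q4", "instrument": "accelerometer", "correlation": 0.40, "n": 120},
]
ordered_names = [study["name"] for study in sort_validity_studies(studies)]
print(ordered_names)
```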
Responsiveness refers to the ability of an instrument to detect change over time in the construct to be measured.[64] It should be considered an aspect of validity in a longitudinal setting.[64] Since we included only one study reporting on responsiveness, this study was not rated, but is briefly described in the results section.

2. Results

The literature search yielded a total of 21 891 hits: 9733 in PubMed, 7601 in EMBASE and 4284 in SportDiscus. We included 54 manuscripts examining 61 versions of questionnaires (see figure 1).

2.1 Description of Questionnaires
Table III presents a description of the included questionnaires. We sorted the questionnaires by target population: preschool children (mean age <6 years); children (mean age >6 and <12 years); and adolescents (mean age >12 and <18 years). We found six questionnaires for assessing PA in preschool children, all of which had been completed by proxy report, 25 questionnaires for children and 31 that had been developed for adolescents. The construct that each questionnaire intends to measure was mostly described broadly as ‘physical activity’, sometimes limited to certain types of activities. The dimensions that were measured were duration, frequency, intensity or a combination of these dimensions. Because of the different dimensions used, the unit of measurement of the questionnaires differed.
Table I. Reliability of physical activity (PA) questionnaires for youth sorted by level of evidence

Questionnaire (a) | Study population (b) | Time interval | Results | Rating

Preschoolers (mean age <6 y)
CLASS (proxy)[14] | n = 58; Sex: 63% ♀; Age: 5.3 (0.5) [5–6] | At least 14 d | MVPA/VPA/total PA frequency: proxy 5–6 y: ICC = 0.74/0.87/0.83; MVPA/VPA/total PA duration: proxy 5–6 y: ICC = 0.49/0.81/0.76; % agreement total PA 89.2; total VPA 58.6; total MPA 84.2 | Proxy +
NPAQ (proxy)[15] | n = 72; Sex: 55% ♀; Age: 5.7 (0.5) [NR] | 2–8 wk | NPAQ total (collapsed into low, moderate, high) weighted Kappa: 0.39 (0.22–0.56); ICC: 0.70 (0.58–0.87); Spearman r = 0.61 | +/-
CPAR (proxy)[16] | n = 27; Sex: 38% ♀; Age: 4.9 (0.7) [4–5] | 7 d | ICC (one-way factor ANOVA): MVPA 0.39; PAEE 0.25 | Ind

Children (mean age >6 and <12 y)
PAQ-C[17] | Study 1: n = 215; Sex: 42% ♀; Age: [9–15]. Study 2: n = 84; Sex: 51% ♀; Age: [9–14] | 1 wk | ICC: ♂ 0.75; ♀ 0.82 | +
GAQ[18] | n = 68; Sex: 100% ♀; Age: 9.0 (0.6) [8–10] | 4 d | ICC 28 activities: yesterday 0.78; usual 0.82; 18 activities: yesterday 0.70; usual 0.79 | +
CLASS (self-report and proxy)[14] | n = 111; Sex: 27% ♀; Age: 10.6 (0.8) [10–12] | Children 7 d, proxy at least 14 d | ICC MVPA/VPA/total PA frequency: self-report 0.75/0.42/0.36; proxy 10–12 y 0.67/0.75/0.69; MVPA/VPA/total PA duration: self-report 0.37/0.41/0.24; proxy 10–12 y 0.58/0.62/0.74; % agreement total PA 89.2; total VPA 58.6; total MPA 84.2 | Self-report and proxy +/-
CLASS (self-report and proxy)[19] | n = 112; Sex: 63% ♀; Age: 10.6 (0.76) [9–13] | Children within 7 d, proxy –14 d | ICC proxy report/self-report (in 10–12 y only): frequency MPA: ICC = 0.75/0.97; frequency VPA: ICC = 0.74/0.58; duration MPA: ICC = 0.73/0.43; duration VPA: ICC = 0.31/0.29; proxy/self-report: 10/7 of 30 PA items Kappa >0.70 | Proxy +/-; self-report -
GAQ[20] | n = 172; Sex: 100% ♀; Age: 8.8 (0.8) [8–10]; Race: African American | 12 wk | ICC: 28 activities: yesterday 0.59; usual 0.59; 18 activities: yesterday 0.57; usual 0.55 | Yesterday Ind; usual -
Daughter questionnaire[21] | n = 69; Sex: 100% ♀; Age: 9.9 (8.5–12.7) | Girls: 12–16 d | ICC: walking schoolday/weekend 0.48/0.32; exercise schoolday/weekend 0.36/0.32 | -
Modified SAPAC[22] | n = 103; Sex: 50% ♀; Age: 11.7 (0.5) | Minimum 5 d | Total PA ICC ♂ 0.20; ♀ 0.19 | -
Stairs score[23] | n = 84; Sex: 100% ♀; Age: 11.1 (1.54) [7–15] | 11 mo | Spearman r: 0.59 (95% CI 0.43–0.71) | -
Specific activity score[23] | n = 84; Sex: 100% ♀; Age: 11.1 (1.54) [7–15] | 11 mo | Spearman r: 0.53 (0.36–0.67) | -
Godin-Shephard (proxy)[23] | n = 84; Sex: 100% ♀; Age: 11.1 (1.54) [7–15] | 11 mo | Spearman r: 0.48 (0.30–0.63) | -
Activity-rating instrument[24] | n = 30; Sex: 50% ♀; Age: 11.2 (2.0) [7–15] | 1 mo | ICC: PA rating 0.85 | Ind
CPAR[25] | n = 22; Sex: 50% ♀; Age: 11.8 (1.0) | 1–2 wk | ICC: total EE 0.95; activity EE 0.82 | Ind
Mother and father Questionnaire[21] | n = 47/35 mother/father; Sex: 100% ♀; Age: 9.9 (8.5–12.7) | 12–28 d | ICC: mother/father schoolday light 0.11/0.32; MPA 0.28/0.24; VPA 0.72/0.75; ICC: mother/father weekend light 0.32/0.12; MPA 0.33/0.13; VPA 0.65/0.72 | Ind; Ind
PAQ[26] | n = 24 children; Age: 8–11 y (3rd–5th grade) | Ind | Frequency section: short PAQ r = 0.82; long PAQ r = 0.49 | Ind

Older children and adolescents (mean age >12 y)
QAPACE[27] | n = 121; Sex: 54% ♀; Age: [8–16] | 6 wk | Pearson ICC: 0.96 (0.95–0.97); LOA -515.5 and 532.5 kJ · 24-1 h, mean figures 7566 kJ/d-1 h | +
OPAQ[28] | n = 87; Sex: 45% ♀; Age: 13.1 (0.9) | 1 wk | ICC: MPA 0.76; VPA 0.80; MVPA 0.91 | +
Refined 60-min MVPA screening measure[29] | n = 73; Sex: 65% ♀; Age: 12.1 (0.9) | Same d up to 1 mo | ICC: 0.77 (0.76 with time to retest as a co-variate); same day 0.88 (n = 42), up to 1 mo 0.53 (n = 31); Kappa: 61%, same day 84%, up to 1 mo 36% | +
WHO HBSC[30] | n = 71; Sex: 56% ♀; Age: 14.9 (1.6) [13–18] | 8–12 d | Frequency: ICC = 0.73; duration: ICC = 0.71 | +
Epidemiological questionnaire[31] | n = 100; Sex: 53% ♀; Age: [15–18] | 1 mo | Spearman rank r 1 mo: h/wk 0.79; MET h/wk 0.85; VPA h/wk 0.91 (1 y: 0.66, 0.72, 0.72, respectively) | 1 mo +; 1 y -
3DPAR[32] | n = 71; Sex: 68% ♀; Age: 12.5 (1.1) | 1 d | Pearson r MVPA 0.68; VPA 0.83; % agreement in activities mentioned: ♂ 51%; ♀ 47% | MVPA -; VPA +
APARQ[33] | Sample 1: n = 121; Sex: 48% ♀; Age: 13.7 (0.4). Sample 2: n = 105; Sex: 29% ♀; Age: 15.7 (0.4) | 2 wk | ICC total EE ♂ grade 8/10: summer 0.30/0.79, winter 0.49/0.52; ♀ grade 8/10: summer 0.52/0.86, winter 0.36/0.91; weighted Kappa (vigorous, adequate and inactive) ♂ grade 8/10: summer 0.33/0.62, winter 0.39/0.59; ♀ grade 8/10: summer 0.55/0.71, winter 0.71/0.58 | Grade 8 -; grade 10 +/-
PA screening measure[29] | n = 250; Sex: 56% ♀; Age: 14.6 (1.4) | 2 wk | Nine scores: 20-min bout typical wk/past 7 d/composite; accumulate 30-min typical wk/past 7 d/composite; accumulate 60-min typical wk/past 7 d/composite: ICC range 0.55–0.79; Kappa % 45–61 | -
Fels PAQ[34] | n = 229; Sex: 57% ♀; Age: [7–19] | 6 d | ICC: total 0.48–0.68; sport 0.62–0.71; leisure 0.60–0.76; work/chore 0.49–0.65 | -
SAPAC[32] | n = 66; Sex: 71% ♀; Age: 12.5 (1.1) | 1 d | Pearson r MVPA 0.67; VPA 0.63; % agreement in activities mentioned: ♂ 34%; ♀ 42% | -
IPAQ[35] | n = 200; Age: 16 (0.4) | 2 wk | Spearman r and ICC: total 0.45/0.37; LPA 0.44/0.28; MPA 0.33/0.15; VPA 0.52/0.40 | -
PAQA[35] | n = 158; Age: 16 (0.4) | 2 wk | Spearman r and ICC: total 0.48/0.40; LPA 0.22/0.28; MPA 0.32/0.12; VPA 0.57/0.40 | -
IPAQ[30] | n = 71; Sex: 56% ♀; Age: 14.9 (1.6) [13–18] | 8–12 d | VPA: d/wk: ICC = 0.54; and min/d: ICC = 0.30; MPA: d/wk: ICC = 0.55; and min/d: ICC = 0.34; walking: d/wk: ICC = 0.62; and min/d: ICC = 0.10; sitting: min/d: ICC = 0.27 | -
WHO HBSC[36] | Sample 1: n = 121; Sex: 48% ♀; Age: 13.7 (0.4). Sample 2: n = 105; Sex: 29% ♀; Age: 15.7 (0.4) | 2 wk | Kappa: frequency 0.36–0.60; duration 0.22–0.58; combination 0.12–0.70 (1 · Kappa = 0.70 boys, y 10, two categories) | -
YPAQ[16] | Sample 1: n = 25; Sex: 30% ♀; Age: 13.1 (0.3) [12–13]. Sample 2: n = 24; Sex: 70% ♀; Age: 17.1 (0.6) [16–17] | 7 d | ICC (one-way factor ANOVA): group 12–13 y: MVPA 0.92; PAEE 0.86; group 16–17 y: MVPA 0.73; PAEE 0.79 | Ind
CHASE[16] | n = 25; Sex: 30% ♀; Age: 13.1 (0.3) [12–13] | 7 d | ICC (one-way factor ANOVA): lifestyle score 0.64 | Ind
SWAPAQ[16] | n = 24; Sex: 70% ♀; Age: 17.1 (0.6) [16–17] | 7 d | ICC (one-way factor ANOVA): MVPA 0.05; PA EE 0.02 | Ind

(a) See table IV for definitions of questionnaire names/acronyms.
(b) Age is presented as mean years (SD) [range].
EE = energy expenditure; ICC = intraclass correlation coefficient; Ind = indeterminate score; LOA = limits of agreement; LPA = light-intensity PA; MPA = moderate-intensity PA; MVPA = moderate to vigorous PA; NR = not reported; r = correlation coefficient; VPA = vigorous PA; + indicates positive score; - indicates negative score; +/- indicates some positive and some negative scores; ♀ indicates female; ♂ indicates male.
The recall period was variously ‘the previous day’, ‘a usual day or week’ or ‘the past year’.

2.2 Reliability

Table I summarizes the reliability studies. Thirty-five questionnaires were tested for reliability (three among preschoolers, 14 among children and 17 among adolescents). The time interval between the first and second administration varied from the same day to 11 months. Only seven questionnaires[14,17,18,27-30] received a positive rating for reliability: in preschoolers, the CLASS (see table IV for a full list of definitions
Table II. Construct validity of physical activity (PA) questionnaires for children and adolescents sorted by comparison measure, outcome and sample size

Questionnaire (a) | Study population (b) | Comparison measure | Results

Preschool children (mean age <6 y)
CPAQ (proxy)[16] | n = 27; Sex: 38% ♀; Age: 4.9 (0.7) [4–5] | Accelerometer (MVPA); DLW (PA EE) | Accelerometer Spearman r = 0.42; DLW Spearman r = 0.22, wide ratio LOA
NPAQ (proxy)[15] | n = 204; Sex: 55% ♀; Age: 5.7 (0.5) | Accelerometer (MTI) | Accelerometer total/vigorous counts: rho = 0.33/0.36
Parental report – outdoor time checklist (proxy)[37] | n = 250; Sex: 43% ♀; Age: 44 mo [29–52]; Country: USA | Accelerometer (RT3 triaxial research tracker); recall questionnaire | Accelerometer: r = 0.33; recall: r = 0.57
Parental report – outdoor time recall questionnaire (proxy)[37] | n = 250; Sex: 43% ♀; Age: 44 mo [29–52]; Country: USA | Accelerometer (RT3 triaxial research tracker); checklist questionnaire | Accelerometer: r = 0.20; checklist: r = 0.57
CLASS (proxy)[14] | n = 58; Sex: 63% ♀; Age: 5.3 (0.5) [5–6] | Accelerometer (MTI) | Accelerometer MPA/VPA/total PA/total counts/d: r = -0.06/-0.04/-0.04/0.05
Questionnaire to teachers (proxy)[38] | n = 49; Sex: 51% ♀; Age: [5–6] | Direct observation; pedometer | Direct observation: r = -0.19–0.27; pedometer: r = 0.25
Questionnaire to mothers (proxy)[38] | n = 49; Sex: 51% ♀; Age: [5–6] | Direct observation; pedometer | Direct observation: r = -0.14–0.12; pedometer: r = 0.14

Primary school children (mean age >6 and <12 y)
SNAP[39] | n = 121; Sex: 60% ♀; Age: 10.7 (2.2) | Accelerometer (GT1M) | Mean difference between SNAP and accelerometer: -9 min (-23, 5); mean difference in proportions complying with the 60 min/d MVPA guideline: 0.02 (90% CI -0.08, 0.12)
PA Questionnaire for parents and teachers[40] | n = 62; Sex: 48% ♀; Age: 7.0 (0.7) | Accelerometer (Caltrac); other (HR monitor) | Accelerometer: r = 0.53; HR: r = 0.40
ACTIVITY[41] | n = 47; Sex: 60% ♀; Age: 7.7 (0.45) | Accelerometer (Caltrac); HR monitor | Accelerometer (CNTSMIN): r = 0.40; HR: average activity/50%; HR reserve: 0.17/0.51
CPAR[25] | n = 45; Sex: 56% ♀; Age: 11.8 (1.0) | Accelerometer (Tritrac) | TEE/AEE vs accelerometer: r = 0.51/0.20; % agreement = 78%; Kappa = 0.398 categorizing in active/inactive
MARCA[42] | n = 66; Sex: 50% ♀; Age: 11.6 (0.8) | Accelerometer (MTI) | 6 of 7 hypotheses correct; PAL/VPA/min locomotion: r = 0.45/0.35/0.37
PAQ-C[43] | n = 97; Sex: 58% ♀; Age: 11.3 (1.39) [9–14] | Accelerometer (Caltrac); questionnaires (7-d recall interview, activity rating, leisure-time exercise [Godin 1 and 2]); fitness test (Chester step test) | Five hypotheses: moderate correlations with all measures; accelerometer: r = 0.39; 7-d recall interview: r = 0.46, 0.43; activity rating: r = 0.57; leisure-time exercise (Godin 1 and 2): r = 0.41, -0.57; fitness test: r = 0.28; sex differences: none
PAQ-C[44] | n = 449; Sex: 52% ♀; Age: 11.2 (0.3) | Accelerometer (Actigraph, MVPA); hip BMC; spine BMC; whole body BMC | Spearman r: accelerometer: r = 0.38 (boys) and r = 0.24 (girls); partial Spearman rank r: hip BMC: r = 0.28 (boys), r = 0.08 (girls); spine BMC: r = 0.19 (boys), r < 0.01 (girls); whole body BMC: r = 0.22 (boys), r = 0.08 (girls)
SAPAC[45] | n = 125; Sex: 56% ♀; Age: 10.9 (0.53) | Accelerometer; HR monitor; PA interview | Accelerometer min MVPA/MVPA METs/weighted; MVPA METs/no. of activities: r = 0.30/0.32/0.32/0.02; HR min MVPA/MVPA METs/weighted; MVPA METs/no. of activities: r = 0.58/0.60/0.59/0.28; PACI min MVPA/MVPA METs/weighted; MVPA METs/no. activities: r = 0.64/0.65/0.65/0.47
GAQ[18] | n = 68; Sex: 100% ♀; Age: 9.0 (0.6) [8–10] | Accelerometer (MTI/Computer Science and Applications, Inc.) | Accelerometer 18 activities yesterday: r = 0.28; 18 activities usual d: r = 0.30
PAQ[7] | n = 52, grade 3; Sex: % ♀; Age: NR; Race: American Indian | Accelerometer (Tritrac) | Accelerometer before and after school: r = 0.15; during school r = 0 0.41
OPAQ[28] | n = 51; Sex: 47% ♀; Age: 12.6 (0.5) | Accelerometer (Caltrac) | Spearman r: MPA: 0.01; VPA: 0.33; MVPA: 0.32
PAQ[26] | n = 24; Sex: NR; Age: grade 3–5 (NR) [NR] | Accelerometer (Caltrac) | Short PAQ vs accelerometer: r = 0.27; long PAQ vs accelerometer: r = 0.13
Health Survey for England PA Questionnaire (proxy)[46] | n = 130; Sex: 51% ♀; Age: 7.0 (0.3) [6–7] | Accelerometer | LOA: -131–376 min/d; Spearman r: 0.16
Self-report PA Questionnaire for Schoolchildren[47] | n = 34; Sex: 100% ♂; Age: 10.8 (0.8); Country: Japan | Accelerometer (Actiwatch); other (life recorder) | Accelerometer: regression coefficients for counts/d ranging from -0.25 to 0.07
CLASS (self-report and proxy)[19] | n = 112; Sex: 63% ♀; Age: 10.6 (0.76) [9–13] | Accelerometer (MTI) | Proxy vs accelerometer: MVPA r = 0.01/0.18; self-report vs accelerometer MPA/VPA: -0.11/0.15
CLASS (self-report and proxy)[14] | n = 111; Sex: 27% ♀; Age: 10.6 (0.8) [10–12] | Accelerometer (MTI) | Proxy vs accelerometer MPA/VPA/total PA/total counts/d: r = 0.07/0.24/0.09/0.11; self-report vs accelerometer MPA/VPA/total PA/total counts/d: r = 0.02/-0.04/-0.04/0.06
Modified Godin-Shephard[48] | n = 24; Sex: 50% ♀; Age: [10–13] | Accelerometer (Caltrac) | School d: questionnaire = 17.2 + 1.16 · Caltrac (–98.5); weekend d: questionnaire = 68.5 + 1.26 · Caltrac (–164.8)
GAQ[20] | n = 172; Sex: 100% ♀; Age: 8.8 (0.8) [8–10]; Race: African American | Accelerometer (Computer Science and Applications, Inc.) | 18 activities yesterday vs accelerometer: 06:00–12:00/12:00–18:00: r = 0.06/0.03; 18 activities usual d vs accelerometer: 06:00–12:00/12:00–18:00: r = 0.12/0.11
Activity rating instrument[24] | n = 30; Sex: 50% ♀; Age: 11.2 (2.0) [7–15] | Accelerometer (Computer Science and Applications, Inc.); other questionnaire | Accelerometer average movement count/frequency VPA: r = –0.03/0.04
PAQ-C[49] | Study 1: n = 991; Sex: 49% ♀; Age: 10.7 (0.5). Study 2: n = 414; Sex: 49% ♀; Age: 8.7 (0.57) | Cardiovascular fitness (modified Harvard Step test) | Study 1: fitness test: r = 0.08; Study 2: fitness test: European American/African American: 0.30/0.02
PAQ-C[43] | n = 89; Sex: 57% ♀; Age: 11.06 (1.46) [8–13] | Questionnaire (athletic competence); other (behavioural conduct, activity rating, teacher’s rating, MVPA) | Athletic competence: r = 0.48; behavioural conduct: r = 0.16, activity rating: MVPA: 0.53, MVPA >10 min: 0.41, teacher’s rating: r = 0.45, PAQ-C + MVPA: r = 0.53, 0.41; sex differences: p < 0.05
Modified Godin-Shephard (proxy)[23] | n = 479; Sex: 100% ♀; Age: 11.1 (1.54) [7–15] | Other questionnaires | Perspire score: r = 0.4; Stair score: r = 0.2; specific activity score: r = 0.38
Specific activity score[23] | n = 471; Sex: 100% ♀; Age: 11.1 (1.54) [7–15] | Other questionnaires | Perspire score/Stair score/Godin-Shephard: r = 0.3/0.10/0.38
Stairs score[23] | n = 479; Sex: 100% ♀; Age: 11.1 (1.54) [7–15] | Other questionnaires | Perspire score: r = 0.17; Godin-Shephard: r = 0.2; Specific activity score: r = 0.1
Daughter Questionnaire[21] | n = 69; Sex: 100% ♀; Age: 9.9 [8.5–12.7] | Questionnaire (mother and father version); activity diary | Activity diary: ICC = 0.19–0.52 (for 10 subscores)

Older children and adolescents (mean age >12 y)
PDPAR[50] | Sample 1: n = 48; Age: grade 7–12. Sample 2: n = 26; Sex: 46% ♀; Age: [15–18] | Accelerometer (Caltrac); pedometer; HR monitor | Caltrac r = 0.77 (n = 48); pedometer: r = 0.88 (n = 48); HR: r = 0.63 (n = 26)
SAPAQ[51] | n = 50; Sex: 62% ♀; Age: 16.8 (0.4) | Accelerometer (MTI) | Total volume of self-reported PA (total METmin); time spent in PA/total counts/total counts per min per d/time spent sedentary: 0.51/0.49/0.45/-0.45
SHAPES[52] | n = 53; Sex: 53% ♀; Age: [6–12]; Country: Canada | Accelerometer (MTI) | MPA = 0.31; VPA = 0.25; MVPA = 0.44; EE from MVPA = 0.44
PAQ-A[53] | Study 1: n = 49; Sex: 43% ♀; Age: 13.5 (0.3). Study 2: n = 210; Sex: % ♀; Age: at test 11 y, at retest 13 y (same children different measures) | Study 1: activity monitor (Actigraph): total PA, percent d MVPA; Study 2: PAQ-C | Study 1: Spearman r: total PA/percent d MVPA, original score: 0.47/0.49, rescaled score: 0.56/0.63; Study 2: Spearman r; PAQ-C: original/rescaled score: 0.30/0.39
Refined 60-min MVPA[29] | n = 138; Sex: 65% ♀; Age: 12.1 (0.9) | Accelerometer (Computer Science and Applications, Inc.) | Accelerometer r = 0.40
PAQ-A[54] | n = 85; Sex: 52% ♀; Age: 16.25 (1.51) [13–20] | Accelerometer (Caltrac), questionnaire (7-d recall interview, activity rating, leisure-time exercise) | Hypotheses: PAQ-A would be moderately correlated with all other PA measures; accelerometer: r = 0.33; 7-d recall interview (PAR and PAR h): r = 0.59/0.51; activity rating: r = 0.73; leisure-time exercise (Godin 1/2): 0.57/-0.62
PA screening measure[29] | n = 57; Sex: 65% ♀; Age: 13.9 (1.7) | Accelerometer (Computer Science and Applications, Inc.) | VPA (typical wk/past 7-d/composite): r = 0.31/0.36/0.37; 30-min MPA: r = 0.20/0.26/0.26 (NS); 60-min MPA: r = 0.46/0.37/0.47
Fels PAQ[34] | n = 229; Sex: 57% ♀; Age: [7–19]; Country: USA | Accelerometer (Actiwatch) | Elementary/middle/high school: total: r = 0.32/0.12/0.11; sport: r = 0.32/0.07/0.34; leisure: r = 0.28/0.28/0.20; work: r = 0.08/-0.13/-0.08
SAPAC[32] | n = 107; Sex: 70% ♀; Age: 12.5 (1.1) | Accelerometer (Actigraph) | MVPA/VPA: r = 0.24/0.28
3DPAR[32] | n = 130; Sex: 66% ♀; Age: 12.5 (1.1) | Accelerometer (Actigraph); questionnaire (SAPAC) | Different cut-off points: MVPA/VPA: r = 0.28 (0.31)/0.16 (0.19)
Modified Godin-Shephard[55] | n = 114; Sex: 60% ♀; Age: grades 6–8; Country: USA | Accelerometer (MTI) | Strenuous/moderate: r = 0.23/0.13
PAQA[35] | n = 188; Sex: NR; Age: 16 (0.4) | Accelerometer (MTI) | Spearman r: total: 0.27; LPA: 0.20; MPA: 0.18; VPA: 0.24
IPAQ[35] | n = 188; Sex: NR; Age: 16 (0.4) | Accelerometer (MTI) | Spearman r: total: 0.21; LPA: 0.14; MPA: -0.01; VPA: 0.29
FPACQ[56] | n = 33; Sex: 70% ♀; Age: 14.4 (1.4) [12–18] | Accelerometer (Computer Science and Applications, Inc.) | Ranging from r = –0.22 (sports participation at school) to r = 0.78 (frequency hard activities)
YRBS[55] | n = 114; Sex: 60% ♀; Age: grades 6–8; Country: USA | Accelerometer (MTI) | Accelerometer: r = 0.10
HAQ[57] | n = 683; Sex: 100% ♀; Age: [9–19] | Accelerometer (Caltrac) | Caltrac (past 3 d): r = 0.09
PAQ[58] | n = 260; Sex: 100% ♀; Age: 13.4 (1.1) [11–15] | Accelerometer (Caltrac); 3-d diary | Caltrac: r = 0.12/0.26, Kappa: 1/0, % agreement: 33/48%; 3 d diary: r = 0.57/0.16, Kappa: 0.15/0, % agreement: 43/26%
SWAPAQ[16] | n = 24; Sex: 70% ♀; Age: 17.1 (0.6) [16–17] | Accelerometer (MTI) and DLW | Accelerometer: Spearman r = 0.23; DLW: Spearman r = 0.40, wide ratio LOA
YPAQ[16] | n = 25; Sex: 30% ♀; Age: 13.1 (0.3) [12–13]. n = 24; Sex: 70% ♀; Age: 17.1 (0.6) [16–17] | Accelerometer (MTI) and DLW | 12–13 y: accelerometer/DLW: Spearman r = 0.42/0.09; 16–17 y: accelerometer/DLW: Spearman r = 0.11/0.46, wide ratio LOA
CHASE[16] | n = 25; Sex: 30% ♀; Age: 13.1 (0.3) [12–13] | Accelerometer (MTI) and DLW | MVPA-accelerometer/DLW: Spearman r = 0.12/0.45, wide ratio LOA
IPAQ-A (long version)[59] | n = 248; Sex: 49% ♀; Age: [12–17] | Accelerometer (Actigraph, MTI) | MPA/total Actigraph Spearman rank r = 0.15/0.20; MPA: LOA: 12–14 y: -283–149 min/d; 15–17: -186–170 min/d; VPA: LOA: 12–14 y: -120–64 min/d; 15–17: -101–59 min/d
OPAQ[28] | n = 51; Sex: 47% ♀; Age: 12.6 (0.5) | Accelerometer (Caltrac) | Spearman rank-order correlation: MPA 0.01; VPA 0.33; MVPA 0.32
7D-PAR[60] | n = 27; Sex: 48% ♀; Age: 13.0 (1.2) [12–15] | Continuous monitoring of the HR (Polar Precision Performance 3.0 HR monitor); HR >140 bpm = MPA, HR >160 bpm = VPA | MPA: Kappa = 0.02; Pearson r = 0.05; VPA: Kappa = 0.20; Pearson r = 0.37 (n = 25)
7-d recall questionnaire[61] | n = 93; Sex: 51% ♀; Age: 12.2 (0.3) | HR monitor; 7-d interview; Godin-Shephard questionnaire | HR >159 bpm MPA/VPA: 0.30/0.34; interview: little concordance; modified Godin-Shephard: r = 0.38
MONICA survey[62] | n = 125–223; Sex: ? % ♀; Age: [9–19]; pedometer sample n = 223; sport act n = 125; BMI n = 221; MONICA n = 220 | Pedometer (Pedoboy); V̇O2max | Pedoboy: r = 0.22, n = 223; V̇O2max: r = 0.17, n = 220; weekly sports act in club: r = 0.55, n = 125
QAPACE[27] | n = 36; Sex: 50% ♀; Age: 12 (2.6) [8–16] | Aerobic fitness: indirect V̇O2peak, by Léger test; direct V̇O2peak, by ergo-spirometry | DEE vs indirect/direct V̇O2peak: ICC = 0.56/0.69
APARQ[33] | n = 1072; Sex: 48% ♀; Age: 13.1. n = 954; Sex: 45% ♀; Age: 15.1 | 20 metre shuttle run test | Grade 8: ♂/♀: r = 0.15/0.21; grade 10: ♂/♀: r = 0.14/0.39
PA and Exercise questionnaire[63] | n = 745; Sex: 54% ♀; Age: 14.3 (1.2) | 2.4 km walk-run test | Walk-run test: r = 0.21
Epidemiological questionnaire[31] | n = 100; Sex: 53% ♀; Age: [15–18] | Fitness tests (BMI, 1 mile run, sit and reach, pull-ups, grip strength); 4 · past wk questionnaire; roster | Fitness tests ranging from -0.47 to 0.25; H/wk: questionnaire: r = 0.63/0.76 (‘92, ‘93); MET-h/wk, questionnaire: r = 0.68/0.83 (‘92, ‘93); VPA h/wk questionnaire: r = 0.76/0.84 (‘92, ‘93)
WHO HBSC[36] | Sample 1: n = 1072; Sex: 48% ♀; Age: 13.1. Sample 2: n = 954; Sex: 45% ♀; Age: 15.1 | 20 metre shuttle run test | Active group had significantly higher aerobic fitness than inactive group
Modified Godin-Shephard (leisure-time exercise questionnaire)[61] | n = 93; Sex: 51% ♀; Age: 12.2 (0.3) | HR monitor; other questionnaires | MPA modified Godin-Shephard 7-d recall: r = 0.38; other correlations: low

(a) See table IV for definitions of questionnaire names/acronyms.
(b) Age is presented as mean years (SD) [range].
AEE = activity-related energy expenditure; b = regression coefficient; BMC = bone mineral content; BMI = body mass index; bpm = beats per minute; CNTSMIN = counts per minute; DEE = daily energy expenditure; DLW = doubly labelled water; EE = energy expenditure; HR = heart rate; ICC = intraclass correlation coefficient; LOA = limits of agreement; LPA = light-intensity PA; MET = metabolic equivalent; MPA = moderate-intensity PA; MVPA = moderate- to vigorous-intensity PA; NR = not reported; NS = not significant; PAL = physical activity level; PAR = 7-day PA recall kilocalorie energy expenditure index; r = correlation coefficient; TEE = total energy expenditure; V̇O2max = maximum oxygen uptake; V̇O2peak = peak oxygen uptake; VPA = vigorous-intensity PA; ? indicates unknown or unclear; ♀ indicates female; ♂ indicates male.
for all questionnaire acronyms mentioned throughout this article) questionnaire (ICC = 0.49–0.87)[14] was the most reliable; in children, the most reliable questionnaires were the
GAQ,[18] which recalled 28 activities in a usual week (ICC = 0.82), and the PAQ-C (ICC = 0.75 and 0.82 for boys and girls, respectively);[17] and, in adolescents, the most reliable instruments were
Fig. 1. Flowchart of study inclusion. Hits: total 21 891 (PubMed 9733; EMBASE 7601; SportDiscus® 4284). Selection based on titles and abstracts: 284; not in PubMed: 55; not in PubMed or EMBASE: 54; total 393 (note 1), of which children 83, adults 260 and elderly 59. Children: excluded 29 (note 2); included 54 papers on 61 questionnaires. Note 1: One paper appears in both the review for adults and for the elderly. Note 2: The main reason for exclusion was an interview instead of self-report.
Table III. Description of physical activity (PA) questionnaires for youth

Questionnaire (a) | Target population | Construct: construct | Construct: setting | Construct: recall period | Format: dimensions | Format: no. of questions | Format: scores

Preschoolers (mean age <6 y)
Questionnaire to mothers (proxy),[38] V | Kindergarten children | Habitual activity level | All | In general | None | 1 | Activity score (1–5)
Questionnaire to teachers (proxy),[38] V | Kindergarten children | Habitual activity level | All | In general | None | 1 | Activity score (1–5)
Parental report – outdoor time checklist (proxy),[37] V | Preschoolers | Time playing outdoors | Recr, school | 24 h (wake-up to bedtime) | D | 2 | Min activities; range: 0–24 min
Parental report – outdoor time recall questionnaire (proxy),[37] V | Preschoolers | Time playing outdoors | Recr, school | Typical wk/weekend d in the last mo | D | 2 | Average daily time (in min) spent playing outdoors
NPAQ (proxy parents or teachers),[15] R, V | Young children (4–7 y) | Usual activity patterns | All | Previous 6 mo | None | 7 (+1 on TV/watching video) | Activity score (0–5), watching TV
CPAQ (proxy),[16] R, V | Children (4–5 y) | Mode, frequency and duration of PA and sedentary activities | All | Past 7 d | F, D | ? | MVPA, PA EE

Children (mean age >6 and <12 y)
PA questionnaire for parents and teachers (proxy),[40] V | Preadolescent children (4–8 y) | PA | All | Previous d | D | 12 parents, 15 and 16 for teachers | Amount of daily MVPA in min
SAPAC,[45] V | Fifth graders (10–11 y) | MVPA | All | Previous d | F, D | Checklist format (21 activities) | No. of activities, min of PA, volume of PA, volume of PA including intensity ratings, min sedentary pursuits, min MVPA, PA MET scores, weighted MET score
Modified SAPAC,[22] R | Primary school children | PA | Sport, recr, school | Previous d | D | Checklist format (24 activities) | Light, MPA, VPA, total PA + TV/video, computer use, total sedentary activity
ACTIVITY,[41] V | Young children (<10 y) | PA | All | Previous school d | I | 10 | Activity score: potential range 0–1396
Table III. Contd

| Questionnaire^a | Target population | Construct: construct | Construct: setting | Construct: recall period | Format: dimensions | Format: no. of questions | Format: scores |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GAQ,[20] R, V | African American girls (8–10 y) | PA | Sport, recr, trans, school, home | Yesterday and usual d | F, D | 28 · 4 + sedentary activities | Activity score |
| GAQ,[18] R, V | | | All | | | 4 · 8 + 7 questions about sedentary activities | Total PA score, (weighted) MET values, GAQ summary score |
| Daughter questionnaire (proxy),[21] R, V | Non-obese preadolescent girls | Patterns of PA | All | Typical school d and typical weekend d (24 h) | D | Three timetables (school, weekend, TV) | Hrs/d sitting, standing, walking, exercising, TV/VCR/video games |
| Mother and Father questionnaire (proxy),[21] R | Non-obese preadolescent girls | Daily activity level | All | Typical school d and typical weekend d (24 h) | D | ? | Hrs/d sleeping, sitting, light PA/MPA/VPA, TV/VCR/video games |
| PAQ-C modified,[44,53] V | Children (8–14 y) | MVPA | Sports and leisure | Previous 7 d | F | 9 (1–5 scale), 28 activities | Original PAQ-C summary score (average of the sum of the nine items); rescaled PAQ-C summary |
| PAQ-C,[43,49] V | Children (9–15 y) | MVPA | Sport, recr, school | Previous 7 d | F | 9 | Activity score |
| PAQ-C,[17] R | Older children (9–15 y/grades ≥4) | Habitual MVPA | All | Previous 7 d | F | 10 | Activity score + checklist, PE class, recess, lunch, after school, evening, weekend, described best, wk summary |
| MARCA,[42] V | Children and adolescents | Activity behaviour, i.e. use of time and daily EE | All | 1 d recall | D, I | Segmented-d format (web-based) | PAL, time spent above a given MET level, time spent lying down, sitting, standing or in locomotion, no. of min and estimated energy cost for any activity or set of activities, time distribution of any activity or set of activities |
| Self-report PA Questionnaire for Schoolchildren,[47] V | Primary school children (9–11 y) | PA and outdoor playing | Sport, recr (playing outdoors) | General wk | F, I | 4 + section on TV watching and video games | Participation in sports club, PA intensity, frequency of PA, preferences for PA |
| CLASS,[14,19] R, V | Primary school children | Usual PA | All | Usual weekday and weekend d, typical wk | F, D | 30 activities + 6 | Frequency MPA, frequency VPA, duration MPA, duration VPA intensity (min/wk) |
| Modified Godin-Shephard,[48] V | Schoolchildren (10–13 y) | PA | All | Previous d | D | Checklist format | TEE, kcal without the resting metabolic rate |
| Modified Godin-Shephard (proxy),[23] R, V | Schoolchildren | Habitual PA | Sport, recr, exercise during free time | Past y, usual wk | F, I | 1 | Weekly average of the no. of times they engaged in strenuous, moderate or mild exercise for >15 min during their free time over the last y |
Table III. Contd

| Questionnaire^a | Target population | Construct: construct | Construct: setting | Construct: recall period | Format: dimensions | Format: no. of questions | Format: scores |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Modified Godin-Shephard (leisure-time exercise questionnaire),[61] V | Schoolchildren | Leisure-time PA | All | Previous wk | F | ? | Frequency hard, moderate, easy |
| Activity-rating instrument,[24] R, V | Children (7–15 y) | Usual PA in last 3 mo | ? | 3 mo | None | 1 | Activity level (1–7), activity level compared with peers |
| Specific activity score,[23] R, V | Girls (7–15 y) | Habitual PA | Sport (11 types) | Past y | F, D, I | ? | Average weekly TEE over past y |
| PAQ,[26] R, V | Elementary school children and their parents | Usual activity patterns | ? | ? | F, D | Checklist with 22 activities | Activity score |
| Stairs score,[23] R, V | Girls (7–15 y) | No. of flights of stairs climbed daily | Trans | Past y | F | 1 | 5-point scale/no. |
| CHASE,[16] R, V | Primary school children living in the UK | Mode and frequency of PA and sedentary activities | All | ? | F | 25 | Lifestyle score |
| Health Survey for England PA Questionnaire (parent-report),[46] V | Parents of British children | Habitual level of MVPA | All outside school | Previous 7 d | ? | ? | MVPA min per d |
| SNAP,[39] V | Children and adolescents | Physical and sedentary activities | Sedentary, structured, household chores and play, trans | Previous 24 h | D, I | Web-based, segmented-d format | MVPA |
| Older children and adolescents (mean age >12 y) | | | | | | | |
| Self-administered 7-day recall questionnaire,[61] V | Modified for children (12 y) | MPA/VPA | All | Normal 7-d period | D | ? | Vigorous and moderate no. of h |
| SHAPES,[52] V | Schoolchildren | MVPA | All | Previous 7 d | F, D | 10 | Min/d VPA/MPA, MVPA, PAL, weekly screen time, EE on MVPA |
| Pathway PA recall questionnaire (PAQ),[7] R, V | Children and adolescents | PA | All (standard list of common activities) | Previous 24 h | F | Checklist format | No. of activities reported, frequencies of different types of activities, intensity |
Table III. Contd

| Questionnaire^a | Target population | Construct: construct | Construct: setting | Construct: recall period | Format: dimensions | Format: no. of questions | Format: scores |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CPAR,[25] R, V | Youth (middle school) | Sedentary and PA | All | Previous d | D | Checklist | Min activities/d, activity-related EE |
| PDPAR,[50] V | Youth (high school) | PA | Sports, recr, trans, home | Previous d (after school h, i.e. 1500–2330) | I | 35 activities to be filled in 30-min blocks | TEE, EE during specific periods of time, EE in specific activities, no. of 30-min blocks >4 MET |
| 3DPAR,[32] R, V | Adolescents | Daily PA patterns | All | Previous 3 d | F | 50 activities with main activity to be filled in 30-min blocks | No. of blocks MVPA (≥3 METs) or VPA (≥6 METs) per d |
| SAPAC,[32] R, V | Adolescents | Daily PA patterns | All | Previous 3 d | D | 50 activities | No. of min MVPA (≥3 METs) or VPA (≥6 METs) per d |
| PAQ-A,[54] V | Adolescents | General levels of PA during the school y | Sports, recr, school (PE and lunch recess) | Last 7 d | F | 9 | Range: 1–5 |
| SWAPAQ,[16,51] R, V | Adolescents | PA | Leisure time, trans, school | Last 7 d | F, D, I | 25 | Total min of self-reported PA and total MET min, MVPA |
| YRBS,[55] V | Youth | Participation in strenuous PA | All | Previous wk | F | 1 | No. of d |
| APARQ,[33] R, V | Adolescents | PA | Sport, recr, trans | Normal wk | F, D | 4 | EE and activity score |
| FPACQ,[56] V | Adolescents aged 12–18 y | PA | All, except PE | Usual wk | F, D | ? | H/d and MET-hrs trans and sports; h/wk using TV and computer; sport-intensity index (MET); F/wk VPA; d/wk MPA |
| Modified Godin-Shephard Questionnaire,[55] V | Middle school-aged children | Participation in leisure-time exercise | All | Average wk | F | 3 | D/wk strenuous, moderate and mild PA during school y and summer |
| WHO HBSC,[30,36] R, V | Schoolchildren, children and adolescents | PA, time spent being vigorously active outside school h | Sports, recr (outside school h) | Usually (in a wk) | F, D | 2 | Frequency score, duration score, combination score |
| MONICA survey,[62] V | Children and adolescents (9–19 y) | Habitual PA | All | Previous wk/past 12 mo | F, D | ? | No. of sport activities/sessions performed in last wk/no. of min of PA inducing sweating per d |
Table III. Contd

| Questionnaire^a | Target population | Construct: construct | Construct: setting | Construct: recall period | Format: dimensions | Format: no. of questions | Format: scores |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PA and Exercise Questionnaire,[63] V | Singapore primary and secondary schoolchildren | PA patterns | Sport, recr, all | Current PA level, previous 14 d PA level, annual sports participation/events | F, D | 5 (1 + 4 multiple-choice questions) | Activity scores: d of hard exercise, d of easy exercise, TV, video computer h, no. of sports played (annual), activity grouping |
| Fels PAQ,[34] R, V | Children (7–19 y) | Habitual PA | Sport, recr, trans, home | Past y | F | 8 | Activity score, and sport, leisure and work index |
| HAQ,[57] V | Girls (10–18/19 y) | Habitual PA | Sports, recr, school sports | Past y | F | ? | Activity score, MET times/wk |
| Epidemiological questionnaire,[31] R, V | Adolescents | Leisure-time PA | Sports, recr (leisure time) | Past y | F, D | Table format | MET h/wk, VPA h/wk |
| MVPA screening measure,[29] R, V | Adolescents in primary-care setting | Meeting guideline for PA | All | Previous 7 d and typical wk | F, D, I | 6 (2 VPA, 4 MPA) | Meeting guidelines for healthy activity/fitness |
| Refined 60-min MVPA,[29] R, V | Adolescents | Meeting guideline for PA | All (not described) | Previous 7 d, usual/typical wk | F | 2 | Meeting guidelines for healthy activity (d/wk) |
| Weight-bearing PAQ,[58] V | Girls (11–15 y) | Level of weight-bearing activities | Sport, recr, school, home | Average weekly time in previous mo | D | 58 | (Corrected) energy score (min * METS) and weight-bearing score (min * WEIGHT factor), and high active/medium active/low active |
| QAPACE,[27] R, V | Youngsters in Bogota | Daily PA | All | Past y | F, D | 18 | Daily energy expenditure |
| IPAQ (short version),[30,35] R, V | Adults | PA | VPA, MPA, walking | Habitual or past wk | F, D | ? | VPA: d/wk and min/d; MPA: d/wk and min/d; walking: d/wk and min/d; MET min/d |
| PAQA,[35] R | Adolescents | PA | ? | Habitual wk | F, D | ? | MET min/d, LPA (sitting/sleeping), MPA/d, VPA/d |
| PAQ-A,[53] V | High school students (14–18 y) | MVPA | Sports and leisure | Past 7 d | F | 9 (1–5 scale), 28 activities | PAQ-A summary score (original or rescaled) |
| YPAQ,[16] R, V | Schoolchildren (12–17 y) | Mode, frequency and duration of PA and sedentary activities | All | Past 7 d | F, D | 47 activities | MVPA, PA EE |
a See table IV for definitions of questionnaire names/acronyms.
D, F Previous wk All MVPA Secondary school students OPAQ,[28] R, V
D = duration; EE = energy expenditure; F = frequency; home = home-based activities (household and gardening); I = intensity; kcal = kilocalories; LPA = light-intensity PA; MET = metabolic equivalent; MPA = moderate-intensity PA; MVPA = moderate- to vigorous-intensity PA; PAL = physical activity level; PE = physical education; R = reliability data available; recr = recreational; trans = transport; TEE = total energy expenditure; TV = television; V = validity data available; VCR = video cassette recorder; VPA = vigorous-intensity PA; ? indicates not specified or unclear; * indicates multiplication.
MPA, VPA, MVPA
Timetable format
? D Previous 7 d All Children and adolescents 7D-PAR,[60] V
PA
?
D/wk and F/d walking, MPA and VPA; min/d walking, VPA, MPA; MET min/d as a measure of total health-enhancing activity; daily PA (MET min/d) F, D, context Last 7 d School, trans, home, recr Adolescents IPAQ-A (long version)[59] V
All dimensions of healthenhancing PA
?
scores no. of questions
Web-based, segmented-d format
dimensions
D, I 24-h recall Sedentary, structured, home play and trans Children and adolescents SNAP,[39] R
Physical and sedentary activities
Format
recall period setting Construct
construct
Target population Questionnairea
Table III. Contd
MVPA
the QAPACE (ICC = 0.96)[27] and the OPAQ (ICC = 0.76–0.91).[28]

2.3 Construct Validity
Table II summarizes the studies on construct validity. Construct validity was assessed for seven questionnaires among proxies of preschoolers, 25 questionnaires among children and 31 among adolescents. Construct validity was mostly evaluated by correlations between the questionnaire and accelerometers (n = 46). In preschool children, the highest correlations with accelerometers were found for the CPAQ (r = 0.42)[16] and the NPAQ (r = 0.33 and 0.36 for total activity and vigorous activity, respectively).[15] In primary school children, the highest correlations with an accelerometer were found for the Physical Activity Questionnaire for Parents and Teachers[40] (r = 0.53) and the ACTIVITY[41] (r = 0.40). For another questionnaire, the SNAP,[39] a mean difference of -9 minutes between the SNAP and an accelerometer was reported. In adolescents, the highest correlations with an accelerometer were found for the PDPAR (r = 0.77)[50] and the SAPAC (r = 0.51).[51]

2.4 Responsiveness
Responsiveness of PA questionnaires was studied for only one questionnaire: the HAQ.[57] For this questionnaire, there was a parallel trend in the pattern of the decline in activity among the HAQ, an activity diary and a Caltrac accelerometer over a period of 3 years. From years 3 to 5 (ages 11–12 to 13–14 years), the diary score decreased by 22%, whereas both the HAQ and Caltrac declined by 21%.

3. Discussion

A wide variety of PA questionnaires are available for youth of varying age recalling different dimensions of PA. Few have been examined for use in preschool children. None of the questionnaires included in our review showed both acceptable reliability and acceptable validity. Reported reliability and validity varied, with test-retest correlations ranging from 0.02 to 0.96, and correlations between activity questionnaires and accelerometers ranging from ‘very poor’ to 0.77. Responsiveness was only studied in one questionnaire: the HAQ.[57] These results suggest that the response patterns of the HAQ are comparable to those of the Caltrac accelerometer or a diary. In general, PA questionnaires for adolescents correlated better with accelerometer scores than PA questionnaires for children. This finding may be due to difficulties in recalling PA, in comprehensibility of the questions or the difference in the activity patterns of children and adolescents.

Table IV. Full list of questionnaire acronyms and their corresponding definitions

| Questionnaire acronym | Definition |
| --- | --- |
| 3DPAR | 3-Day Physical Activity Recall |
| 7D-PAR | 7-Day Physical Activity Recall Questionnaire |
| ACTIVITY | Assessment of Young Children’s Activity using Video Technology |
| APARQ | Adolescent PA Recall Questionnaire |
| CHASE | Child Heart and Health Study in England Questionnaire |
| CLASS | Children’s Leisure Activities Study Survey |
| CPAQ | Children’s Physical Activity Questionnaire |
| CPAR | Computerized PA Recall |
| Fels PAQ | Fels PA Questionnaire for Children |
| FPACQ | Flemish PA Computer Questionnaire |
| GAQ | Girls health Enrichment Multisite Study Activity Questionnaire |
| HAQ | Habitual Activity Questionnaire |
| IPAQ | International PA Questionnaire |
| IPAQ-A | International PA Questionnaire-modified for adolescents |
| MARCA | Multimedia Activity Recall for Children and Adolescents |
| MONICA | Monitoring instrument for cardiovascular disease survey |
| NPAQ | Netherlands Physical Activity Questionnaire for Young Children |
| OPAQ | Oxford Physical Activity Questionnaire |
| PAQ | Physical Activity Questionnaire |
| PAQA | Physical Activity Questionnaire for Adolescents, locally modified |
| PAQ-A | Physical Activity Questionnaire for Adolescents, modified |
| PAQ-C | Physical Activity Questionnaire for Older Children |
| PDPAR | Previous Day Physical Activity Recall |
| QAPACE | Quantification de l’activité physique en altitude chez les enfants |
| SAPAC | Self-Administered Physical Activity Checklist |
| SAPAQ | Self-administered Physical Activity Questionnaire |
| SHAPES | School Health Action, Planning and Evaluation System |
| SNAP | Synchronised Nutrition and Activity Program |
| SWAPAQ | Swedish Adolescent Physical Activity Questionnaire |
| WHO HBSC | World Health Organization Health Behaviour in Schoolchildren questionnaire |
| YPAQ | Youth PA Questionnaire |
| YRBS | Youth Risk Behavior Survey |
Few instruments have been evaluated in multiple studies (e.g. the PAQ-C,[17,43] CLASS[14,19] and GAQ[18,20]). The reliability of the PAQ-C was good in one study,[17] and its validity was moderate in another.[43] Both studies[14,19] that investigated the reliability of the CLASS found it to be adequate, while validity relative to accelerometry was poor. For the GAQ, reliability was adequate in one of the two studies,[18] while validity relative to accelerometry was poor in both.[18,20] The Godin-Shephard questionnaire, which was originally developed for adults, was modified for children in three studies.[23,48,55] However, all three studies evaluated a different version. Since there were no questionnaires with both acceptable reliability and validity, we propose that the most promising questionnaires be improved and evaluated in multiple high-quality studies. Promising questionnaires for children are the PAQ-C,[17] GAQ,[18,20] CLASS,[14,19] the Physical Activity Questionnaire for Parents and Teachers,[40] the ACTIVITY[41] and the CPAR.[25] For adolescents, the QAPACE,[27] OPAQ,[28] SNAP,[39] PDPAR[50] and SAPAC[51] seem promising.

As with any systematic review, this review is limited by the quality of the included studies. Because of the large variation in study design, incomplete reporting of the studies and the limited methodological quality of the majority of the primary studies, it was not possible to apply our intended criteria of adequacy for the methodological quality and study results. Frequent methodological shortcomings of the studies were small sample sizes (25 studies with sample sizes of <50), inadequate time intervals between test and retest (frequently too long), not taking systematic differences into account in assessing reliability (i.e. using a correlation instead of ICCs in seven studies), and only evaluating relative validity and not absolute validity (i.e. using correlations instead of measures for agreement in all but two studies[14,39]).

In concordance with Sallis and Saelens[65] and Oliver et al.,[66] we also found that almost all studies only examined relative validity expressed as correlations. Correlations do not pick up systematic differences between two measures. Thus, two measures may have a strong and statistically significant correlation while the agreement between both measures may be low. In cases where measures have the same unit of measurement, it is preferable to calculate the absolute agreement by using, for instance, Bland-Altman plots.
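The distinction between correlation and agreement can be illustrated with a minimal sketch. The data below are hypothetical (they are not taken from any reviewed study): a questionnaire is assumed to over-report daily MVPA by about 30 minutes relative to an accelerometer. The function names `pearson_r` and `bland_altman` are illustrative, and the limits of agreement use the conventional mean difference ± 1.96 SD.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) of agreement for x minus y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical MVPA values in min/d (assumption, for illustration only).
accelerometer = [20.0, 35.0, 50.0, 65.0, 80.0, 95.0]
noise = [2.0, -3.0, 1.0, -1.0, 3.0, -2.0]
questionnaire = [a + 30.0 + e for a, e in zip(accelerometer, noise)]

r = pearson_r(questionnaire, accelerometer)
bias, lower, upper = bland_altman(questionnaire, accelerometer)
print(f"r = {r:.3f}")
print(f"bias = {bias:.1f} min/d, limits of agreement = {lower:.1f} to {upper:.1f}")
```

In this constructed example the correlation is very high even though every questionnaire value lies well above the accelerometer value; only the bias and limits of agreement expose the systematic difference.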
This method has seldom been used in validation studies of PA questionnaires. Only four studies included in our review calculated Bland-Altman plots.[16,27,46,59]

We only included studies that intentionally evaluated measurement properties of PA questionnaires. It is possible that more evidence is available in the literature that could be used to determine the validity or responsiveness of the questionnaires (e.g. in studies that examine the validity of other PA measures). Furthermore, we included only English-language publications and, therefore, we may have missed some publications on additional PA questionnaires in other languages.

Questionnaires that received a negative rating are not necessarily bad questionnaires. It may also be that reliability has been inadequately studied or that the report of the study was incomplete.

Three measurement properties were not rated in our review: content validity, criterion validity and measurement error. Content validity refers to the degree to which the content of an instrument is an adequate reflection of the construct to be measured. Content validity was not rated because no studies examined or reported on content validity. Criterion validity refers to the degree to which the scores of an instrument are an adequate reflection of a ‘gold standard’. There is no gold standard for the assessment of PA; thus, criterion validity could not be rated. Although the doubly labelled water technique or the respiratory chamber is considered a gold standard for the assessment of energy expenditure, these methods are not considered a gold standard for the assessment of PA. Measurement error is the systematic and random error of a subject’s score that is not attributed to true changes in the construct to be measured. None of the included studies evaluated measurement error.

3.1 Reliability
A reliability study should have an adequate time interval between the two administrations. For questionnaires recalling the previous day or previous week, retests need to cover the same timeframe as the initial test.[65] Otherwise, lower ICCs may be the result of actual differences in the activity pattern between the recalled days. Recalls of ‘usual’ PA should be less sensitive to the time interval between tests. We acknowledge that the criteria relating to the appropriate time interval between test and retest are arbitrarily chosen.
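Why an ICC is preferred over a plain correlation for test-retest reliability can be sketched as follows. The review does not state which ICC form the primary studies used; this example assumes the two-way, absolute-agreement, single-measurement form (often written ICC(A,1)), and the scores are hypothetical. The retest is constructed as the test score plus a constant 15 units, i.e. a purely systematic difference.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_agreement(test, retest):
    """ICC(A,1): two-way model, absolute agreement, single measurement."""
    n, k = len(test), 2  # n subjects, k = 2 administrations
    rows = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    row_means = [mean(r) for r in rows]
    col_means = [mean(test), mean(retest)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical activity scores (assumption, for illustration only).
test_scores = [10.0, 20.0, 30.0, 40.0, 50.0]
retest_scores = [s + 15.0 for s in test_scores]  # constant shift at retest

r = pearson_r(test_scores, retest_scores)
icc = icc_agreement(test_scores, retest_scores)
print(f"Pearson r = {r:.3f}, ICC(A,1) = {icc:.3f}")
```

Under these assumptions the correlation is perfect because the ranking of subjects is unchanged, whereas the agreement ICC is clearly lower because it penalizes the systematic shift between administrations.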
3.2 Validity
A reasonable gold standard for measuring PA does not exist; thus, criterion validity cannot be assessed. Instead, the construct validity of instruments measuring PA can be evaluated. In construct validity or responsiveness studies, it is important to state a priori hypotheses. When these hypotheses are not specified, the risk of bias is large because often only the positive results will be presented. Construct validation is an ongoing process. Furthermore, the construct of PA is a formative model, i.e. the items in the questionnaire measuring PA need not be highly correlated. Therefore, structural validity (the degree to which the scores of an instrument are an adequate reflection of the dimensionality of the construct to be measured), usually evaluated with factor analysis, is also not applicable to PA questionnaires. Consequently, to evaluate the validity of a PA questionnaire, one can only rely on content validity and construct validity.

3.3 Comparison Measures
The selection of an appropriate comparison measure against which to validate PA questionnaires is difficult. As such, in the included studies, many different criteria were used to validate PA questionnaires for youth: direct observation; accelerometers; heart rate monitors; pedometers; fitness tests; and other questionnaires. Each of these comparison measures has advantages and disadvantages, and the most appropriate comparison measure depends on the dimension of interest. According to Sirard and Pate,[67] direct observation is the most practical and appropriate measure for PA. However, observation is a highly demanding method for the researcher, and the actual presence of an observer may also influence the behaviour of a subject. Other measures (e.g. doubly labelled water and indirect calorimetry) are not practical for use in large populations under free-living conditions. Moreover, these measures are only suitable for assessing energy expenditure.

The accelerometer is a commonly used tool against which to compare PA surveys. This is because of its ability to objectively detect amount, frequency and duration of movement,[68,69] and its predictive relationship with heart rate and energy expenditure in the laboratory.[70,71] Accelerometers have also shown their validity during free-living activities in youth.[72] However, accelerometers are better at detecting ambulatory activity (e.g. walking and jogging) than nonambulatory activities (e.g. cycling), the lifting of heavy objects and surface incline or decline during locomotion such as stair walking.[70] Other limitations of accelerometry include errors associated with regression equations used to derive cut-off points for moderate- and vigorous-intensity activity.[73,74] There is no consensus about appropriate cut-off points for classifying accelerometer output into different intensity levels for youth; intensity cut-off points vary widely. Corder et al.[75] recommend moving away from the use of arbitrary count-based cut-off points towards a more universally comparable approach of using acceleration (metres/second²) to summarize accelerometry data. In particular, activity patterns of young children may include more horizontal motion, such as rolling, crawling and climbing, highlighting the need for more sophisticated accelerometers capturing omnidirectional movement rather than only vertical accelerations.

The epoch time used may influence the results. Most studies assessing PA in youths have set the epoch at 60 seconds.[34,76,77] However, typical for youth are short, intermittent bursts of PA with frequent rest periods of a longer duration.[9,78] The median duration of moderate- and high-intensity exercise appears to be only 6 and 3 seconds, respectively. As a result, moderate- and high-intensity exercise bouts may become inconsequential when summed over a 60-second epoch, which suggests the need to use a smaller epoch time. The new accelerometer/heart rate monitors show promising results and might be better at estimating PA levels than either measure alone.
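The effect of epoch length on short activity bursts can be sketched with a toy signal. Everything here is an assumption for illustration: the per-second intensity values are in arbitrary units, the threshold of 5 units/second is hypothetical rather than a published cut-off point, and the function name `seconds_classified_active` is invented. The signal contains a single 6-second burst, matching the median bout duration quoted above, followed by 54 seconds of rest.

```python
def seconds_classified_active(per_second, epoch_s, threshold_per_s):
    """Total seconds in epochs whose mean intensity reaches the threshold."""
    total = 0
    for start in range(0, len(per_second), epoch_s):
        epoch = per_second[start:start + epoch_s]
        if sum(epoch) / len(epoch) >= threshold_per_s:
            total += len(epoch)
    return total


# One minute of hypothetical data: a 6-s burst, then rest (arbitrary units).
signal = [15.0] * 6 + [0.0] * 54
THRESHOLD = 5.0  # hypothetical intensity threshold per second

active_1s = seconds_classified_active(signal, 1, THRESHOLD)
active_15s = seconds_classified_active(signal, 15, THRESHOLD)
active_60s = seconds_classified_active(signal, 60, THRESHOLD)
print(active_1s, active_15s, active_60s)
```

With these made-up numbers the burst is detected at 1-second and 15-second epochs but disappears entirely at a 60-second epoch, because the mean over the full minute falls below the threshold. The 15-second epoch also attributes the whole epoch to activity, so shorter epochs trade missed bursts for coarser duration estimates.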
Other measures seem less suitable for validation studies. Heart rate monitors have been validated against doubly labelled water and seem valid for classifying groups of individuals rather than estimating individual PA levels.[67] Heart rate is sensitive not only to emotional stress and body position, but also to body mass.[79]

With regard to pedometers, more high-quality research is needed to show their validity and reliability for assessing PA in children and adolescents.[80,81] Pedometers detect only total counts or steps, and cannot assess activity patterns or intensity. Validation against other questionnaires or diaries is problematic since both are dependent on self-report, and we cannot say which is superior. Physical fitness tests should not be used to validate PA questionnaires since these are two different constructs. Aerobic fitness is weakly associated with PA, especially in youth.[82,83] Moreover, changes in PA only influence physical fitness in the longer term.

3.4 Recommendations Regarding Future Studies
Terwee et al.[10] provide general recommendations for evaluating the measurement properties of PA questionnaires (QAPAQ). For assessing validity of youth PA questionnaires relative to accelerometry we propose the following additional recommendations: a monitoring period of at least 6 days; using a smaller epoch time (e.g. 15 seconds); standard methods for analysis of accelerometer data; and, preferably, the use of accelerometers that capture omnidirectional movement rather than just vertical accelerations.

4. Conclusions

Considerably more high-quality research is needed to examine the validity and reliability of promising PA questionnaires for youth. Since there is no gold standard for assessing PA, validation against different measures such as direct observation combined with accelerometry should be considered. Furthermore, in validity or responsiveness studies, it is important to state a priori hypotheses. When these hypotheses are not specified, the risk of bias is large because often only positive results are presented. Standardized quality criteria (such as QAPAQ) for studies examining measurement properties of PA questionnaires are important for the improvement of the methodological quality of future validity and reliability studies.
Acknowledgements

This review was financially supported by the Department of Public and Occupational Health, the Department of Epidemiology and Biostatistics, the EMGO Institute for Health and Care Research, VU University Medical Center and Body@Work, Research Center Physical Activity, Work and Health, TNO-VU University Medical Center, Amsterdam, the Netherlands. The authors have no conflicts of interest directly relevant to the contents of this article.
References

1. Ekblom B, Astrand PO. Role of physical activity on health in children and adolescents. Acta Paediatr 2000; 89 (7): 762-4
2. Hallal PC, Victora CG, Azevedo MR, et al. Adolescent physical activity and health: a systematic review. Sports Med 2006; 36 (12): 1019-30
3. Ward DS, Evenson KR, Vaughn A, et al. Accelerometer use in physical activity: best practices and research recommendations. Med Sci Sports Exerc 2005; 37 (11 Suppl.): S582-8
4. Trost S. Measurement of physical activity in children and adolescents. Am J Lifestyle Med 2007; 1 (4): 299-314
5. Salmon J, Booth ML, Phongsavan P, et al. Promoting physical activity participation among children and adolescents. Epidemiol Rev 2007; 29: 144-59
6. Montoye HJ, Kemper HCG, Saris WHM, et al. Measuring physical activity and energy expenditure. Champaign (IL): Human Kinetics, 1996
7. Going SB, Levin S, Harrell J, et al. Physical activity assessment in American Indian schoolchildren in the Pathways study. Am J Clin Nutr 1999; 69 (4 Suppl.): 788-95S
8. Sallis JF. Self-report measures of children’s physical activity. J Sch Health 1991; 61 (5): 215-9
9. Baquet G, Stratton G, Van Praagh E, et al. Improving physical activity assessment in prepubertal children with high-frequency accelerometry monitoring: a methodological issue. Prev Med 2007; 44 (2): 143-7
10. Terwee CB, Mokkink LB, van Poppel MNM, et al. Qualitative attributes and measurement properties of physical activity questionnaires: a checklist. Sports Med 2010; 40 (7): 525-37
11. Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002; 11: 193-205
12. Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007; 60 (1): 34-42
13. de Vet HCW. Observer reliability and agreement. In: Armitage P, Colton T, editors. Encyclopedia of biostatistics. Boston (MA): John Wiley & Sons Ltd., 1998: 3123-8
14. Telford A, Salmon J, Jolley D, et al. Reliability and validity of physical activity questionnaires for children: the Children’s Leisure Activities Study Survey (CLASS). Pediatr Exerc Sci 2004; 16: 64-78
15. Janz KF, Broffitt B, Levy SM. Validation evidence for the Netherlands physical activity questionnaire for young children: the Iowa Bone Development study. Res Q Exerc Sport 2005; 76 (3): 363-9
16. Corder K, van Sluijs EM, Wright A, et al. Is it possible to assess free-living physical activity and energy expenditure in young people by self-report? Am J Clin Nutr 2009; 89 (3): 862-70
17. Crocker PR, Bailey DA, Faulkner RA, et al. Measuring general levels of physical activity: preliminary evidence for the Physical Activity Questionnaire for Older Children. Med Sci Sports Exerc 1997; 29 (10): 1344-9
18. Treuth MS, Sherwood NE, Butte NF, et al. Validity and reliability of activity measures in African-American girls for GEMS. Med Sci Sports Exerc 2003; 35 (3): 532-9
19. Salmon J, Telford A, Crawford D. Assessment of physical activity among primary school aged children: the Children’s Leisure Activities Study (CLASS). Australas Epidemiolog 2002; 9: 10-14
20. Treuth MS, Sherwood NE, Baranowski T, et al. Physical activity self-report and accelerometry measures from the Girls Health Enrichment Multi-site Studies. Prev Med 2004; 38 Suppl.: S43-9
21. Ching PLYH, Dietz WH. Reliability and validity of activity measures in preadolescent girls. Ped Exerc Sci 1995; 7: 389-99
22. Brown TD, Holland BV. Test-retest reliability of the self-assessed physical activity checklist. Percept Mot Skills 2004; 99 (3 Pt 2): 1099-102
23. Koo MM, Rohan TE. Comparison of four habitual physical activity questionnaires in girls aged 7–15 yr. Med Sci Sports Exerc 1999; 31 (3): 421-7
24. Janz KF, Witt J, Mahoney LT. The stability of children’s physical activity as measured by accelerometry and self-report. Med Sci Sports Exerc 1995; 27 (9): 1326-32
25. McMurray RG, Harrell JS, Bradley CB, et al. Comparison of a computerized physical activity recall with a triaxial motion sensor in middle-school youth. Med Sci Sports Exerc 1998; 30 (8): 1238-45
26. Taren DL, Freeman MB, Brandenburg NA. Evaluation of dietary and activity questionnaires for elementary school children. Ann N Y Acad Sci 1993; 699: 298-300
27. Barbosa N, Sanchez CE, Vera JA, et al. A physical activity questionnaire: reproducibility and validity. J Sports Sci Med 2007; 6: 505-18
28. Lubans DR, Sylva K, Osborn Z. Convergent validity and test-retest reliability of the Oxford Physical Activity Questionnaire for secondary school students. Behav Change 2008; 25 (1): 23-34
29. Prochaska JJ, Sallis JF, Long B. A physical activity screening measure for use with adolescents in primary care. Arch Pediatr Adolesc Med 2001; 155 (5): 554-9
30. Rangul V, Holmen TL, Kurtze N, et al. Reliability and validity of two frequently used self-administered physical activity questionnaires in adolescents [abstract]. BMC Med Res Methodol 2008; 8: 47
31. Aaron DJ, Kriska AM, Dearwater SR, et al. Reproducibility and validity of an epidemiologic questionnaire to assess past year physical activity in adolescents. Am J Epidemiol 1995; 142 (2): 191-201
32. McMurray RG, Ring KB, Treuth MS, et al. Comparison of two approaches to structured physical activity surveys for adolescents. Med Sci Sports Exerc 2004; 36: 2135-43
33. Booth ML, Okely AD, Chey TN, et al. The reliability and validity of the Adolescent Physical Activity Recall Questionnaire. Med Sci Sports Exerc 2002; 34 (12): 1986-95
34. Treuth MS, Hou N, Young DR, et al. Validity and reliability of the Fels physical activity questionnaire for children. Med Sci Sports Exerc 2005; 37 (3): 488-95
35. Lachat CK, Verstraeten R, Khanh le NB, et al. Validity of two physical activity questionnaires (IPAQ and PAQA) for Vietnamese adolescents in rural and urban areas. Int J Behav Nutr Phys Act 2008; 5: 37
36. Booth ML, Okely AD, Chey T, et al. The reliability and validity of the physical activity questions in the WHO health behaviour in schoolchildren (HBSC) survey: a population study. Br J Sports Med 2001; 35 (4): 263-7
37. Burdette HL, Whitaker RC, Daniels SR. Parental report of outdoor playtime as a measure of physical activity in preschool-aged children. Arch Pediatr Adolesc Med 2004; 158 (4): 353-7
38. Nishikido N, Kashiwazaki H, Suzuki T. Preschool children’s daily activities: direct observation, pedometry or questionnaire. J Hum Ergol (Tokyo) 1982; 11 (2): 214-8
39. Moore HJ, Ells LJ, McLure SA, et al. The development and evaluation of a novel computer program to assess previous-day dietary and physical activity behaviours in school children: the Synchronised Nutrition and Activity Program (SNAP). Br J Nutr 2008; 99 (6): 1266-74
40. Harro M. Validation of a questionnaire to assess physical activity of children ages 4–8 years. Res Q Exerc Sport 1997; 68 (4): 259-68
41. Tremblay MS, Inman JW, Willms JD. Preliminary evaluation of a video questionnaire to assess activity levels of children. Med Sci Sports Exerc 2001; 33 (12): 2139-44
42. Ridley K, Olds TS, Hill A. The Multimedia Activity Recall for Children and Adolescents (MARCA): development and evaluation. Int J Behav Nutr Phys Act 2006; 3: 10
43. Kowalski KC, Crocker PR, Faulkner RA. Validation of the physical activity questionnaire for older children.
Pediatr Exerc Sci 1997; 9 (2): 174-86 44. Janz KF, Medema-Johnson HC, Letuchy EM, et al. Subjective and objective measures of physical activity in relationship to bone mineral content during late childhood: the Iowa Bone Development Study. Br J Sports Med 2008; 42 (8): 658-63 45. Sallis JF, Strikmiller PK, Harsha DW, et al. Validation of interviewer- and self-administered physical activity checklists for fifth grade students. Med Sci Sports Exerc 1996; 28 (7): 840-51 46. Basterfield L, Adamson AJ, Parkinson KN, et al. Surveillance of physical activity in the UK is flawed: validation of the Health Survey for England Physical Activity Questionnaire. Arch Dis Child 2008; 93 (12): 1054-8 47. Chen X, Sekine M, Hamanishi S, et al. Validation of a selfreported physical activity questionnaire for schoolchildren. J Epidemiol 2003; 13 (5): 278-87 48. Jurisson A, Jurimae T. The validity of the Godin-Shephard physical activity questionnaire in children. Biol Sport 1996; 13: 291-5 49. Moore JB, Hanes Jr JC, Barbeau P, et al. Validation of the Physical Activity Questionnaire for Older Children in children of different races. Pediatr Exerc Sci 2007; 19 (1): 6-19
Sports Med 2010; 40 (7)
Physical Activity Questionnaires for Youth
50. Weston AT, Petosa R, Pate RR. Validation of an instrument for measurement of physical activity in youth. Med Sci Sports Exerc 1997; 29 (1): 138-43 51. Ekelund U, Neovius M, Linne Y, et al. The criterion validity of a last 7-day physical activity questionnaire (SAPAQ) for use in adolescents with a wide variation in body fat: the Stockholm Weight Development Study. Int J Obes (Lond) 2006; 30 (6): 1019-21 52. Wong SL, Leatherdale ST, Manske S. Reliability and validity of a school-based physical activity questionnaire. Med Sci Sports Exerc 2006; 38 (9): 1593-600 53. Janz KF, Lutuchy EM, Wenthe P, et al. Measuring activity in children and adolescents using self-report: PAQ-C and PAQ-A. Med Sci Sports Exerc 2008; 40 (4): 767-72 54. Kowalski KC, Crocker PR, Kowalski NP. Convergent validity of the physical activity questionnaire for adolescents. Pediatr Exerc Sci 1997; 9 (4): 342-52 55. Gao S, Schmitz K, Fulton J, et al. Reliability and validity of a brief tool to measure children’s physical activity. J Phys Act Health 2006; 3 (4): 415-22 56. Philippaerts RM, Matton L, Wijndaele K, et al. Validity of a physical activity computer questionnaire in 12- to 18-yearold boys and girls. Int J Sports Med 2006; 27 (2): 131-6 57. Kimm SY, Glynn NW, Kriska AM, et al. Longitudinal changes in physical activity in a biracial cohort during adolescence. Med Sci Sports Exerc 2000; 32 (8): 1445-54 58. Verheul ACM, Prins AN, Kemper HCG, et al. Validation of a weight-bearing physical activity questionnaire in a study of bone density in girls and women. Pediatr Exerc Sci 1998; 10: 38-47 59. Hagstromer M, Bergman P, De BI, et al. Concurrent validity of a modified version of the International Physical Activity Questionnaire (IPAQ-A) in European adolescents: the HELENA study. Int J Obes (Lond) 2008; 32 Suppl. 5: S42-8 60. Shiely F, MacDonncha C. Meeting the international adolescent physical activity guidelines: a comparison of objectively measured and self-reported physical activity levels. 
Ir Med J 2009; 102 (1): 15-9 61. Biddle SJ, Mitchell J, Armstrong N. The assessment of physical activity in children: a comparison of continuous heart rate monitoring, self-report, and interview recall techniques. Br J Phys Educ 1991; (Research Suppl.): 5-8 62. Narring F, Cauderay M, Cavadini C, et al. Physical fitness and sport activity of children and adolescents: methodological aspects of a regional survey. Soz Praventivmed 1999; 44 (2): 44-54 63. Schmidt GJ, Walkuski JJ, Stensel DJ. The Singapore Youth Coronary Risk and Physical Activity Study. Med Sci Sports Exerc 1998; 30 (1): 105-13 64. Guyatt GH, Deyo RA, Charlson M, et al. Responsiveness and validity in health status measurement: a clarification. J Clin Epidemiol 1989; 42 (5): 403-8 65. Sallis JF, Saelens BE. Assessment of physical activity by selfreport: status, limitations, and future directions. Res Q Exerc Sport 2000; 71 (2): S1-14 66. Oliver M, Schofield GM, Kolt GS. Physical activity in preschoolers: understanding prevalence and measurement issues. Sports Med 2007; 37 (12): 1045-70 67. Sirard JR, Pate RR. Physical activity assessment in children and adolescents. Sports Med 2001; 31 (6): 439-54
ª 2010 Adis Data Information BV. All rights reserved.
563
68. Freedson PS, Miller K. Objective monitoring of physical activity using motion sensors and heart rate. Res Q Exerc Sport 2000; 71 (2 Suppl.): S21-9 69. Bassett Jr DR, Ainsworth BE, Swartz AM, et al. Validity of four motion sensors in measuring moderate intensity physical activity. Med Sci Sports Exerc 2000; 32 (9 Suppl.): S471-80 70. Melanson Jr EL, Freedson PS. Validity of the Computer Science and Applications, Inc. (CSA) activity monitor. Med Sci Sports Exerc 1995; 27 (6): 934-40 71. Slootmaker SM, Chin A Paw MJ, Schuit AJ, et al. Concurrent validity of the PAM accelerometer relative to the MTI Actigraph using oxygen consumption as a reference. Scand J Med Sci Sports 2009; 19 (1): 36-43 72. Ekelund U, Sjostrom M, Yngve A, et al. Physical activity assessed by activity monitor and doubly labeled water in children. Med Sci Sports Exerc 2001; 33 (2): 275-81 73. Hendelman D, Miller K, Baggett C, et al. Validity of accelerometry for the assessment of moderate intensity physical activity in the field. Med Sci Sports Exerc 2000; 32 (9 Suppl.): S442-9 74. Welk GJ, Blair SN, Wood K, et al. A comparative evaluation of three accelerometry-based physical activity monitors. Med Sci Sports Exerc 2000; 32 (9 Suppl.): S489-97 75. Corder K, Ekelund U, Steele RM, et al. Assessment of physical activity in youth. J Appl Physiol 2008; 105 (3): 977-87 76. Hesketh K, Crawford D, Salmon J. Children’s television viewing and objectively measured physical activity: associations with family circumstance. Int J Behav Nutr Phys Act 2006; 3: 36 77. Mattocks C, Ness A, Leary S, et al. Use of accelerometers in a large field-based study of children: protocols, design issues, and effects on precision. J Phys Act Health 2008; 5 Suppl. 1: S98-111 78. Bailey RC, Olson J, Pepper SL, et al. The level and tempo of children’s physical activities: an observational study. Med Sci Sports Exerc 1995; 27 (7): 1033-41 79. Tang RB, Lee PC, Chen SJ, et al. 
Cardiopulmonary response in obese children using treadmill exercise testing. Zhonghua Yi Xue Za Zhi (Taipei) 2002; 65 (2): 79-82 80. Tudor-Locke C, Williams JE, Reis JP, et al. Utility of pedometers for assessing physical activity: construct validity. Sports Med 2004; 34 (5): 281-91 81. Tudor-Locke C, Williams JE, Reis JP, et al. Utility of pedometers for assessing physical activity: convergent validity. Sports Med 2002; 32 (12): 795-808 82. Raudsepp L, Jurimae T. Relationships between somatic variables, physical activity, fitness and fundamental motor skills in prepubertal boys. Biol Sport 1996; 13 (4): 279-89 83. Hands B, Larkin D, Parker H, et al. The relationship among physical activity, motor competence and health-related fitness in 14-year-old adolescents. Scand J Med Sci Sports 2009; 19 (5): 655-63
Correspondence: Dr Mai J.M. Chin A Paw, Department of Public and Occupational Health, EMGO Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, the Netherlands.
Sports Med 2010; 40 (7): 565-600 0112-1642/10/0007-0565/$49.95/0
REVIEW ARTICLE
ª 2010 Adis Data Information BV. All rights reserved.
Physical Activity Questionnaires for Adults
A Systematic Review of Measurement Properties
Mireille N.M. van Poppel,1 Mai J.M. Chinapaw,1 Lidwine B. Mokkink,2 Willem van Mechelen1 and Caroline B. Terwee2
1 Department of Public and Occupational Health, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
2 Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
Contents
Abstract
1. Methods
1.1 Literature Search
1.2 Eligibility Criteria
1.3 Selection of Papers
1.4 Data Extraction
1.5 Quality Assessment of the Studies on Measurement Properties
1.6 Content Validity
1.7 Construct Validity
1.8 Reliability
1.9 Responsiveness
2. Results
2.1 Quality of the Studies
2.2 Qualitative Attributes of the Questionnaires
2.3 Validation Results
3. Discussion
3.1 Limitations of this Review
3.2 Recommendations for Choosing a Questionnaire
3.3 Recommendations for Further Research
4. Conclusions
Abstract
Many questionnaires have been developed to measure physical activity (PA), but an overview of the measurement properties of PA questionnaires is lacking. A summary of this information is useful for choosing the best questionnaire available. Therefore, the objective of this study was to evaluate and compare measurement properties of self-administered questionnaires assessing PA in adults. We searched MEDLINE, EMBASE and SportDiscus, using ‘exercise’, ‘physical activity’, ‘motor activity’ and ‘questionnaire’ as keywords. We included studies that evaluated the measurement properties of self-report questionnaires assessing PA. Article
selection, data extraction and quality assessment were performed by two independent reviewers. The quality and results of the studies were evaluated using the Quality Assessment of Physical Activity Questionnaires (QAPAQ) checklist. Construct validity, reliability and responsiveness were rated as positive, negative or indeterminate, depending on the methods and results. We included 85 (versions of) questionnaires. Overall, the quality of the studies assessing measurement properties of PA questionnaires was rather poor. Information on content validity was mostly lacking. Construct validity was assessed in 76 of the questionnaires, mostly by correlations with accelerometer data, maximal oxygen uptake or activity diaries. Fifty-one questionnaires were tested for reliability. Only a few questionnaires had sufficient construct validity and reliability, but these need to be further validated. Responsiveness was studied for only two questionnaires and was poor. There is a clear lack of standardization of PA questionnaires, resulting in many variations of questionnaires. No questionnaire or type of questionnaire for assessing PA was superior and therefore could not be strongly recommended above others. In the future, more attention should be paid to the methodology of studies assessing measurement properties of PA questionnaires and the quality of reporting.
Adequately measuring physical activity (PA) is important for determining trends in PA levels over time, for evaluating the effect of PA interventions and for determining the health benefits of PA. Poor measurement of PA may hinder detection of important associations or effects.[1] Many questionnaires have been developed to measure PA. Some questionnaires were developed specifically for a certain subgroup or setting, others because researchers were not aware of existing questionnaires or were not satisfied with the available ones. Often researchers needed to translate and/or adapt existing questionnaires to other target groups. This has led to a large number of (versions of) questionnaires, which makes it difficult to choose the most suitable instrument. Furthermore, the use of different instruments in different studies and surveys makes comparison of PA levels across countries or studies difficult.
To our knowledge, an overview of the measurement properties of PA questionnaires is lacking. A summary of these findings might be helpful for choosing the best questionnaire available for a specific purpose. Furthermore, a critical assessment of the methodological quality of the studies assessing the measurement properties of PA questionnaires is lacking, while the methodological quality of these studies might be variable. If the methodological quality of a study is poor, the results and conclusions can be seriously biased. For example, wrong conclusions can be drawn from a validation study if no adequate comparison instrument was used. It is therefore important to assess the methodological quality of a study to be confident that its design, conduct, analysis and interpretation are adequate, and to identify possible bias that might have influenced the results.
In this article, we aim to evaluate and compare the measurement properties of all available self-administered questionnaires measuring PA in adults, using a systematic approach for the literature search, data extraction and assessment of the quality of the studies. This article is one of a series of four articles on measurement properties of PA questionnaires published in Sports Medicine.
1. Methods
1.1 Literature Search
Literature searches were performed in PubMed, EMBASE using ‘EMBASE only’, and in SportDiscus (complete databases until May 2009) on the topic of self-report questionnaires of PA. Additional papers were identified by manually searching references of the retrieved papers and the authors’ own literature databases. The full search strategy in PubMed was as follows: (exercise[MeSH] OR ‘physical activity’[tiab] OR motor activity[MeSH]) AND (questionnaire[MeSH] OR questionnaire*[tiab]), and limited to humans. In EMBASE and SportDiscus, ‘physical activity’ and ‘questionnaire’ were used as free text words and in EMBASE this was complemented with the EMTREE term ‘exercise’.
1.2 Eligibility Criteria
We used the following inclusion criteria:
1. The aim of the study should be to develop or evaluate the measurement properties – i.e. content validity, construct validity, reliability or responsiveness – of a self-report questionnaire.
2. The aim of the questionnaire should be to measure PA, which was defined as any bodily movement produced by skeletal muscles that results in energy expenditure above resting level.[2] PA in daily life can be categorized into occupational, sports, conditioning, household or other activities. Questionnaires were included regardless of the time frame; thus, questionnaires measuring lifetime PA or historical activity were also included.
3. The questionnaire could be used to measure PA in adults in the general population, and was not developed or evaluated in a specific population, such as patients or pregnant or obese participants.
4. The study sample should have a mean age between 18 and 55 years.
5. The article should have been published in the English language.
6. Information on (at least one of) the measurement properties of the self-report questionnaire should be provided.
We included information on measurement properties only if it was intentionally collected or calculated to assess the measurement properties of the particular self-report questionnaire. If, for example, correlations between a self-report questionnaire and an accelerometer were presented to assess the validity of the accelerometer (while the self-report questionnaire was used as a gold standard) or if correlations between different PA questionnaires were calculated without one questionnaire considered as the standard, these data were not included in this review.
We excluded PA interviews or diaries. We also excluded studies that evaluated the measurement properties of a self-report questionnaire administered in an interview form. Finally, questionnaires measuring physical functioning (e.g. the degree to which one is limited in carrying out activities) and questionnaires asking about sweating in a single question were excluded.
1.3 Selection of Papers
Abstract selection, selection of full-text articles, data extraction and quality assessment were performed by two independent reviewers. Disagreements were discussed and resolved. We retrieved the full-text paper of all abstracts that fulfilled the inclusion criteria and of abstracts that did not contain measurement properties, but in which indications were found that these properties were presented in the full-text paper.
1.4 Data Extraction
We extracted a description of the self-report questionnaires from the included papers, using a standardized data extraction form. Data extracted included (i) the target population for which the questionnaire was developed; (ii) the dimension(s) of PA that the questionnaire intends to measure (e.g. habitual PA); (iii) the parameters of PA that the questionnaire is measuring (i.e. frequency, duration and intensity or activities); (iv) the setting in which PA is being measured (i.e. sport, recreational, transport, occupational/school activities, household activities [including gardening], other); (v) the number of questions; (vi) the recall period that the questions refer to; and (vii) the type and number of scores that were calculated (e.g. total energy expenditure or minutes of activity per day).
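The seven extracted items can be pictured as one record per questionnaire. The sketch below is only an illustration of such a record, not the extraction form used in this review: the class name, field names and the `unreported` helper are hypothetical, and the example values are taken from the first row of table III, with the target population left empty because that table does not list it.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class QuestionnaireDescription:
    """Hypothetical record mirroring extraction items (i) to (vii)."""
    name: str
    target_population: Optional[str]  # (i) population the questionnaire was developed for
    dimensions: List[str]             # (ii) dimension(s) of PA, e.g. habitual PA
    parameters: List[str]             # (iii) parameters of PA, as abbreviated in table III
    settings: List[str]               # (iv) settings in which PA is measured
    n_questions: Optional[int]        # (v) number of questions
    recall_period: Optional[str]      # (vi) recall period of the questions
    scores: List[str]                 # (vii) type of scores calculated

    def unreported(self) -> List[str]:
        """Names of items that are missing (None) or empty for this questionnaire."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, [])]


# Example values copied from the first row of table III.
record = QuestionnaireDescription(
    name="Modified Active Australia Survey",
    target_population=None,
    dimensions=["PA"],
    parameters=["F", "D"],
    settings=["leisure", "walking"],
    n_questions=24,
    recall_period="past wk",
    scores=["TEE (MET min/wk)"],
)
```

Keeping missing items explicit, rather than silently dropping them, corresponds to the ‘?’ entries that appear in table III when a paper did not report an attribute.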
1.5 Quality Assessment of the Studies on Measurement Properties
To assess the methodological quality and results of the studies on measurement properties, we used the QAPAQ checklist (see table I for acronym definitions). We developed this checklist specifically for PA questionnaires, based on two recently developed checklists to evaluate the measurement properties of patient-reported outcomes (COSMIN)[8] and self-report health status questionnaires.[33] The QAPAQ is described elsewhere.[29] We extracted and rated the methods and results of all evaluated measurement properties (see sections 1.7–1.9).
1.6 Content Validity
No criterion exists to rate whether the content of a questionnaire is relevant and comprehensive for measuring PA. Therefore, we formed our own opinion on content validity. Questionnaires should measure at least duration and frequency of PA, and if the intention was to measure total PA, the questionnaire should cover activities in all settings (work, home, transport, recreation, sport).
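The two content requirements stated above can be written as a simple check. This is a minimal sketch under our own naming (`content_validity_gaps`, `REQUIRED_PARAMETERS` and `ALL_SETTINGS` are hypothetical); it only encodes the rule as worded in this section and is not part of the QAPAQ checklist itself.

```python
# Parameters every PA questionnaire should measure, per the rule above.
REQUIRED_PARAMETERS = {"frequency", "duration"}
# Settings a questionnaire must cover if it intends to measure total PA.
ALL_SETTINGS = {"work", "home", "transport", "recreation", "sport"}


def content_validity_gaps(parameters, settings, measures_total_pa):
    """List what a questionnaire lacks with respect to the content rule."""
    gaps = [
        f"missing parameter: {p}"
        for p in sorted(REQUIRED_PARAMETERS - set(parameters))
    ]
    # The setting requirement applies only when total PA is the intended construct.
    if measures_total_pa:
        gaps += [
            f"missing setting: {s}"
            for s in sorted(ALL_SETTINGS - set(settings))
        ]
    return gaps
```

An empty list means that no gap was found under this rule; it does not by itself establish that the content of a questionnaire is relevant and comprehensive.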
Table I. Explanation of acronyms or abbreviated names of questionnaires
Abbreviation | Full name of questionnaire
ARIC/Baecke[3,4] | Atherosclerosis Risk in Communities (ARIC)/Baecke Questionnaire
CARDIA Q[3,5] | Coronary Artery Risk Development in Young Adults Questionnaire
CHAMPS[6] | Cardiovascular Health after Maternal Placental Syndromes
CMH[7] | California Men’s Health Study
COSMIN[8] | COnsensus-based Standards for the selection of health Measurement INstruments
EPAQ2[9] | EPIC-Norfolk Physical Activity Questionnaire
EPIC original Q[10] | European Prospective Investigation into Cancer and Nutrition original Questionnaire
HLAQ[11] | Historical Leisure Activity Questionnaire
HUNT 1 and 2[12,13] | The Nord-Trøndelag Health Study 1 and 2
IPAQ[14] | International Physical Activity Questionnaire
JACC Q[15] | Japan Collaborative Cohort Study for Evaluation of Cancer Risk Questionnaire
LACE PA Q[7] | Life After Cancer Epidemiology Study Physical Activity Questionnaire
Minnesota LTPA Q[16] | Minnesota Leisure Time Physical Activity Questionnaire
MOSPA[17] | Monica Optional Study of Physical Activity
NASA Q[18] | National Aeronautics and Space Administration Questionnaire
NHS II Activity Q[19] | Nurses’ Health Study II Activity Questionnaire
NPAQ[20] | Neighbourhood Physical Activity Questionnaire
NZPAQ-SF[21] | New Zealand Physical Activity Questionnaire – Short Form
PAFQ[22] | Physical Activity Frequency Questionnaire
PAQ-AD[23] | Physical Activity Questionnaire – Adults
PAS[24,25] | Physical Activity Survey
PYTPAQ[26,27] | Past Year Total Physical Activity Questionnaire
QAPSE[28] | Questionnaire d’Activité Physique Saint-Etienne
QAPAQ[29] | Quality Assessment of Physical Activity Questionnaire Checklist
RWJ[30] | Historical Walking, Running and Jogging Questionnaire
SDR[31] | 7-day recall
SQUASH[32] | Short questionnaire to assess health-enhancing physical activity
TOQ[31] | Tecumseh Occupational Questionnaire
YPAS[6] | Yale Physical Activity Survey
1.7 Construct Validity
The more similar the constructs that are being compared, the more evidence is provided for validity. Comparison with objective measures of PA (doubly labelled water, accelerometers, pedometers) was considered the best level of evidence (Level 1 or 2, depending on the use of the objective data). We considered constructs not really measuring current PA (maximal oxygen uptake [V̇O2max], body mass index [BMI], etc.) or another questionnaire, a diary or interview as less adequate comparison measures (Level 3). Depending on the strength of the hypothesized association with the comparison measure, different correlations were considered to be adequate (table II). A positive score was given if the study population consisted of ≥50 participants and the correlation was above the specified cut-off point. If the correlation was below the specified cut-off point, a negative score was given. If the sample size was <50 participants, the score was indeterminate (?).
Table II. Cut-off points for sufficient correlations per dimension of physical activity (PA) measured by the questionnaire, and level of evidence
Dimension of PA measured | Level 1 | Level 2 | Level 3
Total energy expenditure | Doubly labelled water ≥0.70 | Accelerometer total counts ≥0.50 | V̇O2max ≥0.40; diary, other questionnaire, interview ≥0.70; caloric intake, BMI, % BF ≥0.50
Vigorous activity | Accelerometer vigorous counts ≥0.50 | Accelerometer total counts ≥0.40 | V̇O2max ≥0.60; diary, other questionnaire, interview ≥0.70; caloric intake, BMI, % BF ≥0.50
Moderate plus vigorous activity | Accelerometer moderate and vigorous counts ≥0.50 | Accelerometer total counts ≥0.40 | V̇O2max ≥0.50; diary, other questionnaire, interview ≥0.70; caloric intake, BMI, % BF ≥0.50
Moderate activity | Accelerometer moderate counts ≥0.50 | Accelerometer total counts ≥0.40 | V̇O2max ≥0.40; diary, other questionnaire, interview ≥0.70; caloric intake, BMI, % BF ≥0.50
Walking | Pedometer or accelerometer walking counts ≥0.70 | – | V̇O2max ≥0.40; diary, other questionnaire, interview ≥0.70; caloric intake, BMI, % BF ≥0.50
Leisure time PA | Accelerometer total counts in leisure time ≥0.50 | Accelerometer total counts ≥0.40 | Diary, other questionnaire, interview ≥0.70; caloric intake, BMI, % BF ≥0.50
Occupational PA | Direct observational method ≥0.60 | Accelerometer during working hours ≥0.40 | Diary, other questionnaire, interview ≥0.70; caloric intake, BMI, % BF ≥0.50
BF = body fat; BMI = body mass index; V̇O2max = maximal oxygen uptake.
1.8 Reliability
The time interval between the test and retest must have been described and must have been short enough to ensure that subjects had not changed their PA levels, but long enough to prevent recall. The optimal time interval depends on the construct to be measured and the recall period of the questionnaire. For measuring PA during the past or usual week or in the past year, a time interval of 1 day to 3 months was considered appropriate. For measuring lifetime PA, a time interval from 1 day to 12 months was considered appropriate. For reliability, three levels of evidence were formulated:
Level 1: an adequate time interval between test and retest and an intraclass correlation coefficient (ICC), Kappa or Concordance.
Level 2: an inadequate time interval between test and retest and an ICC, Kappa or Concordance; or an adequate time interval between test and retest and a Pearson/Spearman correlation.
Level 3: an inadequate time interval between test and retest and a Pearson/Spearman correlation.
An ICC >0.70 was considered acceptable.[34] The use of Pearson or Spearman correlation coefficients was considered inadequate, because these coefficients neglect systematic errors.[35] However, Pearson/Spearman correlations >0.80 would probably result in ICCs >0.70 and were therefore also rated positively, but on a second level of evidence. Pearson or Spearman correlations <0.80 were rated negatively. A positive score was given if the study population consisted of ≥50 participants and the ICC, Kappa, Concordance or Pearson/Spearman correlation was above the specified cut-off point. If the correlation was below the specified cut-off point, a negative score was given. If the sample size was <50 participants, the score was rated as indeterminate (?).
1.9 Responsiveness
Responsiveness is the ability of an instrument to detect change over time in the construct to be measured.[36] It should be considered an aspect of validity in a longitudinal setting. Responsiveness was assessed by comparing changes in the PA questionnaire with changes in other instruments that measure closely related constructs. The same approach as for assessing validity was applied, except that change scores were being compared instead of absolute scores. Depending on the strength of the hypothesized association, different correlations were considered to be adequate.
Fig. 1. Flowchart of literature search and paper selection. Hits: PubMed 9733, EMBASE 7601, SportDiscus 4284 (total 21 891). Selection based on titles and abstracts: 284; not in PubMed: 55; not in PubMed and EMBASE: 54 (total 393[1]). Children 83, adults 260, elderly 59. Of the papers on adults, 166 were excluded and 94 papers on 85 questionnaires were included. [1] One paper appears in both the review for adults and for elderly.
2. Results
The search resulted in 21 891 hits, of which 260 abstracts were selected. Of the full-text articles with relevant titles and/or abstracts, 166 were excluded. Most of the papers were excluded because the questionnaire was administered in an interview or because no measurement properties of the questionnaire were assessed. Finally, 94 papers on 85 (versions of) questionnaires were included in the review (figure 1). Descriptive information on the questionnaires included in the review is provided in table III.
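The rating rules of sections 1.7 and 1.8, on which the ratings in the results are based, can be sketched in a few lines. This is an illustration only and not the scoring procedure as actually carried out: all names are hypothetical, the cut-off table holds only a few Level 1 and Level 2 entries from table II, and three points are our assumptions, namely that sample size is checked before the cut-off, that a correlation equal to a table II cut-off counts as sufficient, and that the ICC cut-off of 0.70 also applies to Kappa and Concordance.

```python
from enum import Enum

MIN_SAMPLE_SIZE = 50  # smaller studies are rated indeterminate


class Rating(Enum):
    POSITIVE = "+"
    NEGATIVE = "-"
    INDETERMINATE = "?"


# Illustrative subset of the Level 1 and Level 2 cells of table II:
# (dimension of PA, comparison measure) -> (level of evidence, cut-off).
CONSTRUCT_CUTOFFS = {
    ("total energy expenditure", "doubly labelled water"): (1, 0.70),
    ("total energy expenditure", "accelerometer total counts"): (2, 0.50),
    ("vigorous activity", "accelerometer vigorous counts"): (1, 0.50),
    ("vigorous activity", "accelerometer total counts"): (2, 0.40),
    ("occupational PA", "direct observational method"): (1, 0.60),
}

# Statistics that take systematic differences between test and retest into account.
AGREEMENT_STATISTICS = {"icc", "kappa", "concordance"}


def rate_construct_validity(dimension, comparison, n, correlation):
    """Return (level of evidence, rating) for one validation result."""
    level, cutoff = CONSTRUCT_CUTOFFS[(dimension, comparison)]
    if n < MIN_SAMPLE_SIZE:  # assumption: sample size is checked first
        return level, Rating.INDETERMINATE
    rating = Rating.POSITIVE if correlation >= cutoff else Rating.NEGATIVE
    return level, rating


def rate_reliability(statistic, value, n, interval_adequate):
    """Return (level of evidence, rating) for one test-retest result."""
    agreement = statistic in AGREEMENT_STATISTICS
    if agreement and interval_adequate:
        level = 1
    elif agreement or interval_adequate:
        level = 2
    else:
        level = 3
    # ICC > 0.70 is acceptable; Pearson/Spearman must exceed 0.80.
    cutoff = 0.70 if agreement else 0.80
    if n < MIN_SAMPLE_SIZE:  # assumption: sample size is checked first
        return level, Rating.INDETERMINATE
    rating = Rating.POSITIVE if value > cutoff else Rating.NEGATIVE
    return level, rating
```

Returning the level of evidence separately from the rating keeps the two judgements apart: a Pearson correlation above 0.80 is rated positively, but never at Level 1.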
2.1 Quality of the Studies
Construct validity was assessed for 77 questionnaires in 85 studies. Of these 77 questionnaires, 16 were validated at Level 1 and an additional 22 questionnaires at Level 2. Objective comparison measures were often V̇O2max (n = 40), accelerometers (n = 41), heart rate monitor (n = 5), doubly labelled water (n = 7) or pedometer (n = 6) [table IV]. Two of the three questionnaires specifically designed to measure walking were validated against pedometers (Level 1). Surprisingly, appropriate cut-off points for analysing accelerometer data were often not used when assessing time spent in moderate to vigorous PA; instead, total counts were used, which do not discriminate between light, moderate and vigorous PA. Reliability was assessed for 51 (versions of) questionnaires in 49 studies. Only 15 questionnaires were reliability-tested at Level 1 and an additional 36 questionnaires at Level 2 (table V). The most frequently occurring methodological
Table III. Description of physical activity (PA) questionnaires (Q)
Questionnaire | Construct: dimension | Construct: setting | Format: recall period | Format: no. of questions/activities | Format: parameters | Format: scores | Format: unit of measurement
Modified Active Australia Survey[37] | PA | Leisure, walking | Past wk | 24 | F, D | TEE | MET min/wk
Activity History Q[38] | Physical training | ? | Past y | ? | F, D | TEE; Vigorous EE | kcal/kg/wk; kcal/kg/wk
Aires[39] | LTPA | LTPA | Past 12 mo | 1 | I | Total leisure | Activity score (1–4)
Arizona Activity Frequency Q[40] | TEE | Sport, recr, occup, home, sleeping, personal care | Past 28 d | 68 | F, D | TEE; daily PA EE | kJ/day
Baecke[18,41,42] | Habitual PA | Sport, recr, occup, sleeping | Not defined | 16 | F, D | Work; sport; leisure | Activity score (1–5)
Modified Baecke 1[43] | ? | ? | Past y | 19 | F, D | Work; sport; leisure; total | Activity score (1–5, total 3–15)
Modified Baecke (ARIC/Baecke)[3,4] | LTPA | Sport, recr, trans, occup, watching TV | ? | 15 | F, D | Sport- and exercise-related leisure index; non-sport- and exercise-related leisure index; total leisure activity | Activity score (1–5)
Modified Baecke 2[44] | ? | Sport, recr, trans, watching TV, sweating | ? | 5 | F, D | Sport activity index; Leisure activity index | Activity score (?); Activity score (1–5)
Extended Baecke (QAPSE)[28] | DEE | Sport, recr, trans, occup, home, sleeping, eating, washing | Usual wk | 35 | F, D | TEE | MET/day
Bharathi Q[45] | ? | Sport, recr, trans, occup, home, sleeping, sedentary activities | Past mo | 13 | F, D | TEE; PAL | kJ/day; 24h EE/BMR
Black Women’s Health Study[46] | ? | Sport, recr, home, walking | Previous y | ? | D | Weekly PA EE | MET h/wk
CARDIA[3,5] | ? | ? | Past y | 3 | F | Total; moderate; heavy | Weighted F
Modified CHAMPS[6] | ? | ? | Past 2 wk | 31 | F, D | TEE; moderate/vigorous; vigorous; sports; | kcal/kg/wk
CMH Q[7] | ? | Sport, recr, occup, sedentary act | Past 3 mo | 24 | F, D, I | Total; moderate; vigorous | MET h/wk
Table III. Contd
Questionnaire | Construct | Setting | Recall period | No. of questions/activities | Dimension | Parameters | Unit of measurement
EPIC original Q[10] | Daily EE | Sport, recr, trans, occup, home, rest | Past y | 28 | F, D, I | Total; occup; leisure; rest | kJ/24 h
Modified EPIC Q (short PA Index)[47] | ? | Sport, recr, trans, occup, home | Past y | 4 | D | PA index | Activity score (4 categories)
EPAQ2[9] | TEE | Sport, recr, trans, occup, home, sleeping | Past y | 85 | F, D | TV time; activity at home; activity at work; recreational activity; vigorous activity; PA index | h/wk; MET h/wk; MET h/wk; MET h/wk; h/wk; MET h/wk
Flemish PA computerized Q[48] | PA and sedentary behaviour | Sport, recr, trans, occup, home, sedentary behaviour, sleeping | Usual wk | 57–90 | D | 15 different activity scores | kcal/wk; h/wk
Framingham Q[42] | ? | ? | Usually | ? | ? | TEE | kcal/day
Single PA Q Gionet and Godin[49] | LTPA | LTPA | Past 6 mo | 1 | F | Total | Activity score (1–6)
SDR Q Gionet and Godin[49] (based on Godin Q[50]) | LTPA | Sport, recr | Past 7 d | 29 | F | Total; strenuous; moderate; mild | MET/wk
Godin Q[3,18,50] | LTPA | Sport, recr | Usual wk | 4 | F | Total; strenuous; moderate; light | Times/wk
Harvard/College Alumnus Q[42,51-53] | LTPA | Sport, recr, trans, stair climbing | Past 7 d | 3 | F, D | Leisure EE; light; moderate; vigorous; TEE | MET min/wk, kcal/wk
Harvard/College Alumnus Q[3] | ? | Sport, recr, trans, stair climbing | Currently | 3 | F, D, I | TEE; sports | MET min/day
HUNT 1[12] | LTPA | Sport | Usually | 3 | F, D, I |  | 
HUNT 2[13] | LTPA in past y | Sport, recr, occup | Past y | 3 | D, I | Light PA; hard PA; work PA | Activity score (0–3); activity score (0–3); activity score (1–4)
IPAQ[14] | ? | Sport, recr, trans, occup, home, sitting | Past 7 d/usual wk | 9 (S7S, SUS); 31 (L7S, LUS) | F, D, I | TEE; meeting ACSM norm | MET min/wk; yes/no
Table III. Contd
Questionnaire | Construct | Setting | Recall period | No. of questions/activities | Dimension | Parameters | Unit of measurement
Adapted IPAQ[54,55] | ? | Sport, recr, trans, occup, home, sitting | Usual wk, summer and winter |  | F, D, I | TEE | MET min/wk
JACC Q[15] | ? | Sport, recr, trans | Usually usually past 2 y | 3 | F, D | PA time; walking time; PA F | Activity score (1–4)
Kaiser PA Survey[56] | ? | Sport, recr, trans, occup, home, watching TV | Past y | 75 | F, D | Caregiving; housework; housework/caregiving; sports/exercise; active living habits; occup; 3-point summary | Activity score (1–4 caregiving, 1–5 other)
Kuopio Q[57] | Habitual activity | Sport, recr, trans, occup | Currently | 39 | F, I | Total | F of conditioning exercise
LACE PA Q[7] | ? | Sport, recr, trans, occup, home | Past 12 mo | 56 | F, D, I | Domain and intensity specific summaries | MET h/wk
Life in NZ National Survey[58] | ? | Recr, occup, home | Past 4 wk | 100 | F, D, I | Activityhi; Activitylo | min/wk; min/wk; MET/wk
Lipid Research Clinics Q[3,42,59] | PA level relative to peers; regular engagement in strenuous activities | Unspecified | Currently | 4 | Comparative rating | Active/inactive; highly active/moderately active/low active/very low active | Activity score (1–2); activity score (1–4)
Löf Q[60] | ? | ? | Past 2 wk | 6 | ? | TEE | kcal/24 h
Leisure Time PA Q[61] | EE during LTPA | Sport, recr, trans, home | Past 3 mo; past y | 47 | D, I | LTPA | kcal/wk/kg
Mail Survey of PA habits[62] | Exercise habits and participation | Sport, recr, trans | Past 3 mo | 6 | F, D | TEE; RWJ index; sweat F | MET/wk; activity score; times/wk
Minnesota LTPA Q[16] | ? | ? | Past y | 63 | F, D | Leisure EE | MET/h
Minnesota LTPA Q[3,42,63] | ? | ? | Past y | 74 | D, I | Leisure EE | MET min/wk
59
Table III. Contd
Questionnaire | Construct | Setting | Recall period | No. of questions/activities | Dimension | Parameters | Unit of measurement
Modified Minnesota LTPA Q (Canada Fitness Survey)[64] | ? | Sport, recr, home | Weekly, past mo, past y | ? | F, D, I | Total; leisure; non-leisure | Time, kJ/kg/wk
Modified Minnesota LTPA Q (Year 11 Q)[65] | ? | Sport, recr, home | Past y, recalled 1–10 y later | 27 | F, D | Leisure EE; light EE; moderate EE; vigorous EE | kcal/wk
Modified Minnesota LTPA Q + TOQ Q + new household activity measure[66] | EE | Sport, recr, occup, home | Past y | 98 | F, D | Occup EE; leisure EE; household EE | MET/h
Modified Minnesota LTPA Q + TOQ Q + general Q + sleeping[67] | EE | Sport, recr, trans, occup, home, sleeping, watching TV, reading, parenting | Past 4 wk | 107 | F, D | TEE | MJ/day
MOSPA[17] | ? | Sport, recr, trans, occup, home | ? | ? | ? | TEE; work; transport; household; LTPA | kcal/day; min/wk; min/wk; min/wk; min/wk
Mundal Q[68] | Habitual LTPA | Sport, recr, home | ? |  | PA level | LTPA | Activity score (6 categories)
NASA Q[18] | ? | ? | ? | ? | ? | ? | 
NHS II Activity Q[19] | ? | Sport, recr, trans, home, sedentary activities | Past y | 14 | F, D, I | Activity score; inactivity score | MET h/wk
Modified NHS II Activity Q[69] | ? | Sport, recr, trans, home, sedentary activities, stairs climbed | Past y | 15 | D | Vigorous activity; non-vigorous activity; sum of activities; inactivity at home; inactivity at work; overall inactivity | MET/wk
Norman Q[70] | Total PA | Sport, occup, home, walking/cycling, watching TV/reading, sleeping | Past y | 6 | D, I | Crude total PA; total PA | MET
1 ?
Table III. Contd
Questionnaire | Construct | Setting | Recall period | No. of questions/activities | Dimension | Parameters | Unit of measurement
NZPAQ-SF[21] | ? | Sport, recr, trans, occup, home | Past 7 d | 7 and 1 optional | F, D, I | EE | MET min/wk
One-week recall Q[71] | Current PA guidelines | Sport, recr, trans, occup, home | Past wk | 6 | F, D | Walking; moderate; vigorous; total duration; meeting fitnorm[71] | times/wk; times/wk; times/wk; min/wk; yes/no
PAFQ[22] | Total and act-specific EE | Sport, recr, trans, occup, home, sleeping | Past 7 d | 71 | F, D | TEE | kcal/day
PA History Q[72] | Usual activity | Sport, recr, trans, occup, home | Past y | 13 | F, D | Moderate intensity; heavy intensity; total | Activity score
PAS[24,25] | Total PA in 24 h | Sport, recr, trans, occup, home, sleeping, sitting | Average 24 h wk day | 9 | D | TEE | 24 h MET time
PAQ-AD[23] | Moderate to vigorous PA | Sport, vigorous act | Past 7 d | 7 | F | Total PA | PA score (1–5)
PYTPAQ[26,27] | Total PA | Sport, recr, trans, occup, home | Past y | Open table format | F, D, I | Total PA; occup PA; household PA; recr PA | h/wk; MET h/wk
Pennsylvania Alumni[42] | ? | Recr, occup, ? | Past 7 d; usually; past y | ? | ? | TEE | kcal/day
Scottish PA Q[73,74] | PA of at least moderate intensity | ? | Past 7 d | ? | ? | Total; leisure; occup | Min/wk
Modified Scottish PA Q for students[75] | PA of at least moderate intensity | ? | Past 7 d | ? | ? | Total; leisure; occup | Min/wk
Saltin and Grimby Q[76] | Lifetime PA | ? | ? | ? | ? | Lifetime occup PA; lifetime LTPA | Activity score (1–4)
Singh Q[77,78] | ? | Sport, recr, trans, occup, home, sleeping | Past 3 mo | 26 | F, D | PA index; RWJ index; total activity index; vigorous activity; sport/recr index | MET min/wk
Single Q[79] | PA for maintaining or improving physical fitness | PA to improve fitness | Currently | 1 | Yes/no | Meeting fitnorm[71] | Yes/no
Table III. Contd
Questionnaire | Construct | Setting | Recall period | No. of questions/activities | Dimension | Parameters | Unit of measurement
Stanford SDR[3,38] | ? | Moderate, vigorous activities | Past 7 d |  | D | Moderate; vigorous | h/wk
Modified Stanford SDR Auckland Heart Study PA Q[80] | ? | Moderate, vigorous, activities, resting, sleeping | Past 3 mo | ? | F, D | Moderate; vigorous; TEE | kcal/day
Modified Stanford SDR[72] | ? | Moderate, hard, very hard activities, sleeping | Past 7 d | ? | D | TEE; occup EE; leisure EE | kcal/kg/day
Stanford Usual Act Q[3] | ? | Moderate act, vigorous act | Past 3 mo | 11 |  | Moderate; vigorous | Activity score (1–6); activity score (1–5)
Suzuki Q[81] | Energy expenditure | Sport, recr, trans, occup, home, sleeping, sitting | Past 7 d | 9 | F | TEE; TEE | kcal/day; kcal/wk
SQUASH[32] | Habitual PA | Sport, recr, trans, occup, home | Usual wk | 11 | F, D, I | Total; commuting; activities at work; household; leisure time | min/wk; activity score
Total PA[82] | TEE | Sport, recr, trans, occup, home, sitting, sleeping | Usual day | 9 | D, I | TEE | kcal/day
Usual PA measure[83] | Usual PA |  | Usually | 1 | PA level | Total | Activity score (1–5)
YPAS[6] | Current PA | Vigorous, trans, standing, sitting | Typical wk last mo | 5 | F, D | Total | Activity score (0–98)
Historical RWJ[30] | Historical RWJ | Sport, recr, trans | Past 10 y | 3 | F, D, I | TEE; sufficiently active or not | MET h/wk; activity score (1–2)
NPAQ[20] | Walking, overall index of PA | Recr, trans | Usual wk | 11? | F, D, destination | Overall PA index; walking | MET min/wk; F of walking inside and outside neighbourhood; duration of walking inside and outside neighbourhood
2
Table III. Contd
Questionnaire | Construct | Setting | Recall period | No. of questions/activities | Dimension | Parameters | Unit of measurement
Walking activities
Walking Q[84] | Walking | Walking time | Usually | 1 | D | Walking | Activity score (1–3)
Walking Q (one question from College Alumni Q)[85] | Walking | Walking distance | ? | 1 | F | Walking | km
Historical/lifetime PA
Modified HLAQ[11] | Lifetime PA | Sport, recr, occup, home, childcare | Past y, 14–21 y, 22–34 y, 35–50 y, 51–65 y | 32 | F, D | TEE | MET h/wk
Occupational PA
Modified Baecke ARIC/Baecke Work Index[31] | ? | Occup | ? | ? | F | Work index | Activity score (1–5)
CARDIA occup Q[31] | ? | Occup | Past y | 1 | F | Total occup | Activity score
Health Insurance Plan of NY Q[3,42] | ? | Trans, occup | Usually | 6 | F, D | Total occup | Activity score (1–28)
Lipid Research Clinics occup Q[31] | ? | Occup | ? | 1 | Comparative rating | Total occup | Activity score (1–5)
Minnesota Heart Health Program Q[3] | ? | Occup | Currently | 6 | ?, D, I | Work index; leisure index | MET min/day
Minnesota Heart Health Program occup Q[31] | ? | Occup | Usually | 2 | % vigorous act | Total occup | Activity score (1–4)
Modified Stanford SDR (SDR)[31] | ? | Occup | Past 7 days | 5 | D | Total occup score | Activity score/wk, h/wk, MET min/wk
TOQ[31] | Occup-related PA | Trans, occup | Past y | 29 | F, D | Total occup score | Activity score/wk, h/wk, MET min/wk
Bone loading PA
Bone Loading History Q[86] | Bone loading PA | Sport, recr, occup | Life time (4–45 y) | 36 | F, D | Total hip loading score; total spine loading score | Hip and spine bone loading score; hip and spine bone loading exposure
Historical Activity Q[87] | Lifetime PA related to bone | Sport, recr, occup, home | 5–11 y, 12–13 y, 14–17 y, >18 y | 89–140 activities listed for each time period | F, D, I | Total; occup; athletics; leisure; exercise; lifting/carrying; impact level | (MET) h/day; (MET) h/day; (MET) h/day; (MET) h/day; (MET) h/day; (MET) h/day; score (1–3)
London PA Q[88] | PA related to bone health | Sport, recr, trans, occup, home, standing, sitting | Currently | 5 | F, D | Total | h/wk
ACSM = meeting PA guidelines of the American College of Sports Medicine; Activityhi = activity of high intensity; Activitylo = activity of low intensity; BMR = basal metabolic rate; D = duration; EE = energy expenditure; F = frequency; home = home activities (household and gardening); I = intensity; L7S = long form, last 7 d; LTPA = leisure time physical activity; LUS = long form, usual wk; MET = metabolic equivalent; occup = occupational; PAL = PA level; Recr = recreational; RWJ = run-walk-jog; S7S = short form, last 7 d; SUS = short form, usual week; TEE = total energy expenditure; trans = transport; ? indicates not specified or unclear.
shortcoming was that Pearson correlations were calculated instead of ICCs or kappas. Another frequently occurring methodological shortcoming was an inadequate time interval between the test and retest. Responsiveness was assessed for only two (versions of) questionnaires, and the quality of these studies was rated as Level 3.
2.2 Qualitative Attributes of the Questionnaires
In the study by Altschuler et al.,[7] it was tested whether respondents interpreted the LACE PA questionnaire and the CMH questionnaire as intended. In cognitive interviews, respondents described their thought processes while completing these two questionnaires. The term ‘intensity’ was frequently interpreted as emotional or psychological intensity rather than physical effort. In addition, respondents often counted the same activity more than once, overestimated occupational PA and mistook a list of examples for a definitive list. We did not find studies in which the content validity of a PA questionnaire was assessed. However, we formed our own opinion on the content of the questionnaires. Of the 85 (versions of) questionnaires included in this review, 23 had sufficient content validity: i.e. they covered all relevant settings of PA (e.g. for total PA all five settings; and for occupational PA only transport and work) and measured duration and frequency (Bharathi Q,[45] EPIC original Questionnaire (Q),[10] EPAQ2,[9] Harvard/College Alumnus Q,[3,51] the long version of the IPAQ,[14] the adapted IPAQ,[54] Kaiser PA Survey,[56] LACE PA Q,[7] Minnesota LTPA Q,[61] Mail Survey of PA,[62] Norman Q,[70] NZPAQ-SF,[21] One-week recall Q,[71] PAFQ,[22] PA History Q,[72] PYTPAQ,[26] Singh Q,[77,78] SQUASH,[32] Historical RWJ questionnaire,[30] NPAQ,[20] Health Insurance Plan of NY,[3] TOQ[31,89] and London PA Q[88]).
2.3 Validation Results
Only the 48 studies that assessed construct validity at Level 1 or 2 are discussed below.
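As a brief illustration of the reliability shortcoming noted in section 2.1 (Pearson correlations reported where an ICC would be appropriate), the following minimal Python sketch compares the two statistics on invented test-retest data. The scores, the function names and the choice of a two-way, absolute-agreement, single-measurement ICC are assumptions made for this example only; they are not taken from any of the reviewed studies.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: sensitive to linear association only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def icc_agreement(test, retest):
    """Two-way, absolute-agreement, single-measurement ICC for k = 2 occasions."""
    n, k = len(test), 2
    grand = (sum(test) + sum(retest)) / (n * k)
    subj_means = [(a + b) / k for a, b in zip(test, retest)]
    occ_means = [sum(test) / n, sum(retest) / n]
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_occ = n * sum((m - grand) ** 2 for m in occ_means)
    ss_total = sum((v - grand) ** 2 for v in list(test) + list(retest))
    ss_err = ss_total - ss_subj - ss_occ
    ms_subj = ss_subj / (n - 1)
    ms_occ = ss_occ / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_subj - ms_err) / (
        ms_subj + (k - 1) * ms_err + k * (ms_occ - ms_err) / n
    )

# Hypothetical questionnaire scores (e.g. MET h/wk): every respondent reports
# 15 units more at retest, i.e. a purely systematic difference.
test_scores = [10, 20, 30, 40, 50]
retest_scores = [25, 35, 45, 55, 65]

r = pearson_r(test_scores, retest_scores)
icc = icc_agreement(test_scores, retest_scores)
print(f"Pearson r = {r:.2f}, ICC (agreement) = {icc:.2f}")
```

In this invented example the Pearson correlation is 1.00 although no respondent reproduces the first score, whereas the agreement ICC is about 0.69 because it also counts the systematic shift between occasions as measurement error.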
Study population (n; mean age; nationality)
Comparison measure
Results
Levels of evidence
Modified Active Australia Survey
44~; 55 y; AUS[37]
Accelerometer
Walk mod r = 0.39 Vig r = 0.54 Total activity (‡3 MET) r = 0.52
1? 1? 1?
159~; 55 y; AUS
Pedometer
Walk mod r = 0.40 Vig r = 0.55 Total r = 0.48
33+ 3-
Activity History Q
24#; 18-31 y; US[38]
. VO2max
TEE r = 0.76 Heavy PA r = 0.64
3? 3?
Aires
160 105# 172 032~; 40-42 y; NOR[39]
BMI Cholesterol
Lower BMI with higher levels of total leisure Lower cholesterol with higher levels of total leisure
3? 3?
Arizona Activity Frequency Q
35~; 44 y; US[40]
Doubly labelled water
TEE r = 0.58
1?
Baecke
64-73; 37 y; US[3]
Accelerometer . VO2max 4 wk history % BF
r = 0.19 r = 0.54 r = 0.37 r = -0.49
2333-
7#; 30 y: 26~; 28 y; US[18]
Accelerometer
Total r = 0.40
2?
21#; 36 y; US[42]
Resting EE Caloric intake
Total r = 0.21 Total r = 0.38
3? 3?
139# 167~; 20-32 y; NL[41]
Lean body mass
Work b = 1.36#, b = 0.48~ Sport b = 1.23#, b = 0.23~ Leisure b = 0.15#, b = -0.27~
333-
Modified Baecke (ARIC/Baecke)
28# 49~; 37 y; US[4]
Accelerometer . VO2max % BF 48 h activity diary
Total leisure activity r = 0.24#, r = 0.19~ Total leisure activity r = 0.57#, r = 0.46~ Total leisure activity r = -0.30#, r = -0.51~ Total leisure activity r = 0.59#, r = 0.33~
2? 3? 3? 3?
Modified Baecke 1
60#; 20-60 y: 54~; 20-70 y; NL[43]
3-
195#; 41 y; FR[44]
3 d activity diary . VO2max % BF Quetelet index
Total r = 0.66#, r = 0.42~
Modified Baecke 2
SAI r = 0.31, LAI r = 0.09 SAI r = -0.20, LAI r = -0.14 SAI r = 0.03, LAI r = -0.21
333-
Extended Baecke (QAPSE)
20#; 56-72 y; FR[28]
Caloric intake
DEE r = 0.58
3?
Bharathi Q
14#; 34~; 18-60 y; IN[45]
Energy intake Age (young [n = 57] vs elderly [n = 49])
TEE r = 0.33 PAL young 1.52, elderly 1.22, p < 0.01
3? 3?
Questionnaire
Table IV. Construct validity of physical activity (PA) questionnaires (Q)
Black Women’s Health Study CARDIA
101~; 48 y; US[46]
Total r = 0.28, vig r = 0.40, mod r = -0.04 Total r = -0.32, vig r = 0.41, mod r = 0.26 Mod r = 0.11, vig r = 0.31 Mod r = 0.08, vig r = 0.63 Mod r = 0.08, vig r = 0.54 Mod r = -0.09, vig r = -0.35
232-23-3+ 3-33-3-
Modified CHAMPS
109~ 29#; 41 y; US[6]
Accelerometer 7 d PA diary Accelerometer . VO2max 4 wk history % BF . VO2max
TEE r = 0.42, mod/vig, r = 0.43 vig ? sports r = 0.50 # TEE r = 0.07, mod/vig r = 0.05, vig r = -0.01 sports r = 0.19 ~
3? 3? 33-
TEE r = 0.15, mod/vig r = 0.19, vig r = ?, sports r = 0.07 # TEE r = -0.01, mod/vig r = 0.02, vig r = -0.03, sports r = -0.01 ~ Total r = 0.66#, r = 0.43~ PA index p = 0.003 PA index p = 0.01 PA index p < 0.05
33333333-
PA index r = 0.28, work r = 0.17 PA index r = 0.15, work r = 0.01
3-33-3-
PAL r = 0.56#, r = 0.44~
2?
TEE r = 0.24 TEE r = 0.43 Total r = 0.13; strenuous r = 0.21 Total r = 0.24; strenuous r = 0.38
3? 3? 33-3-
64-73; 37 y; US[3]
BMI
EPIC original Q Modified EPIC Q (short PA Index)
59#; 41 y: 52~; 49 y; NL[10] 84#; 59 y: 89~; 55 y; UK[47]
EPAQ2
84#; 59 y: 89~; 55 y; UK[9]
Flemish PA computerized Questionnaire Framingham Q
31#; 39 y: 35~; 42 y; BE[48]
Godin Q
163#; 31 y: 143~; 30 y; CA[50]
21#; 36 y; US[42]
Activity diary HR-EE . VO2max 7-d food diaries HR-EE . VO2max Accelerometer + 7 d activity record Resting EE Caloric intake % BF . VO2max Accelerometer
Total r = 0.45
2?
64-73; 37 y; US[3]
Accelerometer . VO2max 4 wk history % BF . VO2max BMI Musculoskeletal endurance . VO2max
Total r = 0.32 Total r = 0.56 Total r = 0.36 Total r = -0.43
23+ 33-
Total (pattern) r = 0.22#, r = 0.40~ Total (pattern) r = -0.10#, r = -0.05~ Total (pattern) r = 0.25#, r = 0.32~
333-
TEE r = 0.11#, r = 0.05~ strenuous r = 0.25#, r = 0.28~
33-
TEE r = 0.01#, r = 0.02~ strenuous r = -0.04#, r = -0.04~
33-
Single PA Q Gionet and Godin
456#; 36 y: 95~; 33 y; CA[49]
SDR Q Gionet and Godin
456#; 36 y: 95~; 33 y; CA[49]
BMI
7#; 30 y: 26~; 28 y; US[18]
Questionnaire
Harvard/College Alumnus Q
Study population (n; mean age; nationality)
Results
Levels of evidence
Musculoskeletal endurance
333? 3?
21#; 36 y; US[42]
Resting EE Caloric intake
TEE r = 0.18#, r = 0.10~ strenuous r = 0.36#, r = 0.36~ TEE r = 0.32 TEE r = 0.49
64-73; 37 y; US[3]
Accelerometer . VO2max 4 wk history % BF
TEE r = 0.30 TEE r = 0.52 TEE r = 0.31 TEE r = -0.30
23+ 33-
12#; 31 y: 13~; 30 y; US[52]
HR monitor combined with two accelerometers
Total r = 0.35 Light r = 0.20 Mod r = 0.27 Vig r = 0.47
2? 2? 2? 1?
28# 50~; 38 y; US[51]
Accelerometer . VO2max 3 · 48 h activity diaries % BF . VO2max
Leisure EE r = 0.19#, r = 0.19~ Leisure EE r = 0.58#, r = 0.53~ Leisure EE r = 0.60-0.65#, r = 0.34-0.54~ Leisure EE r = -0.36#, r = -0.36~
2?23?3+ 3?33?3-
City blocks r = -0.06, stairs r = 0.11 walking min/day r = 0.32#, r = 0.02~
33-
BMI
City blocks r = 0.14, stairs r = -0.02 walking min/day r = -0.21#, r = 0.12~
33-
. VO2max Sweat Q
Leisure EE r = 0.29 Leisure EE r = 0.57
33-
21#; 36 y; US[42]
Resting EE Caloric intake
Total r = 0.05 Total r = 0.19
3? 3?
105#; 40 y: 87~; 38 y; US[90]
BMI
3? 3?
S7S: 108#; 32 y; NOR[12]
Accelerometer (PAL)
PA index: no significant regression coefficient; total wkly activity: significant regression coefficient Frequency r = 0.03 Intensity r = 0.06 Duration r = 0.12 Index r = 0.07
2222-
Frequency r = 0.43 Intensity r = 0.40 Duration r = 0.31 Index r = 0.48
3+ 3+ 33+
Light r = -0.10, hard r = 0.31, work r = 0.39 Light r = -0.03, hard r = 0.46, work r = -0.06 Light r = 0.19, hard r = 0.48, work r = 0.34
133-
138; 41 y; US[6]
36#; 41 y: 32~; 42 y; US[53]
HUNT 1
Comparison measure
. VO2max
108#; 32 y; NOR[13]
Accelerometer . VO2max IPAQ
HUNT 2
IPAQ
S7S: 26-151; 18-65 y[14] SUS: 26-127; 18-65 y[14] L7S: 26-151; 18-65 y[14] LUS: 26-127; 18-65 y[14]
Accelerometer
S7S total r = 0.02-0.47, ACSM r = 0.46-0.93 SUS total r = -0.12-0.32, ACSM r = 0.50-0.75 L7S total r = 0.05-0.52, ASCM r = 0.31-1.0 LUS total r = -0.02-0.36, ASCM r = 0.35-0.72
2-1+ 2-1+ 2-1+ 2-1+
Tot METs: some dose-response relation with . . VO2max; Vig: dose-response relation with VO2max
3?
Total r = 0.55, vig r = 0.63, mod r = 0.12 Total r = 0.21, vig r = 0.14, mod r = 0.21 Total r = 0.25, vig r = 0.27, mod r = 0.17 Leisure r = 0.58, trans r = 0.18, work r = 0.64, home r = 0.47 Total r = 0.23, vig r = 0.47, mod r = 0.23 Tot r = 0.25, vig r = 0.38, mod r = 0.17, walking r = 0.12
1+ 333-
S7S: 847-928; 29 y; FIN[91]
. VO2max
L7S: 22# 24~; 41 y; SW[92]
Accelerometer . VO2max BMI PA log
S7S: 32# 91~; 21 y; US[93]
Accelerometer Pedometer
S7S: 74# 76~; 31 y; JAP[94]
Doubly labelled water
Significant diff between insufficiently and highly active categories, but non-significant diff between insufficiently active and sufficiently active group or sufficiently active and highly active group
1-
S7S: 108#; 32 y; NOR[12]
Accelerometer
Vig r = 0.07 Mod r = 0.17 Total r = 0.26
112-
. VO2max
3-
AEE r = 0.31 METmin r = 0.33
1?
S7S: 51# 91~; 44 y; US[95]
Accelerometer
EE 1 min bout: r = 0.58#, r = 0.21~ EE 10 min bout r = 0.48#, r = 0.07~ Meeting guidelines 1 min bout k = 0.21 Meeting guidelines 10 min bout k = 0.04
2222-
S7S: 30#; 26 y: 19~; 34 y; CH[96]
Accelerometer PA log
Total r = 0.09 Total r = 0.29
23-
23# 30~; 31 y; BE[97]
Accelerometer PA diary BMI
r = 0.38 (total PA) r = 0.37 (MVPA) r = 0.39 (min) r = -0.02#, r = -0.04~
233-
Accelerometer
L7S r = 0.22#, r = 0.35~ S7S r = 0.24#, r = 0.29~ L7S r = 0.35, S7S r = 0.22 L7S r = 0.26, S7S r = 0.45 L7S r = 0.49, S7S r = 0.49
22222-
[21]
Adapted IPAQ
LUS: 1068# 1372~; 47 y; NOR[54]
IPAQ sitting Q
L7S and S7S: 65# 79~; 35 y; UK[98] L7S and S7S: 30; 33 y; NL[98] L7S and S7S: 26; 49 y; US[98] L7S and S7S: 26; 36 y; US[98]
Vig r = 0.41
Doubly labelled water
S7L: 16# 20~; 39 y; NZ
computerized IPAQ
11-
Questionnaire
Study population (n; mean age; nationality)
Comparison measure
Results
Levels of evidence
JACC Q
739# 991~; 22-80; JAP[15]
Interview
PA time r = 0.53#, r = 0.58~ PA freq r = 0.53#, r = 0.59~
33-
Kaiser PA Survey
50~; 39 y; US[56]
4-point summary r = 0.49, occup r = 0.16 4-point summary r = 0.59, occup r = 0.04 4-point summary r = 0.35, occup r = 0.35 4-point summary r = -0.53, occup r = -0.06
2-33+33-33-3-
Kuopio Q
1162#; 54 y; FIN[57]
Accelerometer, . VO2max 2 7 d activity diaries, % BF . VO2max . VO2max
Total b = 0.15
3-
Activityhi r = 0.40, activitylo r = -0.10, metab2.5 r = 0.03 Activityhi r = -0.05, activitylo r = 0.14, metab2.5 r = 0.16 Activityhi r = 0.39, activitylo r = 0.31, metab2.5 r = 0.30
3-3-3-
Life in NZ National Survey
140; 37 y; NZ
[58]
BMI Stanford SDR Q
3-3-33-3-3-
21#; 36 y; US[42]
Resting EE Caloric intake
Total r = 0.24 Total r = 0.40
3? 3?
28# 50~; 40 y; US[59]
Accelerometer . VO2max % BF BMI
2-point score r2 = 0.04, 4-point score r2 = 0.04 2-point score r2 = 0.29, 4-point score r2 = 0.29 2-point score r2 = 0.10, 4-point score r2 = 0.17 2-point score r2 = 0.15, 4-point score r2 = 0.22
2333-
Lo¨f Q
24~; 30 y; SW[60]
Doubly labelled water
TEE r = 0.56 LOA = -800-1200
1?
London PA Q
26~; 43-54 y; UK[88]
Total r = 0.45 Total NS
3? 3?
Leisure Time PA Q
166#; 43 y; US[61]
4 d activity diary . VO2max . VO2max % BF HDL cholesterol Sys blood pressure
LTPA r = 0.43 LTPA r = -0.35 LTPA r = 0.17 LTPA r = 0.02
3+ 333-
Mail survey of PA habits
375#; 47 y; US[62]
Treadmill time
TEE r = 0.05 RWJ index r = 0.51 Sweat freq r = 0.51
33+ 3+
Minnesota LTPA Q
21#; 36 y; US[42]
Resting EE Caloric intake
Leisure EE r = 0.17 Leisure EE r = 0.13
3? 3?
43~; 47 y; US[76]
3 d beeper-cued diary 3 occup groups
Leisure EE r = 0.14 No significant diff between 3 occup groups
3?
Lipid Research Clinics Q
version 22 items (correspond closely to Canada Fitness Survey)
64-73; 37 y; US[3]
Accelerometer . VO2max 4 wk history % BF Caloric intake
Leisure EE r = 0.18 Leisure EE r = 0.43 Leisure EE r = 0.74 Leisure EE r = -0.24
23+ 3+ 3-
Total LTPA r = 0.11 Light LTPA r = 0.08 Mod LTPA r = 0.04 Intense LTPA r = 0.08 Work index r = 0.04, leisure index r = 0.28 Work index r = 0.0, leisure index r = 0.56 Work index r = -0.09, leisure index r = 0.39 Work index r = 0.07, leisure index r = -0.37 . Occupational EE VO2 r = 0.03, BMI r = 0.02 . Leisure EE VO2 r = 0.21, BMI r = -0.11 . Household EE VO2 r = 0.14, BMI r = 0.00 . TEE VO2 r = 0.39, BMI r = 0.30, %BF r = -0.26 . Work VO2 r = 0.21, BMI r = 0.08, %BF r = -0.17 . Transport VO2 r = 0.16, BMI r = -0.17, %BF r = -0.13 . Household VO2 r = -0.01, BMI r = -0.15, %BF r = -0.01 . LTPA VO2 r = 0.30, BMI r = 0.04, %BF r = -0.25 LTPA k = 0.62 Total r = 0.32 Activity score r = 0.79 Activity score r = 0.56; Inactivity score r = 0.41 Vig activity r = 0.58 Non-vig activity r = 0.28 Sum of activities r = 0.65 Inactivity at home r = 0.30 Inactivity at work r = 0.40 Overall inactivity r = 0.41 Crude total PA r = 0.23 Total PA r = 0.56 r = 0.65 age 44-64 y r = 0.50 age 65-78 y r = 0.73 BMI £26 r = 0.39 BMI >26 Occup r = 0.40
33333-23-3+ 3-33-3-
2356#; 49 y; US[99]
Minnesota Heart Health Program Q
64-73; 37 y; US[3]
Modified Minnesota LTPA Q + TOQ + new household activity measure
59~; 47 y; US[66]
MOSPA
108# 59~; 36 y; BE[17]
Accelerometer . VO2max 4 wk history % BF . VO2max (score 1-5) BMI . VO2max BMI % BF
Mundal Q NASA Q NHS II Activity Q
1769#; 40-59 y; NOR[68] 7# 30 y; 26~ 28 y; US[18] 147~; 39 y; US[19]
Modified NHS II Activity Q
238#; 40-75 y; US[69]
Interview Accelerometer Past wk recall 7 d activity diary 4 7 d activity diaries
Norman Q
111#; 63 y; SW[70]
7 d activity diary
3-33-33-33-3-33-3-33-3-33-3-33-3-332? 3+ 3-333333333333+ 33
Study population (n; mean age; nationality)
Questionnaire
Study population (n; mean age; nationality)
Comparison measure
Results
Levels of evidence
Home r = 0.62 Leisure r = 0.40 TV/reading r = 0.52 Sleeping r = 0.61 AEE r = 0.38 METmin r = 0.39
33331?
16# 20~; 39 y; NZ[21]
Doubly labelled water
One-week recall Q
55#; 38 y: 63~; 40 y; AUS[71]
Accelerometer
‡3 MET r = 0.29#, r = 0.25~ 3.0-5.9 MET r = 0.40#, r = 0.19~ 6.0+ MET r = 0.19#, r = 0.10~
111-
PAQ-AD
61# 122~; 31 y; CAN[23]
Accelerometer CAL Accelerometer MTI (n = 41) Several other PA Q
r = 0.43 r = 0.26 r = 0.54-0.63
22? 3-
PAFQ
18# 23~; 35-69 y; SWZ[22]
Heart rate EE 24 h recall
TEE r = 0.76 TEE r = 0.80
3? 3?
PA History Q
4956; 18-30 y; US[72]
Treadmill time Caloric intake BMI
Significant regression coefficients Significant regression coefficients Significant regression coefficients only in ~
3? 3? 3?
PAS
19# 20~; 20-60 y; DK[24]
TEE r = 0.05#, r = 0.31~ TEE r = 0.86#, r = 0.49~
1? 3?
53# 47~; 35-65 y; DK[25]
Accelerometer 4 d activity diary . VO2max
TEE non-significant association Vig PA significant association
3?
45# 62~; 21 y; AUS[100]
Pedometer
TEE r = 0.48 TEE r = 0.56#, r = 0.38~
3+ 3?3-
PYTPAQ
75# 79~; 49 y; CAN[26]
Accelerometer . VO2max BMI PA log
r = 0.26 r = 0.32/0.37 r = -0.07/0.22 r = 0.41
2333-
Pennsylvania Alumni Q
21#; 36 y; US[42]
Resting EE Caloric intake
TEE r = 0.30 TEE r = 0.47
3? 3?
Saltin and Grimby Q
43~; 47 y; US[76]
3 d beeper-cued diary 3 occup groups
Lifetime occup PA r = 0.45; significant diff between 3 occup groups Lifetime LTPA r = 0.55; no significant diff between 3 occup groups
3?
Scottish PA Q
30; 37 y; SC[73]
Accelerometer
Total r = 0.13 Total without occup walking + outliers r = 0.52
2?
23; 18-48 y; UK[75]
HR monitor
Total r = 0.0003 (0.34 without 3 outliers)
3-
21; 18-48 y; UK[75]
HR monitor
Total r = 0.59
3+
Modified Scottish PA Q
NZPAQ
Singh Q
115# 90~; 52 y; US[77]
Treadmill time
333-
44# 94~; 49 y; US[78]
PAR
PA index r = 0.27-0.38#, r = 0.07-0.15~ RWJ index r = 0.28-0.48#, r = 0.10-0.34~ Total act index r = 0.24#, (n = 24) r = 0.03~ (n = 28) Total activity r = 0.51#, r = 0.65~ Vig activity r = 0.13#, r = 0.85~ Mod activity r = 0.53#, r = 0.44~ Inactivity activity r = 0.69#, r = 0.59~ Sleep r = 0.39#, r = 0.52~ Total activity r = 0.14#, r = 0.24~ Total activity r = 0.23#, r = -0.09~ p = 0.0007#, p = 0.002~ p = 0.0001#, p = 0.001~ p = 0.0001#, p = 0.46~
3-33-3+ 3-33-33-33-33-33? 3? 3?
Mod r = -0.08 Vig r = 0.18 TEE r = 0.14 TEE r = 0.79 k = 0.61 Vig r = 0.46 TEE r = 0.61 TEE r = 0.82 TEE r = 0.32 TEE r = -0.04 TEE r = 0.10 TEE r = 0.35 TEE r = 0.33 TEE r = 0.30 TEE r = 0.36 TEE r = -0.12
3332? 3? 3? 3+ 333? 3? 2333-
Mod r = 0.60 k = 0.36 Vig r = 0.48 k = 0.23 TEE r = 0.91 k = 0.62
333-
Single Q
371# 733~; 37 y; US[79]
Stanford SDR
375#; 47 y; US[62]
Pedometer Treadmill time . VO2max (n = 304) BMI HDL cholesterol Treadmill time
7#; 30 y: 26~; 28 y; US[18] 24#; 18-31 y; US[38]
Accelerometer . VO2max
158; 22 y; US[38] 74; 22 y; US[38]
7-day activity diary 12 min run Skinfolds Resting EE Caloric intake Accelerometer . VO2max 4 wk history % BF 7 d activity diary
version unclear
21#; 36 y; US[42] 64–73; 37 y; US[3]
Modified Stanford SDR: Auckland Heart Study PA Q
77#; 53 y: 75~; 56 y; NZ[80]
Modified Stanford SDR
4956; 18-30 y; US[72]
Treadmill time Caloric intake BMI
Significant regression coefficients Significant regression coefficients Non-significant regression coefficients
3? 3? 3-
Modified Stanford SDR
46~; 39 y; US[89]
Accelerometer . VO2max 2 7 d occup activity diaries
NS NS Total occup score h/wk r = 0.78 Total occup score MET.min/wk r = 0.45
2? 3? 3? 3?
Stanford SDR
27# 48~; 37 y; US[31]
6 × 48 h occup activity diaries
Total occup score h/wk r = 0.16 Total occup score MET min/wk r = 0.30
33-
Stanford Usual Act Q
64-73; 37 y; US[3]
Accelerometer V̇O2max 4 wk history % BF
Mod r = 0.23, vig r = 0.22 Mod r = 0.27, vig r = 0.38 Mod r = 0.05, vig r = 0.28 Mod r = -0.33, vig r = -0.16
2333-
Suzuki Q
49#; 27 y: 32~; 32 y; JAP[81]
Accelerometer
Daily EE r = 0.57#, r = 0.68~ Weekly PA r = 0.69#, r = 0.69~
2+ 2+
SQUASH
36# 14~; 44 y; NL[32]
Accelerometer
r = 0.45; k for comparing tertiles: 0.30
2-
24# 16~; 37 y; NL[101]
Accelerometer
TEE r = 0.62#, r = -0.49~ kw for tertiles: r = 0.29# r = -0.15~
1? 1?
TOQ + Minnesota LTPA Q
34~; 37 y; US[16]
Doubly labelled water
TEE r = 0.40
1?
Modified Minnesota LTPA Q + TOQ + general Q + sleeping
[67]
24#; 42 y; US
Doubly labelled water
TEE r = 0.39 LOA 1.32 – 0.73 (EE was more overestimated with higher EE values)
1?
Total PA
39 + 94; 41 y; SW[82]
24 h recall
Total PA r = 0.73; concordance = 0.57
3+
Total F = 16.38, p < 0.01
3?
Total r = 0.36#, r = 0.01~ Total r = 0.16#, r = 0.08~
33-
333-
[83]
Usual PA measure
188~; 47 y; US
YPAS
138; 41 y; US[6]
BMI V̇O2max BMI
131~; 50 y; US[102]
4 × 7 d activity diaries
TEE r = 0.29 Mod EE r = 0.16 Vig EE r = 0.63
Walking Q
51# 55~; 62 y; JAP[84]
Pedometer
Walking p < 0.001-0.006
1?
Walking Q (one question from CAQ)
48#; 41 y: 48~; 39 y; US[85]
Pedometer
Walking r = 0.35#, r = 0.48~
1-
Historical RWJ Q
4100#; 48 y: 963~; 45 y; US[30]
Treadmill time
TEE r = 0.53#, r = 0.47~ Significant diff between sufficient/insufficient (effect size 0.68#, 0.81~)
3-
80~; 31 y; US[86]
Femoral neck BMI Spine BMI
Total hip loading exposure r = 0.32 Total spine loading exposure r = 0.34 No correlations with spine BMD
11-
Lifetime PA Modified HLAQ
Walking activities
Bone Loading History Q
Bone loading PA
Modified Baecke, ARIC/Baecke Work Index
27# 48~; 37 y; US[31]
6 × 48 h occup activity diaries
Work index r = 0.04
3-
CARDIA Occup Q
27# 48~; 37 y; US[31]
6 × 48 h occup activity diaries
Total occup r = -0.05
3-
[31]
6 × 48 h occup activity diaries
Total occup r = 0.10
3-
Occup PA
Health Insurance Plan occup Q
27# 48~; 37 y; US
Health Insurance Plan of NY Q
n = 64-73; 37 y; US[3]
Accelerometer V̇O2max 4 wk history % BF
Total occup r = 0.14 Total occup r = 0.07 Total occup r = 0.00 Total occup r = -0.03
2333-
Lipid Research Clinics Q
n = 64-73; 37 y; US[3]
Accelerometer V̇O2max 4 wk history % BF
Total occup r = 0.21 Total occup r = 0.49 Total occup r = 0.24 Total occup r = -0.43
23+ 33-
Lipid Research Clinics occup Q
27# 48~; 37 y; US[31]
6 × 48 h occup activity diaries
Total occup r = 0.09
3-
Minnesota Heart Health Program occup Q
27# 48~; 37 y; US[31]
6 × 48 h occup activity diaries
Total occup r = 0.33
3-
TOQ
46~; 39 y; US[89]
Accelerometer V̇O2max 2 × 7 d occup activity diaries
Total occup score MET min/wk r < 0.25 Total occup score MET min/wk r < 0.25 Total occup score h/wk r = 0.18 Total occup score MET min/wk r = 0.46
3? 3?
27# 48~; 37 y; US[31]
6 × 48 h occup activity diaries
Total occup score h/wk r = 0.11 Total occup score MET min/wk r = 0.52
33-
43~; 47 y; US[76]
3d beeper-cued diary Three occup groups
Total occup score MET/h = 0.29 Significant diff between three occup groups
3?
version 17 items
3? 3?
ACSM = meeting PA guidelines of the American College of Sports Medicine; AUS = Australia; b = regression coefficient; BE = Belgium; BF = body fat; BMD = bone mineral density; BMI = body mass index; CA = Canada; CH = China; DEE = dietary energy expenditure; diff = differences; DK = Denmark; EE = energy expenditure; F = F-test for mean differences in PA between different levels of the comparison measure; FIN = Finland; FR = France; HDL = high density lipoprotein; HR = heart rate; IN = India; IPAQ = International Physical Activity Questionnaire; JAP = Japan; k = Kappa; kw = weighted Kappa; L7S = long form, last 7 d; LAI = leisure activity index; LOA = limits of agreement; LTPA = leisure-time physical activity; LUS = long form, usual wk; MET = metabolic equivalent; mod = moderate; NOR = Norway; NL = Netherlands; NS = not significant; NZ = New Zealand; occup = occupational; PAL = physical activity level; PAR = physical activity recall(s); r = correlation coefficient; S7S = short form, last 7 d; SAI = sport activity index; SC = Scotland; SUS = short form, usual wk; SW = Sweden; Sweat Q = number of times/wk vigorous activity sufficient to ‘work up a sweat’; SWZ = Switzerland; Sys = systolic; TEE = total energy expenditure; TV = television; UK = United Kingdom; US = United States; vig = vigorous; V̇O2 = oxygen uptake; V̇O2max = maximal V̇O2; ? indicates indeterminate; ~ indicates female; # indicates male.
Table V. Reliability of physical activity (PA) questionnaires (Q)
Questionnaire
Study population (n; mean age; nationality)
Interval
Results
Rating
Modified Active Australian Survey
169~ 55 y; AUS[37]
13 d
Total frequency r = 0.58 Total min/wk r = 0.64
2- 2-
Baecke
277; 20–32 y; NL[41]
3 mo
Work r = 0.88 Sport r = 0.81 Leisure r = 0.74
2+ 2+ 2-
Modified Baecke 1
63#; 20–60 y: 56~; 20–70 y; NL[43]
5 mo
Work r = 0.89#, r = 0.80~ Sport r = 0.88#, r = 0.71~ Leisure r = 0.76#, r = 0.83~ Total r = 0.85#, r = 0.83~
3+ 3+ 3+ 3- 3- 3+ 3+ 3+
Modified Baecke (ARIC/Baecke)
28# 49~; 37 y; US[4]
26 d
Sport and exercise-related leisure index r = 0.92#, r = 0.87~ Non-sport and exercise-related leisure index r = 0.88#, r = 0.86~ Total leisure activity r = 0.92#, r = 0.90~
2+ 2+ 2+ 2+ 2+ 2+
28# 50~; 37 y; US[3]
1 mo
Total r = 0.93 Work r = 0.78 Sport r = 0.90 Leisure r = 0.86
2+ 2- 2+ 2+
Extended Baecke (QAPSE)
7# 13~; 23–54 y; FR[28]
6 wk
TEE r = 0.997
2?
Bharathi Q
45# 67~; 18–60 y; IN[45]
2–4 wk
TEE r = 0.86 PAL r = 0.54
2+ 2-
CARDIA
28# 50~; 37 y; US[3]
1 mo
Total r = 0.88 Mod r = 0.66 Heavy r = 0.91
2+ 2- 2+
EPIC original Q
62#; 41 y: 50~; 49 y; NL[10]
5 mo
Total r = 0.76#, r = 0.58~ Occup r = 0.90#, r = 0.79~ Leisure r = 0.85#, r = 0.68~ Rest r = 0.67#, r = 0.65~
3- 3- 3+ 3- 3+ 3- 3- 3-
Modified EPIC Q (short PA Index)
2271; UK[47]
18–21 mo
PA index k = 0.60
2-
EPAQ2
187#; 65 y: 212~; 64 y; UK[9]
3 mo
TV time k = 0.71#, k = 0.74~ Activity at home k = 0.61#, k = 0.62~ Activity at work k = 0.79#, k = 0.82~ Recreational activity k = 0.54#, k = 0.55~ Vig activity k = 0.58#, k = 0.67~ PA index k = 0.66#, k = 0.70~
1+ 1+ 1- 1- 1+ 1+ 1- 1- 1- 1- 1- 1+
Flemish PA computerized Q
31#; 39 y: 35~; 42 y; BE[48]
2 wk
PAL ICC = 0.92#, ICC = 0.78~
1?
Godin Q
53; 18–65 y; CA[50]
2 wk
Total ICC = 0.74 Strenuous ICC = 0.94 Mod ICC = 0.46 Light ICC = 0.48
1+ 1+ 1- 1-
28# 50~; 37 y; US[3]
1 mo
Leisure r = 0.62 Mod r = 0.36 Vig r = 0.84
2- 2- 2+
28# 50~; 37 y; US[3]
1 mo
TEE r = 0.72 Sports r = 0.75
2- 2-
21# 38~; 39 y; US[51]
28 d
Leisure EE r = 0.61#, r = 0.75~
2? 2?
Harvard/College Alumnus Q
HUNT 1
S7S: 108#; 32 y; NOR[12]
1 wk
HUNT 2
108#; 32 y; NOR[13]
1 wk
Frequency kw = 0.80 Intensity kw = 0.82 Duration kw = 0.69 Light k = 0.20 Hard k = 0.41 Work k = 0.80
1+ 1+ 1- 1- 1- 1+
IPAQ
S7S: 111; 21 y; US[93]
4–6 d
Total ICC = 0.86 Vig ICC = 0.89 Mod ICC = 0.71 Walking ICC = 0.89
1+ 1+ 1+ 1+
S7S: 292a; 18–65 y[14] SUS: 906; 18–65 y[14] L7S: 294; 18–65 y[14] LUS: 904; 18–65 y[14]
3–7 d
S7S TEE r = 0.75, ACSM r = 0.93–1.0 SUS TEE r = 0.79, ACSM r = 0.77–0.99 L7S TEE r = 0.77, ACSM r = 0.92–1.0 LUS TEE r = 0.83, ACSM r = 0.90–1.0
2- 2+ 2- 2+ 2- 2+ 2+ 2+
S7S: 108#; 32 y; NOR[12]
1 wk
Vig ICC = 0.61–0.62 Mod ICC = 0.30–0.34 Walking ICC = 0.42–0.56 Sitting ICC = 0.80
1- 1- 1- 1+
S7S; 30#; 26 y: 19~; 34 y; CH[96]
3d
Total ICC = 0.79 Vig ICC = 0.75 Mod ICC = 0.31 Walking ICC = 0.93 Sitting ICC = 0.97
1+ 1+ 1- 1+ 1+
LUS; 23# 30~; 31 y; BE[97]
7 + 3–6 d
Total ICC = 0.69 (ICC over three meas) Vig ICC = 0.82 (ICC over three meas) Mod ICC = 0.63 (ICC over three meas)
1- 1+ 1-
L7S and S7S: 65# 78~; 35 y; UK[97] L7S and S7S: 66; 33 y; NL L7S and S7S: 25; 49 y; US L7S and S7S: 29; 36 y; US
3–7 d
L7S r = 0.82#, r = 0.65~
2+ 2-
S7S r = 0.81#, r = 0.63~ L7S r = 0.87, S7S r = 0.95 L7S r = 0.95, S7S r = 0.92 L7S r = 0.85, S7S r = 0.85
2+ 2- 2+ 2+ 2? 2? 2? 2?
JACC Q
425# 650~; 40–79 y; JAP[15]
1y
PA time k = 0.45#, k = 0.40~ Walking time k = 0.32#, k = 0.31~ PA freq k = 0.50#, k = 0.51~
2- 2- 2- 2- 2- 2-
Kaiser PA Survey
50~; 39 y; US[56]
1 mo
3-point summary ICC = 0.82 4-point summary ICC = 0.83 Caregiving ICC = 0.01 Housework ICC = 0.79 Housework/caregiving ICC = 0.81 Sports/exercise ICC = 0.84 Active living habits ICC = 0.82 Occup ICC = 0.85
1+ 1+ 1- 1+ 1+ 1+ 1+ 1+
Life in NZ National Survey
36–48; 43 y; NZ[103]
?
Activityhi ICC = 0.70–0.88 Activitylo ICC = 0.50–0.71
2? 2?
Lipid Research Clinics Q
28# 50~; 37 y; US[3]
1 mo
4-point score r = 0.93
2+
28# 50~; 40 y; US
4 wk
Minnesota LTPA Q
28# 50~; 37 y; US[3]
1 mo
2-point score r = 0.85 4-point score r = 0.88 Leisure EE r = 0.92 Mod r = 0.80 Heavy r = 0.95
2+ 2+ 2+ 2+ 2+
computerized IPAQ
IPAQ Sitting Q
[59]
Modified Minnesota LTPA Q (Canada Fitness Survey) Minnesota Heart Health Program Q
64#; 49 y: 63~; 46 y; CA[64]
3–4 wk
28# 50~; 37 y; US[3]
1 mo
Total ICC = 0.53 (time) ICC = 0.48 (TEE) Leisure ICC = 0.52 (time) ICC = 0.58 (TEE) Non-leisure ICC = 0.62 (time) ICC = 0.26 (TEE) Strenuous ICC = 0.86#, ICC = 0.31~ Work index r = 0.91 Leisure index r = 0.86
1- 1- 1- 1- 1- 1- 1+ 1- 2+ 2+
Modified Minnesota LTPA Q (y11 Q)
129# 322~; 41 y; US[65]
1–10 y
Leisure EE r = 0.20#, r = 0.29~ Leisure EE k = 0.49#, k = 0.40~ (high v low) Light EE r = 0.17#, r = 0.25~ Mod EE r = 0.17#, r = 0.25~ Vig EE r = 0.47#, r = 0.41~ Vig EE k = 0.67#, k = 0.32~ (high v low)
3- 3- 2- 2- 3- 3- 3- 3- 3- 3- 2- 2-
Modified Minnesota LTPA Q + TOQ Q + new household activity measure
59~; 47 y; US[66]
2 wk
Occup EE r = 0.75; LOA = –0.009 – 0.90 Leisure EE r = 0.46; LOA = –0.05 – 2.25 Household EE r = 0.64; LOA = –0.25 – 1.80
2- 2- 2-
MOSPA
65; 36 y; BE[17]
<3 mo
TEE ICC = 0.68 Work ICC = 0.85 Transport ICC = 0.62 Household ICC = 0.91 LTPA ICC = 0.87
2- 2+ 2- 2+ 2+
NHS II Activity Q
147~; 39 y; US[19]
2y
Activity score r = 0.59 Inactivity score r = 0.52
3- 3-
Modified NHS II Activity Q
238# 40–75 y; US[69]
2y
Vig activity ICC = 0.52 Non-vig activity ICC = 0.42 Sum of activities ICC = 0.41 Inactivity at home ICC = 0.39 Inactivity at work ICC = 0.50 Overall inactivity ICC = 0.39
2- 2- 2- 2- 2- 2-
Norman Q
222# 63 y; SW[70]
7 mo
Crude total PA C = 0.66 Total PA C = 0.67 C = 0.78 age 44–64 y C = 0.51 age 65–78 y C = 0.70 BMI ≤26 C = 0.64 BMI >26 Occup C = 0.70 Home C = 0.66 Leisure C = 0.61 TV/reading C = 0.67 Sleeping C = 0.75
2- 2- 2+ 2- 2+ 2- 2+ 2- 2- 2- 2+
One-wk recall Q
55# 38 y; 63~; 40 y; AUS[71]
3d
Walking ICC = 0.67#, ICC = 0.86~ Mod ICC = 0.71#, ICC = 0.53~ Vig ICC = 0.38#, ICC = 0.89~ Total duration ICC = 0.45#, ICC = 0.80~ Meeting fitnorm[71] k = 0.64#, k = 0.55~
1- 1+ 1+ 1- 1- 1+ 1- 1+ 1- 1-
PYTPAQ
75# 79~; 49 y; CA[26,27]
9 wk
Total ICC = 0.66 Vig ICC = 0.72 Low/Mod ICC = 0.55 Occup ICC = 0.58
1- 1+ 1- 1-
Scottish PA Q
9# 25~; 33 y; SC[73]
2d
Total r = 0.998, COR = 53 min Leisure COR = 29 min, occup COR = 55 min
2?
web-based vs paper version Stanford SDR
Stanford Usual Act Q
16; UK[74]
1 wk
Total r = 0.67
2?
28# 50~; 37 y; US[3]
1 mo
Total r = 0.34 Mod r = 0.12 Vig r = 0.37
2- 2- 2-
90# 73~; 22 y; US[38]
3 wk 4 wk 7 wk
TEE r = 0.58 TEE r = 0.63 TEE r = 0.42
2- 2- 2-
28# 50~; 37 y; US[3]
1 mo
Mod r = 0.77 Vig r = 0.67
2- 2-
Usual PA measure
37~; 40–55 y; US[83]
14 d
Total r = 0.88
2?
SQUASH
36# 14~; 44 y; NL[32]
5 wk
Total r = 0.58 Sports r = 0.90
2- 2+
Suzuki Q
95#; 37–72 y: 119~; 35–73 y; JAP[81]
1y
TEE (day) r = 0.59#, r = 0.62~ TEE (wk) r = 0.37#, r = 0.43~
3- 3- 3- 3-
Singh Q
59# 53~; 52 y; US[77]
6 wk
PA index r = 0.56–0.80#, r = 0.76~ RWJ index r = 0.77–0.78#, r = 0.70–0.85~ Total activity index r = 0.51#
2- 2- 2- 2- 2-
29# 70~; 49 y; US[78]
6 wk
RWJ index r = 0.65#, r = 0.64~ Vig activity r = 0.82#, r = 0.78~ Sport/recreational index r = 0.91#, r = 0.65~ Total activityb r = 0.78 #, r = 0.64~
2? 2- 2? 2- 2? 2- 2? 2-
39 + 94; 41 y; SW[82]
3 wk
Total r = 0.73
2-
134~; 50 y; US[11,102]
1y
TEE ICC = 0.82 Mod EE ICC = 0.80 Vig EE ICC = 0.86 Recreational ICC = 0.87 Household ICC = 0.78
1+ 1+ 1+ 1+ 1+
Modified Baecke ARIC/Baecke Work Index
27# 48~; 37 y; US[31]
1 mo
Work index r = 0.74
2-
Health Insurance Plan of NY Q
27# 48~; 37 y; US[31]
1 mo
Total occup r = 0.83
2+
Total PA Lifetime PA Modified HLAQ
Occup PA
28# 50~; 37 y; US[3]
1 mo
Total occup r = 0.86
2+
Minnesota Heart Health Program Occup Q
27# 48~; 37 y; US[31]
1 mo
Total occup r = 0.84
2+
Modified Stanford SDR
27# 48~; 37 y; US[31]
1 mo
Total occup score activity score/wk r = 0.58 Total occup score h/wk r = 0.56 Total occup score MET min/wk r = 0.20
2- 2- 2-
TOQ
27# 48~; 37 y; US[31]
1 mo
Total occup score activity score/wk r = 0.83 Total occup score h/wk r = 0.63 Total occup score MET min/wk r = 0.37
2+ 2- 2-
Lipid Research Clinics Occup Q
27# 48~; 37 y; US[31]
1 mo
Total occup r = 0.73
2-
CARDIA Occup
27# 48~; 37 y; US[31]
1 mo
Total occup r = 0.37
2-
NPAQ
82; 20–71 y; AUS[20]
1 wk
Total walking ICC = 0.91
1+
Walking Q
51# 55~; 62 y; JAP[84]
3 mo
Walking 59–74% agreement
1-
Bone Loading History Q
78~; 31 y; US[86]
4–6 wk
Total PA hip ICC = 0.89 Total PA spine ICC = 0.92
1+ 1+
Historical Activity Q
31~; 21 y, US[87]
6.5 mo
Total r = 0.76 Athletics r = 0.82 Exercise r = 0.55 Leisure r = 0.70 Occup r = 0.48 Lifting/carrying r = 0.51
2-
Walking activities
Bone loading PA
a Pooled data from 12 countries.
b Calculated slightly differently from the Total Activity Index in Singh et al.[77]
ACSM = meeting PA guidelines of the American College of Sports Medicine; Activityhi = activity of high intensity; Activitylo = activity of low intensity; AUS = Australia; BE = Belgium; BMI = body mass index; C = concordance; CA = Canada; CH = China; COR = coefficient of repeatability; EE = energy expenditure; FR = France; ICC = intraclass correlation coefficient; IN = India; JAP = Japan; k = Kappa; kw = weighted Kappa; LOA = limits of agreement; LTPA = leisure time physical activity; LUS = long form, usual week; meas = measurements; mod = moderate; NL = Netherlands; occup = occupational; NOR = Norway; PAL = PA level; r = correlation coefficient; RWJ = run-walk-jog; S7S = short form, last 7 d; SC = Scotland; SUS = short form, usual week; SW = Sweden; TEE = total energy expenditure; TV = television; UK = United Kingdom; US = United States; vig = vigorous; ~ indicates female; # indicates male.
Construct validity was assessed by validation against doubly labelled water for seven questionnaires.[16,21,40,81,104,105] In all of these studies, the correlation between total energy expenditure assessed with the questionnaire and with doubly labelled water was lower than our criterion of 0.70, with Pearson correlations ranging from 0.31 to 0.58 (table IV).
In 41 studies, construct validity was assessed by validation against accelerometers (table IV). For only one questionnaire validated in a study with >50 participants was the correlation between accelerometer data and total PA >0.50 (Suzuki Q[81]). To find out which type of questionnaire performed best, we averaged the correlations found in the 41 studies that used accelerometers as the comparison measure. Correlations were slightly higher for vigorous than for moderate activity (r = 0.32 vs 0.22). Correlations were also higher for questionnaires asking about the past week than for those asking about a usual week/usual PA/current PA or about the past year (r = 0.41 vs 0.26 and 0.30, respectively).
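The averaging of correlations described above can be sketched in a few lines. The review does not state how the correlations were averaged, so the sketch shows two common options as assumptions: a plain arithmetic mean and a mean taken on Fisher's z scale. The input values are hypothetical and are not taken from table IV; the 0.50 cut-off is the accelerometer criterion mentioned in the text, and treating it as inclusive is also an assumption.

```python
import math

# Cut-off for correlations with accelerometer data, as quoted in the text.
ACCELEROMETER_CUTOFF = 0.50


def mean_r(correlations):
    """Plain arithmetic mean of correlation coefficients."""
    return sum(correlations) / len(correlations)


def fisher_mean_r(correlations):
    """Average on Fisher's z scale, then transform back to r."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Hypothetical correlations, for illustration only (not data from the review).
example_rs = [0.10, 0.30, 0.50]

plain = mean_r(example_rs)
fisher = fisher_mean_r(example_rs)
meets_criterion = fisher >= ACCELEROMETER_CUTOFF

print(f"plain mean r = {plain:.3f}")
print(f"Fisher-z mean r = {fisher:.3f}")
print(f"meets accelerometer criterion: {meets_criterion}")
```

For small and moderate correlations the two averages are close; they diverge more when individual correlations approach 1.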
Two questionnaires designed for measuring walking were validated against pedometers (Level 1). One scored negative[85] and the other was rated as indeterminate because its statistical analysis could not be interpreted.[84]
The reliability of 15 versions of PA questionnaires was assessed at Level 1 (table V), and only five showed positive results: the self-administered, short version of the IPAQ on PA in the past 7 days (S7S),[93] the Modified HLAQ,[11,102] the NPAQ[20] and the Bone Loading History Q[86] scored positive on all aspects, and the Kaiser PA Survey[56] scored positive on all aspects except ‘care giving’. The other questionnaires showed mixed results, scored negative on most aspects, or scored indeterminate because of a small sample size. In addition to the 15 questionnaires for which Level 1 evidence was available, Level 2 evidence was found for another 36 (versions of) questionnaires. Only six questionnaires received a positive score on Level 2 (Modified Baecke [(ARIC) Baecke],[4] Health Insurance Plan of NY Q,[3,31] Lipid Research Clinics Q,[3,59] Minnesota LTPA Q,[3] the Minnesota Heart Health Program Q,[3] and the Minnesota Heart Health
Program Occupational Q[31]). The other questionnaires showed mixed results, scored negative on most aspects, or scored indeterminate because of a small sample size. When the results of the reliability studies were averaged, no clear differences were found between questionnaires with different recall periods, between different time intervals between test and retest, or between sexes. The only difference found was that, on average, reliability was higher for vigorous activity than for moderate activity.
The responsiveness of a questionnaire was assessed in only two studies,[38,54] and seemed to be poor. The correlation between changes in self-reported PA and changes in supervised activity in a training programme was -0.07 for total energy expenditure and 0.01 for vigorous activity.[38] The correlation between change in PA assessed with an adapted version of the long form of the IPAQ and change in V̇O2max was 0.20 for men and 0.12 for women.[54]
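The responsiveness figures above are correlations between change scores. A minimal sketch of that calculation follows, assuming each participant has a baseline and a follow-up value on both the questionnaire and a reference measure. All names and numbers are hypothetical and chosen only to make the arithmetic easy to follow; they do not reproduce the analyses of the cited studies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def change_scores(baseline, follow_up):
    """Follow-up minus baseline, per participant."""
    return [after - before for before, after in zip(baseline, follow_up)]


# Hypothetical data for five participants (illustration only).
questionnaire_baseline = [20, 35, 15, 40, 25]
questionnaire_follow_up = [30, 40, 30, 45, 50]
reference_baseline = [30, 38, 28, 41, 33]
reference_follow_up = [32, 39, 31, 42, 37]

delta_questionnaire = change_scores(questionnaire_baseline, questionnaire_follow_up)
delta_reference = change_scores(reference_baseline, reference_follow_up)
r_change = pearson_r(delta_questionnaire, delta_reference)

print(f"correlation between change scores: {r_change:.3f}")
```

In this made-up example the change scores track each other closely; the studies cited above reported correlations between -0.07 and 0.20.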
3. Discussion
Although more than 90 papers have been published on the validity or reliability of PA questionnaires, this is the first systematic review of studies assessing the measurement properties of PA questionnaires in which both the results and the methodological quality of the individual studies have been taken into account. Our results indicate that the overall methodological quality of the studies could be much improved. The most common flaws were small sample sizes and inadequate analyses and, for construct validity, comparison measures that did not measure the same construct.
An important finding of our review was the poor reporting of the methods and results of the studies. It was often unclear what dimension of PA the questionnaire was supposed to measure, which sometimes made it impossible to assess content validity. Furthermore, it was extremely difficult, if not impossible, to assess whether the same or slightly modified versions of questionnaires were used in some studies, and it was not always clear whether the data were derived from a self-report questionnaire or whether the questionnaire was part of an interview.
For assessing construct validity, it is important to formulate specific hypotheses in advance about the expected correlations between the questionnaire under study and other measures. However, almost none of the studies had formulated such hypotheses. To be able to provide levels of evidence, we formulated hypotheses regarding the strength of the association between comparison instruments. This methodology is not new; the idea behind it is that, in retrospect, it is always easy and tempting to come up with explanations for the findings and to conclude that the questionnaire is valid. In fact, most studies in our review concluded that the questionnaire under study was valid. However, when we applied our criteria, we found that these conclusions were overly optimistic in almost all cases.
Reliability was also often poorly assessed. Many studies used long time intervals between the test and retest, and in most studies Pearson or Spearman correlation coefficients were calculated instead of ICCs or Kappas. This is partly because we included studies performed many years ago, when the Pearson correlation was still an accepted method; nowadays there is a consensus that calculating ICCs or Kappas is the preferred method for assessing reliability.
Only two studies evaluated responsiveness, i.e. the ability of a questionnaire to detect change in PA over time. This is surprising, given the importance of responsiveness when a questionnaire is used in PA intervention studies. If a questionnaire has poor responsiveness, treatment effects cannot be detected, or can be detected only with large sample sizes. For some questionnaires, the majority of the population scored the highest or lowest possible score (e.g. with the modified CHAMPS[6]). When this happens, there is little opportunity for change, leading to low responsiveness.
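One reason ICCs are preferred over Pearson correlations for test-retest reliability can be shown with a small sketch. It assumes a two-way, absolute-agreement, single-measure ICC (the form usually labelled ICC(A,1)); the review does not specify which ICC form the included studies used, so this choice is an assumption. The data are hypothetical: every retest score is exactly 10 units higher than the test score.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_agreement(test, retest):
    """Two-way, absolute-agreement, single-measure ICC for two occasions."""
    n = len(test)  # number of participants
    k = 2          # number of measurement occasions
    grand_mean = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(test, retest)]
    col_means = [sum(test) / n, sum(retest) / n]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((v - grand_mean) ** 2 for v in list(test) + list(retest))
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: the retest is shifted upwards by a constant 10 units.
test_scores = [10, 20, 30, 40, 50]
retest_scores = [20, 30, 40, 50, 60]

r = pearson_r(test_scores, retest_scores)
icc = icc_agreement(test_scores, retest_scores)

print(f"Pearson r = {r:.3f}")
print(f"ICC (absolute agreement) = {icc:.3f}")
```

The Pearson correlation is 1 because the ranking and spacing of participants are unchanged, whereas the agreement ICC is lower because it also counts the systematic shift between occasions as disagreement.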
Although the methodology of assessing responsiveness tends to be less well understood, there is a consensus that responsiveness should be considered an aspect of validity in a longitudinal context.[106] While construct validity is about the validity of a single score, responsiveness is about the validity of a change score. This means that
the methods used for assessing validity, i.e. stating a priori hypotheses, can also be applied to assess the validity of changes in PA scores over time.
We found that correlations between PA questionnaire data and accelerometer data were slightly higher for questionnaires asking about the previous week than for those asking about a usual week. Often, accelerometers were worn in the week that was captured by the questionnaire. This may explain why higher correlations were found for these questionnaires than for those that asked about a usual week or usual PA. Whether questionnaires asking about the previous week are really better at assessing PA, or whether this is a consequence of the testing procedures, remains to be determined.
3.1 Limitations of this Review
As with any other systematic review, it is possible that our literature search missed some relevant papers. We only used the search terms ‘questionnaire’, ‘physical activity’, ‘exercise’ and ‘motor activity’ and did not include alternative wordings, such as ‘survey’. However, after checking all references of the relevant papers retrieved in our search, we found that very few papers had been missed.
Because of the overwhelming amount of data available, we had to be selective in what to present in this review. First of all, we chose to limit the review to self-administered questionnaires, realizing that some questionnaires have been used in other forms as well, such as interview-administered. With this restriction we have ignored some studies on questionnaires that can be either self-administered or used as an interview. The measurement properties of these questionnaires may differ between the two applications. By restricting the review to one form of administration, the studies were more homogeneous and we felt that better comparisons across questionnaires could be made, without having to allow for the type of administration as well.
Further, when assessing validity, only correlations with accelerometer data, V̇O2max, BMI and percentage body fat were extracted from the papers, because we felt that, although these are different constructs, these comparison measures were most
closely related to the construct being measured in the questionnaires. We have ignored correlations with, for example, cholesterol or blood pressure in these comparisons because only a limited correlation with PA can be expected. Lastly, not all scores resulting from the questionnaires could be presented. We often restricted the information to the overall or total PA scores. Data were presented for men and women separately when relevant (i.e. in case of sex differences).
Interpretation of the results was difficult for some studies, mostly because of poor reporting. Although two reviewers independently extracted data from the papers, interpretation may have been incorrect in some cases. Given the number of studies included in the review, and the number of studies conducted a long time ago, we chose not to contact the authors of the original studies.
Many of the choices for scoring the quality of the studies were made without a strong basis in theory or evidence, simply because there is not much available to base these choices on. Others might have chosen different cut-off points for scoring negative or positive on validity or reliability. The same is true for the decisions on what constitutes a sufficient sample size and an appropriate time interval between test and retest. However, readers can decide according to their own insights and draw their own conclusions from the data provided in the tables.
3.2 Recommendations for Choosing a Questionnaire
Current US recommendations state that every adult should participate in 2.5 hours a week of moderate-intensity or 75 minutes a week of vigorous-intensity aerobic PA, or in an equivalent combination of moderate- and vigorous-intensity activity. Aerobic activity should be performed in episodes of at least 10 minutes, preferably spread throughout the week. Based on these recommendations, questionnaires for measuring total PA should at least measure duration and frequency, and measure PA in all settings (work, home, transport, recreation, sport) to have sufficient content validity. Older questionnaires in particular, such as the Baecke questionnaire,[41] do not fulfil this criterion,
because insight into what PA for health should entail has changed over time.
Of course, some researchers will need a PA questionnaire not only for measuring total PA but also for other purposes, and different aspects of PA might be relevant for their study. For instance, when looking at bone health, energy expended in cycling or swimming might be less important, but carrying loads would be of interest. There will therefore not be one questionnaire suitable for all purposes or target groups. The choice of a questionnaire should always start with defining the purpose of the study and of the PA measurement, after which the content validity of a candidate questionnaire should be judged. Only then do construct validity and reliability need to be considered.
In this review, the content of 23 questionnaires was deemed appropriate for the dimension of PA they were intended to measure (Bharathi Q,[45] EPIC original Q,[10] EPAQ2,[9] Harvard/College Alumnus Q,[3,51] the long version of the IPAQ,[14] the adapted IPAQ,[54] Kaiser PA Survey,[56] LACE PA Q,[7] LTPA Q,[61] Mail Survey of PA,[62] Norman Q,[70] NZPAQ-SF,[21] One-week recall Q,[71] PAFQ,[22] PA History Q,[72] PYTPAQ,[26] Singh Q,[77,78] SQUASH,[32] Historical walking, running and jogging questionnaire,[30] NPAQ,[20] Health Insurance Plan of NY,[3] TOQ,[31,89] London PA Q[88]). Unfortunately, both reliability and construct validity were studied for only 13 of these 23 questionnaires (Bharathi Q,[45] EPIC original Q,[10] EPAQ2,[9] Harvard/College Alumnus Q,[3,51] Kaiser PA Survey,[56] the long version of the IPAQ,[14] Norman Q,[70] One-week recall Q,[71] PYTPAQ,[26] Singh Q,[77,78] SQUASH,[32] Health Insurance Plan of NY,[3] TOQ[31,89]). Of the 23 questionnaires with sufficient content validity, the Kaiser PA Survey,[56] the Godin Q,[50] the NPAQ,[20] the Bharathi Q,[45] the LUS version of the IPAQ,[14] the One-week recall Q[71] and the Health Insurance Plan of NY[3] scored well on reliability at Level 1 or 2.
Construct validity was sufficient according to our criteria only for the L7S version of the IPAQ in one study,[92] although the validity coefficient for the Kaiser PA Survey[56] was 0.49, which is only just below the (arbitrarily chosen) cut-off point of 0.50.
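The US recommendations quoted at the start of this section imply what a total PA questionnaire has to capture: the duration and frequency of moderate and vigorous bouts. A minimal sketch of how such data could be checked against the recommendation is given below. The function and variable names are hypothetical, and the linear weighting of vigorous minutes (one vigorous minute counted as two moderate minutes) is an assumption derived from the 150:75 ratio in the text, not a rule stated in this review.

```python
MODERATE_TARGET_MIN = 150  # 2.5 hours per week of moderate-intensity activity
VIGOROUS_TARGET_MIN = 75   # 75 minutes per week of vigorous-intensity activity
MIN_BOUT_MIN = 10          # episodes should last at least 10 minutes


def qualifying_minutes(bouts):
    """Sum the minutes of all bouts that reach the minimum bout length."""
    return sum(minutes for minutes in bouts if minutes >= MIN_BOUT_MIN)


def meets_recommendation(moderate_bouts, vigorous_bouts):
    """Check a week of bouts (minutes each) against the aerobic recommendation.

    Assumption: vigorous minutes are converted to moderate-equivalent
    minutes with the ratio of the two weekly targets (150 / 75 = 2).
    """
    moderate = qualifying_minutes(moderate_bouts)
    vigorous = qualifying_minutes(vigorous_bouts)
    weight = MODERATE_TARGET_MIN / VIGOROUS_TARGET_MIN
    return moderate + weight * vigorous >= MODERATE_TARGET_MIN


# Hypothetical weeks, for illustration only.
print(meets_recommendation([30, 30, 30, 30, 30], []))  # five 30-min moderate bouts
print(meets_recommendation([30, 30], [15, 15, 15]))    # mixed week
print(meets_recommendation([5] * 40, []))              # many bouts under 10 min
```

A questionnaire that records only a weekly activity category, without duration and frequency, would not supply the inputs this check needs, which is the content validity point made above.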
In recent studies, the IPAQ seems to be used most often and it is by far the most widely validated questionnaire at present.[14,91-95,97,107] Reliability of the IPAQ was not shown consistently within or between studies, although the short version for the past 7 days (S7S) and the long version for a usual week (LUS) seemed to perform best. We therefore recommend additional reliability studies of the IPAQ.
Validity of the IPAQ seems questionable. First, content validity of the short forms seems limited because they do not discriminate between different settings. The long form, which does discriminate between five settings, therefore has better content validity, but it was reported to be ‘‘too boring and repetitive’’ and too long for routine surveillance.[14] The construct validity of both the short and the long forms varied widely, but was mostly below our criteria. Of the self-administered IPAQ forms, a correlation with an accelerometer was found only for the L7S (0.52 in Finland[14] and 0.55 in Sweden[92]) and for the S7S in the US, in men only.[95] Discrimination of the IPAQ between groups of people with different activity levels as measured with DLW[94] was questionable, although differentiation between groups with different fitness levels was adequate.[91] Therefore, we feel that additional well designed studies on the measurement properties of the IPAQ, with specific attention to responsiveness, are required.
3.3 Recommendations for Further Research
For future studies, we recommend choosing from the abovementioned 23 questionnaires that we identified as having sufficient content validity, and validating those further for reliability, construct validity and especially responsiveness. The results of this review indicate that one study on the validity and reliability of a questionnaire is not enough. A number of other questionnaires were validated in more than one study, and without exception the results were conflicting: the questionnaires showed sufficient validity in one study and not in another. Also, in the large international study on the validity and reliability of the IPAQ, huge differences were found between countries. This indicates that it is important for
researchers to assess the measurement properties of a questionnaire in their own language and in their own target population. As the majority of the studies on the measurement properties of PA questionnaires have been conducted in the US, it remains to be seen whether the results can be generalized to other countries. We therefore strongly recommend that researchers assess the measurement properties of a questionnaire carefully in their own target group.
Although PA questionnaires are frequently used to evaluate the effects of interventions, surprisingly little attention has been paid to the responsiveness of these questionnaires. A prerequisite for detecting differences in PA after an intervention is that the questionnaire is responsive to change. The two studies assessing responsiveness did not show positive results in that regard.
Finally, more attention should be paid to the reporting of studies assessing the measurement properties of PA questionnaires, since, for instance, it was often unclear which questionnaire was used and for what purpose the questionnaire was intended. The QAPAQ might be a useful tool when reporting on measurement properties.
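One way to see why a single small study says little about whether a questionnaire reaches a cut-off is to look at the confidence interval around a correlation. The sketch below uses the standard Fisher z approximation; it is an illustration added here under that assumption, not an analysis performed in this review, and the values of r and n are hypothetical.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r (Fisher z method)."""
    z = math.atanh(r)
    standard_error = 1.0 / math.sqrt(n - 3)
    lower = math.tanh(z - z_crit * standard_error)
    upper = math.tanh(z + z_crit * standard_error)
    return lower, upper


# Hypothetical example: the same observed correlation in a small and a large study.
low_small, high_small = correlation_ci(0.50, 50)
low_large, high_large = correlation_ci(0.50, 500)

print(f"n = 50:  r = 0.50, 95% CI {low_small:.2f} to {high_small:.2f}")
print(f"n = 500: r = 0.50, 95% CI {low_large:.2f} to {high_large:.2f}")
```

With 50 participants, an observed correlation of 0.50 is compatible with true values well below and well above a 0.50 cut-off, so two small studies of the same questionnaire can easily land on opposite sides of it.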
4. Conclusions
Based on our review of the literature concerning measurement properties of questionnaires measuring PA, no conclusion can be drawn regarding the best questionnaire at the moment. Researchers should determine which questionnaire would fit their purposes best regarding the content of the questionnaire. Questionnaires with good content validity need to be validated in well designed studies and in different countries. Data on the responsiveness of PA questionnaires are urgently needed for the use of questionnaires in intervention studies.
Acknowledgements
No sources of funding were used to assist in the preparation of this review. The authors have no conflicts of interest that are directly relevant to the content of this review.
References
1. Powell KE, Thompson PD, Caspersen CJ, et al. Physical activity and the incidence of coronary heart disease. Annu Rev Public Health 1987; 8: 253-87
2. Caspersen CJ, Powell KE, Christenson GM. Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research. Public Health Rep 1985; 100 (2): 126-31
3. Jacobs Jr DR, Ainsworth BE, Hartman TJ, et al. A simultaneous evaluation of 10 commonly used physical activity questionnaires. Med Sci Sports Exerc 1993; 25 (1): 81-91
4. Richardson MT, Ainsworth BE, Wu HC, et al. Ability of the Atherosclerosis Risk in Communities (ARIC)/Baecke Questionnaire to assess leisure-time physical activity. Int J Epidemiol 1995; 24 (4): 685-93
5. Jacobs J, Hahn LP, Haskell WL, et al. Validity and reliability of short physical activity history: Cardia and the Minnesota Heart Health Program. J Cardiopulm Rehabil 1989; 9 (11): 448-59
6. Resnicow K, McCarty F, Blissett D, et al. Validity of a modified CHAMPS physical activity questionnaire among African-Americans. Med Sci Sports Exerc 2003; 35 (9): 1537-45
7. Altschuler A, Picchi T, Nelson M, et al. Physical activity questionnaire comprehension: lessons from cognitive interviews. Med Sci Sports Exerc 2009; 41 (2): 336-43
8. Mokkink LB, Terwee CB, Knol DL, et al. Protocol of the COSMIN study: COnsensus-based Standards for the selection of health Measurement INstruments. BMC Med Res Methodol 2006 Jan 24; 6: 2
9. Wareham NJ, Jakes RW, Rennie KL, et al. Validity and repeatability of the EPIC-Norfolk Physical Activity Questionnaire. Int J Epidemiol 2002; 31 (1): 168-74
10. Pols MA, Peeters PH, Ocke MC, et al. Relative validity and repeatability of a new questionnaire on physical activity. Prev Med 1997; 26 (1): 37-43
11. Chasan-Taber L, Erickson JB, McBride JW, et al. Reproducibility of a self-administered lifetime physical activity questionnaire among female college alumnae. Am J Epidemiol 2002; 155 (3): 282-9
12. Kurtze N, Rangul V, Hustvedt BE, et al. Reliability and validity of self-reported physical activity in the Nord-Trondelag Health Study: HUNT 1. Scand J Public Health 2008; 36 (1): 52-61
13. Kurtze N, Rangul V, Hustvedt BE, et al. Reliability and validity of self-reported physical activity in the Nord-Trondelag Health Study (HUNT 2). Eur J Epidemiol 2007; 22 (6): 379-87
14. Craig CL, Marshall AL, Sjostrom M, et al. International physical activity questionnaire: 12-country reliability and validity. Med Sci Sports Exerc 2003; 35 (8): 1381-95
15. Iwai N, Hisamichi S, Hayakawa N, et al. Validity and reliability of single-item questions about physical activity. J Epidemiol 2001; 11 (5): 211-8
16. Walsh MC, Hunter GR, Sirikul B, et al. Comparison of self-reported with objectively assessed energy expenditure in black and white women before and after weight loss. Am J Clin Nutr 2004; 79 (6): 1013-9
17. Roeykens J, Rogers R, Meeusen R, et al. Validity and reliability in a Flemish population of the WHO-MONICA Optional Study of Physical Activity Questionnaire. Med Sci Sports Exerc 1998; 30 (7): 1071-5
18. Miller DJ, Freedson PS, Kline GM. Comparison of activity levels using the Caltrac accelerometer and five questionnaires. Med Sci Sports Exerc 1994; 26 (3): 376-82
19. Wolf AM, Hunter DJ, Colditz GA, et al. Reproducibility and validity of a self-administered physical activity questionnaire. Int J Epidemiol 1994; 23 (5): 991-9
20. Giles-Corti B, Timperio A, Cutt H, et al. Development of a reliable measure of walking within and outside the local neighborhood: RESIDE’s Neighborhood Physical Activity Questionnaire. Prev Med 2006; 42 (6): 455-9
21. Maddison R, Ni MC, Jiang Y, et al. International Physical Activity Questionnaire (IPAQ) and New Zealand Physical Activity Questionnaire (NZPAQ): a doubly labelled water validation. Int J Behav Nutr Phys Act 2007 Dec 3; 4: 62
22. Bernstein M, Sloutskis D, Kumanyika S, et al. Data-based approach for developing a physical activity frequency questionnaire. Am J Epidemiol 1998; 147 (2): 147-54
23. Copeland JL, Kowalski KC, Donen RM, et al. Convergent validity of the Physical Activity Questionnaire for Adults: the new member of the PAQ family. J Phys Act Health 2005; 2 (2): 216
24. Aadahl M, Jorgensen T. Validation of a new self-report instrument for measuring physical activity. Med Sci Sports Exerc 2003; 35 (7): 1196-202
25. Aadahl M, Kjaer M, Kristensen JH, et al. Self-reported physical activity compared with maximal oxygen uptake in adults. Eur J Cardiovasc Prev Rehabil 2007; 14 (3): 422-8
26. Friedenreich CM, Courneya KS, Neilson HK, et al. Reliability and validity of the Past Year Total Physical Activity Questionnaire. Am J Epidemiol 2006; 163 (10): 959-70
27. Ferrari P, Friedenreich C, Matthews CE. The role of measurement error in estimating levels of physical activity. Am J Epidemiol 2007; 166 (7): 832-40
28. Berthouze SE, Minaire PM, Chatard JC, et al. A new tool for evaluating energy expenditure: the “QAPSE” development and validation. Med Sci Sports Exerc 1993; 25 (12): 1405-14
29. Terwee CB, Mokkink LB, van Poppel MNM, et al. Qualitative attributes and measurement properties of physical activity questionnaires: the QAPAQ checklist. Sports Med 2010; 40 (7): 525-37
30. Bowles HR, FitzGerald SJ, Morrow Jr JR, et al. Construct validity of self-reported historical physical activity. Am J Epidemiol 2004; 160 (3): 279-86
31. Ainsworth BE, Jacobs Jr DR, Leon AS, et al. Assessment of the accuracy of physical activity questionnaire occupational data. J Occup Med 1993; 35 (10): 1017-27
32. Wendel-Vos GC, Schuit AJ, Saris WH, et al. Reproducibility and relative validity of the short questionnaire to assess health-enhancing physical activity. J Clin Epidemiol 2003; 56 (12): 1163-9
33. Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007; 60 (1): 34-42
34. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. New York: Oxford University Press, 2003
35. de Vet HCW. Observer reliability and agreement. In: Armitage P, Colton T, editors. Encyclopedia of biostatistics. Boston (MA): John Wiley & Sons Ltd, 1998: 3123-8
36. Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis 1986; 39: 897-906
37. Brown WJ, Burton NW, Marshall AL, et al. Reliability and validity of a modified self-administered version of the Active Australia physical activity survey in a sample of mid-age women. Aust NZ J Public Health 2008; 32 (6): 535-41
38. Dishman RK, Steinhardt M. Reliability and concurrent validity for a 7-d recall of physical activity in college students. Med Sci Sports Exerc 1988; 20 (1): 14-25
39. Aires N, Selmer R, Thelle D. The validity of self-reported leisure time physical activity, and its relationship to serum cholesterol, blood pressure and body mass index: a population based study of 332,182 men and women aged 40-42 years. Eur J Epidemiol 2003; 18 (6): 479-85
40. Staten LK, Taren DL, Howell WH, et al. Validation of the Arizona Activity Frequency Questionnaire using doubly labeled water. Med Sci Sports Exerc 2001; 33 (11): 1959-67
41. Baecke JA, Burema J, Frijters JE. A short questionnaire for the measurement of habitual physical activity in epidemiological studies. Am J Clin Nutr 1982; 36 (5): 936-42
42. Albanes D, Conway JM, Taylor PR, et al. Validation and comparison of eight physical activity questionnaires. Epidemiology 1990; 1 (1): 65-71
43. Pols MA, Peeters PH, Bueno-de-Mesquita HB, et al. Validity and repeatability of a modified Baecke questionnaire on physical activity. Int J Epidemiol 1995; 24 (2): 381-8
44. Canon F, Levol B, Duforez F. Assessment of physical activity in daily life. J Cardiovasc Pharmacol 1995; 25 Suppl. 1: S28-34
45. Bharathi AV, Sandhya N, Vaz M. The development & characteristics of a physical activity questionnaire for epidemiological studies in urban middle class Indians. Indian J Med Res 2000; 111: 95-102
46. Carter-Nolan PL, Adams-Campbell LL, Makambi K, et al. Validation of physical activity instruments: Black Women’s Health Study. Ethn Dis 2006; 16 (4): 943-7
47. Wareham NJ, Jakes RW, Rennie KL, et al. Validity and repeatability of a simple index derived from the short physical activity questionnaire used in the European Prospective Investigation into Cancer and Nutrition (EPIC) study. Public Health Nutr 2003; 6 (4): 407-13
48. Matton L, Wijndaele K, Duvigneaud N, et al. Reliability and validity of the Flemish Physical Activity Computerized Questionnaire in adults. Res Q Exerc Sport 2007; 78 (4): 293-306
49. Gionet NJ, Godin G. Self-reported exercise behavior of employees: a validity study. J Occup Med 1989; 31 (12): 969-73
50. Godin G, Shephard RJ. A simple method to assess exercise behavior in the community. Can J Appl Sport Sci 1985; 10 (3): 141-6
51. Ainsworth BE, Leon AS, Richardson MT, et al. Accuracy of the College Alumnus Physical Activity Questionnaire. J Clin Epidemiol 1993; 46 (12): 1403-11
52. Strath SJ, Bassett Jr DR, Swartz AM. Comparison of the college alumnus questionnaire physical activity index with objective monitoring. Ann Epidemiol 2004; 14 (6): 409-15
53. Siconolfi SF, Lasater TM, Snow RC, et al. Self-reported physical activity compared with maximal oxygen uptake. Am J Epidemiol 1985; 122 (1): 101-5
54. Graff-Iversen S, Anderssen SA, Holme IM, et al. An adapted version of the long International Physical Activity Questionnaire (IPAQ-L): construct validity in a low-income, multiethnic population study from Oslo, Norway. Int J Behav Nutr Phys Act 2007 April 20; 4: 13
55. Graff-Iversen S, Anderssen SA, Holme IM, et al. Two short questionnaires on leisure-time physical activity compared with serum lipids, anthropometric measurements and aerobic power in a suburban population from Oslo, Norway. Eur J Epidemiol 2008; 23 (3): 167-74
56. Ainsworth BE, Sternfeld B, Richardson MT, et al. Evaluation of the Kaiser physical activity survey in women. Med Sci Sports Exerc 2000; 32 (7): 1327-38
57. Salonen JT, Lakka T. Assessment of physical activity in population studies: validity and consistency of the methods in the Kuopio ischemic heart disease risk factor study. Scand J Sports Sci 1987; 9 (3): 89-95
58. Hopkins WG, Wilson NC, Russell DG. Validation of the physical activity instrument for the Life in New Zealand national survey. Am J Epidemiol 1991; 133 (1): 73-82
59. Ainsworth BE, Jacobs Jr DR, Leon AS. Validity and reliability of self-reported physical activity status: the Lipid Research Clinics questionnaire. Med Sci Sports Exerc 1993; 25 (1): 92-8
60. Lof M, Hannestad U, Forsum E. Assessing physical activity of women of childbearing age: ongoing work to develop and evaluate simple methods. Food Nutr Bull 2002; 23 (3 Suppl.): 30-3
61. Parker DL, Leaf DA, McAfee SR. Validation of a new questionnaire for the assessment of leisure time physical activity. Ann Sports Med 1988; 4 (2): 72-81
62. Kohl HW, Blair SN, Paffenbarger Jr RS, et al. A mail survey of physical activity habits as related to measured physical fitness. Am J Epidemiol 1988; 127 (6): 1228-39
63. Taylor HL, Jacobs Jr DR, Schucker B, et al. A questionnaire for the assessment of leisure time physical activities. J Chronic Dis 1978; 31 (12): 741-55
64. Weller IM, Corey PN. A study of the reliability of the Canada Fitness Survey questionnaire. Med Sci Sports Exerc 1998; 30 (10): 1530-6
65. Blair SN, Dowda M, Pate RR, et al. Reliability of long-term recall of participation in physical activity by middle-aged men and women. Am J Epidemiol 1991; 133 (3): 266-75
66. Wilbur J, Holm K, Dan A. A quantitative survey to measure energy expenditure in midlife women. J Nurs Meas 1993; 1 (1): 29-40
67. Conway JM, Irwin ML, Ainsworth BE. Estimating energy expenditure from the Minnesota Leisure Time Physical Activity and Tecumseh Occupational Activity questionnaires: a doubly labeled water validation. J Clin Epidemiol 2002; 55 (4): 392-9
68. Mundal R, Erikssen J, Rodahl K. Assessment of physical activity by questionnaire and personal interview with particular reference to fitness and coronary mortality. Eur J Appl Physiol Occup Physiol 1987; 56 (3): 245-52
69. Chasan-Taber S, Rimm EB, Stampfer MJ, et al. Reproducibility and validity of a self-administered physical activity questionnaire for male health professionals. Epidemiology 1996; 7 (1): 81-6
70. Norman A, Bellocco R, Bergstrom A, et al. Validity and reproducibility of self-reported total physical activity: differences by relative weight. Int J Obes Relat Metab Disord 2001; 25 (5): 682-8
71. Timperio A, Salmon J, Crawford D. Validity and reliability of a physical activity recall instrument among overweight and non-overweight men and women. J Sci Med Sport 2003; 6 (4): 477-91
72. Sidney S, Jacobs Jr DR, Haskell WL, et al. Comparison of two methods of assessing physical activity in the Coronary Artery Risk Development in Young Adults (CARDIA) Study. Am J Epidemiol 1991; 133 (12): 1231-45
73. Lowther M, Mutrie N, Loughlan C, et al. Development of a Scottish physical activity questionnaire: a tool for use in physical activity interventions. Br J Sports Med 1999; 33 (4): 244-9
74. Marsden J, Jones RB. Validation of Web-based questionnaires regarding osteoporosis prevention in young British women. Health Bull (Edinb) 2001; 59 (4): 254-62
75. Bulley C, Donaghy M, Payne A, et al. Validation and modification of the Scottish Physical Activity Questionnaire for use in a female student population. Int J Health Promot Edu 2005; 43 (4): 117-24
76. Wilbur J, Miller A, Dan AJ, et al. Measuring physical activity in midlife women. Public Health Nurs 1989; 6 (3): 120-8
77. Singh PN, Tonstad S, Abbey DE, et al. Validity of selected physical activity questions in white Seventh-day Adventists and non-Adventists. Med Sci Sports Exerc 1996; 28 (8): 1026-37
78. Singh PN, Fraser GE, Knutsen SF, et al. Validity of a physical activity questionnaire among African-American Seventh-day Adventists. Med Sci Sports Exerc 2001; 33 (3): 468-75
79. Schechtman KB, Barzilai B, Rost K, et al. Measuring physical activity with a single question. Am J Public Health 1991; 81 (6): 771-3
80. Arroll B, Jackson R, Beaglehole R. Validation of a three-month physical activity recall questionnaire with a seven-day food intake and physical activity diary. Epidemiology 1991; 2 (4): 296-9
81. Suzuki I, Kawakami N, Shimizu H. Reliability and validity of a questionnaire for assessment of energy expenditure and physical activity in epidemiological studies. J Epidemiol 1998; 8 (3): 152-9
82. Lagerros YT, Mucci LA, Bellocco R, et al. Validity and reliability of self-reported total energy expenditure using a novel instrument. Eur J Epidemiol 2006; 21 (3): 227-36
83. Li S, Carlson E, Holm K. Validation of a single-item measure of usual physical activity. Percept Mot Skills 2000; 91 (2): 593-602
84. Tsubono Y, Tsuji I, Fujita K, et al. Validation of walking questionnaire for population-based prospective studies in Japan: comparison with pedometer. J Epidemiol 2002; 12 (4): 305-9
85. Bassett Jr DR, Cureton AL, Ainsworth BE. Measurement of daily walking distance-questionnaire versus pedometer. Med Sci Sports Exerc 2000; 32 (5): 1018-23
86. Dolan SH, Williams DP, Ainsworth BE, et al. Development and reproducibility of the bone loading history questionnaire. Med Sci Sports Exerc 2006; 38 (6): 1121-31
87. Eagan MS, Lyle RM, George PM, et al. A new self-reported comprehensive historical activity questionnaire for young women. J Phys Act Health 2005; 2 (1): 35
88. Suleiman S, Nelson M. Validation in London of a physical activity questionnaire for use in a study of postmenopausal osteopaenia. J Epidemiol Community Health 1997; 51 (4): 365-72
89. Ainsworth BE, Richardson MT, Jacobs Jr DR, et al. Accuracy of recall of occupational physical activity by questionnaire. J Clin Epidemiol 1999; 52 (3): 219-27
90. Rundle A, Hagins M, Orjuela M, et al. Traditional physical activity indexes derived from the Harvard Alumni Activity Survey have low construct validity in a lower income, urban population. Urban Health 2008; 84 (5): 722-32
91. Fogelholm M, Malmberg J, Suni J, et al. International Physical Activity Questionnaire: validity against fitness. Med Sci Sports Exerc 2006; 38 (4): 753-60
92. Hagstromer M, Oja P, Sjostrom M. The International Physical Activity Questionnaire (IPAQ): a study of concurrent and construct validity. Public Health Nutr 2006; 9 (6): 755-62
93. Dinger MK, Behrens TK, Han JL. Validity and reliability of the International Physical Activity Questionnaire in college students. Am J Health Edu 2006; 37 (6): 337-43
94. Ishikawa-Takata K, Tabata I, Sasaki S, et al. Physical activity level in healthy free-living Japanese estimated by doubly labelled water method and International Physical Activity Questionnaire. Eur J Clin Nutr 2008 Jul; 62 (7): 885-91
95. Wolin KY, Heil DP, Askew S, et al. Validation of the international physical activity questionnaire-short among blacks. J Phys Act Health 2008; 5 (5): 746-60
96. MacFarlane DJ, Lee CCY, Ho EYK, et al. Reliability and validity of the Chinese version of IPAQ (short, last 7 days). J Sci Med Sport 2007; 10 (1): 45-51
97. Vandelanotte C, de Bourdeaudhuij I, Philippaerts R, et al. Reliability and validity of a computerized and Dutch version of the International Physical Activity Questionnaire (IPAQ). J Phys Act Health 2005; 2 (1): 63
98. Rosenberg DE, Bull FC, Marshall AL, et al. Assessment of sedentary behavior with the International Physical Activity Questionnaire. J Phys Act Health 2008; 5 Suppl. 1: S30-44
99. Slattery ML, Jacobs Jr DR. The inter-relationships of physical activity, physical fitness, and body measurements. Med Sci Sports Exerc 1987; 19 (6): 564-9
100. Leicht A. Validation of a one-day self-report questionnaire for physical activity assessment in healthy adults. Eur J Sport Sci 2008; 8 (6): 389-97
101. Kwak L, Kremers SPJ, van Baak MA, et al. Measuring physical activity in field studies: comparison of a questionnaire, 24-hour recall and an accelerometer. Eur J Sport Sci 2007; 7 (4): 193-201
102. Chasan-Taber L, Erickson JB, Nasca PC, et al. Validity and reproducibility of a physical activity questionnaire in women. Med Sci Sports Exerc 2002; 34 (6): 987-92
103. Hopkins WG, Wilson NC, Worsley FA, et al. Reliability of the core questionnaire in the life in New Zealand Survey. NZ J Health Phys Edu Rec 1991; 24 (3): 21-2
104. Lof M, Hannestad U, Forsum E. Comparison of commonly used procedures, including the doubly-labelled water technique, in the estimation of total energy expenditure of women with special reference to the significance of body fatness. Br J Nutr 2003; 90 (5): 961-8
105. Conway JM, Seale JL, Jacobs Jr DR, et al. Comparison of energy expenditure estimates from doubly labeled water, a physical activity questionnaire, and physical activity records. Am J Clin Nutr 2002; 75 (3): 519-25
106. Terwee CB, Dekker FW, Wiersinga WM, et al. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res 2003; 12 (4): 349-62
107. Kurtze N, Rangul V, Hustvedt BE. Reliability and validity of the international physical activity questionnaire in the Nord-Trondelag health study (HUNT) population of men. BMC Med Res Methodol 2008 Oct 9; 8: 63
Correspondence: Dr Mireille N.M. van Poppel, Department of Public and Occupational Health, EMGO Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, the Netherlands. E-mail:
[email protected]
Sports Med 2010; 40 (7): 601-623 0112-1642/10/0007-0601/$49.95/0
RESEARCH REVIEW
ª 2010 Adis Data Information BV. All rights reserved.
Self-Administered Physical Activity Questionnaires for the Elderly
A Systematic Review of Measurement Properties
Lisa Forsén,1 Nina Waaler Loland,2 Anne Vuillemin,3 Mai J.M. Chinapaw,4 Mireille N.M. van Poppel,4 Lidwine B. Mokkink,5 Willem van Mechelen4 and Caroline B. Terwee5
1 Norwegian Institute of Public Health, Division of Epidemiology, Oslo, Norway
2 Oslo University College, Faculty of Health Science, Oslo, Norway
3 Nancy-Université, Université Paul Verlaine Metz, Université Paris Descartes, EA 4360 Apemac, Nancy, France
4 Department of Public and Occupational Health and the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
5 Department of Epidemiology and Biostatistics and the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
Abstract
Objective: To systematically review and appraise studies examining self-administered physical activity questionnaires (PAQ) for the elderly. This article is one of a group of four articles in Sports Medicine on the content and measurement properties of PAQs.
Literature Search Methodology: Searches in PubMed, EMBASE and SportDiscus (until May 2009) on self-administered PAQ. Inclusion criteria were as follows: (i) the study examined (at least one of) the measurement properties of a self-administered PAQ; (ii) the questionnaire aimed to measure physical activity (PA) in older people; (iii) the average age of the study population was >55 years; (iv) the article was written in English. We excluded PA interviews, diaries and studies that evaluated the measurement properties of a self-administered PAQ in a specific population, such as patients. We used a standard checklist (qualitative attributes and measurement properties of PA questionnaires [QAPAQ]) for appraising the measurement properties of PAQs.
Findings: Eighteen articles on 13 PAQs were reviewed, including 16 reliability analyses and 25 validity analyses (of which 15 were on construct validity, seven on health/functioning associations, two on known-groups validity and one on responsiveness). Many studies suffered from methodological flaws, e.g. too small a sample size or an inadequate time interval between test and retest. Three PAQs received a positive rating on reliability: IPAQ-C (International Physical Activity Questionnaire–Chinese), intraclass correlation coefficient (ICC) ≥ 0.81; WHI-PAQ (Women’s Health Initiative–PAQ), ICC = 0.76; and PASE (Physical Activity Scale for the Elderly), Pearson correlation coefficient (r) = 0.84. However, PASE was negatively rated on reliability in another study (ICC = 0.65). One PAQ received a positive rating on construct validity: PASE against Mini-Logger (r > 0.52), but PASE was negatively rated in another study against accelerometer and another PAQ (Spearman correlation coefficient = 0.17 and 0.48, respectively). Three of the 13 PAQs were tested for health/functioning associations and all three were positively rated in some categories of PA in many studies (r > 0.30).
Conclusions: Even though several studies showed an association between the tested PAQ and health/functioning variables, knowledge about the reliability and construct validity of self-administered PAQs for older adults is still scarce and more high-quality validation studies are needed.
1. Background
This article is one of a group of four articles in Sports Medicine on the content and measurement properties of physical activity questionnaires (PAQs).[1-3] Due to the aging of the world’s population, a major challenge for professionals, politicians and society is to maintain a high quality of life among older people. One of the strongest determinants of high quality of life is the maintenance of good health. To preserve or reach good health, older adults should maintain or adopt a physically active lifestyle as recommended in international guidelines.[4] It is generally agreed that physical activity (PA) is beneficial for health in old age.[5-7] Research in this area is of growing importance. To be able to give evidence-based, safe public health recommendations, it is necessary to study the benefits and contraindications of PA in older people. In this context, PA measurement is essential, and a challenge is to identify measurement tools that provide valid and reliable estimates of PA in this population.
PA has been defined as “any bodily movement produced by skeletal muscles that results in energy expenditure”[8] and it can be assessed by several methods, such as energy expenditure (EE) measures, motion sensors, heart-rate monitoring, activity diaries and questionnaires. PA instruments for old age should be sufficiently detailed and include light activities that are common among older people. Additionally, the instruments should have adequate measurement properties.[2] If the measurement properties are poor, the risk of misclassification is high. Accurate measurement of PA in old age, with acceptable reliability, validity and responsiveness to change, is important when the aims are as follows:
• Classify the aging population into categories of PA according to their participation in various activities of daily living and leisure activities.
• Monitor changes in PA in the aged population.
• Evaluate PA interventions among older people.
• Identify relations between PA in old age and health outcomes.
• Quantify dose-response relationships between PA in old age and health outcomes.
Evidence-based, safe public health recommendations for older people must be formulated from the results obtained in pursuit of these aims.
The best choice of method for measuring PA depends on various criteria, but in epidemiological studies and large-scale trials, questionnaires are the most commonly used instrument. PA is frequently included as a co-variate in epidemiological studies concerning older populations, with morbidity, mortality or PA as the outcome. In addition, PA is used in interventions either as a co-variate or as the intervening variable to improve PA or health in old age. To date, there has been no consensus regarding which PAQ to use in the different situations. Knowledge about the measurement properties of the chosen PAQ is often scarce. The chosen PAQ is often developed for younger adults and is therefore not necessarily suitable for older subjects.
A systematic literature review by Jørstad-Stein et al.[8] attached to ProFaNE (The Prevention of Falls Network Europe[9]) was undertaken for the time period of 1966 to July 2003. They concluded that no single questionnaire stood out as a satisfactory PA measure for use with older adults in randomized controlled trials of fall-injury prevention and similar interventions at that time, stating “Further research is required to evaluate and compare the measurement properties of new and established instruments.”[8]
The aim of our study was to update the work of Jørstad-Stein et al.,[8] but restrict it to self-administered questionnaires, and thus to undertake a new systematic literature review to identify and appraise self-administered PAQs used in large-scale studies concerning older people. We wanted to restrict this updated review to self-administered PAQs because such PAQs were – and still are – used in large-scale studies, and we suspected that knowledge about their measurement properties was scarce. This updated review is attached to EUNAAPA (European Network for Action on Ageing and Physical Activity[10]).
2. Methods
2.1 Literature Search
Literature searches were performed in PubMed, EMBASE.com using ‘EMBASE only’, and in SportDiscus (entire databases until 11 May 2009) on the topic of self-administered PAQs. The full search strategy in PubMed was as follows: ‘exercise’[MeSH] OR ‘physical activity’[tiab] OR ‘motor activity’[MeSH] AND ‘questionnaire’[MeSH] OR ‘questionnaire*’[tiab]. Limits: ‘humans’. In EMBASE and SportDiscus, ‘physical activity’ and ‘questionnaire’ were used as free-text words, and in EMBASE this was complemented with the EMTREE term ‘exercise’.
2.2 Eligibility Criteria
We used the following inclusion criteria: (i) the study examined (at least one of) the measurement properties of a self-administered PAQ; (ii) the PAQ aimed to measure PA in older people; (iii) the average age of the study population was >55 years; (iv) the article was written in English. We excluded PA interviews or diaries and studies that evaluated the measurement properties of a self-administered PAQ in a specific population, such as patients. We also excluded studies that evaluated the measurement properties of a self-report PAQ administered in an interview form.
2.3 Selection of Papers and Data Extraction
Two independent reviewers performed abstract selection. Full-text articles of all abstracts that fulfilled the inclusion criteria were retrieved. We extracted data from the included articles, using a standardized data extraction form.
2.4 Quality Assessment of the Studies and Measurement Properties
We rated the methods and results of all evaluated measurement properties using a standard checklist for appraising the ‘Qualitative Attributes and measurement properties of PAQs’: the QAPAQ checklist.[2] Disagreements were discussed and resolved. Generally, reliability, validity and responsiveness (see definitions in the following sections) depend on the setting and the population in which they are assessed. Therefore, in addition to an acceptable size of the coefficient, a clear description of the design of each individual primary study – including characteristics of the study population (diagnosis and clinical features), measurements and testing conditions and data analysis – was required to receive a positive rating (i.e. a ‘plus’ sign in the tables). Furthermore, if any methodological weakness in the design or execution of the primary study was found, the evaluated measurement property was rated as indeterminate (i.e. a ‘question mark’ in the tables).
2.4.1 Reliability
The intraclass correlation coefficient (ICC) for continuous data, and Kappa for dichotomous or ordinal data, were considered adequate measures of reliability.[2] In our rating, we assumed that the correct types of ICC and Kappa were used, namely two-way ANOVA with random effects and absolute agreement for ICC and quadratic weights for Kappa. An ICC or Kappa >0.70 was considered acceptable.[2] Initially, the use of Pearson or Spearman correlation coefficients was considered inadequate, because they neglect systematic errors.[11,12] However, several of the studies included in this article calculated Pearson or Spearman correlation coefficients. We considered it too conservative to rate all these studies as indeterminate. Pearson or Spearman correlations >0.80 would likely result in ICCs >0.70, if the mean difference between test and retest was small. We therefore decided to rate studies with a Pearson or Spearman correlation >0.80 as positive.
The time interval between the test and retest should be described and should be short enough to ensure that subjects had not changed their PA levels, but long enough to prevent recall. An adequate time interval was defined as >1 day, but <3 months for questionnaires recalling a usual week; >1 day, but <2 weeks for questionnaires recalling the previous week; and >1 day, but <1 year for questionnaires recalling the previous year. We gave the reliability test the highest level of evidence (level 1) when the best reliability test was used, namely ICC or Kappa, with an adequate time interval; we gave level 2 for other intervals. If a Pearson or Spearman correlation was used instead of ICC or Kappa, we gave levels 2 and 3 for adequate and inadequate intervals, respectively.
A positive rating (+) was given if the reliability study satisfied the following criteria: the study population had at least 50 participants and the ICC, Kappa, Pearson or Spearman correlation was above the specified cut-off point. A negative rating (-) was given if the reliability coefficient was below the specified cut-off point and the study population had at least 50 participants.
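The reliability criteria of this subsection can be illustrated with a minimal sketch. It is not the code used in the review: the function names are hypothetical, the single-measurement form of the ICC is an assumption (the text specifies only a two-way random-effects model with absolute agreement), and a coefficient exactly equal to the cut-off is treated as negative, which the text does not specify. The toy data show why a Pearson correlation can neglect a systematic error that the ICC penalizes.

```python
from math import sqrt


def icc_absolute_agreement(test, retest):
    """ICC from a two-way ANOVA with random effects, absolute agreement.

    Assumption: single-measurement form, one test and one retest per subject.
    """
    n, k = len(test), 2                      # n subjects, k = 2 occasions
    rows = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(test) / n, sum(retest) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def pearson(x, y):
    """Pearson correlation coefficient; insensitive to a constant shift."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rate_reliability(coefficient, kind, n, design_flaw=False):
    """Return '+', '-' or '?' following the criteria of this subsection.

    kind is 'icc', 'kappa', 'pearson' or 'spearman' (hypothetical labels).
    """
    if n < 50 or design_flaw:
        return "?"                           # indeterminate
    cutoff = 0.70 if kind in ("icc", "kappa") else 0.80
    # Assumption: a value equal to the cut-off is not "above" it.
    return "+" if coefficient > cutoff else "-"


def evidence_level(kind, adequate_interval):
    """Level of evidence (1 = highest) for a reliability analysis."""
    if kind in ("icc", "kappa"):
        return 1 if adequate_interval else 2
    return 2 if adequate_interval else 3


# Toy data: every subject reports 10 units more at retest (systematic error).
test_scores = [10, 20, 30, 40, 50]
retest_scores = [x + 10 for x in test_scores]
r = pearson(test_scores, retest_scores)                   # perfect correlation
icc = icc_absolute_agreement(test_scores, retest_scores)  # about 0.83
print(r, icc)
```

In this toy example the Pearson correlation is 1 although no subject reproduces the first score, whereas the absolute-agreement ICC drops to about 0.83 because the between-occasion variance enters its denominator.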
If the sample size was <50 participants or if there were any flaws in the design or analysis, the degree of reliability was indeterminate and the rating received a question mark (?).
2.4.2 Validity
Doubly labelled water (DLW) is considered the gold standard in the measurement of total daily EE (DEE)[13] and therefore DLW may be the criterion against which other measures of PA should be validated, thereby giving the PA measure its criterion validity. However, the degree to which DLW is a perfect gold standard for PA is questionable because DLW-assessed DEE is affected not only by PA, but also by the basal metabolic rate and the thermal effect of food processing in the body. Although the latter two are usually calculated and subtracted from the total DEE, comparisons between established PA measures, such as diary or accelerometer, and DLW-assessed DEE can show low correlations[14] and should not be called criterion validity. Instead, we considered the comparisons with DLW as construct validity. PA may include other aspects in addition to EE: EE represents only the energy cost of PA, whereas PA is a behaviour that can also be described by other parameters.[15]
Generally, the more similar the constructs being compared, the more evidence is provided for validity. We gave the highest level of evidence (level 1) for comparisons with DLW, level 2 for comparisons with accelerometer, pedometer and heart rate monitor, and level 3 for comparisons with other questionnaires, detailed interview and diary. All these validity instruments tended to measure the same construct. In this review we called the corresponding association coefficient construct validity. When comparing a PA instrument with a measure not assessing the same construct at all, but a health/functioning variable or maximum oxygen uptake/consumption (V̇O2max) where an association was expected, we called the 'validity' coefficient a health/functioning association, and we gave evidence level 3.
According to QAPAQ,[2] the rating received a plus score (+) if the correlation was above the specified cut-off point (0.70 for DLW and pedometer; 0.60 for V̇O2max; 0.50 for accelerometer, Mini-Logger, diary and other questionnaires; and 0.30 for physical functioning and health variables) and a negative score (-) if the correlation was below the cut-off point. But if the sample size was <50 or if there were any flaws in the design or analysis, the rating received a question mark (?).
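The validity criteria above amount to a lookup of evidence level and cut-off per comparison measure, followed by a sample-size check. The sketch below is only an illustration of that logic: the dictionary keys and function name are hypothetical, a correlation exactly at the cut-off is treated as negative (an assumption, since the text does not cover that case), and comparison measures for which the text does not state both a level and a cut-off (heart rate monitor, detailed interview, Mini-Logger) are left out.

```python
# Comparison measure -> (level of evidence, correlation cut-off), as given in the text.
CRITERIA = {
    "dlw": (1, 0.70),
    "pedometer": (2, 0.70),
    "accelerometer": (2, 0.50),
    "diary": (3, 0.50),
    "other_questionnaire": (3, 0.50),
    "vo2max": (3, 0.60),
    "health_functioning": (3, 0.30),
}


def rate_validity(correlation, comparison, n, flawed=False):
    """Return a rating such as '2-' or '3+' for one validity coefficient."""
    level, cutoff = CRITERIA[comparison]
    if n < 50 or flawed:
        sign = "?"  # too small or flawed: indeterminate
    elif correlation > cutoff:
        sign = "+"
    else:
        sign = "-"  # assumption: a value equal to the cut-off is not positive
    return f"{level}{sign}"


# Example inputs of the kind listed in table IV.
print(rate_validity(0.21, "accelerometer", n=182))
print(rate_validity(0.32, "health_functioning", n=167))
print(rate_validity(0.57, "pedometer", n=44))
```

With these inputs the function returns a negative level 2 rating, a positive level 3 rating and an indeterminate level 2 rating, respectively.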
2.4.3 Responsiveness
Responsiveness referred to an instrument's ability to detect change over time in the concept being measured.[16] It should be considered an aspect of validity in a longitudinal setting.[16] According to QAPAQ,[2] we gave level 1 evidence if the authors had a hypothesis of expected effect size a priori; otherwise we gave level 2 evidence. A positive rating was given if the effect was of medium size or better; that is, ≥0.50.[17]
3. Results
3.1 The Literature Search
The search resulted in a total of 21 891 hits and a total of 393 papers were selected based on title and abstract, 59 of which concerned older adults (figure 1). After reading the 59 full-text articles, an additional 41 were deleted because the PAQ was not self-administered, or the PAQ was self-administered but neither reliability nor validity analyses were given in the full-text article, or the PAQ was self-administered in a specific population, such as patients. However, one was deleted because the full text showed that the mean age was <55 years, and the article was instead included in the corresponding review for adults. The remaining 18 articles[13,18-34] were thoroughly studied. They included reliability and/or validity studies for 13 different PAQs for older adults. We did not exclude articles with self-administered PAQs where some participants had received help with the completion of the questionnaire (seven articles) or articles for which the administration was unknown (two articles, table I).
3.2 Description of Instruments
The 13 PAQs were all developed to be self-administered. Consequently, the people in their target populations had to be able to fill out the
Fig. 1. Flow chart of the 21 891 hits found in the literature databases.
Hits: PubMed 9733; EMBASE 7601; SportDiscus® 4284; total 21 891.
Selection based on titles and abstracts: 284; titles and abstracts not in PubMed: 55; titles and abstracts not in PubMed or EMBASE: 54; total 393 (note 1).
Age groups: children 83; adults 260; elderly 59.
Elderly: excluded 41 (note 2); included 18 articles on 13 questionnaires (note 3).
Note 1: Eight articles appear in both the review for adults and the review for elderly, and one article appears in both the review for children and the review for adults.
Note 2: Forty-one articles were excluded because the physical activity questionnaire (PAQ) was not self-administered, or the PAQ was self-administered but neither reliability nor validity analyses were given in the full-text article, or the PAQ was self-administered in a specific population, such as patients.
Note 3: Eighteen articles included reliability and/or validity analyses for 13 different PAQs. All the 13 PAQs were self-administered, but in 7 of the 18 studies some of the participants received help if needed.
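The counts in the selection stage of figure 1 can be cross-checked with simple arithmetic. The short sketch below only re-adds the numbers printed in the figure and its footnotes; the variable names are hypothetical.

```python
# Papers selected on title and abstract, one count per selection step in figure 1.
selected_per_step = [284, 55, 54]
selected_total = sum(selected_per_step)

# Papers per age group; some papers are counted in two reviews (footnote 1).
by_age_group = {"children": 83, "adults": 260, "elderly": 59}
counted_twice = 8 + 1  # adults/elderly overlap + children/adults overlap

# Full-text stage for the elderly review.
included = by_age_group["elderly"] - 41  # 41 articles excluded (footnote 2)

print(selected_total)                               # papers selected in total
print(sum(by_age_group.values()) - counted_twice)   # same total via age groups
print(included)                                     # articles kept for this review
```

Both routes to the number of selected papers agree with the total given in the figure, and the number of included articles matches the 18 articles reported in the text.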
Table I. Description of self-administered physical activity questionnaires (PAQs) for the elderly
Columns: Questionnaire(a); Purpose; Population; Construct and Format: content, setting, recall period, dimensions, no. of questions(b), scores (interpretability)

EPIC[19]
Purpose: To provide a PA measure to study association between lifestyle and cancer in 500 000 participants in ten western countries
Population: Elderly, aged 50–65 y, home dwelling in Australia[19]
Part 1
Content: PA in occupational domain
Setting: Occupational
Recall period: Past y
Dimensions: Four categories: sitting, standing, manual work, heavy manual work
No. of questions: One question with four categories
Scores: Type and amount of PA at work (four categories)
Part 2
Content: PA in leisure and household domains
Setting: Transport; Sport; Recreational; Home
Recall period: Typical wk in summer and winter, past y
Dimensions: D (h/wk); I (nonvigorous and vigorous)
No. of questions: One question with ten categories; one question: vigorous (yes/no) and h/wk; one question about climbing stairs. Self-administered
Scores: MET h/wk
Overall score: A combined total PA index based on occupational, leisure and home (four categories: inactive, moderately inactive, moderately active, active)

CHAMPS[20,22,24,31]
Purpose: To be sufficiently sensitive to detect changes in PA
Population: Elderly, aged >65 y, home dwelling, good cognitive status, no pacemaker
Content: Types and intensity levels of PA that are meaningful and appropriate for older adults (including lighter activities)
Setting: Transport; Home; Recreational; Sport
Recall period: Typical wk in the last 4 wks
Dimensions: F; D; I (indirectly reported)
No. of questions: 41. Self-administered, 15–30 min to complete; 25% needed help at the first test,[20] CHAMPS by mail,[22] research assistant reviewed CHAMPS for completeness[24,31]
Scores: MVPA and for all activities: h/wk; F/wk; MET h/wk

FPACQ[25]
Purpose: To assess detailed information on several dimensions of PA and sedentary behaviour over a usual wk
Population: Retired people aged 48–78 y, recruited from a larger community sample, Belgium
Content: PA in leisure and household domains (FPACQ for retired people does not include occupational PA); lighter activities are registered for home and sports (two METs)(c)
Setting: Transport; Home; Sport; TV/PC play; Eating; Sleeping
Recall period: A usual wk
Dimensions: D (h/wk) for home; F, D in three sports selected from a list of 196 different sports with predefined MET
No. of questions: 56–70. Self-administered, computerized, easy and user-friendly also for those with no experience on computers, 20–30 min to complete; help available if needed; a computer is necessary to be able to answer this PAQ
Scores: Home, TV/PC play, eating, sleeping: h/wk. Sports: Kcal/h, Kcal/wk. Total PA: MET h/wk

IPAQ-C[21]
Purpose: To introduce a self-reported PAQ in Chinese elderly – a short version of IPAQ; to obtain internationally comparable estimates of PA
Population: Elderly, mean age >65 y, physically able to manipulate pedometer, mentally able to provide consent
Content: Habitual PA; no focus on lighter activities except for walking and sitting (IPAQ is not developed especially for the elderly)
Setting: Walking; MVPA lasting at least 10 min; time spent in sedentary activity (sitting and lying awake)
Recall period: The previous 7 d
Dimensions: F; D
No. of questions: Nine. Participants were asked to complete it themselves, but many got help because they were illiterate; the proportion who received help is unknown
Scores: IPAQ-C data converted to MET within each activity: one MET for sitting; 3.3 for walking; four for moderate activity; eight for vigorous activity[35]

Modified Baecke[30]
Purpose: To be able to rank older women according to PA in epidemiological studies
Population: Women aged 51–71 y
Content: All habitual PA in the elderly
Setting: Occupational; Home; Recreational; Sport; Sleeping
Recall period: Past y 13 mo (at baseline, 5 and 11 mo)
Dimensions: D
No. of questions: 16 + 3 (uncertain whether the PAQ was self-administered in this study[30])
Scores: PA scores within work, sports, leisure time

OA-ESI[27]
Purpose: PAQ being age-relevant and memory-enhancing, taking only a few min to complete; improves on the design of previous 7 d recall
Population: Older athletic women aged 51–71 y, older men and women aged 65–90 y, community-based women aged ≥70 y, volunteers from randomly selected sites
Content: An overall activity score. Activities typically undertaken by older people; lighter activities are registered, both recreational PA and housework
Setting: Home; Recreational; Sport; Exercise
Recall period: Every d in the past wk
Dimensions: D; indirectly approximate I
No. of questions: 38 + 5 categories by the 7 d of the past wk, two pages (self-assessment inventory); few min to complete
Scores: MET units. Calculating EE in each category, making a weekly exercise status by summing over the categories (TOTKCAL): MILDCAL (<4 METs); MODKCAL (4–5.9 METs); VIGKCAL (6+ METs)

PASE[13,23,24,33]
Purpose: PAQ being a brief easy tool for assessment of short-term PA in epidemiological studies of the elderly
Population: Community-dwelling older adults aged ≥65 y, without serious physical or mental impairments
Content: Activities commonly engaged in by elderly, and does not emphasize sport and recreational activities; lighter activities are registered, both recreational PA and housework
Setting: Occupational; Home; Recreational
Recall period: The previous 7 d
Dimensions: F; D; I (categories)
No. of questions: 12. Self-administered (5 min to complete); some help if needed,[23,24] unknown administration,[13] mail administered[33]
Scores: PASE activity score: time spent in each activity (h/wk) or participation (yes/no) × PASE weight, summed for all activities

PAQ-EJ[34]
Purpose: To be a self-administered PAQ for elderly Japanese
Population: All elderly aged ≥65 y in Nakanojo in Japan, except those who were severely demented, bedridden, institutionalized or hospitalized
Content: It focuses on the four domains of PA common among elderly Japanese (and older people in many other parts of the world); lighter activities are registered, both recreational PA and housework
Setting: Transport; Home; Recreational; Sport; Exercise; Resistance
Recall period: A typical wk in the preceding mo
Dimensions: F; D; I (light, moderate or strenuous)
No. of questions: 14 questions on two pages; self-administered
Scores: PAQ-EJ score (MET h/wk) = no. of d × time × intensity weight (given in the paper's appendix 2)[34]

Pre-EPIC[30]
Purpose: Able to rank older women according to PA in epidemiological studies
Population: Older women aged 51–71 y
Content: All aspects of average daily life; an overall activity score; lighter activities are registered, both recreational PA and housework
Setting: Rest; Transport; Occupational; Home; Sport; Other activities
Recall period: Past y (at 0, 5 and 11 mo)
Dimensions: D (all activities should add up to 24 h)
No. of questions: 28. PAQ administration unknown
Scores: TEE

QAPSE[18]
Purpose: A PAQ for use in the elderly population to provide a complete measure of daily PA and DEE
Population: Healthy elderly aged 65–84 y
Content: Complete daily PA and DEE; lighter activities are registered, both recreational PA and housework
Setting: Occupational; Home; Recreational; Basic ADL; Miscellaneous
Recall period: A typical wk including weekend
Dimensions: D; I
No. of questions: 35. Self-administered in this study because all participants had good mental health
Scores: Mean habitual daily EE; QAPSE activity score; Sport activities; DEE >3 METs

SBAS[32]
Purpose: To give a quick assessment of usual amount and intensity of PA a person performs throughout the d
Population: Healthy controls in a case control study, men and women, aged 60–69 y
Content: One of five levels of PA within job and leisure (small focus on lighter activities)
Setting: Transport; Occupational; Home; Recreational; Sport
Recall period: A typical d
Dimensions: F; D; I (represented in five levels of PA)
No. of questions: Two questions with five categories each, ≤5 min to complete; self-administered, reviewed by staff for completeness, help if needed
Scores: A job leisure category of PA

Self-Administered PAQ[28,29]
Purpose: Designed to assess total (24 h) current and historical PA
Population: Women free of cancer, aged 56–75 y
Content: Total PA (small focus on lighter activities)
Setting: Occupational; Transport; Home; TV/reading; Exercise
Recall period: Current (last y) and historical by 50, 30 and 15 y of age
Dimensions: D (min/d, h/d, h/wk); I = 2 MET for all PA (average I of light activities)
No. of questions: 5. Self-administered
Scores: MET h for home/housework; work/occupation; exercise; walking/bicycling; watching TV/reading; total

WHI-PAQ[26]
Purpose: A PAQ specially developed for women
Population: Postmenopausal women, aged 50–79 y
Content: First form: usual exercise or recreational activity; second form: heavy indoor household activities and yard activities
Setting: Home; Recreational; Exercise; time spent in sedentary activity (sitting and lying awake)
Recall period: Usual, without reference to a specific time frame
Dimensions: F (d/wk); D (four categories from <20 to >60 min); I (light, moderate or strenuous)
No. of questions: 9. Self-administered
Scores: MET h/wk

a See table II for definitions of questionnaire acronyms.
b Ease of use in the actual populations.
c FPACQ was developed on the basis of IPAQ, long version, and other PAQs (IPAQ is not developed especially for elderly, but the other PAQs are).
ADL = activities of daily living; D = duration; DEE = daily EE; EE = energy expenditure; F = frequency; Home = home activities (household and gardening); I = intensity; Kcal = kilocalorie (1 Kcal = 1000 calories); MET = metabolic equivalent; MILDCAL = those with <4 METs; MODKCAL = those with 4–5.9 METs; MVPA = moderate to vigorous PA; no. = number; PA = physical activity; PC = personal computer; TEE = total EE; TOTCAL = MILDCAL + MODKCAL + VIGKCAL; TV = television; VIGKCAL = those with ≥6 METs.
Table II. Definitions of physical activity questionnaire acronyms
Acronym: Definition
CHAMPS: Community Healthy Activities Model Program for Seniors
EPIC: European Prospective Investigation into Cancer and Nutrition
FPACQ: Flemish Physical Activity Computerized Questionnaire
IPAQ-C: International Physical Activity Questionnaire – Chinese
Modified Baecke: Questionnaire developed by Baecke, expanded by three questions
OA-ESI: Older Adult Exercise Status Inventory
PAQ-EJ: Physical Activity Questionnaire for Elderly Japanese
PASE: Physical Activity Scale for the Elderly
Pre-EPIC: Questionnaire preceding EPIC
QAPSE: Questionnaire d'Activité Physique Saint-Etienne
SBAS: Stanford Brief Activity Survey
Self-administered PAQ: Self-administered Physical Activity Questionnaire of the Swedish Mammography Cohort
WHI-PAQ: Women's Health Initiative-PAQ
questionnaires (table I). PASE (see table II for definitions of PAQ acronyms) was tested in community-dwelling older adults usually aged >65 years.[13,24,33] CHAMPS was also tested in elderly people aged >65 years, both home dwelling[31] and in community centres.[24] In addition, both CHAMPS and PASE were tested in retirement homes[24] among older adults with good cognitive status and no pacemaker. The authors of the 18 articles in our review tried to use random selection, but they ended up with convenience samples of volunteers.
Some authors gave the time needed to fill out the questionnaire. This completion time depended both on the actual questionnaire and on the cognitive health of the study population. For the OA-ESI, PASE and SBAS questionnaires, the authors reported that ≤5 minutes were needed to complete the PAQ. For CHAMPS, tested in a population of adults aged ≥65 years with good cognitive status and no pacemaker, the participants needed about 15–30 minutes to complete the questionnaire, and this was also the case for FPACQ. There were no reported completion times for the other questionnaires. The number of questions in each questionnaire was not always provided in the articles, but was easily found elsewhere (table I).
The purposes of the PAQs varied. CHAMPS was developed to be used as an outcome measure of a PA promotion intervention (i.e. it was developed to be sufficiently sensitive to detect changes in PA over time). It includes low-intensity PA. PASE was developed to be a brief and easy tool for assessment of short-term PA in epidemiological studies of the elderly. The respondent had to recall the last week, as was also the case with OA-ESI. With QAPSE, the recall period was a typical week, with SBAS it was a typical day and for CHAMPS it was a typical week in the last month. In the Self-administered PAQ, in Pre-EPIC and in EPIC the respondent recalled a much longer period (table I).
Seven PAQs included questions about occupational activities and seven included questions about transport. Questions about housework were included in all PAQs except for IPAQ-C. Seven PAQs tended to measure all sorts of light activities typically undertaken by older adults. Three PAQs with less focus on light PA were IPAQ-C, SBAS and the Self-administered PAQ.
The unit of measurement most often used in the 13 PAQs was metabolic equivalent (MET) hours per day or week within the different types of activities, or within moderate- and vigorous-intensity activity and/or total activities. To be able to compute EE for given activities, one needed a table containing MET information suitable for older people, such as was given for CHAMPS.[31] All 13 questionnaires asked for duration, while intensity was often measured in pre-defined MET categories (table I). The authors of CHAMPS estimated the MET hours and total PA EE per time period by means of MET tables of a number of activities.[31] Pre-EPIC and QAPSE gave TEE (total EE) and mean habitual DEE, respectively.
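How a MET-based score is built from frequency, duration and an intensity weight can be sketched as follows. The MET weights are those listed for IPAQ-C in table I; the assumption that the weekly volume is days × minutes × MET weight mirrors the form of the PAQ-EJ score in table I and is not a statement of any questionnaire's official scoring protocol. The respondent data and all names in the code are hypothetical.

```python
# MET weights listed for IPAQ-C in table I (sitting, 1 MET, is left out of the total).
MET_WEIGHTS = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}


def met_minutes_per_week(report):
    """Sum MET weight x days per week x minutes per day over all activities.

    `report` maps an activity category to a (days per week, minutes per day) pair.
    """
    return sum(
        MET_WEIGHTS[activity] * days * minutes
        for activity, (days, minutes) in report.items()
    )


# Hypothetical respondent: walks 30 min on 5 days, does 60 min of moderate
# activity on 2 days and 20 min of vigorous activity on 1 day.
respondent = {"walking": (5, 30), "moderate": (2, 60), "vigorous": (1, 20)}

total_met_min = met_minutes_per_week(respondent)
total_met_h = total_met_min / 60  # convert to MET h/wk
print(f"{total_met_min:.0f} MET min/wk = {total_met_h:.1f} MET h/wk")
```

The same structure applies to any questionnaire that reports frequency and duration per activity category; only the table of MET weights changes.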
PASE and Modified Baecke did not use MET tables, and EE was not estimated. They asked for intensity directly in categories or gave a graded score of 1–5, respectively (table I), and activity scores could be calculated in total and/or within different activities. SBAS gave a combined job and leisure category of PA.
3.3 Measurement Properties
3.3.1 Reliability
IPAQ-C and WHI-PAQ received positive ratings with a high level of evidence. PASE also received a positive rating,[33] but the level of evidence was low because the time between test and retest was too long to test reliability and the coefficient was not the ICC as recommended (table III). In addition, PASE was negatively rated, with a high level of evidence in a Japanese study.[23] Only seven of 15 reliability studies calculated ICCs.[20-22,25,26,28,31] The other eight were presented with a Pearson or a Spearman correlation. In eight studies the sample size was too small to be able to draw definite conclusions. Four of the 15 studies used too long a time interval between the test and retest.
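The reliability ratings follow the rules set out in section 2.4.1, which can be sketched as a small function. This is an illustrative reading of those rules, not the authors' software: the day limits used for '3 months' (90), '2 weeks' (14) and '1 year' (365), the function and argument names, and the treatment of a coefficient exactly at the cut-off as negative are all assumptions. The example inputs are taken from table III.

```python
# Upper limits (in days) of an adequate retest interval per recall period (assumed values).
UPPER_LIMIT_DAYS = {"usual_week": 90, "previous_week": 14, "previous_year": 365}


def interval_is_adequate(recall, interval_days):
    """Adequate means more than 1 day and below the limit for the recall period."""
    return 1 < interval_days < UPPER_LIMIT_DAYS[recall]


def rate_reliability(coefficient, kind, n, recall, interval_days, flawed=False):
    """Return a rating such as '1+', '2-' or '3?'.

    kind is 'icc', 'kappa', 'pearson' or 'spearman'.
    """
    adequate = interval_is_adequate(recall, interval_days)
    if kind in ("icc", "kappa"):
        level, cutoff = (1 if adequate else 2), 0.70
    else:
        level, cutoff = (2 if adequate else 3), 0.80
    if n < 50 or flawed:
        sign = "?"  # indeterminate
    elif coefficient > cutoff:
        sign = "+"
    else:
        sign = "-"  # assumption: a value equal to the cut-off is not positive
    return f"{level}{sign}"


# IPAQ-C total score: ICC 0.84, n = 218, retest after 8 days, previous 7 d recall.
print(rate_reliability(0.84, "icc", 218, "previous_week", 8))
# PASE by mail: r = 0.84, n = 78, retest after at least 3 weeks, previous 7 d recall.
print(rate_reliability(0.84, "pearson", 78, "previous_week", 21))
# OA-ESI test 2: r = 0.77, n = 29, retest after 1 week, past wk recall.
print(rate_reliability(0.77, "pearson", 29, "previous_week", 7))
```

For these three inputs the sketch reproduces the ratings shown in table III (1+, 3+ and 2?).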
3.3.2 Validity
Construct Validities
Fifteen of 25 validity studies compared the actual PAQ with an instrument measuring the same construct or near the same construct and were described in 12 papers[13,19,21-25,27,29-31,34] (table IV). The comparison instruments were DLW,[13] Mini-Logger at ankle and Mini-Logger at waist (where, in addition, heart rate was transmitted to the Mini-Logger),[24] accelerometer,[19,23,29,34] pedometer,[21,22] diary[30] and other PAQs.[19,23,27,31] The PASE study was the only one to validate the actual PAQ against DLW,[13] giving a high level of evidence. Due to a small sample size, its rating value was indeterminate (1?). We actually found several studies with DLW as the comparing instrument, but the studies were not eligible for our review because the PAQ was used in an interview format. Eight of the 15 studies had a medium level of evidence, and of these, only one received a positive rating (PASE against Mini-Logger);[24] however, PASE was negatively rated against accelerometer and another PAQ in a Japanese study.[23]
Health/Functioning Associations
Seven of the 25 validity studies compared the actual PAQ with an instrument measuring physical functioning, V̇O2max or health variables (table IV). The comparison instruments were chair stand, step test, 6-minute walk, body mass index, SF-36 (including physical functioning, general health perceptions, mental health and pain), V̇O2max, body mass, skin-fold thickness, fat-free mass and body fat. All comparisons were given level 3 of evidence and all studies except for one[23] had a positive rating in some of the categories of the tested PAQ (table IV).
Two of the 25 validity studies examined known-groups validity.[24] Both CHAMPS and PASE showed highly significant differences in PA between retirement homes and community centres as hypothesized. Unfortunately, the authors did not specify the magnitude of the expected differences in their hypotheses, which makes it difficult to judge whether the hypotheses were confirmed. So, according to QAPAQ[2] we rated this as level 3 evidence (but scored +).
Responsiveness
Responsiveness was evaluated for only one instrument: CHAMPS.[31] CHAMPS was developed to be sufficiently sensitive to detect changes in PA after an intervention on PA in elderly people. The effect sizes for the caloric expenditure measures were 0.38 and 0.42 (i.e. assessed as small to moderate); the effect sizes for the frequency measures were 0.54 and 0.64 (moderate). According to QAPAQ[2] we rated this validity analysis with level 2 evidence instead of level 1 because the authors gave no a priori hypothesis of expected effect size. We gave a score of 2+ for the frequency measures because the effects were of medium size,[31] while caloric expenditure measures were negatively rated.
4. Discussion
The aim of our study was to undertake a systematic literature review to identify and appraise self-administered PAQs concerning older people. We identified 13 PAQs in 18 articles. Ideally, the study population used to test the
Table III. Reliability of self-administered physical activity questionnaires (PAQs) used in older adults
Columns: PAQ(a); Study population (n, gender, age range or mean age, health); Time interval between test-retest; Results(b); Rating (one per result, in the order listed)

EPIC[19]
Study population: n = 182. Adults from Australia, aged 50–65 y, volunteer participants from a health survey, slightly more men and slightly more overweight than the general population
Time interval: 10 mo. Recall period: past y and a typical wk during past y
Results: Total non-occupational PA, MET h/wk: rho = 0.65 (0.55, 0.72). Total PA index: κw = 0.62 (0.53, 0.71)
Rating: 2-; 1-

CHAMPS[20]
Study population: n = 43. Adults from Australia for reliability analyses, 72% women aged 65–96 y, mean (SD) age 77.4 (6.6), 72% had ≥3 health problems in the validity sample (167). The test-retest group (43 adults) was a bit healthier
Time interval: 1 wk. Recall period: a typical wk over the last 4 wks; participants received help if needed the first time (25%) and PAQ sent by post the second time
Results: MVPA (ICC): h/wk: ICC = 0.78 (0.63, 0.87); F/wk: ICC = 0.76 (0.59, 0.86); MET h/wk: ICC = 0.76 (0.61, 0.87). All activities (ICC): h/wk: ICC = 0.76 (0.60, 0.86); F/wk: ICC = 0.79 (0.64, 0.87); MET h/wk: ICC = 0.75 (0.58, 0.86)
Rating: 1?; 1?; 1?; 1?; 1?; 1?

CHAMPS[31]
Study population: n = 147. Home-dwelling elderly, aged ≥65 y
Time interval: 6 mo (the ICC underestimates the reliability because it also includes stability)
Results: MVPA (ICC): Cal.Exp./wk: ICC = 0.67; F/wk: ICC = 0.58. All activities measure: Cal.Exp./wk: ICC = 0.66; F/wk: ICC = 0.62
Rating: 2-; 2-; 2-; 2-

CHAMPS[22]
Study population: n = 54. Older Australian adults aged ≥65 y, but complete matched data: n = 29–46 (due to data loss)
Time interval: 1–2 wk; CHAMPS by post
Results (ICC): Sessions/wk: walking: ICC = 0.93 (0.86, 0.97); moderate I: ICC = 0.83 (0.66, 0.92); vigorous I: ICC = 0.86 (0.74, 0.93); total PA: ICC = 0.89 (0.77, 0.95). Duration min/wk: walking: ICC = 0.83 (0.68, 0.91); moderate I: ICC = 0.79 (0.61, 0.89); vigorous I: ICC = 0.79 (0.61, 0.88); total PA: ICC = 0.81 (0.63, 0.90). Volume MET min/wk: walking: ICC = 0.85 (0.71, 0.92); moderate I: ICC = 0.80 (0.63, 0.89); vigorous I: ICC = 0.78 (0.59, 0.88); total PA: ICC = 0.84 (0.69, 0.91)
Rating: 1?; 1?; 1?; 1?; 1?; 1?; 1?; 1?; 1?; 1?; 1?; 1?

FPACQ[25]
Study population: n = 36. Retired people, 20 men and 16 women, mean age 63.65 and 63.31 y, respectively
Time interval: 2 wk
Results: EE total; overall energy expenditure during a usual wk. Men: kcal/wk: ICC = 0.90 (0.76, 0.96); PAL (MET): ICC = 0.89 (0.76, 0.96). Women: kcal/wk: ICC = 0.96 (0.90, 0.99); PAL (MET): ICC = 0.77 (0.47, 0.91)
Rating: 1?; 1?; 1?; 1?

IPAQ-C[21]
Study population: n = 218. Men and women aged 51–82 y (66.1% women)
Time interval: 8 d. Participants completed the IPAQ-C for the second time on day 9 (see 'Procedure', p.304[21]); participants were asked to complete it themselves, but many got help because they were illiterate. The proportion who received help is unknown
Results: IPAQ-C (MET min/wk): vigorous activity: ICC = 0.83 (0.78, 0.87); moderate activity: ICC = 0.81 (0.76, 0.85); walking: ICC = 0.85 (0.81, 0.88); sitting: ICC = 0.89 (0.86, 0.91); total excluding sitting: ICC = 0.84 (0.80, 0.87)
Rating: 1+; 1+; 1+; 1+; 1+

Modified Baecke[30]
Study population: n = 30. Women aged 51–71 y, mean (SD) age 61.2 (6.7) y, recruited from a breast cancer screening project in Utrecht
Time interval: 5 and 11 mo; the measure period was the past y
Results: PA score: 5 mo: r = 0.82 (n = 30); 11 mo: r = 0.73 (n = 28)
Rating: 2?; 2?

OA-ESI[27]
Study population: n = 17. Test 1: athletic sample of 17 women from Edmonton, aged 58–80 y (mean 67 y)
Time interval: 0–4 wk. OA-ESI was administered twice during a 4-wk period; the measure period was the past wk
Results: Total amount of exercise: r = 0.340, NS; moderate exercise: r = 0.756, p < 0.001; vigorous exercise: r = 0.505, p < 0.05
Rating: 3?

OA-ESI[27]
Study population: n = 29. Test 2: 29 adults aged 65–90 y (mean age 71 y)
Time interval: 1 wk. Participants reported on 2 different wks of activity spaced 1 wk apart
Results: r = 0.77
Rating: 2?

PASE[33]
Study population: n = 78. Men and women answered a mailed PASE questionnaire, distribution of sex in this sample is not given, age range in the total sample of 222 was 65–99 y; they lived in their own households without any serious mental or physical impairment
Time interval: 3–7 wk. The measure period was the previous 7 d
Results: PASE mail administered: r = 0.84
Rating: 3+

PASE[23]
Study population: n = 325. Healthy elderly Japanese, aged ≥65 y (mean age 72.6 y)
Time interval: 3–4 wk
Results: PASE score; self-administered, 6.8% needed some help. ICC = 0.65
Rating: 1-

PAQ-EJ[34]
Study population: n = 146. A convenience sample of 61 men and 85 women, aged 65–85 y; severely demented, bedridden, institutionalized and hospitalized patients were ineligible for the study
Time interval: 1 mo. The measure period was a typical wk from the preceding mo
Results: PAQ-EJ score: total score: r = 0.70; low intensity activity categories: r = 0.64; higher intensity: r = 0.71
Rating: 2-; 2-; 2-

Pre-EPIC[30]
Study population: n = 29. Women aged 51–71 y, mean (SD) age 61.2 (6.7) y, recruited from a breast cancer screening project in Utrecht
Time interval: 5 and 11 mo. The measure period was the past y
Results: PA score: 5 mo: r = 0.42 (n = 29); 11 mo: r = 0.60 (n = 28)
Rating: 2?; 2?

QAPSE[18]
Study population: n = 44. Healthy elderly men and women, aged 65–84 y
Time interval: At least 6 wk, but unknown whether it was <3 mo. The measure period was a typical wk including the weekend
Results (r): MHDEE: 0.967, p < 0.0001; DEE >3 MET: 0.859, p < 0.0001; professional activities: 0.662, p < 0.0001; leisure activities: 0.770, p < 0.0001; sports activities: 0.851, p < 0.0001; housework: 0.873, p < 0.0001; basic daily activities: 0.774, p < 0.0001; moving index activities: 0.648, p < 0.0001
Rating: 3?; 3?; 3?; 3?; 3?; 3?; 3?; 3?

Self-administered PAQ[28]
Study population: n = 303. Women free of cancer, aged 56–75 y
Time interval: 1 y. The measure period was the past y
Results: Current age (past y) (ICC), MET h/d: housework: ICC = 0.58 (0.50, 0.65); occupation: ICC = 0.59 (0.51, 0.66); exercise: ICC = 0.49 (0.40, 0.58); walking/cycling: ICC = 0.56 (0.48, 0.64); watching TV/reading: ICC = 0.59 (0.51, 0.67); total: ICC = 0.69 (0.62, 0.75)
Rating: 1-; 1-; 1-; 1-; 1-; 1- (+)

WHI-PAQ[26]
Study population: n = 1092. Postmenopausal women, aged 50–79 y; n = 569 repeated recreational PA; n = 523 repeated home PA
Time interval: 3 mo. Recall period: usual PA, without reference to a specific timeframe
Results: Total recreational PA (MET h/wk): ICC = 0.76 (0.71, 0.79); household PA (MET h/wk): ICC = 0.60 (0.55, 0.66); yard PA (MET h/wk): ICC = 0.71 (0.66, 0.75); sitting and lying down (MET h/wk): ICC = 0.60 (0.54, 0.65)
Rating: 1+; 1-; 1-; 1-

a See table II for definitions of questionnaire acronyms.
b ICC and rho are sometimes given with (95% confidence intervals).
Cal.Exp. = calorie expenditure; DEE = daily energy expenditure; EE = energy expenditure; F = frequency; I = intensity; ICC = intraclass correlation coefficient; MET = metabolic equivalent; MHDEE = mean habitual daily energy expenditure; MVPA = moderate to vigorous PA; NS = not statistically significant; PA = physical activity; PAL = physical activity level; r = Pearson correlation coefficient; rho = Spearman correlation coefficient; TV = television; κw = weighted kappa.
actual PAQ for reliability and validity should be a random sample from the target population; however, although many authors attempted a random selection, they ended up with a convenience sample of volunteers. Another problem was that many participants needed help with completion of the PAQ even though the PAQ was meant to be self-administered; this was the case in 7 of the 18 articles.
Of the 13 PAQs, 9 were developed especially for the elderly; however, not all focused on light activities common among elderly people. Measurement of lighter activities is essential in the elderly since most of their activities are at a lower level of intensity. The insufficient or inadequate investigation of lighter activities in the various PAQs may be a key factor in explaining the rather disappointing results for validation, in addition to the other problems described in the following section on validity. A PAQ without focus on lighter activity may be less useful when the aim is to classify the elderly into levels of PA, monitor changes in PA, evaluate PA interventions, identify relations between PA and health and quantify dose response between PA and health.
4.1 Reliability
Of the 15 reliability studies, 11 were either too small or the time interval between test and retest was too long to adequately test reliability. Three PAQs received a positive rating (IPAQ-C, WHI-PAQ and PASE); however, PASE had a low level of evidence due to the inadequate time interval and use of the Pearson correlation coefficient instead of the ICC. It is worth noting that PASE was also negatively rated with a high level of evidence in another study.[23] CHAMPS achieved a near positive rating in two studies, and further reliability studies of high quality are needed to reveal whether CHAMPS is capable of achieving a positive rating. Jørstad-Stein et al.[8] concluded in their review that CHAMPS is a promising questionnaire due to its responsiveness, but that it needs further development to improve reliability. The second recall period in our sample of studies was never the same as the first recall period in the test-retest analyses, which
Table IV. Validity of self-administered physical activity questionnaires (PAQs) used in older adults
PAQa
Study population (n, sex, age range, mean age, health)
Validity (construct validity or health/ functioning association)
Comparison measure
Resultsb
Rating
EPIC[19]
n = 182
Construct validity
Accelerometer
Total non-occupational PA, MET h/wk rho = 0.21 (0.07, 0.35)
2-
EPIC[19]
n = 182
Construct validity
Other PAQ: LTPAQ, telephone administered
Total non-occupational PA, MET h/wk rho = 0.26 (0.11, 0.39)
3-
CHAMPS[20]
n = 167; 132 women, 35 men, aged 65–96 y, mean (SD) age 77.4 (6.6) y, 72% had ≥3 health problems
Health/functioning associations
With physical performance tests and self-reported physical and mental well-being
Chair stand: MVPA MET h/wk: rho = 0.19; F/wk: rho = 0.16. Chair stand: all activities MET h/wk: rho = 0.21; F/wk: rho = 0.14. Step test: MVPA MET h/wk: rho = 0.32; F/wk: rho = 0.31. Step test: all activities MET h/wk: rho = 0.28; F/wk: rho = 0.26
3- 3- 3- 3- 3+ 3+ 3- 3-
CHAMPS[22]
n = 31–44 Australian adults aged 65–74 y
Construct validity
Pedometer step counts
Walking frequency/wk: rho = 0.57, p < 0.05; Walking MET min/wk: rho = 0.40, p < 0.05; HEPA frequency/wk: rho = 0.52, p < 0.05; HEPA MET min/wk: rho = 0.21, NS
2? 2? 2? 2?
CHAMPS[24]
n = 87 Convenience sample from community centres (51 older adults) and retirement homes (36 residents) in Los Angeles, aged ≥65 y, good cognitive status, no pacemaker
Construct validity
Mini-Logger at the ankle; Mini-Logger at the waist
Mini-Log ankle: r = 0.36; Mini-Log waist: r = 0.42
2- 2-
CHAMPS[24]
As above
Health/functioning associations
EPESE lower body functioning, 6-min walk, BMI, SF-36 (included here: PF, GH, MH, pain)
Pearson correlation between CHAMPS and: 6-min walk, r = 0.46; BMI, r = 0.006; SF-36: PF, r = 0.39; SF-36: GH, r = 0.35; SF-36: MH, r = 0.25; SF-36: pain, r = 0.26
3+ 3- 3+ 3+ 3- 3-
CHAMPS[24]
As above
Known-groups validity
Retirement homes and community centres
Retirement homes (n = 36): mean (SD) 1548 (1767); median 805; range 0–6477. Community centres (n = 51): mean (SD) 3484 (2042); median 3243; range 376–10 243. t-test: t = 4.60, p < 0.0001
3+
CHAMPS[31]
n = 249 173 under-active, 76 active homedwelling elderly, 64% women, aged 65–90 y, mean age 74 y, 25% graduate degree
Health/functioning associations
With physical performance tests and self-reported physical functioning and mental well-being
Rho Self-reported PF, moderate and vigorous intensity: MET h/wk: 0.30, F/wk: 0.30 All activities: MET h/wk: 0.27, F/wk: 0.23
n = 164 Older adults, aged ?
Responsiveness
CHAMPS[31]
FPACQ[25]
Construct validity
n = 224 Older adults, mean age 65.2 y
Construct validity
Moderate/greater intensity: Cal. Exp/wk: UES = 0.38, p = 0.003c F/wk: UES = 0.54, p = 0.01c All activities measure: Cal. Exp/wk: UES = 0.42, p = 0.003c F/wk: UES = 0.64, p = 0.0001c
Combination of Triaxial accelerometer and written 7-d activity record (RT3)
Men, EE total Kcal/wk: r = 0.55, p < 0.01 PAL (MET): r = 0.39, p < 0.05 Women, EE total Kcal/wk: r = 0.85, p < 0.001 PAL (MET): r = 0.39, p < 0.05
Pedometer
Total: r = 0.33, p < 0.001 adjusted for sex, age and education Walking: r = 0.58, p < 0.001
3+ 3-
22+ 22+
2? 2? 2? 2? 2?
22-
IPAQ-C[21]
n = 49 Retired people, 30 men and 19 women
Intervention on PA
Modified Baecke[30]
n = 35 Women aged 51–71 y recruited from a breast cancer screening project in Utrecht (two were excluded), [n = 29]
Construct validity
Diary
0.59, p < 0.05 (Table 4 in the paper gives information on other validation measures, but with poorer results)
3?
OA-ESI[27]
n = 327 Volunteering, but geographically strategic randomized women aged 70–98 y, mean age 77 y, less overweight, better educated, and in better health than Vancouver and Canadian census data
Construct validity
Other previously-validated PA indicators on the same survey: 7-d recall check list
TOTKCAL vs lifelong activity: r = 0.450, p < 0.01 TOTKCAL vs F of sweating: r = 0.411, p? TOTKCAL vs active days per week: r = 0.491, p < 0.001
3-
n = 325 Healthy elderly Japanese, aged ≥65 y
Construct validity
Accelerometer Other previously validated PA, JALSPAQ
PASE vs walking step: rho = 0.17, p = 0.014 PASE vs EE: rho = 0.16, p = 0.024 PASE vs JALSPAQ: rho = 0.48, p < 0.001
2-
MTMA per body weight Static balance
PASE vs MTMA/body weight: rho = 0.15, p = 0.006 PASE vs static balance: rho = 0.19, p = 0.001
3-
PASE[23]
PASE[23]
As above
Health/functioning associations
33-
23-
3-
PASE[24]
n = 87 Convenience sample from community centres (51 older adults) and retirement homes (36 residents) in Los Angeles, aged ≥65 y, good cognitive status, no pacemaker
Construct validity
Mini-Logger at the ankle; Mini-Logger at the waist
Mini-Log ankle: r = 0.59; Mini-Log waist: r = 0.52
2+ 2+
PASE[24]
As above
Health/functioning associations
EPESE lower body functioning 6-min walk BMI SF-36 (included here: physical functioning, general health perceptions, mental health, pain)
6-min walk, r = 0.68; BMI, r = -0.07; SF-36: PF, r = 0.30; SF-36: GH, r = 0.26; SF-36: MH, r = 0.23; SF-36: pain, r = 0.17
3+ 3- 3+ 3- 3- 3-
PASE[24]
As above
Known-groups validity
Retirement homes and community centres
Retirement homes (n = 36) Mean (SD): 50 (44) Median: 45 Range: 0–195 Community centres (n = 51) Mean (SD): 158 (65) Median: 150 Range: 54–372 t-test: t = 9.25, p < 0.0001
3+
PASE[13]
n = 21 Dutch elderly men and women participating in a PA intervention investigating cardiovascular response to PA
Construct validity
DLW: TEE/RMR ratio
0.68 (0.35–0.86)
1?
PASE[33]
n = 222 Older people without serious mental or physical impairments, aged ≥65 y
Health/functioning associations
Questions about health status and physiological measures
Perceived health (1 = excellent, 5 = poor): -0.34, p < 0.01; Sickness Impact Profile: -0.42, p < 0.01
3+ 3+
PAQ-EJ[34]
n = 147 elderly Japanese, aged 65–99 y; severely demented, bedridden, institutionalized and hospitalized patients were ineligible for the study
Construct validity
Accelerometer
Total PAQ-EJ score: rho = 0.41, p < 0.001
2-
Pre-EPIC[30]
n = 35 Women aged 51–71 y recruited from a breast cancer screening project in Utrecht (two were excluded), [n = 29]
Construct validity
Diary
r = 0.64, p < 0.05
3?
QAPSE[18]
n = 65 Healthy elderly, community-dwelling, aged 65–84 y
Health/functioning associations
QAPSE MHDEE vs: V̇O2max; body mass; skin fold thickness; fat-free mass; body fat
0.464 p < 0.0001 -0.122 0.639 p < 0.0001 -0.501 p = 0.0001 0.560 p = 0.0001
3+ 33+ 3 -/+ 3+
SBAS[32]
n = 1010 Healthy men and women aged 60–69 y
Construct validity
Other questionnaire: Stanford 7-d PAR; looking for trend
MVPA, PAR min/wk: trend, p < 0.01; EE, PAR Kcal/kg per d: trend, p < 0.01
3 +/- 3 +/-
Self-administered PAQ[29]
n = 116 Swedish women, aged 56–75 y, mean age 64.4 y
Construct validity
Accelerometer; 7 dAR
Concordance correlation r. Total daily activity: accelerometer: r = 0.38 (0.22, 0.54); 7 dAR: r = 0.64 (0.45, 0.83). Leisure time activity: accelerometer: r = 0.42 (0.22, 0.62); 7 dAR: r = 0.52 (0.36, 0.69)
2- 3+ 2- 3+
a See table II for definitions of questionnaire acronyms.
b r and rho are sometimes given with (95% confidence intervals).
c F-test, adjusted difference.
BMI = body mass index; DLW = doubly labelled water; EE = energy expenditure; EPESE = Established Populations for Epidemiologic Studies of the Elderly; F = frequency; GH = general health perceptions; JALSPAQ = the Japan Arteriosclerosis Longitudinal Study Physical Activity Questionnaire; HEPA = European network for the promotion of Health-Enhancing Physical Activity; LTPAQ = Friedenreich Lifetime Total PAQ; MET = metabolic equivalent; MH = mental health; MHDEE = mean habitual daily energy expenditure; MTMA = mid-thigh muscle area; MVPA = moderate to vigorous PA; NS = not statistically significant; PA = physical activity; PAL = physical activity level; PAR = physical activity recall; PF = physical functioning; r = Pearson correlation coefficient; rho = Spearman correlation coefficient; RMR = resting metabolic rate; RT3 = a combination of triaxial accelerometer and written 7-d activity record; SF-36 = Short-Form 36; TEE = total energy expenditure; TOTKCAL = MILDCAL+MODKCAL+VIGKCAL; UES = unadjusted effect size; VIGKCAL = those with ≥6 METs; V̇O2max = maximum oxygen uptake/consumption; 7 dAR = 7-day activity records.
indicates that the estimated reliability coefficients may be underestimates of the ‘true’ reliability coefficients, as they also include real changes.[33] On the other hand, it can be argued that natural variation should be included in the measurement error because it will also affect the measurement of change (e.g. after intervention).[2] Usually the reliability coefficient of a PAQ was tightly connected to the mental health of the actual study population and to the time interval between test and retest. Ideally, each epidemiological study should run a pilot test-retest study in its own target population to estimate reliability coefficients for its chosen PAQ. One study of PASE, not included in our review because it used interviewers,[36] showed an extremely high reliability coefficient of 0.91; participants recalled the same week twice, 3 days apart. Reliability coefficients were usually much lower for self-administered PAQs.[36] Some authors concluded that interview-based PAQs are preferable because of their acceptable reliability,[33,36] but this needs to be studied further.
4.2 Validity
None of the 15 studies on construct validity received a positive rating on level 1 evidence. Only PASE received a positive rating on level 2, but PASE was also negatively rated with a medium level of evidence in another study.[23] In the other 13 construct validity studies, the study population was too small and/or the association coefficient was below the cut-off value. PASE was nearly positively rated in one analysis. The comparison instrument was DLW,[13] which would have given a high level of evidence. Unfortunately, the study sample was too small and the coefficient was only borderline acceptable. In another study not included in our review, where PASE was administered by interviewers and achieved high reliability,[36] the validity coefficient against Actigraph was 0.43. That is, not even a PASE with high reliability could achieve high validity. One could conclude that PASE and Actigraph probably did not measure exactly the same construct. Our impression is that it is difficult to find an adequate construct for comparison.
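The effect of a small validation sample can be made concrete with a short calculation. The sketch below assumes that the PASE versus DLW coefficient in table IV, 0.68 (0.35–0.86) with n = 21, behaves like a Pearson correlation and that its interval was obtained with the Fisher z-transformation; neither assumption is stated in the source. The function name `correlation_ci` and the comparison sample of 100 are illustrative only.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation coefficient.

    Uses the Fisher z-transformation: z = atanh(r), SE = 1 / sqrt(n - 3).
    Assumption: the coefficient is treated as a Pearson correlation.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Values taken from table IV (PASE vs DLW, n = 21).
low_small, high_small = correlation_ci(0.68, 21)
print(f"n = 21:  r = 0.68, 95% CI ({low_small:.2f}, {high_small:.2f})")

# Hypothetical larger sample with the same point estimate, for comparison.
low_large, high_large = correlation_ci(0.68, 100)
print(f"n = 100: r = 0.68, 95% CI ({low_large:.2f}, {high_large:.2f})")
```

Under these assumptions the interval for n = 21 agrees with the one reported in table IV. Its lower bound lies far below the point estimate, which illustrates why a coefficient from so small a sample cannot support a definite rating.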
In our sample of articles, CHAMPS was tested for construct validity only once. The comparison instrument was the Mini-Logger, which gives level 2 evidence, but the coefficient was far from acceptable. Several PAQs that were developed to be suitable for the elderly included lighter activities. For that reason we can probably not expect correlations with EE as high as those found for younger people, whose PAQs focus more on moderate and vigorous activities. It is a future challenge to choose a suitable comparison instrument to test construct validity for PAQs for older adults. Ideally, the purpose of the actual PAQ should be decisive for the choice of comparison instrument. If the purpose is to measure EE, one should choose a comparison instrument that measures EE directly (DLW) or an activity monitor (such as an accelerometer) that estimates EE indirectly. If the purpose is to measure only walking, a pedometer is the natural choice; if the purpose is to rank subjects by their level of PA with a PA score, a pedometer and an accelerometer seem appropriate. On the other hand, if the purpose is to measure PA as such, with a PA score, for a PAQ with a focus on lighter activities and less focus on EE, and where activities such as light swimming (popular among the elderly) are included, then a detailed interview, other questionnaires or a diary can be appropriate. However, with these comparison instruments the participants give subjective answers, resulting in a low level of evidence for validity according to QAPAQ. A solution could be to use two comparison instruments (one for EE and one for PA as such), as recommended earlier.[37] The purpose of CHAMPS was to be sufficiently sensitive to detect changes in PA, and it estimated MET hours per week and gave caloric expenditure per week in moderate intensity or greater and for all PA.
The chosen comparison instrument, the Mini-Logger, should be considered appropriate, but the correlations were small and CHAMPS was negatively rated, possibly due to the study population.[24] However, PASE received a positive rating in the same study population and with the same comparison instrument.[24] The purpose of PASE was to be a brief and easy tool for assessment of short-term PA in epidemiological studies of the elderly, and it gave an activity score. PASE needed 5 minutes to be completed, while CHAMPS needed 15 minutes. Both CHAMPS and PASE were rated positively for known-group validity and may therefore be appropriate PAQs when the aim is to classify the ageing population into categories of PA. None of the other PAQs were tested for known-group validity. Several studies showed associations between a PAQ and health/functioning variables. Such associations can be substantially attenuated due to poor reliability and poor validity of the variables,[38] but all three PAQs tested for health/functioning association (CHAMPS, PASE, QAPSE) were positively rated in some of their studies in some of their categories.
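The known-groups comparison for PASE can be re-derived approximately from the summary statistics in table IV. The sketch below assumes an unequal-variance (Welch) two-sample t statistic; the source states only 't-test', so the exact variant is an assumption, and the function name `welch_t` is illustrative. Because the tabulated means and SDs are rounded, the result is expected to be close to, not identical to, the published value.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic with unequal variances (Welch form).

    Computed from group means, standard deviations and sample sizes.
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


# PASE scores from table IV:
# retirement homes (n = 36): mean (SD) 50 (44)
# community centres (n = 51): mean (SD) 158 (65)
t_pase = welch_t(50, 44, 36, 158, 65, 51)
print(f"PASE known-groups t statistic: {t_pase:.2f}")
```

With these inputs the statistic comes out at roughly 9.2, in line with the t = 9.25 reported in table IV, which supports reading the positive known-groups rating as a large and clearly separated difference between the two settings.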
4.3 Limitations of Our Review
Our three criteria for acceptable reliability and validity[2] were based on experience and are not absolute. They are: (i) the cut-off values; (ii) the three levels of evidence; and (iii) the appropriate sample size. A recent study comparing PAQs with cognitive interviews[39] supports our decision to assign a low level of evidence to subjective comparison instruments for construct validity. However, because all necessary information is provided in the tables, readers can apply different criteria and thereby arrive at different ratings. Because of the large number of question-mark ratings and the divergent results for PASE, we have not been able to determine which PAQ is the most suitable in the different situations.
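How a change of criteria alters the ratings can be sketched in a few lines. The function below is a simplified illustration, not the QAPAQ rule itself: the name `rate`, the cut-off values 0.50 and 0.40, and the minimum sample size of 50 are assumptions chosen for the example (the actual criteria are given in the methods paper[2]), and the level of evidence is left out. The coefficients and sample sizes are taken from table IV.

```python
def rate(coefficient, n, cutoff, min_n=50):
    """Simplified rating: '?' if the sample is too small, else '+' or '-'.

    Assumption: a single cut-off on the coefficient and a single minimum
    sample size; the real checklist also assigns a level of evidence.
    """
    if n < min_n:
        return "?"
    return "+" if coefficient >= cutoff else "-"


# (label, coefficient, sample size) taken from table IV.
results = [
    ("PASE vs DLW", 0.68, 21),
    ("PASE vs Mini-Logger (waist)", 0.52, 87),
    ("CHAMPS vs Mini-Logger (waist)", 0.42, 87),
    ("PAQ-EJ vs accelerometer", 0.41, 147),
]

# Hypothetical cut-offs: compare a stricter and a more lenient threshold.
for cutoff in (0.50, 0.40):
    print(f"cut-off = {cutoff:.2f}")
    for label, coefficient, n in results:
        print(f"  {label}: {rate(coefficient, n, cutoff)}")
```

With the stricter illustrative cut-off the four results are rated '?', '+', '-' and '-', matching the signs in table IV; lowering the cut-off to 0.40 turns both negative ratings positive, while the small DLW study stays indeterminate whatever the cut-off.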
4.4 Strengths of Our Review
Our review is part of a large review divided into three papers: the first paper concerns PAQs for youth;[1] the second for adults;[3] and the third for the elderly (the present review). All three reviews were based on the same methods paper,[2] and an agreement on definitions was achieved. Furthermore, the quality of the studies and the results of the included PAQs were rated according to standardized criteria (QAPAQ checklist).
5. Conclusions
We found 13 PAQs developed to be self-administered for older people. Only three of them received a positive rating on reliability, namely IPAQ-C, PASE and WHI-PAQ. Unfortunately, PASE also received a negative rating on reliability. In addition, PASE received a positive rating on validity; however, it also received a negative rating on validity. Knowledge about the reliability and validity of self-administered PAQs for older adults is still scarce, and more research is needed. It should include the following:
• A thorough description of the actual study population in each reliability and validity study.
• All reliability and validity studies should include more than 50 participants to be able to draw definite conclusions.
• The 13 PAQs in our review had different recall periods. Research is needed to decide which recall period gives acceptable reliability and validity results in different populations and for different purposes of the PAQs.
Acknowledgements
The collaboration for this manuscript was planned at one of the meetings of the European Network for Action on Ageing and Physical Activity (EUNAAPA). The authors thank the European Commission, Directorate C – Public Health and Risk Assessment, for this network under the programme of community action in the field of public health (2003–8). The content of the article does not represent the opinion of the European Community, and the European Commission is not responsible for any use that might be made of the information presented in the text. No sources of funding were used in the preparation of this article. The authors have no conflicts of interest that are directly relevant to the content of this article.
References
1. Chinapaw MJM, Mokkink LB, Van Poppel MNM, et al. Physical activity questionnaires for youth: a systematic review of measurement properties. Sports Med 2010; 40 (7): 539-63
2. Terwee CB, Mokkink LB, Van Poppel MNM, et al. Qualitative attributes and measurement properties of physical activity questionnaires: a checklist. Sports Med 2010; 40 (7): 525-37
3. Van Poppel MNM, Chinapaw MJM, Mokkink LB, et al. Physical activity questionnaires for adults: a systematic review of measurement properties. Sports Med 2010; 40 (7): 565-600
4. Nelson ME, Rejeski WJ, Blair SN, et al. Physical activity and public health in older adults: recommendation from the American College of Sports Medicine and the American Heart Association. Med Sci Sports Exerc 2007; 39 (8): 1435-45
5. American College of Sports Medicine, Chodzko-Zajko WJ, Proctor DN, et al. American College of Sports Medicine position stand: exercise and physical activity for older adults. Med Sci Sports Exerc 2009; 41 (7): 1510-30
6. Stewart KJ. Physical activity and aging. Ann N Y Acad Sci 2005; 1055: 193-206
7. Taylor AH, Cable NT, Faulkner G, et al. Physical activity and older adults: a review of health benefits and the effectiveness of interventions. J Sports Sci 2004; 22 (8): 703-25
8. Jørstad-Stein EC, Hauer K, Becker C, et al. Suitability of physical activity questionnaires for older adults in fall-prevention trials: a systematic review. J Aging Phys Act 2005; 13 (4): 461-81
9. Prevention of Falls Network Europe [online]. Available from URL: http://www.profane.eu.org [Accessed 2010 May 11]
10. European Network for Action on Ageing and Physical Activity [online]. Available from URL: http://www.eunaapa.org/ [Accessed 2010 May 11]
11. de Vet HCW. Observer reliability and agreement. In: Armitage P, Colton T, editors. Encyclopedia of biostatistics. Boston (MA): John Wiley & Sons Ltd., 1998: 3123-8
12. Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007; 60 (1): 34-42
13. Schuit AJ, Schouten EG, Westerterp KR, et al. Validity of the Physical Activity Scale for the Elderly (PASE): according to energy expenditure assessed by the doubly labeled water method. J Clin Epidemiol 1997; 50 (5): 541-6
14. Masse LC, Fulton JE, Watson KL, et al. Influence of body composition on physical activity validation studies using doubly labeled water. J Appl Physiol 2004; 96 (4): 1357-64
15. Lamonte MJ, Ainsworth BE. Quantifying energy expenditure and physical activity in the context of dose response. Med Sci Sports Exerc 2001; 33 (6 Suppl.): S370-8; discussion S419-20
16. Guyatt GH, Deyo RA, Charlson M, et al. Responsiveness and validity in health status measurement: a clarification. J Clin Epidemiol 1989; 42 (5): 403-8
17. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989; 27 (3 Suppl.): S178-89
18. Bonnefoy M, Kostka T, Berthouze SE, et al. Validation of a physical activity questionnaire in the elderly. Eur J Appl Physiol Occup Physiol 1996; 74 (6): 528-33
19. Cust AE, Smith BJ, Chau J, et al. Validity and repeatability of the EPIC physical activity questionnaire: a validation study using accelerometers as an objective measure. Int J Behav Nutr Phys Act 2008; 5: 33
20. Cyarto EV, Marshall AL, Dickinson RK, et al. Measurement properties of the CHAMPS physical activity questionnaire in a sample of older Australians. J Sci Med Sport 2006; 9 (4): 319-26
21. Deng HB, Macfarlane DJ, Thomas GN, et al. Reliability and validity of the IPAQ-Chinese: the Guangzhou Biobank Cohort study. Med Sci Sports Exerc 2008; 40 (2): 303-7
22. Giles K, Marshall AL. Repeatability and accuracy of CHAMPS as a measure of physical activity in a community sample of older Australian adults. J Phys Act Health 2009; 6 (2): 221-9
23. Hagiwara A, Ito N, Sawai K, et al. Validity and reliability of the Physical Activity Scale for the Elderly (PASE) in Japanese elderly people. Geriatr Gerontol Int 2008; 8 (3): 143-51
24. Harada ND, Chiu V, King AC, et al. An evaluation of three self-report physical activity instruments for older adults. Med Sci Sports Exerc 2001; 33 (6): 962-70
25. Matton L, Wijndaele K, Duvigneaud N, et al. Reliability and validity of the Flemish Physical Activity Computerized Questionnaire in adults. Res Q Exerc Sport 2007; 78 (4): 293-306
26. Meyer AM, Evenson KR, Morimoto L, et al. Test-retest reliability of the Women’s Health Initiative physical activity questionnaire. Med Sci Sports Exerc 2009; 41 (3): 530-8
27. O’Brien-Cousins S. An older adult exercise status inventory: reliability and validity. J Sport Behav 1996; 19 (4): 288-306
28. Orsini N, Bellocco R, Bottai M, et al. Reproducibility of the past year and historical self-administered total physical activity questionnaire among older women. Eur J Epidemiol 2007; 22 (6): 363-8
29. Orsini N, Bellocco R, Bottai M, et al. Validity of self-reported total physical activity questionnaire among older women. Eur J Epidemiol 2008; 23 (10): 661-7
30. Pols MA, Peeters PH, Kemper HC, et al. Repeatability and relative validity of two physical activity questionnaires in elderly women. Med Sci Sports Exerc 1996; 28 (8): 1020-5
31. Stewart AL, Mills KM, King AC, et al. CHAMPS physical activity questionnaire for older adults: outcomes for interventions. Med Sci Sports Exerc 2001; 33 (7): 1126-41
32. Taylor-Piliae RE, Norton LC, Haskell WL, et al. Validation of a new brief physical activity survey among men and women aged 60–69 years. Am J Epidemiol 2006; 164 (6): 598-606
33. Washburn RA, Smith KW, Jette AM, et al. The Physical Activity Scale for the Elderly (PASE): development and evaluation. J Clin Epidemiol 1993; 46 (2): 153-62
34. Yasunaga A, Park H, Watanabe E, et al. Development and evaluation of the physical activity questionnaire for elderly Japanese: the Nakanojo study. J Aging Phys Act 2007; 15 (4): 398-411
35. Unit for Public Health Nutrition; Unit for Preventive nutrition. International Physical Activity Questionnaire (IPAQ) [online]. Available from URL: http://www.ipaq.ki.se [Accessed 2010 May 11]
36. Dinger MK, Oman RF, Taylor EL, et al. Stability and convergent validity of the Physical Activity Scale for the Elderly (PASE). J Sports Med Phys Fitness 2004; 44 (2): 186-92
37. Pols MA, Peeters PH, Kemper HC, et al. Methodological aspects of physical activity assessment in epidemiological studies. Eur J Epidemiol 1998; 14 (1): 63-70
38. Ferrari P, Friedenreich C, Matthews CE. The role of measurement error in estimating levels of physical activity. Am J Epidemiol 2007; 166: 832-40
39. Altschuler A, Picchi T, Nelson M, et al. Physical activity questionnaire comprehension: lessons from cognitive interviews. Med Sci Sports Exerc 2009; 41 (2): 336-43
Correspondence: Lisa Forsén, PhD, Norwegian Institute of Public Health, Division of Epidemiology, Marcus Thranes Gate 6, PO Box 4404 Nydalen, NO-0403 Oslo, Norway. E-mail:
[email protected]