Table 5 Standards and criteria for psychometrics and clinical utility provided by the authors of the reviews

From: An overview of systematic reviews on upper extremity outcome measures after stroke

| Review | Criteria of psychometrics or clinical utility provided by the authors of the reviews |
| --- | --- |
| Ashford [22] | Content validity, internal consistency, construct validity, test-retest reliability, agreement, responsiveness and interpretability: adequate design, method and results (Cronbach’s α: adequate 0.70-0.90; ICC > 0.70); minimal clinically important difference presented; floor/ceiling effect ≤ 15%; time to administer < 10 min; administrative burden: items easy to sum |
| Baker [21] | Psychometric testing has been performed |
| Connell [31] | Clinical utility criteria totaling ≥ 8 points (time to administer and interpret ≤ 30 min, cost ≤ £100, simple equipment, portability); reliability/validity (kappa, correlation coefficients; ICC/r: strong ≥ 0.80, moderate 0.60-0.80, weak 0.40-0.60); ability to detect change (measurement error, standardized response mean, standard error of measurement, limits of agreement, minimal detectable change) |
| Croarkin [32] | Significant correlations (p < 0.05) for test-retest reliability, inter-rater reliability and validity (convergent, concurrent); level of evidence 1 = meets all 3 psychometric criteria, level 2 = meets 2 of 3 criteria |
| Hillier [23] | Sound psychometrics: content and construct validity, reliability, sensitivity to change, utility (interpretability, acceptability, relevance) |
| Simpson [35] | MCID values calculated (related to an effect size of 0.2; anchor-based method using a clinical scale, global rating or percentage of recovery) |
| Sivan [14] | Test-retest reliability (ICC/kappa: high/excellent ≥ 0.75, moderate 0.40-0.74, poor < 0.40); internal consistency (Cronbach’s α: high/excellent > 0.80, adequate 0.70-0.79, low < 0.70); validity (correlation coefficient: excellent r > 0.60, adequate 0.30-0.59, poor < 0.30; area under the curve: excellent > 0.90, adequate 0.70-0.89, poor < 0.70); responsiveness (effect size: large > 0.8, moderate 0.50-0.79, small < 0.50; other adequate responsiveness methods; MCID value); floor/ceiling effect (excellent 0%, adequate < 20%, poor > 20%); respondent burden (time, acceptance: excellent < 15 min; adequate: longer time, lower acceptability; poor: lengthy, acceptability problem); administrative burden (excellent: scoring by hand, easy to interpret; adequate: computer scoring, obscure interpretation; poor: costly, complex scoring/interpretation) |
| Tse [36] | Inter-rater and test-retest reliability (kappa/r/ICC > 0.75); internal consistency (Cronbach’s α > 0.80); content validity; construct validity (adequate method, r ≥ 0.60) |
| van Peppen [37] | Valid for stroke; test-retest reliability and concurrent validity (ICC/r ≥ 0.70); responsiveness (high/low); time to administer ≤ 15 min; test protocol available; level of evidence 1 = meets all 6 criteria, level 2 = meets 5 of 6 criteria |
| Velstra [38] | Reliability (correlation coefficient, kappa, Cronbach’s α, ICC): very good or good/moderate; responsiveness (effect size, standardized response mean): moderate or large |
Abbreviations: ICC, intraclass correlation coefficient; MCID, minimal clinically important difference.
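The internal-consistency thresholds in the table (Ashford [22], Sivan [14], Tse [36]) all apply to Cronbach’s α. The sketch below computes α from a respondents × items score matrix with the standard formula α = k/(k − 1) · (1 − Σ item variances / total-score variance) and checks it against each review’s cut-offs. The function names and the small data set are hypothetical, and how values falling in the gaps between Sivan’s bands (e.g. exactly 0.80) are handled is an assumption, not something the reviews specify.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[r][i] is respondent r's score on item i.

    Sample variances (n - 1 denominator) are used for both the items and
    the total score, so the denominator choice cancels out in the ratio.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def ashford_alpha_adequate(alpha: float) -> bool:
    # Ashford [22]: adequate when 0.70 <= alpha <= 0.90
    return 0.70 <= alpha <= 0.90


def sivan_alpha_rating(alpha: float) -> str:
    # Sivan [14]: high/excellent > 0.80, adequate 0.70-0.79, low < 0.70.
    # Assumption: gap values (e.g. exactly 0.80) are rated "adequate".
    if alpha > 0.80:
        return "high/excellent"
    if alpha >= 0.70:
        return "adequate"
    return "low"


def tse_alpha_adequate(alpha: float) -> bool:
    # Tse [36]: alpha > 0.80
    return alpha > 0.80


# Hypothetical data: 4 respondents x 2 items
items = [[2, 3], [3, 3], [4, 5], [5, 5]]
alpha = cronbach_alpha(items)
print(round(alpha, 3))  # 0.941
print(ashford_alpha_adequate(alpha), sivan_alpha_rating(alpha), tse_alpha_adequate(alpha))
# False high/excellent True
```

The example shows that the same α can pass one review’s criterion and fail another’s: Ashford’s range has an upper bound, while Sivan’s and Tse’s do not.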
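Floor and ceiling effects (Ashford [22], Sivan [14]) are usually reported as the percentage of respondents scoring at the lowest or highest possible value of a measure. Here is a minimal sketch, assuming a measure with known minimum and maximum scores. The scores are hypothetical, and treating exactly 20% as “poor” under Sivan’s bands is an assumption because the published bands leave that value uncovered.

```python
def floor_ceiling_percent(
    scores: list[float], min_score: float, max_score: float
) -> tuple[float, float]:
    """Percentage of respondents at the lowest and highest possible score."""
    n = len(scores)
    floor = 100 * sum(s == min_score for s in scores) / n
    ceiling = 100 * sum(s == max_score for s in scores) / n
    return floor, ceiling


def ashford_floor_ceiling_ok(floor: float, ceiling: float) -> bool:
    # Ashford [22]: floor/ceiling effect <= 15%
    return floor <= 15 and ceiling <= 15


def sivan_floor_ceiling_rating(percent: float) -> str:
    # Sivan [14]: excellent 0%, adequate < 20%, poor > 20%.
    # Assumption: exactly 20% is rated "poor".
    if percent == 0:
        return "excellent"
    if percent < 20:
        return "adequate"
    return "poor"


# Hypothetical scores on a 0-10 scale
scores = [0, 0, 3, 5, 7, 8, 10, 10, 10, 10]
floor, ceiling = floor_ceiling_percent(scores, 0, 10)
print(floor, ceiling)  # 20.0 40.0
print(ashford_floor_ceiling_ok(floor, ceiling))  # False
print(sivan_floor_ceiling_rating(ceiling))  # poor
```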
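Several reviews judge responsiveness by effect size and standardized response mean (Connell [31], Sivan [14], Velstra [38]). The sketch below uses two common definitions. The effect size is the mean change divided by the standard deviation of baseline scores, and the SRM is the mean change divided by the standard deviation of the change scores. The reviews may have used other variants. The paired data are hypothetical. Rating the absolute value, so that measures where lower scores mean improvement are handled, is an assumption, as is how the gap between 0.79 and 0.8 in Sivan’s bands is treated.

```python
from statistics import mean, stdev


def change_scores(baseline: list[float], follow_up: list[float]) -> list[float]:
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up scores must be paired")
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline: list[float], follow_up: list[float]) -> float:
    # Mean change divided by the SD of the baseline scores
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline: list[float], follow_up: list[float]) -> float:
    # Mean change divided by the SD of the change scores
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def sivan_effect_size_rating(es: float) -> str:
    # Sivan [14]: large > 0.8, moderate 0.50-0.79, small < 0.50.
    # Assumptions: absolute value is rated; values in (0.79, 0.8] count as moderate.
    es = abs(es)
    if es > 0.8:
        return "large"
    if es >= 0.5:
        return "moderate"
    return "small"


# Hypothetical paired scores for 5 patients
baseline = [10, 12, 14, 16, 18]
follow_up = [13, 14, 18, 19, 21]
print(round(effect_size(baseline, follow_up), 2))  # 0.95
print(round(standardized_response_mean(baseline, follow_up), 2))  # 4.24
print(sivan_effect_size_rating(effect_size(baseline, follow_up)))  # large
```

The two indices can differ a lot for the same data: here the change is very consistent across patients, so the SRM is far larger than the effect size.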
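Connell [31] lists the standard error of measurement and the minimal detectable change among the indicators of ability to detect change, and grades reliability coefficients as strong, moderate or weak. Here is a minimal sketch, assuming the commonly used formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. Other SEM formulas exist, and the reviews do not state which one they relied on. The SD and ICC values are hypothetical. The “unclassified” label for coefficients below 0.40 is also an assumption, since Connell’s bands stop there.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    # One common formula: SEM = SD * sqrt(1 - ICC)
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie in [0, 1]")
    return sd * math.sqrt(1 - icc)


def mdc95(sem: float) -> float:
    # Smallest test-retest difference exceeding measurement error
    # at 95% confidence: 1.96 * sqrt(2) * SEM
    return 1.96 * math.sqrt(2) * sem


def connell_strength(coefficient: float) -> str:
    # Connell [31]: strong >= 0.80, moderate 0.6-0.8, weak 0.4-0.6
    if coefficient >= 0.80:
        return "strong"
    if coefficient >= 0.60:
        return "moderate"
    if coefficient >= 0.40:
        return "weak"
    return "unclassified"  # assumption: below Connell's lowest band


# Hypothetical measure: SD = 10 points, test-retest ICC = 0.91
sem = sem_from_icc(10, 0.91)
print(round(sem, 2))  # 3.0
print(round(mdc95(sem), 2))  # 8.32
print(connell_strength(0.91))  # strong
```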