Prognostic models for complete recovery in ischemic stroke: a systematic review and meta-analysis
BMC Neurology volume 18, Article number: 26 (2018)
Prognostic models have been increasingly developed to predict complete recovery in ischemic stroke. However, questions arise about the performance characteristics of these models. The aim of this study was to systematically review and synthesize the performance of existing prognostic models for complete recovery in ischemic stroke.
We searched journal publications indexed in PUBMED, SCOPUS, CENTRAL, ISI Web of Science and OVID MEDLINE from inception until 4 December, 2017, for studies designed to develop and/or validate prognostic models for predicting complete recovery in ischemic stroke patients. Two reviewers independently examined titles and abstracts, assessed whether each study met the pre-defined inclusion criteria, and independently extracted information about model development and performance. We evaluated model validation using medians of the area under the receiver operating characteristic curve (AUC) or c-statistic, together with calibration performance. We used a random-effects meta-analysis to pool AUC values.
We included 10 studies describing 23 models, developed from elderly patients with a moderately severe ischemic stroke, mainly in three high income countries. Sample sizes for each study ranged from 75 to 4441. Logistic regression was the only analytical strategy used to develop the models. The number of predictors per model varied from one to 11. Internal validation was performed for 12 models, with a median AUC of 0.80 (95% CI 0.73 to 0.84). One model reported good calibration. Nine models reported external validation, with a median AUC of 0.80 (95% CI 0.76 to 0.82). Four models showed good discrimination and calibration on external validation. The pooled AUC of two external validations of the same developed model was 0.78 (95% CI 0.71 to 0.85).
The performance of the 23 models found in the systematic review varied from fair to good in terms of internal and external validation. Further models should be developed with internal and external validation in low and middle income countries.
Globally, stroke is the second leading cause of death, following ischemic heart disease, and the third leading cause of disability [1, 2]. In 2013, there were 6.5 million deaths from stroke (51% from ischemic stroke), 113 million disability-adjusted life years lost because of stroke (58% due to ischemic stroke) and 10.3 million people with new strokes (67% ischemic stroke). In 2015, the prevalence of stroke was 42.4 million people, of whom 24.9 million had ischemic stroke. There were 6.3 million stroke deaths worldwide, and 3.0 million individuals died of ischemic stroke.
Minimizing the time to treatment for stroke is key to improving the chances of an excellent outcome (time lost is brain lost). It is also important to be able to predict the outcomes of diseases or treatments. Most physicians rely on their own clinical experience when predicting their patients' outcomes to make decisions in patient care management, but the accuracy of these informal predictions is unclear. Care management might be improved if physicians combined their clinical forecasts with the formal predictions provided by statistical models, which may be more accurate than relying on clinical experience alone. Prognostic models are statistical tools that assist physicians in making decisions which may affect their patients' outcomes.
Accurate prognostic models of the functional outcome of complete recovery in patients after ischemic stroke could benefit neurological care practice for a number of reasons. Firstly, the information from a developed prognostic model could be used to select appropriate treatments and action plans in individual patient management, including patient counseling. Secondly, such models could be used to improve rehabilitation and discharge planning. Lastly, in light of a weakening economy, prognostic models could be used to make the best clinical choices for patients in specific clinical scenarios, which may reduce health care costs.
To date, several studies have developed prognostic models to predict functional outcomes after ischemic stroke, and each model has different strengths and weaknesses. Since models do not always work well in practice, it is recommended that, before a prognostic model is used in clinical practice, the performance of the model should be properly evaluated. This process is known as model validation and involves an assessment of calibration (the agreement between the observed and predicted outcomes) and discrimination (the model's ability to discriminate between those patients who are likely or unlikely to experience a particular prognostic event). Poor calibration usually reflects over-fitting of the model in the development sample. At a minimum, a model's internal validity should be assessed (for example, using 'bootstrap sampling') to establish validity for the setting from which the development data originated. Another aspect is external validity (assessed using patient data not used to develop the model), which indicates generalizability [6, 7].
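The two performance measures described above can be illustrated with a small, self-contained sketch in pure Python. The function names and the simulated cohort are ours for illustration only and are not taken from any included study: discrimination is computed as a rank (Mann-Whitney) statistic, and calibration as the Hosmer-Lemeshow statistic over risk-decile groups.

```python
import random

def auc_mann_whitney(probs, labels):
    """Discrimination: AUC as the probability that a randomly chosen
    event receives a higher predicted probability than a non-event."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def hosmer_lemeshow_stat(probs, labels, groups=10):
    """Calibration: Hosmer-Lemeshow chi-square statistic over risk-decile
    groups (compared against a chi-square distribution with groups-2 df)."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)   # observed events in this decile
        expected = sum(p for p, _ in chunk)   # model-predicted events
        mean_p = expected / len(chunk)
        denom = len(chunk) * mean_p * (1 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat

# Simulated, perfectly calibrated cohort: each outcome is drawn with
# the predicted probability itself, so calibration should look good.
random.seed(0)
probs = [random.random() for _ in range(2000)]
labels = [1 if random.random() < p else 0 for p in probs]
auc = auc_mann_whitney(probs, labels)
hl = hosmer_lemeshow_stat(probs, labels)
```

Because the simulated outcomes are generated from the predicted probabilities, the Hosmer-Lemeshow statistic stays close to its degrees of freedom, while the AUC is well above 0.5; a poorly calibrated model would inflate the former without necessarily changing the latter.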
There may be danger in moving too quickly to use these models without appropriate validation and understanding of their limitations. The purpose of this study was to systematically review and synthesize the performance of existing prognostic models used to predict the probability of complete recovery in ischemic stroke, and to investigate their quality.
We included studies predicting the outcome of complete recovery after ischemic stroke and in which complete recovery was assessed by scores on at least one of the following instruments: the Barthel Index (BI) ≥ 95/100 or 19/20, the Glasgow Outcome Scale (GOS) score = 1, the Oxford Handicap Scale (OHS) score ≤ 2, and the Modified Rankin Scale (mRS) score ≤ 1. A further criterion was that the studies reported model performance by the use of the concordance statistic, area under the receiver operating characteristic curve (AUC) or calibration performance. There were no restrictions on timing of the outcome evaluation, age of the patients, or type/severity of ischemic stroke.
We searched PUBMED, SCOPUS, CENTRAL, ISI Web of Science and OVID MEDLINE for prognostic models published from inception until 4 December, 2017, using the search terms listed in the Additional file 1 without restrictions on publication language. We also reviewed the reference lists of relevant studies.
Study selection and data extraction
Study titles and abstracts were independently screened and selected by two reviewers (NJ and SR) using the specified criteria. If a decision could not be made based on the abstracts, we then considered their full texts. Disagreement was resolved through discussion with a third reviewer (ML). We extracted the performance measures (concordance statistics, AUCs and performance calibrations) of both types of prediction model: development models and validation models. We also extracted study characteristics: author(s), publication year, setting, study design, definition of outcome, number of subjects, number of outcome events, age, ischemic stroke severity and duration of follow-up.
We assessed study quality based on an adaptation of the tool developed by D'Amico et al. We recorded how each study performed against each of several major methodological requirements for prognosis research. The assessment items were as follows:
Did the prognostic study use a cohort design?
Were the predictors clearly defined and details provided of how they were measured?
Were the missing data handled appropriately with statistical imputation?
Was some form of stepwise analysis used for selecting predictors in a multivariable analysis?
Was the sample size adequate as defined by an events-per-variable ratio of 10 or more?
Was the final model validated on the patients who were used to generate the model (internal validation)?
Was the final model validated on the patients who were not used to generate the model (external validation)?
We qualitatively synthesized model performances because each separate model had a different combination of predictor variables. We used frequencies and medians with 95% confidence intervals to describe model performance, which included calibration (how closely predicted values agree with the observed values) and discrimination (the model's ability to discriminate between patients developing and not developing an outcome event, e.g., complete recovery cases and non-complete recovery cases among ischemic patients). The assessment of calibration was performed using either the Hosmer-Lemeshow chi-square test or a calibration curve. The assessment of discrimination was conducted using either the AUC or the concordance statistic (C-statistic) along with a 95% CI. The discrimination of each model was evaluated in accordance with the suggestions by Hosmer and Lemeshow: excellent (AUC ≥ 0.90), good (0.80 ≤ AUC < 0.90), fair (0.70 ≤ AUC < 0.80), and poor (AUC < 0.70). Calibration was judged as good when a calibration curve closely resembled the line representing perfect calibration (the pre-specified acceptable absolute mean error for the calibration curve was < 0.4) or when the Hosmer-Lemeshow chi-square test was non-significant [9, 10]. We estimated the 95% CIs for AUCs using Hanley's method for studies which presented only AUCs; the estimation requires three quantities: the total sample size, the number of events and the AUC. If two or more models assessed discrimination performance on validation, we performed a random-effects inverse-variance meta-analysis using Stata version 10.1.
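As a concrete sketch of the Hanley approach, the following illustrative implementation (our own; the input numbers are made up and not drawn from any included study) recovers a 95% CI from only the three quantities named above:

```python
import math

def hanley_auc_ci(auc, n_events, n_total, z=1.96):
    """Approximate 95% CI for an AUC via Hanley & McNeil's (1982)
    formula, requiring only the AUC, event count and total sample size."""
    n1 = n_events            # e.g. complete-recovery cases
    n2 = n_total - n_events  # non-recovery cases
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n1 - 1) * (q1 - auc ** 2)
           + (n2 - 1) * (q2 - auc ** 2)) / (n1 * n2)
    se = math.sqrt(var)
    return auc - z * se, auc + z * se

# Hypothetical example: AUC 0.80 observed with 100 events in 300 patients.
lo, hi = hanley_auc_ci(0.80, n_events=100, n_total=300)
```

The interval widens as the event count shrinks, which is why studies with few complete-recovery cases yield imprecise AUC estimates even when the point estimate looks good.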
A total of 896 articles were found by searching the electronic databases, and 10 studies were eligible [13,14,15,16,17,18,19,20,21,22]. Seven were development studies, while three were validation studies (see Fig. 1). Twenty-three different models were identified.
Patient characteristics of included models
All included models were developed from elderly patients in high income countries: five studies from the United States of America [13,14,15,16, 22], three from Germany [17,18,19] and one from the Netherlands. One study did not report its setting but used data from the Virtual International Stroke Trials Archive (VISTA), which included patients from many different countries.
Twenty models were developed from patients with a moderately severe ischemic stroke based on the NIHSS score, but for three models ischemic stroke severity was not reported. The sample sizes from which the models were developed ranged from 75 to 4441, and the number of complete recovery cases ranged from 33 to 1970. Complete recovery was measured at around 90 days after ischemic stroke diagnosis in six studies [13,14,15,16, 20, 22], at 100 days in three studies [17,18,19], and at 365 days in one study. Table 1 presents the characteristics of the models.
A total of 24 different variables were included in the 23 models. The number of variables included in each model ranged from 1 to 11. The National Institutes of Health Stroke Scale (NIHSS) was the most common predictor (70.8%) followed by age (62.5%) and infarct volume (50.0%). Table 2 presents details of predictor variables.
Quality of prognostic models
All 10 studies used a cohort design. Details about the measurement of predictors were presented for 13 of the 23 models (56.5%). All 10 studies handled missing data by excluding subjects from the analyses. The full model approach (all candidate predictors included in the multivariable analysis) was the most common method (69.6%) of predictor selection in multivariable modeling [13, 14, 16]. Two studies [13, 14] reported 12 development models (models No. 1-12) but provided no information on their discrimination performance; however, they did report discrimination performance on internal validation, and for only one model (model No. 9) was calibration performance reported on internal validation. Five studies [15, 16, 18,19,20] reported external validation for nine models (models No. 3, 6 and 13-18). Calibration performance was reported for six of these nine models, and four of them (models No. 3, 6, 13 and 15) showed good calibration. Only one study provided a 95% confidence interval for the AUC; those of the other nine studies [13,14,15,16,17,18,19,20, 22] were estimated using Hanley's method (see Table 4). Only two models (No. 3 and No. 6) were validated both internally and externally. Table 3 presents details of the quality of the prognostic models.
Model performance was evaluated by internal validation using bootstrapping methods in 12 models (models No. 1-12). The models were validated in samples of 222 and 206 elderly patients with a moderately severe ischemic stroke, and the number of events ranged from 92 to 125 (see Table 1). The median AUC for discrimination performance on internal validation was 0.82 (95% CI 0.73 to 0.87); only model No. 9 was reported to have good calibration (see Table 4 and Fig. 3).
Two studies [16, 19] reported external validation of their five developed models (models No. 13-16 and 18). Model No. 18 was validated in two different samples in two studies [19, 20]: one included patients from Germany and the other included patients from many countries. Three other studies [15, 18, 20] reported external validation of four pre-existing models (models No. 3, 6, 17 and 18).
On external validation, discrimination and calibration performance were reported for six models (models No. 3, 6 and 13-16). Four of the six (models No. 3, 6, 13 and 15) had good discrimination, with a median AUC of 0.81 (95% CI 0.80 to 0.83), and good calibration. Three other validations (model No. 17 and model No. 18, the latter reported in two external populations) reported only discrimination performance. The pooled AUC for model No. 18 was 0.78 (95% CI 0.71 to 0.85; two studies) (see Fig. 4). The median AUC of these nine validation models was 0.80 (95% CI 0.76 to 0.82) (see Table 4 and Fig. 5).
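The random-effects inverse-variance pooling used for the two external validations can be sketched as a minimal DerSimonian-Laird implementation. The two AUC/standard-error pairs below are hypothetical and are not the values reported by the included studies:

```python
import math

def pool_random_effects(estimates, ses, z=1.96):
    """DerSimonian-Laird random-effects inverse-variance pooling.
    Returns the pooled estimate with an approximate 95% CI."""
    w = [1.0 / s ** 2 for s in ses]                      # fixed-effect weights
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)      # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]          # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - z * se, pooled + z * se

# Hypothetical validation results for one model in two external cohorts:
pooled, lo, hi = pool_random_effects([0.75, 0.82], [0.03, 0.04])
```

When the heterogeneity estimate tau-squared is positive, the random-effects weights shrink toward equality and the pooled CI widens relative to a fixed-effect analysis, which is the behavior one wants when a model is validated in dissimilar populations.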
This systematic review identified 23 prognostic models for complete recovery in ischemic stroke from ten studies. None of these models came with complete information on model performance covering both internal and external validation. While most of the models (18/23) were validated in some form, and half (12/23) reported fair to good discrimination on internal validation, only one model showed good calibration. Nearly one third of the models (9/23) were externally validated and reported fair to good discrimination, but calibration performance on external validation was reported for only a quarter of the models (6/23). Only two models were validated both internally and externally, and even these lacked a complete assessment of model performance. There was only one model for which a meta-analysis could be performed, and the pooled AUC was fair.
The models were developed and validated in elderly patients mainly with a moderately severe stroke and mainly from high income countries. In addition, most of the developed models were not externally validated. These factors are likely to limit the application of the models to other populations and settings.
In our review, we conducted a systematic search of several electronic databases. All of the included studies used a cohort design. For half of the identified models, more than 10% of the subjects' data were missing, and the model performance analyses handled this by excluding the subjects with missing data. This strategy could lead to biased conclusions if the reasons for the missing data were related to important prognostic indicators or outcomes.
There are some issues related to the assessment of the quality of the prognostic models. Firstly, our search covered publications up to 4 December, 2017, and no attempt was made to find unpublished studies. Studies were selected and data extracted independently by two reviewers. We did not assess publication bias by statistical tests or funnel plot asymmetry because of insufficient data; however, we assessed and presented the quality of all 10 selected studies against each of the important quality features listed in our methods section. Secondly, about 70% (16/23) of the models used the full model approach to predictor selection (all candidate predictors included in the multivariable analysis) [13, 14, 16]. This approach could reduce the risk of predictor selection bias and over-fitting; however, it is difficult to apply when the number of events is limited. In our review, all of the models using the full model approach fulfilled the requirement of more than 10 patients with complete recovery per predictor. Finally, incomplete measures of prognostic model performance were reported for all included models. The 95% confidence intervals of the estimated performance indices were rarely reported, and those we estimated for AUCs using Hanley's method may be slightly inaccurate, although this approach is accepted for estimating the precision of AUCs when their standard errors are not reported. In addition, calibration performance was often ignored, although calibration is an important performance measure for application of a model in practice. Poor calibration reflects over-fitting of a model and can also be interpreted as a need for shrinkage of the regression coefficients of a prognostic model.
To our knowledge, there are three previous systematic reviews of prognostic models in stroke, but their outcomes of interest differed from ours: mortality in hemorrhagic stroke, recurrent stroke, and survival of stroke patients. Our results therefore cannot be compared directly with those of the previous reviews. However, while the discrimination performance of their prognostic models varied from poor to good, calibration performance was not considered. The first study was a systematic review of prognostic tools for early mortality in hemorrhagic stroke. The authors selected 11 articles (12 prognostic tools), but validation data were reported for only one of the prognostic tools; the Hemphill intracerebral hemorrhage (ICH) model had the largest number of validation cohorts (nine articles) and showed good performance with a pooled AUC of 0.80 (95% CI 0.77 to 0.85). The second study was a systematic review of prognostic models to predict survival in patients with acute stroke. The authors found 83 models, but only three were externally validated, showing fair to good discrimination. The final study was a systematic review of prediction models for recurrent stroke and myocardial infarction after stroke. The authors showed that the models for recurrent stroke discriminated poorly between patients with and without a recurrent stroke, with pooled AUCs of 0.60 (95% CI 0.59 to 0.62) for the Essen Stroke Risk Score (ESRS) and 0.62 (95% CI 0.60 to 0.64) for the Stroke Prognosis Instrument II (SPI-II).
Our findings suggest that some of the current prognostic models for predicting complete recovery from ischemic stroke may be clinically useful when applied to patients from high income countries who have experienced a moderately severe ischemic stroke. Model No. 9, developed by Johnston et al., showed good calibration, suggesting that the model was not over-fitted to its data set and is likely to be useful in predicting complete recovery from ischemic stroke in a similar population. Models No. 3, 6, 13 and 15 involved eight predictors, including NIHSS score, age, infarct volume, history of diabetes mellitus and stroke, prestroke disability, small-vessel stroke and tissue-type plasminogen activator (t-PA) use; some predictors overlapped among the models, as shown in Table 2. These models fulfilled the majority of the methodological requirements and showed acceptable performance on external validation for both discrimination and calibration. We recommend that these models should be used in other settings.
This systematic review has shown that, while many prognostic models have been published, they are rarely validated in external populations, and most of the models were developed from elderly patients with moderately severe ischemic stroke, mainly in high income countries. There is a need for the development of models in other settings, especially in low and middle income populations. All models should be validated, and performance measures should be reported which address the two key issues of discrimination and calibration.
Abbreviations
AUC: Area under a receiver operating characteristic (ROC) curve
ESRS: Essen Stroke Risk Score
GOS: Glasgow Outcome Scale
MRI: Magnetic resonance imaging
mRS: Modified Rankin Scale
NIHSS: National Institutes of Health Stroke Scale
OHS: Oxford Handicap Scale
SPI-II: Stroke Prognosis Instrument II
References
Feigin VL, Norrving B, Mensah GA. Global burden of stroke. Circ Res. 2017;120:439–48.
Benjamin EJ, Virani SS, Callaway CW, Chang AR, Cheng S, Chiuve SE, et al. Heart disease and stroke statistics-2018 update: a report from the American Heart Association. Circulation. 2018;137:1–442.
Moser DK, Kimble LP, Alberts MJ, Alonzo A, Croft JB, Dracup K, Evenson KR, Go AS, Hand MM, Kothari RU, Mensah GA. Reducing delay in seeking treatment by patients with acute coronary syndrome and stroke: a scientific statement from the American Heart Association Council on cardiovascular nursing and stroke council. J Cardiovasc Nurs. 2007;22:326–43.
Steyerberg E, Moons KGM, van der Windt D, Hayden J, Perel P, Schroter S, et al. Prognosis research strategy (PROGRESS) series 3: prognostic model research. PLoS Med. 2013;10:e1001381.
Vogenberg FR. Predictive and prognostic models: implications for healthcare decision-making in a modern recession. Am Health Drug Benefits. 2009;2:218.
Pavlou M, Ambler G, Seaman SR, Guttmann O, Elliott P, King M, et al. How to develop a more accurate risk prediction model when there are few events relative to the number of predictors. BMJ (Clinical research ed). 2015;351:h3868.
Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med. 1999;130:515–24.
D’Amico G, Malizia G, D’Amico M. Prognosis research and risk of bias. Intern Emerg Med. 2016;11:251–60.
Kleinbaum D, Klein M. Logistic regression: a self- learning text. 3rd ed. New York: Springer; 2010.
Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology (Cambridge, Mass). 2010;21:128.
Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.
StataCorp. Stata statistical software, Release 10. College Station: StataCorp LP; 2007.
Johnston KC, Connors AF, Wagner DP, Knaus WA, Wang X, Haley EC. A predictive risk model for outcomes of ischemic stroke. Stroke. 2000;31:448–55.
Johnston KC, Wagner DP, Haley EC, Connors AF. Combined clinical and imaging information as an early stroke outcome measure. Stroke. 2002;33:466–72.
Johnston KC, Connors AF Jr, Wagner DP, Haley EC Jr. Predicting outcome in ischemic stroke: external validation of predictive risk models. Stroke. 2003;34:200–2.
Johnston KC, Wagner DP, Wang X-Q, Newman GC, Thijs V, Sen S, et al. Validation of an acute ischemic stroke model: does diffusion-weighted imaging lesion volume offer a clinically significant improvement in prediction of outcome? Stroke. 2007;38:1820–5.
Weimar C, Ziegler A, König IR, Diener HC. Predicting functional outcome and survival after acute ischemic stroke. J Neurol. 2002;249:888–95.
German Stroke Study Collaboration. Predicting outcome after acute ischemic stroke: an external validation of prognostic models. Neurology. 2004;62:581–5.
Weimar C, König IR, Kraywinkel K, Ziegler A, Diener HC. Age and National Institutes of Health stroke scale score within 6 hours after onset are accurate predictors of outcome after cerebral ischemia. Stroke. 2004;35:158–62.
König IR, Ziegler A, Bluhmki E, Hacke W, Bath PM, Sacco RL, Diener HC, Weimar C. Predicting long-term outcome after acute ischemic stroke. Stroke. 2008;39:1821–6.
Schiemanck SK, Kwakkel G, Post MW, Kappelle LJ, Prevo AJ. Predicting long-term independency in activities of daily living after middle cerebral artery stroke. Stroke. 2006;37:1050–4.
Patti J, Helenius J, Puri AS, Henninger N. White matter Hyperintensity–adjusted critical infarct thresholds to predict a favorable 90-day outcome. Stroke. 2016;47:2526–33.
Hage V. The NIH stroke scale: a window into neurological status. NurseCom Nursing Spectrum (Greater Chicago). 2011;24:44–9.
Mattishent K, Kwok CS, Ashkir L, Pelpola K, Myint PK, Loke YK. Prognostic tools for early mortality in hemorrhagic stroke: systematic review and meta-analysis. J Clin Neurol. 2015;11:339–48.
Counsell C, Dennis M. Systematic review of prognostic models in patients with acute stroke. Cerebrovasc Dis. 2001;12:159–70.
Thompson DD, Murray GD, Dennis M, Sudlow CL, Whiteley WN. Formal and informal prediction of recurrent stroke and myocardial infarction after stroke: a systematic review and evaluation of clinical prediction models in a new cohort. BMC Med. 2014;12:58.
Acknowledgements
The authors wish to thank a native English language speaker, Peter Bradshaw, for line and copy editing drafts of the manuscript. The funding organization had no role in this research.
Availability of data and materials
We do not hold any original data as this is a review article.
Ethics approval and consent to participate
Not applicable. This study is a systematic review of published papers and as such did not require ethical approval or any consent.
Consent for publication
Not applicable. This study is a systematic review of published papers and does not contain any individual person’s data.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.