Prognostic models for complete recovery in ischemic stroke: a systematic review and meta-analysis

Background: Prognostic models have increasingly been developed to predict complete recovery in ischemic stroke. However, the performance characteristics of these models remain uncertain. The aim of this study was to systematically review and synthesize the performance of existing prognostic models for complete recovery in ischemic stroke.
Methods: We searched journal publications indexed in PUBMED, SCOPUS, CENTRAL, ISI Web of Science and OVID MEDLINE from inception until 4 December 2017 for studies designed to develop and/or validate prognostic models for predicting complete recovery in ischemic stroke patients. Two reviewers independently examined titles and abstracts, assessed whether each study met the pre-defined inclusion criteria, and extracted information about model development and performance. We summarized model validation using the median area under the receiver operating characteristic curve (AUC) or c-statistic and the reported calibration performance. We used a random-effects meta-analysis to pool AUC values.
Results: We included 10 studies with 23 models developed from elderly patients with moderately severe ischemic stroke, mainly in three high income countries. Sample sizes ranged from 75 to 4441. Logistic regression was the only analytical strategy used to develop the models. The number of predictors per model ranged from one to 11. Internal validation was performed in 12 models, with a median AUC of 0.80 (95% CI 0.73 to 0.84). One model reported good calibration. Nine models reported external validation, with a median AUC of 0.80 (95% CI 0.76 to 0.82). Four models showed good discrimination and calibration on external validation. The pooled AUC from the two validations of the same developed model was 0.78 (95% CI 0.71 to 0.85).
Conclusions: The performance of the 23 models found in the systematic review varied from fair to good in terms of internal and external validation. Further models should be developed with internal and external validation in low and middle income countries.
Electronic supplementary material: The online version of this article (10.1186/s12883-018-1032-5) contains supplementary material, which is available to authorized users.


Background
Globally, stroke is the second leading cause of death after ischemic heart disease and the third leading cause of disability [1,2]. In 2013, there were 6.5 million deaths from stroke (51% from ischemic stroke), 113 million disability-adjusted life years lost because of stroke (58% due to ischemic stroke) and 10.3 million people with new strokes (67% ischemic) [1]. In 2015, the prevalence of stroke was 42.4 million people, of whom 24.9 million had ischemic stroke. There were 6.3 million stroke deaths worldwide, of which 3.0 million were due to ischemic stroke [2].
Minimizing the time to treatment for stroke is key to improving the chances of an excellent outcome ("time lost is brain lost") [3]. It is also important to be able to predict the outcomes of diseases or treatments. Most physicians rely on their own clinical experience to predict their patients' outcomes when making decisions in patient care management. The accuracy of these informal predictions is unclear. Care management might be improved if physicians combined their clinical forecasts with the formal predictions provided by statistical models, which may be more accurate than relying on clinical experience alone. Prognostic models are statistical tools that assist physicians in making decisions which may affect their patients' outcomes [4].
Accurate prognostic models of complete functional recovery after ischemic stroke could benefit neurological care practice for a number of reasons. Firstly, information from a prognostic model could be used to select appropriate treatments and action plans in individual patient management, including patient counseling. Secondly, such models could be used to improve rehabilitation and discharge planning. Lastly, in light of a weakening economy, prognostic models could be used to make the best clinical choices for patients in specific clinical scenarios, which may reduce health care costs [5].
To date, several studies have developed prognostic models to predict functional outcomes after ischemic stroke, and each model has different strengths and weaknesses. Since models do not always work well in practice, it is recommended that the performance of a prognostic model be properly evaluated before it is used in clinical practice. This process is known as model validation and involves an assessment of calibration (the agreement between the observed and predicted outcomes) and discrimination (the model's ability to discriminate between those patients who are likely or unlikely to experience a particular prognostic event). Poor calibration usually reflects over-fitting of the model in the development sample. At a minimum, the internal validity of a model should be determined (for example, using bootstrap sampling) to assess validity for the setting in which the development data originated. External validity (assessed using patient data not used in model development) should also be examined to assess generalizability [6,7].
There may be danger in moving too quickly to use these models without appropriate validation and an understanding of their limitations. The purpose of this study was to systematically review and synthesize the performance of existing prognostic models that have been used to predict the probability of complete recovery in ischemic stroke, and to investigate their quality.

Selection criteria
We included studies that predicted complete recovery after ischemic stroke and in which complete recovery was defined by at least one of the following: a Barthel Index (BI) score ≥ 95/100 or 19/20, a Glasgow Outcome Scale (GOS) score = 1, an Oxford Handicap Scale (OHS) score ≤ 2, or a Modified Rankin Scale (mRS) score ≤ 1. A further criterion was that the studies reported model performance using the concordance statistic, the area under the receiver operating characteristic curve (AUC) or calibration performance. There were no restrictions on the timing of the outcome evaluation, the age of the patients, or the type or severity of ischemic stroke.

Search strategy
We searched PUBMED, SCOPUS, CENTRAL, ISI Web of Science and OVID MEDLINE for prognostic models published from inception until 4 December 2017, using the search terms listed in Additional file 1, without restrictions on publication language. We also reviewed the reference lists of relevant studies.

Study selection and data extraction
Study titles and abstracts were independently screened and selected by two reviewers (NJ and SR) using the specified criteria. If a decision could not be made based on the abstract, we considered the full text. Disagreements were resolved through discussion with a third reviewer (ML). We extracted the performance measures (concordance statistics, AUCs and calibration performance) for both types of prediction model: development models and validation models. We also extracted study characteristics: author(s), publication year, setting, study design, definition of outcome, number of subjects, number of outcome events, age, ischemic stroke severity and duration of follow-up.

Quality assessment
We assessed study quality using an adaptation of the tool developed by D'Amico et al. [8]. We report how each study performed against each of the major methodological requirements for prognosis research studies. The assessment items were as follows: Did the prognostic study use a cohort design? Were the predictors clearly defined and details provided of how they were measured? Were the missing data handled appropriately with statistical imputation? Was some form of stepwise analysis used for selecting predictors in a multivariable analysis? Was the sample size adequate as defined by an events-per-variable ratio of 10 or more? Was the final model validated on the patients who were used to generate the model (internal validation)? Was the final model validated on patients who were not used to generate the model (external validation)?
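The sample size item can be illustrated with a minimal Python sketch. It assumes that "events" means the number of patients with complete recovery and that "variables" means the number of predictors entered in the multivariable model; the function names and the example numbers are hypothetical and are not taken from any included study.

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Return the events-per-variable (EPV) ratio for a candidate model."""
    if n_events < 0:
        raise ValueError("n_events must be non-negative")
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def sample_size_adequate(n_events: int, n_predictors: int,
                         threshold: float = 10.0) -> bool:
    """Apply the quality criterion: EPV of `threshold` (default 10) or more."""
    return events_per_variable(n_events, n_predictors) >= threshold


if __name__ == "__main__":
    # Hypothetical model: 120 complete-recovery cases, 8 predictors.
    print(events_per_variable(120, 8), sample_size_adequate(120, 8))
```

Some authors instead count the smaller of the two outcome groups as the number of events; the sketch follows the simpler definition above.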

Statistical analysis
We synthesized model performance qualitatively because each model had a different combination of predictor variables. We used frequencies and medians with 95% confidence intervals to describe model performance, which included calibration (how closely predicted values agree with the observed values) and discrimination (the model's ability to discriminate between patients developing and not developing an outcome event, e.g., complete recovery cases and non-complete recovery cases among ischemic stroke patients). Calibration was assessed using either the Hosmer-Lemeshow chi-square test or a calibration curve. Discrimination was assessed using either the AUC or the concordance statistic (C-statistic) along with a 95% CI. The discrimination of each model was graded in accordance with the suggestions of Hosmer and Lemeshow: excellent (AUC ≥ 0.90), good (AUC ≥ 0.80 and < 0.90), fair (AUC ≥ 0.70 and < 0.80), and poor (AUC < 0.70). Calibration was judged as good when a calibration curve closely resembled the line representing perfect calibration (the pre-specified acceptable absolute mean error for the calibration curve was < 0.4) or when the Hosmer-Lemeshow chi-square test was non-significant [9,10]. For studies that reported only AUC point estimates, we estimated the 95% CIs using Hanley's method, which requires three quantities: the total sample size, the number of events and the AUC [11]. If discrimination was reported for two or more validations of the same model, we pooled the AUCs using a random-effects inverse-variance meta-analysis in Stata version 10.1 [12].
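The calculations described in this section can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code used in the review: the standard error follows the Hanley and McNeil approximation, the interval is a symmetric normal-approximation interval truncated to [0, 1], and the random-effects pooling assumes the DerSimonian-Laird estimator of between-study variance (the text states only that a random-effects inverse-variance analysis was run in Stata). All function names and example inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple

Z_95 = 1.96  # approximate normal quantile for a two-sided 95% interval


def hanley_mcneil_se(auc: float, n_events: int, n_total: int) -> float:
    """Standard error of an AUC from the AUC, event count and sample size."""
    n_pos = n_events            # patients with the outcome
    n_neg = n_total - n_events  # patients without the outcome
    if not 0.0 <= auc <= 1.0 or n_pos < 1 or n_neg < 1:
        raise ValueError("need 0 <= auc <= 1 and at least one case per group")
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(max(variance, 0.0))


def auc_ci(auc: float, se: float) -> Tuple[float, float]:
    """Normal-approximation 95% CI, truncated to the valid AUC range."""
    return max(0.0, auc - Z_95 * se), min(1.0, auc + Z_95 * se)


def classify_auc(auc: float) -> str:
    """Grade discrimination using the thresholds stated in the text."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    return "poor"


def pool_random_effects(
    aucs: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird pooling; returns (pooled, ci_low, ci_high, tau2)."""
    if len(aucs) != len(ses) or len(aucs) < 2:
        raise ValueError("need at least two estimates with matching SEs")
    w = [1.0 / se ** 2 for se in ses]  # fixed-effect inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * a for wi, a in zip(w, aucs)) / sum_w
    q = sum(wi * (a - fixed) ** 2 for wi, a in zip(w, aucs))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(aucs) - 1)) / c)  # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * a for wi, a in zip(w_re, aucs)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - Z_95 * se_pooled, pooled + Z_95 * se_pooled, tau2


if __name__ == "__main__":
    # Hypothetical validation results, not taken from any included study.
    se_a = hanley_mcneil_se(0.75, n_events=150, n_total=400)
    se_b = hanley_mcneil_se(0.81, n_events=300, n_total=900)
    print(auc_ci(0.75, se_a), classify_auc(0.75))
    print(pool_random_effects([0.75, 0.81], [se_a, se_b]))
```

With only two validation samples the between-study variance is estimated very imprecisely, so a pooled interval of this kind should be read with caution.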

Patient characteristics of included models
All included models were developed from elderly patients in high income countries: five studies from the United States of America [13-16, 22], three from Germany [17-19] and one from the Netherlands [21]. One study [20] did not report the study setting but used data from the Virtual International Stroke Trials Archive. Twenty models were developed from patients with moderately severe ischemic stroke based on the NIHSS score [23]; for the remaining three models, ischemic stroke severity was not reported. The sample sizes from which the models were developed ranged from 75 to 4441, and the number of complete recovery cases ranged from 33 to 1970. Complete recovery was measured at around 90 days after diagnosis of ischemic stroke in six studies [13-16, 20, 22], at 100 days in three studies [17-19], and at 365 days in one study [21]. Table 1 presents the characteristics of the models.

Model predictors
A total of 24 different variables were included in the 23 models. The number of variables included in each model ranged from 1 to 11. The National Institutes of Health Stroke Scale (NIHSS) was the most common predictor (70.8%) followed by age (62.5%) and infarct volume (50.0%). Table 2 presents details of predictor variables.

Model performances
There were 11 development models which reported AUC values. The median AUC was 0.80 (95% CI 0.77 to 0.85) (see Table 4 and Fig. 2).

External validation
Two studies [16,19] reported external validation of their five developed models (Models No. 13-16 and 18). Model No. 18 was validated in two different samples in two studies [19,20]; one study included patients from Germany [19] and the other included patients from many countries [20]. Three other studies [15,18,20] reported external validation of four pre-existing models (Fig. 4). The median AUC of these nine validation models was 0.80 (95% CI 0.76 to 0.82) (see Table 4 and Fig. 5).

Discussion
This systematic review identified 23 prognostic models from ten studies for complete recovery in ischemic stroke. None of these models provided complete information about model performance covering both internal and external validation. While most prognostic models (18/23) were validated and about half of the models (12/23) reported fair to good discrimination on internal validation, only one model showed good calibration. About 40% of the models (9/23) were externally validated and reported fair to good discrimination, but only a quarter of the models (6/23) reported nearly perfect calibration. Only two models were validated both internally and externally, and neither reported a complete set of performance measures.
A meta-analysis could be performed for only one model, and its pooled AUC was fair.
The models were developed and validated in elderly patients, mainly with moderately severe stroke and mainly from high income countries. In addition, most of the developed models were not externally validated. These factors are likely to limit the application of the models to other populations and settings. In our review, we conducted a systematic search of several electronic databases. All of the included studies used a cohort design. In half of the identified models, more than 10% of subjects had missing data, and these subjects were excluded from the model performance analyses. This strategy could lead to biased conclusions if the reasons for missing data were related to important prognostic indicators or outcomes.
There are some issues related to the assessed quality of the prognostic models. Firstly, our search was performed up to 4 December 2017, and no attempt was made to search for unpublished studies. Studies were selected and extracted independently by two reviewers. Because of insufficient data, we did not assess publication bias using statistical tests or funnel plot asymmetry. However, we assessed and presented the quality of all 10 selected studies against each of the important quality features listed in our methods section. Secondly, about 70% (16/23) of the models used the full model approach to predictor selection (all candidate predictors included in the multivariable analysis) [13,14,16]. This approach could reduce the risk of predictor selection bias and over-fitting. However, this technique is difficult to apply if the number of events is limited [6]. In our review, all of the models that used the full model approach fulfilled the requirement of more than 10 patients with complete recovery per predictor. Finally, the measures of prognostic model performance were incompletely reported in all included models. The 95% confidence intervals of the estimated performance indices were rarely reported. The 95% confidence intervals for AUCs which we estimated using Hanley's method may be slightly inaccurate, but this approach is accepted for estimating the precision of AUCs when their standard errors are not reported. In addition, calibration performance was often ignored.
Calibration is an important performance measure for the application of a model in practice. Poor calibration reflects over-fitting of a model and can also be interpreted as reflecting a need for shrinkage of the regression coefficients in a prognostic model [10]. To our knowledge, there are three previous systematic reviews of prognostic models in stroke, but their outcomes of interest differed from ours: mortality in hemorrhagic stroke, recurrent stroke, and survival of stroke patients. Therefore, our results could not be compared directly with those of the previous reviews. However, while the discrimination performance of their prognostic models varied from poor to good, calibration performance was not considered. The first study was a systematic review of prognostic tools for early mortality in hemorrhagic stroke [24]. The authors selected 11 articles (12 prognostic tools). The Hemphill intracerebral hemorrhage (ICH) model had the largest number of validation cohorts (nine articles) and showed good performance, with a pooled AUC of 0.80 (95% CI 0.77 to 0.85). The second study was a systematic review of prognostic models to predict survival in patients with acute stroke. The authors found 83 models, but only three models were externally validated, and these showed fair to good discrimination [25]. The final study was a systematic review of prediction models for recurrent stroke and myocardial infarction after stroke. The authors showed that the models for recurrent stroke discriminated poorly between patients with and without a recurrent stroke, with pooled AUCs of 0.60 (95% CI 0.59 to 0.62) for the Essen Stroke Risk Score (ESRS) and 0.62 (95% CI 0.60 to 0.64) for the Stroke Prognosis Instrument II (SPI-II) [26].
Our findings suggest that some of the current prognostic models for predicting complete recovery from ischemic stroke may be clinically useful when applied to patients from high income countries who have experienced moderately severe ischemic stroke. The performance of Model No. 9, developed by Johnston et al. [14], suggests that the model was not over-fitted to the data set and is likely to be useful in predicting complete recovery from ischemic stroke in a similar population. Models No. 3, 6, 13 and 15 together involved eight predictors: NIHSS score, age, infarct volume, history of diabetes mellitus, history of stroke, prestroke disability, small-vessel stroke and tissue-type plasminogen activator (t-PA) use. Some predictors overlapped among the models, as shown in Table 2. These models fulfilled the majority of the methodological requirements and showed acceptable performance in external validation for both discrimination and calibration. We recommend that these models be used in other settings.

Conclusions
This systematic review has shown that, while many prognostic models have been published, they are rarely validated in external populations, and most of the models were developed from elderly patients with moderately severe ischemic stroke, mainly in high income countries. There is a need for the development of models in other settings, especially in low and middle income populations. All models should be validated, and performance measures should be reported which address the two key issues of discrimination and calibration.