A systematic review and meta-analysis to evaluate the diagnostic accuracy of recognition of stroke in the emergency department (ROSIER) scale

Background The present study aims to evaluate the performance and clinical applicability of the Recognition of Stroke in the Emergency Department (ROSIER) scale via systematic review and meta-analysis. Methods The PubMed and Embase electronic databases were searched between 1st January 2005 (when the ROSIER was developed) and 8th May 2020. Studies that evaluated the diagnostic accuracy of the ROSIER scale were included. The sensitivity, specificity, diagnostic odds ratio (DOR), and area under the curve (AUC) were combined using a bivariate mixed-effects model. A Fagan nomogram was used to evaluate the clinical applicability of the ROSIER scale. Results A total of 14 studies incorporating 15 datasets were included in this meta-analysis. The combined sensitivity, specificity, DOR and AUC were 0.88 [95% confidence interval (CI): 0.83–0.91], 0.66 (95% CI: 0.52–0.77), 13.86 (95% CI: 7.67–25.07) and 0.88 (95% CI: 0.85–0.90), respectively. Given a pre-test probability of 60.0%, the Fagan nomogram suggested that the post-test probability increased to 79% when the ROSIER was positive and decreased to 22% when the ROSIER was negative. Subgroup analysis showed that the pooled sensitivity of the ROSIER in the European population was higher than that in Asia, whereas the pooled specificity did not differ significantly between them. Moreover, the male-to-female ratio ≤ 1.0 subgroup, the prehospital setting subgroup, and the other trained medical personnel subgroup had significantly higher sensitivity than their counterparts, with no significant differences in pooled specificity. Conclusions ROSIER is a valid scale with high clinical applicability, which not only has good diagnostic accuracy in Europe but also shows excellent performance in Asia. Moreover, the ROSIER scale exhibits good applicability in prehospital settings with other trained medical personnel.


Background
Stroke is a severe concern in the emergency department and remains the leading cause of death and disability [1,2]. Early identification of patients with stroke and prompt thrombolytic therapy can reduce morbidity and mortality [3,4]. However, owing to misdiagnosis and inappropriate triage, many patients miss the optimal time window for treatment [5][6][7]. Thus, a series of screening tools have been developed to help emergency physicians diagnose stroke rapidly and accurately [8][9][10][11]. The Recognition of Stroke in the Emergency Department (ROSIER) scale, which was developed by Nor and colleagues in 2005, is one of the commonly recommended stroke scales in the western world [7].
ROSIER is a 7-item recognition instrument (total score ranging from −2 to +5) that is based on the clinical history and neurological signs. A score of +1 or above is considered positive for stroke or transient ischemic attack [7]. Over the past decades, several studies have been conducted to validate the diagnostic accuracy of ROSIER in different countries and work settings, but the results have not been consistent [12][13][14][15][16][17]. Although previous studies have systematically evaluated its performance [18][19][20][21], its clinical utility and its applicability in other countries and with other investigators have not been investigated. Moreover, another seven studies have not been incorporated in previous meta-analyses [22][23][24][25][26][27][28].
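The scoring rule described above can be sketched in a few lines. The text here gives only the number of items, the score range, and the positivity threshold; the item names below follow the original ROSIER publication by Nor and colleagues [7] as commonly summarised (two items scored −1, five scored +1) and should be read as an assumption of this sketch. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RosierAssessment:
    """Findings for one patient (True = present). Item names are assumed
    from the original ROSIER publication, not from this review."""
    loss_of_consciousness_or_syncope: bool = False  # scored -1
    seizure_activity: bool = False                  # scored -1
    asymmetric_facial_weakness: bool = False        # scored +1
    asymmetric_arm_weakness: bool = False           # scored +1
    asymmetric_leg_weakness: bool = False           # scored +1
    speech_disturbance: bool = False                # scored +1
    visual_field_defect: bool = False               # scored +1


NEGATIVE_ITEMS = ("loss_of_consciousness_or_syncope", "seizure_activity")
POSITIVE_ITEMS = (
    "asymmetric_facial_weakness",
    "asymmetric_arm_weakness",
    "asymmetric_leg_weakness",
    "speech_disturbance",
    "visual_field_defect",
)


def rosier_score(a: RosierAssessment) -> int:
    """Total score: +1 per positive item, -1 per negative item (-2 to +5)."""
    plus = sum(getattr(a, name) for name in POSITIVE_ITEMS)
    minus = sum(getattr(a, name) for name in NEGATIVE_ITEMS)
    return plus - minus


def rosier_positive(a: RosierAssessment, threshold: int = 0) -> bool:
    """Screen positive when the score exceeds the threshold (> 0, i.e. >= +1)."""
    return rosier_score(a) > threshold


patient = RosierAssessment(asymmetric_arm_weakness=True, speech_disturbance=True)
print(rosier_score(patient), rosier_positive(patient))
```

A patient with one positive and one negative item scores 0 and is therefore screen-negative under this rule.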
In the present study, we aim to conduct a systematic review and meta-analysis to evaluate the diagnostic accuracy and clinical applicability of the ROSIER scale. Additionally, we aim to assess its performance in Asia, in prehospital settings, and when used by other trained medical personnel.

Literature search strategy
The terms ("stroke" OR "brain ischemic" OR "transient brain ischemia" OR "cerebra arterial disease" OR "non-ischemic stroke" OR "ischemic stroke" OR "cerebrovascular accident" OR "intracranial artery disease") AND ("Recognition of Stroke in the Emergency Room" OR "ROSIER") were searched as medical subject headings (MeSH) in the PubMed and Embase databases to identify all articles concerning the validation of the ROSIER model published between 1st January 2005 (when the ROSIER was developed) and 8th May 2020. The reference lists of the retrieved articles were also manually checked for relevant papers.

Inclusion and exclusion criteria
Publications included in the present meta-analysis fulfilled the following criteria: (1) written in English; (2) used imaging examination as the gold standard for stroke diagnosis; (3) provided sufficient information for calculating true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN); (4) used a threshold of > 0. When multiple publications concerned the same population, the most complete or most recent one was included.
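To illustrate why criterion (3) matters, the following minimal sketch shows how the accuracy measures used in this kind of analysis follow from a single study's 2×2 table. The function name and the example counts are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy measures from one 2x2 table.

    Assumes no cell is zero; otherwise a continuity correction
    (for example adding 0.5 to every cell) would be needed first.
    """
    sensitivity = tp / (tp + fn)            # stroke cases flagged by the scale
    specificity = tn / (tn + fp)            # non-stroke cases correctly cleared
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    dor = (tp * tn) / (fp * fn)             # equals lr_pos / lr_neg
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts for illustration only.
m = diagnostic_metrics(tp=88, fp=34, fn=12, tn=66)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

A study that reports only sensitivity and specificity without group sizes cannot be converted back to these four counts, which is why such studies could not be pooled.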

Data abstraction
The following characteristics were independently extracted by two investigators: first author, publication year, geographic background, study design (prospective or retrospective), work setting (emergency department or prehospital setting), ROSIER assessment investigator (emergency physicians or other medical personnel), study period, sample size, mean age or age range, TP, FP, FN, and TN. Any discrepancies were resolved by consensus.

Statistical analysis
The pooled sensitivity, specificity, and diagnostic odds ratio (DOR) were calculated using a bivariate mixed-effects model. The DOR is the ratio of the odds of a positive test result in patients with stroke to the odds of a positive test result in those without stroke [29]. The pooled sensitivity and specificity data were used to construct the summary receiver operating characteristic (SROC) curve, and the area under the curve (AUC) was used to evaluate the performance of the ROSIER scale [30]. The I² statistic was used to measure heterogeneity among the studies; a value of < 50% was considered to indicate no heterogeneity. A sensitivity analysis was conducted to assess the effect of each dataset on the pooled performance by sequentially omitting each dataset [31]. The methodological quality of each study was evaluated by the two investigators using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool [32]. Subgroup analysis was used to stratify the studies by geographic background, study design, study setting, type of investigator, sample size, male-to-female ratio, and study quality. Trends in the DOR were analyzed using cumulative meta-analyses in which studies were ranked by publication year, sample size, and study quality. Publication bias was assessed with Deeks' funnel plot, which plots 1/root (effective sample size) against the log DOR; P < 0.05 for the slope coefficient indicates significant asymmetry [33]. The clinical applicability of the ROSIER scale was evaluated with a Fagan nomogram, which was constructed from the positive and negative likelihood ratios [34].
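As a rough illustration of the heterogeneity and leave-one-out steps, the sketch below computes I² and a sequential-omission analysis on the log DOR. It deliberately swaps in a simple fixed-effect inverse-variance pooling rather than the bivariate mixed-effects model used in the study, so it will not reproduce the reported estimates. All function names and the 2×2 counts are hypothetical.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn):
    """Log DOR and its approximate variance for one 2x2 table."""
    if 0 in (tp, fp, fn, tn):
        # 0.5 continuity correction when any cell is empty
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def pooled_log_dor(effects):
    """Fixed-effect inverse-variance pooled log DOR (simplified stand-in)."""
    weights = [1 / v for _, v in effects]
    return sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)


def i_squared(effects):
    """I^2 (%) from Cochran's Q: max(0, (Q - df) / Q) * 100."""
    pooled = pooled_log_dor(effects)
    q = sum((y - pooled) ** 2 / v for y, v in effects)
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def leave_one_out(effects):
    """Pooled DOR recomputed with each dataset omitted in turn."""
    return [
        math.exp(pooled_log_dor(effects[:i] + effects[i + 1:]))
        for i in range(len(effects))
    ]


# Hypothetical (TP, FP, FN, TN) tables, not data from the included studies.
tables = [(88, 34, 12, 66), (120, 40, 20, 90), (45, 10, 9, 60)]
effects = [log_dor_and_variance(*t) for t in tables]
print(f"I^2 = {i_squared(effects):.1f}%")
print([round(d, 2) for d in leave_one_out(effects)])
```

If no single omission shifts the pooled DOR materially, the pooled estimate is considered stable.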
The pooled sensitivity, specificity, SROC, DOR, and Fagan nomogram analyses were performed using Stata statistical software version 14.0 (StataCorp, College Station, TX). Cumulative meta-analysis was performed using Comprehensive Meta-Analysis version 2.0 (Biostat, Englewood, NJ, USA). All tests were two-tailed, and statistical significance was set at p < 0.05.

Characteristics of the included studies
A total of 274 articles were retrieved from the electronic databases. After a full examination, 159 publications were finally excluded: 64 were duplicates, 113 were not related, 47 were reviews, 18 were conference abstracts, 10 were case reports, one did not use a cutoff value of four [35], and seven did not provide sufficient data [36][37][38][39][40][41][42] (Fig. 1). In the end, a total of 14 studies with 15 datasets were included in this meta-analysis. Among them, five were conducted in the United Kingdom [7,13,15,22,24], four in China [14,16,27,28], one in Korea [17], one in Portugal [23], one in Germany [25], one in Ireland [12] and one in Australia [26]. The characteristics of the included studies are shown in Table 1.
Sensitivity analysis showed that the pooled DOR was not significantly altered after omitting each study in turn, which suggested that the results were stable (Appendix file 1 A). Cumulative meta-analysis showed that, as data accumulated in order of publication year, the combined DOR gradually decreased (Appendix file 1 B). The pooled DOR steadily improved, and the 95% CI became narrower, as sample size and study quality increased (Appendix file 1 C-D). The p-value for the slope of Deeks' funnel plot was 0.45, which indicated no publication bias (Fig. 3 a). The Fagan nomogram showed that, given a pre-test probability of 60.0%, the post-test probability increased to 79% when the ROSIER was positive and decreased to 22% when the ROSIER was negative (Fig. 3 b).
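The arithmetic behind a Fagan nomogram is Bayes' theorem expressed in odds: post-test odds = pre-test odds × likelihood ratio. The sketch below applies it to the pooled sensitivity (0.88) and specificity (0.66) reported in the abstract, with the likelihood ratios derived directly from those rounded figures. This is an assumption of the sketch: likelihood ratios estimated from the bivariate model may differ slightly. The function names are hypothetical.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def fagan(pre_test_prob: float, sensitivity: float, specificity: float):
    """Post-test probabilities after a positive and after a negative result."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return (
        post_test_probability(pre_test_prob, lr_pos),
        post_test_probability(pre_test_prob, lr_neg),
    )


pos, neg = fagan(pre_test_prob=0.60, sensitivity=0.88, specificity=0.66)
print(f"after a positive ROSIER: {pos:.3f}")
print(f"after a negative ROSIER: {neg:.3f}")
```

With these inputs the calculation gives roughly 0.80 and 0.21, close to the reported 79% and 22%; the small gap is consistent with deriving the likelihood ratios from rounded pooled estimates.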
Subgroup analysis showed a significant difference in the performance of the ROSIER scale between the European and Asian populations. The pooled sensitivity in [...] (Fig. 2 c-d). Deeks' plot showed that no publication bias existed (p for slope = 0.57, Fig. 3 c). The Fagan nomogram showed that, given a pre-test probability of 60% for patients with suspected stroke, the post-test probability was 76% and 22% for positive and negative results of the ROSIER scale, respectively (Fig. 3 d). For the studies conducted in Asia, the pooled sensitivity, specificity, DOR and AUC were 0.88 (95% CI: 0.78-0.94), 0.74 (95% CI: 0.51-0.88), 20.74 (95% CI: 7.51-57.25) and 0.90 (95% CI: 0.87-0.92), respectively. Deeks' funnel plot suggested that no publication bias existed in Asia (p for slope = 0.29, Fig. 3 e). The Fagan nomogram showed that, given a pre-test probability of 60%, the post-test probability increased to 83% when the ROSIER was positive and was reduced to 19% when it was negative (Fig. 3 f).
Fig. 2 The forest plot for evaluating the pooled sensitivity, specificity, diagnostic odds ratio, and the area under the curve for the performance of the ROSIER scale. a: the forest plot for estimating the pooled sensitivity, specificity, and DOR in total population; b: the pooled AUC of the SROC in total population; c: the forest plot for estimating the pooled sensitivity, specificity, and DOR in Europe; d: the pooled AUC of the SROC in Europe; e: the forest plot for estimating the pooled sensitivity, specificity and DOR in Asia; f: the pooled AUC of the SROC in Asia. Abbreviations: ROSIER = Recognition of Stroke in the Emergency Department; DOR = diagnostic odds ratio; AUC = area under the curve; SROC = summary receiver operating curve.
Subgroup analysis also showed that the pooled sensitivity in emergency department [vs pre-hospital setting; [...] than their counterparts. However, there was no difference in the pooled specificity between them. Moreover, no difference was detected in the diagnostic accuracy of the ROSIER scale across the study design and study quality subgroups (Appendix file 2).

Discussion
The incidence of stroke is rising annually around the world. Early identification and treatment of stroke can improve treatment efficiency, mitigate sequelae, and even save lives [43]. Nor and coworkers developed the ROSIER scale to help emergency physicians identify stroke patients efficiently, and the scale was also recommended by the National Institute for Health and Clinical Excellence [7,44]. ROSIER was developed in the United Kingdom, and whether it is valid in other countries has seldom been studied. The present study showed that, after excluding each study conducted in other countries, the pooled DOR did not change significantly, which supported the external validity and the stability of the results. Furthermore, subgroup analysis showed that Asian populations had a relatively lower sensitivity and a similar specificity compared with European populations. Thus, the ROSIER could also be widely used in Asia, especially in China, as most of the Asian studies included in this meta-analysis were conducted in China.
As shown in Appendix file 3, the ROSIER includes more items than other published stroke screening tools, such as the Cincinnati Prehospital Stroke Scale (CPSS) [8], the Face Arm Speech Test (FAST) [9], the Los Angeles Prehospital Stroke Screen (LAPSS) [10] and the National Institutes of Health Stroke Scale (NIHSS) [11]. Thus the ROSIER might have a relatively better performance in stroke diagnosis, which is consistent with previous studies [13,14,25]. The ROSIER scale was first developed in the emergency department and was prospectively validated by emergency physicians [7]. The subgroup analysis showed that the performance of the ROSIER scale was comparable between prehospital settings and the emergency department. Moreover, the results suggested that other trained medical personnel achieved a significantly higher sensitivity and a similar specificity compared with emergency physicians when using the ROSIER scale. Thus, the ROSIER scale could be used in other workplaces and administered by other trained investigators. This is an important finding, especially for China, where most strokes occur at home. Owing to limited health resources, not all of these patients can be transferred to the emergency department of a high-level hospital in time. According to the results of the present study, these patients could first be evaluated by general practitioners in prehospital settings or community healthcare centers, and high-risk patients should then be transferred to a higher-level hospital as soon as possible. Establishing a community-hospital integrated model for the rapid treatment of stroke could improve the efficiency of diagnosis and treatment. Additionally, to support the clinical applicability of the ROSIER in other work settings and with other investigators, it is of great importance to provide comprehensive and systematic training for medical personnel.

Limitations
Despite the advantages mentioned above, some issues also need attention. When a patient is comatose and not accompanied by family members, the ROSIER score cannot be evaluated accurately. If all of the items are then scored "0", a high false-negative rate may result. Although the sensitivity and specificity were relatively high, the ROSIER cannot completely eliminate false-positive and false-negative results. Thus, the ROSIER scale should be regarded only as a stroke screening tool, not as a diagnostic criterion.
Moreover, substantial heterogeneity was detected across the studies. This heterogeneity was partly explained by factors such as geographic background, work setting, and investigators. However, it could not be markedly reduced and may have affected the results to some extent. Although some studies attempted to validate the performance of ROSIER, they were not included in the present study owing to insufficient information for calculating the sensitivity and specificity of the ROSIER with 95% CI [36][37][38][39][40][41][42]. Thus, the results should be interpreted with caution.

Conclusions
ROSIER is a valid and portable stroke screening scale. It can be used not only by emergency physicians in emergency departments in Europe but also in extended prehospital workplaces by other fully trained medical personnel in Asia. Further high-quality validation studies with larger sample sizes and broader populations are needed to confirm these results and to extend the application of the ROSIER scale in the future.