A systematic review and meta-analysis to evaluate the diagnostic accuracy of recognition of stroke in the emergency department (ROSIER) scale
BMC Neurology volume 20, Article number: 304 (2020)
The present study aims to evaluate the performance and the clinical applicability of the Recognition of Stroke in the Emergency Department (ROSIER) scale via systematic review and meta-analysis.
The PubMed and Embase electronic databases were searched from 1st January 2005 (when ROSIER was developed) to 8th May 2020. Studies that evaluated the diagnostic accuracy of the ROSIER scale were included. The sensitivity, specificity, diagnostic odds ratio (DOR), and area under the curve (AUC) were pooled using a bivariate mixed-effects model. A Fagan nomogram was used to evaluate the clinical applicability of the ROSIER scale.
A total of 14 studies incorporating 15 datasets were included in this meta-analysis. The pooled sensitivity, specificity, DOR and AUC were 0.88 [95% confidence interval (CI): 0.83–0.91], 0.66 (95% CI: 0.52–0.77), 13.86 (95% CI: 7.67–25.07) and 0.88 (95% CI: 0.85–0.90), respectively. Given a pre-test probability of 60.0%, the Fagan nomogram suggested that the post-test probability increased to 79% when the ROSIER was positive and decreased to 22% when it was negative. Subgroup analysis showed that the pooled sensitivity of ROSIER in the European population was higher than that in Asia, whereas the pooled specificity did not differ significantly between them. Moreover, the male-to-female ratio ≤ 1.0 subgroup, the prehospital setting subgroup, and the other trained medical personnel subgroup had significantly higher sensitivity than their counterparts, with no significant differences in pooled specificity.
ROSIER is a valid scale with high clinical applicability; it shows good diagnostic accuracy not only in Europe but also in Asia. Moreover, the ROSIER scale exhibits good applicability in prehospital settings and when used by other trained medical personnel.
Stroke is a serious concern in the emergency department and remains a leading cause of death and disability [1, 2]. Early identification of patients with stroke and prompt thrombolytic therapy can reduce morbidity and mortality [3, 4]. However, owing to misdiagnosis and inappropriate triage, many patients miss the optimal time window for treatment [5,6,7]. Thus, a series of screening tools have been developed to help emergency physicians diagnose stroke rapidly and accurately [8,9,10,11]. The Recognition of Stroke in the Emergency Department (ROSIER) scale, which was developed by Nor and colleagues in 2005, is one of the commonly recommended stroke scales in the Western world.
ROSIER is a 7-item recognition instrument (total score ranging from −2 to +5) that is based on the clinical history and neurological signs. A score of +1 or above is considered positive for stroke or transient ischemic attack. Since its development, several studies have validated the diagnostic accuracy of ROSIER in different countries and work settings, but the results were not consistent [12,13,14,15,16,17]. Although previous studies have systematically evaluated its performance [18,19,20,21], its clinical utility and its applicability in other countries and with other investigators have not been investigated. Moreover, another seven studies have not been incorporated in previous meta-analyses [22,23,24,25,26,27,28].
In the present study, we aim to conduct a systematic review and meta-analysis to evaluate the diagnostic accuracy and clinical applicability of the ROSIER scale. We also aim to assess its performance in Asia, in prehospital settings, and when used by other trained medical personnel.
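As an illustration of the scoring rule described above, the following Python sketch sums the seven items and applies the positivity threshold. The individual items and their weights (two history items scored −1 each, five neurological signs scored +1 each) follow the original description by Nor and colleagues and are not spelled out in this article, so they should be read as an assumption here; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RosierItems:
    """Seven yes/no ROSIER items (field names are illustrative only)."""
    loss_of_consciousness_or_syncope: bool = False  # scores -1 if present
    seizure_activity: bool = False                  # scores -1 if present
    asymmetric_facial_weakness: bool = False        # scores +1 if present
    asymmetric_arm_weakness: bool = False           # scores +1 if present
    asymmetric_leg_weakness: bool = False           # scores +1 if present
    speech_disturbance: bool = False                # scores +1 if present
    visual_field_defect: bool = False               # scores +1 if present


def rosier_score(items: RosierItems) -> int:
    """Return the total score, which ranges from -2 to +5."""
    negative = int(items.loss_of_consciousness_or_syncope) + int(items.seizure_activity)
    positive = (
        int(items.asymmetric_facial_weakness)
        + int(items.asymmetric_arm_weakness)
        + int(items.asymmetric_leg_weakness)
        + int(items.speech_disturbance)
        + int(items.visual_field_defect)
    )
    return positive - negative


def rosier_positive(items: RosierItems) -> bool:
    """A total above 0 (that is, +1 or more) counts as a positive screen."""
    return rosier_score(items) > 0
```

Under this sketch, a patient with arm weakness and speech disturbance but also seizure activity would score +1 and still screen positive, whereas a patient with syncope alone would score −1 and screen negative.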
Literature search strategy
The terms “stroke” OR “brain ischemic” OR “transient brain ischemia” OR “cerebra arterial disease” OR “non-ischemic stroke” OR “ischemic stroke” OR “cerebrovascular accident” OR “intracranial artery disease” AND “Recognition of Stroke in the Emergency Room” OR “ROSIER” were searched as medical subject headings (MeSH) in the PubMed and Embase databases for all articles concerning the validation of the ROSIER scale published between 1st January 2005 (when the ROSIER was developed) and 8th May 2020. The reference lists of the retrieved articles were also manually checked for relevant papers.
Inclusion and exclusion criteria
Publications were included in the present meta-analysis if they fulfilled the following criteria: (1) written in English; (2) used imaging examination as the gold standard for stroke diagnosis; (3) provided sufficient information for calculating true positive (TP), false positive (FP), false negative (FN) and true negative (TN) counts; (4) used a threshold of > 0. When multiple publications concerned the same population, the most complete or most recent one was included.
The first author, publication year, geographic background, study design (prospective or retrospective), work setting (emergency department or prehospital setting), ROSIER assessment investigator (emergency physicians or other medical personnel), study period, sample size, mean age or age range, TP, FP, FN, and TN were independently extracted by two investigators. Any discrepancies were resolved by consensus.
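To make the role of the extracted TP, FP, FN, and TN counts concrete, here is a minimal Python sketch of the per-study accuracy measures that can be derived from a single 2×2 table. It is not the bivariate pooling used in this meta-analysis; the function name, the 0.5 continuity correction for zero cells, and the example counts are assumptions chosen purely for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute accuracy measures from one 2x2 table of counts."""
    if 0 in (tp, fp, fn, tn):
        # Assumption: a 0.5 continuity correction keeps the ratios finite.
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)  # proportion of stroke patients screened positive
    specificity = tn / (tn + fp)  # proportion of non-stroke patients screened negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        # Diagnostic odds ratio: odds of a positive result with stroke
        # divided by the odds of a positive result without stroke.
        "dor": (tp * tn) / (fp * fn),
    }


# Hypothetical counts for illustration only (not data from any included study).
example = diagnostic_metrics(tp=88, fp=34, fn=12, tn=66)
```

In this sketch the diagnostic odds ratio equals the positive likelihood ratio divided by the negative likelihood ratio, which is why a single DOR summarizes both kinds of error.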
The pooled sensitivity, specificity, and diagnostic odds ratio (DOR) were calculated using a bivariate mixed-effects model. The DOR is the ratio of the odds of a positive ROSIER result in patients with stroke to the odds of a positive result in patients without stroke. The pooled sensitivity and specificity data were used to construct the summary receiver operating characteristic (SROC) curve, and the area under the curve (AUC) was used to evaluate the performance of the ROSIER scale. The I² statistic was used to measure heterogeneity among the studies; a value of < 50% was considered to indicate no significant heterogeneity. A sensitivity analysis was conducted to assess the effect of each dataset on the pooled performance by sequentially omitting each dataset. The methodological quality of each study was evaluated by the two investigators using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool. Subgroup analyses stratified the studies by geographic background, study design, study setting, type of investigator, sample size, male-to-female ratio, and study quality. Trends in the DOR were analyzed using cumulative meta-analyses in which studies were ranked by publication year, sample size, and study quality. Publication bias was assessed with Deeks' funnel plot, which plots 1/root (effective sample size) against the log DOR; P < 0.05 for the slope coefficient indicates significant asymmetry. The clinical applicability of the ROSIER scale was evaluated with a Fagan nomogram, constructed from the positive and negative likelihood ratios.
Pooled sensitivity, specificity, SROC, DOR, and the Fagan nomogram were computed using Stata statistical software version 14.0 (StataCorp, College Station, TX). Cumulative meta-analysis was conducted with Comprehensive Meta-Analysis version 2.0 (Biostat, Englewood, NJ, USA). All tests were two-tailed, and p < 0.05 was considered statistically significant.
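The funnel-plot asymmetry check described above can be sketched in Python as a weighted regression of the log DOR on 1/root (effective sample size). This is an illustrative sketch rather than the Stata routine used by the authors: the effective sample size formula 4·n1·n2/(n1 + n2) and the weighting by effective sample size follow the usual description of Deeks' test, the 0.5 correction for zero cells is an assumption, and the function names are hypothetical. Only the intercept and slope are returned; the p-value for the slope would additionally require a t distribution.

```python
import math


def effective_sample_size(tp, fp, fn, tn):
    """ESS = 4 * n1 * n2 / (n1 + n2), with n1 diseased and n2 non-diseased."""
    diseased = tp + fn
    non_diseased = fp + tn
    return 4 * diseased * non_diseased / (diseased + non_diseased)


def log_dor(tp, fp, fn, tn):
    """Natural log of the diagnostic odds ratio for one study."""
    if 0 in (tp, fp, fn, tn):
        # Assumption: 0.5 continuity correction when any cell is empty.
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def deeks_regression(studies):
    """Weighted least-squares fit of ln(DOR) on 1/sqrt(ESS), weights = ESS.

    `studies` is a list of (tp, fp, fn, tn) tuples. Returns (intercept, slope).
    The studies must not all share the same ESS, or the slope is undefined.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        ess = effective_sample_size(tp, fp, fn, tn)
        xs.append(1 / math.sqrt(ess))
        ys.append(log_dor(tp, fp, fn, tn))
        ws.append(ess)
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

A slope near zero means that small and large studies report similar DORs, which is the pattern expected in the absence of small-study effects.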
Characteristics of the included studies
A total of 274 articles were retrieved from the electronic databases. After full examination, 260 publications were excluded: 64 were duplicates, 113 were not related, 47 were reviews, 18 were conference abstracts, 10 were case reports, one did not use a cutoff value of four and seven did not provide sufficient data [36,37,38,39,40,41,42] (Fig. 1). In the end, a total of 14 studies with 15 datasets were included in this meta-analysis. Among them, five were conducted in the United Kingdom [7, 13, 15, 22, 24], four in China [14, 16, 27, 28], one in Korea, one in Portugal, one in Germany, one in Ireland and one in Australia. The characteristics of the included studies are shown in Table 1.
Diagnostic accuracy of the ROSIER
The pooled sensitivity, specificity, DOR and AUC were 0.88 (95% CI: 0.83–0.91), 0.66 (95% CI: 0.52–0.77), 13.86 (95% CI: 7.67–25.07) and 0.88 (95% CI: 0.85–0.90), respectively (Fig. 2 a-b). Substantial heterogeneity existed in the pooled sensitivity (I² = 91.25%, p < 0.001), pooled specificity (I² = 97.33%, p < 0.001), and pooled DOR (I² = 100.00%, p < 0.001).
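For readers unfamiliar with the heterogeneity statistic reported here, the Python sketch below shows the common definition of I² in terms of Cochran's Q under inverse-variance pooling. It is meant only to illustrate what the statistic measures; the values above were produced by the authors' Stata analysis, whose exact computation may differ, and the function name is hypothetical.

```python
def i_squared(effects, variances):
    """Return I^2 (in percent) from study effects and their variances.

    Q = sum(w_i * (y_i - pooled)^2) with w_i = 1 / variance_i, and
    I^2 = max(0, (Q - df) / Q) * 100, where df = number of studies - 1.
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # identical effects: no observed heterogeneity
    return max(0.0, (q - df) / q) * 100
```

I² can be read as the share of the observed variation between studies that exceeds what sampling error alone would be expected to produce.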
Sensitivity analysis showed that the pooled DOR was not significantly altered by omitting any single study, which suggests that the results are stable (Appendix file 1 A). Cumulative meta-analysis showed that the combined DOR gradually decreased as data were accumulated in order of publication year (Appendix file 1 B). The pooled DOR steadily improved, and its 95% CI became narrower, as studies were accumulated in order of increasing sample size and study quality (Appendix file 1 C-D). The p-value for the slope of Deeks' funnel plot was 0.45, which indicated no publication bias (Fig. 3 a). The Fagan nomogram showed that, given a pre-test probability of 60.0%, the post-test probability increased to 79% when the ROSIER was positive and decreased to 22% when the ROSIER was negative (Fig. 3 b).
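The arithmetic behind a Fagan nomogram is Bayes' theorem in odds form: post-test odds equal pre-test odds multiplied by the likelihood ratio. The Python sketch below applies it to the rounded pooled sensitivity (0.88) and specificity (0.66); deriving the likelihood ratios from these rounded values is an assumption made for illustration, and the function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


sensitivity, specificity = 0.88, 0.66          # rounded pooled estimates
lr_positive = sensitivity / (1 - specificity)  # about 2.59
lr_negative = (1 - sensitivity) / specificity  # about 0.18

after_positive = post_test_probability(0.60, lr_positive)  # about 0.795
after_negative = post_test_probability(0.60, lr_negative)  # about 0.214
```

These values are close to, though not identical with, the reported 79% and 22%; the small gap presumably arises because the published nomogram uses likelihood ratios estimated by the bivariate model rather than ratios of the rounded pooled estimates.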
Subgroup analysis showed a significant difference in the performance of the ROSIER scale between the European and Asian populations: the pooled sensitivity in Asia was lower than that in Europe (Appendix file 2). When stratified by geographic background, the pooled sensitivity, specificity, DOR and AUC in Europe were 0.89 (95% CI: 0.82–0.93), 0.59 (95% CI: 0.42–0.74), 11.46 (95% CI: 5.67–23.15) and 0.86 (95% CI: 0.82–0.88), respectively (Fig. 2 c-d). Deeks' funnel plot showed no publication bias (p for slope = 0.57, Fig. 3 c). The Fagan nomogram showed that, for a pre-test probability of 60% in suspected stroke patients, the post-test probability was 76% for a positive and 22% for a negative ROSIER result (Fig. 3 d). For the studies conducted in Asia, the pooled sensitivity, specificity, DOR and AUC were 0.88 (95% CI: 0.78–0.94), 0.74 (95% CI: 0.51–0.88), 20.74 (95% CI: 7.51–57.25) and 0.90 (95% CI: 0.87–0.92), respectively. Deeks' funnel plot suggested no publication bias among the Asian studies (p for slope = 0.29, Fig. 3 e). The Fagan nomogram showed that, given a pre-test probability of 60%, the post-test probability increased to 83% when the ROSIER was positive and was reduced to 19% when it was negative (Fig. 3 f).
Subgroup analysis also showed that the pooled sensitivity was significantly lower in the emergency department than in the prehospital setting [0.87 (95% CI: 0.83–0.91) vs 0.94 (95% CI: 0.89–0.99); p < 0.001], when the assessment was conducted by emergency physicians rather than other medical personnel [0.86 (95% CI: 0.81–0.91) vs 0.92 (95% CI: 0.86–0.97); p < 0.001], and in studies with a male-to-female ratio higher than 1.0 rather than ≤ 1.0 [0.88 (95% CI: 0.83–0.93) vs 0.89 (95% CI: 0.84–0.93); p < 0.001]. However, the pooled specificity did not differ between these subgroups. Moreover, no difference was detected in the diagnostic accuracy of the ROSIER scale between the study design subgroups or between the study quality subgroups (Appendix file 2).
The incidence of stroke is rising annually around the world. Early identification and treatment of stroke can improve treatment efficiency, mitigate sequelae, and even save lives. Nor and coworkers developed the ROSIER scale to help emergency physicians identify stroke patients efficiently, and the scale has also been recommended by the National Institute for Health and Clinical Excellence [7, 44]. ROSIER was developed in the United Kingdom, and whether it is valid in other countries has seldom been studied. The present study showed that, after excluding each study conducted in other countries, the pooled DOR did not change significantly, which supports the external validity and stability of the results. Furthermore, subgroup analysis showed that sensitivity was relatively lower, and specificity similar, in Asian populations compared with European populations. Thus, the ROSIER could also be widely used in Asia, especially in China, as most of the Asian studies included in this meta-analysis were conducted in China.
As shown in Appendix file 3, the ROSIER appeared to include more items than other published stroke screening tools, such as the Cincinnati Prehospital Stroke Scale (CPSS), the Face Arm Speech Test (FAST), the Los Angeles Prehospital Stroke Screen (LAPSS) and the National Institute of Health stroke scale (NIHSS). Thus the ROSIER might perform relatively better in stroke diagnosis, which is consistent with previous studies [13, 14, 25]. The ROSIER scale was first developed in the emergency department and was prospectively validated by emergency physicians. The subgroup analysis showed that the ROSIER scale performed at least as well in prehospital settings as in the emergency department, with higher pooled sensitivity and similar specificity. Moreover, the results suggested that other trained medical personnel achieved significantly higher sensitivity and similar specificity compared with emergency physicians when using the ROSIER scale. Thus, the ROSIER scale could be used in other workplaces and by other trained investigators. This is an important finding, especially for China, where most strokes occur at home. Owing to limited health resources, not all of these patients can be transferred to the emergency department of a high-level hospital in time. According to the results of the present study, these patients could first be evaluated by general practitioners in prehospital settings or community healthcare centers, and patients at high risk of stroke should then be transferred to a higher-level hospital as soon as possible. Establishing a community-hospital integrated model for the rapid treatment of stroke could improve the efficiency of diagnosis and treatment. Additionally, to ensure the clinical applicability of the ROSIER in other work settings and with other investigators, it is of great importance to provide comprehensive and systematic training to medical personnel.
Despite the advantages mentioned above, some issues also need attention. When patients are comatose and not accompanied by family members, the ROSIER score cannot be accurately evaluated; if all of the items are then scored “0”, this may result in a high false-negative rate. Although its sensitivity and specificity were relatively high, the ROSIER cannot wholly exclude false-positive and false-negative results. Thus, the ROSIER scale should be regarded only as a stroke screening tool, not as a diagnostic criterion.
Moreover, substantial heterogeneity was detected across the studies. This heterogeneity was partly explained by factors such as geographic background, work setting, and investigators. However, it could not be markedly diminished and may have affected the results to some extent. Although some other studies tried to validate the performance of ROSIER, they were not included in the present study because they provided insufficient information for calculating the sensitivity and specificity of the ROSIER with 95% CIs [36,37,38,39,40,41,42]. Thus, the results should be interpreted with caution.
ROSIER is a valid and portable stroke screening scale. It can be used not only by emergency physicians in emergency departments in Europe but also in prehospital settings by other fully trained medical personnel in Asia. Further high-quality validation studies with larger sample sizes and broader populations are needed to confirm the results and to extend the application of the ROSIER scale in the future.
Availability of data and materials
All data generated or analyzed during this study are included in this published article.
AUC: Area under the curve
DOR: Diagnostic odds ratio
QUADAS: Quality Assessment of Diagnostic Accuracy Studies
ROSIER: Recognition of Stroke in the Emergency Department
SROC: Summary receiver operating characteristic curve
CPSS: Cincinnati Prehospital Stroke Scale
FAST: Face Arm Speech Test
LAPSS: Los Angeles Prehospital Stroke Screen
NIHSS: National Institute of Health stroke scale
Bonita RMS, Truelsen T, Bogousslavsky J, Toole J, Yatsu F. The global stroke initiative. Lancet Neurol. 2004;3(7):391–3.
Xia X, Yue W, Chao B, Li M, Cao L, Wang L, et al. Prevalence and risk factors of stroke in the elderly in northern China: data from the National Stroke Screening Survey. J Neurol 2019; doi: https://doi.org/10.1007/s00415-019-09281-5. [Epub ahead of print].
Marler JR, Tilley BC, Lu M, Brott TG, Lyden PC, Grotta JC, et al. Early stroke treatment associated with better outcome: the NINDS rt-PA stroke study. Neurology. 2000;55:1649–55.
Hacke W, Donnan G, Fieschi C, Kaste M, von Kummer R, Broderick JP, et al. Association of outcome with early stroke treatment: pooled analysis of ATLANTIS, ECASS, and NINDS rt-PA stroke trials. Lancet. 2004;363:768–74.
Hacke W, Kaste M, Bluhmki E, Brozman M, Davalos A, Guidetti D, et al. Thrombolysis with alteplase 3 to 4.5 hours after acute ischemic stroke. N Engl J Med. 2008;359:1317–29.
Ferro JM, Pinto AN, Falcao I, Rodrigues G, Ferreira J, Falcao F, et al. Diagnosis of stroke by the nonneurologist. A validation study. Stroke. 1998;29:1106–9.
Nor AM, Davis J, Sen B, Shipsey D, Louw SJ, Dyker AG, et al. The recognition of stroke in the emergency room (ROSIER) scale: development and validation of a stroke recognition instrument. Lancet Neurol. 2005;4:727–34.
Kothari R, Hall K, Brott T, Broderick J, et al. Early stroke recognition: developing an out-of-hospital NIH stroke scale. Acad Emerg Med. 1997;4:986–90.
Harbison J, Hossain O, Jenkinson D, Davis J, Louw SJ, Ford GA. Diagnostic accuracy of stroke referrals from primary care, emergency room physicians, and ambulance staff using the face arm speech test. Stroke. 2003;34:71–6.
Kidwell CS, Starkman S, Eckstein M, Weems K, Saver JL. Identifying stroke in the field. Prospective validation of the Los Angeles prehospital stroke screen (LAPSS). Stroke. 2000;31:71–6.
Goldstein LB, Bertels C, Davis JN. Interrater reliability of the NIH stroke scale. Arch Neurol. 1989;46:660–2.
Jackson A, Deasy C, Geary UM, Plunkett PK, Harbison J. Validation of the use of the ROSIER stroke recognition instrument in an Irish emergency department. Ir J Med Sci. 2008;177:189–92.
Whiteley WN, Wardlaw JM, Dennis MS, Sandercock PA. Clinical scores for the identification of stroke and transient ischaemic attack in the emergency department: a cross-sectional study. J Neurol Neurosurg Psychiatry. 2011;82:1006–10.
He M, Wu Z, Guo Q, Li L, Yang Y, Feng J. Validation of the use of the ROSIER scale in prehospital assessment of stroke. Ann Indian Acad Neurol. 2012;15:191–5.
Fothergill RT, Williams J, Edwards MJ, Russell IT, Gompertz P. Does use of the recognition of stroke in the emergency room stroke assessment tool enhance stroke recognition by ambulance clinicians? Stroke. 2013;44:3007–12.
Jiang HL, Chan CPY, Leung YK, Li Y, Graham CA, Rainer TH. Evaluation of the recognition of stroke in the emergency room (ROSIER) scale in Chinese patients in Hong Kong. PLoS One. 2014;9.
Lee S, Doh H, Lee S, Seo J. Validation of the recognition of stroke in the emergency room (ROSIER) scale in a Korean emergency department. Intensive Care Med. 2014;40:S69.
Rudd M, Buck D, Ford GA, Price CI. A systematic review of stroke recognition instruments in hospital and prehospital settings. Emerg Med J. 2016;33:818–22.
Brandler ES, Sharma M, Sinert RH, Levine SR. Prehospital stroke scales in urban environments: a systematic review. Neurology. 2014;82:2241–9.
Sun Z, Yue Y, Leung CCH, Chan MTV, Gelb AW, Hardman JG, et al. Clinical diagnostic tools for screening of perioperative stroke in general surgery: a systematic review. Br J Anaesth. 2016;116:328–38.
Vidale S, Agostoni E. Prehospital stroke scales and large vessel occlusion: a systematic review. Acta Neurol Scand. 2018;138:24–31.
Byrne B, O'Halloran P, Cardwell C. Accuracy of stroke diagnosis by registered nurses using the ROSIER tool compared to doctors using neurological assessment on a stroke unit: a prospective audit. Int J Nurs Stud. 2011;48:979–85.
Gregório TJA, Giles M. Performance of diagnostic tools in distinguishing TIA or minor stroke from mimics in consecutive referrals to a specialist neurovascular clinic. Cerebrovasc Dis. 2012;33:748.
Benjamin LA, Joekes E, Das K, Beeching NJ, Wilkins E, Solomon T. Diagnostic CT brain in an HIV population. J Infect. 2013;67:619–22.
Purrucker JC, Hametner C, Engelbrecht A, Bruckner T, Popp E, Poli S. Comparison of stroke recognition and stroke severity scores for stroke detection in a single cohort. J Neurol Neurosurg Psychiatry. 2015;86:1021–8.
MacKay MT, Churilov L, Donnan GA, Babl FE, Monagle P. Performance of bedside stroke recognition tools in discriminating childhood stroke from mimics. Neurology. 2016;86:2154–61.
Mao H, Lin P, Mo J, Li Y, Chen X, Rainer TH, et al. Development of a new stroke scale in an emergency setting. BMC Neurol. 2016;16.
He M, Wu Z, Zhou J, Zhang G, Li Y, Chen W, et al. ROSIER scale is useful in an emergency medical service transfer protocol for acute stroke patients in primary care center: a southern China study. Neurol Asia. 2017;22:93–8.
Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005;58:982–90.
Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993;12:1293–316.
Thakkinstian A, McElduff P, D'Este C, et al. A method for meta-analysis of molecular association studies. Stat Med. 2005;24:1291–306.
Whiting P, Rutjes AW, Reitsma JB, Duffy D, Attia J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25.
Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882–93.
Fagan TJ. Letter: nomogram for Bayes theorem. N Engl J Med. 1975;293:257.
Turc GMB, Naggara O, Seners P, Isabel C, Tisserand M, et al. Clinical scales do not reliably identify acute ischemic stroke patients with large-artery occlusion. Stroke. 2016;47:1466–72.
Yock-Corrales A, Babl FE, Mosley IT, Mackay MT. Can the FAST and ROSIER adult stroke recognition tools be applied to confirmed childhood arterial ischemic stroke? BMC Pediatr. 2011;11:93.
Fernandez MLL, Fabiana NL, Pascual JLVR. The recognition of stroke in the emergency room (ROSIER) scale: a validation of a stroke recognition instrument in a tertiary hospital. Cerebrovasc Dis. 2012;34:63.
Holland D, Punter MNM, Moynihan BJ. Posterior circulation strokes are less likely to be admitted directly to a stroke unit than anterior circulation strokes within a London HASU model. Int J Stroke. 2012;7:24.
Purrucker JC, Hametner C, Engelbrecht A, Bruckner T, Popp E, Poli S. Comparison of stroke scores in a pre-and intra-hospital emergency setting. Cerebrovasc Dis. 2013;35:653.
Sharma M, Sinert R, Levine S, Brandler E. Novel graphical comparative analyses of 7 prehospital stroke scales. Neurology. 2014;82.
Zulafkar M, Alwis L, Licenik R, Kadwani H, Sethuraman S, Sekaran L, et al. A pilot study proposing a clinical decision algorithm (CDA), which, aims to improve clinical acumen in the recognition of patients with stroke or TIA. Int J Stroke. 2015;10:124.
Ciobanu G. Stroke scales in evaluation and recognition of stroke. Arch Balkan Med Union. 2017;52:A19.
Johnston SC, Mendis S, Mathers CD. Global variation in stroke burden and mortality: estimates from monitoring, surveillance, and modelling. Lancet Neurol. 2009;8:345–54.
Swain S, Turner C, Tyrrell P, Rudd A. Guideline Development Group Diagnosis and initial management of acute stroke and transient ischaemic attack: summary of NICE guidance. BMJ. 2008;337:a786.
Shan Dong Natural Science Foundation of China [Grant No. ZR2017LH022].
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Han, F., Zuo, C. & Zheng, G. A systematic review and meta-analysis to evaluate the diagnostic accuracy of recognition of stroke in the emergency department (ROSIER) scale. BMC Neurol 20, 304 (2020). https://doi.org/10.1186/s12883-020-01841-x