- Research article
- Open Access
Validation of the Chinese version of the FOUR score in the assessment of neurosurgical patients with different level of consciousness
BMC Neurology volume 15, Article number: 254 (2015)
The Glasgow Coma Scale (GCS) is currently the most widely used scoring system for comatose patients. A decade ago, the Full Outline of Unresponsiveness (FOUR) score was devised to better capture four functional aspects of consciousness (eye, motor responses, brainstem reflexes, and respiration). This study aimed to validate the Chinese version of the FOUR score in patients with different levels of consciousness.
The study had two phases: (1) translation of the FOUR score, and (2) assessment of its reliability and validity. The Chinese version of the FOUR score was developed according to a standardized protocol. One hundred twenty consecutive patients with acute brain damage, admitted to Nanfang Hospital (Southern Medical University, Guangdong, China) from November 2014 to February 2015, were enrolled. Inter-rater agreement for the FOUR score and GCS was evaluated using the intraclass correlation coefficient (ICC). Receiver operating characteristic (ROC) curves were constructed to determine the scales’ abilities to predict outcome.
Inter-rater agreement was excellent for both the FOUR score (ICC = 0.970; p < 0.001) and the GCS (ICC = 0.958; p < 0.001). The FOUR score yielded excellent test-retest reliability (ICC = 0.930; p < 0.001). Spearman’s correlation coefficients between the GCS and the FOUR score were high: r = 0.932, first rating; r = 0.887, second rating (all p < 0.001). Areas under the curve (AUC) for mortality were 0.834 (95 % CI, 0.740–0.928) and 0.815 (95 % CI, 0.723–0.908) for the FOUR score and GCS, respectively.
The Chinese version of the FOUR score is a reliable scale for evaluating the level of consciousness in patients with acute brain injury.
The Glasgow Coma Scale (GCS) is a widely used tool for objectively measuring a patient’s level of consciousness (LOC) in the clinical setting. However, the GCS has a few limitations [1, 2]. First, it cannot detect subtle clinical changes in comatose patients because it lacks important clinical indicators such as brainstem reflexes and respiration pattern (including mechanical ventilation), which reflect the level of consciousness. In addition, the verbal component of the GCS cannot be properly assessed in intubated patients, and inexperienced nurses and paramedics have shown difficulties with scoring. Most importantly, a 10-year retrospective study revealed that the GCS cannot predict the outcome of patients with traumatic brain injury (TBI). Other scales have therefore been developed for this purpose, but most of them are not widely accepted because of their complexity and poor reliability [4–7].
A novel coma scaling system, the Full Outline of Unresponsiveness (FOUR) score, was developed by the Mayo Clinic in 2005. It evaluates four functional categories: eye response, motor response, brainstem reflexes, and respiration pattern (including mechanical ventilation). Each of the four categories is scored from 0 to 4 points, with 4 representing normal function and 0 indicating no function. Patients with an overall score of 0 are considered brain dead. This score has also been recommended by the latest guidelines of the European Society of Intensive Care Medicine (ESICM). Previous studies comparing the FOUR score with the GCS showed that the two were comparable [10–14].
Recently, several prospective studies have introduced and validated the FOUR score as a reliable tool for the assessment of patients in medical intensive care units, neuro-intensive care units, and neurology and emergency departments [10, 11, 15–17]. In addition, the FOUR score has been used to assess cirrhotic and pediatric patients. It is also valid in predicting the outcome of patients after cardiac arrest and traumatic brain injury. The FOUR score has already been translated into many languages, such as Italian, French, Spanish, Korean, and Turkish, but no Chinese version is available to date.
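As an illustration of the scoring arithmetic described above, the following Python sketch sums the four component grades into a total between 0 and 16. The class and field names are hypothetical and are not part of the published scale; the sketch encodes only the 0 to 4 range per category stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourScore:
    """One FOUR assessment (hypothetical container, for illustration only).

    Each component is graded from 0 (no function) to 4 (normal).
    """

    eye: int
    motor: int
    brainstem: int
    respiration: int

    def __post_init__(self) -> None:
        # Reject grades outside the 0-4 range used by every category.
        for name in ("eye", "motor", "brainstem", "respiration"):
            value = getattr(self, name)
            if not isinstance(value, int) or not 0 <= value <= 4:
                raise ValueError(f"{name} must be an integer from 0 to 4, got {value!r}")

    @property
    def total(self) -> int:
        """Overall FOUR score, from 0 to 16."""
        return self.eye + self.motor + self.brainstem + self.respiration
```

A fully normal examination, `FourScore(4, 4, 4, 4)`, gives a total of 16, and a grade of 0 in every category gives a total of 0.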
Therefore, this study had two aims: 1) the translation of the FOUR score into Chinese; and 2) the validation of the FOUR score as a measure of the level of consciousness.
Development of the Chinese version of the FOUR Score
The translation process is displayed in Fig. 1. During translation, we refined two points so that medical staff could apply the FOUR score more easily in Chinese. First, for the eye response, the initial translations of “eyelids closed but open to loud voice” and “eyelids closed but open to pain” could be misread as “the patient’s eyes open to loud voice or pain”; this omitted the important information that the patient’s eyelids are closed in most cases and open only in response to a loud voice or pain. A more specific and clearer translation was therefore provided for this point. Second, for the motor response, the authors considered that the assessment should focus on the patient’s arm response, a point that is not explicit in the original English text. We therefore added arm flexion or extension response to pain so that the item would be more accurate and more easily understood by the medical staff. Finally, the Chinese version of the FOUR score was approved at the consensus meeting (Tables 1 and 2).
Reliability and validity assessment of the Chinese version of the FOUR score
This study was approved by the ethical committee of the Nanfang Hospital, Southern Medical University, Guangdong, China (No. NFEC-2014-124); informed consent was obtained from all patients or their caregivers. A total of 120 consecutive patients with acute brain damage were enrolled from November 2014 to February 2015 at our neurosurgical intensive care unit and evaluated with the GCS and FOUR scores on the day of admission. Adult patients (>18 years old) diagnosed with acute traumatic brain injury or non-TBI conditions (intracerebral hemorrhage, subarachnoid hemorrhage, brain tumor, hydrocephalus, epilepsy, cerebral infarction, etc.) were recruited. Exclusion criteria were: 1) treatment with neuromuscular junction blockers or sedatives; 2) hemodynamic instability (systolic blood pressure [BP] <80 mm Hg); or 3) substance or alcohol abuse. Demographic data, vital signs, diagnosis, day of evaluation, and degree of consciousness (awake or alert, drowsy, stuporous, or comatose, according to Ropper) were recorded.
Patients were assessed by two neurosurgery residents (R/R), two nurses (N/N), or a resident and a nurse (R/N); each health care professional had more than 10 years of clinical experience in a neurosurgical/neurological intensive care unit (ICU), and each patient was assessed by a randomly chosen rater pair. For intubated patients, the GCS verbal component was assigned the lowest score of 1. Raters watched a 20-min video with patient examples and instructions about the FOUR score. Subsequently, a one-page handout with written instructions describing both the FOUR score and the GCS was provided to the raters, who were given opportunities to assess patients before the study began. To minimize possible changes in a patient’s level of consciousness, all assessments were completed within one hour. In addition, 30 randomly selected patients were re-evaluated with the FOUR score on the second day of hospitalization to assess test-retest reliability.
Patients’ in-hospital mortality and clinical diagnosis of brain death were documented. Outcome was assessed at 3 months using the modified Rankin Scale (MRS), which assesses the patients’ overall function and mortality. Briefly, a score of 0 indicates no symptoms; 1 represents no evident disability despite symptoms; 2 indicates slight disability, with inability to carry out all routine activities but ability to take care of own affairs; 3 represents moderate disability, requiring some help but able to walk without assistance; 4 indicates moderately severe disability, with inability to walk or attend to own bodily needs without assistance; 5 represents severe disability, e.g. in bedridden patients with incontinence, requiring constant nursing care; 6 indicates death. In this study, an MRS score between 0 and 2 indicated a good recovery; a score between 3 and 6 reflected a poor outcome.
SPSS 13.0 (SPSS Inc., Chicago, IL, USA) was used for data analysis. Data normality was analyzed using the Kolmogorov-Smirnov test. Normally distributed continuous data are expressed as mean ± standard deviation (SD). Non-normally distributed continuous data are presented as median (range). Categorical variables are presented as frequencies. The intraclass correlation coefficient (ICC) was used to measure inter-rater agreement for both scales and test-retest reliability for the FOUR score. Cronbach’s α and Spearman’s correlation coefficients were estimated to assess internal consistency and construct validity (with the GCS as criterion). To compare the FOUR score and GCS for prediction of in-hospital mortality and a 3-month MRS score of 3–6, prognostic performance was evaluated by receiver operating characteristic (ROC) curves and areas under the curve (AUC). In general, an AUC of 1.0 corresponds to a perfect test, an AUC of 0.5 to discrimination no better than chance, and an AUC of 0.0 to a perfectly inaccurate test. Usually, an AUC higher than 0.75 indicates that the predictors of the scale have moderate discriminative properties, while predictors are excellent with an AUC ≥0.90. The best cut-off point was chosen to yield the maximum Youden index. Two-tailed P < 0.05 was considered statistically significant.
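The dichotomization of the 3-month outcome described above can be written as a small helper. This is a minimal sketch; the function name and the labels "good" and "poor" are illustrative assumptions rather than terms taken from the study database.

```python
def dichotomize_mrs(mrs: int) -> str:
    """Map a modified Rankin Scale score (0-6) to the study's two outcome groups.

    Scores 0-2 count as a good recovery and scores 3-6 (6 = death) as a poor outcome.
    """
    if not isinstance(mrs, int) or not 0 <= mrs <= 6:
        raise ValueError(f"MRS must be an integer from 0 to 6, got {mrs!r}")
    return "good" if mrs <= 2 else "poor"
```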
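The analysis itself was run in SPSS. As a hedged illustration of the two ROC quantities mentioned above, the following Python sketch computes the AUC as the probability that a patient with the event has a lower score than a patient without it (ties counted as one half), and selects the cut-off that maximizes the Youden index J = sensitivity + specificity - 1. The function names and the decision rule "score at or below the cut-off predicts the event" are assumptions made for this example, and the sample values are invented, not study data.

```python
from typing import Sequence, Tuple


def _split(scores: Sequence[float], events: Sequence[int]):
    """Separate scores of patients with the event from those without it."""
    with_event = [s for s, e in zip(scores, events) if e]
    without_event = [s for s, e in zip(scores, events) if not e]
    if not with_event or not without_event:
        raise ValueError("need at least one patient with and one without the event")
    return with_event, without_event


def auc_low_score_predicts_event(scores: Sequence[float], events: Sequence[int]) -> float:
    """AUC when LOW scores indicate the event (e.g. death); ties count 0.5."""
    with_event, without_event = _split(scores, events)
    wins = sum(
        1.0 if p < n else 0.5 if p == n else 0.0
        for p in with_event
        for n in without_event
    )
    return wins / (len(with_event) * len(without_event))


def best_cutoff_by_youden(
    scores: Sequence[float], events: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Return (cutoff, sensitivity, specificity, J) with the largest Youden index.

    A patient is classified as positive when score <= cutoff.
    On ties in J, the lowest cutoff is kept.
    """
    with_event, without_event = _split(scores, events)
    best = None
    for cutoff in sorted(set(scores)):
        sensitivity = sum(s <= cutoff for s in with_event) / len(with_event)
        specificity = sum(s > cutoff for s in without_event) / len(without_event)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[3]:
            best = (cutoff, sensitivity, specificity, j)
    return best


# Invented toy data: total scores and whether the patient died (1) or survived (0).
toy_scores = [3, 5, 7, 9, 10, 12, 14, 16]
toy_died = [1, 1, 1, 0, 1, 0, 0, 0]
toy_auc = auc_low_score_predicts_event(toy_scores, toy_died)
toy_best = best_cutoff_by_youden(toy_scores, toy_died)
```

The pairwise definition of the AUC is used here because it needs no curve construction; it is equivalent to the area under the empirical ROC curve.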
Characteristics of the patients
Detailed patient characteristics are summarized in Table 3. Three patients dropped out: for two of them, the MRS score could not be obtained at the 3-month telephone follow-up, while the third used sedatives during the evaluation and had to be excluded. The characteristics of these three patients were: 1) male; drowsiness on hospitalization; 58 years old; subarachnoid hemorrhage; no mechanical ventilation; FOUR score: 13; GCS score: 11; 2) male; light coma on hospitalization; 34 years old; cerebral hemorrhage; no mechanical ventilation; FOUR score: 11; GCS: 7; and 3) this patient used sedatives during the 3-month evaluation; female; deep coma on hospitalization; 43 years old; cerebral hemorrhage; FOUR score: 3; GCS: 3.
FOUR and GCS scores
The distributions of the patients according to FOUR scores and GCS are shown in Fig. 2. For the FOUR score (discrete distribution, non-normally distributed), the maximum grade of 16 was the most represented among the patients, corroborating the results obtained for motor response, respiration, and brainstem response; the majority of patients had an eye sub-score of 0. In the case of the GCS (discrete distribution, non-normally distributed), the distribution was rather sparse, with 5, 6, 7, 14, and 15 being the leading overall scores (Fig. 2). The overall reliability was excellent for both the FOUR and the GCS scores (Table 4). In addition, the FOUR score yielded excellent test-retest reliability (ICC = 0.930; p < 0.001) (Table 4).
For the FOUR score, intraclass correlation coefficients (ICC) for the alert, drowsy, stuporous, and comatose groups were 0.888 (95 % CI, 0.776–0.945), 0.696 (95 % CI, 0.351–0.859), 0.891 (95 % CI, 0.801–0.940), and 0.879 (95 % CI, 0.649–0.959), respectively. For the GCS, ICC values for the alert, drowsy, stuporous, and comatose groups were 0.712 (95 % CI, 0.422–0.857), 0.761 (95 % CI, 0.489–0.889), 0.521 (95 % CI, 0.126–0.738), and 0.696 (95 % CI, 0.122–0.897), respectively. Among patients with traumatic head injury, the FOUR score had slightly higher inter-observer agreement than the GCS. The overall ICCs of the FOUR score for the traumatic and non-traumatic head injury groups were 0.977 (95 % CI, 0.959–0.986) and 0.964 (95 % CI, 0.941–0.978), respectively. The ICCs of the GCS for the traumatic and non-traumatic head injury groups were 0.956 (95 % CI, 0.924–0.975) and 0.959 (95 % CI, 0.934–0.975), respectively. The overall ICCs of the FOUR score for intubated and non-intubated patients were 0.940 (0.899–0.965) and 0.956 (0.927–0.973), respectively. The overall ICCs of the GCS for intubated and non-intubated patients were 0.858 (0.760–0.916) and 0.953 (0.922–0.972), respectively (Table 5).
Cronbach’s α showed a high degree of internal consistency for the FOUR score (first rating, α = 0.846; second rating, α = 0.844; all p < 0.001) and the GCS (first rating, α = 0.916; second rating, α = 0.904; all p < 0.001). Spearman’s correlation coefficients between the GCS and FOUR scores were high and statistically significant (first rating, r = 0.932; second rating, r = 0.887; p < 0.001).
Regarding in-hospital mortality (Fig. 3), areas under the curve (AUC) for the FOUR and GCS scores were 0.834 (95 % CI 0.740–0.928) and 0.815 (95 % CI 0.723–0.908), respectively. The optimal cut-off scores for predicting in-hospital mortality were 9 for the FOUR score (sensitivity, 75 %; specificity, 85 %) and 7 for the GCS (sensitivity, 63 %; specificity, 89 %). Similarly, as shown in Fig. 3, the FOUR score AUC for unfavorable outcome (MRS > 2) was slightly higher than that obtained with the GCS (0.818 vs. 0.812). The optimal cut-off score for predicting a poor outcome was 13 for the FOUR score (sensitivity, 79 %; specificity, 74 %) and 10 for the GCS (sensitivity, 83 %; specificity, 72 %). Considering the components of the FOUR and GCS scores, all sub-scores had good predictive values for poor outcome (MRS 3–6) except the respiration sub-score of the FOUR score (AUC = 0.596, 95 % CI 0.495–0.698). Table 6 displays the areas under the curve for each outcome.
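The text does not state which ICC model was computed in SPSS. As a hedged illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), from a patients-by-raters table of scores. The choice of this form and the function name are assumptions, so the output should not be expected to reproduce the values reported above.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per patient and one column per rater.
    """
    n = len(ratings)        # number of patients
    k = len(ratings[0])     # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients and 2 raters, with no missing scores")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-patient mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

With this form, a systematic offset between raters lowers the coefficient, which is why it is described as measuring absolute agreement rather than consistency.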
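For readers who wish to see how internal consistency is obtained from sub-scores, the following minimal Python sketch computes Cronbach’s α as k/(k-1) × (1 − Σ item variances / variance of the total), using sample variances. The function name and the row-per-patient input layout are assumptions for this example; the study values were computed in SPSS.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a table with one row per patient and one column per sub-score."""
    k = len(items[0])
    if k < 2 or len(items) < 2 or any(len(row) != k for row in items):
        raise ValueError("need at least 2 patients and 2 sub-scores, with no missing values")

    # Sample variance of each sub-score across patients.
    item_variances = [variance([row[j] for row in items]) for j in range(k)]
    # Sample variance of the total score across patients.
    total_variance = variance([sum(row) for row in items])

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```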
To date, many prospective studies have assessed the FOUR score, which is now widely used in the clinical setting. However, no Chinese version of the FOUR score was available. This study demonstrated that the Chinese version of the FOUR score has good concurrent validity, a high degree of internal consistency, and good inter-rater reliability among medical staff, and is at least as good as the GCS. These results are comparable to those of previous studies [10–14]. Inter-rater agreement ranged from good to excellent in all patient categories and was generally greater than that of the GCS. For both the Chinese version of the FOUR score and the GCS, the overall reliability of each rater pair was excellent, with intraclass correlations (ICC) of 0.929–0.991. The FOUR score showed good test-retest reliability (ICC = 0.930; p < 0.001), suggesting that the Chinese version of the FOUR score is stable and consistent over time. The lowest ICC values were obtained by a rating pair comprising a resident and a nurse for both scales, but these values were still excellent for the Chinese version of the FOUR and GCS scores (0.929, 95 % CI, 0.857–0.965; 0.940, 95 % CI, 0.880–0.970, respectively). The results from this study are consistent with previous studies. Of all the sub-scales in the Chinese version of the FOUR score, inter-rater agreement for the brainstem sub-scale was the lowest, especially in stuporous and comatose patients. Nevertheless, these values were still excellent (ICC = 0.885, 95 % CI, 0.835–0.920) and in line with previous studies [11, 28, 29], but inconsistent with Iyer et al. This may be explained by the fact that pupil and corneal reflexes in stuporous and comatose patients are not sensitive enough, so it is not easy to distinguish when their mental status changes. Besides, other factors may differ, such as the time spent observing pupil and corneal reflexes and the methods used to measure pupil size and corneal reflexes.
Total FOUR and GCS scores were similar in predicting mortality and unfavorable outcome, in agreement with previous findings [17, 30]. Compared with the AUCs of the total GCS and total FOUR scores, the value for the respiration pattern in the FOUR score was the lowest. This may be explained by the fact that most patients were given a score of 4 (regular breathing pattern), with no patients scoring 3 (Cheyne-Stokes breathing) and few scoring 0 (breathes at ventilator rate or apnea). In this study, we validated a cut-off point of 9 for the Chinese version of the FOUR score and 7 for the GCS for in-hospital mortality, which is in line with the findings of the inventor of the FOUR score and Okasha et al. In this study, nearly half of the patients were intubated, which was similar to previous studies [3, 29]. The FOUR score showed good consistency for both intubated and non-intubated patients, in agreement with the results reported by Kramer et al. These findings also suggest that the Chinese version of the FOUR score may be used in multiple departments, including intensive care units and other units that use mechanical ventilation. However, prospective studies with large sample sizes are needed to validate these findings.
A few limitations of this study should be addressed. First, this was a single-site study with a relatively small sample size. We also enrolled few comatose patients. Further studies assessing those particular patients are needed to better evaluate the scaling systems. In addition, we only enrolled raters who had >10 years of clinical experience, while inexperienced medical staff were not included. Finally, we did not test the Chinese version of the FOUR score in pediatric populations with head trauma, and did not stratify the analysis in TBI patients due to their small number.
In conclusion, the Chinese version of the FOUR score can be used to reliably assess patients with impaired consciousness. The scaling system is easily taught and remembered, allows detection of locked-in syndromes as well as the presence of a vegetative state, and is useful to predict poor outcome. Based on these findings, the Chinese version of the FOUR score is a reliable tool for evaluating LOC in patients with brain damage, and worthy of recommendation and application in clinical practice.
AUC: areas under the curve
ESICM: European Society of Intensive Care Medicine
FOUR: Full Outline of Unresponsiveness
GCS: Glasgow Coma Scale
ICC: intraclass correlation coefficient
ICU: intensive care unit
LOC: level of consciousness
ROC: receiver operating characteristic
TBI: traumatic brain injury
Holdgate A, Ching N, Angonese L. Variability in agreement between physicians and nurses when measuring the Glasgow Coma Scale in the emergency department limits its clinical usefulness. Emerg Med Australas. 2006;18:379–84.
Balestreri M, Czosnyka M, Chatfield DA, Steiner LA, Schmidt EA, Smielewski P, et al. Predictive value of Glasgow Coma Scale after brain trauma: change in trend over the past ten years. J Neurol Neurosurg Psychiatry. 2004;75:161–2.
Wijdicks EF, Bamlet WR, Maramattom BV, Manno EM, McClelland RL. Validation of a new coma scale: The FOUR score. Ann Neurol. 2005;58:585–93.
Knaus WA, Zimmerman JE, Wagner DP, Draper EA, Lawrence DE. APACHE-acute physiology and chronic health evaluation: a physiologically based classification system. Crit Care Med. 1981;9:591–7.
Starmark JE, Stalhammar D, Holmgren E. The Reaction Level Scale (RLS85). Manual and guidelines. Acta Neurochir (Wien). 1988;91:12–20.
Benzer A, Mitterschiffthaler G, Marosi M, Luef G, Puhringer F, De La Renotiere K, et al. Prediction of non-survival after trauma: Innsbruck Coma Scale. Lancet. 1991;338:977–8.
Gill M, Martens K, Lynch EL, Salih A, Green SM. Interrater reliability of 3 simplified neurologic scales applied to adults presenting to the emergency department with altered levels of consciousness. Ann Emerg Med. 2007;49:403–7. 407.e401.
Wijdicks EF, Varelas PN, Gronseth GS, Greer DM, American Academy of N. Evidence-based guideline update: determining brain death in adults: report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology. 2010;74:1911–8.
Sharshar T, Citerio G, Andrews PJ, Chieregato A, Latronico N, Menon DK, et al. Neurological examination of critically ill patients: a pragmatic approach. Report of an ESICM expert panel. Intensive Care Med. 2014;40:484–95.
Iyer VN, Mandrekar JN, Danielson RD, Zubkov AY, Elmer JL, Wijdicks EF. Validity of the FOUR score coma scale in the medical intensive care unit. Mayo Clin Proc. 2009;84:694–701.
Idrovo L, Fuentes B, Medina J, Gabaldon L, Ruiz-Ares G, Abenza MJ, et al. Validation of the FOUR Score (Spanish Version) in acute stroke: an interobserver variability study. Eur Neurol. 2010;63:364–9.
Marcati E, Ricci S, Casalena A, Toni D, Carolei A, Sacco S. Validation of the Italian version of a new coma scale: the FOUR score. Intern Emerg Med. 2012;7:145–52.
Akavipat P. Endorsement of the FOUR score for consciousness assessment in neurosurgical patients. Neurol Med Chir (Tokyo). 2009;49:565–71.
Fischer M, Ruegg S, Czaplinski A, Strohmeier M, Lehmann A, Tschan F, et al. Inter-rater reliability of the Full Outline of UnResponsiveness score and the Glasgow Coma Scale in critically ill patients: a prospective observational study. Crit Care. 2010;14:R64.
Wolf CA, Wijdicks EF, Bamlet WR, McClelland RL. Further validation of the FOUR score coma scale by intensive care nurses. Mayo Clin Proc. 2007;82:435–8.
Stead LG, Wijdicks EF, Bhagra A, Kashyap R, Bellolio MF, Nash DL, et al. Validation of a new coma scale, the FOUR score, in the emergency department. Neurocrit Care. 2009;10:50–4.
Eken C, Kartal M, Bacanli A, Eray O. Comparison of the full outline of unresponsiveness Score Coma Scale and the Glasgow Coma Scale in an emergency setting population. Eur J Emerg Med. 2009;16:29–36.
Mouri S, Tripon S, Rudler M, Mallet M, Mayaux J, Thabut D, et al. FOUR score, a reliable score for assessing overt hepatic encephalopathy in cirrhotic patients. Neurocrit Care. 2014;22(2):251–7.
Cohen J. Interrater reliability and predictive validity of the FOUR score coma scale in a pediatric population. J Neurosci Nurs. 2009;41:261–7. quiz 268-269.
Fugate JE, Rabinstein AA, Claassen DO, White RD, Wijdicks EF. The FOUR score predicts outcome in patients after cardiac arrest. Neurocrit Care. 2010;13:205–10.
Sadaka F, Patel D, Lakshmanan R. The FOUR score predicts outcome in patients after traumatic brain injury. Neurocrit Care. 2012;16:95–101.
Weiss N, Mutlu G, Essardy F, Nacabal C, Sauves C, Bally C, et al. The French version of the FOUR score: A new coma score. Rev Neurol (Paris). 2009;165:796–802.
Koo Y, Roh JH, Kwon DY, Yoo SW, Okh K, Kim BJ. Validation of Korean version of the FOUR score. Neurology. 2008;70:A358.
Orken DN, Sagduru AK, Sirin H, Isikara CT, Gokce M, Sutlas N. Reliability of the Turkish Version of a New Coma Scale: FOUR Score. Trakya Universitesi Tip Fakultesi Dergisi. 2010;27:28–31.
Ropper AH. Lateral displacement of the brain and level of consciousness in patients with an acute hemispheral mass. N Engl J Med. 1986;314:953–8.
van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJ, van Gijn J. Interobserver agreement for the assessment of handicap in stroke patients. Stroke. 1988;19:604–7.
Ray P, Le Manach Y, Riou B, Houle TT. Statistical evaluation of a biomarker. Anesthesiology. 2010;112:1023–40.
Gujjar AR, Jacob PC, Nandhagopal R, Ganguly SS, Obaidy A, Al-Asmi AR. Full Outline of UnResponsiveness score and Glasgow Coma Scale in medical patients with altered sensorium: interrater reliability and relation to outcome. J Crit Care. 2013;28:316. e311-318.
Kramer AA, Wijdicks EF, Snavely VL, Dunivan JR, Naranjo LL, Bible S, et al. A multicenter prospective study of interobserver agreement using the Full Outline of Unresponsiveness score coma scale in the intensive care unit. Crit Care Med. 2012;40:2671–6.
Akavipat P, Sookplung P, Kaewsingha P, Maunsaiyat P. Prediction of discharge outcome with the full outline of unresponsiveness (FOUR) score in neurosurgical patients. Acta Med Okayama. 2011;65:205–10.
Okasha AS, Fayed AM, Saleh AS. The FOUR score predicts mortality, endotracheal intubation and ICU length of stay after traumatic brain injury. Neurocrit Care. 2014;21:496–504.
This study was supported by Nanfang Hospital of Southern Medical University. Many thanks to Dr. Wijdicks for authorizing us to translate the FOUR score. In addition, the authors wish to acknowledge the medical staff of the Neurosurgical Department for their enthusiastic participation and valuable support. Finally, the authors wish to acknowledge and thank all the patients and their families for their kind cooperation. This study was supported by Guangdong Provincial Science and Technology Project (No. 2013B060500047).
The authors declare that they have no competing interests.
JP participated in the design of the study and drafted the manuscript. YYD participated in the design of the study and performed the neurological scoring. FYC participated in the design of the study and performed the statistical analysis. XMZ was responsible for identifying suitable patients and performed the neurological scoring. XYW participated in the design of the study and performed the neurological scoring. YZ participated in the design of the study and performed the neurological scoring. HZZ participated in the design of the study and performed the neurological scoring. BHQ was responsible for identifying suitable patients and performed the neurological scoring. All authors contributed to data interpretation, and have read and approved the final manuscript.
Juan Peng and Yingying Deng are co-first authors.
Juan Peng and Yingying Deng contributed equally to this work.
About this article
Cite this article
Peng, J., Deng, Y., Chen, F. et al. Validation of the Chinese version of the FOUR score in the assessment of neurosurgical patients with different level of consciousness. BMC Neurol 15, 254 (2015). https://doi.org/10.1186/s12883-015-0508-9
- Full outline of un-responsiveness score
- Glasgow coma scale