Validation of the Chinese version of the FOUR score in the assessment of neurosurgical patients with different level of consciousness

Background The Glasgow Coma Scale (GCS) is currently the most widely used scoring system for comatose patients. A decade ago, the Full Outline of Unresponsiveness (FOUR) score was devised to better capture four functional aspects of consciousness (eye response, motor response, brainstem reflexes, and respiration). This study aimed to validate the Chinese version of the FOUR score in patients with different levels of consciousness. Methods The study had two phases: (1) translation of the FOUR score, and (2) assessment of its reliability and validity. The Chinese version of the FOUR score was developed according to a standardized protocol. One hundred twenty consecutive patients with acute brain damage, admitted to Nanfang Hospital (Southern Medical University, Guangdong, China) from November 2014 to February 2015, were enrolled. The inter-rater agreement for the FOUR score and GCS was evaluated using the intraclass correlation coefficient (ICC). Receiver operating characteristic (ROC) curves were established to determine the scales’ abilities to predict outcome. Results The inter-rater agreement was excellent for both the FOUR score (ICC = 0.970; p < 0.001) and the GCS (ICC = 0.958; p < 0.001). The FOUR score yielded excellent test-retest reliability (ICC = 0.930; p < 0.001). Spearman’s correlation coefficients between the GCS and the FOUR score were high: r = 0.932, first rating; r = 0.887, second rating (all p < 0.001). Areas under the curve (AUC) for mortality were 0.834 (95 % CI, 0.740–0.928) and 0.815 (95 % CI, 0.723–0.908) for the FOUR score and GCS, respectively. Conclusions The Chinese version of the FOUR score is a reliable scale for evaluating the level of consciousness in patients with acute brain injury.


Background
The Glasgow Coma Scale (GCS) is a widely used tool for objectively measuring a patient's level of consciousness (LOC) in the clinical setting. However, the GCS has a few limitations [1,2]. First, it cannot detect subtle clinical changes in comatose patients because it lacks important clinical indicators such as brainstem reflexes and respiration pattern (including mechanical ventilation), which reflect the consciousness level [3]. In addition, the GCS cannot properly assess the verbal component in intubated patients, and inexperienced nurses and paramedics have had difficulty scoring it [1]. Most importantly, a 10-year retrospective study revealed that the GCS cannot predict the outcome of patients with traumatic brain injury (TBI) [2]. Therefore, other scales are being developed for this purpose, but most of them are not widely accepted because of their complexity and poor reliability [4][5][6][7].
A novel coma scaling system, the Full Outline of Unresponsiveness (FOUR) score, was developed by the Mayo Clinic in 2005 [3]. It evaluates four functional categories: eye response, motor response, brainstem reflexes, and respiration pattern (including mechanical ventilation). Each of the four categories is scored from 0 to 4 points, with 4 representing normal function and 0 indicating no function [3]. Patients with an overall score of 0 are considered brain dead [8]. This score has also been recommended by the latest guidelines of the European Society of Intensive Care Medicine (ESICM) [9]. Previous studies compared the FOUR score with the GCS and showed that they were comparable [10][11][12][13][14].
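The scoring rule described above (four categories, each scored 0 to 4, summed to a total of 0 to 16) can be illustrated with a minimal Python sketch. The class and field names are hypothetical and chosen only for illustration; the sketch encodes the arithmetic of the total score, not the clinical criteria for assigning each sub-score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourScore:
    """Hypothetical container for the four FOUR sub-scores (each 0..4)."""
    eye: int
    motor: int
    brainstem: int
    respiration: int

    def __post_init__(self):
        # Every category is scored from 0 (no function) to 4 (normal).
        for name in ("eye", "motor", "brainstem", "respiration"):
            value = getattr(self, name)
            if not isinstance(value, int) or not 0 <= value <= 4:
                raise ValueError(f"{name} sub-score must be an integer in 0..4, got {value!r}")

    @property
    def total(self) -> int:
        """Overall FOUR score: the sum of the four sub-scores (0..16)."""
        return self.eye + self.motor + self.brainstem + self.respiration


if __name__ == "__main__":
    example = FourScore(eye=1, motor=2, brainstem=4, respiration=1)
    print(example.total)
```

Keeping the sub-scores separate rather than storing only the total makes it possible to analyze each category on its own, as is done for the sub-scale agreement reported later.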
Recently, several prospective studies have introduced and validated the FOUR score as a reliable tool for the assessment of patients in medical intensive care units, neuro-intensive care units, and neurology and emergency departments [10,11,15,16,17]. In addition, the FOUR score has been used to assess cirrhotic [18] and pediatric [19] patients. It is also valid in predicting the outcome of patients after cardiac arrest [20] and traumatic brain injury [21]. The FOUR score has already been translated into many languages, such as Italian [12], French [22], Spanish [11], Korean [23], and Turkish [24], but no Chinese version is available to date. Therefore, this study had two aims: 1) the translation of the FOUR score into Chinese; and 2) the validation of the Chinese version of the FOUR score as a measure of the level of consciousness.

Development of the Chinese version of the FOUR Score
The translation process is displayed in Fig. 1. During translation, we refined the following points so that medical staff could apply the FOUR score more easily in Chinese. First, for the eye response, the initial translation read "the eyelids closed but open to loud voice, and eyelids closed but open to pain", which could be misunderstood as the patient's eyes being open to loud voice or pain; this omitted the important information that the patient's eyelids are closed in most cases but open in response to loud voice or pain. Therefore, a more specific and clearer translation was provided for this point. Second, for the motor response, the authors considered that the focus should be on observing the patient's arm response, but this point is not reflected in the original English text. Therefore, we added the arm flexion or extension response to pain so that the item would be more accurate and more easily understood by medical staff. Finally, the Chinese version of the FOUR score was approved by the consensus meeting. In conclusion, the Chinese version of the FOUR score was validated (Tables 1 and 2).

Patients
This study was approved by the ethical committee of the Nanfang Hospital, Southern Medical University, Guangdong, China (No. NFEC-2014-124); informed consent was obtained from all patients or their caregivers. A total of 120 consecutive patients with acute brain damage were enrolled from November 2014 to February 2015 at our neurosurgical intensive care unit and evaluated with the GCS and FOUR score on the day of admission. Adult patients (>18 years old) diagnosed with acute traumatic brain injury or non-TBI conditions (intracerebral hemorrhage, subarachnoid hemorrhage, brain tumor, hydrocephalus, epilepsy, cerebral infarction, etc.) were recruited. Exclusion criteria were: 1) treatment with neuromuscular junction blockers or sedatives; 2) hemodynamic instability (systolic blood pressure [BP] <80 mm Hg); or 3) substance or alcohol abuse.
Demographic data, vital signs, diagnosis, day of evaluation, and degree of consciousness (awake or alert, drowsy, stuporous, or comatose, according to Ropper [25]), were recorded.

Procedure
Patients were assessed by two neurosurgery residents (R/R), two nurses (N/N), or a combination of a resident and a nurse (R/N); each health care professional had more than 10 years of clinical experience in a neurosurgical/neurological intensive care unit (ICU), and each patient was assessed by a randomly chosen rater pair. For intubated patients, the GCS verbal score was taken as the lowest value, 1. Raters watched a 20-min video with patient examples and instructions about the FOUR score. Subsequently, a one-page handout with written instructions describing both the FOUR score and the GCS was provided to the raters, who were given opportunities to assess patients before the study began. To minimize possible changes in the patient's level of consciousness, all assessments were completed within one hour. In addition, 30 randomly selected patients were evaluated with the FOUR score on the second day of hospitalization to assess test-retest reliability.
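The handling of the GCS verbal component in intubated patients can be sketched as follows. This is an illustrative Python sketch, not the study's scoring procedure: the function name and the `intubated` flag are hypothetical, and the component ranges (eye 1 to 4, verbal 1 to 5, motor 1 to 6) are those of the standard GCS.

```python
def gcs_total(eye: int, verbal: int, motor: int, intubated: bool = False) -> int:
    """Return the total GCS (3..15) from its three components.

    Assumption for illustration: when the patient is intubated, the verbal
    component cannot be tested and is scored as 1, as described in the text.
    """
    if intubated:
        verbal = 1  # verbal response cannot be assessed; lowest value is used
    if not 1 <= eye <= 4:
        raise ValueError("eye component must be in 1..4")
    if not 1 <= verbal <= 5:
        raise ValueError("verbal component must be in 1..5")
    if not 1 <= motor <= 6:
        raise ValueError("motor component must be in 1..6")
    return eye + verbal + motor


if __name__ == "__main__":
    # Same eye and motor findings, scored with and without intubation.
    print(gcs_total(eye=3, verbal=4, motor=5))
    print(gcs_total(eye=3, verbal=4, motor=5, intubated=True))
```

The example makes visible why intubation lowers the total GCS regardless of the patient's actual verbal ability, which is one of the limitations of the GCS noted in the Background.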

Outcome assessment
Patients' in-hospital mortality and clinical diagnosis of brain death were documented. Outcome was assessed at 3 months using the modified Rankin Scale (MRS) [26], which assesses the patient's overall function and mortality. Briefly, a score of 0 indicates no symptoms; 1 represents no evident disability despite symptoms; 2 indicates slight disability, with inability to carry out all routine activities but ability to take care of own affairs; 3 represents moderate disability, requiring some help but with ability to walk without assistance; 4 indicates moderately severe disability, with inability to walk or attend to own bodily needs without assistance; 5 represents severe disability, e.g. bedridden patients with incontinence requiring constant nursing care; and 6 indicates death. In this study, an MRS score between 0 and 2 indicated a good recovery; a score between 3 and 6 indicated a poor outcome.
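The dichotomization of the 3-month MRS described above can be written as a short Python sketch. The function name and the "good"/"poor" labels are illustrative choices, not taken from the study.

```python
def dichotomize_mrs(mrs: int) -> str:
    """Map a modified Rankin Scale score (0..6) to the binary outcome.

    Scores 0-2 are treated as a good recovery and scores 3-6
    (including 6 = death) as a poor outcome.
    """
    if mrs not in range(0, 7):
        raise ValueError(f"MRS score must be an integer in 0..6, got {mrs!r}")
    return "good" if mrs <= 2 else "poor"


if __name__ == "__main__":
    for score in range(7):
        print(score, dichotomize_mrs(score))
```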

Statistical analyses
SPSS 13.0 (SPSS Inc., Chicago, IL, USA) was used for data analysis. Data normality was analyzed using the Kolmogorov-Smirnov test. Normally distributed continuous data are expressed as mean ± standard deviation (SD). Non-normally distributed continuous data are presented as median (range). Categorical variables are presented as frequencies.
For the FOUR score, the intraclass correlation coefficient (ICC) was used to measure inter-rater agreement and test-retest reliability. Cronbach's α and Spearman's correlation coefficients were estimated to assess internal consistency and construct validity (with the GCS as criterion). To compare the FOUR score and GCS for prediction of in-hospital mortality and a 3-month MRS of 3-6, prognostic performance was evaluated using receiver operating characteristic (ROC) curves and areas under the curve (AUC). In general, an AUC of 1.0 refers to a perfect test, an AUC of 0.5 corresponds to chance-level discrimination, and a perfectly inaccurate test has an AUC of 0.0. Usually, an AUC higher than 0.75 indicates that the scale has moderate discriminative properties, while an AUC ≥0.90 indicates excellent discrimination. The best cut-off point was chosen to yield the maximum Youden index [27]. Two-tailed P < 0.05 was considered statistically significant.
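The AUC and the Youden-index cut-off mentioned above can be illustrated with a small, dependency-free Python sketch. It is not the SPSS procedure used in the study: the function names are hypothetical, the data are invented toy values, and the sketch assumes that lower scores predict the event (for example, death), so a patient is classified as positive when the score is at or below the cut-off. The Youden index is J = sensitivity + specificity - 1.

```python
def auc_low_score_predicts_event(scores, events):
    """AUC as the probability that a patient with the event has a lower
    score than a patient without it (ties count as one half)."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the event")
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff_by_youden(scores, events):
    """Return (cutoff, J, sensitivity, specificity) maximizing the Youden
    index, classifying 'score <= cutoff' as predicting the event.
    If several cut-offs tie, the lowest one is returned."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the event")
    best = None
    for cutoff in sorted(set(scores)):
        sensitivity = sum(p <= cutoff for p in pos) / len(pos)
        specificity = sum(n > cutoff for n in neg) / len(neg)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


if __name__ == "__main__":
    # Invented toy data: total scores and event indicator (1 = event occurred).
    toy_scores = [2, 4, 6, 9, 10, 12, 15, 16]
    toy_events = [1, 1, 1, 0, 1, 0, 0, 0]
    print(auc_low_score_predicts_event(toy_scores, toy_events))
    print(best_cutoff_by_youden(toy_scores, toy_events))
```

Computing the AUC from pairwise comparisons is equivalent to the area under the empirical ROC curve and avoids constructing the curve explicitly, which keeps the sketch short.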

Characteristics of the patients
Detailed patient characteristics are summarized in Table 3. Three patients dropped out: for two patients, the MRS score could not be obtained at the 3-month telephone follow-up, while the other patient used sedatives during the evaluation and had to be excluded. The characteristics of these three patients were: 1) male; drowsiness on hospitalization; 58 years old; subarachnoid hemorrhage; no mechanical ventilation; FOUR score: 13; GCS score: 11; 2) male; light coma on hospitalization; 34 years old; cerebral hemorrhage; no mechanical ventilation; FOUR score: 11; GCS score: 7; and 3) this patient used sedatives during the 3-month evaluation; female; deep coma on hospitalization; 43 years old; cerebral hemorrhage; FOUR score: 3; GCS score: 3.

FOUR and GCS scores
The distributions of the patients according to FOUR score and GCS are shown in Fig. 2. For the FOUR score (discrete, non-normally distributed), the maximum score of 16 was the most represented among the patients, corroborating the results obtained for motor response, respiration, and brainstem response; the majority of patients had an eye sub-score of 0. In the case of the GCS (discrete, non-normally distributed), the distribution was rather sparse, with 5, 6, 7, 14, and 15 being the leading overall scores (Fig. 2). The overall reliability was excellent for both the FOUR score and the GCS (Table 4). In addition, the FOUR score yielded excellent test-retest reliability (ICC = 0.930; p < 0.001) (Table 4).
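The reliability figures above are intraclass correlation coefficients. The text does not state which ICC form was computed, so the following Python sketch assumes one common choice, the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)), computed from the mean squares of a patients-by-raters table. The function name is hypothetical and the ratings in the usage example are invented.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: one row per patient, one column per rater.

    ICC(2,1) = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n),
    where MSR, MSC, and MSE are the between-patient, between-rater, and
    residual mean squares, n is the number of patients, and k the number of raters.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 patients and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_err) / denominator


if __name__ == "__main__":
    # Invented example: total scores given to five patients by two raters.
    toy_ratings = [[16, 16], [9, 10], [4, 4], [12, 11], [7, 7]]
    print(round(icc_2_1(toy_ratings), 3))
```

Other ICC forms (for example, consistency rather than absolute agreement) use the same mean squares with a different denominator, so the choice of form should be reported alongside the coefficient.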

Discussion
To date, many prospective studies have assessed the FOUR score, which is now widely used in the clinical setting. However, no Chinese version of the FOUR score was available. This study demonstrated that the Chinese version of the FOUR score has good concurrent validity, a high degree of internal consistency, and good inter-rater reliability among medical staff, and is at least as good as the GCS. These results are comparable to those of previous studies [10][11][12][13][14]. Inter-rater agreement ranged from good to excellent in all patient categories, showing greater agreement than the GCS. For both the Chinese version of the FOUR score and the GCS, the overall reliability of each rater pair was excellent, with intraclass correlation coefficients (ICC) of 0.929-0.991. The FOUR score showed good test-retest reliability (ICC = 0.930; p < 0.001), suggesting that the Chinese version of the FOUR score is stable and consistent over time. For both scales, the lowest ICC values were obtained by the rater pair comprising a resident and a nurse, but these values were still excellent for the Chinese version of the FOUR score and the GCS (0.929, 95 % CI, 0.857-0.965; 0.940, 95 % CI, 0.880-0.970, respectively). These results are consistent with a previous study [13]. Of all the sub-scales in the Chinese version of the FOUR score, inter-rater agreement for the brainstem sub-scale was the lowest, especially in stuporous and comatose patients. Nevertheless, these values were still excellent (ICC = 0.885, 95 % CI, 0.835-0.920) and in line with previous studies [11,28,29], but inconsistent with Iyer et al. [10]. This may be explained by the fact that the pupillary and corneal reflexes of stuporous and comatose patients are not sensitive enough, so that changes in their mental status are not easy to distinguish. In addition, other factors, such as the time spent observing the pupils and corneal reflexes and the methods used to measure pupil size and corneal reflexes, may differ.
The total FOUR and GCS scores were similar in predicting mortality and unfavorable outcome, in agreement with previous findings [17,30]. Compared with the AUCs of the total GCS and total FOUR scores, the AUC for the respiration pattern of the FOUR score was the lowest. This may be explained by the fact that most patients were given a score of 4 (regular breathing pattern), with no patients having 3 (Cheyne-Stokes breathing) and few having 0 (breathes at ventilator rate or apnea). In this study, we validated a cut-off point of 9 for the Chinese version of the FOUR score and 7 for the GCS for in-hospital mortality, which is in line with the findings of the developers of the FOUR score and of Okasha et al. [31]. Nearly half of the patients in this study were intubated, which was similar to previous studies [3,29]. The FOUR score showed good consistency in both intubated and non-intubated patients, in agreement with the results reported by Kramer et al. [29]. These findings also suggest that the Chinese version of the FOUR score may be used in multiple departments, including the intensive care unit and other units that use mechanical ventilation. However, prospective studies with large sample sizes are needed to validate these findings.
A few limitations of this study should be addressed. First, this was a single-site study with a relatively small sample size. We also enrolled few comatose patients. Further studies assessing those particular patients are needed to better evaluate the scaling systems. In addition, we only enrolled raters who had >10 years of clinical experience, while inexperienced medical staff were not included. Finally, we did not test the Chinese version of the FOUR score in pediatric populations with head trauma, and did not stratify the analysis for TBI patients due to their small number.

Conclusions
In conclusion, the Chinese version of the FOUR score can be used to reliably assess patients with impaired consciousness. The scaling system is easily taught and remembered, allows detection of locked-in syndromes as well as the presence of a vegetative state, and is useful to predict poor outcome. Based on these findings, the Chinese version of the FOUR score is a reliable tool for evaluating LOC in patients with brain damage, and worthy of recommendation and application in clinical practice.

Abbreviations
AUC: areas under the curve; ESICM: European Society of Intensive Care Medicine; FOUR: Full Outline of Unresponsiveness; GCS: Glasgow Coma Scale; ICC: intraclass correlation coefficient; ICU: intensive care unit; LOC: level of consciousness; ROC: receiver operating characteristic; SD: standard deviation; TBI: traumatic brain injury.