Title: Cognitive Performance in Relapsing Remitting Multiple Sclerosis: a Longitudinal Study in Daily Practice Using a Brief Computerized Cognitive Battery

The authors are confident that these final edits unambiguously present the results without suggestion of this as a treatment effects study, and that the manuscript now provides an open, fair and unbiased presentation of the data. We thank Dr Benedict for providing the critical feedback crucial to this revised description of the data. We have addressed the points as follows. Abstract Explain that Power of Attention etc are subtests from the CDR and what they measure. We have clarified the 'domain' assessed by each measure as well as the derivation in the Methods section. In addition, detailed task descriptions and composite score derivations appear in the additional files 1 and 2. Explain who are the controls, this is very important. Otherwise the d values are meaningless. We have clarified in the Abstract, Results and Discussion sections that these normative data are derived from healthy volunteers enrolled in prior clinical trials and not a control group from the present study. In addition, some of the potential caveats of such a comparison are now noted in the discussion. Stable is not defined, do you mean that the mean values did not change, good test-retest reliability? The term 'stable' has been removed to avoid any ambiguity.


Background
Cognitive disturbances are increasingly being recognized as a prominent feature of multiple sclerosis (MS) [1], occurring in about half of all patients [2] and in one third of patients with early relapsing remitting MS (RRMS) [3]. Impaired cognition is moderately associated with total lesion volumes [4], cortical lesions [5] and increase of cortical lesions over time on magnetic resonance imaging (MRI) [6]. More robust correlations have been found between cognitive function and whole brain atrophy [7,8] and regional gray matter atrophy [9]. The most frequently impaired domains are complex attention, information processing speed, memory and executive functions [3,10,11]. MS patients with problems in cognitive performance have increased odds of becoming unemployed [2]. Importantly, cognitive symptoms in early MS are predictive of disability several years later [12], and in benign RRMS failure on neuropsychological tests predicts clinical worsening over a 3-year period [13].
Two widely used and recommended neuropsychological test batteries have been developed for use in research and care of MS patients. The first is the Brief Repeatable Neuropsychological Battery (BRNB) [14,15]. The second, the Minimal Assessment of Cognitive Function in MS (MACFIMS) [16], was the result of a consensus conference and is an expansion of the BRNB, replacing the 10/36 with the Brief Visuospatial Memory Test-Revised (BVMTR) and the SRT with the California Verbal Learning Test-Second Edition (CVLT2), which have more established psychometric properties, in particular with respect to alternate forms and test-retest reliability. In addition, the MACFIMS includes the Delis-Kaplan Executive Functioning System (D-KEFS) Sorting Test [executive function] and the Judgment of Line Orientation Test [spatial function].
Despite their status as well established batteries, further development in this area is warranted due to several factors, particularly in respect of the utility of BRNB and MACFIMS in clinical trials and patient care. Rater and patient burden are high, with the MACFIMS taking around 90 minutes to administer and both batteries containing a series of component tests and scores. These batteries require a high degree of expertise and standardized administration, and scoring may be difficult in large scale, multicentre, multi-national clinical trials. Perhaps of most importance, the batteries include no measurement of reaction time. In a disease where information processing speed during cognitive tasks is recognized as one of the primary deficits, the measures are not capable of separating information processing speed from other aspects of task performance. For example, impairment of motor function, information processing speed or working memory might affect Digit Symbol Substitution Test (DSST)/SDMT or PASAT scores; information processing speed or language or executive search impairment might affect COWAT scores; and information processing speed or memory impairment might affect SRT, CVLT2, 10/36 and BVMTR scores. However, these tasks cannot differentiate selective impairment of the different functions, which contribute to overall performance.
The Cognitive Drug Research (CDR) System is a brief, multiple repeatable, computerized battery of cognitive tests (http://www.unitedbiosource.com) [17,18]. Multiple alternate forms and availability in several languages make the battery suited to use in multi-national clinical trials. The battery uses computer algorithms or rule sets to generate alternate forms of tests and randomizes these across repeated assessments, such that at each time-point in a study schedule each participant completes a different form of the test. The use of a simple two-button response box minimizes the motor component of task performance and facilitates its use in populations with impaired motor control, e.g. Parkinson's disease [17]. The CDR System is modular, including tests of attention and information processing speed (Simple reaction time, Choice reaction time and Digit vigilance tasks), verbal and visuospatial working memory (Numeric and Spatial working memory tasks) and verbal and visual episodic memory (Immediate and Delayed word recall [verbal responses are recorded by the administrator], Word recognition and Picture recognition tasks). Sensitivity indices (SI), ranging from zero (chance performance) to one (perfect accuracy), have been calculated for working memory and recognition tasks [19].
In this study we investigated the validity and utility of the CDR System by longitudinally assessing cognitive performance in RRMS patients with the established DSST and PASAT and comparing the results with those obtained by the CDR System.

Methods
The study was performed in two general hospitals (Amphia Ziekenhuis, Breda, the Netherlands; Clinique St. Pierre, Ottignies, Belgium), one university hospital (Cliniques Universitaires St. Luc, Université catholique de Louvain, Brussels, Belgium) and two MS centres (MS Centre Nijmegen, Nijmegen, the Netherlands; Centre Neurologique et de Readaptation Fonctionnelle, Fraiture-en-Condroz, Belgium), and was ancillary to the FLAIR study, an investigator-initiated, international study on health-related quality of life (HR-QoL) and disability in RRMS patients during treatment with intramuscular interferon-beta-1a (INFβ-1a) (ClinicalTrials.gov identifier NCT00534261). Inclusion criteria were: (1) RRMS, (2) age 18-70 years, inclusive, (3) two relapses in the preceding 24 months, (4) disease duration at least 12 months, (5) EDSS 5.5 or less, (6) naive for INFβ, (7) written informed consent prior to any assessments not part of routine care. Exclusion criteria and details on study design and procedures have been reported [20]. The study was approved by the Independent Review Board, Amsterdam, the Netherlands and carried out in compliance with the Helsinki Declaration. The study was funded by Biogen Idec Netherlands.

Assessments
Cognitive function was assessed using the DSST, the PASAT with 3 sec. interval (PASAT 3") [part of the Multiple Sclerosis Functional Composite (MSFC)] and the CDR computerized battery. The DSST is a widely used measure of visual information processing speed and working memory, complex visual scanning and sustained attention [21]. The PASAT 3" measures processing speed and working memory in the auditory/verbal sphere [22]. The CDR System is modular, and the selected battery measured attention and psychomotor/information processing speed (Simple reaction time, Choice reaction time and Digit vigilance tasks, covering both accuracy of responding and reaction time to visual stimulus presentation), verbal and visuo-spatial working memory (Numeric and Spatial working memory tasks) and verbal and visual episodic memory (Immediate and Delayed word recall, Word recognition and Picture recognition tasks) (see Additional file 1), and took around 15-20 minutes to complete. To minimize the motor requirement in responding, patient responses were recorded via a simple response box with two large buttons, one marked 'YES' and one marked 'NO', in the patient's own language. The patient was not required to use the computer keyboard or mouse, and in the word recall tests oral responses were recorded by the test administrator.
Five composite domain scores were derived from the CDR battery: Power of Attention (a measure of attention and psychomotor/information processing speed summing reaction times from the Simple reaction time, Choice reaction time and Digit vigilance tasks), Continuity of Attention (a measure of attention summing accuracy and error measures from the Choice reaction time and Digit vigilance tasks), Quality of Working Memory (a measure of working memory summing accuracy measures from the Numeric and Spatial working memory tasks), Quality of Episodic Memory (a measure of episodic memory summing accuracy measures from the Immediate and Delayed word recall, Word recognition and Picture recognition tasks) and Speed of Memory (a measure of complex information processing speed summing reaction times from the Numeric and Spatial working memory and Word and Picture recognition tasks) [23] (see Additional file 2). The average of z-scores for all individual task measures yielded the CDR composite. Disability was measured by the MSFC [average of z-scores for PASAT 3", Timed 25-Foot Walk (Timed 25-FWT) and 9-Hole Peg Test (9-HPT)] [24] and the Expanded Disability Status Scale (EDSS) [25].
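The z-score averaging used for the CDR composite and the MSFC can be illustrated with a minimal sketch. The reference means and SDs, the sign convention for reaction times (flipped so that higher always means better) and all numeric values below are assumptions made for illustration; they are not taken from the study or from the CDR System documentation.

```python
from statistics import mean

def z_score(value, ref_mean, ref_sd, higher_is_better=True):
    """Standardize a raw score against a reference mean and SD.

    Assumption: timed measures (where lower is better) are sign-flipped
    so that a higher z-score always indicates better performance.
    """
    z = (value - ref_mean) / ref_sd
    return z if higher_is_better else -z

def composite(z_scores):
    """Average a collection of z-scores into a single composite score."""
    return mean(z_scores)

# Hypothetical values for one patient, for illustration only:
# (raw value, reference mean, reference SD, higher_is_better)
measures = [
    (320.0, 300.0, 40.0, False),  # a reaction time in ms
    (0.90, 0.85, 0.10, True),     # an accuracy-type index
]
zs = [z_score(*m) for m in measures]
example_composite = composite(zs)
```

In this toy case the slower-than-reference reaction time (z = -0.5) and the better-than-reference accuracy (z = +0.5) cancel, giving a composite close to zero.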
Physical and mental domains of HR-QoL were measured by the Multiple Sclerosis Quality of Life-54 (MSQoL-54) questionnaire. Scores for each domain range from 0 to 100, where higher values indicate better HR-QoL.
The CDR battery, the DSST and the MSFC were performed at Day -60 (training), Day -30 (training), Day 0 (baseline), Day 30 and Months 3, 6, 12, 18 and 24. MSQoL-54 scores were assessed at Day 0 and Months 3, 6, 12, 18 and 24, and the EDSS score on Day 0 and Month 24. Training was performed prior to the baseline assessment to familiarize patients with the procedures and overcome initial learning/practice effects.

Statistical Analyses
Validity of the CDR battery was evaluated by assessing: a) test-retest reliability (Pearson correlation between subsequent assessments); b) practice effects (using the ANOVA analyses from the model described below); c) concurrent validity (Pearson correlation of cognitive measures with Physical MSQoL-54, Mental MSQoL-54 and EDSS scores at screening/baseline and Month 24; and Pearson correlation between CDR cognitive measures and DSST and MSFC measures at screening/baseline and Month 24); and d) discriminant validity (comparison to age-matched healthy controls, mean age 33.4 years, standard deviation [SD] 12.35, from the CDR normative database [version 3], derived from volunteers enrolled in a series of prior clinical trials). For the latter evaluation, the size of differences in outcome between patients and healthy controls was calculated using Cohen's d. Effect sizes may be interpreted as small (d ≥0.2), moderate (d ≥0.5) or large (d ≥0.8).
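As an illustration of the effect size calculation, the sketch below computes Cohen's d from group summary statistics and applies the thresholds given above. The pooled-SD form of d is an assumption, since the text does not state which SD was used as the denominator, and the "negligible" label for d below 0.2 is added here for completeness.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d for two groups, using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def label_effect(d):
    """Map |d| onto the small/moderate/large thresholds used in the text."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed here; not defined in the text
```

For example, `label_effect(1.4)` returns `"large"` and `label_effect(0.6)` returns `"moderate"`, matching the interpretation of the Power of Attention and Continuity of Attention effect sizes reported in the Results.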
Correction for multiple comparisons was made using the Bonferroni method at p = 0.05 for each set of analyses conducted (p-value following correction indicated in the table legends).
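The Bonferroni adjustment divides the family-wise significance level by the number of tests in a set. The sketch below shows this arithmetic; the function names and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family of n_tests comparisons."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p-values remain significant at the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]

# Hypothetical p-values for one set of three analyses.
flags = significant_after_bonferroni([0.001, 0.02, 0.04])
```

With three tests the corrected threshold is 0.05/3, or about 0.0167, so only the first of these hypothetical p-values would remain significant.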
An additional analysis was conducted to determine the number of patients impaired on the CDR cognitive measures. Patients were classified as impaired if they were ≥ 1 SD below normative data on three or more of the five composite domain scores derived from the battery (Power of Attention, Continuity of Attention, Quality of Working Memory, Quality of Episodic Memory and Speed of Memory). T-tests were used to evaluate level of disability on the EDSS in impaired versus unimpaired patients. The ANOVA assessing change over time was repeated fitting impairment at Day -30 as a fixed effect and the interaction term between time-point and impairment.
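The impairment rule can be sketched as follows, assuming each composite domain score has already been converted to a z-score against the normative data and oriented so that negative values indicate poorer performance (which implies a sign flip for the reaction-time composites). The data structure and example values are hypothetical.

```python
DOMAINS = (
    "Power of Attention",
    "Continuity of Attention",
    "Quality of Working Memory",
    "Quality of Episodic Memory",
    "Speed of Memory",
)

def is_impaired(z_by_domain, sd_cutoff=1.0, min_domains=3):
    """Classify a patient as impaired on the 'three or more of five' rule.

    z_by_domain maps each domain name to a z-score versus norms,
    where negative values mean worse-than-normative performance.
    """
    n_low = sum(1 for d in DOMAINS if z_by_domain[d] <= -sd_cutoff)
    return n_low >= min_domains

# Hypothetical patient: three domains at least 1 SD below the norm.
patient = {
    "Power of Attention": -1.6,
    "Continuity of Attention": -1.0,
    "Quality of Working Memory": 0.2,
    "Quality of Episodic Memory": -0.4,
    "Speed of Memory": -1.2,
}
impaired = is_impaired(patient)
```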
Finally, analyses were conducted to assess the change over time in cognitive parameters. Changes over time were assessed using one-way analyses of covariance using a mixed model (SAS ® PROC MIXED) with a fixed effect term for time-point and a random effect for patients. Comparisons between the time-points were made using the t-test from the LSmeans statement.
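One way to write such a model, as a sketch that assumes a random intercept per patient and no additional covariates (the exact PROC MIXED specification is not given here), is:

```latex
y_{ij} = \mu + \tau_j + b_i + \varepsilon_{ij}, \qquad
b_i \sim \mathcal{N}(0, \sigma_b^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the score of patient $i$ at time-point $j$, $\mu$ is the overall mean, $\tau_j$ is the fixed effect of time-point $j$, $b_i$ is the random effect for patient $i$, and $\varepsilon_{ij}$ is the residual error. The symbols are introduced for illustration only.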

Results

Patient characteristics
Forty-three RRMS patients were studied, 30 female and 13 male. Mean age was 38.8 years (SD 10.5) and mean EDSS score 2.8 (SD 1.15). Mean disease duration was 6.0 years (SD 5.7), mean time since diagnosis 3.3 years (SD 4.1), and mean annualized relapse rate over the prior 24 months 1.2 (SD 0.4).

Test-retest reliability
For most cognitive measures test-retest reliability was good (>0.7) and statistically significant. Exceptions were Quality of Working Memory and Quality of Episodic Memory, which showed lower and more variable correlation coefficients (Tables 1 and 2).

Correlations of cognitive and MSFC measures with EDSS and MSQoL-54
For EDSS at screening (Day -60) the largest correlation coefficient was seen for Power of Attention (0.62), followed by the CDR composite (0.59) and 9-HPT (0.55). For EDSS at Month 24 the largest correlation coefficient was seen for DSST (0.76), followed by the CDR composite (0.61) and MSFC (0.56). For MSQoL-54 Physical at screening the largest coefficient was seen for Timed 25-FWT (0.35), and all correlations were non-significant and small. For MSQoL-54 Physical at Month 24 the largest correlation coefficient was seen for Quality of Episodic Memory (0.64), followed by Quality of Working Memory (0.35) and Continuity of Attention (0.34), and all correlations were non-significant. For MSQoL-54 Mental at screening the largest coefficient was seen for 9-HPT (0.23), and all correlations were non-significant and small. For MSQoL-54 Mental at Month 24 the largest correlation coefficient was seen for Quality of Episodic Memory (0.6), followed by Quality of Working Memory (0.41) and the CDR composite (0.25), and again all correlations were non-significant (Table 3).

Correlations of CDR scores with DSST and MSFC measures
At baseline the CDR composite correlated well with the DSST and the PASAT 3", as well as with the 9-HPT and the MSFC, as did Power of Attention (Table 4). Correlations with leg function (Timed 25-FWT) were less strong.

Baseline cognitive function
CDR and DSST data at baseline were compared to normative data (CDR database version 3) derived from healthy age-matched volunteers, using data gathered in a series of prior clinical trials. Effect size differences (Cohen's d) showed large impairments to Power of Attention (d = 1.4) and DSST (d = 1.1), and moderate impairments to Continuity of Attention (d = 0.6) and Speed of Memory (d = 0.7). Quality of Working Memory and Quality of Episodic Memory were unimpaired (Table 5).

Level of cognitive impairment
The number of patients with cognitive impairment, defined as three or more domains ≥ 1 SD below age-matched normative data, was 14 (33%) at Day -30 and 16 (41%) at Month 24. Learning/practice effects may miscategorise a small number of patients if training/familiarisation is not conducted, with 17 patients (39.5%) categorized as impaired at the initial Day -60 time-point. As expected, the presence of cognitive impairment was associated with statistically significantly greater disability on the EDSS (Table 6). Using cognitive impairment as a fixed effect in the ANOVA model we showed, as would be expected, a highly significant effect of impairment, with the cognitively impaired patients performing more poorly.
In general terms, changes were most marked between Days -60 and -30 and indicated learning/practice effects. The changes displayed a typical 'power-curve' with increasingly smaller improvements over the repeated assessments (Figures 1 and 2 and Table 7).

Discussion
The CDR System has previously been validated in dementia [17] and traumatic brain injury [18], and is used in a variety of disease states and cognitive disorders, including dementia, epilepsy and sleep disorders, to demonstrate both efficacy and safety of drugs [26-28].
The battery uses alternate forms of tests and randomizes these across repeated assessments. It is important to note that these alternate forms have not been specifically evaluated to demonstrate equivalence, i.e. that they are parallel forms. However, the forms are as far as possible conceptually equivalent and the use of randomization prevents systematic bias in comparison between visits when comparing between or within groups. In the present study population, test-retest reliability was moderate to high for most CDR measures and the measures correlated with other assessments of cognition (DSST, PASAT 3") as well as with disability, supporting the validity of the battery in RRMS. With the exception of two measures, test-retest correlations ranged between 0.72 and 0.98 and thus could be considered high. The two measures showing more variable test-retest correlations were Quality of Working Memory (0.35 to 0.75) and Quality of Episodic Memory (0.5 to 0.82). The poorest of these might possibly be related to the learning/practice effects on the Quality of Working Memory measure, as this was seen between the first and second assessments, where the largest improvement occurred. However, the possibility for non-equivalence of alternate forms to influence test-retest reliability must also be considered. It was of note that the CDR measures were correlated with the more widely used PASAT. The PASAT, as a component of the MSFC, BRNB and MACFIMS, has been extensively used to study cognition in MS and is thought primarily to measure information processing speed deficits [29]. The DSST, though not widely used in MS, is a common cognitive test in which the patient copies symbols paired with numbers against a time limit and is the reverse of the SDMT, in which the patient copies numbers paired with symbols.
The SDMT using verbal responses is also included in the BRNB and MACFIMS batteries and may measure similar aspects of cognitive function to the PASAT and DSST, particularly information processing speed; it has been proposed as a replacement for the PASAT in the MSFC [30,31]. Importantly, ease of use and the automation of the CDR System facilitate cognitive assessment in a daily care setting, and electronic data capture and computer systems validation enhance data quality. In comparison to the BRNB and MACFIMS batteries, the selected CDR battery was shorter in duration, reducing patient burden, though it does not cover some aspects of function such as visual recall and abstract problem solving; and component measures do not need to be hand scored and entered into datasets, reducing rater burden and making the tests better suited to clinical trials or patient care. Recently, another computerized battery, the Automated Neuropsychological Assessment Metrics (ANAM), has been reported to be sensitive to cognitive impairment in MS patients [32]. In our study, the CDR battery identified impairment to information processing speed (d composite, which showed a comparable relationship to the EDSS as the DSST and the MSFC, which incorporates both measures of cognitive function and arm and leg function, i.e. disability. In addition, those patients characterized as cognitively impaired using the CDR battery had greater disability scores on the EDSS (Table 6). However, associations with HR-QoL were weaker. As expected, correlations were evident between the CDR battery measures and other measures assessing aspects of cognition (DSST and PASAT 3"), but were not seen with leg function (Timed 25-FWT). The lack of impairment to memory in our patient group was not consistent with prior findings indicating memory impairment to be prominent in MS. This could reflect properties of the CDR memory measures themselves, e.g. limited sensitivity and/or insufficient difficulty.
Alternatively, the study population might have been different from that in other studies, e.g. relatively well educated with respect to the normative sample, and thus 'cognitive reserve' might account for the lack of memory impairment. The CDR tasks of delayed and immediate recall and Spatial working memory nominally cover the same cognitive domains as tasks included in the BRNB and MACFIMS, which have identified impairments in MS patient populations. Thus, in conjunction with findings which show heterogeneity in cognitive impairment in MS [33], it is possible that the present sample may have presented an atypical pattern of impairment. Additionally, the CDR System has no measure of visual recall, which is included in the MACFIMS and may be particularly sensitive in this population. It would be important for future studies employing the CDR System to collect data on education, employment history and other potentially relevant demographics. However, it should be noted that some patients were at ceiling on accuracy measures for Spatial and Numeric working memory, Word and Picture recognition and PASAT 3". Thus these tasks may not be sufficiently challenging. A further issue which will need to be clarified in future studies is a more complete clinical characterization of the MS population to address other factors that may also impact upon cognition such as depression and fatigue. Thus a follow-up study in a larger sample with extensively described demographic and clinical variables is now necessary.
Practice effects are well known for the PASAT [34] and were also marked for the Quality of Working Memory from the CDR battery (Table 7). Our data confirm the importance of 'training' sessions for cognitive assessments prior to baseline [35,36], particularly in uncontrolled longitudinal studies, to overcome the large initial improvement in performance between the first and second administration of the tasks. Conclusions regarding any treatment effect cannot be drawn due to the fact that the assessments were conducted during an uncontrolled observational study. Thus without suitable control arms, it is not possible to differentiate potential treatment effects from those of the disease and/or properties of the measures themselves, over repeated assessments.

Conclusions
The CDR System measures of attention, psychomotor/information processing speed, complex information processing speed and a global composite showed good psychometric properties and were related to other measures of cognition and to disability. The data provide initial evidence for the utility and validity of the CDR System for use in MS clinical trials. To further validate the CDR System, data will be required in larger samples of patients with a more complete clinical and demographic characterization and with comparison to established cognitive/neuropsychological test batteries, e.g. BRNB or MACFIMS.