The clinical meaningfulness of ADAS-Cog changes in Alzheimer's disease patients treated with donepezil in an open-label trial

Background: In 6-month anti-dementia drug trials, a 4-point change in the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) is held to be clinically important. We examined how this change compared with measures of clinical meaningfulness. Methods: This is a secondary analysis of a 12-month open-label study of 100 patients (71 women) diagnosed with mild to moderate AD treated with 5–10 mg of donepezil daily. We studied the observed-case, 6-month change from baseline on the ADAS-Cog, the Clinician's Interview Based Impression of Change-Plus Caregiver Input (CIBIC-Plus), patient-Goal Attainment Scaling (PGAS) and clinician-GAS (CGAS). Results: At 6 months, donepezil-treated patients (n = 95) were more likely to show no change (+/- 3 points) on the ADAS-Cog (56%) than to improve (20%) or decline (24%) by 4 points. ADAS-Cog change scores were only weakly correlated with other measures: from -0.09 for PGAS to 0.27 for the CIBIC-Plus. While patients who improved on the ADAS-Cog were less likely to decline on the clinical measures (26%), 43% of patients who declined on the ADAS-Cog improved on at least two of the clinical measures. Conclusion: The ADAS-Cog did not capture all clinically important effects. In general, ADAS-Cog improvement indicates clinical improvement, whereas many people with ADAS-Cog decline do not show clinical decline. The open-label design of this study does not allow us to know whether this is a treatment effect, which requires further investigation.


Background
Cholinesterase inhibition is a strategy for treating Alzheimer's disease (AD) that yields statistically significant though modest cognitive benefits which favour treatment over placebo [1,2]. The clinical meaningfulness of cholinesterase inhibition remains controversial [3][4][5][6][7]. A widely used method of inferring clinical importance is to classify patients by whether they demonstrate a clinically meaningful minimal difference on an outcome measure [8]. The Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog) is the de facto standard primary outcome neuropsychological measure for AD trials [9]. It measures several cognitive domains, including memory, language and praxis. Total scores range from 0-70, with higher scores (≥ 18) indicating greater cognitive impairment. Many regulatory authorities recognize a four-point change on the ADAS-Cog at 6 months as indicating a clinically important difference, a proposal that has impacted how clinical trials are interpreted [10][11][12][13]. Our group was interested in understanding whether a four-point change on the ADAS-Cog was reflected in changes on other, more self-evidently meaningful clinical measures.

Sample
These data come from a previously reported 12-month, open-label trial of 100 community-dwelling patients with mild-to-moderate Alzheimer's disease (71% women; average age = 76 years ± 8) treated with donepezil. The Atlantic Canada Alzheimer's disease Investigation of Expectations (ACADIE) Study was conducted between 1998 and 1999 [14,15]. Diagnoses were made using standard criteria [16,17]. Staging followed the Clinical Dementia Rating (CDR) scale (mild = 75%) [18]. All patients were treated with 5 mg/day of donepezil for three months, and then flexibly dosed at 5 or 10 mg/day. Here, for better comparison with the 6-month double-blinded trials, we included only those patients who received treatment for a minimum of six months. To ensure that we would address only the meaningfulness of true change, we did not impute in the case of missing data.

Outcome measures
We compared the 6-month responses on the ADAS-Cog with those from three judgment-based clinical measures. The primary outcome was Goal Attainment Scaling (GAS), used to evaluate patient-centred outcomes [19]. GAS allows clinicians and patients/caregivers to selectively target symptoms, specify desired treatment outcomes (goals), and evaluate the extent to which these goals are met. We used a modified GAS approach, setting goals on a 5-point scale anchored at 0 (baseline) [14]. The 5 points correspond to individualized descriptions of the pre-treatment state (baseline, recorded as 0), desired improvements ('somewhat' and 'much' better than baseline, recorded as +1 and +2), and potential worsening ('somewhat' and 'much' worse, recorded as -1 and -2). For example, the baseline status (level 0) for a person with a misplacing problem might be described as follows: misplaces commonly used items, such as glasses, keys, TV remote, and wallet as often as 8 times per day and cannot locate items without verbal direction or hands-on assistance. The goals of treatment (desired improvements) might be the ability to find misplaced items without assistance at least once per day (level +1), and misplacing items fewer than 3 times per day (+2). Goals can then be weighted or ranked in order of their relative importance (the most important goal receives the highest numerical rank). Goal attainment is evaluated by comparing the current status at follow-up with the baseline status and determining where that outcome should be slotted on the scale; attainment is recorded as 0 if there has been no change, but is otherwise scored from -2 to +2. Individual goal scores are summarized for each patient using the following formula: $50 + \frac{10 \sum_i w_i x_i}{\left(0.7 \left(\sum_i w_i\right)^2\right)^{1/2}}$, where $w_i$ = weight assigned to the ith goal and $x_i$ = score of the ith goal.
The summary score is 50 when all goals remain at the baseline level, greater than 50 when there is more improvement across goals than decline, and less than 50 with worsening. In ACADIE, treatment goals were constructed separately by clinicians (CGAS) and patients/caregivers (PGAS). Each was blinded to the goals set by the other. Only patient/caregiver goals were weighted (for each CGAS goal, weight = 1). Goals were coded into five domain categories: cognition, function, behaviour, leisure and social interaction. Examples of the types of goals that were set for each domain include: cognition: a decrease in repetitive questioning, improved word finding, improvement in recent memory, less misplacing of objects; function: performing various IADL and ADL tasks with less dependence; behaviour: less irritability, more initiative; social activities: outings, especially to scheduled activities such as church, bingo, card games; leisure: more interest in or effective performance of hobbies and pastime activities. The ADAS-Cog was completed independently of the CGAS, and although patients might have had some idea as to how they performed on the ADAS-Cog, neither they nor their caregivers were told of the ADAS-Cog scores.
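As an illustration of the summary score described above, the following minimal Python sketch transcribes the formula as it is printed in this section. The function name `gas_summary_score` and the example goal scores are hypothetical, and the default of unit weights mirrors the statement that each CGAS goal had weight = 1.

```python
from math import sqrt


def gas_summary_score(scores, weights=None):
    """Summarize individual GAS goal scores into one value centred at 50.

    scores  : attainment per goal, each in {-2, -1, 0, +1, +2}
    weights : importance per goal; defaults to 1 for every goal
              (as stated for CGAS goals in the text)

    The denominator follows the formula as printed in this section:
    sqrt(0.7 * (sum of weights) ** 2).
    """
    if not scores:
        raise ValueError("at least one goal is required")
    if weights is None:
        weights = [1] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    if any(x not in (-2, -1, 0, 1, 2) for x in scores):
        raise ValueError("goal scores must lie on the 5-point scale -2..+2")

    weighted_sum = sum(w * x for w, x in zip(weights, scores))
    denominator = sqrt(0.7 * sum(weights) ** 2)
    return 50 + 10 * weighted_sum / denominator


# Hypothetical patients (illustrative values, not study data)
print(gas_summary_score([0, 0, 0]))              # all goals at baseline
print(gas_summary_score([1, 0, 2], [3, 2, 1]))   # weighted goals, net improvement
print(gas_summary_score([-1, 0, -1]))            # net worsening
```

With all goals at baseline the weighted sum is zero, so the score is exactly 50; net improvement pushes it above 50 and net worsening below, matching the interpretation given in the text.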
The Clinician's Interview Based Impression of Change-Plus Caregiver Input (CIBIC-Plus) was a secondary outcome [20]. This global assessment of change is based on a comprehensive, semi-structured, patient/caregiver interview; it is anchored at 4 ('no change') and ranges from 1 to 7, where 1 is "very much improved" and 7 is "very much worse." Other secondary outcomes included the Mini-Mental State Examination (MMSE), scored from 0 to 30 (lower scores indicate greater cognitive decline), the Lawton-Brody Physical Self-Maintenance (PSMS) and Instrumental Activities of Daily Living (IADL) scales, which range from 6-30 and 6-31, respectively (higher scores indicate less functional ability), and the Cornell Scale for Depression in Dementia, with a range of 0-38 (scores greater than 6 indicate depression) [21][22][23][24]. The latter three measures rely on informants. All measures were administered at baseline, then every three months up to 12 months.

Analyses
In this exploratory study, we analyzed observed cases (OC) after 6 months of donepezil therapy. ADAS-Cog change scores ≤ -4 were equated with improvement, whereas worsening was defined as a change score ≥ 4. Scores between -3 and 3 were interpreted as maintenance of the baseline status (no change). We report the frequency, proportion and baseline characteristics of patients in each ADAS-Cog response group. Mean change from baseline, standard deviation and 95% confidence intervals were calculated for all outcome measures by ADAS-Cog response group. Between-group differences were tested using chi-square. Spearman correlation coefficients were calculated to compare mean change on the ADAS-Cog, as well as the CIBIC-Plus and GAS, with response on the other outcome measures. Statistical tests were interpreted at the 5% significance level.
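The ADAS-Cog response groups defined above can be expressed as a simple rule. The sketch below is illustrative only: the function name `classify_adas_cog_change`, the `threshold` parameter and the example change scores are assumptions, not part of the study's analysis code. Because lower ADAS-Cog scores indicate less impairment, a negative change counts as improvement.

```python
from collections import Counter


def classify_adas_cog_change(change, threshold=4):
    """Classify a 6-month ADAS-Cog change from baseline.

    change <= -threshold -> "improved"  (score fell, less impairment)
    change >= +threshold -> "worsened"  (score rose, more impairment)
    otherwise            -> "no change"
    """
    if change <= -threshold:
        return "improved"
    if change >= threshold:
        return "worsened"
    return "no change"


# Hypothetical change scores (illustrative values, not study data)
changes = [-6, -4, -3, 0, 2, 3, 4, 8]
groups = Counter(classify_adas_cog_change(c) for c in changes)
for label in ("improved", "no change", "worsened"):
    print(label, groups[label], f"{100 * groups[label] / len(changes):.0f}%")
```

Keeping the cut-point as a parameter makes it straightforward to re-run the same classification under a different criterion, for example `threshold=3`.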
Cut-points were set for each of the judgment-based measures to reflect clinically detectable changes. CIBIC-Plus improvement was taken as scores of 1-3, and worsening as 5-7. GAS responses were grouped so that PGAS improvement was defined as a change > 6 (representing net improvement in 2/3 of the 8.6 goal areas set on average by patients and caregivers, and a standardized response mean of moderate size ~ 0.6), worsening as a change < -6, and no change as scores between 5 and -5. Respective CGAS cut-points were set as > 3, < -3 and 2 to -2, again representing net improvement on most goals, as clinicians set 3.4 goals on average, and also a standardized response mean in the moderate range of clinical detectability. In this way, responses on each of the three judgment-based clinical measures could be cross-classified against the ADAS-Cog.
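A minimal sketch of these cut-points and of the cross-classification against the ADAS-Cog follows. All function names and example data are hypothetical. One assumption is made explicit: the text leaves a gap between the "no change" band and the improvement or worsening thresholds (for example, a PGAS change of exactly 6), and the sketch treats any value that does not exceed the cut-off as "no change".

```python
from collections import Counter


def classify_cibic_plus(score):
    """CIBIC-Plus: 1-3 improved, 4 no change, 5-7 worsened."""
    if not 1 <= score <= 7:
        raise ValueError("CIBIC-Plus scores range from 1 to 7")
    if score <= 3:
        return "improved"
    if score >= 5:
        return "worsened"
    return "no change"


def classify_gas_change(change, cutoff):
    """GAS change from baseline; cutoff is 6 for PGAS and 3 for CGAS.

    Assumption: values not strictly beyond the cut-off count as no change.
    """
    if change > cutoff:
        return "improved"
    if change < -cutoff:
        return "worsened"
    return "no change"


def cross_classify(adas_labels, clinical_labels):
    """Count patients in each (ADAS-Cog, clinical measure) response cell."""
    if len(adas_labels) != len(clinical_labels):
        raise ValueError("both measures need one label per patient")
    return Counter(zip(adas_labels, clinical_labels))


def agreement_rate(adas_labels, clinical_labels):
    """Proportion of patients classified identically by both measures."""
    table = cross_classify(adas_labels, clinical_labels)
    same = sum(n for (a, c), n in table.items() if a == c)
    return same / len(adas_labels)


# Hypothetical patients (illustrative values, not study data)
adas = ["improved", "no change", "worsened", "worsened"]
pgas = [classify_gas_change(c, cutoff=6) for c in (7.5, 2.0, 9.0, -8.0)]
print(cross_classify(adas, pgas))
print(agreement_rate(adas, pgas))
```

Tabulating the off-diagonal cells, such as ("worsened", "improved"), is what exposes patients whose ADAS-Cog response disagrees with the clinical judgment.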

Ethics
All patients and caregivers provided written informed consent. The study protocol was approved by the Research Ethics Committee of the Queen Elizabeth II Health Sciences Centre, Halifax, Nova Scotia, and by the institutional ethics boards at each participating study centre.

Demographics and baseline characteristics
Ninety-five of the 100 patients enrolled at baseline were evaluated at 6 months. Five patients had discontinued: three because of adverse events (diarrhoea n = 2, weight loss n = 1) and two who withdrew consent. The remaining patients tended to be elderly women with mild AD (Table 1), the majority of whom (63%) were dosed at 10 mg donepezil after the initial 3-month follow-up.

ADAS-Cog response at 6 months
The most common response on the ADAS-Cog at 6 months was no change from baseline (56%, mean change = -0.1, ± 2.0). Patients who showed worsening (24%; mean change = 8.0, ± 4.7) outnumbered those who had improved (20%; -6.2, ± 1.7). Patients who improved on the ADAS-Cog were slightly older than those in the other response groups, but there was no clear effect of initial conditions, i.e. those who responded showed no statistically significant differences from non-responders in baseline clinical or demographic measures (Table 1).

Comparison of ADAS-Cog response with other outcomes at 6 months
The most common response on the patient/caregiver-GAS (PGAS) at 6 months was no change (i.e., within the range -3 to +3), which was twice as common as improvement (60% versus 31%; Figure 1, Panel A). Overall, the PGAS response did not correlate well with the ADAS-Cog response (Table 2). Only 42% of patients, mostly in the no change/no change group (33%), were similarly classified by the ADAS-Cog and the PGAS (Figure 1, Panel A). At a group level, patients with ADAS-Cog improvement had net improvement on the PGAS (mean change = 7.0 ± 9.1) compared with patients who had ADAS-Cog worsening (5.4 ± 11.2). At the individual level, however, there were differences in classification: while no one who was classified as improved on the ADAS-Cog was rated as having worsened clinically, 7/23 people who worsened on the ADAS-Cog were rated by patients/caregivers as having improved. This appears to reflect not just the broader range of domains considered in the PGAS, but also differing accounts of treatment. For example, considering only the PGAS-cognition goals (n = 81 patients), a similar pattern obtains (Figure 1, Panel B).
At 6 months, overall responses on the clinician-GAS (CGAS) tended towards improvement (45%), followed by no change (32%) and worsening (23%). Mean change from baseline on the CGAS corresponded with the ADAS-Cog response by group (from 5.5 ± 9.1 for ADAS-Cog improved to 0.9 ± 12.4 for ADAS-Cog worsening), but the correlation between measures was low (Table 2). Here too, agreement (41%) was concentrated primarily in the no change/no change group (20%; Figure 2, Panel A). Patients who improved on the ADAS-Cog were also more likely to have CGAS improvement (11/19), and again, as with the PGAS, less agreement was evident with ADAS-Cog worsening. Of the 23 patients who worsened by ≥ 4 points, clinicians rated 9 as improved and 5 as showing no change. A similar pattern to the PGAS was also seen when we considered only cognition goals. ADAS-Cog improvement usually indicates clinical ratings of improvement or no change, whereas ADAS-Cog worsening can be seen in many people rated as showing clinical improvement (Figure 2, Panel B).
In contrast with the ADAS-Cog, the CIBIC-Plus account of change at 6 months was more evenly distributed: 35% improved, 31% had no change and 34% worsened (see Figure 3). For each ADAS-Cog response group, the mean CIBIC-Plus score changed in the corresponding direction (Figure 3). Clinical impressions showed less variability when the ADAS-Cog indicated improvement than for any other response. By contrast, 7/23 patients with ADAS-Cog worsening were rated as improved on the CIBIC-Plus.
At 6 months, the ADAS-Cog correlated better with the MMSE than with any other outcome measure (Table 2). Here, the CIBIC-Plus correlated better than the ADAS-Cog with all other outcome measures, including PGAS and CGAS.
[Figure: Distribution of patients by ADAS-Cog and clinician-GAS response after 6 months of donepezil therapy]

Discussion
In this secondary analysis, we investigated the clinical meaningfulness of a 4-point change on the ADAS-Cog at 6 months. Whereas patients who improved on the ADAS-Cog (n = 19) were unlikely to show clinical deterioration (none by PGAS, 2 by CGAS, and 4 by CIBIC-Plus), patients with ADAS-Cog deterioration (n = 23) had a broader range of clinical outcomes, including about a third (9 by CGAS, 7 by PGAS and 7 by CIBIC-Plus) who showed clinically important improvement. In consequence, it appears that while such a 4-point ADAS-Cog change at 6 months might help regulators discriminate treatment effects between patient groups, a 4-point decline has little inherent clinical meaning for individual patient or physician decision-making. By contrast, a 4-point improvement more often signals agreement with physician and patient assessment of either improvement, or the absence of decline.
Although the "4-point change at 6 months" criterion is conventional, a recent systematic analysis of double-blind placebo-controlled trials of cholinesterase inhibitors demonstrated an average -2.7 point improvement at 6 months and one year [25]. We therefore repeated our analyses using a "3-point change" criterion at 6 months. Compared with the 4-point change criterion, this identified more people as having either improved (29 versus 19) or worsened (30 versus 23). Still, the essential point holds: ADAS-Cog improvement is associated with clinical improvement, whereas many people who decline on the ADAS-Cog are judged by patients and physicians to have improved. The data from our study suggest that an n-point decline on the ADAS-Cog needs to be interpreted in the context of overall response, and should not be privileged over, amongst other considerations, the preferences of patients and caregivers.
Our data must be interpreted with caution. As this is an open-label study, we cannot make any inference about whether these changes are due to treatment, and we have made no attempt to do so. Such an inference requires placebo-controlled studies. Where these data can help is in better understanding whether the commonly accepted as meaningful 4-point change on the ADAS-Cog has a strong evidence base. These data make clear that changes at the group level are not easily translated into changes at the individual level. This does not mean that the ADAS-Cog account is right and that the clinical accounts are wrong, or vice versa. Neither does it gainsay that the ADAS-Cog shows, at a group level, across trials and across compounds, a dose-response effect which favours the use of cholinesterase inhibitors, compared with placebo, in people with Alzheimer's disease [2]. Nor does it endorse the view that these effects are meaningless [5]. What it appears to tell us is that we do need to look carefully at the whole body of evidence. In short, just as the CIBIC-Plus is a better estimate of decline than of improvement [26], so might the ADAS-Cog help us estimate the extent of improvement, but be less good at measuring clinically meaningful decline.
[Figure: Distribution of patients by ADAS-Cog and CIBIC-Plus response after 6 months of donepezil therapy]
Another feature of these data is the large proportion of patients classified as 'no change' by both the ADAS-Cog and at least one other clinical measure. No change in a neurodegenerative illness can be a useful goal, but this needs to be better understood. 'No change' in the CIBIC-Plus, for example, often appears to reflect clinical tradeoffs [27,28]. Whether there are detectable signals within the patients classified as no change requires additional study.

Conclusion
The development of staging tools for untreated Alzheimer's disease, based on natural history observations, was a labour-intensive process, and took place by carefully characterizing many patients, often at single sites, over several years. To attempt the same in the changing environment of treatments for Alzheimer's disease is daunting and may well be impracticable. In the setting of ongoing clinical studies, however, especially those employing tests such as the CIBIC-Plus, there is an opportunity to systematically record clinical observations in a way that can quickly allow for some hundreds to be assembled and compared. If we cannot rely on the ADAS-Cog as a guide to individual decision-making, we need to pursue other methods, such as symptom inventories [6,29], to find more relevant and less arbitrary ways of understanding treatment effects in individual patients.