Meta-analyses confirmed that the different formulations of IFN-β and GA reduce ARR and generally delay progression as defined in these trials. There was little evidence that any one drug was superior to others, except for progression confirmed at 6 months, but networks were especially sparse. Findings for discontinuations due to AEs, which are intended to be indicative, did not suggest that one drug was more likely to result in discontinuation than another, but these findings relied on networks with some limited evidence of inconsistency.
Challenges with the clinical evidence
These conclusions are tempered by several considerations. Analyses did not show a clear ‘winner’ across outcomes, and, again, comparisons between drugs estimated as part of NMA models were in the main inconclusive. Though the main model for ARR was relatively well populated, analyses for time to progression confirmed at six months were especially sparse. In particular, several comparisons of drugs vs. placebo estimated as part of this last model relied exclusively on indirect evidence. Moreover, analyses for time to progression confirmed at three and at six months did not show a consistent pattern, except that all drugs were beneficial in delaying progression where progression was defined using the EDSS. This is particularly concerning, as progression confirmed at six months is considered to be a ‘stronger’ outcome than progression confirmed at three months.
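To illustrate why estimates that rely exclusively on indirect evidence tend to be imprecise, the following is a minimal sketch of a simple adjusted indirect comparison through a common placebo arm (the Bucher approach). It is not the frequentist NMA model used in this review, and all numbers are hypothetical values chosen for illustration rather than results from any included trial. The point is only that the variance of an indirect estimate is the sum of the variances of the two direct estimates it is built from.

```python
import math

def indirect_comparison(log_hr_a, se_a, log_hr_b, se_b, z=1.96):
    """Adjusted indirect comparison of drug A vs drug B via a common comparator.

    Inputs are log hazard ratios (each drug vs placebo) and their standard errors.
    Returns the indirect hazard ratio, its 95% CI bounds, and the SE on the log scale.
    """
    log_hr_ab = log_hr_a - log_hr_b            # difference of the two log effects
    se_ab = math.sqrt(se_a ** 2 + se_b ** 2)   # variances add, so precision is lost
    lower = math.exp(log_hr_ab - z * se_ab)
    upper = math.exp(log_hr_ab + z * se_ab)
    return math.exp(log_hr_ab), lower, upper, se_ab

# Hypothetical inputs (NOT taken from the review): each drug vs placebo
hr_a_vs_placebo, se_a = 0.70, 0.15
hr_b_vs_placebo, se_b = 0.80, 0.20

hr_ab, ci_low, ci_high, se_ab = indirect_comparison(
    math.log(hr_a_vs_placebo), se_a, math.log(hr_b_vs_placebo), se_b
)
print(f"Indirect HR A vs B: {hr_ab:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

With these hypothetical inputs, the standard error of the indirect estimate is larger than that of either direct estimate, so the confidence interval spans 1 even though each drug individually appears beneficial against placebo.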
Measurement of disease progression also relied on the EDSS, a measure that, while broadly accepted in clinical trials, may be of dubious value in measuring disability per se. The EDSS is heavily weighted towards mobility over other important aspects of disability affected by disease progression in MS, such as cognitive function. Additionally, progression outcomes based on confirmed progression at 3 or 6 months overestimate the accumulation of permanent disability by up to 30% [21]. This is in part because recovery from relapses may take longer than several months, and thus ‘confirmed’ progression may reflect residual relapse-related symptoms. Consequently, while time to progression confirmed at 3 or 6 months may be standard within the relatively short timeframe of clinical trials, these outcomes may not capture the true accumulation of MS-related disability over the lifecourse, and thus true differences between DMTs in delaying disease progression.
NMA models also had imbalanced risk of bias across the networks of studies. For example, most trials comparing two active treatments were open-label, whereas most trials comparing active treatments against placebos were blinded. Participants in open-label trials were thus aware of the drugs they were receiving, which might have posed a greater risk of unblinding of outcome assessors than in ostensibly double-blinded trials. Many trials also relied on short follow-up, generally less than two years in duration, which increases the risk of spurious results [21]. In addition, the majority of studies were judged as high risk of bias under the ‘other’ category of the Cochrane tool given that most of these were funded by drug companies. Although no research has specifically been undertaken in the field of MS trials, empirical examination of trials suggests that industry-sponsored RCTs are more likely to have favourable results than non-industry-sponsored RCTs [2]. A final issue is that patient populations recruited into trials may not be the same over time, given the nearly 20-year span of the trials included in our models. These differences may well extend to diagnostic definitions of MS, and to the detection and diagnosis of relapses and disease progression. Again, insufficient studies on each pairwise comparison prevented exploration of this problem, but it is conceivable that this might have affected the transitivity of our networks of evidence.
Review-level strengths and limitations
We used a rigorous and exhaustive search to locate primary studies, which included updating existing high-quality systematic reviews. Additionally, we used auditable and transparent methods to include and synthesise studies. Where appropriate, we undertook post hoc sensitivity analyses in our clinical effectiveness assessments to check the robustness of our findings. However, a limitation of our work, inherent to all systematic reviews, is publication bias. Methods for detecting publication bias in NMAs are still in development, and we did not have enough studies in any one comparison to test for small-study bias. This may be especially relevant since many of the early trials of IFN and GA for MS were small. Another important limitation was the selective and inconsistent reporting of outcomes. For example, one of the reasons we did not undertake a meta-analysis of time to first relapse is that there was inconsistent and often poor reporting, especially across multiple reports of the same study, which prevented imputation of hazard ratios. We were also unable to obtain meta-analysable data for one study [12], due to the tight timeline within which the original work was undertaken.
Our analysis methods had a number of statistical advantages as well as some limitations. In examining the effect of IFN and GA on progression, we used time to event outcomes and hazard ratios instead of calculating risk ratios or odds ratios at different follow-up points. Thus, trial findings were reported at their fullest ‘maturity’ [22] and all relevant data were included. We were unable to verify empirically whether hazard ratios and rate ratios were time-varying due to few comparisons on every node of the study networks. On the other hand, we judged that stratifying analyses by time to follow-up would have resulted in excessively sparse networks that would have been difficult to interpret collectively. Thus, our decision to pool study estimates across follow-up times for analyses of clinical outcomes was both a strength and a potential limitation. Notably, we stratified analyses by time to follow-up in NMAs of discontinuations due to AEs, because we judged that the only feasible estimator in these analyses was the risk ratio.
Deviations from protocol
In our protocol, we specified that the comparator of interest was best supportive care without DMTs. In practice, this includes both best supportive care and also placebo, as reported in included trials. Though we sought to examine 10 outcomes relevant to RRMS in our original protocol, we report here findings for relapse rate, disability and discontinuation due to adverse events, as synthesis for other outcomes was limited and the data were in many cases not meta-analysable. Detailed findings for each of these outcomes are available in the main report [2]. Moreover, disability was ultimately measured and included in these meta-analyses as ‘time to progression’, as this was the most common outcome across trials. Finally, we implemented network meta-analyses in a frequentist paradigm rather than using WinBUGS as specified in the protocol.
In relation to research and practice
Our findings update prior reviews, though comparability of findings is limited. We included trials examining IFN and GA against each other and against a no-treatment comparator, and restricted inclusion to doses and formulations within their marketing authorisation, whereas Tramacere et al. [3] broadly examined immunomodulators and immunosuppressants for RRMS. Because they included studies across drugs and because they used risk ratios as the sole outcome estimator, our analyses and theirs are largely incommensurate. Our systematic review and NMA may offer more clinically relevant evidence because of our focus on doses used in clinical practice. Nonetheless, our analyses for discontinuation due to AEs agreed with theirs: neither review suggested that any one drug had a significant effect on discontinuation due to AEs relative to placebo.
Our findings agree with the ABN guidelines [1] in that the guidelines classify IFN-β and GA as drugs of ‘moderate efficacy’, and observe that there is not much data to support differences in effectiveness between them. Our analysis does suggest that these drugs are effective in reducing relapse rate, which may have an effect on progression.
Longer-term observational cohorts have also examined DMT effectiveness over time and cast some doubt on the findings from randomised trials. In the year 8 analyses from the UK Risk Sharing Scheme, DMTs were not found to be cost-effective and the drugs assessed were not substantially different in terms of delays in disease progression (personal communication with UK Department of Health, 2016). An analysis from the MSBase study, an international registry with ‘real-world’ data from MS patients, has suggested that GA or subcutaneous IFN-β-1a are more effective in controlling relapse rate than other IFN-β formulations, though the drugs did not differ on disease progression [23]. While this analysis relied on matching to overcome lack of randomisation, a strength is that it used disability progression confirmed at 12 months instead of at 3 or 6 months.
Future research
First, findings from this review will require updating as generic versions of the DMTs considered here are authorised. For example, the GATE trial tested a generic version of glatiramer acetate against the branded version and placebo [24]. Key flaws in the assembled clinical effectiveness evidence included the lack of long-term follow-up and the absence of a measure for disease progression adequately capturing worsening of disability. A large-scale, longitudinal randomised trial comparing active first-line agents and using clinically meaningful and robust measures of disability progression would contribute towards resolving uncertainty about the relative benefits of different IFN or GA formulations (and other first-line agents). While other, newer first-line agents were beyond the remit of our systematic review, few randomised comparisons exist and thus a large trial could resolve remaining questions of comparative effectiveness. It may also be that using standardised definitions for relapses and disease progression together with blinded adjudicator panels could attenuate the risk of bias accruing to an open-label trial. Because of the lack of long-term follow-up, DMT trials are not informative on whether drugs delay progression to SPMS. Understanding long-term effectiveness of DMTs as described above would also better inform cost-effectiveness evaluations, the effectiveness estimates for which currently rely on extrapolation from short-term trials. Use of a more relevant measure for disability and disease progression, especially as regards the development of secondary progressive MS, would also lead to better and more robust valuation of benefits accruing from DMTs.
Finally, above and beyond the broad interpretation that DMTs reduce ARR, there is a need to understand who responds best to DMTs, and especially who does not respond to IFN or GA early on, to enable more targeted therapeutic decisions. Though several trials included in our clinical effectiveness review used subgroup analyses, based, for example, on presenting lesions or demographic characteristics, a more fine-grained understanding would help patients and clinicians make better-informed decisions.