Overview
Drug benefit-risk assessment is here approached as the analysis of a treatment decision problem for a hypothetical representative of the relevant patient population. The same framework could be used for a real patient by incorporating his or her specific preferences.
The flow of the evaluation largely follows that of customary decision analysis [16]: the decision problem, its objective and its alternatives are defined; the relevant effects are identified and modelled in a tree to form clinical outcomes; probability and utility variables are estimated; and each alternative is evaluated with respect to expected utility as a basis for comparison. Expected utility is an overall measure of how preferable an alternative appears.
In addition, the evaluation adopts probabilistic sensitivity analysis [17], meaning that each probability and utility variable is specified as a distribution and sampled, resulting in distributions of the alternatives’ respective expected utilities. The primary evaluation metric is the preference rate, which measures the fraction of sampling iterations in which a given alternative has the highest expected utility [15]. The preference rate of an alternative therefore estimates the probability of that alternative being the preferred one, given the specified model.
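The preference rate can be computed directly from the sampled expected utilities. The sketch below is minimal and makes several assumptions: the expected utilities are stored as an array with one row per sampling iteration and one column per alternative, and the function name and toy numbers are hypothetical. Ties go to the first alternative, a detail the text does not specify.

```python
import numpy as np

def preference_rates(expected_utilities: np.ndarray) -> np.ndarray:
    """Fraction of iterations in which each alternative has the highest expected utility.

    expected_utilities: shape (n_iterations, n_alternatives).
    """
    # Index of the best alternative per iteration (argmax picks the first on ties).
    best = np.argmax(expected_utilities, axis=1)
    counts = np.bincount(best, minlength=expected_utilities.shape[1])
    return counts / expected_utilities.shape[0]

# Hypothetical toy input: 4 iterations, 3 alternatives
# (high dose, low dose, no treatment).
eu = np.array([
    [0.80, 0.75, 0.70],
    [0.60, 0.72, 0.65],
    [0.81, 0.79, 0.78],
    [0.55, 0.58, 0.60],
])
print(preference_rates(eu))  # -> rates 0.5, 0.25, 0.25
```

Because exactly one alternative wins each iteration, the preference rates always sum to one.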
This framework is illustrated in Fig. 1, including an explanation of how expected utility is calculated.
Definition of the decision problem
This assessment analyses a treatment decision of a putative MS patient in acute relapse, with the objective of maximising health during the course of the relapse. Three alternatives are considered: high-dose methylprednisolone, low-dose methylprednisolone and no treatment. High dose was defined as at least 2000 mg methylprednisolone cumulatively during at most 31 days, and low dose was defined as less than 1000 mg cumulatively during the same period of time. The time horizon of the assessment is the duration of a single relapse, which was taken to be 6 months [18]. Optic neuritis is here considered a different indication from MS relapses and is hence excluded from the assessment. No differentiation is made with respect to the route of administration.
Selection of beneficial and adverse effects
The most common clinical endpoint in controlled trials of MS relapses is an improvement of at least one point on the expanded disability status scale (EDSS) [19]. Hence this degree of improvement was adopted as our definition of benefit. It was labelled a ‘reduced relapse’, in contrast to a ‘standard relapse’ where there is less or no improvement.
Serious and non-serious adverse effects were handled differently in the analysis. Non-serious adverse effects were considered jointly as a group, because their main significance from a benefit-risk perspective is likely to be their aggregated burden as a nuisance to patients.
Serious adverse effects were defined as being manifested by either life-threatening or persistently disabling reactions. These effects were selected from VigiBase®, the WHO international database of suspected adverse drug reactions [20], since this data source reflects actual concerns about drug treatment in clinical practice and captures rare events unlikely to be seen in small clinical trials. All reports in VigiBase as of May 2012 listing methylprednisolone were extracted, and those reports where treatment could be classified as high- or low-dose were retained as two groups (for details on the dose calculations, see Additional file 1). A frequency listing of reported MedDRA Preferred Terms and High-Level Terms was constructed for each of the two groups. A clinical reviewer (IRE) went through the lists separately, and each encountered term that was considered potentially life-threatening or persistently disabling, and reasonably likely to be due to treatment, was mapped to a preliminary term grouping. The top ten adverse effects thus constructed for each dose group were then taken further and rigorously defined as groups of MedDRA Preferred Terms. During the review, the actual frequencies of the various reported terms were hidden.
For each included adverse effect, three different serious outcomes were considered: death, persistent disability and life-threatening though non-lethal reactions. While a lethal outcome is relatively straightforward to capture, the other two outcomes were identified either intrinsically by the nature of the reported term, or based on explicit information on the reports (for complete definitions, see Additional file 2). Within a given report, the outcome classification of an adverse effect was hierarchical in the order listed above. For example, if two reactions on the same report suggested hepatotoxicity, one persistently disabling and the other life-threatening, the report would be counted only towards the persistent disability outcome. However, different reactions signifying separate adverse effects on the same report were counted separately and were therefore not necessarily coupled with the same outcome. Only adverse effect-outcome combinations reported at least three times in total across both groups were considered further.
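The per-report hierarchy and the frequency threshold can be sketched as two small counting functions. This sketch assumes a hypothetical data layout: each report is a list of (adverse effect, outcome) pairs, already mapped from MedDRA terms. All names are illustrative. The sketch only mirrors the counting rules stated above, not the term mapping itself.

```python
from collections import Counter

# Outcome hierarchy from the text: death outranks persistent disability,
# which outranks a life-threatening (non-lethal) reaction.
HIERARCHY = ["death", "persistent disability", "life-threatening"]

def classify_report(reactions):
    """Map each adverse effect on one report to its single highest-ranked outcome.

    reactions: list of (adverse_effect, outcome) pairs from one report.
    """
    best = {}
    for effect, outcome in reactions:
        if effect not in best or HIERARCHY.index(outcome) < HIERARCHY.index(best[effect]):
            best[effect] = outcome
    return best

def retained_combinations(reports, min_count=3):
    """Count effect-outcome combinations over all reports (both dose groups
    pooled) and keep those reported at least min_count times."""
    counts = Counter()
    for reactions in reports:
        # Separate adverse effects on the same report are counted separately.
        for effect, outcome in classify_report(reactions).items():
            counts[(effect, outcome)] += 1
    return {combo: n for combo, n in counts.items() if n >= min_count}

# Hypothetical report mirroring the hepatotoxicity example in the text.
report = [("hepatotoxicity", "persistent disability"),
          ("hepatotoxicity", "life-threatening"),
          ("psychosis", "life-threatening")]
print(classify_report(report))
# -> {'hepatotoxicity': 'persistent disability', 'psychosis': 'life-threatening'}
```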
Modelling of beneficial and adverse effects
All considered effects were modelled together in a tree structure. The small illustrative decision tree in the second panel of Fig. 1 can be used to view the general modelling strategy. The top level corresponds to the three alternatives, each of which is followed by the same subtree. This subtree, in turn, contains three levels, where the first corresponds to the beneficial effect. The second level contains the serious adverse effects, assumed for simplicity to be mutually exclusive on account of their rarity. Finally, the third level either corresponds to the outcome of the serious effect from the second level (psychosis or hepatotoxicity in the figure); or, in case of no serious adverse effect, the third level delineates two possible events: no adverse effect at all, or at least one non-serious adverse effect. Each branch thus constructed forms one possible clinical outcome.
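The subtree for a single alternative can be enumerated with nested loops. Each clinical outcome's probability is the product of the branch probabilities along its path, and the expected utility is the probability-weighted sum of outcome utilities. The sketch rests on two assumptions that the text does not state: the levels are independent, and the serious effect-outcome probabilities are given directly. All function names, outcome labels and numbers are hypothetical.

```python
def outcome_probabilities(p_reduced, p_nonserious, serious):
    """Probabilities of all clinical outcomes (tree branches) for one alternative.

    p_reduced    -- probability of a reduced relapse (the beneficial effect)
    p_nonserious -- probability of at least one non-serious adverse effect
    serious      -- {adverse_effect: {outcome: probability}}; serious effects are
                    mutually exclusive, as assumed in the text
    Levels are treated as independent (an assumption of this sketch).
    """
    p_no_serious = 1.0 - sum(sum(by_outcome.values()) for by_outcome in serious.values())
    probs = {}
    for relapse, p_rel in (("reduced", p_reduced), ("standard", 1.0 - p_reduced)):
        # Branches with a serious adverse effect end in that effect's outcome.
        for effect, by_outcome in serious.items():
            for outcome, p in by_outcome.items():
                probs[(relapse, effect, outcome)] = p_rel * p
        # Otherwise the third level separates "no AE" from ">= 1 non-serious AE".
        probs[(relapse, "no serious AE", "non-serious AE")] = p_rel * p_no_serious * p_nonserious
        probs[(relapse, "no serious AE", "no AE")] = p_rel * p_no_serious * (1.0 - p_nonserious)
    return probs


def expected_utility(probs, utilities):
    """Probability-weighted sum of outcome utilities."""
    return sum(p * utilities[outcome] for outcome, p in probs.items())


# Hypothetical single sampling iteration for one alternative.
probs = outcome_probabilities(0.6, 0.3, {"psychosis": {"life-threatening": 0.01}})
utils = {  # hypothetical utilities on a 0-1 scale
    ("reduced", "no serious AE", "no AE"): 1.0,
    ("reduced", "no serious AE", "non-serious AE"): 0.9,
    ("reduced", "psychosis", "life-threatening"): 0.3,
    ("standard", "no serious AE", "no AE"): 0.6,
    ("standard", "no serious AE", "non-serious AE"): 0.5,
    ("standard", "psychosis", "life-threatening"): 0.2,
}
print(sum(probs.values()))             # 1 up to floating-point rounding
print(expected_utility(probs, utils))  # about 0.8045
```

Repeating this for every alternative and every sampling iteration yields the expected-utility distributions used for the preference rates.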
Estimation of probability variables
As illustrated in the third panel of Fig. 1, each clinical outcome entails a series of events, each of which has an associated probability variable with a distribution. In the example used in Fig. 1, these events are, in turn, reduced relapse, psychosis and some unspecified serious outcome of psychosis. In general, three types of probability variables must be estimated for each treatment alternative: the effectiveness, i.e. the probability of a reduced relapse; the risk of any non-serious adverse effect; and the respective risks of the included serious adverse effect-outcome combinations.
Effectiveness
Data to estimate the effectiveness of the various alternatives were taken from published clinical trials. All papers included in, cited by, or citing any of the available systematic reviews on methylprednisolone in MS were considered [1, 2, 8–10, 21, 22]. Study arms in which patients were given high- or low-dose methylprednisolone as defined above, or placebo, for at most 31 days were included from trials fulfilling the following criteria:

Included patients were in acute relapse and diagnosed with either relapsing-remitting or progressive MS.

The trial was randomised and treatment was blinded to both patients and clinical assessors.

Patients were assessed clinically, with results reported as the fraction of patients with an improvement of at least one EDSS point compared with the start of treatment, or an equivalent thereof.
If several EDSS assessments were made in a single trial, the latest within the interval between 14 and 28 days from start of treatment was used.
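The rule for picking one EDSS assessment per trial can be written as a small helper. This is a sketch under assumptions: each trial arm is assumed to provide (day from start of treatment, fraction improved) pairs, both ends of the 14 to 28 day window are treated as inclusive, and the function name and data layout are hypothetical.

```python
def select_assessment(assessments, window=(14, 28)):
    """Return the latest assessment whose day falls within the window, or None.

    assessments: list of (day_from_treatment_start, fraction_improved) pairs.
    The window is treated as inclusive at both ends (an assumption).
    """
    lo, hi = window
    in_window = [a for a in assessments if lo <= a[0] <= hi]
    return max(in_window, key=lambda a: a[0]) if in_window else None


# Hypothetical trial with assessments at days 7, 14, 28 and 90.
print(select_assessment([(7, 0.30), (14, 0.45), (28, 0.55), (90, 0.60)]))  # (28, 0.55)
```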
The respective effectiveness distributions for the considered alternatives were then estimated by combining the fractions of improved patients reported in the various identified studies, using the hierarchical beta-binomial model with a non-informative prior distribution [23]. Sampling from the posterior distributions relied on Markov chain Monte Carlo (MCMC) simulation with the Metropolis-Hastings algorithm [24, 25] (for details, see Additional file 3).
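A minimal sketch of the hierarchical beta-binomial step for one alternative follows. The actual prior and sampler settings are given in Additional file 3. The following are all assumptions of this sketch, not the authors' settings:

- the hyperprior p(α, β) ∝ (α + β)^(-5/2), one common non-informative choice;
- the random-walk step size and the burn-in length;
- reporting one Beta(α, β) draw per iteration as the effectiveness.

The study-level rates are integrated out analytically, so the sampler only moves over the two hyperparameters.

```python
import math
import numpy as np


def _betaln(a, b):
    """log B(a, b) via log-gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_posterior(phi, y, n):
    """Unnormalised log posterior of phi = (log(alpha/beta), log(alpha+beta)).

    Assumed hyperprior: p(alpha, beta) proportional to (alpha+beta)^(-5/2);
    log(alpha*beta) is the Jacobian of the change of variables.
    """
    mu = 1.0 / (1.0 + math.exp(-phi[0]))   # alpha / (alpha + beta)
    s = math.exp(phi[1])                    # alpha + beta
    alpha, beta = mu * s, (1.0 - mu) * s
    lp = -2.5 * math.log(alpha + beta) + math.log(alpha) + math.log(beta)
    for yi, ni in zip(y, n):
        # Beta-binomial marginal likelihood of one study (binomial coefficient omitted).
        lp += _betaln(alpha + yi, beta + ni - yi) - _betaln(alpha, beta)
    return lp


def sample_effectiveness(y, n, n_draws=10_000, burn_in=2_000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings over phi, then one Beta(alpha, beta)
    draw per iteration as the effectiveness of the alternative."""
    rng = np.random.default_rng(seed)
    phi = np.array([0.0, 1.0])
    lp = log_posterior(phi, y, n)
    kept = np.empty((n_draws, 2))
    for i in range(burn_in + n_draws):
        proposal = phi + step * rng.standard_normal(2)
        lp_new = log_posterior(proposal, y, n)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        # 1 - U lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.uniform()) < lp_new - lp:
            phi, lp = proposal, lp_new
        if i >= burn_in:
            kept[i - burn_in] = phi
    mu = 1.0 / (1.0 + np.exp(-kept[:, 0]))
    s = np.exp(kept[:, 1])
    return rng.beta(mu * s, (1.0 - mu) * s)


# Hypothetical trial arms: y improved patients out of n.
draws = sample_effectiveness([12, 15, 9], [25, 30, 20])
print(draws.mean())
```

Taking the population mean α/(α + β) per iteration instead of a Beta draw would give a narrower distribution. Which choice matches the original analysis is not stated here.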
Risk of any nonserious adverse effect
Data to estimate the risk of one or more occurrences of non-serious adverse effects were also taken from published clinical trials. The same basic search strategy as described for the effectiveness data was used, but treatment arms were included on other criteria, namely:

The trial was prospective, but not necessarily randomised or blinded.

Adverse events were reported in such a way that the number of affected patients could be inferred.
Risk distributions were estimated in the same way as for effectiveness, with the exception of low-dose methylprednisolone, for which data were insufficient: only two trials were identified [26, 27], each with only ten patients on low-dose methylprednisolone and a statement that no adverse events were observed. Instead, it was assumed that the risk for low-dose methylprednisolone should lie between the risk for placebo and that for high-dose methylprednisolone; it was therefore sampled uniformly from the intervals formed by the posterior draws for those two alternatives.
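The interpolation for low-dose methylprednisolone amounts to one uniform draw per sampling iteration. This sketch assumes that the placebo and high-dose posterior draws are paired by iteration index. The function name and the example draws are hypothetical.

```python
import numpy as np

def sample_low_dose_risk(placebo_draws, high_dose_draws, seed=0):
    """Low-dose risk of any non-serious adverse effect, drawn uniformly between
    the placebo and high-dose posterior draws of the same iteration."""
    rng = np.random.default_rng(seed)
    placebo = np.asarray(placebo_draws, dtype=float)
    high = np.asarray(high_dose_draws, dtype=float)
    # Order the endpoints per iteration, in case a placebo draw exceeds the high-dose draw.
    lower, upper = np.minimum(placebo, high), np.maximum(placebo, high)
    return rng.uniform(lower, upper)


placebo = np.array([0.10, 0.20, 0.15])   # hypothetical posterior draws
high = np.array([0.40, 0.35, 0.10])
print(sample_low_dose_risk(placebo, high))
```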
Risk of serious adverse effects
The limited number of clinical trials performed for methylprednisolone in MS relapses, in combination with their small sample sizes, makes this source of evidence insufficient to quantify the risks of serious adverse effects: for high-dose methylprednisolone, only two events in total for all included serious adverse effects were reported across the identified trials. Similarly, no published observational studies on methylprednisolone or other glucocorticoids in association with these adverse effects could be used for risk quantification: these studies either used different treatment definitions (e.g. with respect to dose or duration) or different outcome definitions, or else they were not designed to estimate risk as per-alternative probabilities, which is required in decision analysis.
Instead, a novel approach was used, in which upper limits on true population risks are calculated as reporting ratios in collections of individual case reports [28]. Such risk limits were computed from within VigiBase for the included serious adverse effect-outcome combinations. The reporting ratio denominators included all available reports, whether methylprednisolone was listed as suspected (S), interacting (I) or concomitant (C). The numerators included all S and I reports, but only those C reports that did not contain information implicating another drug. In addition, the numerators required the time from drug initiation to onset of the reaction to be at most 180 days. This methodology is further detailed in Additional file 4, with a full account of the underlying assumptions.
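The inclusion rules for the reporting ratio can be sketched as a filter over reports for one dose group and one effect-outcome combination. The `Report` structure and its fields are hypothetical. Excluding reports with unknown onset time from the numerator is an assumption of this sketch. The statistical justification for treating the ratio as an upper limit is in [28] and Additional file 4 and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    """Hypothetical minimal view of one report within a single dose group."""
    role: str                    # methylprednisolone listed as "S", "I" or "C"
    other_drug_implicated: bool  # report contains information implicating another drug
    onset_days: Optional[float]  # days from drug initiation to reaction onset
    has_combination: bool        # report carries the effect-outcome combination of interest


def reporting_ratio(reports, max_onset_days=180):
    """Reporting ratio for one effect-outcome combination, used as a risk upper limit."""
    # Denominator: every report listing methylprednisolone, whatever its role.
    denominator = len(reports)
    numerator = 0
    for r in reports:
        if not r.has_combination:
            continue
        # S and I reports always count; C reports only if no other drug is implicated.
        role_ok = r.role in ("S", "I") or (r.role == "C" and not r.other_drug_implicated)
        # Assumption: reports with unknown onset time are excluded from the numerator.
        time_ok = r.onset_days is not None and r.onset_days <= max_onset_days
        if role_ok and time_ok:
            numerator += 1
    return numerator / denominator


reports = [
    Report("S", False, 30, True),     # counted
    Report("C", True, 10, True),      # concomitant with another drug implicated: not counted
    Report("I", False, 200, True),    # onset after 180 days: not counted
    Report("C", False, None, False),  # denominator only
]
print(reporting_ratio(reports))  # 0.25
```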
To maintain a probabilistic analysis, the various risks were assigned different plausible distributions over the intervals from zero to their respective upper limits [28] (for details, see Section ‘Sensitivity analyses’).
It should be noted that the method described here deviates slightly from the illustration in Fig. 1: the probability of a serious adverse effect-outcome combination is sampled directly, rather than separately for the effect and the outcome. However, this difference is not influential, as the total probability of the adverse effect is simply the sum of those for its various outcomes. The conditional probability of a specific outcome is then the fraction of the total probability contributed by that outcome.
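The relation between the directly sampled combination risks and the quantities in the tree takes only a few lines. The function name and the risk values below are hypothetical. When all risks are zero, the conditional probabilities are set to zero, an edge case the text does not address.

```python
def effect_probabilities(combination_risks):
    """Total probability of one serious adverse effect and the conditional
    probability of each of its outcomes, from sampled effect-outcome risks."""
    total = sum(combination_risks.values())
    # Conditional probability of an outcome = its share of the total.
    conditional = {outcome: (risk / total if total > 0 else 0.0)
                   for outcome, risk in combination_risks.items()}
    return total, conditional


# Hypothetical sampled risks for one adverse effect in one iteration.
total, conditional = effect_probabilities({"death": 0.001, "persistent disability": 0.003})
print(total, conditional)
```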
Because no limits could be computed for the no-treatment alternative, it was assumed that some proportion of the risk from active treatment could be classified as background risk that would apply to the no-treatment alternative as well. For each adverse effect-outcome combination, this background risk was calculated as the average of the sampled values for low- and high-dose methylprednisolone, multiplied by the proportion. Different values were imputed for this unknown proportion; see Section ‘Sensitivity analyses’.
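The background-risk rule is an element-wise average scaled by the imputed proportion. In this sketch the upper limits are hypothetical. Uniform sampling on [0, limit] stands in for just one of the distribution shapes that were investigated, and the function name is illustrative.

```python
import numpy as np

def background_risk(low_dose_draws, high_dose_draws, proportion):
    """No-treatment risk for one effect-outcome combination, per iteration:
    the average of the low- and high-dose draws multiplied by the proportion."""
    lo = np.asarray(low_dose_draws, dtype=float)
    hi = np.asarray(high_dose_draws, dtype=float)
    return proportion * (lo + hi) / 2.0


# Hypothetical upper limits; uniform on [0, limit] is only one possible shape.
rng = np.random.default_rng(1)
low = rng.uniform(0.0, 2e-4, 10_000)
high = rng.uniform(0.0, 5e-4, 10_000)
for proportion in (0.0, 0.25, 0.5):
    print(proportion, background_risk(low, high, proportion).mean())
```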
Estimation of utility variables
As illustrated in the fourth panel of Fig. 1, the sampled probability values are combined with sampled utility values in the expected utility calculations. Here, a tailored approach was used to sample from the utility variables of the respective clinical outcomes [15, 29]. In this approach, each utility is first assigned a standard uniform distribution, and qualitative relations are specified that relate the desirability of the various clinical outcomes to each other. The totality of these relations is then used to shift the initial distributions accordingly. Minimum differences between utility variables can also be specified in case sufficient separation has not been achieved (for details, see Additional file 5). The main benefit of this approach is that external data are not required; in particular, time-consuming and costly elicitation studies can be avoided.
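The tailored procedure of [15, 29] is described in Additional file 5 and is not reproduced here. As a rough stand-in, its inputs can be illustrated with plain rejection sampling. The inputs are standard uniform starting distributions, qualitative "better than" relations and optional minimum differences. Rejection sampling keeps only the joint uniform draws that satisfy every relation, which is a different technique from the distribution-shifting approach in the text. The function, outcome names and gap value are hypothetical.

```python
import numpy as np

def sample_utilities(outcomes, relations, n_draws, seed=0, batch=10_000, max_batches=1_000):
    """Uniform utilities conditioned on qualitative relations, by rejection sampling.

    outcomes  -- list of clinical outcome names
    relations -- (better, worse) or (better, worse, min_gap) tuples, meaning
                 u[better] - u[worse] > min_gap (min_gap defaults to 0)
    Returns an array of shape (n_draws, len(outcomes)).
    """
    rng = np.random.default_rng(seed)
    idx = {name: i for i, name in enumerate(outcomes)}
    constraints = [(idx[r[0]], idx[r[1]], r[2] if len(r) > 2 else 0.0) for r in relations]
    kept, n_kept = [], 0
    for _ in range(max_batches):
        # Start from independent standard uniform utilities.
        u = rng.uniform(size=(batch, len(outcomes)))
        ok = np.ones(batch, dtype=bool)
        for better, worse, gap in constraints:
            ok &= (u[:, better] - u[:, worse]) > gap
        kept.append(u[ok])
        n_kept += int(ok.sum())
        if n_kept >= n_draws:
            return np.concatenate(kept)[:n_draws]
    raise RuntimeError("too few accepted draws; relations may conflict or gaps may be too large")


# Hypothetical outcomes and relations.
outcomes = ["reduced relapse, no AE", "standard relapse, no AE", "psychosis, death"]
relations = [
    ("reduced relapse, no AE", "standard relapse, no AE"),
    ("standard relapse, no AE", "psychosis, death", 0.2),  # minimum non-lethal/lethal gap
]
u = sample_utilities(outcomes, relations, n_draws=1_000)
print(u.mean(axis=0))
```

Acceptance becomes very rare for large minimum differences (the text goes up to 0.99), which is one reason a dedicated shifting procedure is preferable to rejection in practice.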
A clinical expert (IRE) performed the qualitative modelling, blinded to any estimates of probability variables. Because this benefit-risk assessment is made for the whole patient population rather than a specific patient, only logically implied or clinically well-motivated relations were used. As recommended [15], a minimum utility difference was included between non-lethal and lethal outcomes to reflect their intrinsically different nature. Modelling was performed separately for patients starting their relapse at EDSS 4 and EDSS 5, respectively, to investigate whether relapse severity influences the overall benefit-risk profile.
Sensitivity analyses
Four unknown components of the assessment were altered in a series of sensitivity analysis scenarios. Two of these components concern the risk of serious adverse effects, and two concern the sampling from utility variables.
As mentioned, different types of distributions over the derived risk intervals for the serious adverse effects were investigated; these are shown in Fig. 2. Further, the proportion of the sampled risk values that is attributed to the background, and that therefore determines the values for the no-treatment alternative, was varied between 0 and 50 %.
The minimum utility difference between non-lethal and lethal outcomes was varied over the range from 0 to 0.99. Also, as mentioned, different sets of qualitative utility relations were used for patients at different levels of relapse severity.
In addition, a set of auxiliary sensitivity analyses was undertaken to determine the extent to which different variables contributed to the overall uncertainty. This was done by replacing all sampled values for a given variable with the median of the sampled values for that variable.
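The median-replacement analysis can be sketched by freezing one column of the sample matrix and recomputing the preference rates. The toy model is hypothetical: each alternative's expected utility is simply one sampled variable. The reading that a large shift in preference rates signals an influential variable is our interpretation, not a statement from the text.

```python
import numpy as np

def freeze_variable(samples, column):
    """Return a copy of the samples in which one variable is fixed at its median."""
    frozen = samples.copy()
    frozen[:, column] = np.median(samples[:, column])
    return frozen


def preference_rates(eu):
    """Fraction of iterations in which each alternative has the highest expected utility."""
    best = np.argmax(eu, axis=1)
    return np.bincount(best, minlength=eu.shape[1]) / eu.shape[0]


# Hypothetical toy model: two alternatives whose expected utility in each
# iteration equals one sampled variable each (column 0 and column 1).
rng = np.random.default_rng(0)
samples = np.column_stack([rng.beta(30, 20, 10_000), rng.beta(25, 25, 10_000)])
rates_baseline = preference_rates(samples)
rates_frozen = preference_rates(freeze_variable(samples, 1))
print(rates_baseline, rates_frozen)
```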
As depicted in Fig. 1, the probabilistic sensitivity analysis within each investigated scenario was based on 10,000 sampling iterations, yielding one preference rate for each alternative. All sampled values for all probability and utility variables in all scenarios, as well as the resulting expected utilities and preference rates, are freely available; for details, see ‘Availability of supporting data’.