 Research article
 Open Access
The nature of genetic susceptibility to multiple sclerosis: constraining the possibilities
BMC Neurology volume 16, Article number: 56 (2016)
Abstract
Background
Epidemiological observations regarding certain population-wide parameters (e.g., disease prevalence, recurrence risk in relatives, gender predilections, and the distribution of common genetic variants) place important constraints on the possibilities for the genetic basis underlying susceptibility to multiple sclerosis (MS).
Methods
Using very broad range-estimates for the different population-wide epidemiological parameters, a mathematical model can help elucidate the nature and the magnitude of these constraints.
Results
For MS, no more than 8.5 % of the population can possibly be in the “genetically susceptible” subset (defined as having a lifetime MS probability at least as high as the overall population average). Indeed, the expected MS probability for this subset is more than 12 times that for every other person in the population who is not in this subset. Moreover, provided that those genetically susceptible persons (genotypes) who carry the well-established MS susceptibility allele (DRB1*1501) are equally or more likely to get MS than those susceptible persons who don’t carry this allele, at least 84 % of MS cases must come from this “genetically susceptible” subset. In addition, because men, compared to women, are at least as likely (and possibly more likely) to be susceptible, it can be demonstrated that women are more responsive to the environmental factors that are involved in MS pathogenesis (whatever these are) and, thus, susceptible women are more likely actually to develop MS than susceptible men. Finally, in contrast to genetic susceptibility, more than 70 % of men (and likely also women) must have an environmental experience (including all of the necessary factors) that is sufficient to produce MS in a susceptible individual.
Conclusions
Because of these constraints, it is possible to distinguish two classes of persons, indicating either that MS can be caused by two fundamentally different pathophysiological mechanisms or that the large majority of the population is at no risk of developing this disease regardless of their environmental experience. Moreover, although environmental factors would play a critical role in both mechanisms (if both exist), there is no reason to expect that these factors are the same (or even similar) between the two.
Background
Introduction
Complex genetic disorders are those that are caused by the interaction of multiple genetic and environmental factors [1]. Many human diseases are examples of such disorders, including multiple sclerosis (MS) – a common neurological condition, in which recurrent immune-mediated injuries occur to the central nervous system [2, 3]. Epidemiological evidence has implicated the involvement of multiple environmental factors, including vitamin D deficiency and Epstein-Barr viral infections – see [3] for a review. Nevertheless, it is on the genetic side that most of the recent progress has come. The associations of MS with the human leukocyte antigens (HLA) on the short arm of chromosome 6 have been known for decades [2–9]. More recently, from several genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) in MS, disease associations have been identified in more than 150 different non-HLA locations scattered throughout the genome [4–7]. However, the translation of these associations into a clinically useful assessment of an individual’s disease risk has been limited. This is because a large proportion of the heritability for many complex diseases, including MS, remains unexplained. Indeed, in MS the 110 genes so far identified (in addition to the HLA associations) account for only 28 % of the known heredity [4, 10]. Although a good deal of effort is currently being made to narrow this so-called “heritability gap”, it is unclear how likely these efforts are to succeed.
Much will depend upon the underlying basis of genetic susceptibility. For example, suppose that the individuals in a population exist on a continuum of susceptibility (i.e., anyone can develop the illness under the proper circumstances and a person’s individual genetic makeup only serves to make this outcome more or less likely to occur). In this case, although individuals at especially high risk could, perhaps, be identified, the development of a sensitive and specific genetic test for susceptibility, which could be applied to the population as a whole, will likely not be possible. By contrast, if only a small segment of the population is genetically susceptible and only these susceptible individuals can develop disease, then the task of developing such a test should, in theory, be much more likely to succeed.
The epidemiological observations regarding the various population-wide parameters such as the disease prevalence, the recurrence risk in relatives, gender predilections, and the distribution of common genetic variants (observations which have been made for years in many different parts of the world) place important constraints on the possibilities for the underlying genetic basis of susceptibility. Although applicable to any complex genetic disease, for illustrative purposes, this paper considers the nature and magnitude of these constraints as they apply to the study of MS.
Model overview and implications
Because much of this model development is technical, an overview of the basic ideas behind the model (and their implications) is provided here for clarity. In this model, directly measurable epidemiological data are used to estimate the likelihood that an individual from the general population is “genetically susceptible” to getting MS. This probability is defined as P(G). The definition used for this parameter is provided both below and in Table 1. In order to estimate the value of this parameter, we define certain other quantities such as the conditional probability that a “genetically susceptible” individual will get MS {P(MS|G)} and the joint probability that an individual both has MS and is “genetically susceptible” {P(MS, G)}. By the rules of conditional probability we know that:

$$ P(G)=P\left(MS,G\right)/P\left(MS\Big|G\right) $$
Therefore, in order to estimate P(G), we just need to know (approximately) what these other two probabilities are. Fortunately, these other probabilities can be estimated from directly observed epidemiological data. For example, it must be the case that P(MS, G) is less than the probability of MS in the population P(MS) and this probability, in turn, can be approximated by the measured prevalence of MS in the population. Also, we will define the term P(MS|MZ_{MS}) as the probability that a monozygotic (MZ) twin will get MS given that their co-twin already has (or will develop) MS. This also is a measurable epidemiological parameter – the proband-wise (or case-wise) MZ-twin concordance rate for MS [10]. Moreover, because this observed rate must be less than the concordance rate in “genetically susceptible” MZ-twins, the observed rate can be used to approximate the term P(MS|G). Thus, using these two approximations for the populations of North America and northern Europe (where these two measured parameters are reasonably well-established), our estimate for P(G) becomes:

$$ P(G)=P\left(MS,G\right)/P\left(MS\Big|G\right)\approx P(MS)/P\left(MS\Big|{MZ}_{\mathrm{MS}}\right) $$
Consequently, this simple “back of the envelope” calculation suggests that the occurrence of “genetic susceptibility” in the population must be extremely rare (~0.4 %). The present manuscript refines this calculation, providing an estimate for the maximum possible value that this probability P(G) can take, given the uncertainties in the estimates for these two measurable parameters and given the possible relationships that these measurable parameters have to the actual parameters of interest {i.e., P(MS, G) and P(MS|G)}. In addition, this markedly asymmetric division of individuals into those who are “genetically susceptible” and those who are not has important implications for the nature of genetic susceptibility to MS. Thus, in contrast to a lognormal model (in which the odds of MS are increased by each independent genetic risk factor that a person possesses), the epidemiological data strongly suggest that most (possibly all) individuals with MS come from the “genetically susceptible” group and that the population is markedly bimodal with respect to the likelihood (risk) that individual members of the population will develop MS.
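The back-of-the-envelope step can be sketched in a few lines of Python. This is only an illustration: the function name `susceptible_fraction` is hypothetical, and the two input values are assumed stand-ins chosen to be consistent with the ~0.4 % figure quoted above, not numbers taken from the paper.

```python
def susceptible_fraction(p_ms_and_g: float, p_ms_given_g: float) -> float:
    """P(G) = P(MS, G) / P(MS | G), from the definition of conditional probability."""
    if not 0.0 < p_ms_given_g <= 1.0:
        raise ValueError("P(MS | G) must lie in (0, 1]")
    return p_ms_and_g / p_ms_given_g


# Illustrative inputs only (assumed, not quoted from the paper):
prevalence = 0.001      # stand-in for P(MS), which bounds P(MS, G) from above
mz_concordance = 0.25   # stand-in for P(MS | MZ_MS), which approximates P(MS | G)

p_g = susceptible_fraction(prevalence, mz_concordance)
print(f"Approximate P(G): {p_g:.4f}")
```

Because the prevalence overestimates P(MS, G) and the observed concordance underestimates P(MS|G), a ratio of this kind errs on the high side of P(G).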
Moreover, there seem to be differences in the nature of the pathogenesis of MS between men and women. Thus, genetic risk factors are critical to each. However, because women comprise almost three quarters of the MS population, it is perhaps surprising that men are, if anything, more likely than women to be “genetically susceptible” to getting MS. Therefore, because of this fact, the final gender distribution must be related to environmental factors (either from differences in exposure between men and women or from differences in the response by women to a given exposure). The environment is known to play a critical role in MS pathogenesis and at least three separate environmental factors (events) are implicated. Each of these events occurs in the large majority of individuals within the population (i.e., they are population-wide environmental events). One event occurs near birth (either in utero or in the immediate postnatal period), another occurs during adolescence, and a third (and possibly more) occurs thereafter. Based on the observed epidemiological data, it can be shown that the basis for the final gender distribution in MS, and for the increasing proportion of women in contemporary MS cohorts, is that susceptible women are much more likely to develop MS than susceptible men under similar environmental conditions.
The remainder of this manuscript (together with the Additional file 1) is devoted to developing these ideas in a more rigorous manner.
Methods
Model definitions for determining genetic susceptibility in MS

1.
The definitions used for establishing an upper limit for the probability of being genetically susceptible to MS in the population are listed in Table 1.
Consider a population (P_{0}) of (n) individuals (i = 1,2,…,n), each with their own unique genotype (G_{i}). Let the term P(MS) be defined as the expected lifetime probability that a member of the population will develop MS. This probability is related to a directly observable population parameter – the disease prevalence. Let the expected lifetime probability of getting MS for a specific individual (i.e., for their unique genotype) be defined as the conditional probability P(MS|G_{i}). Let (Z) be the set of all these individual probability values within the population. Thus:

$$ (Z)=\left\{{z}_i\right\}; $$
where:

$$ \forall {G}_i\in \left({P}_0\right):{z}_i=P\left(MS\Big|{G}_i\right) $$
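Further, let the population be partitioned into three mutually exclusive subsets of individuals based on their individual expected lifetime probability values. These three subsets (G), (G_{min}), and (G−) are defined in the following manner:

$$ (G)=\left\{{G}_i\in \left({P}_0\right)\Big|P\left(MS\Big|{G}_i\right)\ge P(MS)\right\};\kern0.5em P\left(MS\Big|G\right)=x $$

$$ \left({G}_{\min}\right)=\left\{{G}_i\in \left({P}_0\right)\Big|0<P\left(MS\Big|{G}_i\right)<P(MS)\right\};\kern0.5em P\left(MS\Big|{G}_{\min}\right)=y $$
and:

$$ \left(G-\right)=\left\{{G}_i\in \left({P}_0\right)\Big|P\left(MS\Big|{G}_i\right)=0\right\};\kern0.5em P\left(MS\Big|G-\right)=0 $$
By these definitions:

$$ x\ge P(MS)>y>0 $$
and:

$$ P(G)+P\left({G}_{\min}\right)+P\left(G-\right)=1 $$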
The subset (G−), members of which have no chance of getting MS, will be referred to as “non-susceptible”; the subset (G_{min}), members of which have a very small chance of getting MS, will be referred to as “minimally susceptible”; and the subset (G) will be referred to as “genetically susceptible”. Let the sets (X) and (Y) be the sets of the individual lifetime probability values for members of the (G) and (G_{min}) subsets respectively. Thus:

$$ (X)=\left\{{x}_i\right\}; $$
where:

$$ \forall {G}_i\in (G):{x}_i=P\left(MS\Big|{G}_i\right) $$
and:

$$ (Y)=\left\{{y}_i\right\}; $$
where:

$$ \forall {G}_i\in \left({G}_{\min}\right):{y}_i=P\left(MS\Big|{G}_i\right) $$
Let the combined subset of all genotypes, which are not in the “genetically susceptible” subset, be defined as:

$$ \left({G}_{T-}\right)=\left({G}_{\min}\right)\cup \left(G-\right) $$
Let the set (V) be the set of the individual lifetime probability values for members of the (G_{T−}) subset. Thus:

$$ (V)=\left\{{v}_i\right\}; $$
where:

$$ \forall {G}_i\in \left({G}_{T-}\right):{v}_i=P\left(MS\Big|{G}_i\right) $$
And finally, let the combined subset of all genotypes, which have a nonzero probability of developing MS (G_{T}), be defined as:

$$ \left({G}_T\right)=(G)\cup \left({G}_{\min}\right) $$
Let the set (W) be the set of the individual lifetime probability values for members of the (G_{T}) subset. Thus:

$$ (W)=\left\{{w}_i\right\}; $$
where:

$$ \forall {G}_i\in \left({G}_T\right):{w}_i=P\left(MS\Big|{G}_i\right) $$
Whether defining genetic susceptibility in this manner or creating these different categories has any utility is not known. Therefore, the value (if any) of these constructs needs to be established. Nevertheless, any population can be partitioned in this manner and such a division makes no assumptions about the underlying distribution of the individual expected lifetime probability values within the population. For example, if everyone has the exact same expected lifetime probability, P(MS), then everyone will belong to the subset (G) and the subsets (G_{min}) and (G−) will be empty. If everyone has a nonzero expected lifetime probability of MS, then the subset (G−) will be empty. If the distribution of individual expected lifetime probability values {w_{i}} within the (G_{T}) subset is normal and centered at P(MS), then the two subsets (G) and (G_{min}) will be symmetrical to each other, P(G) = P(G_{min}), and each subset will have the half-normal distribution. If the distribution of individual expected lifetime probability values is something else, then individuals will be assigned to the three subsets accordingly. For MS, the subset (G) cannot be empty; neither can it encompass the entire population [2–9].
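To make the partition concrete, the following minimal Python sketch assigns a made-up population to the three subsets and checks the relationships that follow from the definitions. The function name `partition` and the probability values are hypothetical and serve only as an illustration; they are not estimates from the paper.

```python
from statistics import mean


def partition(z):
    """Split individual lifetime MS probabilities z_i into the three subsets."""
    p_ms = mean(z)                           # P(MS): the population average
    g = [v for v in z if v >= p_ms]          # genetically susceptible: at or above P(MS)
    g_min = [v for v in z if 0 < v < p_ms]   # minimally susceptible: nonzero but below P(MS)
    g_non = [v for v in z if v == 0]         # non-susceptible: zero probability
    return p_ms, g, g_min, g_non


# Hypothetical population of 1000 genotypes (illustrative values only)
z = [0.06] * 50 + [0.001] * 450 + [0.0] * 500
p_ms, g, g_min, g_non = partition(z)

n = len(z)
x = mean(g)       # P(MS | G)
y = mean(g_min)   # P(MS | G_min)
print(f"P(MS)={p_ms:.5f}  P(G)={len(g)/n:.3f}  "
      f"P(G_min)={len(g_min)/n:.3f}  P(non-susceptible)={len(g_non)/n:.3f}")
```

Every individual falls into exactly one subset, so the three proportions sum to one, and the subset means satisfy x ≥ P(MS) > y > 0 whenever the minimally susceptible subset is non-empty.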
Results
Estimating the probability of genetic susceptibility – P(G)

2.
Therefore, from the definitions provided in (1) above, the proportion (p) of “genetically susceptible” individuals in the population (P_{0}) is:

$$ p=P(G) $$
so that:

$$ P(MS)=P(G)x+P\left({G}_{\min}\right)y=px+\left\{1-p-P\left(G-\right)\right\}y $$
And also the proportion (q) of “genetically susceptible” individuals in the subset (MS) is:

$$ q=P\left(G\Big|MS\right) $$
and:

$$ \left(1-q\right)=P\left({G}_{\min}\Big|MS\right)\ge 0 $$
As previously, we will let the term P(MS|MZ_{MS}) be defined as the conditional lifetime probability that an MZ-twin will develop MS, given that his or her co-twin either already has, or will develop, MS. This probability is related to a directly observable population parameter – the proband-wise (or case-wise) concordance rate for MZ-twins [10]. Let the purely hypothetical term, P(MS|IG_{MS}), be introduced to represent the MZ-concordance rate, which has had the impact of the environment (shared by the twins) removed. Thus, this term envisions what the expected concordance rate would be if the MZ-twins, with their identical genotypes (IG), were to be separated at conception and to grow up independently in different environments (both intrauterine and childhood).
Let the term (b) be defined such that:

$$ b=P\left(MS\Big|{IG}_{\mathrm{MS}}\right) $$
Because MZ-twins are genetically identical, and because MZ-twinning is thought to be non-genetic, it is assumed (see Additional file 1) that:
Let the two quantities (x ') and (y ') be defined such that:

$$ x\hbox{'}=P\left(MS\Big|G,{IG}_{\mathrm{MS}}\right) $$
and:

$$ y\hbox{'}=P\left(MS\Big|{G}_{\min },{IG}_{\mathrm{MS}}\right) $$
so that:

$$ b=qx\hbox{'}+\left(1-q\right)y\hbox{'} $$

3.
From the Additional file 1:
Moreover, because (0 < q ≤ 1), the final equation in (2) above can easily be rearranged to yield:
and, therefore:

$$ x\hbox{'}\ge b $$

4.
Let the maximum probability of MS (r) within the set (X) be defined such that:
By definition, (r) is also the maximum probability within the set (Z). Moreover, because there must be at least one person in (X) for whom (x_{i} ≥ b), it must also be the case that (r ≥ b).

5.
From the Additional file 1:

$$ x\hbox{'}=x+{\sigma}_X^2/x $$
or:

$$ {x}^2-x\hbox{'}x+{\sigma}_X^2=0 $$
which, because of the constraint that when:

$$ {\sigma}_X^2=0; $$
then:

$$ x=x\hbox{'} $$
and because, by definition:

$$ x>0 $$
this has the unique solution:

$$ x=x\hbox{'}/2+\left(\sqrt{{\left(x\hbox{'}\right)}^2-4{\sigma}_X^2}\ \right)/2 $$
From (3) above, under any circumstance, the following limits must apply:

$$ x\ge x\hbox{'}/2\ge b/2 $$
and:

$$ {\sigma}_X^2\le {\left(x\hbox{'}\right)}^2/4 $$
Also, when:

$$ \left\{x\hbox{'}=b\right\}; $$
then:

$$ {\sigma}_X^2\le {b}^2/4 $$
This theoretical limit for (σ^{2}_{X}) is only slightly greater than the maximum possible variance for the set (X), which occurs for a bimodal population [11], in which:

$$ P\left\{{x}_i=r\right\}=0.5 $$
and:

$$ P\left\{{x}_i=P(MS)\right\}=0.5 $$
so that:

$$ b=r; $$
and:

$$ E(X)=x=\left\{b+P(MS)\right\}/2 $$
and:

$$ {\sigma}_X^2=E{\left({x}_i-x\right)}^2={\left\{r-P(MS)\right\}}^2/4={\left\{b-P(MS)\right\}}^2/4 $$
However, these circumstances describe a bimodal population. Therefore, if these circumstances pertain, the case for a bimodal distribution is already made. Consequently, for any unimodal distribution, (σ^{2}_{X}) must be lower than this upper bound and (x) must be greater than this lower bound.
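The behaviour of the solution in (5) at its two limits can be checked numerically. The sketch below is a minimal illustration, not the paper's code: the helper name `mean_from_quadratic` is hypothetical, and the value b = 0.081 is the one the text substitutes in later steps, used here with (x ' = b).

```python
import math


def mean_from_quadratic(x_prime: float, var_x: float) -> float:
    """Larger root of x**2 - x_prime*x + var_x = 0.

    This is the root that reduces to x = x_prime when the variance is zero.
    """
    disc = x_prime ** 2 - 4.0 * var_x
    if disc < 0:
        raise ValueError("variance exceeds (x')^2 / 4, so no real solution exists")
    return x_prime / 2.0 + math.sqrt(disc) / 2.0


b = 0.081  # value substituted for b later in the text, taking x' = b

x_zero_variance = mean_from_quadratic(b, 0.0)        # zero variance: x equals x'
x_max_variance = mean_from_quadratic(b, b ** 2 / 4)  # limiting variance: x equals x'/2

print(x_zero_variance, x_max_variance)
```

Between these two limits, a larger variance of (X) always gives a smaller mean (x), which is why the largest admissible variance fixes the smallest admissible (x).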

6.
For example, the uniform distribution is an example of a unimodal distribution that is evenly spread out [11]. The uniform distribution is defined such that:
where:

$$ {\sigma}_X^2={\left\{r-P(MS)\right\}}^2/12 $$
and:

$$ x=\left\{r+P(MS)\right\}/2 $$
In this circumstance, from this and from the Additional file 1:
which, when (x ' = b), can be rewritten to become:
This last expression is a quadratic in (r), which otherwise includes only the population parameters {b and P(MS)} and, thus, (r) in this circumstance can be estimated based on direct epidemiological observations.

7.
For a population with a similar age structure to the US, the quantity P(MS) will be between one and two times the population prevalence [2]. The prevalence of MS can be very broadly estimated to be 50–250/100,000 in northern populations [2, 3, 12–15], which yields the range-estimate for P(MS) of:

$$ 0.0005\le P(MS)\le 0.005 $$
To estimate the quantity {b = P(MS|IG_{MS})} requires an understanding of the impact that the shared childhood and intrauterine microenvironments have on the likelihood that MS will develop. From multiple observations, the shared childhood microenvironment seems to make little difference [15–22]. The shared intrauterine (IU) microenvironment, by contrast, may be important [13, 14, 23–29]. Thus, the Canadian data [23] regarding the recurrence risk for dizygotic (DZ) twins and siblings (S) suggest that this IU effect may be as large as:
A similar disparity has been noted in a review of the available epidemiological data [14]. Nevertheless, in a recent population-based study from Sweden, DZ-twins and siblings seemed to have the same risk [13, 14].
Assuming the truth lies between these two extremes then, using a very broad range-estimate for the proband-wise concordance rates for MZ-twins {i.e., P(MS|MZ_{MS})} in northern populations of between 0.15 and 0.40, inclusive [2, 3, 14, 23–29], yields the range-estimate for (b) of:

8.
Substituting {b = 0.081 and P(MS) = 0.005} into the Equations from (6) above yields the minimum possible estimate for (x) of:
Because:

$$ P(G)=P\left(MS,G\right)/x\le 2*P\left(MS,G\right)/b\le 2*P(MS)/b $$
therefore:

$$ P(G)\le 2*P(MS)/b\le 0.01/0.081=0.123 $$
Consequently, under any circumstance, the maximum possible percentage of the population (P _{0}) that members of the (G) subset could comprise is 12.3 %.
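The arithmetic behind this first upper bound is short enough to verify directly. The sketch below is only a check of the figures quoted in (7) and (8); the variable names are hypothetical.

```python
# Prevalence range quoted in the text: 50 to 250 per 100,000
prevalence_low = 50 / 100_000
prevalence_high = 250 / 100_000

# P(MS) lies between one and two times the prevalence
p_ms_low = 1 * prevalence_low
p_ms_high = 2 * prevalence_high

b_min = 0.081  # smallest value of b used in the text

# Since x >= b/2, P(G) = P(MS, G)/x <= 2*P(MS)/b
p_g_upper = 2 * p_ms_high / b_min

print(f"P(MS) range: {p_ms_low:.4f} to {p_ms_high:.4f}")
print(f"Upper bound on P(G): {p_g_upper:.3f}")
```

The bound uses the largest admissible P(MS) together with the smallest admissible (b), so it is the most permissive combination of the two range-estimates.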
A lower bound for P(G) can also be established by noting that:
Because:

$$ b=P\left(MS\Big|{IG}_{\mathrm{MS}}\right)\ge P\left(MS,G\Big|{IG}_{\mathrm{MS}}\right); $$
and because:

$$ x\hbox{'}\ge x $$
Therefore:

$$ P(G)=P\left(MS,G\right)/x\ge P\left(MS,G\right)/x\hbox{'}\ge {\left\{P\left(G\Big|MS\right)\right\}}^2\left\{P(MS)/b\right\} $$
And, using the definition of (g) from (Additional file 1: Table S1) as:
the maximum possible range for P(G) can be expressed as:

9.
Nevertheless, as noted in (5) above, this particular upper bound is that for a bimodal population. Substituting these same values into the Equations from (6) above for the uniform distribution {at: (x ' = b)} yields the minimum (x) and the maximum variance of (X) for this situation of:
and:

$$ {\sigma}_X=0.034 $$
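The uniform-distribution case can be reproduced numerically under stated assumptions. The sketch below assumes the uniform distribution spans the interval from P(MS) to (r), which is what a mean of {r + P(MS)}/2 and a variance of {r − P(MS)}²/12 imply. The quadratic coefficients are a reconstruction obtained by substituting that mean and variance into x² − x'x + σ²_X = 0 with (x ' = b); they are not quoted from the paper, and the function name `uniform_case` is hypothetical.

```python
import math


def uniform_case(b: float, p_ms: float):
    """Solve for r, x and sigma_X when X is uniform on [p_ms, r] and x' = b.

    Substituting x = (r + p_ms)/2 and var = (r - p_ms)**2 / 12 into
    x**2 - b*x + var = 0 and multiplying by 12 gives (reconstruction):
        4*r**2 + (4*p_ms - 6*b)*r + (4*p_ms**2 - 6*b*p_ms) = 0
    """
    a2 = 4.0
    a1 = 4.0 * p_ms - 6.0 * b
    a0 = 4.0 * p_ms ** 2 - 6.0 * b * p_ms
    r = (-a1 + math.sqrt(a1 ** 2 - 4.0 * a2 * a0)) / (2.0 * a2)  # positive root
    x = (r + p_ms) / 2.0
    sigma = (r - p_ms) / math.sqrt(12.0)
    return r, x, sigma


r, x, sigma = uniform_case(b=0.081, p_ms=0.005)
residual = x ** 2 - 0.081 * x + sigma ** 2   # should vanish if the algebra is right
print(f"r={r:.4f}  x={x:.4f}  sigma_X={sigma:.4f}")
```

Under these assumptions the standard deviation rounds to the 0.034 reported above, which supports the reconstruction without proving that it matches the authors' exact expression.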
Notably, however, the largest possible variance for any unimodal population [11] has been shown to be:
Therefore, using the same estimates for {b and P(MS)} as above, yields
and:

$$ {\sigma}_X=0.036 $$
From (5) above, as either (b) increases or as (x ') increases relative to (b), both the minimum (x) and the maximum variance of (X) increase.
Despite this slightly larger upper bound for the variance of any unimodal distribution compared to the uniform distribution, several conditions (e.g., the distribution is symmetrical, the median is equal to x, or the mode is equal to x) are sufficient to make the maximum variance be that of the uniform distribution [11]. Nevertheless, the larger estimate for (σ^{2}_{X}) − i.e., that for any unimodal distribution − and the smaller estimate for the minimum (x) will be used in the remainder of the calculations.

10.
Finally, because the probability of the part cannot exceed the probability of the whole, it follows that:

$$ P(G)*x=P\left(MS,G\right)\le P(MS)\le 0.005 $$
Using the result from (9) above that (x ≥ 0.059), therefore:

$$ P(G)*0.059\le P(G)*x\le 0.005 $$
With simple rearrangement, this condition means that the maximum possible estimate for the upper bound of P(G), given any unimodal distribution of the set (X), is:

$$ P(G)\le 0.005/0.059=0.085 $$
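The two numbers that drive this bound can be recomputed from values stated in the text. The sketch below takes b = 0.081, the maximum P(MS) = 0.005, and the unimodal standard deviation σ_X = 0.036 from (9), and applies the solution of the quadratic from (5) with (x ' = b). Variable names are hypothetical, and because σ_X is itself a rounded figure the results agree with the text only to the reported precision.

```python
import math

b = 0.081         # minimum b from the text, taking x' = b
p_ms_max = 0.005  # maximum P(MS) from the text
sigma_x = 0.036   # largest standard deviation for a unimodal (X), as reported in (9)

# Solution of x**2 - b*x + sigma_x**2 = 0 that reduces to x = b at zero variance
x_min = b / 2 + math.sqrt(b ** 2 - 4 * sigma_x ** 2) / 2

# P(G) * x = P(MS, G) <= P(MS), so P(G) <= P(MS) / x
p_g_max = p_ms_max / x_min

print(f"minimum x: {x_min:.3f}")
print(f"maximum P(G): {p_g_max:.3f}")
```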

11.
Recall that the subset of all genotypes that have a nonzero probability of developing MS was defined as:

$$ \left({G}_T\right)=(G)\cup \left({G}_{\min}\right) $$
It is noteworthy, however, that the difference in the expected lifetime probability of MS between these two subsets of (G _{ T }) is substantial.
Thus, because:

$$ P\left(MS\Big|{G}_{T-}\right)\le \left\{y=P\left(MS\Big|{G}_{\min}\right)\right\}<P(MS)\le 0.005 $$
and because:

$$ x=P\left(MS\Big|G\right)\ge 0.059 $$
Therefore:

$$ x/y=P\left(MS\Big|G\right)/P\left(MS\Big|{G}_{\min}\right)>P\left(MS\Big|G\right)/P(MS)\ge 0.059/0.005=12 $$
so that:

$$ x=P\left(MS\Big|G\right)>12*P(MS)>12*P\left(MS\Big|{G}_{\min}\right)\ge 12*P\left(MS\Big|{G}_{T-}\right) $$
In fact, by the definition of the (G_{min}) and (G−) subsets, the expected probability of MS in the set (X) is also more than 12 times the likelihood of MS developing for every other individual member of the population who is not in the (G) subset.
Contrast this with the very small difference in subset means permitted for any symmetric distribution centered on P(MS). Thus, by the definition of symmetry:

$$ x-P(MS)=P(MS)-y $$
Because, by definition:

$$ 0<y<P(MS) $$
Therefore, in this situation:

$$ 0<y<P(MS)\le x<2*P(MS) $$
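The contrast between the two cases can be illustrated with a short calculation. The sketch below uses the bounds stated in the text (x ≥ 0.059 and P(MS) ≤ 0.005). The helper `symmetric_subset_mean` is hypothetical and encodes one reading of the symmetry condition, namely that the two subset means sit at equal distances on either side of P(MS); the value chosen for y is illustrative only.

```python
x_min = 0.059     # minimum P(MS | G) from the text
p_ms_max = 0.005  # maximum P(MS) from the text

# Lower bound on x / P(MS), and therefore on x / y, since y < P(MS)
ratio_lower_bound = x_min / p_ms_max


def symmetric_subset_mean(p_ms: float, y: float) -> float:
    """Mean of (G) under a distribution symmetric about P(MS): x = 2*P(MS) - y (assumed reading)."""
    return 2.0 * p_ms - y


# Illustrative y only; any 0 < y < P(MS) gives a ratio below 2
x_symmetric = symmetric_subset_mean(p_ms_max, y=0.001)
ratio_symmetric = x_symmetric / p_ms_max

print(f"observed-data bound on x/P(MS): {ratio_lower_bound:.1f}")
print(f"symmetric-case x/P(MS):         {ratio_symmetric:.1f}")
```

The quotient 0.059/0.005 evaluates to about 11.8, which the text reports as 12 after rounding. Either figure is several times larger than the ceiling of 2 that any symmetric distribution would allow.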

12.
From (10) above, the fact that:
and:

$$ P(G)\le 0.085 $$
and the definition that:

$$ 0<P\left(MS\Big|{G}_{\min}\right)=y<P(MS) $$
together with the extreme separation of the subset mean {P(MS|G)} from the means of the subsets {P(MS|G_{min}), P(MS|G_{T−}), and P(MS)}, constrain, in important ways, the possibilities for the distribution of (Z), which is the set of individual lifetime MS probabilities in the population.
For example, the observation that no more than 8.5 % of the population can possibly be members of a unimodal subset (G) indicates that (Z) can’t have a symmetric distribution centered on P(MS). This is because, in such a circumstance:
whereas, in fact:

$$ P(G)\le 0.085\ll 0.915\le P\left({G}_{T-}\right) $$
Indeed, this disparity is so large that it precludes even a roughly symmetric distribution for (Z), which is centered on P(MS). Moreover, the extreme separation of both the subset means {P(MS|G_{T−}) and P(MS|G)}, as well as the means for the set (G) and the (P_{0}) population {i.e., P(MS|G) and P(MS)}, together with the very restricted range for the MS probabilities {v_{i}} in the set (V) − i.e., for individuals in the (G_{T−}) subset − indicates that the distribution of MS probabilities for the whole population (Z) must be, at least, bimodal [30]. This is illustrated in Fig. 1a for the circumstances in which {P(W) ≈ P(Z) = 1}. The means and variances for the distribution used in this illustration are based on considerations developed in (9) above and, for illustrative purposes only, the two distributions of the bimodal population have each been represented as normal. Moreover, this distribution could also be trimodal − i.e., the set (V) could itself be bimodal − if both of its subsets {(G_{min}) and (G−)} are nonempty (e.g., Fig. 1b).
Nevertheless, there are other (unimodal) distributions, which can also be markedly asymmetric. The question, therefore, naturally arises as to how confident we are that the distribution of the odds of MS in the (P _{ 0 }) population can be distinguished from these unimodal alternatives. As noted above, the extreme separation of the subset means suggests that the distribution is bimodal [30]. Nevertheless, the possibility that the distribution conforms to a lognormal model needs to be considered carefully. In the first place, the lognormal distribution is both unimodal and asymmetric and, moreover, this asymmetry can be of any specified degree. In the second place, the lognormal model has considerable theoretical appeal, particularly in the setting of a complex disease such as MS, which is associated with multiple genetic risk factors [6, 7]. Thus, if these multiple risk factors are independent of each other (and there is little doubt that they are sorted independently), then (by the central limit theorem) the resulting probability distribution for the odds of MS will follow a lognormal probability density function [31]. And, indeed, Clayton and colleagues recently concluded, based on experimental evidence, that a lognormal model was appropriate for another complex genetic disease, which is comparable epidemiologically to MS – type I diabetes [31].
Similarly, in MS, a lognormal distribution of the odds could account either for the minimum asymmetry of 91.5 % in (G_{min}) and 8.5 % in (G), or for an even more asymmetric split. Nevertheless, for a lognormal distribution having such a split (i.e., 91.5 %/8.5 %), the mean (t) for that portion of a lognormal population, which is at or above the mean for the entire distribution {i.e., P(MS)}, is more than 4-fold less than the minimum mean for the odds at P(MS|G) – see Additional file 1. Thus, in this circumstance:
As such, P(MS|G) cannot be the mean for this portion of a lognormal distribution (see Additional file 1). This situation is changed only slightly with even much more asymmetric splits. Thus, even when the distribution is severely asymmetric and P(G) is truly tiny (e.g., 10^{−14}), there is still a more than 3-fold difference between P(MS|G) and the mean of that portion of the lognormal distribution, which is at or above P(MS). Indeed, even in these extreme circumstances:
Consequently, having a mean for the odds of getting MS in the (G) population {i.e., P(MS|G)}, which is more than 12*P(MS), is not compatible, under any circumstance, with the subset (G) simply being a part of a unimodal lognormal distribution (Additional file 1: Figure S2B).
It might be argued that such a bimodal structure implies that evidence of either strong interactions or linkage should be present – neither of which has been found. However, with so many genes involved (>150) and such a small fraction of the population being in the subset (G), this is not the case. Indeed, considering only susceptibility genes, it seems very likely that almost all MS patients will have a unique genotype (Additional file 1) and, empirically, this seems to be true. Thus, using the first 95 MS-associated SNPs identified in the WTCCC data set [6, 7], 105 of the genotypes (at these SNP locations) were identical in at least 1 pair of MS cases. Nevertheless, regardless of the basis for these apparent duplications, for all of the other 10,643 MS cases in this dataset, their genotypes (at these SNP locations) were unique. Moreover, none of these apparently duplicated genotypes bore any obvious resemblance to each other – sharing identity at only 43 (on average) and 59 (at most) of the 95 SNPs. Under such circumstances, almost certainly, there will be no linkage and no strong interactions, even if the population (P_{0}) is bimodal.

13.
There are two further possibilities. First, it could be that {P(G_{min}) ≤ P(G)}. In this case, the distribution of MS probabilities within (W) − i.e., for members of the (G_{T}) subset − could either be symmetric or not. This relationship necessarily pertains for any symmetric distribution of (W) because, by definition, and from the above considerations:
From this it follows that any symmetric distribution for (W) must be centered on some probability value (μ) such that:
and, thus, that:

$$ P(G)\ge P\left({G}_{\min}\right) $$
However, regardless of the nature of the distribution of (W), if this relationship holds, then it must also be the case that:
In addition, the actual values that P(G_{min}) can take will depend, in part, upon the value of P(G).
For example, when P(G) is at its upper bound of:
then:

$$ P\left({G}_{\min}\right)=0 $$
From this point, as the quantity P(G_{min}) becomes larger, the quantity P(G) will have to become smaller in order to maintain the relationship:
Therefore, we conclude that, if P(G_{min}) ≤ P(G), then the large majority of the population (P_{0}) must be in the (G−) subset and the distribution of (Z) is, at least, bimodal.

14.
Second, it could be that: {P(G_{min}) > P(G)}. In this case, the extreme separation of the subset means within (W) − i.e., the separation of P(MS|G) from P(MS|G_{min}) − together with the very restricted range for the MS probabilities in the (G_{min}) subset, again indicates that the distribution of {w_{i}} within the set (W) must be bimodal [30]. This is illustrated in Fig. 1b, in which the resulting extreme bimodality is demonstrated even for the circumstance where:
Again, the means and variances for the distributions used in this illustration are based on considerations developed in (9) above. This same pattern (as illustrated) persists regardless of the value chosen for P(G) ≈ P(G_{min}).
Discussion
Using very broad range-estimates for the basic epidemiological parameters of MS prevalence and the recurrence risk of MS in MZ-twins, this analysis indicates that no more than 8.5 % of individuals in the general population can possibly be “genetically susceptible” to developing MS as herein defined. In all likelihood, this percentage is actually much smaller [2, 3]. In addition, as demonstrated in Additional file 1, more than 43 % (and, likely, more than 84 %) of MS cases develop through this genetic pathway. Importantly, each of these estimates is based on directly observable population parameters, which have been repeatedly verified in different parts of the world.
The implications of these conclusions are substantial. Recall that the subset (G _{min}) is defined as consisting of only those individuals who have very low individual expected lifetime probability values of more than zero but less than P(MS). By contrast, individuals in the (G) subset, collectively, have an expected lifetime probability of MS at least 12 times the maximum possible either for that of the (G _{min}) subset as a whole or for that of any individual member of this subset.
Consequently, even though this analysis does not assume that a group of individuals who are “genetically susceptible” is distinct from other individuals in the population, it does, in fact, establish that these two groups can be so distinguished. Thus, based on the considerations developed in (13) and (14) above, there are two possibilities.
If:

$$ P\left({G}_{\min}\right)\le P(G) $$
Then:

$$ P\left(G-\right)=1-P\left({G}_{\min}\right)-P(G)\ge 1-2\ast P(G)>0.83 $$
And, in this situation, the two groups are distinguished by the fact that members of the “non-susceptible” subset (G−), which represents the overwhelming majority of the population, are at no risk of developing MS, regardless of their environmental exposure.
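The size of the non-susceptible subset in this first scenario follows from simple bookkeeping, sketched below. The function name `min_non_susceptible` is hypothetical, and the grid of test values is illustrative; only the ceiling of 0.085 on P(G) comes from the text.

```python
def min_non_susceptible(p_g: float) -> float:
    """Lower bound on the non-susceptible fraction when P(G_min) <= P(G): 1 - 2*P(G)."""
    return 1.0 - 2.0 * p_g


p_g_max = 0.085                       # upper bound on P(G) from the text
bound = min_non_susceptible(p_g_max)  # smallest possible non-susceptible fraction

# Any admissible pair with P(G_min) <= P(G) <= 0.085 respects the bound.
for p_g in (0.085, 0.05, 0.004):
    for p_g_min in (0.0, p_g / 2, p_g):
        p_non = 1.0 - p_g_min - p_g   # the three proportions sum to one
        assert p_non >= min_non_susceptible(p_g) - 1e-12
        assert p_non >= bound - 1e-12

print(f"non-susceptible fraction is at least {bound:.2f}")
```

Smaller values of P(G) only push the non-susceptible fraction higher, so 0.83 is the least favourable case for this scenario.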
Conversely, if: P(G _{min}) > P(G)
Then, from (14) above, even considering only those individuals who belong to the combined (G _{ T }) subset, the extreme separation of the means of P(MS) and P(MS|G), together with the very narrow range of MS probabilities within the (G _{min}) subset, requires the distribution of individual expected lifetime MS probabilities in the set (W) to be bimodal [30] and, thus, to reflect the existence of two distinct groups of MS patients (see Fig. 1a; Additional file 1: Figure S2B). In this circumstance, the two classes of MS patients are distinguished by the fact that the MS that develops seems to be caused by two fundamentally different pathophysiological mechanisms. If the subset (G _{min}) is, indeed, nonempty, then, in the first mechanism, MS is very improbable, the genetic contribution seems to be minor, and, thus, environmental factors are likely to be primary. By contrast, in the second mechanism, MS is comparatively much more likely to occur, and genetic and environmental events are each critical determinants of disease. Importantly, if a second group of individuals with nonzero probabilities of MS actually exists, then, despite the fact that environmental factors would be involved in both mechanisms, there is no reason to expect that the environmental events involved in the first pathway are the same as (or even similar to) those involved in the second pathway.
In most cases of MS, the genetic route seems to dominate (Additional file 1). Indeed, more than 94 % of concordant MZ-twins {i.e., individuals in the (MS,IG _{ MS }) sub-subset} come from the (G) subset (Additional file 1). However, these observations do not mean that genetics primarily determines the disease, even in these cases. In fact, the increasing prevalence of MS worldwide [32–40], its increasing prevalence in women [32, 34, 36, 39, 40], and the change in prevalence and MZ-twin concordance based on latitude [3, 12] are better explained by differences in environmental exposure and by women’s greater physiological responsiveness to environmental events (Additional file 1) than by any differences in genetic susceptibility between groups or regions [2, 3]. Also, given the wide disparity in the probability of developing MS between men and women, the set (G) must itself be bimodal (Additional file 1).
These conclusions also have important implications with regard to our ability, ultimately, to determine a person’s risk through genetic analysis. Indeed, if more than 84 % of MS occurs in the genetically susceptible population (G) and less than 8.5 % of the population is susceptible, it should be possible, in theory, to characterize a person’s individual risk with high sensitivity and specificity. The fact that it has proven difficult to do this so far probably relates, in part, to the fact that the genetic associations have been defined on the basis of SNPs [4–7] rather than on the basis of more extended SNP-haplotypes [41, 42].
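To make the sensitivity and specificity argument concrete, consider a purely hypothetical test that flags exactly the members of (G). Its sensitivity is then the fraction of MS cases arising within (G), and its specificity follows from the size of (G) and the population lifetime probability of MS. The sketch below is an illustration under stated assumptions, not part of the published model: the function name is invented here, and the value used for P(MS) is an arbitrary placeholder rather than an estimate from this paper.

```python
def membership_test_specificity(p_g: float, sensitivity: float, p_ms: float) -> float:
    """Specificity of a hypothetical test that flags exactly the members of (G).

    p_g         -- fraction of the population in (G)
    sensitivity -- fraction of MS cases arising within (G), i.e. P(G | MS)
    p_ms        -- population lifetime probability of MS, P(MS)
    """
    if not 0.0 < p_ms < 1.0:
        raise ValueError("p_ms must lie strictly between 0 and 1")
    cases_in_g = sensitivity * p_ms        # P(MS, G)
    if cases_in_g > p_g:
        raise ValueError("(G) cannot contain more cases than members")
    false_positives = p_g - cases_in_g     # P(G, no MS)
    return 1.0 - false_positives / (1.0 - p_ms)


P_G = 0.085    # upper limit on P(G) from the text
SENS = 0.84    # lower limit on the share of MS cases arising in (G), from the text
P_MS = 0.003   # ASSUMPTION: placeholder lifetime probability, for illustration only

spec = membership_test_specificity(P_G, SENS, P_MS)
print(f"sensitivity >= {SENS:.2f}; specificity at the P(G) ceiling: {spec:.3f}")
```

Since 8.5 % is an upper limit on P(G), a smaller susceptible subset would only raise the specificity computed this way.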
Conclusions
It is possible to distinguish two classes of persons in the general population, indicating either that MS can be caused by two fundamentally different pathophysiological mechanisms or that the large majority of the population is at no risk of developing this disease regardless of their environmental experience. Moreover, although environmental factors would play a critical role in both mechanisms (if both exist), there is no reason to expect that these factors are the same (or even similar) between the two.
The definitions for the parameters used in the model are presented in Table 1.
Ethics approval and consent to participate
Ethical approval and consent from patients were not required because neither human subjects nor animals were used. All supporting data is available to researchers in the Additional file 1 provided.
Consent for publication
This manuscript does not contain any person’s individual data.
Availability of data and materials
All of the data contained in this manuscript are publicly available.
Abbreviations
 MS: Multiple sclerosis
 GWAS: Genome-wide association study
 MZ: Monozygotic
 DZ: Dizygotic
 S: Sibling
 IG: Identical genotype
References
Hofker MH, Fu J, Wijmenga C. The genome revolution and its role in understanding complex diseases. Biochim Biophys Acta. 2014;1842(10):1889–95.
Goodin DS. The genetic and environmental bases of complex human disease: Extending the utility of twin-studies. PLoS One. 2012;7(12):e47875.
Goodin DS. The epidemiology of multiple sclerosis: insights to disease pathogenesis. Handb Clin Neurol. 2014;122:231–66.
Ramagopalan SV, Anderson C, Sadovnick AD, Ebers GC. Genome-wide study of multiple sclerosis. N Engl J Med. 2007;357:2199–200.
De Jager PL, Jia X, Wang J, de Bakker PI, Ottoboni L, Aggarwal NT, Piccio L, Raychaudhuri S, Tran D, Aubin C, et al. Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat Genet. 2009;41:776–82.
The International Multiple Sclerosis Genetics Consortium and the Wellcome Trust Case Control Consortium 2. Genetic risk and a primary role for cell mediated immune mechanisms in multiple sclerosis. Nature. 2011;476:214–9.
International Multiple Sclerosis Genetics Consortium. Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet. 2014;45:1353–60.
Dyment DA, Herrera BM, Cader MZ, Willer CJ, Lincoln MR, Sadovnick AD, Risch N, Ebers GC. Complex interactions among MHC haplotypes in multiple sclerosis: susceptibility and resistance. Hum Mol Genet. 2005;14:2019–26.
Hafler DA, Compston A, Sawcer S, Lander ES, Daly MJ, De Jager PL, de Bakker PI, Gabriel SB, Mirel DB, Ivinson AJ, et al., and International Multiple Sclerosis Genetics Consortium. Risk alleles for multiple sclerosis identified by a genome-wide study. N Engl J Med. 2007;357:851–62.
Witte JS, Carlin JB, Hopper JL. Likelihood-based approach to estimating twin concordance for dichotomous traits. Genet Epidemiol. 1999;16:290–304.
Jacobson HI. The maximum variance of restricted unimodal distributions. Ann Math Stat. 1969;40:1746–52.
Rosati G. The prevalence of multiple sclerosis in the world: an update. Neurol Sci. 2001;22:117–39.
Hansen T, Skytthe A, Stenager E, Petersen HC, Kyvik KO, Brønnum-Hansen H. Risk for multiple sclerosis in dizygotic and monozygotic twins. Mult Scler. 2005;11:500–3.
Hansen T, Skytthe A, Stenager E, Petersen HC, Brønnum-Hansen H, Kyvik KO. Concordance for multiple sclerosis in Danish twins: an update of a nationwide study. Mult Scler. 2005;11:504–10.
O’Gorman C, Lin R, Stankovich J, Broadley SA. Modelling genetic susceptibility to multiple sclerosis with family data. Neuroepidemiology. 2013;40:1–12.
Bager P, Nielsen NM, Bihrmann K, Frisch M, Wohlfart J, Koch-Henriksen N, Melbye M, Westergaard T. Sibship characteristics and risk of multiple sclerosis: A nationwide cohort study in Denmark. Am J Epidemiol. 2006;163:1112–7.
Compston A, Coles A. Multiple sclerosis. Lancet. 2002;359:1221–31.
Dyment DA, Yee IML, Ebers GC, Sadovnick AD, the Canadian Collaborative Study Group. Multiple sclerosis in stepsiblings: Recurrence risk and ascertainment. J Neurol Neurosurg Psychiatry. 2006;77:258–9.
Ebers GC, Sadovnick AD, Dyment DA, Yee IM, Willer CJ, Risch N. Parent-of-origin effect in multiple sclerosis: observations in half-siblings. Lancet. 2004;363:1773–4.
Ebers GC, Yee IML, Sadovnick AD, Duquette P, the Canadian Collaborative Study Group. Conjugal multiple sclerosis: Population based prevalence and recurrence risks in offspring. Ann Neurol. 2000;48:927–31.
Sadovnick AD, Yee IML, Ebers GC, the Canadian Collaborative Study Group. Multiple sclerosis and birth order: A longitudinal cohort study. Lancet Neurol. 2005;4:611–7.
Sadovnick AD, Ebers GC, Dyment DA, Risch NJ, the Canadian Collaborative Study Group. Evidence for genetic basis of multiple sclerosis. Lancet. 1996;347:1728–30.
Willer CJ, Dyment DA, Risch NJ, Sadovnick AD, Ebers GC, the Canadian Collaborative Study Group. Twin concordance and sibling recurrence rates in multiple sclerosis. Proc Natl Acad Sci U S A. 2003;100:12877–82.
French Research Group on Multiple Sclerosis. Multiple sclerosis in 54 twinships: Concordance rate is independent of zygosity. Ann Neurol. 1992;32:724–7.
Islam T, Gauderman WJ, Cozen W, Hamilton AS, Burnett ME, Mack TM. Differential twin concordance for multiple sclerosis by latitude of birthplace. Ann Neurol. 2006;60:56–64.
Mumford CJ, Wood NW, Kellar-Wood H, Thorpe JW, Miller DH, Compston DA. The British Isles survey of multiple sclerosis in twins. Neurology. 1994;44:11–5.
Ristori G, Cannoni S, Stazi MA, Vanacore N, Cotichini R, Alfò M, Pugliatti M, Sotgiu S, Solaro C, Bomprezzi R, et al. Multiple sclerosis in twins from continental Italy and Sardinia: A nationwide study. Ann Neurol. 2006;59:27–34.
Kuusisto H, Kaprio J, Kinnunen E, Luukkaala T, Koskenvuo M, Elovaara I. Concordance and heritability of multiple sclerosis in Finland: Study on a nationwide series of twins. Eur J Neurol. 2008;15:1106–10.
Fagnani C, Neale MC, Nisticò L, et al. Twin studies in multiple sclerosis: A meta-estimation of heritability and environmentality. Mult Scler. 2015;21(11):1404–13.
Freeman JB, Dale R. Assessing bimodality to detect the presence of a dual cognitive process. Behav Res. 2013;45:83–97.
Clayton DC. Prediction and interaction in complex disease genetics: Experience in Type 1 diabetes. PLoS Genet. 2009;5(7):e1000540.
Hernán MA, Olek MJ, Ascherio A. Geographic variation of MS incidence in two prospective studies of US women. Neurology. 1999;53:1711–8.
Koch-Henriksen N. The Danish Multiple Sclerosis Registry: a 50-year follow-up. Mult Scler. 1999;5:293–6.
Freedman DM, Dosemeci M, Alavanja MC. Mortality from multiple sclerosis and exposure to residential and occupational solar radiation: A case control study based on death certificates. Occup Environ Med. 2000;57:418–21.
Celius EG, Vandvik B. Multiple sclerosis in Oslo, Norway: prevalence on 1 January 1995 and incidence over a 25-year period. Eur J Neurol. 2001;8:463–9.
Barnett MH, William DB, Day S, Macaskill P, McLeod JG. Progressive increase in incidence and prevalence of multiple sclerosis in Newcastle, Australia: a 35-year study. J Neurol Sci. 2003;213:1–6.
Sundström P, Nyström L, Forsgren L. Incidence (1988–97) and prevalence (1997) of multiple sclerosis in Västerbotten County in northern Sweden. J Neurol Neurosurg Psychiatry. 2003;74:29–32.
Ranzato F, Perini P, Tzintzeva E, Tiberio M, Calabrese M, Emani M, Davetag F, De Zanche L, Garbin E, Verdelli F, et al. Increasing frequency of multiple sclerosis in Padova, Italy: a 30 year epidemiological survey. Mult Scler. 2003;9:387–92.
Sarasoja T, Wikström J, Paltamaa J, Hakama M, Sumelahti ML. Occurrence of multiple sclerosis in central Finland: a regional and temporal comparison during 30 years. Acta Neurol Scand. 2004;110:331–6.
Orton SM, Herrera BM, Yee IM, Valdar W, Dyment DA, Ramagopalan SV, Sadovnick AD, Ebers GC, and the Canadian Collaborative Study Group. Sex ratio of multiple sclerosis in Canada: A longitudinal study. Lancet Neurol. 2006;5:932–6.
Goodin DS, Khankhanian P. Single Nucleotide Polymorphism (SNP)-Strings: An Alternative Method for Assessing Genetic Associations. PLoS One. 2014;9(4):e90034.
Khankhanian P, Gourraud PA, Lizee A, Goodin DS. Haplotype-based approach to known MS-associated regions increases the amount of explained risk. J Med Genet. 2015;52(9):587–94.
Acknowledgements
None.
Funding
None.
Additional information
Competing interests
Dr. Goodin declares that he has no competing interests with the material presented herein (either financial or nonfinancial).
Authors’ contributions
Dr. Goodin was solely responsible for developing the mathematical model, analyzing the data, and writing the manuscript.
Additional file
Additional file 1:
Contains several derivations related to the points made in the main text. It also contains 3 Table S1 and Figure S1. (PDF 2039 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Goodin, D.S. The nature of genetic susceptibility to multiple sclerosis: constraining the possibilities. BMC Neurol 16, 56 (2016). https://doi.org/10.1186/s12883-016-0575-6
DOI: https://doi.org/10.1186/s12883-016-0575-6
Keywords
 Multiple sclerosis
 Genetic susceptibility
 Heritability
 Pathogenesis
 Causation
 Epidemiology
 Environment
 Complex disease
 Complex
 Twin studies