  • Research article
  • Open access

Decision tree analysis of genetic risk for clinically heterogeneous Alzheimer’s disease



Heritability of Alzheimer’s disease (AD) is estimated at 74% and genetic contributors have been widely sought. The ε4 allele of apolipoprotein E (APOE) remains the strongest common risk factor for AD, with numerous other common variants contributing only modest risk for disease. Variability in clinical presentation of AD, which is typically amnestic (AmnAD) but can less commonly involve visuospatial, language and/or dysexecutive syndromes (atypical or AtAD), further complicates genetic analyses. Taking a multi-locus approach may increase the ability to identify individuals at highest risk for any AD syndrome. In this study, we sought to develop and investigate the utility of a multi-variant genetic risk assessment on a cohort of phenotypically heterogeneous patients with sporadic AD clinical diagnoses.


We genotyped 75 variants in our cohort and, using a two-staged study design, we developed a 17-marker AD risk score in a Discovery cohort (n = 59 cases, n = 133 controls) then assessed its utility in a second Validation cohort (n = 126 cases, n = 150 controls). We also performed a data-driven decision tree analysis to identify genetic and/or demographic criteria that are most useful for accurately differentiating all AD cases from controls.


We confirmed APOE ε4 as a strong risk factor for AD. A 17-marker risk panel predicted AD significantly better than APOE genotype alone (P < 0.00001) in the Discovery cohort, but not in the Validation cohort. In decision tree analyses, we found that APOE best differentiated cases from controls only in AmnAD but not AtAD. In AtAD, HFE SNP rs1799945 was the strongest predictor of disease; variation in HFE has previously been implicated in AD risk in non-ε4 carriers.


Our study suggests that APOE ε4 remains the best predictor of broad AD risk when compared to multiple other genetic factors with modest effects, and that phenotypic heterogeneity in broad AD can complicate simple polygenic risk modeling. It also supports the association between HFE and AD risk in individuals without APOE ε4.



Alzheimer’s disease (AD) is a devastating neurodegenerative disorder that results in memory impairment and can also involve deterioration of language, visuospatial and/or executive functioning abilities. As the world’s population ages and the number of individuals with AD grows, it will become increasingly important to identify those at highest risk for AD during the earliest stages of—or prior to—disease.

Genetic predictors of AD hold strong potential for identifying those at risk of developing disease. Indeed, a large clinical study will launch in 2015 to assess the utility of AD therapies given to individuals at highest genetic risk for AD but who are still cognitively healthy [1]. These individuals, who carry the ε4 allele of apolipoprotein E (APOE), have a 2-10x increased risk for developing AD compared to non-carriers [2,3], but not all ε4 carriers go on to develop disease [3,4]. Despite the vast number of genetic studies of AD, which is estimated to be 74% heritable [5], no other common variants have been identified that confer as high a risk as APOE ε4. In rare cases, AD is familial, caused by an autosomal dominant mutation in APP, PSEN1, or PSEN2 [6,7]. For sporadic late-onset AD (LOAD), numerous common variants of very low effect (odds ratio [OR] ~ 1.1-1.3) have been identified through genome-wide association studies (GWAS) and replicated across multiple large [8] and diverse [9,10] populations. More recently, rare variants (<1% allele frequency) of larger effect size have also been identified as risk-conferring (TREM2 0.3% [11], PLD3 < 0.5% [12], MAPT 0.3% [13]) or protective against AD (APP 0.01% [14] to 0.62% [15]).

In addition to genetic heterogeneity, there is also clinical heterogeneity in AD. The majority of patients present with amnestic syndromes (AmnAD) but approximately 6-14% of AD patients demonstrate atypical clinical syndromes (AtAD) [16]. These include 1) posterior cortical atrophy (PCA), characterized by predominant visuospatial deficits [17]; 2) the logopenic variant of primary progressive aphasia (lvPPA) [18], characterized by loss in phonologic short-term memory; and 3) dysexecutive/behavioral AD [16] characterized by loss of executive function and/or behavioral changes with retention of memory function.

Genetic and phenotypic heterogeneity strongly support the notion that multiple genetic variants of small effect contribute to disease susceptibility. A multi-locus approach may increase the ability to identify individuals at highest risk for any AD syndrome. The multi-locus approach has had modest success in LOAD, with polygenic risk scoring approaches associating better with LOAD diagnoses and age of onset than APOE genotype alone [19-21]. However, most studies have focused on clinically homogeneous groups with primary amnestic presentations.

In this study, we investigated two different strategies for polygenic risk assessment of clinically heterogeneous AD. First, we took a traditional approach and developed and assessed the utility of a multi-marker genetic risk score to predict AD. The risk score was based on a Discovery cohort association study that sought to replicate previous AD findings and assess additional candidate variants for their association with disease risk. The risk score was then tested for its predictive ability in a separate Validation cohort. Second, we used a more novel decision tree analysis [22] to identify genetic and demographic risk factors for AD. This data-driven method has been used in diverse clinical contexts [23-26] to predict binary outcomes, but is largely unutilized in the prediction of AD diagnosis. It allowed us to assess step-wise interactions between variables to identify the factors that best predict AD.



Individuals 65- to 101-years-old (N = 216 males, N = 232 females) were evaluated at the University of California, San Francisco Memory and Aging Center (UCSF MAC) and had genotype data available for analysis. All participants were unrelated Caucasians (confirmed by multi-dimensional scaling (MDS) plots or self-described for those without GWAS data available). Non-Caucasians were excluded due to the insufficient number of participants and potential for confounding background genetics. All aspects of the study were approved by the UCSF Institutional Review Board and written informed consent was obtained from all participants and surrogates (as per UCSF Institutional Review Board protocol).

Clinical assessment

All participants underwent a multi-step screening process with an in-person visit at the MAC that included a neurologic exam, cognitive assessment [27], and medical history. Each participant’s study partner was also interviewed regarding functional abilities. A multidisciplinary team composed of a neurologist, neuropsychologist, and nurse then reviewed all potential participants. Participants included in this study had a study partner (e.g., spouse or close friend). The multidisciplinary team established clinical diagnoses for cases according to consensus criteria for AD [16]. Atypical or concomitant diagnoses were established for lvPPA [16,18], PCA syndrome [16,17], primary executive AD [16], vascular disease [28], or dementia with Lewy bodies (DLB) [29] according to consensus criteria. Individuals with primarily amnestic AD presentations were classified as “AmnAD” and those with less common clinical syndromes (lvPPA, PCA, primary executive) or comorbidities (vascular disease, DLB) were classified as “AtAD”. All control subjects underwent a similar multi-step screening process, including a study partner interview, after which a consensus team of clinicians reviewed all potential participants. Controls included in this study had Mini-Mental State Exam (MMSE) [30] scores ≥26 or a Clinical Dementia Rating Scale (CDR) [31] of 0, no participant or informant report of cognitive decline in the prior year, and no evidence from their screening visit suggesting a neurodegenerative disorder (per team neurologist’s clinical judgment). Individuals harboring a known disease mutation were excluded from the study.


Genomic DNA was extracted from peripheral blood using standard protocols (Gentra PureGene Blood Kit, QIAGEN, Inc. – USA, Valencia, CA). Genotyping was performed using one of three platforms: TaqMan, Sequenom, or via array genotyping. The method used for each variant is provided in the Supplement (Additional file 1). TaqMan Allelic Discrimination Assay was used for APOE genotyping (rs429358 and rs7412) and others as noted, and was conducted on an ABI 7900HT Fast Real-Time PCR system (Applied Biosystems, Foster City, CA) according to manufacturer's instructions. Sequenom iPLEX Technology (Sequenom, San Diego, CA) was also used for genotyping a subset of variants as per manufacturer’s instructions. The SpectroAquire and MassARRAY Typer Software packages (Sequenom, San Diego, CA) were used for interpretation and Typer analyzer (v3.4.0.18) was used to review and analyze data. Only genotypes with “Conservative” or “Moderate” quality calls were included in analysis. A subset of genotypes was also obtained from the Illumina Omni1-Quad array genotyping platform (Illumina Inc., San Diego, CA), processed using manufacturer’s instructions.
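As an illustration of how the two APOE SNPs named above translate into an APOE genotype, the following minimal Python sketch applies the standard haplotype definitions (ε2 = rs429358 T with rs7412 T, ε3 = T with C, ε4 = C with C). It assumes genotypes are reported as two-letter strings on the strand where these definitions hold and that the very rare C-T haplotype is absent; the function names are hypothetical and this is not the authors' pipeline.

```python
def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Return an APOE genotype such as 'e3/e4' from two unphased SNP genotypes.

    Each argument is a two-letter genotype, e.g. 'CT'.
    Assumption: e2 = T-T, e3 = T-C, e4 = C-C, and the rare C-T haplotype is absent.
    """
    n_e4 = rs429358.upper().count("C")  # each C at rs429358 marks one e4 allele
    n_e2 = rs7412.upper().count("T")    # each T at rs7412 marks one e2 allele
    n_e3 = 2 - n_e4 - n_e2              # whatever remains is e3
    if n_e3 < 0:
        raise ValueError("genotype pair implies the rare C-T haplotype")
    return "/".join(["e2"] * n_e2 + ["e3"] * n_e3 + ["e4"] * n_e4)


def e4_dose(rs429358: str) -> int:
    """Number of e4 alleles (0, 1, or 2) under the same assumptions."""
    return rs429358.upper().count("C")
```

For example, `apoe_genotype("CT", "CC")` yields `e3/e4`, and a double heterozygote (`"CT"`, `"CT"`) is read as `e2/e4` under the stated assumption.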

A total of 75 variants were genotyped in all subjects and analyzed for association with AD risk. These variants were drawn from several ongoing studies evaluating the effect of genes involved in neurodegenerative disease, neurodevelopment, social function, behavior, neuropsychiatry, and language on diseases like AD and frontotemporal dementia (FTD). They included polymorphisms previously associated with: 1) risk for AD or other neurodegenerative disease; 2) neuropsychiatric phenotypes implicated in dementia risk (e.g., depression [32-34], dyslexia [27]); and 3) cognitive protection [35]. A full list of variants, associated phenotypes, and accompanying references is provided in Additional file 1. Inclusion criteria for analyzed markers were: >80% non-missing genotypes, ≥0.01 minor allele frequency (MAF), and Hardy-Weinberg equilibrium (HWE) P > 0.001. The average call rate was 98% for all variants.
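The three inclusion criteria can be expressed as a simple per-marker filter. The sketch below is illustrative only: it takes genotype counts for one marker and uses a 1-degree-of-freedom chi-square approximation for the HWE test, which is an assumption made here (the source does not state which HWE test was used). Function names are hypothetical.

```python
import math


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int) -> bool:
    """Apply the thresholds from the text: call rate > 80%, MAF >= 0.01, HWE P > 0.001."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1 - freq_a)
    return call_rate > 0.80 and maf >= 0.01 and hwe_chi2_p(n_aa, n_ab, n_bb) > 0.001
```

A marker with counts 81/18/1 and no missing genotypes sits exactly at HWE proportions and passes; a marker with no heterozygotes at all (50/0/50) fails the HWE criterion.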


Association study

The study cohort was divided into two groups, a first stage “Discovery” cohort for development of the AD risk score and a second stage “Validation” cohort with which to test the risk scoring method developed in the Discovery cohort. We first conducted association analysis of all markers meeting inclusion criteria in the Discovery cohort. Analyses were performed in PLINK as a logistic regression under an additive model [36].
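To make the additive model concrete, here is a minimal, dependency-free sketch that fits logit P(case) = b0 + b1 × dose by Newton-Raphson and returns the per-allele odds ratio exp(b1). It is not PLINK and includes no covariates; the function name and the iteration settings are assumptions chosen for illustration.

```python
import math


def additive_logistic_or(doses, is_case, n_iter=25):
    """Per-allele odds ratio from a one-predictor logistic regression.

    doses   : allele dose per individual (0, 1, or 2)
    is_case : 1 for cases, 0 for controls
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(doses, is_case):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # fitted probability
            w = mu * (1.0 - mu)
            g0 += y - mu            # gradient, intercept
            g1 += (y - mu) * x      # gradient, slope
            h00 += w                # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return math.exp(b1)
```

When the predictor takes only the values 0 and 1, the fitted odds ratio coincides with the cross-product ratio of the 2×2 table, which gives a quick sanity check of the fit.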

Risk scoring

For scoring, we ranked all findings by p-value and then removed SNPs that were in linkage disequilibrium (LD, r² > 0.8) in our dataset; the single most strongly associated SNP of a set of linked markers was retained. Using the unlinked markers we created raw scoring files for each top finding, iteratively adding the next most significant finding to each scoring set (i.e., 1st marker in first set, 1st and 2nd markers in second set, etc.). Reference alleles were established in the scoring files such that all effects were in the same direction of conferring risk (e.g., a SNP with an empirical OR of 0.1 for the reference minor allele would be switched such that the major allele was the reference allele for scoring). Using this paradigm, we created scoring sets for the top findings that were not in LD.
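The ranking, LD pruning, nesting, and allele re-orientation steps described above can be sketched as follows. This is an illustration under stated assumptions: markers are dictionaries with hypothetical keys `snp`, `p`, and `or`; pairwise r² values are supplied in a lookup keyed by SNP pairs; and each marker is weighted by the log of its odds ratio, which the source does not specify.

```python
import math


def nested_score_sets(markers, r2, threshold=0.8):
    """Rank by p-value, drop markers in LD with a stronger one, build nested sets.

    r2 maps frozenset({snp_a, snp_b}) to the pairwise r-squared (missing pairs = 0).
    Returns [[m1], [m1, m2], ...] where each entry is a (snp, odds_ratio) tuple.
    """
    kept = []
    for m in sorted(markers, key=lambda m: m["p"]):
        linked = any(
            r2.get(frozenset((m["snp"], k["snp"])), 0.0) > threshold for k in kept
        )
        if not linked:  # retain only the strongest marker of each linked group
            kept.append(m)
    entries = [(m["snp"], m["or"]) for m in kept]
    return [entries[: i + 1] for i in range(len(entries))]


def risk_score(score_set, minor_dose):
    """Sum of log(OR) x risk-allele dose (assumed weighting, see lead-in)."""
    total = 0.0
    for snp, odds_ratio in score_set:
        dose = minor_dose[snp]
        if odds_ratio < 1:
            # Protective minor allele: score the major allele instead.
            dose, odds_ratio = 2 - dose, 1 / odds_ratio
        total += dose * math.log(odds_ratio)
    return total


# Odds ratios and p-values for the first three markers are taken from the text;
# "rs_linked" and its r-squared value are hypothetical.
markers = [
    {"snp": "APOE_e4", "p": 1.36e-6, "or": 4.28},
    {"snp": "rs6701713", "p": 0.01, "or": 0.42},
    {"snp": "rs1799945", "p": 1.64e-3, "or": 2.83},
    {"snp": "rs_linked", "p": 2.0e-3, "or": 2.5},
]
r2 = {frozenset(("rs1799945", "rs_linked")): 0.9}
score_sets = nested_score_sets(markers, r2)
```

Here `score_sets` holds three nested sets, because the hypothetical linked marker is dropped in favor of the more strongly associated rs1799945.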

We implemented the ‘SNP scoring’ algorithm in PLINK to first assess the predictive ability of each score set (A-Z) in the Discovery dataset for evaluative purposes. We compared the risk scores for each set against the true phenotypes using receiver operating characteristic (ROC) curves and used the resulting area under the curve (AUC) values to determine the optimal score set, with higher AUC values representing better sensitivity and specificity. The optimal score set was determined as follows. First, score sets were evaluated in two ways: 1) by simple consecutive comparisons of AUC values to identify the set at which AUC is largest, and 2) by statistical comparisons of a given set’s ROC curve AUC (AUC_i) versus the previous set’s ROC curve AUC (AUC_{i-1}) and versus the APOE-only score’s AUC (AUC_A). We then iteratively evaluated sets to determine the maximum AUC, stopping when two consecutive sets each resulted in decreases of AUC as compared to the previous set (i.e., AUC_i > AUC_{i+1} and AUC_i > AUC_{i+2}). After determining this optimal set, we used the same scoring file to create risk scores for the Validation cohort and assessed the AUC of the resulting ROC curve to determine the generalization of our risk scoring method in an independent dataset. All ROC analyses were performed in Stata 10/MP (StataCorp LP, College Station, TX).
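The AUC comparison and stopping rule can be illustrated with a short sketch. The AUC is computed here as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half), and the stopping rule is implemented as one reading of the description above: select the first set whose AUC exceeds that of both following sets. The statistical comparisons between ROC curves are not reproduced, and the function names are hypothetical.

```python
def auc(scores, is_case):
    """Area under the ROC curve via pairwise case-control comparisons."""
    cases = [s for s, y in zip(scores, is_case) if y]
    controls = [s for s, y in zip(scores, is_case) if not y]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls
    )
    return wins / (len(cases) * len(controls))


def optimal_set_index(aucs):
    """Index i of the first set with AUC_i > AUC_{i+1} and AUC_i > AUC_{i+2}.

    Falls back to the overall maximum if the stopping condition is never met.
    """
    for i in range(len(aucs) - 2):
        if aucs[i] > aucs[i + 1] and aucs[i] > aucs[i + 2]:
            return i
    return max(range(len(aucs)), key=aucs.__getitem__)
```

With a hypothetical AUC sequence such as `[0.70, 0.75, 0.80, 0.88, 0.87, 0.86]`, the rule selects index 3, the set after which two consecutive additions both lower the AUC.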

Decision tree analysis

To explore and evaluate the diagnostic potential of the genetic variants available with ROC curves, we used the ROC4 software platform (ROC4.22.exe). The software utilizes a user-set weight of sensitivity and specificity (kappa) to choose the predictive variable and value that best divides the sample. The sample is then divided on the value of the variable that is most predictive based on this sensitivity and specificity. Following this, the program performs the same analysis within each of the subgroups created by the previous step. The process continues until a stopping rule is enforced. The output once a stopping rule applies is a “decision tree” that shows the variables, and the interactions between them, in predicting the outcome of interest. We chose a kappa weight of 0.5 so that sensitivity and specificity were equally weighted. There were three stopping rules: when subgroup totals were less than 10, when the P value of a multiple-testing-corrected χ² test exceeded 0.01, or when a three-way interaction was reached. We performed three ROC analyses: one combined analysis of controls and all types of AD patients, one for the controls and AmnAD, and one for controls and AtAD. The ‘gold standard’ binary score was case/control outcome for any AD clinical diagnosis. Additional predictors included sex (0/1 for male/female), age (in years), and all genetic variants passing quality control (0/1/2 for dose of the minor allele).
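The recursive partitioning procedure described above can be sketched in a few lines. This is not the ROC4 program: it uses Cohen's kappa as a stand-in for the equally weighted criterion, an uncorrected 1-df chi-square test in place of the multiple-testing-corrected test, and a depth limit of three splits to represent the three-way interaction rule. All names are hypothetical.

```python
import math


def cohen_kappa(pred, truth):
    """Chance-corrected agreement between a binary rule and the outcome."""
    n = len(truth)
    observed = sum(1 for p, t in zip(pred, truth) if bool(p) == bool(t)) / n
    p_pred = sum(1 for p in pred if p) / n
    p_true = sum(1 for t in truth if t) / n
    expected = p_pred * p_true + (1 - p_pred) * (1 - p_true)
    return 0.0 if expected == 1 else (observed - expected) / (1 - expected)


def chi2_p(pred, truth):
    """P-value of a 1-df Pearson chi-square test on the 2x2 table (uncorrected)."""
    a = sum(1 for p, t in zip(pred, truth) if p and t)
    b = sum(1 for p, t in zip(pred, truth) if p and not t)
    c = sum(1 for p, t in zip(pred, truth) if not p and t)
    d = len(truth) - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = len(truth) * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))


def best_split(rows, truth, variables):
    """Return (kappa, variable, direction, cutoff) with the highest kappa, or None."""
    best = None
    for var in variables:
        # Skip the smallest value: 'value >= min' would put everyone on one side.
        for cutoff in sorted({r[var] for r in rows})[1:]:
            above = [r[var] >= cutoff for r in rows]
            for label, pred in ((">=", above), ("<", [not a for a in above])):
                k = cohen_kappa(pred, truth)
                if best is None or k > best[0]:
                    best = (k, var, label, cutoff)
    return best


def grow_tree(rows, truth, variables, depth=0, max_depth=3, min_size=10, alpha=0.01):
    """Recursively split until a stopping rule applies; leaves report n and cases."""
    leaf = {"n": len(rows), "cases": sum(truth)}
    split = best_split(rows, truth, variables) if depth < max_depth else None
    if split is None:
        return leaf
    _, var, _, cutoff = split
    above = [r[var] >= cutoff for r in rows]
    yes = [i for i, a in enumerate(above) if a]
    no = [i for i, a in enumerate(above) if not a]
    if min(len(yes), len(no)) < min_size or chi2_p(above, truth) > alpha:
        return leaf
    return {
        "split": f"{var} >= {cutoff}",
        "yes": grow_tree([rows[i] for i in yes], [truth[i] for i in yes],
                         variables, depth + 1, max_depth, min_size, alpha),
        "no": grow_tree([rows[i] for i in no], [truth[i] for i in no],
                        variables, depth + 1, max_depth, min_size, alpha),
    }
```

Each row is a dictionary of predictors (for example `{"e4": 1, "age": 70}`) and `truth` holds 1 for cases and 0 for controls. Evaluating both directions of each cutoff lets the same search pick up protective as well as risk-increasing variables.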


In total, N = 185 AD cases and N = 283 cognitively normal controls were included in the analysis. Demographics for each group are shown in Table 1. A total of 192 (59 cases, 133 controls) individuals were in the first stage Discovery cohort and 276 (126 cases, 150 controls) were in the second stage Validation cohort. Of the Discovery cohort, 21.9% were AmnAD and 8.9% were AtAD (17 Total, 7 lvPPA, 3 PCA, 3 primarily executive AD, 2 AD with concomitant vascular disease, 2 AD with concomitant DLB; Figure 1). In the Validation cohort, 30.4% were AmnAD, and 8.0% were AtAD (22 Total, 7 lvPPA, 1 PCA, 13 AD with vascular disease, 1 AD with DLB).

Table 1 Sample demographics
Figure 1

Diagnosis breakdown. Each cohort’s composition by diagnosis is shown with controls in blue, amnestic AD in red and other (atypical) AD in yellow.

Confirmation of AD risk variants and establishment of a 17-marker risk assessment

We first performed an association study in the Discovery cohort as a small-scale replication study of previously identified risk variants for AD in our clinically heterogeneous cohort. We then used this analysis to establish a ranked order by which we could iteratively add variants into a polygenic score to evaluate their utility for risk assessment. In our analysis, only the well-established APOE ε4 allele (P = 1.36 × 10⁻⁶), with an estimated OR = 4.28, met strict significance after Bonferroni correction for multiple testing (Table 2). Seven other variants had nominal p-values of P ≤ 0.05. The second strongest association was with the rs1799945 SNP in HFE (P = 1.64 × 10⁻³, OR = 2.83). Variation in the hemochromatosis gene has previously been associated with AD in numerous large meta-analyses [37-39]. Two established risk factors for AD identified by GWAS were nominally associated in our study, but in the direction opposite to that previously reported: rs3851179 in PICALM (P = 2.37 × 10⁻³, OR = 1.87) [40,41] and rs6701713 in CR1 (P = 0.01, OR = 0.42) [40,42]. More novel AD risk candidates implicated by our study included rs2020942 (P = 0.01, OR = 1.81), a SNP tagging the variable number tandem repeat in the serotonin transporter gene, SLC6A4, most often associated with depression [43,44]; rs1799913 (P = 0.04, OR = 0.64) in TPH1, an established depression risk factor [45] that was recently associated with depression in AD [34]; rs4504469 (P = 0.04, OR = 0.60) in KIAA0319, which was associated with dyslexia [46]; and rs1320490 (P = 0.05, OR = 1.63) in CDC42BPA, previously associated with reading ability [47].
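The Bonferroni comparison can be checked with a few lines of arithmetic. The sketch below assumes a family-wise alpha of 0.05 and 75 tests (the number of variants genotyped); the number of markers that actually passed quality control, and therefore the exact threshold used, is not stated in this section. The p-values are those reported above.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Flag each p-value against the Bonferroni threshold alpha / n_tests."""
    threshold = alpha / n_tests
    return {name: p <= threshold for name, p in p_values.items()}


reported = {
    "APOE e4": 1.36e-6,
    "HFE rs1799945": 1.64e-3,
    "PICALM rs3851179": 2.37e-3,
    "CR1 rs6701713": 0.01,
}
# Assumption: 75 tests, giving a threshold of about 6.7e-4.
flags = bonferroni_significant(reported, n_tests=75)
```

Under these assumptions only APOE ε4 clears the threshold, in line with the statement above; a somewhat smaller number of tests would not change that outcome for the p-values listed.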

Table 2 Association results

By iteratively adding genetic variants, we found that a risk score panel comprising 17 variants (“Q”) was the best predictor of AD status (Table 3; Figure 2). When evaluated alone, APOE genotype had modest predictive value for differentiating AD cases from controls. The 17-marker risk score had a significantly better AUC and was better at predicting AD risk than APOE alone (P < 0.00001; Figure 3).

Table 3 Score set evaluation statistics
Figure 2

Area under the curve values for each scoring set. Area under the curve (AUC) values are provided for each scoring method (in black) as well as the difference between the current and previous scoring method (Δ AUC, in gray). Score sets were iteratively assessed until two consecutive resulting AUC values were lower than the preceding AUC value. The resulting scoring method [“Q”] had an AUC of 0.88.

Figure 3

Receiver operating characteristic curves for Discovery and Validation cohorts. Receiver operating characteristic (ROC) curves are shown for scoring with APOE only (in blue) and the 17-marker risk score (“Q”, in red). AUC values of each curve are provided. (A) Discovery cohort shows a significant improvement in sensitivity/specificity as compared to APOE only, P < 0.00001. (B) Risk scoring with the 17-marker risk score in the Validation cohort does not increase prediction beyond APOE genotype alone.

Genetic risk score does not predict AD better than APOE in a separate cohort

When evaluated in the Validation cohort, the “Q” risk scoring method did not perform better than APOE alone (Table 4; Figure 3). The 17-marker gene score resulted in 65% maximal correct classification of individuals, with a limited sensitivity (54%) and specificity (73%; Figure 4). Removing excess AmnAD patients from the Validation group to better match the proportion of AtAD individuals in the Discovery cohort did not improve the performance of the multi-marker risk score (Additional file 2).

Table 4 Risk scoring results for the Discovery and Validation cohorts
Figure 4

Sensitivity and specificity for 17-marker scoring method in Validation cohort. Percent sensitivity (black) and specificity (gray) are plotted by numeric value based on the 17-marker scoring method. Accepting sensitivity of 80% would render specificity of only 36%; specificity of 80% would reduce sensitivity to 45.5%.
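The trade-off between sensitivity and specificity across score cutoffs, as plotted in this figure, can be computed as in the sketch below. It treats a score at or above the cutoff as a positive call; the function names and the example data are hypothetical and do not reproduce the study's scores.

```python
def sens_spec(scores, is_case, cutoff):
    """Return (sensitivity, specificity, accuracy) for the rule score >= cutoff."""
    tp = sum(1 for s, y in zip(scores, is_case) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, is_case) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, is_case) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, is_case) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(scores)


def best_accuracy_cutoff(scores, is_case):
    """Lowest cutoff that maximizes the fraction of correctly classified individuals."""
    return max(sorted(set(scores)), key=lambda c: sens_spec(scores, is_case, c)[2])


# Hypothetical scores for three controls (0) and three cases (1).
example_scores = [1, 2, 3, 4, 5, 6]
example_status = [0, 0, 1, 0, 1, 1]
```

Scanning every observed score as a candidate cutoff gives the full curve, from which figures such as the maximal correct classification rate can be read off.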

Decision tree analysis identifies genetic heterogeneity in amnestic versus atypical AD

We postulated that the clinical heterogeneity between the Discovery and Validation cohorts might be contributing to the failure of the 17-variant risk score to differentiate AD cases from controls better than APOE genotype alone. Under an alternative model, the genetic risk for AmnAD is different from that for AtAD. In order to identify genetic and/or demographic criteria that are most useful for accurately differentiating all AD cases from controls and to test whether AmnAD and AtAD share disease predictors or are distinct in their risk profiles, we performed data-driven decision tree analyses. We performed three analyses, one in all AD cases (N = 165) versus controls (N = 283), one with only AmnAD (N = 126) versus controls, and one with AtAD (N = 39) versus controls.

In the analysis with all AD cases, carrying an APOE ε4 allele was the first differentiator of cases from controls (Figure 5). Amongst individuals carrying the ε4 risk allele, the next risk predictor was being ≥77 years old. Among these oldest individuals, the next differentiator was carrying one or more copies of the minor allele for rs4343 in ACE, an AD-risk gene [48,49]. The fourth differentiator of this subgroup was being homozygous for the major allele of rs8053211 in ATP2C2, a gene associated with dyslexia and other language traits [50,51], as carriers of one or two copies of the minor allele had a higher risk for diagnosis of AD. Using these predictors, the model had a predictive value positive (PVP) of 0.87, meaning that it correctly predicted a positive AD diagnosis 87% of the time. The sensitivity at this cut point was 0.71 and the specificity was 0.64 (Additional file 3). On the other side of the tree, in individuals carrying no ε4 alleles, the next differentiator of controls from cases was being <83 years old. Among these individuals, carrying no risk alleles of the HFE SNP rs1799945 was more predictive of control status. Finally, carrying two minor alleles of the DCDC2 SNP rs1091047 (a dyslexia gene [52]) was most predictive of control status. In this final group, the model had a predictive value negative (PVN) of 0.92, meaning it correctly predicted a diagnosis of control 92% of the time. The sensitivity and specificity at this cut point were 0.64 and 0.73, respectively (Additional file 3).
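For readers less familiar with the PVP and PVN terms used above, the following sketch shows how they relate to the counts in a 2×2 classification table. The counts in the example are arbitrary and hypothetical; they are not the study's data, which are given in Additional file 3.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int):
    """Return (PVP, PVN) from true/false positive and negative counts.

    PVP = TP / (TP + FP): fraction of predicted cases that are true cases.
    PVN = TN / (TN + FN): fraction of predicted controls that are true controls.
    """
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical counts chosen only to illustrate the calculation.
pvp, pvn = predictive_values(tp=40, fp=10, tn=45, fn=5)
```

Unlike sensitivity and specificity, both values depend on the proportion of cases in the subgroup being classified, so they are not directly comparable across branches with very different case fractions.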

Figure 5

Decision tree for all forms of Alzheimer’s disease. Binary decision tree created by receiver operator characteristic (ROC) analysis is shown. Branching points represent the variable and cutting point which best predicts whether or not an individual will be diagnosed with any form of Alzheimer’s disease (AD). Shaded boxes depict the variable used to separate each subgroup and unshaded boxes provide summary data characterizing each subgroup. For more information on the genes depicted, please see Additional file 1.

In the analysis of AmnAD cases versus controls, carrying an APOE ε4 allele was also the best differentiator of cases from controls (Figure 6). Similar to the all-AD analysis, in individuals carrying the ε4 risk allele, the next risk predictor was being ≥77 years old. Among these oldest individuals, the third differentiator was carrying one or more copies of the minor allele for rs4343 in ACE. In these individuals at this cut point, the PVP was 0.76. The sensitivity at this cut point was 0.83 and the specificity was 0.48. On the other side of the tree, in individuals carrying no ε4 alleles, the next differentiator of controls from cases was being 66–87 years old. Within this age range, there was another age differentiation whereby being 66–77 years old predicted control status. In this final group, the PVN was 0.92. The sensitivity and specificity at this cut point were 0.64 and 0.67, respectively.

Figure 6

Decision tree for amnestic Alzheimer’s disease. Binary decision tree created by receiver operator characteristic (ROC) analysis is shown. Branching points represent the variable and cutting point which best predicts whether or not an individual will be diagnosed with an amnestic form of Alzheimer’s disease (AmnAD). Shaded boxes depict the variable used to separate each subgroup and unshaded boxes provide summary data characterizing each subgroup. For more information on the genes depicted, please see Additional file 1. NC - normal control.

The analysis of AtAD cases versus controls provided striking contrast to the previous analyses. In this cohort, carrying one or more minor alleles of the HFE SNP (rs1799945) was the first differentiator (Figure 7). In those with HFE risk alleles, the next differentiator was carrying ≥1 allele of the GRN variant, rs5848, which has been associated with risk for AD [53], hippocampal sclerosis [54,55], FTD [56], and bipolar disorder [57]. In the final at-risk group, the PVP was 0.47, with sensitivity and specificity of 0.62 and 0.74, respectively. On the other side of the tree, the next differentiator predicting control status was being homozygous for the minor allele of GSK3B SNP rs13312998, which has also been associated with AD and FTD [58]. At this cut point, the PVN was 0.93. The sensitivity and specificity were 0.43 and 0.87, respectively.

Figure 7

Decision tree for atypical Alzheimer’s disease. Binary decision tree created by receiver operator characteristic (ROC) analysis is shown. Branching points represent the variable and cutting point which best predicts whether or not an individual will be diagnosed with an atypical form of Alzheimer’s disease (AtAD). Shaded boxes depict the variable used to separate each subgroup and unshaded boxes provide summary data characterizing each subgroup. For more information on the genes depicted, please see Additional file 1. NC - normal control.


In our association study, we found continuing support for APOE, HFE, PICALM, CR1, SLC6A4, CDC42BPA, TPH1, and KIAA0319 as genetic risk factors for AD. Using information from 17 variants combined into a genetic risk score allowed us to predict clinically heterogeneous AD cases significantly better than APOE genotype alone, supporting the role of these variants as predictors of AD risk in this primary Discovery group. However, when we attempted to apply this polygenic risk assessment to an independent cohort of clinically heterogeneous AD patients for validation, the utility of analyzing 17 variants was not significantly better than analyzing APOE alone. Taken together, these results suggest two things. First, APOE ε4 remains the best predictor of AD risk, likely due to its strong effect, when compared to multiple other risk factors with very modest risk effects. Second, phenotypic variability in AD complicates simple genetic risk modeling, particularly when co-morbidities are suspected.

The fact that APOE ε4 is the most predictive variant for amnestic AD but does not appear to be associated with risk for atypical AD syndromes such as PCA and lvPPA [59] likely contributes to the decreased specificity of the genetic risk assessment; namely, carrying an ε4 allele is associated with being affected in amnestic AD but is also associated with not being affected by PCA or lvPPA. Thus, APOE ε4 in the simple context of amnestic AD is quite adept at predicting who will be a case versus control, but is much less specific in the broader context of all AD syndromes, inclusive of atypical presentations and co-morbidities. Indeed, in our entire cohort of Discovery + Validation samples, APOE ε4 was significantly enriched in AmnAD but not AtAD cases when compared to controls (AmnAD vs Control P = 3.08 × 10⁻⁷; AtAD vs Control P = 0.1). A similar discrepancy due to clinical heterogeneity may also underlie our association of variants in PICALM and CR1 in the opposite direction of historical findings. An alternate methodology to identify genetic and demographic factors that predict case/control status in AmnAD and AtAD separately was able to improve differentiation. Utilizing a decision tree methodology, we found that APOE best differentiated cases from controls only in AmnAD but not AtAD. In contrast, HFE genotype was the best differentiating factor between AtAD cases and controls; the same variant was also the first genetic risk factor for broad AD in individuals without APOE ε4. These findings are consistent with prior research implicating HFE in AD risk in individuals without APOE ε4 [60]. These results also suggest that atypical presentations could represent a distinct genetic class of AD, although the present study was not designed to specifically address this question. A recent study suggests that AtAD is more heritable than AmnAD [61], supporting the theory that there are additional genetic risk factors for AtAD that remain to be elucidated.

In the future, GWAS of larger, more diverse cohorts of individuals with specific atypical phenotypes (e.g., PCA) could identify novel genetic risk factors specific to these AD syndromes. Phenotypic specificity in studies of amnestic AD may also provide additional statistical power to identify risk factors of small effect size.

In an effort to rule out the possibility of misdiagnosis, particularly in the AtAD group, we performed a post hoc chart review of patients for whom pathological data were available (N = 25 AmnAD and N = 8 AtAD). All of these individuals had AD pathology cited as a primary (N = 24 AmnAD, N = 5 AtAD) or major contributing factor (N = 1 AmnAD, N = 3 AtAD) that correlated with each patient’s clinical presentation (Additional file 4). Although not exhaustive, these data suggest that AD pathology was correctly recognized as a major contributor to patients’ clinical syndrome in our patient cohort, and that the differential genetic risk profile of AtAD potentially influences its pathological heterogeneity when compared to AmnAD.

This study benefits from a two-staged discovery-validation study design, inclusion of a broad spectrum of clinical patients representing the phenotypic heterogeneity of AD, well-characterized cognitively normal controls, and inclusion of many of the most replicated genetic loci implicated in AD as well as several more novel gene candidates. The main limitations of this study include the limited sample size, lack of pathological confirmation in all study participants, and the relatively young age of the controls. In addition, Caucasian individuals were the sole participants in our study, which potentially limits the scope of our findings. Co-morbid depression was not assessed in this analysis and may be a contributing factor to the associations with the depression-associated variants. This hypothesis requires direct testing in a separate study.

We implemented a decision tree analysis to identify genetic and demographic criteria most useful for accurately differentiating AD cases from controls. With an iterative, non-parametric approach, we used recursive partitioning to identify individuals according to a binary outcome of interest [22]. This method benefits from limiting the use of restrictive assumptions like linearity, additivity, and homoscedasticity, which are required by most linear models [23]. This approach has been used in a variety of clinical settings to identify variables of interest in predicting binary outcomes such as identification of AD patients who will have rapid cognitive decline [24], presence of tuberculosis after multiple conflicting tests [25], and ability to succeed in diabetes self-management programs [26]. Decision trees are amenable for use in a clinical setting, where an individual’s risk for the outcome of interest—in this case, AD—can be estimated based on multiple predictive variables that follow a logical progression. Testing whether the factors identified in our decision tree analyses have predictive value in a larger, independent cohort will be critical for elucidating whether this risk assessment has clinical utility, particularly with the inclusion of pathologically confirmed cases and exclusion of amyloid-positive ‘controls.’


We found that APOE genotype is a better predictor of risk than a polygenic risk score when assessing groups of clinically heterogeneous AD patients versus healthy older controls. In decision tree analysis, we found that AmnAD and AtAD have differential genetic risk factors, which may account for the inaccuracy of the traditional polygenic scoring method. Identifying individuals at highest genetic risk for AD could potentially allow for earlier diagnosis, providing the opportunity to intervene in pathological processes and/or provide support prior to clinical onset of symptoms. These risk assessments will benefit from future work to characterize genetic risk factors of clinically homogeneous subtypes of AD in large, diverse populations.


  1. Banner Alzheimer’s Institute Announces Partnership with Novartis in New Study of Alzheimer’s Prevention Treatments. Phoenix: Banner Alzheimer’s Institute; 2014. p. 1–3.

  2. Farrer LA, Cupples LA, Haines JL, Hyman B, Kukull WA, Mayeux R, et al. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease. A meta-analysis. APOE and Alzheimer Disease Meta Analysis Consortium. JAMA. 1997;278:1349–56.

  3. Devanand DP, Pelton GH, Zamora D, Liu X, Tabert MH, Goodkind M, et al. Predictive utility of apolipoprotein E genotype for Alzheimer disease in outpatients with mild cognitive impairment. Arch Neurol. 2005;62:975–80.

  4. Petersen RC. Apolipoprotein E status as a predictor of the development of Alzheimer’s disease in memory-impaired individuals. JAMA. 1995;273:1274–8.

  5. Gatz M, Pedersen N. Heritability for Alzheimer’s disease: the study of dementia in Swedish twins. J Gerontol Med Sci. 1997;52:117–25.

  6. Campion D, Dumanchin C, Hannequin D, Dubois B, Belliard S, Puel M, et al. Early-onset autosomal dominant Alzheimer disease: prevalence, genetic heterogeneity, and mutation spectrum. Am J Hum Genet. 1999;65:664–70.

  7. Avramopoulos D. Genetics of Alzheimer’s disease: recent advances. Genome Med. 2009;1:34.

  8. European Alzheimer’s Disease I, Genetic, Environmental Risk in Alzheimer’s D, Alzheimer’s Disease Genetic C, Cohorts for H, Aging Research in Genomic E, et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat Genet. 2013;45:1452–8.

  9. Reitz C, Jun G, Naj A, Rajbhandary R, Vardarajan BN, Wang L, et al. Variants in the ATP-binding cassette transporter (ABCA7), apolipoprotein E ϵ4, and the risk of late-onset Alzheimer disease in African Americans. JAMA. 2013;309:1483–92.

  10. Tang MX, Maestre G, Tsai WY, Liu XH, Feng L, Chung WY, et al. Relative risk of Alzheimer disease and age-at-onset distributions, based on APOE genotypes among elderly African Americans, Caucasians, and Hispanics in New York City. Am J Hum Genet. 1996;58:574–84.

  11. Guerreiro R, Wojtas A, Bras J, Carrasquillo M, Rogaeva E, Majounie E, et al. TREM2 variants in Alzheimer’s disease. N Engl J Med. 2013;368:117–27.

  12. Cruchaga C, Karch CM, Jin SC, Benitez BA, Cai Y, Guerreiro R, et al. Rare coding variants in the phospholipase D3 gene confer risk for Alzheimer’s disease. Nature. 2014;505:550–4.

  13. Coppola G, Chinnathambi S, Lee JJ, Dombroski BA, Baker MC, Soto-Ortolaza AI, et al. Evidence for a role of the rare p.A152T variant in MAPT in increasing the risk for FTD-spectrum and Alzheimer’s diseases. Hum Mol Genet. 2012;21:3500–12.

  14. Wang L-S, Naj AC, Graham RR, Crane PK, Kunkle BW, Cruchaga C, et al. Rarity of the Alzheimer Disease-Protective APP A673T Variant in the United States. JAMA Neurol. 2014;19104:209–16.

  15. Jonsson T, Atwal JK, Steinberg S, Snaedal J, Jonsson PV, Bjornsson S, et al. A mutation in APP protects against Alzheimer’s disease and age-related cognitive decline. Nature. 2012;488:96–9.

  16. Dubois B, Feldman H, Jacova C. Advancing research diagnostic criteria for Alzheimer’s disease: the IWG-2 criteria. Lancet Neurol. 2014;13:614–29.

  17. Crutch S, Schott J, Rabinovici G. Shining a light on posterior cortical atrophy. Alzheimers Dement. 2013;9:463–5.

  18. Gorno-Tempini ML, Hillis AE, Weintraub S, Kertesz A, Mendez M, Cappa SF, et al. Classification of primary progressive aphasia and its variants. Neurology. 2011;76:1006–14.

  19. Sabuncu MR, Buckner RL, Smoller JW, Lee PH, Fischl B, Sperling RA. The association between a polygenic Alzheimer score and cortical thickness in clinically normal subjects. Cereb Cortex. 2012;22:2653–61.

  20. Marden JR, Walter S, Tchetgen Tchetgen EJ, Kawachi I, Glymour MM. Validation of a polygenic risk score for dementia in black and white individuals. Brain Behav. 2014;4:687–97.

  21. Naj AC, Jun G, Reitz C, Kunkle BW, Perry W, Park YS, et al. Effects of Multiple Genetic Loci on Age at Onset in Late-Onset Alzheimer Disease: A Genome-Wide Association Study. JAMA Neurol. 2014;71:1394–404.

  22. Kraemer HC. Evaluating Medical Tests: Objective and Quantitative Guidelines. Thousand Oaks: SAGE Publications, Inc; 1992.

  23. Ong J, Kuo T, Manber R. Who is at risk for dropout from group cognitive-behavior therapy for insomnia? J Psychosom Res. 2008;64:419–25.

  24. O’Hara R, Thompson JM, Kraemer HC, Fenn C, Taylor JL, Ross L, et al. Which Alzheimer Patients Are at Risk for Rapid Cognitive Decline? J Geriatr Psychiatry Neurol. 2002;15:233–8.

  25. Thanassi W, Noda A, Hernandez B, Newell J, Terpeluk P, Marder D, et al. Delineating a retesting zone using receiver operating characteristic analysis on serial quantiFERON tuberculosis test results in US healthcare workers. Pulm Med. 2012;2012:1–7.

  26. Glasgow RE, Strycker LA, King DK, Toobert DJ. Understanding who benefits at each step in an internet-based diabetes self-management program: application of a recursive partitioning approach. Med Decis Making. 2014;34:180–91.

  27. Miller ZA, Mandelli ML, Rankin KP, Henry ML, Babiak MC, Frazier DT, et al. Handedness and language learning disability differentially distribute in progressive aphasia variants. Brain. 2013;136:3461–73.

  28. Román G, Tatemichi T, Erkinjuntti T. Vascular dementia: diagnostic criteria for research studies. Report of the NINDS-AIREN International Workshop. Neurology. 1993;43:250–60.

  29. McKeith I, Dickson D, Lowe J, Emre M. Diagnosis and management of dementia with Lewy bodies: third report of the DLB consortium. Neurology. 2005;65:1863–72.

  30. Folstein MF, Folstein SE, McHugh PR. “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12:189–98.

  31. Morris JC. The Clinical Dementia Rating (CDR): current version and scoring rules. Neurology. 1993;43:2412–4.

  32. Gennatas ED, Cholfin JA, Zhou J, Crawford RK, Sasaki DA, Karydas A, et al. COMT Val158Met genotype influences neurodegeneration within dopamine-innervated brain structures. Neurology. 2012;78:1663–9.

  33. Zuccato C, Cattaneo E. Brain-derived neurotrophic factor in neurodegenerative diseases. Nat Rev Neurol. 2009;5:311–22.

  34. Arlt S, Demiralay C, Tharun B, Geisel O, Storm N, Eichenlaub M, et al. Genetic risk factors for depression in Alzheimer’s disease patients. Curr Alzheimer Res. 2013;10:72–81.

  35. Dubal DB, Yokoyama JS, Zhu L, Broestl L, Worden K, Wang D, et al. Life Extension Factor Klotho Enhances Cognition. Cell Rep. 2014;7:1065–76.

  36. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

  37. Lin M, Zhao L, Fan J, Lian X-G, Ye J-X, Wu L, et al. Association between HFE polymorphisms and susceptibility to Alzheimer’s disease: a meta-analysis of 22 studies including 4,365 cases and 8,652 controls. Mol Biol Rep. 2012;39:3089–95.

  38. Moalem S, Percy M, Andrews D, Kruck T, Wong S, Dalton A, et al. Are Hereditary Hemochromatosis Mutations Involved in Alzheimer Disease? Am J Med Genet. 2000;93:58–66.

  39. Sampietro M, Caputo L, Casatta A, Meregalli M, Pellagatti A, Tagliabue J, et al. The hemochromatosis gene affects the age of onset of sporadic Alzheimer’s disease. Neurobiol Aging. 2001;22:563–8.

  40. Harold D, Abraham R, Hollingworth P, Sims R, Gerrish A, Hamshere ML, et al. Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer’s disease. Nat Genet. 2009;41:1088–93.

  41. Jun G, Naj AC, Beecham GW, Wang L-S, Buros J, Gallins PJ, et al. Meta-analysis confirms CR1, CLU, and PICALM as alzheimer disease risk loci and reveals interactions with APOE genotypes. Arch Neurol. 2010;67:1473–84.

  42. Naj AC, Jun G, Beecham GW, Wang L-S, Vardarajan BN, Buros J, et al. Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer’s disease. Nat Genet. 2011;43:436–41.

  43. Lazary J, Lazary A, Gonda X, Benko A, Molnar E, Juhasz G, et al. New evidence for the association of the serotonin transporter gene (SLC6A4) haplotypes, threatening life events, and depressive phenotype. Biol Psychiatry. 2008;64:498–504.

  44. Su S, Zhao J, Bremner J. Serotonin transporter gene, depressive symptoms, and interleukin-6. Circ Cardiovasc Genet. 2009;2:614–20.

  45. Gizatullin R, Zaboli G, Jönsson EG, Asberg M, Leopardi R. Haplotype analysis reveals tryptophan hydroxylase (TPH) 1 gene variants associated with major depression. Biol Psychiatry. 2006;59:295–300.

  46. Velayos-Baeza A, Toma C, da Roza S, Paracchini S, Monaco AP. Alternative splicing in the dyslexia-associated gene KIAA0319. Mamm Genome. 2007;18:627–34.

  47. Meaburn EL, Harlaar N, Craig IW, Schalkwyk LC, Plomin R. Quantitative trait locus association scan of early reading disability and ability using pooled DNA and 100 K SNP microarrays in a sample of 5760 children. Mol Psychiatry. 2008;13:729–40.

  48. Lehmann DJ, Cortina-Borja M, Warden DR, Smith AD, Sleegers K, Prince JA, et al. Large meta-analysis establishes the ACE insertion-deletion polymorphism as a marker of Alzheimer’s disease. Am J Epidemiol. 2005;162:305–17.

  49. Kehoe PG, Katzov H, Feuk L, Bennet AM, Johansson B, Wilman B, et al. Haplotypes extending across ACE are associated with Alzheimer’s disease. Hum Mol Genet. 2003;12:859–67.

  50. Newbury DF, Winchester L, Addis L, Paracchini S, Buckingham LL, Clark A, et al. CMIP and ATP2C2 Modulate Phonological Short-Term Memory in Language Impairment. Am J Hum Genet. 2009;85:264–72.

  51. Lesch KP, Timmesfeld N, Renner TJ, Halperin R, Röser C, Nguyen TT, et al. Molecular genetics of adult ADHD: Converging evidence from genome-wide association and extended pedigree linkage studies. J Neural Transm. 2008;115:1573–85.

  52. Scerri TS, Morris AP, Buckingham L-L, Newbury DF, Miller LL, Monaco AP, et al. DCDC2, KIAA0319 and CMIP are associated with reading-related traits. Biol Psychiatry. 2011;70:237–45.

  53. Sheng J, Su L, Xu Z, Chen G. Progranulin polymorphism rs5848 is associated with increased risk of Alzheimer’s disease. Gene. 2014;542:141–5.

  54. Pao WC, Dickson DW, Crook JE, Finch NA, Rademakers R, Graff-Radford NR. Hippocampal sclerosis in the elderly: genetic and pathologic findings, some mimicking Alzheimer disease clinically. Alzheimer Dis Assoc Disord. 2011;25:364–8.

  55. Dickson DW, Baker M, Rademakers R. Common variant in GRN is a genetic risk factor for hippocampal sclerosis in the elderly. Neurodegener Dis. 2010;7:170–4.

  56. Rademakers R, Eriksen JL, Baker M, Robinson T, Ahmed Z, Lincoln SJ, et al. Common variation in the miR-659 binding-site of GRN is a major risk factor for TDP43-positive frontotemporal dementia. Hum Mol Genet. 2008;17:3631–42.

  57. Galimberti D, Prunas C, Paoli RA, Dell’Osso B, Fenoglio C, Villa C, et al. Progranulin gene variability influences the risk for bipolar I disorder, but not bipolar II disorder. Bipolar Disord. 2014;16:769–72.

  58. Schaffer BAJ, Bertram L, Miller BL, Mullin K, Weintraub S, Johnson N, et al. Association of GSK3B with Alzheimer disease and frontotemporal dementia. Arch Neurol. 2008;65:1368–74.

  59. Mesulam M-M. Primary progressive aphasia–a language-based dementia. N Engl J Med. 2003;349:1535–42.

  60. Percy M, Somerville MJ, Hicks M, Garcia A, Colelli T, Wright E, et al. Risk factors for development of dementia in a unique six-year cohort study. I. An exploratory, pilot study of involvement of the E4 allele of apolipoprotein E, mutations of the hemochromatosis-HFE gene, type 2 diabetes, and stroke. J Alzheimers Dis. 2014;38:907–22.

  61. Po K, Leslie FVC, Gracia N, Bartley L, Kwok JBJ, Halliday GM, et al. Heritability in frontotemporal dementia: more missing pieces? J Neurol. 2014;261:2170–7.


J.S.Y. was funded by the Larry L. Hillblom Foundation (2012-A-015-FEL) and a diversity supplement from the NIA-NIH (P50-AG023501-08S1, PI: Miller, BL). Additional support was provided by NIH grants P50-AG023501 (B.L.M.) and RC1 AG035610 and R01 AG26938 (G.C.), the Larry L. Hillblom Foundation (B.L.M.), and the John Douglas French Alzheimer’s Foundation (G.C.). We acknowledge the support of the NINDS Informatics Center for Neurogenetics and Neurogenomics (P30 NS062691). Samples from the National Cell Repository for Alzheimer’s Disease (NCRAD), which receives government support under a cooperative agreement grant (U24 AG21886) awarded by the National Institute on Aging (NIA), were used in this study. We thank Dr. Jerome Yesavage and Art Noda for technical advice on the decision tree analysis. We thank contributors who collected samples used in this study, as well as patients and their families, whose help and participation made this work possible.

Author information


Corresponding author

Correspondence to Jennifer S Yokoyama.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JSY participated in the design and coordination of the study, conducted statistical analyses and drafted the manuscript. LWB performed the decision tree analysis and drafted the manuscript. RLS performed the risk scoring and carried out genotyping. EK participated in sample processing and carried out genotyping. AK participated in data analysis and sample processing. JHK participated in sample coordination and interpretation of results. BLM participated in sample coordination and interpretation of results. GC conceived of the study, participated in its coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

Luke W Bonham and Renee L Sears contributed equally to this work.

Additional files

Additional file 1:

Variants and Genotyping Methods.

Additional file 2:

Results for Three Different Versions of the Validation Cohort.

Additional file 3:

Decision Tree Analysis Results.

Additional file 4:

Pathological Diagnoses for Selected Participants.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Yokoyama, J.S., Bonham, L.W., Sears, R.L. et al. Decision tree analysis of genetic risk for clinically heterogeneous Alzheimer’s disease. BMC Neurol 15, 47 (2015).
