Huntington's Disease and its therapeutic target genes: a global functional profile based on the HD Research Crossroads database

Background Huntington’s disease (HD) is a fatal progressive neurodegenerative disorder caused by the expansion of the polyglutamine repeat region in the huntingtin gene. Although the disease is triggered by the mutation of a single gene, intensive research has linked numerous other genes to its pathogenesis. To obtain a systematic overview of these genes, which may serve as therapeutic targets, the CHDI Foundation recently established the HD Research Crossroads database. With more than 800 genes currently cataloged, this web-based resource constitutes the most extensive curation of genes relevant to HD. It provides an unprecedented opportunity to survey the molecular mechanisms involved in HD in a holistic manner. Methods To gain a synoptic view of therapeutic targets for HD, we carried out a variety of bioinformatic and statistical analyses to scrutinize the functional associations of genes curated in the HD Research Crossroads database. In particular, enrichment analyses were performed with respect to Gene Ontology (GO) categories, KEGG signaling pathways, and Pfam protein families. For selected processes, we also analyzed differential expression using published microarray data. Additionally, we generated a candidate set of novel genetic modifiers of HD by combining information from the HD Research Crossroads database with previous genome-wide linkage studies. Results Our analyses led to a comprehensive identification of molecular mechanisms associated with HD. Remarkably, we not only recovered processes and pathways that have frequently been linked to HD (such as cytotoxicity, apoptosis, and calcium signaling), but also found strong indications for other potentially disease-relevant mechanisms that have been less intensively studied in the context of HD (such as the cell cycle and RNA splicing, as well as Wnt and ErbB signaling). 
For follow-up studies, we provide a regularly updated compendium of molecular mechanisms associated with HD at http://hdtt.sysbiolab.eu. Additionally, we derived a candidate set of 24 novel genetic modifiers, including histone deacetylase 3 (HDAC3), metabotropic glutamate receptor 1 (GRM1), CDK5 regulatory subunit 2 (CDK5R2), and peroxisome proliferator-activated receptor gamma coactivator 1β (PPARGC1B). Conclusions The results of our study give an intriguing picture of the molecular complexity of HD. Our analyses can be seen as a first step towards a comprehensive list of biological processes, molecular functions, and pathways involved in HD, and may provide a basis for the development of more holistic disease models and new therapeutics.


Enrichment analyses of the unfiltered set of genes curated in HD Crossroads
For the analysis presented in the main text, a filtered set of genes from HD Crossroads was used. Filtering was based on the Target Validation Score (TVS): only genes with a TVS of 3.0 or greater were included, i.e., genes for which a causal relationship has been deduced from experiments in an HD model, or which were implicated by an association or linkage study. Lower TVS values indicate that a gene has an altered pathway or functional activity in HD (TVS = 2.5); displays altered expression or cellular distribution in HD, or encodes a protein that binds to mutant Htt (TVS = 2.0); is active in HD-relevant brain regions or is linked to an HD-associated biological mechanism (TVS = 1.0); or is implicated in neurodegeneration based on genome-wide screens (TVS = 0). Because the inclusion of low-scoring genes may point to additional disease-relevant mechanisms, enrichment analyses for GO categories, KEGG pathways and Pfam families were also carried out on the full list of genes from HD Crossroads. This list comprises 805 genes downloaded from the HD Crossroads website (http://www.hdresearchcrossroads.org), as well as 162 genes provided directly by the HD Crossroads curators. The same thresholds for significance and minimum number of genes were applied as in the analyses presented in our paper. The results can be found in Additional files 2-6.
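The TVS-based filter described above amounts to a threshold on a per-gene score. The following is a minimal Python sketch; the `CuratedGene` record, the gene symbols and the scores are hypothetical placeholders, not the HD Crossroads export format or its data.

```python
from dataclasses import dataclass

# Hypothetical record type; the real HD Crossroads export format may differ.
@dataclass(frozen=True)
class CuratedGene:
    symbol: str
    tvs: float  # Target Validation Score

# Threshold used for the filtered gene set in the main text.
TVS_THRESHOLD = 3.0

def filter_by_tvs(genes, threshold=TVS_THRESHOLD):
    """Keep genes whose TVS is equal to or greater than the threshold."""
    return [g for g in genes if g.tvs >= threshold]

# Placeholder genes and scores, not data from HD Crossroads.
genes = [
    CuratedGene("GENE_A", 4.0),
    CuratedGene("GENE_B", 3.0),
    CuratedGene("GENE_C", 2.5),
    CuratedGene("GENE_D", 1.0),
    CuratedGene("GENE_E", 0.0),
]

filtered = filter_by_tvs(genes)
print([g.symbol for g in filtered])  # ['GENE_A', 'GENE_B']
```

The unfiltered analysis corresponds to skipping this step, i.e., using every curated gene regardless of its score.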
The lists of enriched GO categories, KEGG pathways and Pfam families were subsequently compared with the corresponding lists derived for the filtered set of genes (Figure S1). The vast majority of categories, pathways and protein families were enriched irrespective of the TVS filter. In general, more significant categories, pathways and protein families were detected for the full list of genes from HD Crossroads, whereas a smaller number were enriched only for the filtered gene list. Inspection of the GO categories, KEGG pathways and Pfam families found in only one of the two lists revealed the following: A) most of the results unique to the unfiltered list had a number of genes close to the minimum required, i.e., close to the threshold selected for the list (e.g., 25 genes for GO categories); B) results unique to the filtered list generally had an FDR close to the threshold of 0.25. Because their enrichment was less significant for the unfiltered gene list, they no longer passed this threshold.
Figure S1: Venn diagrams showing the numbers of categories, pathways and protein families that were commonly or uniquely detected as enriched for the full list of genes (blue circles) and the filtered list of genes (orange circles) from HD Crossroads. The thresholds used for the different enrichment analyses are also shown.
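Each Venn diagram in Figure S1 partitions the enriched terms into three regions. The sketch below assumes the enriched terms for each gene list are available as sets of identifiers (the `TERM_*` names are placeholders, not actual results) and computes those regions with set operations.

```python
def venn_regions(filtered_terms, unfiltered_terms):
    """Split two collections of enriched terms into the three regions of a two-set Venn diagram."""
    filtered_terms, unfiltered_terms = set(filtered_terms), set(unfiltered_terms)
    return {
        "both": filtered_terms & unfiltered_terms,
        "filtered_only": filtered_terms - unfiltered_terms,
        "unfiltered_only": unfiltered_terms - filtered_terms,
    }

# Placeholder identifiers, not actual enrichment results.
filtered = {"TERM_1", "TERM_2", "TERM_3"}
unfiltered = {"TERM_1", "TERM_2", "TERM_4", "TERM_5"}

for region, terms in venn_regions(filtered, unfiltered).items():
    print(region, len(terms), sorted(terms))
```

The counts printed for each region are the numbers that would appear in the corresponding parts of a Venn diagram.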
Below we list the most significant results that are unique to the enrichment analysis of the unfiltered gene set. Many genes in these categories appear to be related to synaptic functions but have (as yet) a low TVS, and were therefore excluded from our HD-relevant gene set.

Mapping of the unfiltered gene set to chromosomal locations with suggestive evidence for linkage with HD age of onset
In addition to the enrichment analysis, we also mapped the unfiltered set of genes to chromosomal locations. Altogether, 79 genes from the unfiltered list were located in chromosomal regions for which both genome-wide linkage studies suggest evidence of linkage with age of onset (Table S1). Figure S2 displays the distributions of filtered and unfiltered genes from HD Crossroads that are located in chromosomal regions with suggestive evidence for linkage in each of the two studies. The full list of genes located in any chromosomal region indicated by the genome-wide linkage studies can be found in Additional file 7.
Table S1: Candidate list of genetic modifiers. Genes from the unfiltered gene list that are located in chromosomal regions for which both genome-wide linkage studies suggest evidence of linkage with modified age of onset.
Figure S2: Number of genes in a chromosomal location with suggestive linkage to HD age of onset. The Venn diagrams depict the distribution of genes located in chromosomal regions with suggestive evidence for linkage in the two studies.
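Deciding whether a gene falls into a linkage region is an interval-overlap test on the same chromosome, and a candidate supported by both studies must overlap a region from each. A minimal sketch, assuming inclusive base-pair coordinates; all gene names, coordinates and regions below are hypothetical.

```python
from typing import NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int  # base-pair position, inclusive
    end: int    # base-pair position, inclusive

class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int
    end: int

def overlaps(gene, region):
    """True if the gene's span overlaps the region on the same chromosome."""
    return gene.chrom == region.chrom and gene.start <= region.end and region.start <= gene.end

def genes_in_regions(genes, regions):
    """Symbols of genes overlapping at least one of the given regions."""
    return {g.symbol for g in genes if any(overlaps(g, r) for r in regions)}

# Hypothetical coordinates; a real analysis would use a genome annotation.
genes = [
    Gene("GENE_A", "6", 1_000, 2_000),
    Gene("GENE_B", "6", 50_000, 51_000),
    Gene("GENE_C", "4", 10_000, 12_000),
]
study_1 = [Region("6", 0, 60_000)]
study_2 = [Region("6", 40_000, 100_000), Region("4", 11_000, 20_000)]

in_1 = genes_in_regions(genes, study_1)
in_2 = genes_in_regions(genes, study_2)
print(sorted(in_1 & in_2))  # ['GENE_B'] is supported by both studies
```

The union `in_1 | in_2` would correspond to genes located in any region indicated by either study.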

Assessment of potential bias of curated HD-relevant genes towards drug targets
One of the main motivations behind HD Crossroads is to provide a platform for selecting gene targets whose activity can be manipulated to treat HD. Since small molecules ("drugs") are currently the major vehicle for therapeutic treatment, the set of genes included in HD Crossroads might be biased towards known drug targets. Such a bias need not be intentional; it may simply result from the fact that proteins targeted by existing drugs can more readily be examined for their relevance to HD. To distinguish between the evidence for relevance and the possibility of targeting a protein with small molecules, HD Crossroads assigns a Drugability Score (DS) in addition to a Target Validation Score (TVS) to all curated genes.
To examine whether such a bias exists in the database, we searched the list of HD-relevant genes for known drug targets. A set of 1907 known drug targets (indexed by Entrez Gene ID) was retrieved from DrugBank (www.drugbank.ca). In total, 197 of the HD-relevant genes (28%) were also drug targets. This number is considerably higher than expected by chance (N = 53, assuming a total of 25,000 human genes).
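The expected count of 53 follows from the hypergeometric mean, n × 1907 / 25,000, where n is the number of HD-relevant genes. The text does not state n here; the sketch assumes n = 695, back-calculated from the reported expectation of 53 and consistent with 197/n ≈ 28%. It also computes an upper-tail hypergeometric p-value, a test not reported in the text and included only for illustration.

```python
from math import comb

def expected_overlap(n_query, n_targets, n_genome):
    """Mean overlap between a random set of n_query genes and n_targets drug targets."""
    return n_query * n_targets / n_genome

def hypergeom_upper_tail(k, n_query, n_targets, n_genome):
    """P(X >= k) for X ~ Hypergeometric(population n_genome, n_targets successes, n_query draws)."""
    upper = min(n_query, n_targets)
    hits = sum(
        comb(n_targets, i) * comb(n_genome - n_targets, n_query - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic; the final division converts to a float.
    return hits / comb(n_genome, n_query)

N_GENOME = 25_000  # total number of human genes assumed in the text
N_TARGETS = 1_907  # DrugBank drug targets
N_QUERY = 695      # assumption: back-calculated from the reported expectation of 53
OBSERVED = 197     # HD-relevant genes that are also drug targets

print(round(expected_overlap(N_QUERY, N_TARGETS, N_GENOME)))  # 53
print(f"{OBSERVED / N_QUERY:.0%}")                            # 28%
print(f"{hypergeom_upper_tail(OBSERVED, N_QUERY, N_TARGETS, N_GENOME):.1e}")
```

Exact integer binomial coefficients are used instead of floating-point factorials so that the tail sum does not overflow.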
We also analyzed the functional composition of these known drug targets and found that 168 GO categories were significantly enriched in drug targets (FDR < 0.25; N ≥ 25). We then compared this set of categories with the set of categories enriched in HD-relevant genes. Of the 54 categories enriched in HD-relevant genes, 34 were also enriched in drug targets, for example nucleotide binding (FDR = 2.6 × 10⁻²⁵), kinase activity (FDR = 2.0 × 10⁻²⁰), catalytic activity (FDR = 6.4 × 10⁻¹⁷⁸), enzyme binding (FDR = 6.5 × 10⁻³) and transporter activity (FDR = 2.0 × 10⁻²). The significance of these categories is thus likely to be influenced by the large number of drug targets in the set of HD-relevant genes.
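The thresholds FDR < 0.25 and N ≥ 25 combine a multiple-testing correction with a minimum category size. The text does not say which FDR procedure was used; the sketch below assumes the Benjamini-Hochberg procedure and assumes that the size filter is applied before the correction. Term names and p-values are placeholders.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_terms(terms, fdr_max=0.25, min_genes=25):
    """terms maps a term name to (p_value, n_genes).

    Assumption: terms with fewer than min_genes genes are dropped before the correction.
    """
    eligible = {name: v for name, v in terms.items() if v[1] >= min_genes}
    names = list(eligible)
    q_values = benjamini_hochberg([eligible[name][0] for name in names])
    return {name: q for name, q in zip(names, q_values) if q < fdr_max}

# Placeholder terms: (raw p-value, number of query genes annotated with the term).
terms = {
    "TERM_1": (0.001, 40),
    "TERM_2": (0.04, 30),
    "TERM_3": (0.20, 60),
    "TERM_4": (0.0001, 10),  # too few genes, excluded despite the small p-value
    "TERM_5": (0.9, 100),
}
result = significant_terms(terms)
print({name: round(q, 3) for name, q in result.items()})  # {'TERM_1': 0.004, 'TERM_2': 0.08}
```

Running the same function on the drug-target list and on the HD-relevant list yields two sets of significant terms whose intersection gives the shared categories.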
Of the Pfam families enriched in HD-relevant genes, only the protein kinase family was also enriched in drug targets (FDR = 4.9 × 10⁻⁸).
Finally, we analyzed the correlation between TVS and DS by comparing the distributions of DS among genes with the same TVS (Figure S3). In general, DS tends to increase with TVS. Note, however, that the majority of genes included in HD Crossroads have a low DS.
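Grouping DS by TVS and summarizing the monotone trend with a rank correlation can be sketched as follows. Spearman's rank correlation is our choice for illustration, not a statistic reported in the text, and the (TVS, DS) pairs are placeholders rather than values from HD Crossroads.

```python
from collections import defaultdict
from statistics import median

def median_ds_by_tvs(pairs):
    """Group (TVS, DS) pairs by TVS and return the median DS of each group."""
    groups = defaultdict(list)
    for tvs, ds in pairs:
        groups[tvs].append(ds)
    return {tvs: median(values) for tvs, values in sorted(groups.items())}

def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (undefined if x or y is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Placeholder (TVS, DS) pairs; the DS scale here is illustrative only.
pairs = [(0.0, 0), (1.0, 0), (1.0, 1), (2.0, 1), (2.5, 2), (3.0, 2), (4.0, 3)]
print(median_ds_by_tvs(pairs))
print(round(spearman_rho([t for t, _ in pairs], [d for _, d in pairs]), 3))
```

Average ranks are used because both scores take only a few discrete values, so ties are common.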

Figure S4: Distribution of HD-associated genes across molecular functions. A) The pie chart shows the distribution of HD-associated genes linked to a reduced set of molecular functions, each including at least 25 genes. GO terms for molecular functions that are significantly enriched (FDR ≤ 0.01) are highlighted in bold; GO terms set in red indicate a corresponding odds ratio ≥ 2.0. Note that the molecular functions are not exclusive, i.e., a gene can be assigned to several functions. B) The bar plot displays odds ratios for enrichment of HD-associated genes in selected molecular functions.
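The odds ratios in Figure S4 compare the odds that an HD-associated gene is annotated with a given function against the corresponding odds for the remaining background genes. A minimal sketch of the 2 × 2 computation, assuming the background is the whole genome including the query genes; the counts in the usage line are hypothetical.

```python
def enrichment_odds_ratio(n_query_in_category, n_category, n_query, n_background):
    """Odds ratio for a 2x2 table of query membership vs. annotation with a category.

    Assumes n_background counts all genes (query genes included) and that no cell is zero.
    """
    a = n_query_in_category                      # query genes annotated with the category
    b = n_query - a                              # query genes not annotated
    c = n_category - a                           # non-query genes annotated
    d = n_background - n_query - n_category + a  # non-query genes not annotated
    return (a * d) / (b * c)

# Hypothetical counts: 700 query genes, 500 genes in the category, 50 in both, 25,000 genes overall.
print(round(enrichment_odds_ratio(50, 500, 700, 25_000), 2))  # 4.08
```

An odds ratio of 1.0 means no enrichment; under the figure's convention, a value of 2.0 or more would be set in red.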