Abstract
Rapid technological advances for the frequent monitoring of health parameters have raised the intriguing possibility that an individual's genotype could be predicted from phenotypic data alone. Here we used a machine learning approach in a mouse model of Huntington's disease to analyze the phenotypic effects of polymorphic mutations that determine disease presentation and age of onset. The resulting model correlated variation across 3,086 behavioral traits with seven different CAG-repeat lengths in the huntingtin gene (Htt). We selected the behavioral signatures for age and CAG-repeat length that most robustly distinguished between mouse lines, and we validated the model by correctly predicting the repeat length of a blinded mouse line. Sufficient discriminatory power to accurately predict genotype required the combined analysis of >200 phenotypic features. Our results suggest that autosomal dominant disease-causing mutations could be predicted from subtle behavioral signatures that emerge in large-scale, combinatorial analyses. Our work provides an open data platform, which we now share with the research community, to aid efforts to understand the pathways that link behavioral consequences to genetic variation in Huntington's disease.
Acknowledgements
We are grateful to E. Leahy, V. Rivera, J. Rivera, K. Cheng, D. Lignore and L. Homa for their assistance. We thank S. Noble, D. Baker and J.-M. Lee for their thoughtful comments. This work was supported by CHDI Foundation, Inc., a nonprofit biomedical research organization exclusively dedicated to developing therapeutics that slow the progression of Huntington's disease. CHDI Foundation conducts research in a number of different ways; for the purposes of this manuscript, all research was conceptualized, planned and directed by all authors listed and conducted under a fee-for-service agreement at the contract research organization PsychoGenics, Inc.
Author information
Contributions
V.A. developed and tested all machine learning models described in the paper. D.B., D.H., S.K., J.G., M.E.M., V.W. and L.M. designed the study. L.B.M. identified the spontaneous expansion that gave rise to the Q175 line and managed further colony expansion. D.B. and L.B.M. managed all study performance and two-class data analysis. M.E.M. and V.W. created the HdhQ20, HdhQ80, HdhQ92 and HdhQ111 lines. A.K. managed tissue collection and some of the behavioral studies. J.W.-J. and M.C.R. managed animal care. M.M. monitored data collection and two-class analysis. M.M., J.T., E.S. and K.C. performed behavioral studies. I.R. developed the database used to track all information and helped manage data handling. A.S. and M.G. managed the health care and daily maintenance of the animals. I.F. developed the computer vision software. M.K. performed and A.G. managed the RNA studies. S.R. provided general management. B.L. managed the breeding of all animals under study. J.A. and J.R. provided feedback on the application of the multi-class model to this study.
Ethics declarations
Competing interests
V.A., D.B., L.B.M., A.K., J.W.-J., M.M., I.R., M.C.R., J.T., E.S., A.S., M.G., I.F., K.C., M.K., A.G. and S.R. are employees of PsychoGenics.
Integrated supplementary information
Supplementary Figure 1 Quantification of endogenous Htt mRNA by qPCR.
WT: wild-type mice; Q20: HET mice from the HdhQ20 line; Q50: HET mice from the HdhQ50 line; Q80: HET mice from the HdhQ80 line; Q92: HET mice from the HdhQ92 line; Q111: HET mice from the HdhQ111 line; Q140: HET mice from the CAG 140 KI line; Q175: HET mice from the zQ175 line; Q50neo: HET mice from the HdhQ50neo line. Asterisks (*) denote significant differences in endogenous Htt mRNA levels relative to WT controls. Number signs (#) denote significant differences in endogenous Htt mRNA levels between HET mice from the Q50 line and HET mice from all other lines. Data are expressed as mean + s.e.m.; n = 4–8 per group.
Supplementary Figure 2 Quantification of Htt RNA by RNA-seq.
WT: wild-type mice; Q20: HET mice from the HdhQ20 line; Q50: HET mice from the HdhQ50 line; Q80: HET mice from the HdhQ80 line; Q92: HET mice from the HdhQ92 line; Q111: HET mice from the HdhQ111 line; Q140: HET mice from the CAG 140 KI line; Q175: HET mice from the zQ175 line. Number signs (#) denote significant differences in Htt mRNA levels compared to WT controls. Asterisks (*) denote significant differences compared to Q20. Percent signs (%) denote significant differences compared to Q50. Plus signs (+) denote significant differences compared to Q80. Carets (^) denote significant differences compared to Q92 and Q111. Ampersands (&) denote significant differences compared to Q140. Htt mRNA levels appeared inversely related to Q length, independent of age. Data are expressed as mean + s.e.m.; n = 7–48 WT mice per age and n = 6–8 HET mice per line per age. Note: at 6 and 10 months of age, HdhQ50 tissues were examined in a study separate from the one that assessed Q20, Q80, Q92, Q111, Q140 and Q175 HET tissues. Values from HET animals were normalized to the values of the WT animals run concurrently.
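The caption states that HET values were normalized to concurrently run WT animals. A minimal sketch of that normalization follows, assuming it means dividing each HET value by the mean WT value of the same run (batch); the record layout (`batch`, `genotype`, `value` keys) and the function name are hypothetical, and the actual normalization used in the study may differ.

```python
from statistics import mean


def normalize_to_concurrent_wt(samples):
    """Express each HET value relative to the mean WT value of its own batch.

    Assumption: 'normalized to the WT animals run concurrently' is modeled as
    HET value / mean(WT values in the same batch). Each sample is a dict with
    hypothetical keys 'batch', 'genotype' ('WT' or 'HET') and 'value'.
    """
    wt_means = {}
    for batch in {s["batch"] for s in samples}:
        wt_values = [s["value"] for s in samples
                     if s["batch"] == batch and s["genotype"] == "WT"]
        if wt_values:
            wt_means[batch] = mean(wt_values)

    # A HET sample whose batch has no WT controls raises KeyError on purpose:
    # it cannot be normalized to a concurrent control.
    return [
        {**s, "relative": s["value"] / wt_means[s["batch"]]}
        for s in samples
        if s["genotype"] == "HET"
    ]
```

Keeping the WT mean per batch, rather than pooling all WT animals, mirrors the caption's point that separately run studies (such as the HdhQ50 tissues) are each compared with their own controls.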
Supplementary Figure 3 Performance of the CAG model, measured as the coefficient of determination (R²) of the regression line fitted to predicted versus observed CAG values, as a function of the number of features included.
Reaching R² > 0.8 requires more than 100 features for 6- and 10-month-old mice, and more than twice as many for 2-month-old mice.
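A minimal sketch of the performance measure described above: fit a least-squares line to predicted versus observed CAG values and report its R², then find the smallest feature count whose R² exceeds 0.8. The function names and the example R² values per feature count are hypothetical; only the measure and the 0.8 threshold come from the caption.

```python
import numpy as np


def r2_of_fit(observed, predicted):
    """R^2 of the least-squares line fitted to predicted vs. observed values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(observed, predicted, 1)  # degree-1 fit
    fitted = slope * observed + intercept
    ss_res = np.sum((predicted - fitted) ** 2)              # residual sum of squares
    ss_tot = np.sum((predicted - predicted.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot


def min_features_for(r2_by_count, threshold=0.8):
    """Smallest number of features whose model R^2 exceeds the threshold."""
    for n_features, r2 in sorted(r2_by_count.items()):
        if r2 > threshold:
            return n_features
    return None  # threshold never reached


# Hypothetical R^2 values for models built on increasing numbers of features.
curve = {50: 0.61, 120: 0.82, 240: 0.90}
print(min_features_for(curve))
```

For a straight-line fit with an intercept, this R² equals the squared Pearson correlation between predicted and observed values, so it rewards predictions that track the true CAG ordering linearly.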
Supplementary Figure 4 Overlap of optimally predictive feature sets from various CAG and age models.
A. Overlap of the features that best model age in WT and HET mice. The substantial decrease in the overlap of age-specific features between WT and HET mice suggests that HD affects aging. B. Degree of overlap among the features comprising the CAG and age models for HET mice. Age-specific features continue to play an important role in all CAG models (over 50% of the features in each CAG model are age-specific). CAG-specific features also change substantially with age.
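The overlap percentages in this figure can be computed from plain feature sets. This sketch assumes two measures: the fraction of a CAG model's features that also appear in the age model (the "over 50%" statement), and a symmetric Jaccard index for comparing the WT and HET age models. The caption does not specify which overlap measure was used, and the feature names are hypothetical.

```python
def overlap_fraction(model_features, reference_features):
    """Fraction of a model's features that also occur in a reference set."""
    model = set(model_features)
    if not model:
        return 0.0
    return len(model & set(reference_features)) / len(model)


def jaccard(features_a, features_b):
    """Symmetric overlap: |A intersect B| / |A union B|."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical feature names, for illustration only.
cag_model = {"rearing", "gait_width", "nose_poke_rate", "grooming"}
age_model = {"rearing", "gait_width", "nose_poke_rate", "body_length"}
print(overlap_fraction(cag_model, age_model))
```

The asymmetric fraction answers "how much of model X is shared with Y", which matches panel B's phrasing; the Jaccard index suits panel A, where neither WT nor HET is the natural reference.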
Supplementary Figure 5 Standard protocol phases in PhenoCube.
The Habituation phase (left panel) is used for the first 6 h of the experiment; during this phase both doors open upon entry into any of the four corners, allowing free access to the water bottles. The Habituation phase is followed by the Alternation phase (right panel), in which a mouse must visit one of the two assigned active corners and nose-poke into the correct recess for the door within that recess to open, giving access to that water bottle for 8 s. The green arrow indicates the alternation, or switch, of the correct corner to the adjacent active corner following a correct visit in which reinforcement was available.
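The protocol above behaves like a small state machine. The sketch below models it under stated assumptions. The corner IDs, the boolean "correct recess" input, and the `reinforcement_available` flag are hypothetical stand-ins; only the 6 h habituation period, the 8 s access time, and the switch after a correct, reinforced visit come from the caption. It is not the PhenoCube control software.

```python
from dataclasses import dataclass

HABITUATION_HOURS = 6.0   # habituation lasts the first 6 h (from the caption)
ACCESS_SECONDS = 8.0      # water access after a correct nose poke (from the caption)


@dataclass
class AlternationSchedule:
    """Hypothetical model of the Habituation/Alternation door logic."""
    active_corners: tuple = (0, 1)  # hypothetical IDs of the two adjacent active corners
    correct_idx: int = 0            # which active corner is currently correct

    def access_seconds(self, t_hours, corner, correct_recess,
                       reinforcement_available=True):
        """Return the water-access time granted for a corner visit at t_hours."""
        if t_hours < HABITUATION_HOURS:
            # Habituation: doors open on entry to any corner (unrestricted access).
            return float("inf")
        if corner != self.active_corners[self.correct_idx] or not correct_recess:
            return 0.0  # wrong corner or wrong recess: the door stays closed
        if not reinforcement_available:
            return 0.0  # correct visit, but no reinforcement and no switch
        # Correct, reinforced visit: grant access, then alternate to the other corner.
        self.correct_idx = 1 - self.correct_idx
        return ACCESS_SECONDS


schedule = AlternationSchedule()
print(schedule.access_seconds(7.0, 0, True))   # correct corner: door opens
print(schedule.access_seconds(7.1, 0, True))   # same corner again: now incorrect
```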
Supplementary Figure 6 Difference in feature values and feature ranks (red curve with green squares).
The relative difference (%) between feature values in two different sets is calculated and plotted in order of feature rank, together with the ranks themselves (scaled from 0 to 100).
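A minimal sketch of the computation behind this plot, under assumptions not stated in the caption: the relative difference is taken as 100 × (B − A) / |A|, with set A as the reference, and the ranks (1 = best) are supplied externally and rescaled linearly to the 0–100 range. All names and values are hypothetical.

```python
def relative_difference_by_rank(values_a, values_b, ranks):
    """Order features by rank and report (feature, scaled rank, relative difference %).

    values_a, values_b: dict feature -> value in set A and set B.
    ranks: dict feature -> rank (1 = highest ranked); assumed given externally.
    Assumption: relative difference = 100 * (B - A) / |A| (undefined if A == 0).
    """
    r_min, r_max = min(ranks.values()), max(ranks.values())
    span = (r_max - r_min) or 1  # avoid division by zero for a single rank
    rows = []
    for feature in sorted(ranks, key=ranks.get):
        a, b = values_a[feature], values_b[feature]
        rel_pct = 100.0 * (b - a) / abs(a)
        scaled_rank = 100.0 * (ranks[feature] - r_min) / span  # map ranks to 0..100
        rows.append((feature, scaled_rank, rel_pct))
    return rows


rows = relative_difference_by_rank(
    {"f1": 10.0, "f2": 4.0}, {"f1": 12.0, "f2": 2.0}, {"f1": 2, "f2": 1})
print(rows)
```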
Supplementary Figure 7 Visualization of binary discrimination in the ranked decorrelated feature space.
The two highest-ranked decorrelated features form the 2D coordinate plane used for visualization. Each dot represents a mouse; mice from the control group are shown in blue and mice from the disease group in red. An equivalent measure derived from the overlap of the two clouds, and more convenient in scale, is the discrimination probability, defined as 1 − overlap. It indicates how reliably a classifier can be trained to discriminate between groups A and B: a value of 0 corresponds to chance level (100% overlap, with no ability to distinguish the two groups), whereas a value of 1 (100%) corresponds to error-free discrimination.
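The sketch below illustrates how the discrimination probability could be computed. It swaps in two techniques that the caption does not confirm. First, it decorrelates the pooled features with PCA, an eigendecomposition of the covariance matrix, and keeps the two highest-variance components rather than the paper's ranked features. Second, it measures overlap as the histogram overlap coefficient, Σ min(p_A, p_B), over shared 2D bins. The bin count is an arbitrary choice.

```python
import numpy as np


def decorrelate(X):
    """Project centered data onto principal axes, ordered by decreasing variance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest-variance axes first
    return Xc @ eigvecs[:, order]              # columns are uncorrelated


def histogram_overlap(A, B, bins=10):
    """Overlap coefficient sum(min(pA, pB)) of two 2D point clouds on shared bins."""
    both = np.vstack([A, B])
    edges = [np.linspace(both[:, k].min(), both[:, k].max(), bins + 1)
             for k in range(2)]
    hA, _, _ = np.histogram2d(A[:, 0], A[:, 1], bins=edges)
    hB, _, _ = np.histogram2d(B[:, 0], B[:, 1], bins=edges)
    pA, pB = hA / hA.sum(), hB / hB.sum()
    return float(np.minimum(pA, pB).sum())    # 1 = identical, 0 = disjoint


def discrimination_probability(control, disease, bins=10):
    """1 - overlap in the plane of the two leading decorrelated components."""
    Z = decorrelate(np.vstack([control, disease]))[:, :2]
    n = len(control)
    return 1.0 - histogram_overlap(Z[:n], Z[n:], bins)


rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=(50, 5))
disease = rng.normal(10.0, 1.0, size=(50, 5))
print(discrimination_probability(control, disease))  # well-separated groups: near 1
```

A histogram-based overlap is easy to read but sensitive to bin count and sample size, so a cross-validated classifier error would be a reasonable alternative estimate of the same quantity.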
Supplementary Figure 9 Mapping a multidimensional dependent variable into a fully equivalent one-dimensional one.
The figure shows the wavelength transformation that maps 2-dimensional CAG/age pairs to a 1-dimensional dependent variable. CAG length and age, each normalized to the [0, 1] range, are uniquely encoded as the values of the red (R) and green (G) channels, respectively, of an RGB color with B = 0.
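A minimal sketch of the encoding step described above. It assumes min–max normalization and 8-bit (0–255) color channels; the caption specifies neither. The example uses the nominal line names Q20–Q175 as CAG values and 2, 6 and 10 months as ages, for illustration only. With 8-bit channels, the mapping stays unique only as long as distinct normalized values do not round to the same integer, which holds for these inputs.

```python
def normalize(values):
    """Min-max normalize a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def to_rgb(cag_norm, age_norm):
    """Encode normalized CAG as the R channel and normalized age as G (B = 0).

    Assumption: 8-bit channels, so each value is scaled to 0..255 and rounded.
    """
    return (round(255 * cag_norm), round(255 * age_norm), 0)


# Nominal CAG values of the lines and example ages (months), for illustration.
cag_values = [20, 50, 80, 92, 111, 140, 175]
ages = [2, 6, 10]
colors = {to_rgb(c, a) for c in normalize(cag_values) for a in normalize(ages)}
print(len(colors))  # one distinct color per CAG/age pair
```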
Supplementary Figure 10 Building a λ–CAG/age lookup table.
The figure outlines the procedure for building a reverse map that connects the 1-dimensional values of the dependent variable (λ) back to the corresponding values of (normalized) CAG length and age; this is achieved by constructing a lookup table. Each pixel (an R/G pair, with B = 0) is enumerated, that is, assigned a value from 1 to N in a continuous manner, as shown in the figure. The normalized [CAG, age] (that is, [R, G]) matrix is traversed from the lower-left to the upper-right corner along each successive diagonal, assigning the next available integer to every coordinate pair. Finally, the resulting 1D array is also normalized to the [0, 1] range. In this enumeration scheme, small λ values (around 0) correspond to low CAG and low age values, whereas high λ values (around 1) correspond to high CAG and high age values.
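The diagonal enumeration above can be sketched as follows, on an n × n grid of quantized (CAG, age) levels. The grid size and the order within each diagonal (increasing CAG index) are assumptions, because the caption does not fix them. The enumeration index is normalized to [0, 1] to give λ, and the reverse lookup returns the nearest tabulated λ.

```python
import numpy as np


def build_lookup(n_levels):
    """Enumerate grid cells along anti-diagonals from the lower-left corner.

    Cell (i, j) holds CAG level i and age level j. Within each diagonal
    (i + j = d), cells are visited in increasing CAG index (an assumption).
    Returns the ordered cells, their lambda values and a cell -> index map.
    """
    if n_levels < 2:
        raise ValueError("need at least two levels per axis")
    cells = []
    for d in range(2 * n_levels - 1):
        lo, hi = max(0, d - (n_levels - 1)), min(d, n_levels - 1)
        for i in range(lo, hi + 1):
            cells.append((i, d - i))
    lam = np.linspace(0.0, 1.0, len(cells))  # enumeration 1..N rescaled to [0, 1]
    index = {cell: k for k, cell in enumerate(cells)}
    return cells, lam, index


def to_lambda(cag_norm, age_norm, n_levels, lam, index):
    """Map a normalized (CAG, age) pair to lambda via its grid cell."""
    i = int(round(cag_norm * (n_levels - 1)))
    j = int(round(age_norm * (n_levels - 1)))
    return float(lam[index[(i, j)]])


def from_lambda(value, n_levels, cells, lam):
    """Reverse lookup: nearest tabulated lambda -> normalized (CAG, age)."""
    k = int(np.argmin(np.abs(lam - value)))
    i, j = cells[k]
    return i / (n_levels - 1), j / (n_levels - 1)


cells, lam, index = build_lookup(3)
v = to_lambda(0.5, 1.0, 3, lam, index)
print(v, from_lambda(v, 3, cells, lam))
```

Because each diagonal holds cells with the same CAG + age level sum, λ grows with the combined level. This reproduces the caption's property that λ near 0 means low CAG and low age, and λ near 1 means high CAG and high age.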
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–9 and Supplementary Tables 1–3
About this article
Cite this article
Alexandrov, V., Brunner, D., Menalled, L. et al. Large-scale phenome analysis defines a behavioral signature for Huntington's disease genotype in mice. Nat Biotechnol 34, 838–844 (2016). https://doi.org/10.1038/nbt.3587
DOI: https://doi.org/10.1038/nbt.3587
This article is cited by
- Phenomic Studies on Diseases: Potential and Challenges. Phenomics (2023)
- Spectral phenotyping of embryonic development reveals integrative thermodynamic responses. BMC Bioinformatics (2021)
- Deep representation learning of electronic health records to unlock patient stratification at scale. npj Digital Medicine (2020)
- Precision epidemiology for infectious disease control. Nature Medicine (2019)
- Progress in developing transgenic monkey model for Huntington’s disease. Journal of Neural Transmission (2018)