Large-scale phenome analysis defines a behavioral signature for Huntington's disease genotype in mice


Rapid technological advances for the frequent monitoring of health parameters have raised the intriguing possibility that an individual's genotype could be predicted from phenotypic data alone. Here we used a machine learning approach to analyze the phenotypic effects of polymorphic mutations in a mouse model of Huntington's disease that determine disease presentation and age of onset. The resulting model correlated variation across 3,086 behavioral traits with seven different CAG-repeat lengths in the huntingtin gene (Htt). We selected behavioral signatures for age and CAG-repeat length that most robustly distinguished between mouse lines and validated the model by correctly predicting the repeat length of a blinded mouse line. Sufficient discriminatory power to accurately predict genotype required combined analysis of >200 phenotypic features. Our results suggest that autosomal dominant disease-causing mutations could be predicted through the use of subtle behavioral signatures that emerge in large-scale, combinatorial analyses. Our work provides an open data platform that we now share with the research community to aid efforts focused on understanding the pathways that link behavioral consequences to genetic variation in Huntington's disease.


Figure 1: Discrimination between wild-type and HET mice.
Figure 2: Performance of the CAG model during training and testing as assessed by regression on predicted versus observed CAG-repeat length.
Figure 3: Prediction of the 'blinded line' by the SVR CAG model (10-month-old mice).
Figure 4: Projection of all Q lines onto the decorrelated ranked feature (DRF) plane defined by Q20 and Q175 lines at 6 months of age.
Figure 5: Top-feature score changes across different CAG-repeat lengths and ages.
Figure 6: LOOCV performance of the age model during training and testing as assessed by regression on the predicted versus observed age.



We are grateful to E. Leahy, V. Rivera, J. Rivera, K. Cheng, D. Lignore and L. Homa for their assistance. We thank S. Noble, D. Baker and J.-M. Lee for their thoughtful comments. This work was supported by CHDI Foundation, Inc., a nonprofit biomedical research organization exclusively dedicated to developing therapeutics that slow the progression of Huntington's disease. CHDI Foundation conducts research in a number of different ways; for the purposes of this manuscript, all research was conceptualized, planned and directed by all authors listed and conducted under a fee-for-service agreement at the contract research organization PsychoGenics, Inc.

Author information

V.A. developed and tested all machine learning models described in the paper. D.B., D.H., S.K., J.G., M.E.M., V.W. and L.M. designed the study. L.B.M. identified the spontaneous expansion giving origin to the Q175 line and managed further colony expansion. D.B. and L.B.M. managed all study performance and two-class data analysis. M.E.M. and V.W. created the HdhQ20, HdhQ80, HdhQ92 and HdhQ111 lines. A.K. managed tissue collection and some of the behavioral studies. J.W.-J. and M.C.R. managed animal care. M.M. monitored data collection and two-class analysis. M.M., J.T., E.S. and K.C. performed behavioral studies. I.R. developed the database used to track all information and helped manage data handling. A.S. and M.G. managed the health care and daily maintenance of the animals. I.F. developed the computer vision software. M.K. and A.G. performed and managed, respectively, RNA studies. S.R. provided general management. B.L. managed the breeding of all animals under study. J.A. and J.R. provided feedback on the application of the multi-class model to this study.

Corresponding author

Correspondence to Vadim Alexandrov.

Ethics declarations

Competing interests

V.A., D.B., L.B.M., A.K., J.W.-J., M.M., I.R., M.C.R., J.T., E.S., A.S., M.G., I.F., K.C., M.K., A.G. and S.R. are employees of PsychoGenics.

Integrated supplementary information

Supplementary Figure 1 Quantification of endogenous Htt mRNA by qPCR.

WT: wild-type mice; Q20: HET mice from HdhQ20 line; Q50: HET mice from HdhQ50 line; Q80: HET mice from HdhQ80 line; Q92: HET mice from HdhQ92 line; Q111: HET mice from HdhQ111 line; Q140: HET mice from CAG 140 KI line; Q175: HET mice from zQ175 line; Q50neo: HET mice from HdhQ50neo in line. Asterisks (*) denote significant differences in endogenous Htt mRNA levels relative to WT controls. Number symbols (#) denote significant differences in endogenous Htt mRNA levels between HET mice from the Q50 line and HET mice from all the other lines. Data are expressed as mean + S.E.M.; n = 4–8 per group.

Supplementary Figure 2 Quantification of Htt RNA by RNA-seq.

WT: wild-type mice; Q20: HET mice from HdhQ20 line; Q50: HET mice from HdhQ50 line; Q80: HET mice from HdhQ80 line; Q92: HET mice from HdhQ92 line; Q111: HET mice from HdhQ111 line; Q140: HET mice from CAG 140 KI line; Q175: HET mice from zQ175 line. Number symbols (#) denote significant differences in Htt mRNA levels compared to WT controls. Asterisks (*) denote significant differences compared to Q20. Percent signs (%) denote significant differences compared to Q50. Plus signs (+) denote significant differences compared to Q80. Carets (^) denote significant differences compared to Q92 and Q111. Ampersands (&) denote significant differences compared to Q140. Htt mRNA levels appeared inversely proportional to Q length, independent of age. Data are expressed as mean + S.E.M.; n = 7–48 WT mice per age and n = 6–8 HET mice per line per age. Note: At 6 and 10 months of age, HdhQ50 tissues were examined in a separate study from the one that assessed Q20, Q80, Q92, Q111, Q140 and Q175 HET tissues. Values from HET animals were normalized to the values of the WT animals run concurrently.

Supplementary Figure 3 Performance of the CAG model, as measured by the coefficient of determination of the regression line (R²) fitted to the predicted versus observed CAG values, as a function of the number of features included.

Reaching an R² above 0.8 requires more than 100 features for 6- and 10-month-old mice, and more than double that number for 2-month-old mice.
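The R² measure named in this caption can be illustrated with a minimal sketch. It assumes an ordinary least-squares line fitted to predicted versus observed values; the function name `regression_r2` and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
import numpy as np

def regression_r2(observed, predicted):
    """R^2 of the least-squares line fitted to predicted vs. observed values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Fit predicted = slope * observed + intercept.
    slope, intercept = np.polyfit(observed, predicted, 1)
    fitted = slope * observed + intercept
    ss_res = np.sum((predicted - fitted) ** 2)            # residual sum of squares
    ss_tot = np.sum((predicted - predicted.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Illustrative, made-up values (the CAG lengths mirror the Q-line names only).
observed_cag = [20, 50, 80, 92, 111, 140, 175]
predicted_cag = [25, 45, 90, 85, 120, 130, 170]
r2 = regression_r2(observed_cag, predicted_cag)
```

For a straight-line fit this quantity equals the squared Pearson correlation between observed and predicted values, so it measures how well the predictions track the true repeat length rather than whether they match it exactly.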

Supplementary Figure 4 Overlap of optimally predictive feature sets from various CAG and age models.

A. Overlap of the features that best model age in WT and HET mice. The substantially reduced overlap of age-specific features between WT and HET mice indicates that HD affects aging. B. Degree of overlap among the features comprising the CAG model and the age model for HET mice. Age-specific features continue to play an important role in all CAG models (over 50% of the features in each CAG model are age-specific). CAG-specific features also differ substantially at each age.

Supplementary Figure 5 Standard protocol phases in PhenoCube.

The Habituation phase (left panel) runs for the first 6 h of the experiment, during which both doors open upon entry to any of the 4 corners, allowing free access to the water bottles. The Alternation phase follows (right panel): a mouse must visit one of the 2 assigned active corners and nose poke into the correct recess in order for the door within that recess to open and allow access to that water bottle for 8 seconds. The green arrow indicates the alternation or switch of the correct corner identity to the adjacent active corner following a correct visit in which reinforcement was available.

Supplementary Figure 6 Difference in feature values and feature ranks (red curve with green squares).

Relative difference (%) between feature values in two different sets is calculated and plotted in the order corresponding to feature ranks together with their ranks varying from 0 to 100.

Supplementary Figure 7 Visualization of binary discrimination in the ranked decorrelated feature space.

The two highest-ranked decorrelated features are chosen to form the 2D coordinate plane for visualization purposes. Each dot represents a mouse. Mice from the control group are shown as blue dots and mice from the disease group are plotted in red. Another convenient (from a scale perspective) but equivalent measure derived from the cloud overlap is the discrimination probability = 1 – overlap, which measures how reliably a classifier can be trained to discriminate between groups A and B above chance. A value of 0% corresponds to 100% overlap and no ability to distinguish the two groups above chance, whereas 100% means error-free discrimination.
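The relation discrimination probability = 1 – overlap can be sketched as follows. The caption does not say how the cloud overlap is estimated, so this example assumes a simple shared-grid 2D histogram estimate (the sum over bins of the smaller of the two normalized counts); the function name, the bin count and the synthetic point clouds are all hypothetical.

```python
import numpy as np

def discrimination_probability(group_a, group_b, bins=10):
    """Return 1 - overlap for two 2D point clouds (rows = mice, cols = 2 features).

    Overlap is estimated here with a shared-grid 2D histogram; this is an
    assumed estimator, not necessarily the one used in the paper.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    pts = np.vstack([a, b])
    # Common bin edges so both clouds are counted on the same grid.
    edges = [np.linspace(pts[:, k].min(), pts[:, k].max(), bins + 1) for k in range(2)]
    hist_a, _, _ = np.histogram2d(a[:, 0], a[:, 1], bins=edges)
    hist_b, _, _ = np.histogram2d(b[:, 0], b[:, 1], bins=edges)
    p_a = hist_a / hist_a.sum()
    p_b = hist_b / hist_b.sum()
    overlap = np.minimum(p_a, p_b).sum()  # 1.0 for identical clouds, 0.0 for disjoint ones
    return 1.0 - overlap

# Synthetic clouds standing in for control (blue) and disease (red) mice.
rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=(50, 2))
disease = control + 100.0  # shifted far away, so the clouds do not overlap
```

With this estimator, identical clouds give a discrimination probability of 0 and fully separated clouds give 1, matching the two extremes described in the caption. The result depends on the bin count, which is one reason to treat this only as an illustration.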

Supplementary Figure 8 Calculation of discrimination significance.

Supplementary Figure 9 Mapping a multidimensional dependent variable into a fully equivalent one-dimensional one.

The figure shows the wavelength transformation that maps 2-dimensional CAG/Age pairs to a 1-dimensional dependent variable: CAG length and Age, each normalized to the [0..1] range, are uniquely encoded as the values of the R (red) and G (green) channels, respectively, in RGB colormap notation (with B = 0).

Supplementary Figure 10 Building a λ CAG/age lookup table.

The figure outlines the procedure for building a reverse map that connects the 1-dimensional values of the dependent variable (λ) back to the corresponding values of (normalized) CAG length and Age; this is achieved by constructing a lookup table. Each pixel (an R/G pair with B = 0) is enumerated, i.e. assigned a value from 1 to N in a continuous manner as shown in the figure. The normalized [CAG,Age] (i.e. [R,G]) matrix is traversed from the lower left to the upper right corner along each subsequent diagonal, assigning the next available integer value to every coordinate pair. At the end, the resulting 1D array is also normalized to the [0..1] range. Note that small λ values (around 0) in this enumeration scheme correspond to low CAG and low Age values, whereas high λ values (around 1) correspond to high CAG and high Age.
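The diagonal enumeration described in this caption can be sketched as below. This is a minimal illustration under stated assumptions: the grid resolution, the traversal direction within each diagonal, zero-based counting before normalization, and the names `build_lambda_table` and `lookup` are not given in the caption and are chosen here for illustration only.

```python
import numpy as np

def build_lambda_table(n_cag, n_age):
    """Enumerate an n_cag x n_age grid along successive anti-diagonals.

    Cell (i, j) stands for the i-th normalized CAG level and j-th normalized
    Age level. The returned table holds lambda values normalized to [0, 1].
    """
    n_cells = n_cag * n_age
    if n_cells < 2:
        raise ValueError("need at least two grid cells to normalize")
    table = np.zeros((n_cag, n_age))
    k = 0
    for d in range(n_cag + n_age - 1):  # d = i + j indexes the diagonal
        # Within a diagonal, walk i upward (direction is an assumption).
        for i in range(max(0, d - n_age + 1), min(d, n_cag - 1) + 1):
            table[i, d - i] = k
            k += 1
    return table / (n_cells - 1)

def lookup(table, lam):
    """Reverse map: grid indices (i, j) whose lambda is closest to lam."""
    i, j = np.unravel_index(np.argmin(np.abs(table - lam)), table.shape)
    return int(i), int(j)

table = build_lambda_table(3, 3)
cag_idx, age_idx = lookup(table, table[1, 2])  # recovers the cell (1, 2)
```

Because each cell receives a distinct integer before normalization, the mapping is one-to-one on the grid, and a predicted λ can be converted back to a CAG/Age pair by nearest-value lookup. The lowest cell maps to λ = 0 and the highest to λ = 1, in line with the note at the end of the caption.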

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9 and Supplementary Tables 1–3 (PDF 1679 kb)

Supplementary Code (ZIP 800 kb)

About this article


Cite this article

Alexandrov, V., Brunner, D., Menalled, L. et al. Large-scale phenome analysis defines a behavioral signature for Huntington's disease genotype in mice. Nat Biotechnol 34, 838–844 (2016).
