
In what may prove to be the largest longitudinal study of its kind, the US Department of Veterans Affairs (VA) awarded its first contract in March to Personalis of Menlo Park, California, to sequence and analyze the complete genomes of more than 1,000 veterans in the first phase of its multiyear Million Veteran Program (MVP). The sequence data will be added to an already wide range of health, lifestyle and military-exposure information, gathered from self-reports and from medical records that have been kept in electronic form for at least 15 years.

As of mid-May the program had enrolled more than 150,000 veterans, and it expects to enroll about a million veteran volunteers in total over the next five to seven years; the VA is currently expanding its biobank to accommodate the deluge of clinical samples. “Our goal is to continue to sequence as many MVP samples as our budget will allow, in addition to conducting other types of analyses,” explains VA spokesperson Kendra Schaa.

Combining whole genome sequencing with very extensive electronic medical records will be a major step forward in discovering the genetic and mechanistic bases of disease, says John West, CEO of Personalis. The genome sequencing and genome-scale genotyping, which Personalis has subcontracted to San Diego–based Illumina, will use 100-base reads at a minimum of 30× coverage, generating about a billion reads per genome sample.

Neither extensive longitudinal patient studies (such as the Framingham Heart Study) nor whole genome sequencing studies to catalog human variation (such as the 1000 Genomes Project) are new.
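As a rough check on the figure of about a billion reads per genome, average sequencing depth (coverage) is approximately the number of reads multiplied by the read length, divided by the genome size. The short Python sketch below solves that relation for the read count. The helper name `reads_needed` is hypothetical, and the genome size of about 3.1 billion bases is an assumption (roughly the size of the human reference assembly) that the article does not state; the estimate also ignores real-world losses such as duplicate or unmappable reads.

```python
# Back-of-the-envelope read count for whole genome sequencing.
# Assumption: a haploid human genome of roughly 3.1 billion bases
# (approximate size of the reference assembly); not stated in the article.
GENOME_SIZE_BASES = 3_100_000_000


def reads_needed(genome_size: int, coverage: float, read_length: int) -> float:
    """Return the number of reads needed so that, on average, each base is
    covered `coverage` times, using coverage = reads * read_length / genome_size."""
    return genome_size * coverage / read_length


# 100-base reads at 30x coverage, as described for the MVP sequencing.
n = reads_needed(GENOME_SIZE_BASES, coverage=30, read_length=100)
print(f"{n:,.0f} reads")  # 930,000,000 reads
```

Under these assumptions the result, roughly 930 million reads, is consistent with the article's figure of about a billion reads per genome sample; a minimum of 30× coverage means real runs would typically produce somewhat more.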
Yet a wave of more recent longitudinal whole genome sequencing studies, including the UK10K Project, relies heavily on a highly standardized, electronically accessible set of clinical data that includes a “relatively rich set of molecular phenotypes,” with which to correlate deeper genetic findings mapped to an updated and corrected reference sequence, explains West. Genetic variants that these studies associate with complex phenotypes, especially very rare variants, may not reveal the immediate cause of a disease, let alone lead directly to its cure. But “every variant that points to a disease also points to a gene or a biological mechanism,” says Nicole Soranzo of the Wellcome Trust Sanger Institute in Cambridge, UK, co-chair of the cohorts group of the UK10K Project.

MVP and UK10K (slated to be finished in September) are making their findings freely available to colleagues around the world. “I do expect that we will be mining this data for quite a long time to come,” predicts Soranzo.