Question of the Year

The sequencing of the equivalent of an entire human genome for $1,000 has been announced as a goal for the genetics community, and new technologies suggest that reaching this goal is a matter of when, rather than if. What then? In celebration of its upcoming 15th anniversary, Nature Genetics is asking prominent geneticists to weigh in on this question: what would you do if this sequencing capacity were available immediately? This new Nature Genetics 'Question of the Year' website, sponsored by Applied Biosystems, will reveal their answers. The website will be updated monthly, so check back regularly to get a glimpse of the future of genetics.

Sponsor: Applied Biosystems

NG: What would you do if it became possible to sequence the equivalent of a full human genome for only $1,000?


John Quackenbush

John Quackenbush (Dana-Farber Cancer Institute): the unforeseeable revolution

Countless times I tried to sit down and answer the question and could not come up with something I felt was realistic. Will technology enabling a $1,000 genome make genotyping obsolete? Will it replace expression arrays? Will it usher in a new era of environmental genomics, including medical applications such as sequencing the mouth or gut microbiome? The answer to all of these questions is, quite possibly, yes. But what could I add to the discussion that has already been presented on this website? Scientifically, I can think of dozens of applications, but they are all extensions of things we could already do, just on a grander scale. I realized the situation I was facing is similar to what someone in 1990 would have faced if asked to speculate on what one would do if oligonucleotide synthesis fell in cost to less than $0.05 per base. I doubt that anyone could have foreseen the range of applications, from RNAi to synthetic genomics, that would have been enabled. Similarly, here, the exciting things will be what the posters here cannot yet imagine. So what would I really do? The answer is simple, I think, and reflects an interest in personalized genomics and medicine. I would sequence myself, my wife, and my son. Given the myriad ways that information could be misused, I would take every measure to keep the data confidential. But I would set up a home genome browser and look for variants linked to disease, trying to figure out how to best improve the chances of an extended, healthy life, either through lifestyle modifications, prophylactic medications, or increased vigilance in screening. With a $1,000 genome, not only will scientists be interested in applications, but so will our families, friends, and neighbors. The implications of this technology go far beyond scientific applications and open a whole host of questions about how the broader public will view and use the technology, as well as how it might be misused. And as with scientific applications, the most interesting social implications will come up with uses we cannot at this time anticipate.

(posted 6 February 2008)

Walter Bodmer

Walter Bodmer (Oxford University): a broad genetic survey of Homo sapiens

My first priority would be to sequence, say, 100,000 presumably healthy individuals carefully selected to represent the major ethnic groups of the present day human population. The selection should follow largely geographical principles but emphasize, as we are doing in our "Peoples of the British Isles" project, an approach that attempts to minimize the effects of comparatively recent migration and admixture. We do this in the UK by sampling in rural areas and from people all of whose grandparents come from the same area. The results would provide a fascinating snapshot of the current genetic structure of Homo sapiens and, of course, a remarkable record of the pattern and extent of genetic variation in our species. The first challenge of analysis would be to identify all those changes that have likely functional consequences on disease susceptibility. How many significant rare variants influencing our pattern of disease incidence does each of us, on average, carry? This, I believe, is the real route to the much-discussed 'personalized medicine' of the future. Beyond that, we would gain further deep insights into the origins and inter-relationships of different human populations, and how this relates to the archeological and historical record.

(posted 1 October 2007)

Jun Yu

Jun Yu (Beijing Institute of Genomics): so many genomes, so little time

I have never doubted that this will happen, but the question remains: when? We are all racing against time, and we must plan ahead and prioritize things we actually want to do. We may eventually sequence every living organism on earth, including ourselves (though maybe not everyone), or even those found in space in the future. But will we or our children benefit from this DNA sequencing mania anytime soon? We need to have a list of every life-threatening disease and the people who are affected by them. We have to be able to interpret all the variations discovered from these patients and from a control population, in combination with clinical observations. We need to have a list of representative genomes from all living things for future sequencing efforts, and perhaps each nation needs at least one such list. In a patriotic spirit, perhaps each nation should have a plan to sequence the genomes of its own species of national interest, including major crops, vegetables, fruit trees, invasive species and endangered species. Should we form a 'Genome United' internationally to prioritize the items on each list or to coordinate some of the sequencing efforts? Should we set a standard for each finished genome? A collective bargain based on a concerted effort may just make the $1,000-per-genome dream an immediate reality!

(posted 1 October 2007)

Tayfun Ozcelik

Tayfun Ozcelik (Bilkent University, Turkey): farewell to abnormal genes?

According to the Online Mendelian Inheritance in Man (OMIM) statistics for September 2007, there are at least 1,600 mendelian and an additional 2,123 mendelian-suspected phenotypes for which the molecular basis is unknown. Taken together, these phenotypes and those that have been characterized at the molecular level put 3-8% of married couples at high recurrence risk of producing offspring affected by a mendelian disorder. Let us not forget that about 14% of the world's population and 19% of all births are in areas where consanguineous marriages occur by choice rather than by accident, and where the recurrence risk figures could be significantly higher. It is likely that the overwhelming majority of, if not all, single-gene disorders will be characterized at the molecular level in the near future. Furthermore, with the Personal Exome Project, carrier status determination and prenatal diagnosis will be feasible for $10. I can see history repeating itself. Just like the worldwide eradication of infectious diseases such as smallpox, or effective prevention of certain genetic diseases such as thalassemias in some parts of the Mediterranean basin, we will probably say goodbye to abnormal genes, at least the autosomal recessive ones. However, like a candle in the wind, their legend will probably continue long after they are burned out. And no doubt this process will be painful.
Therefore, I would set up task forces at the guiding institutions of the world, such as the WHO (World Health Organization), UNESCO (United Nations Educational, Scientific and Cultural Organization), UNICEF (United Nations Children's Fund), OECD (Organization for Economic Co-operation and Development), professional organizations such as the American Society for Human Genetics (ASHG), the European Society for Human Genetics (ESHG) and, last but not least, the Ethical, Legal and Social Issues Research Program at the National Human Genome Research Institute (ELSI), to best exploit the sequencing capacity of the $1,000 human genome. After all, it will require the joint efforts of scientists, health care professionals and policy makers to transform personal genomic medicine into a birthright and safely navigate society through the uncharted waters of twenty-first century biomedical economics.

(posted 1 October 2007)


Samir K. Brahmachari

Samir K. Brahmachari (Institute of Genomics & Integrative Biology, New Delhi, India): a boon for countries with rich genetic resources

If $1,000 genome sequencing becomes reality, what can we do in developing countries like India? Indian ethnic groups, comprising about one-sixth of humanity, with large family sizes, low mobility, high levels of endogamy and each group inhabiting a homogeneous environment, provide a unique resource for complex disease analysis. We would undertake full genome sequencing of (i) 20 samples from 25 distinctly different populations, to map genetic variation; (ii) 10 pairs of identical twins (five male and five female) from each of these 25 populations, to map post-zygotic repeat instability and copy-number variation; (iii) 1,000 naturally aborted fetuses (ethically acceptable as research subjects in India) that do not show cytogenetically detectable chromosomal abnormalities, in order to identify lethal deletions, insertions, duplications or other mutations, thereby identifying genomic regions and pathways essential for development; and (iv) 500 healthy (octogenarian) individuals taking no medication, drawn from 25 different ethnic groups, who will serve as 'super-normal' controls in case-control studies to identify disease genes. All the variation observed in this set is to be treated as neutral variation for healthy living. Meanwhile, most of a developing country's economy depends on agricultural productivity. How can we use this technological boon in genomics to increase productivity? We can carry out metagenomic sequencing (profiling) of microbial diversity from multiple fields that have variable productivity despite being planted with identical seeds. Categorizing metagenomic signatures for field selection of suitable crops to yield higher productivity, or designing approaches for the planting of multiple crops, will be a new challenge. On a lighter note, I will invest in stocks of computer hardware companies involved in the data storage business!

(posted 4 September 2007)

Michael Stratton

Michael Stratton (Wellcome Trust Sanger Institute): the cancer genome

There is a vast reservoir of human genome sequence variation that is currently hidden. In an adult human there are approximately 10^14 cells. During the series of mitoses that take place between the fertilized egg and any single cell in the adult human, DNA replication errors occur such that the genome sequence of any cell differs from that of almost all others. Remarkably, within a single human being the genome is probably somatically mutated many times over at every one of its three billion bases. The number and types of somatic mutations in a cell will depend on the length of its mitotic lineage, the mutagenic exposures it has experienced, and cell-specific differences in DNA repair. Most somatic mutations are innocuous, but there are some notable exceptions. Cancers are single-cell clones that expand uncontrollably because of the unlucky confluence of somatic mutations in multiple cancer genes. The $1,000 genome would allow the genome sequences of 10,000 or more human cancers of various classes (and normal DNAs from the same individuals) to be generated, providing us with the full compendium of somatic point mutations, copy-number changes and rearrangements. This catalogue will include all the 'driver' mutations in each cancer and the 'passengers' that bear imprints of past mutagenic processes. Ultimately, the $1,000 genome will similarly permit investigation of the numbers and signatures of somatic mutations in individual normal cells, revealing how these differ between tissue types, with age and with lifestyle, providing fundamental insights into cellular biology and aging.

(posted 4 September 2007)


James R. Lupski

James R. Lupski (Baylor College of Medicine): new mutations genome-wide

I would apply the $1,000 human genome sequence to the study of the transmission of genetic information to address the following questions. First, what is the frequency of de novo SNPs? Second, what is the frequency of de novo copy number variants (CNVs)? Third, what is the level or degree of 'noise' in genetic information transfer? Initial experiments would obtain genomic sequences, and simultaneously apply a whole-genome array that robustly measures CNV, on two dozen trios: 12 with a male child and 12 with a female child. Comparisons between the child and parents should enable assessment of new mutations genome-wide, including both SNPs and CNVs. Then I would study one dozen of the largest three-generation CEPH families (including parents and both sets of grandparents) with the same approach (i.e., genomic sequences and whole-genome arrays to examine for SNPs and CNVs), being sure NOT to use cell lines so as to avoid genetic changes introduced as artifacts of tissue culture. For each individual studied, I would compare nucleotide sequences with array information to determine exact breakpoints of all de novo CNVs so as to infer mechanism from the products of recombination. Finally, when I was convinced that single-genome amplification was a robust procedure that introduced minimal artifacts, I would sequence the haploid genome of 100 sperm from each of ten men in whom the diploid genomic sequence was determined.
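
The child-versus-parents comparison described above can be illustrated with a minimal Python sketch. It assumes genotypes have already been called and are stored as a mapping from a site label to a pair of alleles; the function name `find_de_novo` and the toy data are hypothetical. The check is deliberately crude: it flags only alleles that are absent from both parents and ignores sequencing error, coverage and CNVs, all of which a real analysis would have to handle.

```python
def find_de_novo(child, mother, father):
    """Return sites where the child carries an allele seen in neither parent."""
    candidates = {}
    for site, child_alleles in child.items():
        # Skip sites that were not genotyped in both parents
        if site not in mother or site not in father:
            continue
        parental = set(mother[site]) | set(father[site])
        novel = set(child_alleles) - parental
        if novel:
            candidates[site] = sorted(novel)
    return candidates


# Toy genotypes (hypothetical): site -> (allele 1, allele 2)
mother = {"chr1:100": ("A", "A"), "chr1:200": ("C", "T"), "chr2:50": ("G", "G")}
father = {"chr1:100": ("A", "G"), "chr1:200": ("C", "C"), "chr2:50": ("G", "G")}
child = {"chr1:100": ("A", "G"), "chr1:200": ("C", "T"), "chr2:50": ("G", "T")}

de_novo = find_de_novo(child, mother, father)
print(de_novo)
```

In this toy trio only the `T` allele at the site labeled `chr2:50` is reported, because every other allele in the child is present in at least one parent.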

(posted 1 August 2007)

Peter Little

Peter Little (University of New South Wales, Sydney): on beauty and happiness

The $1,000 genome is at a price that almost invites frivolity. Resisting the impulse for a moment, I'd put all my support behind non-frivolous applications: cohorts with complex genetic disorders, population sequencing of indigenous peoples to understand human origins, and my own research area of expression genetics through mRNA sequencing. In the area of human origins, I'd put much more focus upon studies in Asia and Australia, simply because the convenient succession of ancient and modern human morphology seen in Europe with Neanderthals and Sapiens is not as clear in Asia. DNA analysis may be the only method for resolving these difficulties, but only if we can navigate the sociopolitical minefield that two centuries of history in Australia have created. After this, I'd turn to frivolity—but where to start? For me, this is easy. I'd use my $1,000 genome capacity to study the genetics of beauty and of happiness. I'd like to use the complete knowledge of DNA variation to try to inform us, if only a little, about the heritability of this branch of morphological development and of personality and its interaction with culture. After all, spending some of our money on areas that are close to the hearts of all humans, and not just to us scientists and clinicians, is surely a reasonable investment of taxpayers' money. Frivolous? Perhaps, but not very.

(posted 1 August 2007)


Rasmus Nielsen

Rasmus Nielsen (University of Copenhagen): understanding pancreatic cancer

We are indeed approaching the time of the $1,000 genome very rapidly. The cost of materials for human resequencing using one of the new sequencing platforms may be as low as $10,000-$50,000 today. An immediate exciting population genetic application is the sequencing of individuals from a diverse panel to solve some of the many unresolved questions regarding human demographic ancestry and recent evolution. Today we are struggling with the fact that much of the available genome-wide data in humans has been obtained through a SNP discovery process that biases the inferences we make and complicates population genetic inferences. The directly sequenced data would not suffer from these problems. However, the most exciting applications are obviously in disease genetics and, especially, in personalized medicine. Personally, I would like to sequence individuals from families with familial pancreatic cancer for the purpose of uncovering the genetic factors underlying this disease. It is a deadly disease which has killed several members of my family. Although it may not change treatment options, I would very much like to know if I carry the genetic risk factors for this disease.

(posted 2 July 2007)

Michal Pravenec

Michal Pravenec (Institute of Physiology, Czech Academy of Sciences): personalized medicine for treating common diseases?

There is no doubt that the ability to sequence the human and other complex genomes (for instance, animal models of common diseases) at a price of ~$1,000 would be very important for all fields of basic biomedical research. Both genome-wide association studies in thousands of fully sequenced individuals and studies of animal models will identify new genes and pathways as potential targets for therapy and will provide new insights into the pathogenesis of common diseases. On the other hand, responsible genetic determinants identified in association studies are applicable on a population level rather than on the level of the individual patient because many mutant alleles might have discernable effects only in specific genetic backgrounds and/or under special environmental conditions. In other words, in multifactorial common diseases, there is no direct relationship between genotype at specific loci and disease phenotype, a fact that will not be changed by our ability to make genome sequencing available for individuals. In addition, the risk associated with individual genetic variants identified in genome-wide association studies is usually too modest to be considered a serious risk by individuals with sequenced genomes. Accordingly, the clinical impact of sequencing genomes with the aim to diagnose, treat or prevent common diseases in individual patients may be limited.

(posted 2 July 2007)

Elizabeth M.C. Fisher

Elizabeth M.C. Fisher (University College London): investing in existing mouse resources

I'm going to enjoy reading the papers from the population geneticists, the evolutionary biologists and others as a result of all these genome sequence comparisons, and I'm going to enjoy the unexpected insights that arise from a $1,000 genome. But for my use of this technology and these bargain prices, I'm going to take the money and invest in the existing mouse genetics resources. I want to use the dosh (dosh: British slang for money) to give me instant access to allelic arrays of point mutations in any gene or genomic region that I'm interested in. How? ENU (N-ethyl-N-nitrosourea) is an extremely powerful mutagen currently being used in many projects around the world to produce mice carrying random point mutations. All these public (and private) collections also freeze mouse sperm samples from animals that have been treated with ENU, and each sample contains tens of individual point mutations across the genome—in genes, regulatory regions, microRNAs, non-coding conserved regions—everything. So I can start to use the tens of thousands of existing frozen samples as a massive archive to put online and make publicly available so that we can simply pick out genotypes of interest. In fact, for a mere $10 million, I could sequence 10,000 of the currently available sperm samples, which might give me as many as 5 million different point mutations. And while there are certainly ENU mutation hotspots, I still think I stand a good chance of getting informative mutations in any genomic region of interest. What a bargain for anyone studying genome function, disease models, gene regulation—all the stuff usually published by the Nature journals!

(posted 2 July 2007)


Takashi Gojobori

Takashi Gojobori (National Institute of Genetics, Japan): parent-child pair genome sequencing

To understand the mutation spectrum over the entire genome is one of the essential tasks for geneticists, because newly arisen mutations (including nucleotide substitutions and other genomic structural changes) are the only source of genetic diversity. The current advances in genome sequencing technology enable us to sequence the complete genomes of a given child and its biological parent, which should reveal the mutation spectrum for nucleotide substitution, recombination, duplication, inversion, transposition and others. For humans, sequencing the genomes of a son and his biological father can easily reveal how newly arisen mutations are distributed over the Y chromosome. Moreover, many such parent-child pairs may be good targets for genome sequencing in order to learn the mutation spectrum over the autosomes and X chromosome. Likewise, sequencing the genome of sperm produced from a particular male may uncover the mutation spectrum due to spermatogenesis. (In the case of humans, of course, complete genome sequence information should be handled with maximum care for privacy and ethical considerations.) This is easily extended to other organisms such as mice and many egg-laying fish to sequence the genome of eggs formed by a given female, which may clarify the mutation spectrum due to oogenesis. Thus, I would propose the initiation of parent-child pair genome sequencing.

(posted 1 June 2007)

Richard Cotton

Richard Cotton (Genomic Disorders Research Centre): the variome

The variation that has been uncovered so far with current technology has not been systematically collected and indexed with its associated phenotypes. Thus, instant access to this variation is not possible for researchers and clinicians. If a complete human genome could be sequenced for $1,000, I believe there should be a massive effort to build structures to receive the variation so its meaning can be interpreted and used. Thus, there should be documentation of all variation and its phenotypes, effects, studies, etc. available for quick access. This is the vision of the Human Variome Project (see Editorial, Nature Genetics April 2007 and

(posted 1 June 2007)

Emmanouil T. Dermitzakis

Emmanouil T. Dermitzakis (Wellcome Trust Sanger Institute): social genome sequencing

The potential of complete sequencing of a human being's DNA for $1,000 makes your imagination go wild. It is already very surprising that after many months of this project, there are still many new ideas! I would love to sequence thousands of individuals from very diverse regions of the world, perform digital interrogation of expression or replace microarrays with sequencing. All of these are really close to my heart and probably will be among the first things I do when this is possible. But there are some other ideas that have been in my head for quite a while. Having been raised in Greece and having spent my summers in small villages on the mountains of the mainland and in Crete, I have seen interesting population dynamics and behaviors in the interaction between people in adjacent villages and communities. I have also seen very high frequencies of complex and monogenic diseases, most likely due to the nature and size of the founding populations. So if the $1,000 genome were possible, I would spend $5-10 million to sample genetic variation of complete village complexes. This will allow the identification of all variants that segregate in such communities as well as demographic and behavioral patterns and patterns of marriage choices and will help explain whether there is any genetic basis for this. It will also elucidate some of the disease alleles with strong effects that segregate in low frequencies elsewhere but in high frequencies in these communities. This will be a very exciting opportunity to pair genomic technologies with disease genetics and social behavior and tease apart some of the undiscovered genetic interactions in our social life. Of course, I realize that there are substantial ethical issues behind such a project, and the complexity of the analysis is very high. In some sense this is a dream project, but when I was taught genetics in 1992 at the University of Crete, any complete mammalian genome was a dream.

(posted 1 June 2007)

Hong-Xuan Lin

Hong-Xuan Lin (Shanghai Institute of Plant Physiology and Ecology): a new green revolution

There are 21 wild species in the Oryza genus. It is well known that there are many beneficial genes (or quantitative trait loci (QTLs)) hidden in wild rice relatives, and these genes should be exploited. If the entire human genome can be sequenced for $1,000, the dream of a new green revolution will be really and truly achieved. First, my research group would sequence the entire genomes of 2,000 recombinant inbred lines (RILs) derived from a cross between wild rice relatives and the cultivated species (O. sativa). Because the size of the rice genome is approximately seven times smaller than that of the human genome, the total cost would be approximately $280,000 (assuming one rice genome can be sequenced for $140). In addition, we would repeatedly measure the phenotypes of various traits in the RIL population in multiple environments under controlled conditions, including biotic stresses (disease and insect), abiotic stresses (salinity, drought and cold), etc. Further, we would perform high-resolution genetic linkage analysis between two huge data sets that would be obtained by phenotype and genotype analyses (derived from the entire genomic sequences of 2,000 RILs), and this could lead to the rapid discovery and identification of many genes or QTLs (such as those for yield, tolerance to abiotic stresses, disease and insect resistance and other desirable traits) from the wide range of allelic variations. This wide range is due to the fact that wild rice plants have to adapt to different environmental conditions. This would enable us to efficiently pyramid various beneficial genes (or QTLs) of wild rice relatives involved in important agronomic traits into the currently cultivated rice by using marker-assisted selection to produce a super-rice variety with many improved traits for the new green revolution.

(posted 1 June 2007)

Michael D. Rhodes

Michael D. Rhodes (Applied Biosystems): the Gaia Genome Project

When children are told they can have any candy they want, they leap from possibility to possibility. It is with the same sense of glee that I look at the applications of reduced-cost sequencing. It is obvious that medical applications and understanding the genetics of human health are going to be important, as seen from previous replies, but what I would also love to see is the Gaia Genome Project, with the long-term goal of sequencing all known species. Even with the number of the species currently completed, we know little about the genomic diversity of the entire planet. For many species there is a pressing need, as they face extinction owing to an ever-growing human population and ongoing climate change. At the very least, we should capture the genomic sequence of all of these organisms before they become extinct. The benefits of the new technology ensure that sequencing will not be the bottleneck, but rather the collecting and cataloguing of organisms. Because the cost of sequencing will be so low, individual researchers will be able to collect the DNA of their organisms of interest and submit them to central sequencing services. Genome assembly is one of the challenges of de novo sequencing, but this should be overcome as more organisms are sequenced, with closely related organisms facilitating the assembly of their brethren. Depending on the technological solution to the $1,000 genome, the amount of DNA required for a complete sequence may be the contents of a few cells. If this can be driven down to a single cell, then bacterial diversity can be truly sampled. If the 6-Gb diploid human genome costs a mere $1,000, a 4-Mb haploid genome would cost $0.67. The true diversity of the world, most of which is in single-celled organisms, will be accessible to science. Sequencing the Earth's genome is a challenging goal, but surely a worthwhile one.
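
The $0.67 figure above follows from simple proportional scaling, which the short Python sketch below reproduces. It assumes that cost scales linearly with the number of bases sequenced, which is the premise of the estimate rather than an established pricing model; the function name `scaled_cost` is hypothetical.

```python
def scaled_cost(genome_bases, reference_bases=6_000_000_000, reference_cost=1000.0):
    """Cost in dollars of sequencing genome_bases, assuming linear scaling
    from a 6-Gb diploid human genome priced at $1,000."""
    return reference_cost * genome_bases / reference_bases


# A 4-Mb haploid bacterial genome, as in the text
bacterial_cost = scaled_cost(4_000_000)
print(f"${bacterial_cost:.2f}")
```

Under the same assumption, the $1,000 that buys one human genome would cover about 1,500 bacterial genomes of this size.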

(posted 1 June 2007)


Bruce Lahn

Bruce Lahn (University of Chicago): constructing an ontogenetic tree of the human body

All cells in the human body, or the body of any multicellular organism, descend from one single cell: the fertilized egg. Thus, all the cells in an organism are related to each other based on their shared descent, just like all species are related to each other based on their shared history. Akin to a phylogenetic tree that depicts the relatedness of all the living species, an ontogenetic tree depicts the relatedness of all the living cells in an organism. Two cells are closely related on the tree if they descend from a common progenitor cell just a few cell divisions ago. Conversely, two cells are distantly related if they descend from a common progenitor cell many cell divisions ago. Mutations in DNA sequence that occur during cell divisions can in theory be used to construct an ontogenetic tree connecting all the living cells present in an organism, in much the same way that mutations occurring during evolution can be used to construct a phylogenetic tree connecting all the living species. In humans, about 80 new germline point mutations are introduced during each reproductive cycle. Extrapolating from this, two randomly selected cells in an individual can be separated by anywhere between a few to a few dozen somatic point mutations since their descent from a common progenitor cell, and the more related the two cells are in their ontogeny, the fewer the somatic mutations separating them. By sequencing the genomes of a large number of cells from a single individual (perhaps tens of thousands of cells per individual if the $1,000 genome becomes a reality), it should be possible to construct an ontogenetic tree of all the cells based on somatic point mutations. Other types of mutations, such as small insertions/deletions or expansion/contraction of microsatellite repeats, could serve the same purpose. Mutations in microsatellite repeats may be particularly useful, given their frequent occurrence.
Such a tree, coupled with information on the types of cells being sequenced and their locations in the body, would be invaluable to many disciplines of biology. Similar approaches could be used to examine the cellular origin of many diseases, especially cancer.
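
The idea that fewer separating mutations imply closer relatedness can be made concrete with a toy Python sketch. This is not a method proposed in the text: it swaps in a simple greedy average-linkage clustering on mutation counts as a stand-in for proper phylogenetic inference, and the cell names, mutation labels and function names are all hypothetical.

```python
from itertools import combinations


def mutation_distance(a, b):
    """Number of somatic mutations present in exactly one of the two cells."""
    return len(a ^ b)


def build_tree(cells):
    """Greedy average-linkage clustering; returns a nested tuple of cell names."""
    # Each key is a subtree (leaf name or nested tuple); each value lists its leaves
    clusters = {name: [name] for name in cells}

    def avg_dist(pair):
        left, right = pair
        dists = [mutation_distance(cells[x], cells[y])
                 for x in clusters[left] for y in clusters[right]]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Merge the two subtrees whose member cells differ by the fewest mutations
        left, right = min(combinations(clusters, 2), key=avg_dist)
        merged = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = merged
    return next(iter(clusters))


# Toy data (hypothetical): cell name -> set of somatic mutations it carries
cells = {
    "neuron_1": {"m1", "m2", "m3"},
    "neuron_2": {"m1", "m2", "m4"},
    "hepatocyte_1": {"m5", "m6", "m8"},
    "hepatocyte_2": {"m5", "m7"},
}

tree = build_tree(cells)
print(tree)
```

With these toy cells the two neurons are joined first, then the two hepatocytes, and the two pairs are joined last. Real data would need to account for sequencing error and recurrent mutation, and would more plausibly use parsimony or likelihood methods.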

(posted 1 May 2007)

Leena Peltonen-Palotie

Leena Peltonen-Palotie (University of Helsinki): how do genes and life events communicate to influence disease risk?

Spending $1,000 for full characterization of structural variants in the human genome: what a bargain! I would spend $250 million to sequence DNA collected in the best available population cohorts, with deep longitudinal data sets reaching back up to 30-50 years. This means cohorts of tens of thousands of individuals that have provided multiple blood samples, facilitating not only sequencing but also transcriptomic, metabolomic and proteomic analyses, with relevant clinical follow-up data points at several occasions during their life. Such data sets are typically produced in countries with national health care systems and related records. Taxpayers of these nations have already contributed to the most expensive part of the study: the health care system guaranteeing reliable data collection from birth to the grave. We could afford to sequence perhaps the 25 best global cohorts from various populations with 10,000 individuals each. (Actually, I don't think there would be more cohorts available today fulfilling all strict criteria for data depth and quality.) Parallel to that, I would spend $100 million to sequence a study sample of 100,000 monozygotic twins (only one of the pair, since their genomes are identical, and you get two phenotypes for the price of one genome sequence), including all those who are discordant for important diseases like schizophrenia, autism or Alzheimer disease. Again, I would select those who have been followed longest and who have been studied in the greatest detail for biological parameters and diseases and their various trait components.
These two global experiments would help us to establish a basic understanding of two fascinating and complex biological and health-related problems: (i) how do our life events, including those of early childhood, modify the impact of our genome on various diseases developing throughout life, and (ii) how do factors besides genome structure (e.g., methylation and epigenetics) affect our biological features, including diseases?

(posted 1 May 2007)

Paul Nurse

Paul Nurse (Rockefeller University): the tree of life and the human microbiome

There are two projects that I would be interested in if DNA equivalent to a human genome could be sequenced for $1,000. The first would be to vastly extend the use of DNA sequencing for taxonomic and evolutionary purposes. Sequencing a selected set of genes for as many species as possible, building on present projects such as the DNA barcode studies for marine life, would assist both the identification and classification of species and would also provide the data required to build better taxonomies. More complete genome sequencing would be required of two further types of living organisms: those occupying taxonomic places that illuminate key phylogenetic transitions in the tree of life, and those that have undergone rapid speciation, such as cichlid fishes in African lakes. Knowledge of these genomes will be revealing about evolutionary mechanisms, both the macro changes that give rise to major phylogenetic types and the micro changes that lead to speciation. And who knows, perhaps the massive evidence accumulated by these studies, which will repeatedly confirm that organisms are related through descent, might bury the creationists and the intelligent designers under a mountain of base pairs. The second project would be to randomly sequence DNA extracted from the microbial life living both within a human being, such as in the gut, and on the surface of a human being, such as on the skin, in the nose and in the anus. This would allow the identification of microorganisms associated with the human body, including the vast numbers that cannot be cultivated. If such studies were performed on a population-wide basis, the presence and numbers of particular microbes could be related to the propensity of an individual to develop a specific disease or human condition such as obesity. The correlation between the presence of Helicobacter and the generation of stomach ulcers illustrates how microbes can influence human health in unexpected ways, and more systematic studies of the variation in microbial populations within or on individuals would complement traditional studies of the effects of human genome variation on human health.

(posted 1 May 2007)

Manel Esteller

Manel Esteller (Spanish National Cancer Centre): the DNA methylome in health and disease

There are many concepts in biology and medicine that we accept as self-evident truths without further consideration. In many cases, however, pure genetic information does not provide a complete answer, and the environmental effects are difficult to measure—the old story of nature versus nurture. Epigenetics (and in particular, DNA methylation, the most stable epigenetic mark) has an important role at that interface. Thus, a complete analysis of the DNA methylation status of every CpG dinucleotide in an organism—the DNA methylome—would be extremely useful for understanding cellular physiology and disease. If we were to imagine overcoming technical and funding limitations so that we could 'read' the entire DNA methylome for $1,000, let's be ambitious. We could examine the DNA methylome in various contrasting conditions: embryonic and adult stem cells versus transformed and cancer stem cells; neurons versus muscle cells (two extremely differentiated tissues); newborns versus centenarians; genetically identical twins with different penetrance for a particular disease; various family members in genealogical studies; a brain affected by Alzheimer's disease versus a healthy one; an atherosclerotic blood vessel versus a fit one; or two apparently microscopically identical tumors with very different clinical outcomes. We could even superimpose the DNA methylome on the pure, complete human DNA genome data. I look forward to the time when we will have the capacity and means to address these issues. Expectations are high.

(posted 1 May 2007)

Julian Parkhill

Julian Parkhill (Wellcome Trust Sanger Institute): the immensity of bacterial diversity

Bacterial genomes are, very roughly, 1,000 times smaller than the human genome, so the question can be rephrased, from a microbiological viewpoint, as "What will you do with a $1 bacterial genome?" There are several caveats to this question; not least is the fact that when most people talk about "sequencing genomes" for $1,000, they are really talking about resequencing genomes, based on extrapolation from current technological advancements. This is acceptable when addressing (human) genomes that may vary by one SNP per kilobase or so but is more problematic when addressing genomes that are far more diverse, and when much of the variation is based on the presence and absence of whole genes and gene systems. Assuming, then, that we will have technology that will deliver real bacterial genomes for $1, what could we do with this? Bacterial diversity is immense; individual species vary by a greater degree than whole orders of metazoans, and there are probably many more bacterial than eukaryotic species. Sequencing at this level would allow us to really quantify and understand this diversity. This understanding would itself open up new scientific vistas on the most numerous organisms on the planet and the effects that they have on the primary organic and inorganic cycles in the environment. In addition, this sequencing power would allow us to fully investigate phenotype-genotype associations in bacteria and to perform fine-level epidemiology on bacterial agents of disease. Of course, such sequencing technology would have implications beyond basic research. Rapid and cheap sequencing would certainly be used for medical diagnostics, allowing the organisms and their phenotypes (virulence, drug resistance, etc.) to be identified directly, rather than through surrogate markers, as they are at present. Rapid and accurate diagnosis is the foundation of effective medicine; any technology that provides this will have a fundamental impact on human health.

(posted 1 May 2007)


Sergio Verjovski-Almeida

Sergio Verjovski-Almeida (University of São Paulo): defining the human introme

The first human genome sequence cost approximately US $5 billion. If the $1,000 genome were available, I would spend the next US $5 billion to sequence the genomes of some 5 million people—not on a first-come first-served basis, but rather doing a challenging, hypothesis-generating large-scale genomic experiment. For example, one could sequence the genomes of 5,000 voluntary donors, healthy and ill, picked at random from 1,000 well-defined human populations from all parts of the world. Access to additional information from each donor will be critical so that correlations between genomic sequence and individual phenotypes can be established. This kind of information will raise tremendous ethical problems, and society must be informed and prepared; rigorous guidelines and proper controls should be put in place to avoid unethical or even criminal misuse. We will look for patterns in the least-conserved genomic regions as well as sequences that are conserved but that are outside the exonic protein-coding regions of genes. These intronic conserved regions probably comprise sequences related to critical functions in all humans, as it has become apparent that they are the source of ubiquitously expressed sense and antisense noncoding RNAs—a part of the human 'introme'. Some of these messages have tissue specificity, and their transcription in the cell is regulated by physiological factors such as hormones. Many of the noncoding RNAs may influence risk of disease. Accumulation of genomic data on individual variability of the intronic DNA will certainly drive the development of more powerful computational tools for the identification of biologically meaningful traits that are influenced by introns. In addition, cheap sequencing should permit the identification of the complete transcriptome at much deeper coverage by direct RNA sequencing. 
All of this will help define the introme and predict its function, although some decades will pass before we fully understand and use this knowledge to improve human life.

(posted 2 April 2007)

Yoshihide Hayashizaki

Yoshihide Hayashizaki (RIKEN Yokohama Institute): toward personalized medicine

The original goal of $1,000 genome technologies is to enable complete genome sequencing of a human-sized genome by shotgun sequencing. This method, however, is also a powerful tool to generate various biological data in a high-throughput manner. For instance, when using $100,000 technologies in combination with CAGE (cap analysis of gene expression), it is currently possible to detect expression frequency at extremely high sensitivity (one expressed RNA molecule within 100 cells, on average). Once the $1,000 technologies are available, the expected accuracy will be 100-fold greater. In this application, in which the expression frequency is obtained as sequence data, it is possible to obtain expression information for each promoter, which cannot be achieved by current hybridization-based methods. Clearly, this information is essential for analyzing molecular networks from gene to phenotype. Furthermore, this concept of using a DNA sequence as a tag can be extended to biological information other than gene expression. Whole-genome sequencing is the ultimate way to extract and analyze genetic information encoded in the DNA sequence. Though many technical problems remain (e.g., shorter read length), acquisition of a high number of sequences from different individuals will allow for definition of the 'normal' sequence, which is currently indefinable. While the $1,000 genome technologies will be a powerful way to realize the 4 Ps (prediction, prevention, personalization and participation) and will lead to advanced personalized medicine, we have to keep in mind that there are some ethical concerns if it is to be applied to all citizens. In the case of an orphan disease, we cannot at this time provide adequate treatment for those patients, because the causative gene is unknown and very few cases are reported. When used in combination with RNA or full-length cDNA data, as well as expression data that may be obtained through transcriptome analysis, these technologies will enable 'personalized genome diagnosis'.

(posted 2 April 2007)

Joseph Nadeau

Joseph Nadeau (Case Western Reserve University): never enough data

Be careful with questions like this, because I am addicted to data—there's never enough, especially if good phenotypic and clinical information is also available! And I have lots of questions, especially if sequences are available from many individuals from many geographic regions, and with sequences from families. How common is epistasis? Are individuals composed of random combinations of genetic variants, in health, in disease? How many and what kinds of deleterious genetic variants do individuals carry, and how do these individuals avoid 'genetic death'? What are the frequencies and characteristics of protective alleles versus deleterious variants? Are the functions of these variants fixed, or are they variable depending on genetic interactions? Do genetic variants lead to a myriad of phenotypes, or does some combination of genetic variants lead to similar phenotypes? What is the catalog of somatic mutations that arises during development and aging, in health and disease? Is RNA editing common, what are the targets of editing, and under what conditions and with what consequences does editing take place? How does the epigenetic code change with different genetic and environmental conditions during development and aging? What features of this code are inherited across several generations, and what are the consequences? Are network structures and functions fixed, or are they dynamic depending on genetic variation? ...Breathe, more questions... Perhaps the most interesting question involves the ways in which various combinations of genetic variants conspire to enable, or compromise, health under particular genetic and environmental conditions. To address this fundamental issue, we need a complete inventory of genetic variations and their associations with phenotypes in particular environmental conditions. A deep understanding of evolution and our ability to personalize health care depend on the insights that will emerge from these data.

(posted 2 April 2007)

Muntaser E. Ibrahim

Muntaser E. Ibrahim (University of Khartoum): genomics for all

The current technological advances in genomics that promise an affordable whole-genome sequence will undoubtedly transform the practice of medicine and education. Their impact will be most significant if the process of 'individuation' of the human genome is paralleled by an equal pace of development in information technology-related applications, an association that should surely end up in a happy and everlasting marriage between these two expanding disciplines of science. We could start immediately by addressing issues such as family and subsequently community and population association/linkage maps that would pave the way to accurately map disease, allowing us to explore the limits of genetic diversity in relation to diseases and adaptive traits. The diversity of genomes in Africa and the rising incidence of diseases of complex inheritance, such as those of 'lifestyle', make the availability and affordability of a whole-genome sequence a necessity, and will also allow us to explore issues such as gene-gene interaction and monitor genome stability at an individual scale. For the skeptics, the process of 'individuation' of the human genome should make the unraveling of genetic legacies a more affordable and tenable task, thus demystifying these legacies. Judging by the current pace of spread and utility of information technology in developing countries, it is reassuring that these countries will not be isolated from the advantages of affordable sequencing. This will also reflect positively on the ethics and practice of genomics.

(posted 2 April 2007)

Jeantine Lunshof

Jeantine Lunshof (Vrije University Medical Center): a just distribution of benefits

Being a philosopher, I will go for a walk in the park when the $1,000 genome arrives, and make up my mind. By now we should have learned to be Promethean—forward-thinking, audaciously, while at the same time accepting that the humanities cannot run ahead of science. Therefore, we should take any conceivable scenario and application into consideration and hope for incremental implementation that will allow us to keep pace. Quite a few ethical problems might be solved, or at least reduced, by the general availability of individual genomes or, for now, exomes. The complex questions raised by stratification and its related group-based stigmatization may become obsolete once health risk estimates can be based on a comprehensive analysis of individual genomes. The use of comprehensive data sets in research using genome-wide association studies already confronts institutional review boards with qualitatively new questions that cannot be answered by applying the traditional criteria for ethical acceptability. At the same time, new questions will arise, such as how to assure health care equity with increasing individualization and limited resources. In clinical practice, the availability of this new type of information should improve the efficiency of therapy as well as prevention; however, a huge translational and educational effort will be needed to make it work. In biomedical research, a refinement of studies based on new modes of stratification may increase the number of studies needed and require many more research subjects. The resulting new products, including drug-test combinations, might be safer and more efficacious but have smaller markets. Such products might then be so expensive that few will benefit. From the point of view of ethics, the big challenge of the $1,000 genome will be in dealing with its mixed blessings. The humanities should join science now in a cooperative effort to reduce the adverse effects and optimize the benefits, securing their just distribution.

(posted 2 April 2007)


Sarah Tishkoff

Sarah Tishkoff (University of Maryland): the full range of genetic diversity

I would want to sequence the genomes of randomly sampled individuals from populations with diverse geographic and cultural ancestries, particularly from Africa. With this data we would have a much clearer picture of levels and patterns of genetic diversity without the ascertainment bias of selecting common polymorphisms in a small subset of populations. This would be informative for learning more about human evolutionary history, as well as the possibility of introgression of archaic (e.g., Neanderthal) and modern genomes. This data could also be used to search for signatures of natural selection so that we may identify loci and functional variants that have played a role in human adaptation and disease. I would compare this data to genome sequences from chimpanzee populations, to distinguish variants that are adaptive in each species. For the human individuals with sequenced genomes, I would also want to obtain phenotype information on common traits that may have been adaptive in past environments (e.g., sensory perception, diet and drug metabolism, infectious disease susceptibility and carbohydrate metabolism). This would enable us to do whole-genome association studies to map genes and identify genetic variants that play a role in these traits. Having a whole-genome sequence will allow us to identify both cis- and trans-acting regulatory mutations that affect variability in these common traits and to see whether these loci show signals of natural selection and whether they are restricted to particular geographic regions or populations.

(posted 1 March 2007)

Stephen Scherer

Stephen Scherer (Hospital for Sick Children/University of Toronto): perfect genomics

My dream of the $1,000 genome sequence includes a fully finished product from beginning to end of each chromosome: 23 perfectly complete pairs in all. That understood, I would first like to sequence the genomes of monozygotic twins discordant for autism; Albert Einstein and Ted Williams; and then some more for comparison. For me, such data would provide a wondrous glimpse into those things that most intrigue me: my current research focus, the minimal code for a brilliant mind, and the indices for the perfect swing. In my world of 'perfect genomics', curiosity and imagination would always trump deep pockets as drivers to be satisfied, a concept we need to re-capture for this field. In fact, erasing barriers of entry and boundaries to creativity would be the greatest potential legacy of the $1,000 genome. Oh, yes—and the data release policy would follow the Toronto Principles: that it all be posted for everyone's appreciation, some moments after waking up from the perfect dream.

(posted 1 March 2007)

Laurence D. Hurst

Laurence D. Hurst (University of Bath): curious about mutation and genome evolution

If there is one expression I loathe, it is that "curiosity killed the cat". As every four-year-old knows, simple curiosity and structured play are the best way to discover the world, not something to be warned against. While a playful attitude is commonplace in theoretical groups, for experimentalists it is usually an unaffordable luxury. It would be wonderful if cheap genomes could change this. For my own part, I would love to know what underpins the heritable differences in musical ability. Unlike for language, there appears to be a sizeable proportion of the population that does not in any manner respond to music: Bach, Beethoven and the Beatles are no different from white noise. What are the variants responsible? Are the same genes also involved in language? Who knows—this may shed light on deafness. Blue-sky curiosity aside, cheap (and I hope accurate) whole genomes should allow us to detect the rare spontaneous mutational needles in the genomic haystack. Population genetics, while good at understanding what happens to variation once it is present, has no theory of the generation of variation: the rates and biases of different forms of mutation are empirical issues. But because they are rare (maybe 10-100 new mutations per human genome per generation), these are very difficult to characterize. Both somatic and germline processes would be interesting. Sequence the genome from very many cell types in one individual and you could not only derive the profile of mutational events but could also construct an ontogenetic tree. If we understood the processes operating in the germline, we could build proper null neutral models of genome evolution. In principle, this would then make inference of non-neutral processes much easier. Cheap sequencing would, I hope, also redefine the standards for expression assays. Currently such analyses are a wrestle with biases inherent in the different platforms. 
Obtaining the complete transcriptome by direct RNA sequencing could put an end to this. And finally, before they disappear, why not sequence all the species on the conservation red list? If any cats are being killed, we should sequence their genomes now.

(posted 1 March 2007)

John Ioannidis

John Ioannidis (University of Ioannina and Tufts University): randomized citizen-scientists and the elusive 'exposurome'

A $1,000 genome sounds great—cheaper would be even better. The question is, "Can we make a difference to our health by this knowledge?" Therefore, one of my priorities would be a large-scale randomized trial: participants are randomized to have their genome sequenced or not. Then we examine in the long term if this information improved their health outcomes. Such a trial should be conducted preferably in countries where there are already rigorous and reliable registries for outcomes such as cancer, cardiovascular disease, diabetes, end-stage renal failure, mental illnesses and drug use; several Scandinavian countries, for instance, would fit the bill. The trial should enroll not only elderly people but also young adults (if not children and adolescents) for several reasons: many complex diseases start early; earlier knowledge may be more effective; and, moreover, we are still scratching the surface of the association between genetic variation and common disease or treatment response, so it may still be several years before that information pays off. In addition, either in the context of such a trial or in the context of a separate large epidemiological study, I would wish to combine this comprehensive genomic information with equally meticulous information on environmental exposures, behavior and lifestyle. Participants should agree that they will collect just as much information to create their 'exposurome'. Unfortunately, while measurements on the genetic side have advanced rapidly in mass and precision, non-genetic exposure measurements remain in the stone age. It is unlikely that we will understand complex phenotypes unless we measure both sides. Last, who should be the authors for the scientific results of large-scale population projects of the 21st century? I think it should be the participants themselves, representing a new prototype of citizen-scientist. Having a few hundred thousand names on the web should be feasible.

(posted 1 March 2007)

Emma Whitelaw

Emma Whitelaw (Queensland Institute of Medical Research): sequencing the epigenome

We still do not really understand the role of DNA methylation. Many, but not all, eukaryotic organisms methylate their cytosine residues some of the time, and methylated cytosine can be considered the fifth base. Whether or not the cytosines are methylated matters because increased methylation of promoters can result in transcriptional silencing, and changes in methylation state, called epimutations, can cause cancer. The development of a robust method of determining exactly which cytosines are methylated, called bisulfite sequencing, has enabled us to start to address this problem. Sodium bisulfite results in the conversion of unmethylated, but not methylated, cytosines to uracils. Following conversion and PCR amplification, the DNA fragments can be directly sequenced. The data so far have shown that there are many regions of the genome where a particular C residue is methylated sometimes, but not always, across cells of the same tissue type. This plasticity is not understood. A few studies using animal models have reported that methylation patterns can be influenced by environmental events in utero, raising the possibility that the epigenotype of an individual is a read-out of his or her environmental history and providing a potential avenue by which environment acts to influence phenotype. These exciting ideas need to be investigated further. Recent genome-wide studies in humans show that methylation patterns vary across individuals, but the extent to which this is a consequence of changes in the underlying DNA sequence is not known. Monozygotic twins provide a unique opportunity to answer this question. So I would carry out a comprehensive analysis of the methylation state across the entire genome of monozygotic twins by direct bisulfite sequencing of their DNA, using samples taken from different tissues and from twins of different ages with differing degrees of concordance for various physical traits. In this way, we should be able to work out the ground rules of DNA methylation in humans.

(posted 1 March 2007)

Elaine A. Ostrander

Elaine A. Ostrander (National Human Genome Research Institute): understanding the genetics of dog behavior

A genome for $1,000 is certain to become a reality in our time. I would return 15 years in time to the question that first motivated me to join the field of comparative genomics: what are the genetic mechanisms that control the breed-specific behaviors of various domestic dog breeds? Why do herding dogs herd, pointers point and draft dogs pull? Why is the personality of the pit bull so different from that of the golden retriever, and that of the terriers so different from that of the basset hound? Given that these differences have bred true for generations, no matter what the environmental exposures or upbringing, we know they are controlled at least in part, and probably heavily, at the genetic level. I would first select a set of dog breeds that displayed extremes of behavior and try to understand the genetic basis of actions like 'giving sheep eye' to advance the herd as a border collie does, or 'pointing' or 'prey drive' as we see in so many hunting dogs. What about the genetic differences between sight and scent hounds? These experiments would involve sequencing the genomes of many animals displaying each trait at the extremes, and breaking each complex behavior into its component parts. As a second-order experiment, I would want to understand at least one of the behaviors that have made the dog man's best friend and continuous companion for thousands of years, tackling the genetics of labels like 'loyalty', 'trust' and my personal favorite, 'blind adoration'. Is it possible that this approach would allow us to understand the molecular basis of forgiveness and commitment—two behavioral traits shared uniquely by dog and man? Finally, I'd look at families of dogs within breeds that have anomalous behaviors (rage, obsessive-compulsive disorders, etc.) and try to understand the genetics of a small set of mental illnesses that simultaneously plague man and man's best friend. Hmm...that sounds like a lot of work. How about a $500 genome?

(posted 1 March 2007)

Christine Petit

Christine Petit (Pasteur Institute): the new challenge of infectious disease

I must confess that however passionate I am about my present research on genes involved in human deafness, the temptation to move away from it would be great. Indeed, considering the threat posed for mankind by the re-emergence of infectious diseases over the past three decades, and the climatic changes that are likely to make things even worse, I would be in favor of another project. I would propose to try to better understand how the various pathogens and changes in lifestyle have shaped the recent evolution of human populations and affected our specific receptiveness to infectious diseases. For instance, I would sequence the genomes of various populations from Central Africa, either living in different places or within the same area, but with different lifestyles: hunter-gatherers (such as Pygmies, Khoisan) and agriculturalists (mostly Bantu speakers), for example. The emergence of agriculture has been accompanied by an increase in population densities, a sedentary lifestyle and animal domestication. All of these factors have most likely had a lot of influence on infectious diseases and, more generally, on human health. Inexpensive sequencing would enable us to gather genetic variations within the scope of entire genomes and correlate them with detailed disease-related phenotypic data (e.g., for tuberculosis, hemorrhagic fevers and malaria), including biological parameters (e.g., basic immunological and metabolic ones). Such a study would constitute an excellent model (i) to evaluate the extent to which pathogens have exerted selective pressure on the human genome, (ii) to identify genomic regions that have played a major biological role in host resistance and (iii) to test how the emergence of an agriculture-based lifestyle has influenced the relationship between the human host and the pathogens. The parallel sequencing of the genomes of the pathogens should be part of the program as well, in order to gain new insights into host/pathogen coevolution, and finally (iv) to integrate these data with the data derived from new therapeutic approaches, in particular those going from drugs to genes.

(posted 1 March 2007)


Axel Meyer

Axel Meyer (University of Konstanz): toward a theory of genomes

Almost 20 years ago, Allan C. Wilson noted that easy sequencing technologies would result in the "democratization of the genetic code". As costs for sequencing continue to decrease, a veritable tsunami of DNA information floods the research community. While speed and cost of data collection matter, the bottleneck is no longer in the production of raw data, but in the annotation, interpretation and paper writing. We need automatic paper writing machines to bypass the pesky human factor—an unlikely development. The obvious implications of a $1,000 human genome for the first world will be in pharmacogenomics, patient-specific treatments and data collection for insurance purposes, with obvious ethical ramifications that must not be forgotten. The actual genome sequencing is increasingly—and undemocratically—limited to a handful of genomics centers. Hopefully, $1,000 sequencing technology will reverse that trend. For researchers with broader interests, a quantum leap in cost reduction might mean that more evolutionarily diverse species will be sequenced. Maybe even population surveys of species other than Homo sapiens are around the corner? Jacques Monod's prediction, paraphrased as, "What's true for E. coli is true for an elephant," might not end up holding true. Many lineages may turn out to be quite idiosyncratic and nonconformist. GenBank is making a valiant and commendable effort in trying to channel and organize the bits of new information that are collected with every completed genome. But these pieces remain to be assembled into a comprehensive mosaic that might lead to a theory of genomes. So far, the rules, if they even exist, remain elusive. The nascent discipline of genomics is still in its mostly descriptive, natural-history phase and is more driven by technological advances than guided by the testing of theoretical predictions. 
Genomics is still waiting for its Darwinian moment that places all the seemingly disconnected genomic descriptions into an intellectual framework. To venture a guess, it's probably going to have something to do with evolution.

(posted 1 February 2007)

David Goldstein

David Goldstein (Duke University): the genetics of normality

Beginning with the human genome project, genomics has been driven and motivated in large part by the hope of clinical application. The $1,000 genome is no different, and I share the enthusiasm of others that real clinical relevance may finally be imminent. Full-genome sequencing will accelerate discovery genetics substantially by allowing us to dive even further into the full spectrum of functional genetic variation. But cheap sequencing will also become a driver at the public level. Haplotype tagging may be an efficient research tool but it hardly fires the public imagination. People will, however, be excited to know their complete genetic sequence. Before long, a global wealthy elite will not only stump up the $1,000 but will also hire 'sequence consultants' to advise them about what their complete sequence means for their health, or in California, their genetically tailored diets. At a minimum, an individual's sequence will prove relevant to drug choice, and I hope that societies will put in place appropriate mechanisms to ensure that the benefits of genomics reach farther than those who can afford 'sequence consultants'. Less appreciated, I think, is the role that inexpensive sequencing will play in basic biology. Today genomics is expensive and concentrated on disease endpoints, which are necessary to motivate the high price tags of these studies. As full representation of human genetic variation gets less expensive, these studies can move back into the study of human biology. We humans are different from one another not only in the diseases that we suffer, but in myriad other details, small and large. Many of those are the result of genetic differences that remain unknown and almost unstudied. It is finally time to study all the normal variation that enriches the human world and experience: memory, behavior and personality. In short, economical sequencing of human genomes will help us to understand who we are and how we got that way.

(posted 1 February 2007)

David Gurwitz

David Gurwitz (Tel Aviv University): faster clinical uptake of pharmacogenetics

This year marks the 50th anniversary for the field of pharmacogenetics, the study of the heritable determinants of drug response, both in terms of safety and efficacy. In 1957 Arno Motulsky published the first review in this field, which has exploded during the last decade (the current PubMed count for 'pharmacogenetics' is 4,184 items) and which is by now the subject of dedicated conferences and books. Yet, very little of this vast knowledge has found its way to the clinic. Recent reports examining the lack of uptake of pharmacogenetics into the clinical setting have emphasized high costs and lack of reimbursement among the key issues. A $1,000 genome, or merely a $1,000 'exome' as George Church rightfully notes, would allow the rapid uptake of pharmacogenetic testing to the bedside. Although ethical and legal concerns would remain substantial barriers for its implementation, the affordable genotyping costs would mean that policy makers would no longer be able to cite high costs as a reason for keeping pharmacogenetics out of the clinic. The Centers for Disease Control recently published a report stating that 6.7% of all US hospitalizations during 2004-2005 were the direct result of adverse drug reactions, highlighting their huge societal burden and the urgent need for improving drug safety. A $1,000 genome would mean large savings, given the pharmacogenetic knowledge that is already available but so far beyond the reach of all but the very affluent.

(posted 1 February 2007)

Detlef Weigel

Detlef Weigel (Max Planck Institute for Developmental Biology): adapting to a changing world

This will allow real-time evolutionary studies on an unprecedented scale. I would imagine using this sequencing power to extensively sample and resample wild populations over the course of several years (the $1,000 human genome would, for example, allow us to sequence an Arabidopsis thaliana genome for $100 or less; thus for $200,000, we could record the entire genome of 2,000 plants). By recording phenotypes of different individuals and using the new sequencing technologies to assay their genomes—and possibly also their methylomes, siRNomes and transcriptomes—we'd be able to assess how traits and sequences vary over the years, thereby identifying candidate genes that allow plants to adapt to a changing environment year after year. It will not obviate experimental studies, however, as we'd take the relevant genotype into the lab and ask in segregating F2 populations whether the associations hold up. Once we have a sufficient understanding of model species such as A. thaliana, we will move on to sample whole-genome diversity in wild plants. We will be able to make predictions about their ability to adjust to a changing environment, based on the allelic variation present in a species.

(posted 1 February 2007)

Ewan Birney

Ewan Birney (European Bioinformatics Institute): full employment in bioinformatics

Cheap sequencing opens up new areas in medicine, evolution, molecular biology and the understanding of human history. In medicine, first change disease association studies by sequencing cases and controls—no more relying on indirect markers, opening up new disease models for easy statistical testing. Then sequence enough cancer samples to understand the genetic aspects of that disease. Finally, at $1,000 a go, everyone will have their genome sequenced as part of their medical record, using it to inform both diagnosis and treatment design, though my sense is that this will not transform modern medicine beyond recognition but rather augment it, providing a more fine-grained dissection of many diseases. In the field of evolution, if one can sequence the human genome at $1,000, one can sequence many other species. Sequence any species we are interested in, driven by medicine or veterinary medicine, and part of any environment that interests us, from the extreme (e.g., Antarctica) to the nearby (understanding indigenous wildlife in Britain). Give it a couple of decades and we'll really start to see evolution play out. As for molecular biology, most of it has a DNA component; if you can cheaply sequence you can dream up new experiments. Don't do microarrays any more, just sequence RNA samples. Work out transcription factor binding sites using DNA sequencing of ChIP samples. Do competition experiments in worms with tagged gene variants. If you can make DNA the output of your experiment, then you will be able to scale the experiment beyond your wildest dreams. In the field of population genetics, as we sequence more individuals, we will build a more accurate description of our history. I suspect this will hold a number of surprises for us and perhaps will be used by historians in near history as well as anthropologists. Finally, no matter what a $1,000 genome is used for, one thing is certain: bioinformaticians will still be in demand.

(posted 1 February 2007)

Leonid Kruglyak

Leonid Kruglyak (Princeton University): digital expression profiling

A scalable technology that can deliver a $1,000 human genome will revolutionize expression profiling. Currently, expression profiling is usually carried out by hybridization to microarrays. This approach, while immensely useful, is not very quantitative, it typically yields relative rather than absolute mRNA abundance, and the results are difficult to compare across different microarray platforms. Serial analysis of gene expression (SAGE) was developed as a digital readout of the number of mRNA molecules in a sample by sequencing of short tags, but it has been limited by complicated sample preparation and the scale of sequencing. These problems would be obviated by cheap massively parallel sequencing. As an example, let's suppose that the technology works by generating threefold coverage of the human genome in 30-bp reads. Then $1 would buy 300,000 reads. Because human cells contain on the order of 300,000 mRNA molecules, 25-fold redundant sample sequencing of cDNAs would cost $25 and would allow precise and absolute quantification of transcripts expressed at levels down to one copy per cell (with a standard deviation of 20% for these rare messages and much higher precision for the more abundant ones). For yeast, the number of mRNA molecules per cell is roughly 20-fold lower, allowing the transcriptome to be profiled for a dollar. These prices are very competitive with the cost of microarrays. In addition to providing much more quantitative measurements of absolute mRNA abundance, digital expression profiling also has the potential to measure expression of different splice variants, as well as to provide allele-specific measurements, which will be useful for studies of genetic variation. Transcription profiling could also be carried out in species without a genome sequence, which even in the era of the $1,000 genome may be desirable, especially for organisms with very large genomes.
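The arithmetic behind these estimates can be checked with a short sketch. It is only an illustration: the 3-billion-bp genome size and the Poisson model of read sampling are assumptions not stated in the text (though both are consistent with the quoted figures), and the names `profiling_cost` and `relative_sd` are hypothetical.

```python
import math

GENOME_BP = 3_000_000_000  # assumed haploid human genome size (not given in the text)
COVERAGE = 3               # threefold coverage, as in the text
READ_LEN = 30              # 30-bp reads, as in the text
GENOME_COST = 1_000        # dollars per genome

# Reads produced by one $1,000 run, and reads bought per dollar
reads_per_genome = COVERAGE * GENOME_BP // READ_LEN
reads_per_dollar = reads_per_genome // GENOME_COST


def profiling_cost(mrna_per_cell, redundancy=25):
    """Dollar cost of sequencing `redundancy` reads per mRNA molecule in a cell."""
    reads_needed = mrna_per_cell * redundancy
    return reads_needed / reads_per_dollar


def relative_sd(expected_reads):
    """Relative standard deviation of a read count, assuming Poisson sampling."""
    return 1 / math.sqrt(expected_reads)


human_cost = profiling_cost(300_000)        # ~300,000 mRNAs per human cell
yeast_cost = profiling_cost(300_000 // 20)  # roughly 20-fold fewer in yeast
rare_sd = relative_sd(25)                   # one-copy transcript at 25-fold redundancy

print(reads_per_dollar, human_cost, yeast_cost, rare_sd)
```

Under these assumptions the sketch gives 300,000 reads per dollar, $25 for a human sample, about $1.25 for yeast, and a 20% relative standard deviation for a single-copy transcript, in line with the figures quoted above.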

(posted 1 February 2007)

Les Biesecker & Eric Green

Les Biesecker & Eric Green (National Human Genome Research Institute): how much do patients want to know?

If the '$1,000 genome' becomes a reality, most of the relevant questions will center on the biomedical issues about which diseases should be studied and how the technology is deployed in studying those diseases. However, an important question that must be thoughtfully considered is, "How will clinicians and patients interact when individual whole-genome sequencing becomes a clinical-care option?" It may be that a whole-genome sequence simply represents too much information for practical use in a clinical setting, with too many ambiguities and too much potential 'bad news' for any patient to confront. In that case, the $1,000 genome might end up being used predominantly in the arena of clinical research, specifically as a tool for discovery but not as a diagnostic tool for use in routine clinical care. Alternatively, it may be that patients will want to know information about the likelihoods of being afflicted with all diseases with a demonstrated genetic component, the medicines that they should seek or avoid and the changes in their lives that they should pursue to mitigate genetic-based risks. If the $1,000 genome is implemented to address these desires, physicians and other health-care providers must develop approaches for delivering such information to their patients in an appropriate fashion. We need to understand the views of research patients as they make decisions about what they do or do not want to know about their genomes. What kind of information do they want back from their health-care provider? What level of confidence or certainty do they want in that information before they are notified? Do they want their health-care provider to contact them in the future if new information is obtained that is clinically relevant to their genome sequence? The availability of a $1,000 genome could be a powerful health research tool. 
Only by studying these important clinical questions in a research setting can we be confident that we will maximize the potential benefits that this tool may offer.

(posted 1 February 2007)

Richard Gibbs

Richard Gibbs (Baylor College of Medicine): making difficult choices easier

A $1,000 genome? That's a 'no-brainer'. Since more than $120 million per year is now being spent on genome sequences at NIH Genome Centers alone, this is the equivalent of more than 120,000 genomes! Added to the other might be >500,000 'genomes'. In humans, I would prioritize the larger cohorts with deep phenotyping to speed disease allele discovery and add significant numbers of population-based controls in order to better our understanding of 'normal variation'. 200,000 such 'genomes' would launch human genetics into a new orbit. With this in hand, we can begin to build the knowledge base to enable us to properly use a $1,000 genome sequence as a routine diagnostic tool. Meanwhile, outside of humans, we should 'walk' through multiple primate species and include multiple individuals from each species. The same is true for all mammals, including the marsupials. This would consume at least another 200,000 genomes. Following that, the same-size effort should be directed against the remainder of the evolutionary tree. On top of that, we have the urgent Cancer Genome need for several tens of thousands of genomes! A much harder question would be what to do with a $1,000,000 genome, which is the question before us right now. There the choices are more difficult, since we do not have the resources to simply use sequencing to solve single disease discovery efforts, but have to pick and choose which samples will yield the most insight, and what 'shortcuts' we can take to get to the finish line more cheaply. The good news is that even at $1,000,000 each, we could afford some tens of genomes, and at the current rate of discovery in human genetics, these data would be extremely exciting.

(posted 1 February 2007)

Trudy Mackay

Trudy Mackay (North Carolina State University): the architecture of complex traits

The first thing I would do is obtain whole-genome sequence of 500 inbred strains of Drosophila melanogaster, which have been recently derived from nature and extensively characterized for a battery of complex trait phenotypes. Since one human genome is roughly equivalent to 25 Drosophila melanogaster genomes, the total cost would only be $20,000. Natural populations of Drosophila contain a treasure trove of genetic variants that have survived the screen of natural selection and are variable for complex traits relevant to human health as well as adaptive evolution. I would use these data to map the allelic variants associated with complex traits to address fundamental questions about the genetic architecture of complex traits. How many genes are associated with variation in each trait, and are segregating alleles largely rare or common? How common are epistasis and pleiotropy? Are causal variants predominantly in regulatory or coding sequences? Further, these data will provide functional annotation of computationally predicted genes. There is little haplotype structure in Drosophila populations, presenting a favorable scenario for identifying polymorphisms causally associated with variation in complex traits. Further, the ability to obtain replicate measurements of the same genotypes in multiple defined environments under controlled conditions gives unprecedented power to detect variants that may be context-specific or that have small as well as large effects. These data will enable integration of population genetic analyses on a genome-wide scale with patterns of phenotypic variation and can be integrated with parallel studies on orthologous traits in human populations to provide candidate genes for further study.
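The cost estimate above follows from a simple genome-equivalent conversion, sketched here. The function name `cost_in_dollars` is hypothetical, and the sketch assumes that sequencing cost scales linearly with genome size, which is what the 25-to-1 ratio in the text implies.

```python
def cost_in_dollars(n_genomes, genomes_per_human_equivalent, human_genome_cost=1_000):
    """Cost of sequencing n_genomes when several of them fit in one human-genome equivalent.

    Assumes cost is proportional to the total number of bases sequenced.
    """
    human_equivalents = n_genomes / genomes_per_human_equivalent
    return human_equivalents * human_genome_cost


# 500 inbred Drosophila strains, about 25 fly genomes per human genome
drosophila_cost = cost_in_dollars(500, 25)
print(drosophila_cost)
```

With the numbers given in the text, 500 strains correspond to 20 human-genome equivalents, or $20,000.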

(posted 1 February 2007)

Thomas Mitchell-Olds

Thomas Mitchell-Olds (Duke University): a boon to evolutionary ecology

Inexpensive sequencing will have a dramatic impact on evolutionary and ecological functional genomics. Evolutionary ecology will be a major beneficiary as studies of non-model organisms in a range of environments become more feasible. In many cases, organismal biologists will follow paths that have already been pioneered in model organisms but that have not been ideal for some ecological and evolutionary questions. It is already clear that high-throughput sequencing enables new analyses that were previously impossible. For example, genome-wide data on nucleotide polymorphisms have catalyzed population genetics analyses of targets of natural selection, based on regions of linkage disequilibrium and haplotype sharing. Likewise, the long-standing debate on the importance of deleterious or advantageous mutations has benefited from the observation of reduced variation in gene-rich regions, which may reflect selection against deleterious mutations. Analysis of completely sequenced genomes also illuminates the types of genes that are preferentially retained in multiple copies following whole-genome duplication. Complete sequences from clusters of related species will allow searches for adaptive protein evolution encompassing all genes in an organism. Inexpensive sequencing will allow identification of all SNPs that differentiate individuals or segregate in populations. With this flood of data, the challenge of identifying functionally important SNPs among thousands of polymorphisms will be increasingly obvious. Solutions to this problem (such as straightforward gene replacement) are fundamentally important. When DNA sequencing is no longer a bottleneck, then other factors will limit our research. Further advances in bioinformatics, population genetics theory and statistical methods will be needed. The era of inexpensive sequencing will return our focus to phenotypes, biological mechanisms and environmental context. 
In our own research, sequencing one species from each tribe in the Brassicaceae would be a useful beginning, at a lower cost than a single human genome. Moving forward will require a continual interplay between sequence data, physiological function and experiments in natural environments.

(posted 1 February 2007)


Francis S. Collins

Francis S. Collins (National Human Genome Research Institute): where to begin?

The real question is, "What wouldn't we do?" At the National Human Genome Research Institute, we'd be like kids in a candy shop—there are so many exciting possibilities from which to choose. Bearing in mind our mission of using genomic research to improve human health, we'd probably take most of our current annual spending on DNA sequencing, about $120 million, and devote it to sequencing 100,000 human samples for $100 million. About 75,000 of those samples would come from obtaining the complete genome sequences of 2,500 affected individuals for each of 30 common, complex diseases, such as asthma, arthritis, diabetes, various types of cancer, heart disease, stroke, Alzheimer's disease and depression. This would enable us to systematically find both the common and the rare genetic variations that contribute to the risk of developing these diseases. The remaining genomes to be sequenced would be those of 25,000 people who have made it to the age of 100 in relatively good health and retaining the capacity for independent function. The aim of that endeavor would be to see what's special about the genomes of healthy centenarians, and then to use that information to explore the genetics of good health and longevity in all humans.

(posted 2 January 2007)

George Church

George Church (Harvard Medical School): the Personal Exome Project

If the equivalent of a complete human genome could be sequenced for only $1,000, then we should sequence all exons (also known as the 'exome') for $10—a bargain that the world could not afford to ignore ($60 billion for 6 billion people). The exome is the 1% of the genome most easily interpreted and most likely to cause noticeable phenotypes. Even if we never get to $10, it is likely that the exome is already, in 2006, 'affordable' for the global middle class: $4,000 (e.g., using polonies)—an amount recoverable over a lifetime at $50/year in healthcare savings. Association studies based on 'pathway sequences' for a million early adopters could benefit the rest of us in a way that is out of reach with current 'common variant' and/or 'linkage disequilibrium' methods. Pathway sequence studies look for associations between a disease and any 'obviously deleterious alleles' (e.g., protein-truncating alleles or changes in highly conserved amino acids) anywhere in the pathways potentially relevant to the disease (which can include dozens of loci unlinked genetically but well-linked conceptually). This would crank up the already high motivation to work out the social components of sharing integrated genome and phenome data with trusted researchers—and at a million, the statistics would be awesome. This would permit broadening the number of hypotheses simultaneously testable (i.e., combinations of alleles and environments). This might transform personal genomic medicine from a luxury to a birthright.
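The cost figures in this proposal can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes cost is proportional to the fraction of the genome sequenced and ignores any overhead for targeting exons; the variable names are illustrative only.

```python
GENOME_COST = 1_000               # dollars per full human genome
EXOME_PERCENT = 1                 # exome is about 1% of the genome, per the text
WORLD_POPULATION = 6_000_000_000  # 6 billion people, per the text

# Exome cost under the proportional-cost assumption
exome_cost = GENOME_COST * EXOME_PERCENT // 100

# Cost of sequencing one exome per person worldwide
world_cost = exome_cost * WORLD_POPULATION

# Years needed to recover a $4,000 exome at $50/year in healthcare savings
payback_years = 4_000 / 50

print(exome_cost, world_cost, payback_years)
```

Under these assumptions the exome costs $10, worldwide coverage costs $60 billion, and the $4,000 price quoted for 2006 would take 80 years of $50 annual savings to recover, which is what "over a lifetime" implies.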

(posted 2 January 2007)

Stephen J. O'Brien

Stephen J. O'Brien (National Cancer Institute): a kilo-buck genome sequence

A kilo-buck ($1,000) genome sequence would be pretty neat for several applications. First, I would gather $38,000 from a generous donor and get the sequence for one individual of each of the 38 species of cats. This would provide two things: (i) a near-unlimited collection of SNPs or STRs to monitor the past, present, and future patterns of genome diversity in each species as a management adjunct for species conservation (all cats except domestic tabbies are considered threatened or endangered today); and (ii) a starting point for identifying the changes that occurred among species of the Felidae radiation, arguably the best and most precisely resolved mammalian family with respect to molecular evolutionary divergence. Second, I would take the top 100 most endangered species of mammals and sequence each of their genomes for two reasons: (i) to help inform conservationists with hopes to stall their imminent extinction; and (ii) to preserve the genome instructions for the species that we allowed to go away, in case future generations develop technology to use such information better than our generation. Third, I would gather $375,000 (from NHGRI?) and use it to sequence all the 375 living species of primates so that comparative genomics would have an unabridged set of genome sequences from the mammalian order from which Homo sapiens derives. I have many more ideas, including some medical muses, but these three are not so bad for roughly $500,000, considering it took over $2 billion to sequence the human genome.

(posted 2 January 2007)

Evan Eichler

Evan Eichler (University of Washington): in search of variation

I would be enthusiastic about four potential uses of such technology: (i) sequence 1,000 human individual genomes of diverse geographic origin to obtain a more complete understanding of the breadth of normal genetic variation (including structural differences) within our species; (ii) sequence and compare genomes from patients with idiopathic mental retardation and their parents to identify potential de novo mutational events associated with disease (the same could be applied to almost any sporadic genetic disease); (iii) sequence the genomes of all species of primates and mammals to reconstruct the evolutionary history of every base pair and to identify lineage-specific changes of functional significance; and (iv) obtain the complete sequence of 100+ germ cells (WGA) compared to a donor to more fully understand the process of mutation and recombination without ascertainment bias. I would be even more enthusiastic about technology that would allow >200,000 base pairs of contiguous sequence to be obtained directly from genomic DNA in a single pass...this would allow us to understand more complex regions of our genome such as segmental duplications, telomeres and centromeres as well as underlying individual variation.

(posted 2 January 2007)

Jonathan Pritchard

Jonathan Pritchard (University of Chicago): new answers and new questions

First, it is clear that this will have a major impact in medical genetics. Though important in their own right, genome-wide SNP studies are largely unable to provide information about genes where low-frequency and even de novo mutations are important. We don't know how important this scenario is, but recent studies suggest that this is at least part of the puzzle. Genome-wide resequencing will provide us with a complete catalog of variation in our samples and should enable us to find systematically the genes where rare (or even de novo) mutations are important. Meanwhile, as we become increasingly able to interpret the health implications of particular combinations of genetic variants, genome sequencing will surely move into the doctor's office as well. As a parent, would I have my 3-year-old son's genome sequenced to help determine his genetic liabilities (and strengths)? Obviously, this raises serious ethical questions for the patient, for people who share his genes, and for a society without universal health care coverage. There will also be serious practical challenges of interpreting the genetic data. But it is hard to believe that the clinical value of such information will not ultimately outweigh the risks. Beyond the medical realm, cheap genome sequencing will have a transformative impact in organismal biology. We now have genome sequences for a relative handful of eukaryotes, but for most eukaryotes there are virtually no genetic data, and most molecular studies are extremely difficult. With cheap genome sequencing, one could take any interesting clade (e.g., the Hawaiian Drosophilids) and quickly determine the full complement of genomic differences among species, and from there head into comparative expression arrays and so forth. Cheap genome sequencing will lower the divide between 'model' and 'non-model' organisms.

(posted 2 January 2007)