Creating large genome/phenome collections can require consortium-scale resources. DNA.Land is a digital biobank that collects genetic data from individuals tested by consumer genomic companies using a fraction of the resources of traditional studies.
Y.E. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. This study was supported by a generous gift from Andria and Paul Heafy to the Erlich Laboratory, funding from the National Breast Cancer Coalition, and support from Amazon Web Services’ Education Grants. J.Y. is supported by the Columbia University Integrative Graduate Education and Research Traineeship (IGERT), funded by NSF research grant number 1144854. We thank the tens of thousands of DNA.Land participants—especially our early adopters, whose feedback was integral in our efforts to improve the site—and genetic genealogist C. Moore for her valuable advice. We welcome inquiries by researchers who are interested in collecting genotype and phenotype information with our resource.
Integrated Supplementary Information
a The number of page visits per week to each type of report on DNA.Land: Ancestry (Red), Relative Matching (Light Blue), Relatives of Relatives (Green), and Trait Report pages (Dark Blue). b The distribution of new user registrations to DNA.Land by day of the week. c The percentage of page visits by DNA.Land users by day of the week.
a Per-week and cumulative numbers of total surveys completed by users. b Per-week and cumulative numbers of total questions answered by users. c The distribution of time required by users to complete each type of survey. The surveyed traits are as follows: Chronotype (Orange), Coffee Consumption (Blue), Myopia (Red), Eye Color (Green), Neuroticism (Pink), Educational Attainment (Purple), and Height (Yellow).
a The distribution of the number of inferred relatives among DNA.Land users based on matching IBD segments. Only 10.5% of DNA.Land users have no detected relatives. b The distribution of degrees of relatedness among matching pairs of DNA.Land users, as calculated by the ERSA algorithm. A degree of 0 indicates either an identical twin or a duplicate genotype file.
a Self-reported age distribution in DNA.Land. b Ancestry composition of DNA.Land users with aggregated ancestry categories: Northern European (Red), Northeast European (Orange), Other European (Light Orange), Ashkenazi (Yellow), African (Yellow-Green), South Asian (Light Green), East Asian (Turquoise), Native American (Blue). Each column represents a single user, and stacked bars on each column indicate the distribution of ancestry groups for a given user. Users are sorted by decreasing percentage of their largest ancestry group. c Geographic location of DNA.Land users, as determined by IP address.
Supplementary Figures 1–4, Supplementary Tables 1–3 and Supplementary Note.