Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits

Abstract

Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.

Fig. 1: SV analysis workflow.
Fig. 2: Merged SV set characteristics.
Fig. 3: Large deletion in PCSK9 associated with lower LDL cholesterol levels.
Fig. 4: Multiallelic SVs in repeat regions within exons of ACAN, NACA and PRDM9, difficult for SV detection using SRS.

Data availability

Access to these data is controlled; the sequence data cannot be made publicly available because Icelandic law and the regulations of the Icelandic Data Protection Authority prohibit the release of individual-level and personally identifying data. Data access can be granted only at the facilities of deCODE genetics in Iceland, subject to Icelandic law regarding data usage. Anyone wishing to gain access to the data should contact K.S. (kstefans@decode.is). Icelandic law allows for unimpeded sharing of summary-level data. The publicly shared summary-level data consist of Supplementary Data 1–5 as described below, alongside the VCF and index files for the high-confidence SV alleles at https://github.com/DecodeGenetics/LRS_SV_sets.

Code availability

Code is available as follows: SViper, modified, used in this study (https://github.com/DecodeGenetics/SViper/tree/cornercases); SViper, original repository (https://github.com/smehringer/SViper); Scrappie, modified, used in this study (https://github.com/DecodeGenetics/scrappie/tree/v1.3.0.events); Scrappie, original repository (https://github.com/nanoporetech/scrappie); SquiggleSVFilter (https://github.com/DecodeGenetics/nanopolish/tree/squigglesv); sample execution of SquiggleSVFilter with input and expected output data (https://github.com/DecodeGenetics/SquiggleSV_samplerun); sv-merger, to form SV cliques using the Cluster Affinity Search Technique algorithm (https://github.com/DecodeGenetics/sv-merger); LRcaller (https://github.com/DecodeGenetics/LRcaller).
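For readers unfamiliar with the Cluster Affinity Search Technique (CAST) mentioned for sv-merger, the following is a minimal generic sketch of the algorithm on a pairwise similarity matrix. It is not the sv-merger implementation: the function name `cast_clusters`, the similarity values, the threshold, the seed choice and the `max_rounds` safety cap are all assumptions made for illustration, and how sv-merger scores similarity between SV calls is not described here.

```python
def cast_clusters(similarity, threshold, max_rounds=1000):
    """Greedy CAST-style clustering (illustrative sketch only).

    similarity: square matrix (list of lists) with values in [0, 1].
    threshold:  mean affinity an element needs to belong to a cluster.
    """
    def affinity(item, cluster):
        # Total similarity between one item and the current cluster members.
        return sum(similarity[item][member] for member in cluster)

    unassigned = set(range(len(similarity)))
    clusters = []
    while unassigned:
        # Assumption: seed each cluster with the lowest unassigned index.
        seed = min(unassigned)
        unassigned.discard(seed)
        cluster = {seed}
        for _ in range(max_rounds):  # cap guards against add/remove cycles
            changed = False
            # ADD step: pull in the unassigned item with the highest affinity.
            if unassigned:
                best = max(sorted(unassigned), key=lambda x: affinity(x, cluster))
                if affinity(best, cluster) >= threshold * len(cluster):
                    cluster.add(best)
                    unassigned.discard(best)
                    changed = True
            # REMOVE step: drop the member with the lowest affinity.
            if len(cluster) > 1:
                worst = min(sorted(cluster), key=lambda x: affinity(x, cluster))
                if affinity(worst, cluster) < threshold * len(cluster):
                    cluster.discard(worst)
                    unassigned.add(worst)
                    changed = True
            if not changed:
                break
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical similarities between four SV calls: 0 and 1 agree, 2 and 3 agree.
example = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
print(cast_clusters(example, threshold=0.5))
```

Each cluster is grown and pruned until it is stable, then closed before the next one is opened, so every call ends up in exactly one group.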

Acknowledgements

We thank J. Simpson, our colleagues from deCODE genetics/Amgen Inc. and employees of ONT for their help and advice. We also thank all research participants who provided a biological sample to deCODE genetics.

Author information

Contributions

D.B. implemented software, with additional software implemented by H.I., H.P.E., S.K., S.M., G.H. and B.V.H. D.B. and B.V.H. wrote the paper with input from H.I., A.O., H.P.E., E.B., H.J., B.A.A., S.K., M.T.H., S.A.G., R.P.K., G.H., G.P., O.A.S., A.H., U.T., H.H., D.F.G., P.S., O.T.M. and K.S. H.I. implemented the analysis pipelines, with input from D.B., S.K., S.A.G., S.T.S., G.M. and B.V.H. D.N.M. and O.T.M. performed ONT sequencing. Aslaug Jonasdottir and Adalbjorg Jonasdottir performed PCR validation experiments. G.E., I.O. and O.S. acquired LDL measurements. H.H. and B.T. acquired heart tissues. B.V.H. and K.S. conceived and supervised the study. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Bjarni V. Halldorsson or Kari Stefansson.

Ethics declarations

Competing interests

D.B., H.I., A.O., H.P.E., E.B., H.J., B.A.A., S.K., M.T.H., S.A.G., D.N.M., Aslaug Jonasdottir, Adalbjorg Jonasdottir, R.P.K., S.T.S., G.H., G.P., O.A.S., G.M., A.H., U.T., H.H., D.F.G., P.S., O.T.M., B.V.H. and K.S. are employees of deCODE genetics/Amgen. The remaining authors declare no competing interests.

Additional information

Peer review information: Nature Genetics thanks Mark Chaisson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Oxford Nanopore Technologies (ONT) long-read sequencing statistics.

a, N50 length per flowcell (N = 4,757 flowcells) prior to GRCh38 alignment. b,c,d, Aligned coverage, alignment percentage, and error rates stratified by type, per individual (N = 3,622 individuals). Statistics are computed over sequenced reads longer than 3,000 bp. In panel d, box limits indicate upper and lower quartiles, centre line indicates median, and whiskers indicate ±1.5 times the interquartile range.
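As a reminder of what the N50 statistic in panel a measures, the sketch below computes it from a list of read lengths using the standard definition: the length L such that reads of length L or longer contain at least half of all sequenced bases. The function name `n50` and the optional `min_length` filter are illustrative assumptions, not the code used in the study.

```python
def n50(read_lengths, min_length=0):
    """Return the N50 of the reads strictly longer than min_length."""
    # Sort from longest to shortest so bases accumulate from the long reads.
    lengths = sorted((n for n in read_lengths if n > min_length), reverse=True)
    if not lengths:
        raise ValueError("no reads pass the length filter")
    half_of_bases = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_of_bases:
            return length


# Toy read lengths (hypothetical values, in bp).
reads = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(reads))
```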

Extended Data Fig. 2 SquiggleSVFilter overview.

Given a candidate structural variant (SV), and an SV supporting read, SquiggleSVFilter first identifies the subread of the ONT basecalled read overlapping the SV, using the reference alignment BAM file. Next it finds the squiggle slice of the identified subsequence using the event table. For both the left and right flanks around the variant, it determines the reference and alternative sequences given the candidate variant, and computes their raw data-vs-sequence log likelihood scores with the squiggle slice. A sufficiently high log likelihood score difference for the alternate allele marks the read as an SV supporting read.
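The final decision step described above can be sketched as follows. This is only an illustration of comparing alternate and reference log likelihoods per flank; the class and function names, the threshold value `min_margin`, and the rule for combining the two flanks (`require_both`) are assumptions, since the caption does not specify them. Computing the likelihoods from raw signal is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class FlankScores:
    """Log likelihoods of the squiggle slice for one flank (hypothetical container)."""
    ref_loglik: float
    alt_loglik: float

    @property
    def alt_margin(self) -> float:
        # Positive when the alternate allele explains the signal better.
        return self.alt_loglik - self.ref_loglik


def supports_sv(left: FlankScores, right: FlankScores,
                min_margin: float = 10.0, require_both: bool = False) -> bool:
    """Mark a read as SV supporting when the alt-vs-ref margin is large enough.

    Assumption: by default one sufficiently supportive flank is enough;
    set require_both=True to demand support on both flanks.
    """
    margins = (left.alt_margin, right.alt_margin)
    if require_both:
        return min(margins) >= min_margin
    return max(margins) >= min_margin


# Hypothetical scores: the left flank favours the alternate allele by 20 units.
left = FlankScores(ref_loglik=-120.0, alt_loglik=-100.0)
right = FlankScores(ref_loglik=-90.0, alt_loglik=-95.0)
print(supports_sv(left, right))
print(supports_sv(left, right, require_both=True))
```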

Extended Data Fig. 3 Allele frequency distribution of SVs at low frequency.

SVs are binned at 0.01% for alleles with 0.1% to 5% frequency.

Extended Data Fig. 4 Length and modulo distributions of structural variants (SVs) that are contained within exons.

a, Length distribution of SVs with lengths between 50 and 100 bp; stars denote lengths divisible by 3 (N = 224 markers). b, Modulo distribution of SV lengths across length intervals (N = 549).
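A length divisible by 3 leaves the reading frame intact, which is why the remainder of the SV length modulo 3 is of interest for exonic SVs. The tally behind such a modulo distribution can be sketched as below; the function name, the inclusive interval bounds and the toy lengths are assumptions for illustration, not the study's data or code.

```python
from collections import Counter


def length_mod3_counts(sv_lengths, lo=50, hi=100):
    """Count SV lengths in [lo, hi] (inclusive, an assumption) by length mod 3."""
    selected = [n for n in sv_lengths if lo <= n <= hi]
    counts = Counter(n % 3 for n in selected)
    # Always report all three remainders, including empty ones.
    return {remainder: counts.get(remainder, 0) for remainder in (0, 1, 2)}


# Toy SV lengths in bp (hypothetical): 49 and 101 fall outside the interval.
toy_lengths = [49, 51, 54, 57, 58, 100, 101]
print(length_mod3_counts(toy_lengths))
```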

Supplementary information

Supplementary Information

Supplementary Methods and Figs. 1–4

Reporting Summary

Supplementary Tables

Supplementary Tables 1–6

Supplementary Data 1

Sequencing-related information of 4,757 flow cells from 3,622 individuals.

Supplementary Data 2

Summary-level data of high-confidence SVs.

Supplementary Data 3

Primer sequences and results from PCR validation.

Supplementary Data 4

5,238 SVs in strong LD with GWAS catalog variants and related data.

Supplementary Data 5

List of genes with at least one homozygous carrier of a rare high-impact SV allele in our study.

About this article

Cite this article

Beyter, D., Ingimundardottir, H., Oddsson, A. et al. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat Genet 53, 779–786 (2021). https://doi.org/10.1038/s41588-021-00865-4

  • DOI: https://doi.org/10.1038/s41588-021-00865-4
