Genomic characterization of the RH locus detects complex and novel structural variation in multi-ethnic cohorts

Wheeler, Marsha M.; Lannert, Kerry W.; Huston, Haley; Fletcher, Shelley N.; Harris, Samantha; Teramura, Gayle; Maki, Helena J.; Frazar, Chris; Underwood, Jason G.; Shaffer, Tristan; Correa, Adolfo; Delaney, Meghan; Reiner, Alex P.; Wilson, James G.; Nickerson, Deborah A.; Johnsen, Jill M.

doi:10.1038/s41436-018-0074-9

Article
Published: 29 June 2018

Marsha M. Wheeler PHD¹,
Kerry W. Lannert MT (ASCP)²,
Haley Huston BSc³,
Shelley N. Fletcher BSc³,
Samantha Harris BSc³,
Gayle Teramura BSc³,
Helena J. Maki²,
Chris Frazar MSc¹,
Jason G. Underwood PHD¹,
Tristan Shaffer BSc¹,
Adolfo Correa MD, MPH, PHD⁴,
Meghan Delaney DO, MPH^3,5,
Alex P. Reiner MD, MSc⁶,
James G. Wilson MD⁷,
Deborah A. Nickerson PHD^1,8,
Jill M. Johnsen MD^2,9 &
NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium

Genetics in Medicine volume 21, pages 477–486 (2019)

Abstract

Purpose

Rh antigens can provoke severe alloimmune reactions, particularly in high-risk transfusion contexts, such as sickle cell disease. Rh antigens are encoded by the paralogs, RHD and RHCE, located in one of the most complex genetic loci. Our goal was to characterize RH genetic variation in multi-ethnic cohorts, with the focus on detecting RH structural variation (SV).

Methods

We customized analytical methods to estimate paralog-specific copy number from next-generation sequencing (NGS) data. We applied these methods to clinically characterized samples, including four World Health Organization (WHO) genotyping references and 1135 Asian and Native American blood donors. Subsequently, we surveyed 1715 African American samples from the Jackson Heart Study.

Results

Most samples in each dataset exhibited SV. SV detection enabled prediction of the immunogenic RhD and RhC antigens in concordance (>99%) with serological phenotyping. RhC antigen expression was associated with exon 2 hybrid alleles (RHCE*CE-D(2)-CE). Clinically relevant exon 4–7 hybrid alleles (RHD*D-CE(4-7)-D) and exon 9 hybrid alleles (RHCE*CE-D(9)-CE) were prevalent in African Americans.

Conclusion

This study shows custom NGS methods can accurately detect RH SV, and that SV is important to inform prediction of relevant RH alleles. Additionally, this study provides the first large NGS survey of RH alleles in African Americans.

Introduction

Blood group systems are inherited entities with direct clinical importance in transfusion and transplantation medicine. Blood group antigens are expressed on the surface of red blood cells (RBCs); most are glycoproteins with specificity determined by their oligosaccharide or amino acid sequence.¹ The genes that encode nearly all blood group systems are known² and several exhibit substantial genetic complexity and population-specific heterogeneity.

The Rh blood group system contains highly immunogenic antigens and commonly exhibits complex genetic variation including structural variation (SV). It is comprised of >50 different antigens, including the polymorphic RhD (D) and RhCE (C, c, E, and e) antigens. This antigenic diversity stems from genetic variation in two homologous paralogs, RHD and RHCE, which lie in close proximity at the RH locus.³ At present, RHD and RHCE encode >280 reported alleles (haplotypes) which include RHD gene deletions and RHD–RHCE hybrids.^2,4 This level of complexity poses clinical challenges and can provoke significant rates of Rh allosensitization.^5,6 In one study, 45% of chronically transfused African American patients with sickle cell disease (SCD) experienced alloimmunization, primarily due to undetected variation in the Rh blood group system.⁵ High rates of Rh alloimmunization persist even when patients receive transfusions from serologically matched African American donors,⁵ demonstrating the need for higher-resolution Rh blood group information.

Serology is the mainstay of clinical RBC typing, including Rh. However, serology has known limitations that can be overcome with molecular testing.⁷ In clinical laboratories, DNA-based prediction is typically performed using genotyping platforms (e.g., single-nucleotide polymorphism [SNP] arrays), Sanger sequencing, and variant-specific methods (e.g., polymerase chain reaction with sequence specific primers [PCR-SSP], restriction fragment length polymorphism [RFLP]).⁷ These can be used to characterize patients with unexpected alloantibodies, patients at risk for allosensitization, or recently transfused patients. DNA-based methods are also used to identify alleles for which antisera are unavailable and to test for paternal zygosity of the D antigen for pregnancies at risk of hemolytic disease of the fetus and newborn.^7,8 In addition, RBC genotyping methods can aid in discriminating Rh phenotypes, which can produce indeterminate or conflicted serological results.⁹ Genotyping methods can discriminate RH partial alleles, which lead to missing antigen epitopes and antibody formation when exposed to the conventional antigen.¹⁰ Genetic methods can also discern weak RH alleles, which reduce the quantity of antigens on the surface of RBCs but maintain display of the same epitopes as conventional Rh antigens.¹¹

Currently, there is growing interest in applying next-generation sequencing (NGS) to Rh antigen prediction.^{12,13,14,15,16} NGS can systematically survey for genetic variants, including SV, and is scalable for high-throughput screening. To date, efforts to detect RH variation using NGS have shown success in detecting clinically relevant variation but technical challenges have limited the interpretation of RH variation and the detection of SV.^{12,13,14,15,16} Our primary goal was to develop an RH genotyping method that addressed RH SV, including RHD–RHCE hybrid alleles that alter Rh antigen expression. We customized paralog-specific SV analyses¹⁷ and first applied these methods to four World Health Organization (WHO) RBC genotyping reference samples and to 1135 clinically immunophenotyped and clinically genotyped samples from Asian and Native American blood donors.¹⁸ Subsequently, we applied our methods to survey RH variation in 1715 unrelated African American samples from the Jackson Heart Study (JHS). This cohort was whole-genome sequenced (WGS) by the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program and analyzed in this study to provide the first NGS survey of RH alleles for this population.

Materials and methods

Samples

We purchased four WHO reference DNAs (RBC1, RBC4, RBC5, RBC12) from the National Institute for Biological Standards and Control. WHO references were clinically characterized and genotyped by a variety of methods¹⁹ but, to our knowledge, not by NGS. These samples represent common European (RBC1, RBC4, RBC5) and African (RBC12) RH alleles, including alleles encoding D positive (D+), D negative (D−), and combinations of C, c, E, and e antigens (Table 1).¹⁹

Table 1 Summary of NGS-predicted RH alleles, known Rh serology, and DNA variants in WHO reference samples

Asian and Native American samples (N = 1168) were selected from a prior population study of blood donors.¹⁸ Blood samples were collected from consented volunteer donors by Bloodworks Northwest. All samples were previously clinically tested for D and C antigens by serology and for C, c, E, and e genotype using a SNP array, HEA BeadChip™ Kit (Bioarray Solutions Ltd., Immucor).¹⁸ This sample set included 82 samples discrepant between C serology and SNP (N = 16) or indeterminate on the SNP array (N = 66).

African American samples (N = 1715) were selected from JHS samples (phs000964) WGS by the NHLBI TOPMed program. The JHS is a community-based observational study in which individuals were recruited from the tri-county area surrounding Jackson, Mississippi, including a subset who participated in the Atherosclerosis Risk in Communities Study.²⁰ The samples in this study were randomly selected from the maximum unrelated JHS sample set as identified using KING v1.4.0 (no individuals with a first or second degree relationship).

Library preparation and next-generation sequencing

DNA libraries from WHO and Asian and Native American samples were captured with a targeted panel designed to capture 41 blood group–relevant genes (1473 Kb; Nimblegen, Table S1). For RH, this panel captured 269 Kb of continuous sequence including introns, exons, untranslated regions (UTRs), and promoter regions. Library preparation followed a shotgun library construction method²¹ and was hybridized in multiplex (22–24 samples per reaction). Sequencing was performed on Illumina HiSeq 2500 machines using paired-end 100 bp reads to a mean coverage of approximately 150×. In total, 1139 samples (1135 Asian and Native American and 4 WHO samples) passed sequencing quality thresholds. No samples were excluded based on performance at the RH locus.

JHS African American samples were WGS by the NHLBI TOPMed program. Library preparation for JHS samples similarly followed a shotgun library construction method.²¹ Sequencing was performed on Illumina HiSeq X machines using paired-end 150 bp reads to a mean coverage of approximately 30×. Raw sequence data was aligned to the human reference genome (GRCh37) using BWA-MEM.²²

Detection of RH SV

SV in RHD and RHCE was identified using an adaptation of methods described previously.¹⁷ SV was identified by leveraging singly unique nucleotides (SUNs) within a repeat masked, pairwise sequence alignment of RHD and RHCE. SUNs were similarly identified in the Rhesus boxes flanking RHD.³ SUNs were used to anchor DNA sequence k-mers (k = 70), which were screened for uniqueness against GRCh37 (BLAT v3.5, UCSC). K-mers were omitted if they contained >1 perfect match. Read depth was estimated for remaining k-mers using a mapping quality ≥40. Copy number was estimated by normalizing using sequencing depth and mean read depth for samples visually confirmed to have no SV. In total, 9189 k-mers for RHD and RHCE and 2054 k-mers in the Rhesus boxes informed SV analyses. K-mers were distributed across the RH locus except for RHCE exon 10. RHD exon 10 k-mers were identified in alignment of the Rhesus boxes. SV breakpoints were identified by change-point analysis using the R changepoint package.²³ SV impacting RH exons was prioritized.
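The read-depth logic described above can be sketched in Python. This is an illustrative sketch, not the authors' pipeline: all function names, the `control_ratios` input (per-k-mer normalized depth averaged over samples with no SV), and the toy inputs are assumptions. In particular, the least-squares single-changepoint search below is a simplified stand-in for the R changepoint package used in the study, and the uniqueness screen against GRCh37 is omitted.

```python
def find_suns(aligned_a, aligned_b):
    """Alignment columns where two aligned paralog sequences differ.

    Columns containing a gap or a masked base (anything other than
    A/C/G/T) are skipped, mimicking a repeat-masked alignment.
    """
    return [
        i for i, (a, b) in enumerate(zip(aligned_a, aligned_b))
        if a in "ACGT" and b in "ACGT" and a != b
    ]


def kmer_around(seq, pos, k):
    """Return the k-mer roughly centred on pos, or None near sequence ends.

    pos is a position in the ungapped sequence seq.
    """
    start = pos - k // 2
    if start < 0 or start + k > len(seq):
        return None
    return seq[start:start + k]


def copy_number(kmer_depths, sample_depth, control_ratios, ploidy=2):
    """Estimate per-k-mer copy number.

    Each depth is divided by the sample's overall sequencing depth, then by
    the mean normalized depth of the same k-mer in control samples assumed
    to carry `ploidy` copies.
    """
    return [
        ploidy * (d / sample_depth) / r
        for d, r in zip(kmer_depths, control_ratios)
    ]


def single_changepoint(values):
    """Index that best splits values into two constant-mean segments.

    Returns None when no split lowers the total squared error.
    """
    def sse(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs)

    best_i, best_cost = None, sse(values)
    for i in range(1, len(values)):
        cost = sse(values[:i]) + sse(values[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i


if __name__ == "__main__":
    cn = copy_number([30, 31, 29, 15, 14, 16], 30.0, [1.0] * 6)
    print(cn, single_changepoint(cn))
```

In this toy run the copy number drops from about 2 to about 1 halfway along the k-mer series, which is the kind of signal a hemizygous deletion or hybrid-allele breakpoint would be expected to produce.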

Detection of RH SNVs and indels and RH allele identification

Single-nucleotide variants (SNVs) and small insertions and deletions (indels) were genotyped using GATK HaplotypeCaller and haplotype phased using statistical methods (Beagle v4.1).²⁴ Functional annotation was incorporated using SeattleSeq Annotation (http://snp.gs.washington.edu/SeattleSeqAnnotation138/). All variants were annotated relative to the RefSeq transcripts NM_016124.3 (RHD) and NM_020485.4 (RHCE). To identify RH alleles, SNVs, indels, and SVs were cross-referenced with alleles listed by the International Society of Blood Transfusion (ISBT) v2.0 110914, supplemented by information from Rhesusbase.⁴ For cross-referencing, complementary DNA (cDNA) coordinates associated with ISBT alleles were converted to GRCh37 coordinates. Chr1:25643553 (NM_016124.3:c.1136) and chr1:25747230 (NM_020485.4:c.48) are variant in GRCh37 relative to ISBT v2.0 110914. Novel variants were selected based on their absence in ISBT v2.0 110914 and prioritized as impactful based on variant function (e.g., predicted loss of function). Genotype quality (GQ) was assessed for novel and annotated ISBT SNVs and indels. Chr1:25643553, which encodes the primary variant of the DAU cluster (the DAU0 allele), had variable GQ because it is present in a multiply-mapping region in exon 8. GQ was low when Chr1:25643553 was variant relative to GRCh37, which contains the DAU0 variant (NM_016124.3:c.1136T). Low GQ resulted from low coverage of RHD exon 8 due to the misalignment of reads from this region to its highly homologous region in RHCE.
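The cDNA-to-genome conversion mentioned above can be illustrated with a minimal Python sketch. The function name and the toy exon coordinates are hypothetical and are not the real RHD or RHCE exon boundaries. The sketch assumes the exon list contains only coding segments, given in transcript order, so that c.1 is the first coding base.

```python
def cdna_to_genomic(c_pos, coding_exons, strand):
    """Map a 1-based coding position (c.N) to a 1-based genomic coordinate.

    coding_exons: list of (start, end) genomic intervals, 1-based inclusive,
    in transcript order (descending genomic order for a minus-strand gene).
    strand: "+" or "-".
    """
    if c_pos < 1:
        raise ValueError("c_pos must be >= 1")
    remaining = c_pos
    for start, end in coding_exons:
        length = end - start + 1
        if remaining <= length:
            # Walk forward from the exon start on "+", backward from its end on "-".
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("c_pos lies beyond the coding sequence")


if __name__ == "__main__":
    # Toy two-exon genes (hypothetical coordinates, 10 coding bases per exon).
    plus_gene = [(100, 109), (200, 209)]
    minus_gene = [(200, 209), (100, 109)]
    print(cdna_to_genomic(11, plus_gene, "+"))
    print(cdna_to_genomic(11, minus_gene, "-"))
```

A production conversion would normally rely on a transcript annotation for the exact RefSeq versions cited, since exon boundaries and strand differ between the two paralogs.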

Quantitative multiplex PCR of short fluorescent fragments

To validate NGS-detected RH SV, we performed quantitative multiplex PCR of short fluorescent fragments (QMPSF).²⁵ Fluorescently tagged primers were used to amplify the four WHO samples and 18 Asian and Native American samples (N = 22) selected to represent RHD gene deletions, RHD–RHCE hybrid alleles or deletions/duplications, and samples with no SV. QMPSF primers amplified gene-specific RHD and RHCE exons. F9 exon 7 and HFE exon 2 amplicons served as positive amplification markers and as normalization controls. QMPSF products were separated via capillary gel electrophoresis (ABI 3130xl, Applied Biosystems). Fluorescence peaks were analyzed using the R Fragman package²⁶ and normalized using the maximum HFE peak height.
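The normalization step can be sketched as follows. The text states only that peaks were normalized to the maximum HFE peak height. The comparison against a two-copy control sample, the amplicon labels, and the peak heights below are assumptions added for illustration.

```python
def qmpsf_copy_number(sample_peaks, control_peaks,
                      ref_amplicon="HFE_ex2", control_copies=2):
    """Estimate copy number per amplicon from QMPSF peak heights.

    Each peak is divided by the reference (HFE) peak of the same sample,
    then compared with the same ratio in a control sample assumed to carry
    `control_copies` copies of every amplicon.
    """
    sample_ref = sample_peaks[ref_amplicon]
    control_ref = control_peaks[ref_amplicon]
    estimates = {}
    for amplicon, height in sample_peaks.items():
        if amplicon == ref_amplicon:
            continue
        sample_ratio = height / sample_ref
        control_ratio = control_peaks[amplicon] / control_ref
        estimates[amplicon] = control_copies * sample_ratio / control_ratio
    return estimates


if __name__ == "__main__":
    # Hypothetical peak heights (arbitrary fluorescence units).
    control = {"HFE_ex2": 2000, "RHD_ex4": 1600, "RHCE_ex4": 1600}
    sample = {"HFE_ex2": 1000, "RHD_ex4": 400, "RHCE_ex4": 800}
    for amplicon, cn in qmpsf_copy_number(sample, control).items():
        print(amplicon, round(cn, 2))
```

In this toy example the sample shows roughly one copy of the RHD amplicon and two of the RHCE amplicon, the pattern expected for a hemizygous RHD deletion.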

Combinatorial PCR and Sanger sequencing

To confirm RHCE*CE-D(2)-CE alleles (see Results) as hybrid alleles, we designed allele-specific long-range PCRs. Primer pairs were designed to target unique sequences between intron 1–exon 2 and exon 2–exon 3 (Table S2). PCRs were performed pairing RHD- and RHCE-specific primers in a combinatorial manner. PCRs consisted of 12.5 µL of Q5 Hot Start High-Fidelity Master Mix (NEB M0494S), 0.5 µM of forward and reverse primers, and 50 ng DNA. Cycling conditions for intron 1–exon 2 were: 98 °C for 30 s, followed by 30 cycles of 98 °C for 10 s, 76 °C for 30 s, and 72 °C for 6 min, with a final extension at 72 °C for 2 min. Cycling conditions for exon 2–exon 3 were identical except that the annealing and extension steps were 68 °C for 30 s and 72 °C for 3 min, respectively. PCR was performed on 21 samples (including WHO samples). Two samples with PCR-confirmed RHCE*CE-D(2)-CE events were cloned into pMiniT vector (NEB PCR Cloning Kit). Insert-positive clones were Sanger sequenced with vector-specific and gene-agnostic primers (Table S3). Products were aligned against RHD and RHCE (GRCh37) using Geneious R8 software.

Results

NGS-based characterization of WHO reference samples

We used custom paralog-specific NGS analyses to detect SV at the RH locus. These analyses detected SV in all WHO reference samples. In RBC1 and RBC4, NGS analyses detected SV signals (Fig. 1a, c) indicative of RHD-to-RHCE hybrid alleles (RHCE*CE-D(2)-CE), similar to alleles previously associated with the C+ phenotype.^27,28 Zygosity for this event was consistent with C and c phenotypes (Table 1)¹⁹. In RBC5, analyses detected a homozygous RHD deletion causal for its reported D- phenotype (Fig. 1b, Table 1). In RBC12, analyses detected a hemizygous RHD deletion and SV indicative of an exon 9 hybrid allele (RHCE*CE-D(9)-CE) (Fig. 1d). The latter event was not reported previously for RBC12 ¹⁹. Each SV event was validated by QMPSF (Fig. 1). The one discrepancy between QMPSF and NGS analyses related to whether SV in RBC12 impacts exon 8 in addition to exon 9: QMPSF amplification is suggestive of exon 8 SV, but NGS-based breakpoints predicted exon 8 to be unaffected (Fig. 1d). The homozygous RHD deletion in RBC5 and RHCE*CE-D(2)-CE alleles predicted in RBC1 and RBC4 were additionally validated by allele-specific PCR (Fig. S1). PCR confirmed RHCE*CE-D(2)-CE events to be hybrid alleles and not separate SV events.

RBC1, RBC4, and RBC12 also harbored SNVs indicative of previously characterized alleles (Table 1). In RBC1 and RBC4, we detected variants indicative of weak RHCE alleles (Table 1). RBC12 contained hemizygous RHD SNVs representative of an RHD null allele including a 37-bp insertion and the stop-gained variant causal for its D- phenotype (Table 1). RBC12 also harbored missense variants associated with the RHCE*01.20.02 allele and the V+VS+ phenotype, a known finding for RBC12.¹⁹

NGS-based characterization of clinically characterized Asian and Native American samples

Paralog-specific analyses detected SV in 90% of Asian and Native American samples (Fig. 2a, genotypes listed in Tables S4–S5). Note that we do not provide representative allele frequencies for these populations because this sample set was selected in a nonrandom manner. The RHD deletion was detected in 375 samples (100 homozygotes and 275 hemizygotes, Fig. 2a). The predicted mean length for this event was 70154 ± 1888 bp and exhibited recombination signals between the flanking Rhesus boxes (similar to Fig. 1b).³ RHCE*CE-D(2)-CE alleles were detected in 832 samples (388 homozygotes and 444 heterozygotes, Fig. 2a). The mean length for this event was 4953 ± 238 bp. The most common variant was 4959 bp in size (n = 823), but differently sized RHCE*CE-D(2)-CE events were also detected, ranging from 1038 to 7183 bp.

In 25 samples, we detected SV events impacting other RHD and RHCE exons, including RHD gene duplications and extensive RHD–RHCE hybrid alleles (see Figs. 2a and 3). Three of these events are annotated in ISBT v2.0 110914: RHD*D-CE(4-7)-D (RHD*01N.07, Fig. 3b), RHD*D-CE(3-9)-D (RHD*01N.04, Fig. 3c) and RHD*D-CE(4-8)-D (RHD*01N.07). RHD*D-CE(4-7)-D and RHD*D-CE(4-8)-D share ISBT allele names because previous genotyping methods could not determine whether exon 8 was affected.⁴

Standard SNV/indel calling methods detected SNVs associated with established serological phenotypes (Table S6, Tables S4–S5). In RHD, SNVs indicative of 2 RHD null alleles, 7 weak D and Del alleles, and 6 partial D alleles were detected (Table S6). Six samples with weak D and partial D alleles were predicted to inform D phenotype because of compound heterozygosity with RHD deletions. For example, one serologically D- sample harbored a splice-site variant (RHD*DEL1) and was hemizygous for a RHD gene deletion. In RHCE, variants were indicative of 10 RHCE alleles (Table S6). Predicted loss-of-function variants not reported in ISBT included 1 splice-site variant in RHD and 1 splice region variant in RHCE (Table S7).

QMPSF and allele-specific PCR for clinically characterized Asian and Native American samples

Using QMPSF, we tested 18 samples that collectively represented a variety of SV events (Fig. 3, Figs. S2–S4). QMPSF validated NGS-predicted events in all samples tested. As above, the discrepancy between QMPSF and NGS analyses related to the size of SV in RHCE*CE-D(9)-CE and RHCE*CE-D(8-9)-CE alleles (Figs. S3–S4).

Allele-specific PCRs further validated samples encompassing no SV, RHD deletions, and RHCE*CE-D(2)-CE events (N = 17, Figs. S5–S6). Cloning and Sanger sequencing of two samples exhibiting the common RHCE*CE-D(2)-CE allele confirmed a RHCE intron 1 SNV that was identified by NGS analysis in the larger dataset (chr1:25736299, Fig. S7). This SNV has not been previously reported and is positioned consistent with the RHCE-RHD intron 1 breakpoint. The RHCE*CE-D(2)-CE intron 2 breakpoint in these two samples was defined by a 109-bp insertion, which has been previously reported.²⁸

Comparisons between NGS-based RH alleles with SNP array–based typing and D and C serology

In Asian and Native American samples, NGS-based RH alleles were predicted blind to serology and SNP genotyping. NGS-genotype considered SNVs, indels, and SV within each sample. Briefly, D- in this dataset was mainly predicted from homozygous loss of RHD. However, one D- sample was predicted to be DEL due to hemizygous loss of RHD and the presence of the RHD*DEL1 allele, a relevant distinction as DEL can provoke anti-D.²⁹ Another D- sample exhibited hemizygous loss of RHD and a deletion of RHD exon 3 (see example of exon 3 deletion in Fig. 3f), predicting a partial D phenotype. C and c antigens were predicted based on the presence of RHCE*CE-D(2)-CE alleles, while E and e genotypes were assigned using the ISBT annotated RHCE missense (NM_020485.4:c.676G>C).
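The core prediction rules described above can be written as a short Python sketch. It is a deliberate simplification: it covers only RHD copy number, the count of RHCE*CE-D(2)-CE alleles, and the c.676G>C genotype, and it ignores the DEL, weak, and partial alleles that the study also considered. The function and parameter names are hypothetical, and the sketch assumes that the c.676C allele encodes E and the c.676G allele encodes e.

```python
def predict_rh_phenotype(rhd_copies, exon2_hybrid_alleles, c676_c_alleles):
    """Predict D, C/c, and E/e antigens from three simplified genotype inputs.

    rhd_copies: estimated number of intact RHD copies (0 = homozygous deletion).
    exon2_hybrid_alleles: number of RHCE*CE-D(2)-CE alleles (0, 1, or 2).
    c676_c_alleles: number of NM_020485.4:c.676C alleles (0, 1, or 2),
        assumed here to encode the E antigen.
    """
    for name, value in (("exon2_hybrid_alleles", exon2_hybrid_alleles),
                        ("c676_c_alleles", c676_c_alleles)):
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2")

    return {
        "D": "+" if rhd_copies > 0 else "-",
        # Each hybrid allele is taken to encode C; each non-hybrid allele, c.
        "C": "+" if exon2_hybrid_alleles >= 1 else "-",
        "c": "+" if exon2_hybrid_alleles <= 1 else "-",
        "E": "+" if c676_c_alleles >= 1 else "-",
        "e": "+" if c676_c_alleles <= 1 else "-",
    }


if __name__ == "__main__":
    # Homozygous RHD deletion, one exon 2 hybrid allele, no c.676C allele.
    print(predict_rh_phenotype(0, 1, 0))
```

A rule set like this would misclassify the DEL and partial D samples discussed above, which is why SNVs and indels have to be considered alongside SV before a phenotype is assigned.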

Subsequent comparisons of NGS-genotype with serology showed agreement with the D antigen in 99.8% of samples and with the C+ antigen in 99.2% of samples. Comparisons with clinical SNP-genotype showed 99.9% agreement for prediction of E and e antigens. Direct comparison between all C SNP array predictions and all C NGS-based predictions was not possible due to indeterminate SNP array results in 66 samples (see Methods). In samples that did have SNP-based c and C predictions, our results were 99.5 and 99.7% concordant, respectively. All 66 samples with indeterminate C SNP array calls were predicted by NGS in agreement with serology; most of these (59/66) were NGS-predicted to be C+. Moreover, NGS resolved 9 of 16 samples that were discordant between C SNP array–based genotype and C serology.

NGS-based characterization of African American samples

RH SV was detected in 61% of African American samples (Fig. 2b, genotypes listed in Tables S8–S9). RHD gene deletions were present in 586 samples (mean length = 70572 ± 3352 bp) including 56 homozygotes and 530 hemizygotes (Fig. 2b). RHCE*CE-D(2)-CE events were present in 406 samples (mean length = 5216 ± 796 bp) including 33 homozygotes and 373 heterozygotes. We additionally detected hybrid alleles at relatively high prevalence, including RHD*D-CE(4-7)-D (RHD*01N.07) and RHCE*CE-D(9)-CE (Fig. 2b).
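The zygosity counts above can be converted to allele frequencies by simple allele counting. The short Python sketch below does this for the two most common events. The resulting frequencies are derived here for illustration from the counts in the text and are not values reported by the authors; the function name is hypothetical.

```python
def allele_frequency(n_two_copies, n_one_copy, n_samples):
    """Frequency of an allele from carrier counts in a diploid sample set.

    n_two_copies: samples carrying the allele on both chromosomes.
    n_one_copy: samples carrying the allele on one chromosome
        (hemizygous for a deletion, heterozygous for a hybrid allele).
    """
    return (2 * n_two_copies + n_one_copy) / (2 * n_samples)


if __name__ == "__main__":
    n = 1715  # African American samples from the Jackson Heart Study
    rhd_deletion = allele_frequency(56, 530, n)
    exon2_hybrid = allele_frequency(33, 373, n)
    print(f"RHD deletion allele frequency:    {rhd_deletion:.3f}")  # 0.187
    print(f"RHCE*CE-D(2)-CE allele frequency: {exon2_hybrid:.3f}")  # 0.128
```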

SNVs identified in African American samples were indicative of several RHD null alleles, weak D alleles, partial D alleles, and RHCE alleles (Tables S10–S11, Tables S8–S9). SNV-based RH alleles with allele frequencies >1% are shown in Table 2, with previously reported SNP array–based allele frequencies.³⁰ Note that we detected DAU alleles in several samples (Table 2), but GQ for the primary variant was variable due to sequence homology. In African American samples, we also identified 5 predicted loss-of-function variants not reported in ISBT. In RHD, this included 1 splice-site variant and 2 frameshifts. In RHCE, this included 1 splice-site variant and 1 stop-gained variant (Table S12).

Table 2 Prevalent (>1%) single-nucleotide variant (SNV)-based RHD and RHCE alleles detected in African American samples

Discussion

In recent years, there has been growing interest in applying NGS to predict Rh antigens.^{12,13,14,15,16} This has been motivated, in part, by the high rates of Rh allosensitization in multiply transfused patients, particularly in African American patients with sickle cell disease.^31,32 In this population, high rates of allosensitization persist even after patients have been matched by serology for D, C, c, E, e antigens and received racially matched RBC transfusions.⁵ Evidence suggests this is primarily due to the presence of undetected RH variation in patients and donors,⁵ emphasizing the need to predict Rh antigens in a systematic and locus-informed manner.

To this end, studies have shown NGS is a viable approach for predicting RBC antigens.^{12,13,14,15,16} However, these studies have applied NGS on a limited scale, mostly to a small number of well-characterized individuals, and have been largely insensitive in identifying complex SV, including RHD–RHCE hybrid alleles.^{12,13,14,15,16} Here, we show customized NGS-based methods can detect known and novel RH variation in two large cohorts comprised of individuals of Asian American, Native American, and African American descent.

This customized RH method leverages nucleotide differences between RHD and RHCE to exclude mapping artifacts associated with NGS short read data. This approach enabled SV detection in previously problematic regions including exons 1, 2, and 8^12,13,15,16 by using information in flanking intronic sequences. Importantly, this approach performs robustly both in targeted capture and whole-genome sequencing, indicating it is generalizable to datasets where NGS spans the RH locus. In addition, this approach provides the ability to detect RH SV at scale to measure allele frequencies in large genomic datasets.

We specifically detected RHCE*CE-D(2)-CE hybrid alleles as prevalent across all datasets. Similar alleles were reported previously and associated with C+ expression, such as by Carritt et al.²⁸ However, at present there is a lack of clarity as to whether these alleles are causal for C+ expression. Recent exome studies report exon 2 read depth signals associated with C+, which are indicative of SV;^15,16 however, the majority of modern literature, including RHCE genotyping references, reports exon 1 and 2 RHCE SNVs as causal for C+.² In these large-scale analyses, the most common RHCE*CE-D(2)-CE allele spanned ~5 Kb and a subset of samples with RHCE*CE-D(2)-CE were validated by multiple orthogonal methods. Sanger sequencing characterized the common RHCE*CE-D(2)-CE intronic breakpoints, including an RHCE 109 bp “insertion” currently used in C genotyping as well as a previously undetected SNV at the RHCE*CE-D(2)-CE breakpoint in RHCE intron 1. Our analyses also show RHCE*CE-D(2)-CE correctly predicted C serology in 99.2% of clinically characterized samples, strongly supporting that RHCE*CE-D(2)-CE is the common cause for C+ antigen expression.

We further identified multiple RH hybrid alleles consistent with named ISBT alleles. We identified the clinically known RHD*01N.07 (RHD*D-CE(4-7)-D) in both large cohorts and validated this NGS signature by QMPSF (Fig. 3b). This allele was prevalent (2.3%) in African Americans, consistent with a recent study reporting this allele to occur in 2.9% of African American individuals and sickle cell disease patients³⁰ and 10-fold higher than in European populations.³³

Our methods identified novel RH SV alleles that impacted exons 8 and 9. This finding suggests previous genotyping efforts may have been hindered by sequence homology across these exons, a notion supported by our finding of RHCE*CE-D(9)-CE allele in the well-characterized WHO reference, RBC12. Notably, RHCE*CE-D(9)-CE was common (3.9%) in African American samples. In Asian and Native American samples, QMPSF validated RHCE*CE-D(9)-CE alleles but also showed amplification of exon 8 in a subset of samples. QMPSF infers exon 8 copy number through amplification of nearby intronic sequences, leading us to hypothesize intronic variation associated with RHCE*CE-D(9)-CE may have impacted this QMPSF result. Alternatively, our NGS-based methods could have excluded exon 8 as part of the SV due to the breakpoint being in a region of high homology.

Although our analyses were focused on SV, we genotyped SNVs indicative of known ISBT alleles. Notably, in an Asian American sample we detected hemizygous loss of RHD and an RHD splice-site variant causal for the DEL phenotype (RHD*DEL1). Although this is consistent with the D- phenotype reported for this blood donor, the finding is clinically relevant because DEL is not null for D protein expression and can provoke D alloimmunization. This DEL allele has been reported as a common cause of D- in Asian populations;²⁹ in this study of Asian Americans, however, homozygous loss of RHD was the primary cause of D-. We further found weak and partial RH alleles known to be prevalent and clinically consequential in African populations (Table 2). Consistent with previous NGS work,¹⁵ we detected common RHD SNVs in African Americans indicative of DAU alleles. The primary DAU0 SNV had variable genotype quality, leading us (and others)¹⁵ to advise caution when interpreting DAU allele frequencies derived from NGS. The limitation we observed was low coverage in the absence of the DAU0 SNV due to increased sequence homology with RHCE. Additional customization of NGS analyses, such as the use of an alternative mapping locus, should resolve this limitation. Separately, in RBC12, we detected SNVs indicative of RHD*04N.01 and RHCE*01.20.02. In African Americans, we detected RHD*04N.01 at a frequency of ~3% (Table 2), consistent with allele frequencies reported by other studies in individuals of African descent.³⁰ RHD*04N.01 co-occurred with hemizygous RHD gene deletions predicting D- in 1.4% of African Americans, while 3.2% of African Americans were D- due to homozygous RHD gene deletions.

In summary, our results show the ability of NGS-based methods to systematically identify RH SV and detect known, complex, and novel RH SV. This represents the first large-scale study of RH variation in Asian and Native Americans and the largest population survey of RH SV in African Americans to date. We found complex SV to be common, suggesting additional clinically relevant RH variation remains undiscovered. Altogether, this study shows locus-informed genomic approaches can detect RH alleles and characterize complex genetic variation in large and diverse datasets.

References

Reid, ME, Lomas-Francis, C & Olsson, ML. The Blood Group Antigen FactsBook. London, UK. (Academic Press, 2012).
Storry JR, et al. International society of blood transfusion working party on red cell immunogenetics and terminology: report of the Seoul and London meetings. ISBT Sci Ser. 2016;11:118–22.
Article CAS Google Scholar
Wagner FF, Flegel WA. RHD gene deletion occurred in the Rhesus box. Blood. 2000;95:3662–8.
CAS PubMed Google Scholar
Wagner FF, Flegel WA. The rhesus site. Transfus Med Hemother. 2014;41:357–63.
Article Google Scholar
Chou ST, et al. High prevalence of red blood cell alloimmunization in sickle cell disease despite transfusion from Rh-matched minority donors. Blood. 2013;122:1062–71.
Article CAS Google Scholar
Sippert E, et al. Variant RH alleles and Rh immunisation in patients with sickle cell disease. Blood Transfus. 2015;13:72–7.
PubMed PubMed Central Google Scholar
Hillyer CD, Shaz BH, Winkler AM, Reid M. Integrating molecular technologies for red blood cell typing and compatibility testing into blood centers and transfusion services. Transfus Med Rev. 2008;22:117–32.
Article Google Scholar
Westhoff CM. Molecular DNA-based testing for blood group antigens: recipient-donor focus. ISBT Sci Ser. 2013;8:1–5.
Article CAS Google Scholar
Denomme GA, Dake LR, Vilensky D, Ramyar L, Judd WJ. Rh discrepancies caused by variable reactivity of partial and weak D types with different serologic techniques. Transfusion. 2008;48:473–8.
Article Google Scholar
Castilho L, et al. High frequency of partial DIIIa and DAR alleles found in sickle cell disease patients suggests increased risk of alloimmunization to RhD. Transfus Med. 2005;15:49–55.
Article CAS Google Scholar
Wagner FF, et al. Molecular basis of weak D phenotypes. Blood. 1999;93:385–93.
CAS PubMed Google Scholar
Stabentheiner S, et al. Overcoming methodical limits of standard RHD genotyping by next-generation sequencing. Vox Sang. 2011;100:381–8.
Article CAS Google Scholar
Fichou Y, Audrézet M-P, Guéguen P, Le Maréchal C, Férec C. Next-generation sequencing is a credible strategy for blood group genotyping. Br J Haematol. 2014;167:554–62.
Article CAS Google Scholar
Lane WJ, et al. Comprehensive red blood cell and platelet antigen prediction from whole genome sequencing: proof of principle. Transfusion. 2016;56:743–54.
Article CAS Google Scholar
Chou ST, et al. Whole-exome sequencing for RH genotyping and alloimmunization risk in children with sickle cell anemia. Blood Adv. 2017;1:1414–22.
Article CAS Google Scholar
Schoeman EM, et al. Evaluation of targeted exome sequencing for 28 protein-based blood group systems, including the homologous gene systems, for blood group genotyping. Transfusion. 2017;57:1078–88.
Article CAS Google Scholar
Sudmant PH, et al. Diversity of human copy number variation and multicopy genes. Science. 2010;330:641–6.
Delaney M, et al. Red blood cell antigen genotype analysis for 9087 Asian, Asian American, and Native American blood donors. Transfusion. 2015;55:2369–75.
Boyle J, et al. International reference reagents to standardise blood group genotyping: evaluation of candidate preparations in an international collaborative study. Vox Sang. 2013;104:144–52.
Taylor HA. The Jackson Heart Study: an overview. Ethn Dis. 2005;15:S6–1–3.
Ng SB, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009;461:272–6.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013;1303.3997.
Killick R, Eckley IA. changepoint: an R package for changepoint analysis. J Stat Softw. 2014;58:1–19.
Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 2007;81:1084–97.
Fichou Y, et al. A convenient qualitative and quantitative method to investigate RHD-RHCE hybrid genes. Transfusion. 2013;53:2974–82.
Covarrubias-Pazaran G, Diaz-Garcia L, Schlautman B, Salazar W, Zalapa J. Fragman: an R package for fragment analysis. BMC Genet. 2016;17:62.
Poulter M, Kemp TJ, Carritt B. DNA-based rhesus typing: simultaneous determination of RHC and RHD status using the polymerase chain reaction. Vox Sang. 1996;70:164–8.
Carritt B, Kemp TJ, Poulter M. Evolution of the human RH (rhesus) blood group genes: a 50 year old prediction (partially) fulfilled. Hum Mol Genet. 1997;6:843–50.
Kwon DH, Sandler SG, Flegel WA. DEL phenotype. Immunohematology. 2017;33:125–32.
Reid ME, Halter-Hipsky C, Hue-Roye K, Hoppe C. Genomic analyses of RH alleles to improve transfusion therapy in patients with sickle cell disease. Blood Cells Mol Dis. 2014;52:195–202.
Aygun B, Padmanabhan S, Paley C, Chandrasekaran V. Clinical significance of RBC alloantibodies and autoantibodies in sickle cell patients who received transfusions. Transfusion. 2002;42:37–43.
Lasalle-Williams M, et al. Extended red blood cell antigen matching for transfusions in sickle cell disease: a review of a 14-year experience from a single center (CME). Transfusion. 2011;51:1732–9.
Wagner FF, Frohmajer A, Flegel WA. RHD positive haplotypes in D negative Europeans. BMC Genet. 2001;2:10.

Acknowledgements

We thank our colleagues at Bloodworks NW and the Nickerson Lab for their advice and assistance, particularly Danielle Drury-Stewart, Thomas Walsh, Colleen Lammers, Ken Setran, Yanyun Wu, James Zimring, Karen Nelson, Barbara Konkle, Colleen Davis, Stephanie Krauter, Josh Smith, Peggy Robertson, Steven Lee, and Qian Yi. This study was supported by an NHLBI RS&G Pilot Project (HHSN268201100037C) and a Cardiovascular Research Training Grant. Whole-genome sequencing (WGS) for the TOPMed program was supported by the NHLBI. WGS for the Jackson Heart Study (JHS) (phs000964.v1.p1) was performed at the University of Washington (UW) Northwest Genomics Center (HHSN268201100037C). Centralized data harmonization was provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1). Phenotype harmonization and general study coordination were provided by the TOPMed Data Coordinating Center (3R01HL-120393-02S1). We gratefully acknowledge the studies and participants who provided biological samples. The JHS is supported and conducted in collaboration with Jackson State University (HHSN268201300049C and HHSN268201300050C), Tougaloo College (HHSN268201300048C), and the University of Mississippi Medical Center (HHSN268201300046C and HHSN268201300047C), under contracts from the NHLBI and the National Institute for Minority Health and Health Disparities (NIMHD).

Author information

Authors and Affiliations

University of Washington, School of Medicine, Department of Genome Sciences, Seattle, Washington, USA
Marsha M. Wheeler PHD, Chris Frazar MSc, Jason G. Underwood PHD, Tristan Shaffer BSc & Deborah A. Nickerson PHD
Bloodworks NW Research Institute, Seattle, Washington, USA
Kerry W. Lannert MT (ASCP), Helena J. Maki & Jill M. Johnsen MD
Bloodworks NW Specialty Diagnostics, Red Cell Genomics Laboratory, Seattle, Washington, USA
Haley Huston BSc, Shelley N. Fletcher BSc, Samantha Harris BSc, Gayle Teramura BSc & Meghan Delaney DO, MPH
Department of Medicine, University of Mississippi Medical Center, Jackson, Mississippi, USA
Adolfo Correa MD, MPH, PHD
Department of Laboratory Medicine, University of Washington, Seattle, Washington, USA
Meghan Delaney DO, MPH
Department of Epidemiology, University of Washington, Seattle, Washington, USA
Alex P. Reiner MD, MSc
Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, Mississippi, USA
James G. Wilson MD
Brotman Baty Institute for Precision Medicine, Seattle, Washington, USA
Deborah A. Nickerson PHD
Department of Medicine, University of Washington, Seattle, Washington, USA
Jill M. Johnsen MD

Authors

Marsha M. Wheeler PHD
Kerry W. Lannert MT (ASCP)
Haley Huston BSc
Shelley N. Fletcher BSc
Samantha Harris BSc
Gayle Teramura BSc
Helena J. Maki
Chris Frazar MSc
Jason G. Underwood PHD
Tristan Shaffer BSc
Adolfo Correa MD, MPH, PHD
Meghan Delaney DO, MPH
Alex P. Reiner MD, MSc
James G. Wilson MD
Deborah A. Nickerson PHD
Jill M. Johnsen MD

Consortia

NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium

Corresponding authors

Correspondence to Deborah A. Nickerson PHD or Jill M. Johnsen MD.

Ethics declarations

Disclosure

The authors declare no conflicts of interest.

Electronic supplementary material

Supplementary Table S1

Supplementary Table S4

Supplementary Table S5

Supplementary Table S8

Supplementary Table S9

Supplementary Information

About this article

Cite this article

Wheeler, M.M., Lannert, K.W., Huston, H. et al. Genomic characterization of the RH locus detects complex and novel structural variation in multi-ethnic cohorts. Genet Med 21, 477–486 (2019). https://doi.org/10.1038/s41436-018-0074-9

Received: 25 January 2018
Accepted: 16 May 2018
Published: 29 June 2018
Issue Date: February 2019
DOI: https://doi.org/10.1038/s41436-018-0074-9
