Genetic and mechanistic basis for APOBEC3H alternative splicing, retrovirus restriction, and counteraction by HIV-1 protease

Human APOBEC3H (A3H) is a single-stranded DNA cytosine deaminase that inhibits HIV-1. Seven haplotypes (I–VII) and four splice variants (SV154/182/183/200) with differing antiviral activities and geographic distributions have been described, but the genetic and mechanistic basis for variant expression and function remains unclear. Using a combined bioinformatic/experimental analysis, we find that SV200 expression is specific to haplotype II, which is primarily found in sub-Saharan Africa. The underlying genetic mechanism for differential mRNA splicing is an ancient intronic deletion [del(ctc)] within A3H haplotype II sequence. We show that SV200 is at least fourfold more HIV-1 restrictive than other A3H splice variants. To counteract this elevated antiviral activity, HIV-1 protease cleaves SV200 into a shorter, less restrictive isoform. Our analyses indicate that, in addition to Vif-mediated degradation, HIV-1 may use protease as a counter-defense mechanism against A3H in >80% of sub-Saharan African populations.

M ost humans have seven APOBEC3 (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3, A3) genes organized in tandem on chromosome 22 and located between conserved chromobox genes CBX6 and CBX7 1 . These genes, A3A-D and A3F-H, encode members of the APO-BEC family of polynucleotide cytosine deaminases (reviewed in refs. 2,3 ). The hallmark biochemical activity of this family is single-stranded DNA cytosine-to-uracil deamination, although the family namesake (APOBEC1) and at least one A3 enzyme (A3A) also have the capability to edit RNA cytosines 4,5 . These enzymes have diverse functions in biology ranging from transcriptome editing (APOBEC1 and A3A), antibody gene diversification (AID), and restriction of a broad number of DNAbased parasites (APOBEC1, AID, and A3A-H) (reviewed by refs. 2,3,[6][7][8]. The principle function of A3 enzymes in innate immunity is evidenced by high degrees of genetic variation including large and small changes both between and within species. For instance, gene copy numbers vary from 1 gene in rodents (encoding 2 deaminase domains) 9 , 7 genes in most humans (11 deaminase domains) 9,10 , and up to 13 genes in bats (18 deaminase domains) 11 . Many humans lack the entire A3B coding sequence due to a 26.5 kb deletion 12 and/or A3H function due to a 3 bp deletion removing an essential amino acid 13 . Additionally, there are coding and non-coding single-nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (indels), which cumulatively account for high rates of positive selection (reviewed by refs. [14][15][16]. The best-studied A3 substrate is the retrovirus HIV-1 (reviewed by refs. 6,7,17 ). Four different A3 enzymes, A3D, A3F, A3G, and A3H, have the capacity to package into assembling viral particles, track within particles until a new host cell becomes infected, and then target virus replication intermediates, e.g., [18][19][20] . Each A3 enzyme can interfere with the overall reverse transcription process to varying degrees by at least two distinct mechanisms. The first is a deamination-independent process by which A3 enzymes bind viral genomic RNA and directly impede reverse transcription, e.g., [21][22][23] . The second is a deaminationdependent process in which viral cDNA cytosines are converted to uracils, which become immortalized by reverse transcription into genomic strand G-to-A mutations, e.g., [24][25][26] . Left unchecked, mono or combinatorial A3 restriction activity can fully suppress virus replication, e.g., [27][28][29] . Therefore, it is not surprising that HIV-1 and related lentiviruses have an A3 counter-defense mechanism governed by the viral protein Vif, which nucleates the formation of an E3 ubiquitin ligase complex to degrade cellular A3 enzymes and prevent access to susceptible viral replication intermediates (reviewed by refs. 6,7,17 ). Thus, during every round of virus replication, there is a continuous interaction between A3 enzymes attempting to restrict virus replication and Vif working to neutralize the A3 enzymes and preserve viral genomic integrity. The balance between restriction and counteraction is most likely to be sensitive to perturbation as proviral sequences from patients often have G-to-A mutations, e.g., 30,31 , and A3 mutagenesis has been linked to virus evolution including immune escape and drug resistance [32][33][34][35][36] .
A3H is the most variable A3 gene in the human population. Four SNPs and one indel have been reported within A3H coding regions: N15/ΔN15, R18/L18, G105/R105, K121/D121, and E178/ D178 13,37 . These natural variations have been organized nonrandomly into seven different haplotype blocks (I-VII) 13, 38 and two in particular have a major impact on A3H function. The deletion of N15 common to haplotypes III, IV, and VI results in a non-functional protein, likely due to loss of structural integrity and rapid degradation (i.e., undetectable by immunoblots and activity assays) 13,19,23,38 . G105 occurs in haplotypes I and VI (although inconsequential in the latter due to ΔN15) and compromises antiviral activity in large part due to poor expression and altered subcellular localization 13,19,[38][39][40][41] . Only haplotypes II, V, and VII have demonstrated potent restriction activity against Vif-deficient HIV-1 in model systems 19,20,38,39,42 and, due to a high global allele frequency and strong geographic biases, only A3H haplotype II is likely to contribute to HIV-1 restriction and mutagenesis in vivo 19,20,43,44 . For instance, although the global frequency of the A3H haplotype II is 20.9%, the allele frequency in HIV-1 pandemic areas of sub-Saharan Africa is 57%, which means 82% of this population has at least one allele of this HIV-1 restrictive haplotype.
A3H is further diversified by alternative splicing, with four reported splice variants (SV154, SV182, SV183, SV200) 39 . SV154 is predicted to be corrupted structurally and non-functional, SV182 and SV183 are formed by utilization of tandem alternative splice acceptors and have been shown to be similar functionally in prior experiments, and SV200 has a C-terminal extension that could potentially influence multiple activities relevant to HIV-1 restriction 23,39 . Here we endeavor to connect genotype to phenotype by asking whether a SNP (or multiple SNPs) within the A3H gene is responsible for alternative splicing and production of SV200. Genetic association and reporter gene studies combined to demonstrate that a single non-coding variation, a 3 nucleotide deletion in intron 4 (Δctc), is embedded within the haplotype II block and responsible for SV200 mRNA expression. Immunoblots of lymphoblastoid cell lines (LCLs) and peripheral blood mononuclear cells (PBMCs) showed that the SV200 enzyme is expressed in cells with haplotype II genotypes. HIV-1 infectivity experiments showed that the A3H haplotype II SV200 enzyme is at least fourfold more restrictive than other splice variants and, interestingly, that this enhanced activity is counteracted by a single cleavage event catalyzed by HIV-1 protease inside virus particles. Thus, our studies have uncovered a novel proteasedependent mechanism used by HIV-1, in addition to Vif, to counteract A3H-mediated restriction. This cleavage mechanism may be required by transmitting viral isolates in pandemic regions such as sub-Saharan Africa to counteract the increased restriction potency of cells with haplotype II genotypes.

Results
A3H genetic variation in humans. Human A3H is polymorphic at five codons, and these variants are organized non-randomly into seven different haplotype blocks 13,38 (Fig. 1a, b and Table 1). To search for additional haplotypes, we analyzed the A3H genotypes of 2504 individuals from the 1000 Genomes Project 45 . These analyses confirmed six of the previously described A3H haplotypes (not haplotype VII), and also revealed the existence of six additional minor haplotypes, VIII through XIII. It is noteworthy that three of these new haplotypes have allele frequencies that exceed those of three previously described haplotypes (Table 1). Thus, humans collectively express at least 12 different A3H haplotypes out of the maximum number of 32 combinations (i.e., tight genetic linkage prevents random assortment by recombination of the two different alleles for each of the five variable codons).
A3H haplotypes I and II are spliced differentially. Our analyses of 1000 Genomes Project data confirmed the prevalence of haplotypes I, II, III, and IV in human populations 13,38 (i.e., allele frequencies > 1%; Table 1 and Fig. 2a). A3H haplotypes III and IV are not detectable at the protein level, even upon exogenous overexpression 13,19,38,39 . This is due to a single-amino acid deletion, ΔN15, which destabilizes the overall protein structure 23 . Therefore, because A3H ΔN15 haplotypes III and IV are unlikely to have functional importance, we focused further computational and biological studies on haplotypes I and II, which have global allele frequencies of 46.4 and 20.9%, respectively. Moreover, due to geographic biases in A3H haplotype distributions, average allele frequencies vary dramatically in different regions of the world with, for instance, haplotype I constituting 65% in Asia and haplotype II 57% in sub-Saharan Africa (Fig. 2a).
Human A3H has 4 known splice variants (SV) that translate into proteins with 182, 183, 200, and 154 amino acids 39 (Fig. 1a,  b). Genetic polymorphisms often underlie alternative splicing events [46][47][48] . We therefore asked whether A3H haplotypes I and II are spliced differentially. To answer this question, we first surveyed RNAseq read depths of A3H exons in EBVimmortalized lymphoblastoid cell lines (LCLs) from 20 homozygous haplotype I and 20 homozygous haplotype II donors selected randomly from the 1000 Genomes Project 45 . These RNAseq profiles revealed a major difference between mRNA species with respect to exon 4b (2 representative profiles in Fig. 2b and 40 in Supplementary Fig. 1). Exon 4b is mostly excluded from mRNA species of A3H haplotype I LCLs but is frequently included in mRNAs of A3H haplotype II LCLs. On average, exon 4b is expressed fivefold more frequently in A3H haplotype II versus haplotype I LCLs (15.8 ± 1.2 versus 3.0 ± 0.2; n = 20 LCLs representing each haplotype; p = 6 × 10 −9 , two-tailed Student's t-test).
These results predicted that homozygous A3H haplotype II individuals will express SV182, SV183, and SV200, whereas homozygous A3H haplotype I individuals will only express the former 2 splice variants. To test this prediction, Geuvadis Project 49 RNAseq data sets were analyzed for A3H haplotypes and mRNA levels. As predicted, LCLs from individuals homozygous for A3H haplotype I predominately express SV182 and SV183 at similar frequencies, each~45% (Fig. 2c). In contrast, LCLs from individuals homozygous for A3H haplotype II express SV182, SV183, and SV200 at statistically indistinguishable frequencies, each~32% (Fig. 2c). Moreover, A3H haplotype I and II heterozygous LCLs show intermediate levels of all three splice variants. These data strongly indicate that A3H differential splicing has a genetic basis, most likely due to one or more SNPs/indels within the locus itself.     The near equal expression of SV182 and SV183 (regardless of A3H haplotype) is most likely due to a splice acceptor slippage mechanism, in which the splicing apparatus has an equal chance of using AG dinucleotide splice acceptor sites arranged tandemly at the beginning of exon 5 (5′-gag-cag in Fig. 1a). Analogous acceptor slippage events have been documented for other genes, e.g., ref. 50 . The fourth splice variant, SV154, excludes exon 4 and includes exon 4b, and it is only observed at low levels regardless of A3H haplotype (<2-3%). Moreover, SV154 is predicted to encode a non-functional enzyme due to a loss of multiple conserved structural elements 23 (Fig. 1b). Therefore, studies hereon focus exclusively on SV182, SV183, and SV200.
A3H haplotype II expresses three distinct proteins. To investigate whether these mRNA splicing differences manifest at the protein level, whole-cell lysates from LCLs homozygous for A3H haplotypes I and II were immunoblotted with a rabbit anti-A3H polyclonal antibody (Fig. 2d) and independently with a mouse anti-A3H mAb P1D8-1 that binds a shared N-terminal epitope 19 ( Supplementary Fig. 2). Strong signals were evident for SV182/ 183 proteins in all A3H haplotype I and II LCLs, and a less intense but clearly detectable signal was observed for the SV200 enzyme only in LCLs with at least one A3H haplotype II allele (Fig. 2d). The same antibodies detected all three splice variants in Vif-deficient HIV-1 infected PBMCs from A3H haplotype II individuals, but only the shorter two splice variants in haplotype I specimens ( Fig. 2e and Supplementary Fig. 2). Endogenous A3H haplotype I is expressed at lower levels than haplotype II in both LCLs and PBMCs as reported 19 (compare SV182/3 band intensities for homozygous specimens in Fig. 2d, e). However, despite the technical challenge of lower expression levels, A3H haplotype I samples yielded clear signals for SV182/3 enzymes and no signal for SV200, even at highest exposure levels ( Fig. 2d, e). Importantly, quantification of the amount of SV200 in immunoblots from homozygous haplotype II PBMCs and LCLs showed that this variant comprises 30-35 and 16-43%, respectively, of total A3H protein (n = 4 and n = 8 independent experiments, respectively; Fig. 2d, e and additional blots not shown). This result is concordant with SV200 constituting one-third of total A3H haplotype II splice products (Fig. 2c) and approximately one-third of total cellular A3H protein (Fig. 2d, e). Together, these data demonstrate that the SV200 enzyme is a major expressed splice variant with similar opportunities (based on abundance) to contribute to antiviral immunity as SV182 and SV183.
SNPs associated with A3H exon 4b alternative splicing. To investigate the mechanism responsible for A3H alternative splicing, we first tested the hypothesis that differential splicing may be due to one of the known coding SNPs G105/R105, K121/D121, and/or E178/D178 that distinguish A3H haplotypes I and II (Fig. 1). RNAseq data were used to quantify expression levels of all 4 A3H splice variants in donors with homozygous haplotype III or IV genotypes. Haplotypes III and IV are identical to haplotype II with respect to these three coding SNPs (Table 1). Accordingly, haplotypes III and IV LCLs would be predicted to   IV   II   III  I   IV   IV   III   III  II   II   I   I   IV   IV   II   III   II   III  I   I   60   50   20   30   40   10   0  SV182  SV183  SV200  SV154  SV182  SV183  SV200  SV154  SV182  SV183  SV200  express SV200 at levels comparable to haplotype II LCLs. However, these analyses showed that haplotypes III and IV LCLs express much lower levels of SV200 mRNA indicating that none of these coding SNPs is responsible for alternative splicing and expression of SV200 (compare Fig. 2c and Supplementary Fig. 3). We next asked whether a non-coding SNP or indel may be responsible and performed an unbiased association analysis of 5488 SNPs covering the full A3 locus (n = 461; Supplementary Data 1). The top candidate was a 3 bp polymorphism upstream of exon 4b (ctc/Δ, rs149229175; Fig. 3a). The correlation between ctc copy number and SV200 expression is highly significant (P regression = 8 × 10 −81 , ANOVA; Fig. 3b). With few exceptions, likely due to rare recombination events, we found that the ctc deletion SNP is highly specific to A3H haplotype II. Cells homozygous for the ctc deletion (mostly also homozygous haplotype II) express the highest levels of SV200. In contrast, cells homozygous for ctc (mostly haplotypes I, III, and IV) have undetectable or low levels of SV200 expression. As expected, heterozygous cells express intermediate SV200 levels (Fig. 3b).
Phylogenetic analyses indicated that the non-coding ctc deletion (Δctc) is a unique and defining feature of the A3H haplotype II block (Fig. 3c). This deletion is absent in other human A3H haplotypes and in the corresponding genomic DNA sequences of non-human primates, chimpanzees and rhesus macaques ( Fig. 3c and Supplementary Fig. 4). Moreover, as expected, transcriptomic data from these species failed to show evidence for inclusion of exon 4b or expression of longer A3H transcripts ( Supplementary Fig. 4). Interestingly, Δctc is also present in archaic genomes (Neanderthals and Denisovans; Supplementary Fig. 4). Original studies by the Emerman and Malik groups deduced that the ancestral hominid A3H gene encoded a stable protein, and that two different unstable variants emerged independently in humans (R105G in haplotype I and ΔN15 in haplotypes III and IV) 13 . Our analyses indicate that either Δctc emerged as a result of a similar independent event or was introduced to modern haplotype II humans through recombination with an archaic genome (see Discussion for possible mechanism and origins).
An additional feature of this region of the A3H locus conserved in primates is a 195 bp antisense LINE1 insertion, L1PA17 51 , which spans the entirety of exon 4b (Fig. 3a). It is therefore tempting to propose that the intronic Δctc may have been the trigger that enabled exonization of this portion of this LINE1 sequence and, simultaneously, fortification of human innate immune function by creating a new A3H enzyme variant with differential antiviral activity.
Molecular mechanism of A3H exon 4b alternative splicing. To directly test the hypothesis that the intronic ctc deletion promotes alternative splicing to exon 4b, we constructed an A3H minigene with A3H exons 1-4, intron 4 with and without ctc (both including exon 4b), and exon 5 including the natural 3′ untranslated region (Fig. 4a). Transient minigene expression in HeLa and MCF7 cell lines yielded all 3 splice variants including SV200 (Fig. 4b). SV200 was only expressed from the Δctc construct demonstrating that this 3 bp deletion is essential for exon 4b inclusion and SV200 expression. The deleted ctc is part of an extended polypyrimidine tract, 5′-ttctcctctc-3′, which is a predicted binding site for several splicing factors including PTBP1, U2AF2, hnRNP C, and SRSF5 [52][53][54][55][56][57][58] . A reduction from 10 to 7 consecutive pyrimidines is likely to change the repertoire of splicing factors bound to this site, influence interactions with other splice factors, and enable inclusion of exon 4b or, alternatively, weaken an exon 4b exclusion mechanism. It is additionally notable that the ratio of SV200 to SV182/3 protein produced from the minigene is lower than that of the same Δctc c a c t a c c t g t t t c t t t t t t g t g t t c t c c t c t c g c a c c a g g g g c t c c t t c t t g t t c t t t t a g a t t c . . . c a c t a c c t g t t t c t t t t t t g t g t t c t c t c g c a c c a g g g g c t c c t t c t t g t t c t t t t a g a t t c . The discovery of SV200 cleavage by viral protease. To directly compare the HIV-1 restriction capabilities of untagged A3H haplotype II SV182, SV183, and SV200, dose-response experiments were done with varying amounts of each splice variant against Vif-deficient HIV-1 (strain LAI) 23 . All three splice variants expressed well in 293T cells as evidenced by similar immunoblot band intensities (Fig. 5a). All three A3H splice variants also packaged into virus particles and, like A3G, elicited a dose-dependent restriction of virus infectivity (Fig. 5a). However, A3H SV200 was invariably found in two distinct forms in viral particle immunoblots, despite being monomorphic in whole-cell lysates (Fig. 5a). The slower migrating band appeared at the expected molecular weight for the full 200 residue protein (23 kDa), and the virus particle-specific faster migrating band appeared between those of SV182/3 (21 kDa) and SV200. These unexpected results suggested that a protease specifically cleaves SV200 inside HIV-1 particles. To test this possibility, Vifdeficient HIV-1 restriction experiments were done with SV183 and SV200 in the absence or presence of varying amounts of darunavir (DRV) or atazanavir (ATV), which are potent and highly selective HIV-1 protease inhibitors (Fig. 5b). Following a PBS wash 6 h post-transfection, protease inhibitor or vehicle control was added to the virus-producing cells and after an additional 42 h of incubation virus-containing supernatants were collected for analysis. Neither protease inhibitor affected cellular or viral levels of A3H-II SV183, but both drugs clearly prevented, in a dose-dependent manner, the accumulation of the smallersized form of A3H-II SV200 in viral particles. The protease inhibitor treatments were robust as they also prevented Gag processing and suppressed virus infectivity. Taken together, these data demonstrated that HIV-1 protease specifically cleaves A3H SV200 into a smaller form within viral particles.
Cleavage of A3H haplotype II SV200 by HIV-1 protease is most likely occurring near the C-terminus between residues 183 and 200 because this polypeptide region is unique to this splice variant (i.e., the N-terminal 181 residues are shared between all three splice variants and SV182 and SV183 are unaffected in viral particles; alignment in Fig. 1 and immunoblots in Fig. 5). The HIVcleave webserver 59 predicted a high-confidence cut site between amino acid positions 186 and 187 within the unique C-terminal extension of A3H SV200. To test this prediction, single-alanine mutants of SV200 spanning residues 184 to 190 were used in virus particle immunoblot experiments (excluding 187 which is naturally alanine). Most of these single-amino acid changes showed altered rates of proteolytic cleavage with the strongest phenotypes being hypo-cleavage (V185A) and hypercleavage (R186A; Fig. 5c). Additional single-amino acid substitution mutants and a panel of SV200 truncation mutants (stop codons at positions 185, 186, 187, and 188) provided further support for HIV-1 protease acting at this site ( Supplementary  Fig. 5). In addition, A3H haplotype II SV200 cleavage by HIV-1 protease is likely to be a conserved function because representative lab strains (subtype B) and clinical HIV-1 isolates from subtypes B and C (transmitted founders and chronic circulating isolates), but not SIVmac239, showed the same cleavage event within particles (Fig. 5d).

A3H-II SV200 hyperactivity is attenuated by HIV-1 protease.
Prior studies comparing the HIV-1 restriction activities of the different human A3H splice variants were constrained by a need to use epitope tags and additionally limited because A3H amounts and particularly SV200 levels were not quantified per unit incorporated into viral particles 39,42,44,60 . Moreover, experiments above in Fig. 5 indicated similarly strong Vifdeficient HIV-1 restriction activities for SV182/3 and SV200 but the activity of the latter splice variant was likely underestimated due to lower enzyme amounts in viral particles (and potentially also to proteolytic cleavage).
We therefore sought to quantify the Vif-deficient HIV-1 restriction activities of untagged A3H haplotype II SV200, a hypo-cleavable mutant SV200 V185Q, a hyper-cleavable mutant SV200 R186A, and SV183 in a series of single-cycle experiments (mutants from Fig. 5c and Supplementary Fig. 5). In these titration experiments, restriction activities were quantified as a function of the amount of encapsidated A3H normalized to p24 ( Fig. 6 and Methods). Interestingly, A3H SV200 and the hypocleavable SV200 derivative V185Q showed the strongest restriction activities per unit of encapsidated protein, whereas SV183 showed the weakest and the hyper-cleavable variant R186A showed intermediate levels (quantification in Fig. 6a of immunoblots from three biologically independent experiments shown in Fig. 6b). In quantitative terms, A3H SV200 is fourfold more restrictive than SV183 and at least twofold more restrictive than the hyper-cleavable variant R186A (Fig. 6). Thus, altogether, these data indicate that HIV-1 protease functions to attenuate the potent hyper-restriction activity of A3H SV200.

Discussion
A3H is the most variable human A3 gene with five biallelic amino acid positions that make-up at least 12 different haplotype blocks (including 6 new ones reported here). Additional A3H diversity occurs through four different splice variants. Here we show that the splice variant encoding the longest protein, SV200, is expressed as a result of a non-coding 3 nucleotide deletion (Δctc)  Fig. 4 Influence of non-coding intron 4 ctc versus Δctc on A3H minigene expression. a Schematic of the A3H minigene construct with key elements labeled. The minigene haplotype I intron has ctc (green) and the haplotype II intron has Δctc (blue). These constructs are otherwise identical. b Immunoblots of HeLa and MCF7 whole-cell extracts expressing the indicated A3H minigene constructs (representative of two independent experiments). Low and high exposures are provided to show that SV200 expression is specific to the haplotype II minigene. An anti-TUBULIN (TUB) blot is shown as a loading control unique to the haplotype II. The other major splice variants, SV182/3, occur at similar frequencies in all major haplotypes and are therefore not haplotype-specific. Quantification of Vifdeficient HIV-1 restriction activity showed that SV200 is at least fourfold more potent than that of the shorter splice variants. Thus, the A3H haplotype II-specific Δctc is the first non-coding polymorphism shown to change the amino acid composition and affect the function of a human APOBEC enzyme. Furthermore, functional characterization of this splice variant led to the surprising finding that HIV-1 protease specifically cleaves SV200 into a shorter 186 amino acid isoform. Additional Vif-deficient HIV-1 restriction experiments with hypo-cleavable and hypercleavable single-amino acid mutant derivatives of A3H haplotype II SV200 indicated that this cleavage event attenuates antiviral activity. Thus, these studies combined to show that HIV-1 protease provides an additional layer of protection against restriction by A3H haplotype II, in addition to the established Vif-mediated proteasomal degradation mechanism.
A3H is the only A3 gene with a repetitive element that contributes to the coding sequence (LINE L1PA17, Fig. 1a and Fig. 3a). Phylogenetic comparisons of human and non-human primate A3H genes indicate that this insertion occurred early in the primate lineage, most likely prior to the evolutionary divergence of known primates (Fig. 3c). This conclusion is supported by a lack of a homologous LINE1 insertion in the A3Z3 gene (A3H homolog) of non-primate mammalian lineages (DNA sequence alignments not shown). The specificity of the intronic Δctc to A3H haplotype II (apart from rare recombinants) and the highly significant correlation between the occurrence of this deletion variant and alternative splicing to include exon 4b and ultimately produce the A3H haplotype II SV200 enzyme (Figs. 2c,  3b) combine to support a model in which the deletion itself may have been the final genetic step in creating exon 4b from LINE1 sequence (i.e., birth of a new exon). Further work will be needed to determine if the Δctc promotes an exon inclusion mechanism or suppresses an exon exclusion mechanism because   [52][53][54][55][56] . However, to our knowledge, this is the first example of LINE1 exonization of a primate A3 gene. This is ironic given that A3H has been shown to elicit strong LINE1 restriction activity 13,61,62 . DNA polymerases are known sources of indels, particularly in repetitive sequence tracts 63 . The original Δctc may therefore have been caused by a DNA polymerase slippage event that removed one of the three possible ctc trinucleotides from the larger polypyrimidine tract 5′-tt(ctc)(ct(c)tc)-3′ (brackets demark possibilities). The timeframe in which this event occurred is difficult to pinpoint but most likely prior to the global radiation of the four most common A3H haplotypes (I, II, III, and IV; Fig. 2a). The original Δctc may have independently occurred one or more times in an ancestral A3H haplotype II population, as it does not exist (apart from rare recombinants) in other modern human A3H haplotype populations. An alternative hypothesis is that Δctc emerged in modern hapII humans as a result of recombination with an archaic genome. An analogous Δctc variant exists in Neanderthal and Denisovan genomic DNA sequences (Supplementary Fig. 4). Moreover, within the 7 kb A3H gene there are an additional 13 SNPs specific to archaic and modern humans with the haplotype II genotype. In contrast only one SNP is shared between archaic and modern humans with the haplotype I genotype. Altogether, these data suggest that Δctc may have an archaic origin. However, further studies are needed to pinpoint the exact source of this important variant.
Regardless of the exact window in time that the Δctc variant entered the human A3H haplotype II population, it was most likely long before HIV-1 became a pandemic virus (i.e., multiple thousands of years ago versus hundreds of years ago for HIV-1 zoonosis 64 ). This suggests that the antiviral activity of A3H haplotype II SV200 is not limited to HIV-1 and may be considerably broader, consistent with the current working model for A3-mediated innate immunity in which multiple enzymes combine to form a broad, overlapping defense (reviewed by refs. 2,3 ). We speculate that the emergence of Δctc in ancestral humans may have originally conferred resistance to an ancestral human pathogen. Such resistance may have conferred a selective advantage to ancestral A3H haplotype II humans and promoted their enrichment in the overall population and particularly in sub-Saharan Africa where A3H haplotype II is currently dominant (57% allele frequency equating to 82% of the population with at least 1 copy, versus other parts of the world such as a 7-10% allele frequency in Asians and Europeans, respectively equating to 13-19% of the population with at least one copy; Fig. 2a). Haplotype-specific differential splicing has the potential to have a significant contemporary global impact because haplotype II individuals will have a larger, and potentially stronger, antiviral A3H repertoire, overall A3 protein repertoire, and therefore potential to restrict a broader array of pathogens. It is not hard to imagine a scenario, for instance, in which populations with A3H haplotypes I, III, and IV may be more susceptible to a virus (or DNA-based parasite) than those with A3H haplotype II. A3H heterozygotes may have an even greater advantage given significant amino acid and functional differences between haplotypes.
The increased potency of A3H haplotype II SV200 against Vifdeficient HIV-1 and counteraction by HIV-1 proteolytic cleavage strongly suggests that this variant enzyme poses a significant threat to the virus. It is tempting to speculate that the viral protease-dependent A3 counteraction mechanism may have been part of the overall process of HIV-1 adaptation to the human population (especially given the dominance of this allele in sub-Saharan Africa corresponding with the likely geographical origin HIV-1 in humans 64 ). This idea is supported conservation of this activity amongst multiple different HIV-1 strains/sub-types but not for the rhesus macaque virus, SIV mac239 (Fig. 5d). Moreover, it may be a human-specific fortification of the conserved Vifdependent A3 degradation mechanism because the Δctc variant, intron 4b inclusion, and SV200 production are all unique features of the human A3H gene that do not occur in non-human primate A3H genes. Although this is the first example of a human retroviral protease cleaving a human A3 enzyme, there are at least two other reports of a retroviral protease cleaving a host A3 enzyme (i.e., FIV protease cleaving feline A3 65 and MLV protease cleaving murine A3 66 ). Thus, we propose that viral proteases may provide an additional layer of protection against restriction by A3 enzymes and, depending on the virus and host species, may constitute either a primary or a secondary counteraction mechanism.

Methods
Bioinformatic analyses. We first determined the A3H genotype of 2054 individuals from the genetic polymorphism data set of the 1000 Genomes Project, Phase 3. 461 of these individuals were represented by RNAseq data reported by the Geuvadis Project 49 . Using these two data sets we identified and randomly picked 20 homozygous A3H haplotype I and 20 homozygous haplotype II and inspected their RNAseq BAM files in the Integrative Genomics Viewer (IGV) tool ( Supplementary  Fig. 1 Fig. 6 Quantification of HIV-1 restriction for A3H haplotype II SV200 and hypo-and hyper-cleavable derivatives. a Dose-response summary of infectivity data for Vif-deficient HIV-1 LAI produced in 293T cells in the presence of the indicated untagged A3H splice variants (12.5 ng, 25 ng, 50 ng, or 100 ng) and infectivity quantification by flow cytometry of infected CEM-GFP reporter T cells. Each data point represents the mean of three biologically independent experiments. The percent restriction values (yaxis) were obtained by normalization to no A3 control reactions (mean ± SEM). Relative A3H abundance in viral particles (x-axis) was determined by normalization to p24 levels (mean ± SEM; see main text and methods for additional details). b Immunoblots of A3H and p24 in virus particle lysates for the three biologically independent experiments quantified in (a) the read depth of this exon defined as the sum of read depths at all bases of exon 4b normalized by the read depth of exon 3 for each sample. The average percentage read depth of exon 4b was 15.8 with a 95%CI of 1.2 in haplotype II cells and 3.0 with a 95%CI of 0.2 in haplotype I cells.
Using the SAMtools program 67 we sorted and indexed the RNAseq BAM files and then used the Cufflinks program 68 to assemble A3H transcripts and estimate their abundance in each individual. The information of known A3H transcripts was provided to the Cufflinks program as a GFF3 file containing ENSMBL annotated A3H transcripts ENST00000421988 (SV-154), ENST00000348946 (SV-182), ENST00000442487 (SV-183), and ENST00000401756 (SV-200). The output of this analysis for each A3H transcript of each individual is a FPKM (Fragments Per Kilobase of transcript per Million mapped reads), which is a measure of the abundance of each transcript. We presented these expression data as percentages such that the sum of FPKM values of all A3H transcripts within an individual sample is equal to 100.
To identify the genetic source of differential A3H haplotype II SV200 expression, we performed a global correlation analysis between the expression of SV200 in 461 donors (Geuvadis Project LCLs) and the number of copies of each allele in each of the 5488 polymorphic positions in the A3 locus (human chromosome 22 region spanning A3A-A3H plus 1000 nucleotides upstream of A3A and downstream of A3H).
A phylogenetic analysis was done using a multiple alignment of full-length A3H gene sequences from human haplotypes I (hg reference), II, III, IV, chimpanzee, and gorilla. The A3H genes of each of these species as well as that of rhesus macaque were extracted directly from the UCSC Genome Browser or deduced from known variants using the 1000 Genomes Project Database. The Neanderthal and Denisovan A3H sequences were inferred from their corresponding UCSC Genome Browser tracks.
Endogenous A3H immunoblots. LCLs were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research. One archived PBMC specimen used in Supplementary Fig. 2 came from a HIV-1 uninfected Ugandan participant in the Partners in Prevention HSV/HIV Transmission Study 69 . Regulatory review for this is covered by protocol review and approval through institutional review boards of the Ugandan HIV/AIDS Research Committee (NARC), the Ugandan National Council for Science and Technology (UNCST), and the University of Washington (UW), with the University of Minnesota added through modification of the UW approval. Additional PBMC specimens were isolated from mononuclear cells enriched Trima Cones (Memorial Blood Centers, St. Paul, Minnesota) using Ficoll-Paque Premium (GE Healthcare). These additional specimens were classified as exempt from review by the University of Minnesota Institutional Review Board (IRB) because donors are deidentified by Memorial Blood Centers. Cells were stimulated with 0.5 μg/mL anti-CD3 antibody (clone UCHT1, R&D System), 0.1 μg/ mL anti-CD28 antibody (clone CD28.2, eBioscience) and 20 U/mL interleukin-2 (IL-2) (Miltenyi Biotec) in R20 medium (RPMI with 20% FBS and 1% P/S) for 3 days. Fluorescein (FITC) conjugated antibodies against CD8 and CCR7, as well as Phycoerythrin (PE) conjugated antibodies against CD4 and CD45RO were used to verify composition and stimulation of cells. PBMCs were maintained in R20 medium with 20 U/mL IL-2. Vesicular stomatitis virus G protein (VSV-G) pseudotyped viruses were generated by transfecting 8 μg of proviral HIV-1 IIIB VifX26 × 27 (Vif-deficient) expression construct and 2 μg of VSV-G expression construct into 293T cells. Viral titers were quantified using the CEM-GFP reporter cell line 27 . PBMCs were infected at a 0.5 MOI and harvested 10 days post-infection. LCLs, non-infected, and infected PBMCs, respectively, were lysed and subjected to immunoblot analysis. For immunoblotting, cells were pelleted, washed with 1× PBS, and then lysed with 2.5× Laemmli sample buffer. Lysates were then subjected to SDS-PAGE followed by protein transfer to PVDF using a Bio-Rad Criterion system. Membranes were probed with a mouse anti-A3H mAb P1D8 19 , a rabbit anti-A3H pAb (NBP1-91682, Novus), a rabbit anti-A3G (NIH AIDS Reagent Program 10201 courtesy of Dr. Jaisri Lingappa), and a mouse anti-tubulin mAb (Covance). Secondary antibodies were goat anti-rabbit IRdye 800CW (Li-COR 926-32211) and goat anti-mouse Alexa Fluor 680 (Molecular Probes A21057). Additional secondary antibodies were donkey anti-Ms IgG-HRP (Jackson Laboratory) and goat anti-Rb IgG-HRP (Bio-Rad). All primary antibodies were used at a 1:1000 dilution, and all secondary antibodies were used at a 1:10,000 dilution. Membranes were imaged using commercial chemiluminescence reagent (Thermo Scientific™ SuperSignal™ West Femto Maximum Sensitivity Substrate Cat#34095) and a LI-COR Odyssey instrument. Images were prepared for presentation using ImageJ. Uncropped immunoblot images are included as Supplementary Fig. 6.
A3H minigene alternative splicing experiments. To make the A3H minigene constructs, genomic DNA was extracted from a homozygous A3H haplotype II LCL sample (ID: HG00442; obtained from NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research) and A3H intron 4 was amplified by high-fidelity PCR, cloned into pJET1.2, and verified by DNA sequencing. Next, a gBlock was digested with XmaI/NcoI and ligated into the pJET1.2 A3H haplotype II intron 4 construct to introduce a unique BamHI site into A3H exon 3, which was then confirmed by DNA sequencing. A 'ctc' insertion to restore the intronic polypyrimidine tract was then performed using a standard site-directed mutagenesis protocol on the pJET1.2 A3H haplotype II intron 4 construct. Both constructs (±ctc) were then digested with BamHI/XbaI, and cloned into the base A3H haplotype II expression vector. Subsequently, the 3′-UTR from the HG00442 LCL genomic DNA was amplified, ligated into pJET1.2, and confirmed by DNA sequencing. The 3′-UTR was then added to both of the expression constructs (±ctc) using compatible AleI/XbaI cut-sites.
Cell and viral particle lysates were prepared for immunoblotting as follows. Cells were pelleted, washed with 1× PBS, and then lysed with 2.5× Laemmli sample buffer. Virus-containing supernatants were filtered and pelleted via centrifugation through a 20% sucrose cushion, then lysed in 2.5× Laemmli sample buffer. Lysates were then subjected to SDS-PAGE followed by protein transfer to PVDF using a Bio-Rad Criterion system. Membranes were probed with a mouse anti-A3H mAb P1D8 19 , a rabbit anti-A3H pAb (NBP1-91682, Novus), a mouse anti-V5 mAb (Invitrogen), a mouse anti-HIV-1 Vif (NIH AIDS Reagent Program 6459 courtesy of M. Malim), an anti-HIV-1 p24/CA mAb (NIH AIDS Reagent Program 3537), or a mouse anti-tubulin mAb (Covance). Secondary antibodies were goat anti-rabbit IRdye 800CW (Li-COR 926-32211) or goat anti-mouse Alexa Fluor 680 (Molecular Probes A21057). An additional secondary goat anti-Rb IgG-HRP pAb (Bio-Rad) was also used. All primary antibodies were used at a 1:1000 dilution, and all secondary antibodies were used at a 1:10,000 dilution. Membranes were imaged using commercial chemiluminescence reagent (Thermo Scientific™ SuperSignal™ West Femto Maximum Sensitivity Substrate Cat#34095) and a LI-COR Odyssey instrument. Images were prepared for presentation using ImageJ. Uncropped immunoblot images are included as Supplementary Fig. 6.
Quantification of the levels of A3H in virus particles was performed using ImageJ (v1.47). First, viral particle A3H and Gag (p24) levels were quantified. Next, abundance of A3H in particles was determined by dividing the amount of A3H by its corresponding amount of Gag. Then, the highest abundance of A3H was set to 100 (A3H SV183 in this case; hence no horizontal error bars for this sample only) and all other A3H abundances were normalized against this value.

Data availability
All A3H genomics and transcriptomic data sets used in this manuscript are available at 1000 Genomes (http://www.internationalgenome.org/data) and Jeuvadis (http://www. geuvadis.org) databases. The RNAseq data sets of non-human primates are available at the Nonhuman Primate Reference Transcriptome Resource (NHPRTR, www.nhprtr. org). A3H sequences of Denisovans and Neanderthals and non-human primates are available at the USCS Genome Browser (https://genome.ucsc.edu/). A full list of all primers used in this study is provided as Supplementary Table 1. The authors declare that all other data supporting the findings of this study are available within the article and its Supplementary Information files, or are available from the authors upon request.