Molecular outcomes, clinical consequences, and genetic diagnosis of Oculocutaneous Albinism in Pakistani population

Nonsyndromic oculocutaneous albinism (nsOCA) is clinically characterized by the loss of pigmentation in the skin, hair, and iris. OCA is among the most common causes of vision impairment in children. To date, pathogenic variants in six genes have been identified in individuals with nsOCA. Here, we determined the identities, frequencies, and clinical consequences of OCA alleles in 94 previously unreported Pakistani families. A combination of Sanger and exome sequencing revealed 38 alleles, including 22 novel variants, segregating with the nsOCA phenotype in 80 families. Variants of the TYR and OCA2 genes were the most common cause of nsOCA, occurring in 43 and 30 families, respectively. The twenty-two novel variants include nine missense, four splice site, two nonsense, one insertion and six gross deletions. In vitro studies revealed retention of OCA proteins harboring novel missense alleles in the endoplasmic reticulum (ER) of transfected cells. Exon-trapping assays with constructs containing splice site alleles revealed errors in splicing. As eight alleles account for approximately 56% (95% CI: 46.52–65.24%) of nsOCA cases, primarily enrolled from the Punjab province of Pakistan, hierarchical strategies for variant detection would provide feasible and cost-efficient genetic testing for OCA in families of similar origin. Thus, we developed tetra-primer ARMS assays for rapid, reliable, reproducible and economical screening of most of these common alleles.


Results
After institutional review board approval, we enrolled 94 Pakistani families segregating nsOCA (Supplementary Fig. S1). All affected individuals in these families have a congenital hypopigmentation phenotype. Inter-familial variation in hair color was noted, ranging from white to honey blonde or brown (Table 1). Similarly, iris color varied, with tones ranging from light grey to brown (Table 1). However, because of the limited facilities available in remote areas of Pakistan, detailed clinical evaluation of every affected person from every family was not possible; we therefore refrained from commenting on genotype-phenotype correlation for every variant. Through a combination of Sanger and whole exome sequencing, we identified 38 variants, including 22 novel variants, segregating with the phenotype in 80 families (Table 2). The 22 new variants include 9 missense, 4 splice site, 2 nonsense, 1 insertion, and 6 gross deletions (Table 2 and Fig. S2). None of these variants was found in ethnically matched control samples (Table 2 and Table S1). We also documented the frequencies of polymorphic alleles of nsOCA genes in our cohort (Table S2).
Missense variants altered the targeting of encoded OCA proteins. The nine novel missense variants include two alleles of TYR (OCA1), six of OCA2 (OCA2) and one of TYRP1 (OCA3). Three prediction programs, specifically PolyPhen-2 14, MutationTaster 15 and SIFT 16, suggested that each of these missense variants was deleterious (Table 2). To further evaluate the effects of these variants on the encoded proteins, we performed in silico molecular modeling and in vitro protein targeting studies.
TYR encodes tyrosinase in melanocytes, an essential enzyme for the biosynthesis of melanin 17 . Previously, it was shown that missense alleles of tyrosinase lead to ER retention of encoded protein due to misfolding 18 . To evaluate the targeting of p.Cys55Ser and p.Asp75Tyr harboring proteins, we introduced these variants in a GFP-tagged tyrosinase and transiently transfected human melanocytes. Wild type tyrosinase was localized throughout the cytoplasm of melanocytes (Fig. 1). Immunofluorescence studies with calregulin (an ER marker) demonstrated that the mutant proteins, however, predominantly co-localized with calregulin, indicating retention in the ER (Fig. 1). In contrast, p.(Asp86Tyr) missense variant did not affect the targeting of TYRP1 protein (Fig. 1).
OCA2 protein, with 12 putative transmembrane α-helices, belongs to the Na + /H + antiporter family. Besides its presumed role in maintaining the pH of the melanosomes [19][20][21][22] , OCA2 also participates in the sorting and transport of tyrosinase and tyrosinase-related protein 1 (TYRP1) to the plasma membrane [23][24][25] . We performed comparative computational modeling of wild type and six novel missense alleles harboring OCA2 proteins using Phyre2 26 program. All the identified missense alleles were predicted to impact either protein folding, interaction with the lipid-bilayer, protein topology, or protein-protein interactions (Fig. S3). When transiently transfected in HEK293 cells, in contrast to wild type, disease-associated alleles of OCA2 proteins showed retention in the ER (Fig. 2). Collectively, these studies support the deleterious nature of novel missense variants identified in our OCA cohort.
Ex vivo splicing is defective due to splice site variants identified in Pakistani families. To evaluate the impact of four new splice site variants elucidated in our cohort, we examined the RNA splicing pattern of wild type and mutated exons by transfecting minigene constructs in COS7 cells. Results of our splicing assays are summarized in Fig. 3. The c.1037-18 T > G variant in exon 3 of TYR generated an upstream cryptic splice acceptor site, which resulted in insertion of 17 base intronic region in the spliced product (Fig. 3). Retention of 17 base intronic region in the spliced mRNA would result in the frameshift and premature truncation of the encoded tyrosinase. In contrast, the c.1184 + 2 T > C change in exon 3 of TYR revealed loss of canonical splice donor site. Splicing assay revealed utilization of a cryptic donor site within exon 3, resulting in loss of 55 bp from the coding region (Fig. 3), predicted to cause frameshift and premature truncation of the encoded protein. Both splice site variants of OCA2 (c.1182 + 2 T > TT and c.1951 + 4 A > G) revealed skipping of their respective exons in the minigene splicing assays (Fig. 3). The skipping of exon 11 (66 bp) due to c.1182 + 2 T > TT leads to in-frame deletion of 22 amino acids, while the loss of exon 18 (109 bp) of OCA2 due to c.1951 + 4 A > G variant would cause frameshift and premature truncation.
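The reading-frame consequences described above follow from simple arithmetic: a net change in coding length that is a multiple of three keeps the frame, and any other change shifts it. The following minimal Python sketch checks the four reported events. The function name and the dictionary labels are illustrative only and are not part of the original analysis.

```python
def frame_effect(length_change_bp: int) -> str:
    """Classify a net change in coding length (bp) as in-frame or frameshift.

    Positive values are insertions, negative values are deletions.
    """
    if length_change_bp == 0:
        return "no change"
    return "in-frame" if length_change_bp % 3 == 0 else "frameshift"


# Net length changes as reported in the splicing assays (Fig. 3).
events = {
    "TYR c.1037-18 T>G: 17-base intronic insertion": +17,
    "TYR c.1184+2 T>C: 55 bp lost from exon 3": -55,
    "OCA2 c.1182+2 variant: exon 11 skipped (66 bp)": -66,
    "OCA2 c.1951+4 A>G: exon 18 skipped (109 bp)": -109,
}

for label, delta in events.items():
    effect = frame_effect(delta)
    note = ""
    if effect == "in-frame":
        # An in-frame change removes or adds a whole number of codons.
        note = f" ({abs(delta) // 3} codons)"
    print(f"{label}: {effect}{note}")
```

Only the 66 bp exon 11 skip is divisible by three, which matches the in-frame deletion of 22 amino acids stated in the text.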
Exome sequencing revealed six gross deletions in OCA families. Approaches to detect indels using exome sequence data are an active area of research, and as yet no single method guarantees consistent success. We used the widely evaluated methods XHMM 27 and CoNIFER 28 to identify gross insertions/deletions in our exome data. Our analyses revealed six novel gross deletions of the TYR and OCA2 genes segregating with the OCA phenotype in six families (Fig. 4). To investigate the mechanisms involved in these deletions, the intervals surrounding the breakpoints were analyzed with RepeatMasker (http://www.repeatmasker.org/). In addition, significantly over-represented motifs within ±15 bp of GRaBD translocations breakpoints were also sought 29. Thorough bioinformatics analysis revealed that the breakpoints for these deletions lie in repetitive elements, and that the highly similar Alu short interspersed nuclear elements (SINEs) may serve as the substrate for nonhomologous recombination.
Next, to analyze the impact of deletions on protein 3D structure, we used Phyre2 modeling software. Removal of amino acids p.Ser395 to p.Leu529 due to deletion of exons 4 and 5 would eliminate the tyrosinase central domain that binds its copper ligand for subsequent function (Fig. S4A). Deletions in OCA2 would result in the partial or complete loss of the Na-sulphur symporter domain, which mediates the intake of several different molecules with the concomitant uptake of Na+ (Fig. S4B), and are thus predicted to result in non-functional, truncated proteins.
TYR and OCA2 alleles are the frequent cause of OCA in Pakistanis (Table S1). In our cohort, variants in TYR and OCA2 collectively account for the majority [67.8% (97/143); 95% confidence interval (CI): 60.2-76.0%] of the genetic causes of nsOCA, which is comparable to the prevalence in European populations (Table S3). In approximately 14% of our OCA families, we did not find any pathogenic variant in the known OCA genes (Fig. 5B). Next, we investigated the frequency of alleles of nsOCA genes in our cohort. Overall, four alleles of TYR, three of OCA2, and one of SLC45A2 together account for ~56% (95% CI: 46.52-65.24%) of the variants responsible for nsOCA in our cohort (Fig. 5C). Therefore, we developed rapid and inexpensive assays for detecting carriers and homozygotes (Fig. 5D). For most of these alleles, we were able to develop tetra-primer ARMS assays. The sensitivity and specificity of these assays were confirmed on multiple DNA samples with different genotypes, followed by Sanger sequencing.
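As an illustration of how an interval around a proportion such as 97/143 can be obtained, the sketch below computes two common 95% confidence intervals for a binomial proportion. The text does not state which method was used for the reported intervals, so this is an assumption-laden example rather than a reproduction; the bounds it returns may differ slightly from the published values.

```python
import math


def wald_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for a proportion k/n."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval, usually better behaved for small samples."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts taken from the text: 97 of 143 in the TYR/OCA2 proportion.
k, n = 97, 143
print(f"proportion = {k / n:.3f}")
for name, fn in (("Wald", wald_ci), ("Wilson", wilson_ci)):
    lo, hi = fn(k, n)
    print(f"{name}: {100 * lo:.1f}% to {100 * hi:.1f}%")
```

The Wilson interval is often preferred when the sample is small or the proportion is near 0 or 1, because the Wald interval can then extend beyond the valid range.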

Discussion
Our study illustrates the relative genetic contribution of four major OCA genes to the prevalence of albinism, primarily in families (69) currently residing in the Punjab province of Pakistan. However, our study also includes 19 families from Sindh and 6 families from Khyber Pakhtunkhwa (KPK) provinces. Pakistanis have a rich anthropogenic background owing to successive waves of invasions and adaptations of haplogroups. Most did not intermingle with the original local population and practiced endogamy, giving rise to genetic isolates that persist even today. Parental consanguinity is an important risk factor (0.25-20% higher chances) for recessive genetic defects 34. In Pakistan, 62.7% of marriages are consanguineous, ~80% of which are between first cousins 35. Specific clans and high consanguinity in Pakistan are the root causes of the increased incidence of recessive disorders, including OCA. In our cohort, the OCA phenotype was observed in families of different linguistic/ethnic origins (Table 1). We did not observe any apparent enrichment of a particular OCA allele within families of certain clans (Table 1). For instance, c.832 C > T in TYR and c.1045-15 T > G in OCA2 are the most frequent alleles observed in our samples (Fig. 5C). However, both these alleles were observed in families of various ethnic and geographical origins (Table 1). Similarly, other common variants (e.g. c.649 C > T, c.1255 G > A, c.1456 G > T) were also found in families enrolled from Punjab, Sindh and KPK (Table 1). Therefore, with the current sample size for each of the identified alleles, it would be inappropriate to comment on the ethnic/linguistic origins of the different alleles observed in our cohort. Our overall estimated prevalence of TYR and OCA2 alleles is similar to that reported in certain European studies but differs considerably from that reported in other populations (Table S3). For instance, TYR and OCA2 variants account for 70% and 10%, respectively, of OCA in a study of 127 patients from a Chinese population (Table S3). In a few eastern and central regions of China, the TYR and OCA2 contribution varies; however, TYR alleles remain the most common cause of OCA. In India, a study of 82 OCA patients revealed approximately 60% and 11% prevalence for TYR and OCA2, respectively 36. Similarly, in the US, Europe, Italy, Japan, and Korea, the alleles of TYR are the most common cause of nsOCA [7][8][9]11,37 (Table S3). In contrast, variants in OCA2 account for ~80% of the OCA cases in an African population 38.
Usually the alleles of TYR result in the retention of encoded tyrosinase in the ER 18,30 . Here, we also observed retention of OCA2 proteins harboring missense and truncating alleles in the ER (Fig. 2). A portion of the known human and mouse TYR variants have shown temperature-sensitive behavior 30,[39][40][41][42][43][44] . Cultivating mammalian cells at permissive temperature (31 °C) resulted in increased cytoplasmic expression of tyrosinase harboring missense alleles, especially for those present in the copper-binding region 30,39,41,42,44 . However, the new OCA2 alleles were imaged at 31 °C and did not reveal an apparent increased cytoplasmic expression (data not shown).
Besides single nucleotide variants, our study revealed, for the first time in the Pakistani population, six novel gross deletions in the TYR and OCA2 genes (Fig. 4). These genetic aberrations range from a single-exon deletion up to 12 exons with the encompassed introns (Fig. 4). These deletions cause pigmentation disorders either because almost the entire protein would be missing or because non-functional truncated forms would be encoded. Considering their relative contribution, simplified methods and bioinformatics tools will be needed for rapid detection of gross deletions in the clinical genetic diagnosis of OCA.
Clinical presentation of OCA is often not very helpful in genetic diagnosis due to significantly overlapping features (Table 1). For economic and geographical reasons, it is not feasible to routinely perform Sanger sequencing of all the known OCA genes to detect underlying genetic defects. However, PCR-based assays developed to detect common alleles are quick and affordable. Twenty of the variants identified in this study were private (found in single families). However, eight variants (Fig. 5C) account for more than half of the alleles found in our families (Table 2). Therefore, we developed tetra-primer ARMS assays for these common alleles for rapid detection and genetic screening in large cohorts before embarking on exome- or genome-wide studies. We are cognizant of the fact that the majority of families in our cohort are from the Punjab province of Pakistan. Hence, the details of a hierarchical mutational screening strategy for nonsyndromic OCA genes may need to be refined for other geographical regions and linguistic/ethnic groups within Pakistan. Our results will be helpful for future diagnosis, genetic counseling, molecular epidemiology, and functional studies of pigmentation disorders associated with nsOCA genes.
Intriguingly, fourteen families in our cohort did not reveal any pathogenic variants in common nsOCA genes. There are several possible explanations for our failure to detect disease-associated alleles in these fourteen families. First, potential pathogenic variants may alter the sequence of cis-acting regulatory or deep intronic splicing elements that are necessary for expression of these genes in melanocytes. Presently, we have very limited knowledge of the location of the regulatory elements of nsOCA genes. Second, the hypopigmentation segregating in these fourteen families may be syndromic and require comprehensive clinical phenotyping, which would help in filtering the candidate genetic variants for further assessment. Third, there may be additional genes responsible for OCA in humans. For instance, in the US population the detection rate of alleles in known nsOCA genes is around 75% 10. Thus, there exists both a need for further genetic understanding of albinism and an opportunity to improve the molecular diagnosis of albinism, and quite possibly its prevention.

Methods
Patients. This study followed the tenets of the Declaration of Helsinki. This study was approved by the IRB [...] of Biochemistry and Biotechnology, Gomal University, D.I. Khan, Pakistan. All the methods were performed in accordance with the relevant UMSOM guidelines and regulations. Pedigrees were drawn after interviewing multiple individuals to confirm the relationships. Informed written consent was obtained from the adult subjects and the parents of minor subjects. Two to five ml of peripheral blood was collected from each participating individual. Human genomic DNA was isolated from peripheral blood using an inorganic method 45. Detailed medical histories were taken for all of the participating individuals of the enrolled families, including the hypopigmentation phenotype of the hair, skin, and eye, disease onset, segregation, presence of eye abnormalities (nystagmus, strabismus, photophobia, and poor vision) and information about immunological, neurological or bleeding time.
[...] (Table S4)
Tetra Primer Amplification Refractory Mutation System (ARMS) assay. PCR primers (Table S5) for the tetra-primer ARMS assays were designed by setting the maximum (1:8) and minimum (1:3) relative size difference of the two inner products and keeping all other settings at their defaults. Reaction mixture contains 10 µl 2X [...]
Exon-trapping assay. To ascertain the consequences of novel splice site variants found in our OCA families, the wild type and mutant exons along with flanking intronic region (200 bp) were PCR amplified, cloned in pSPL3 vector (Invitrogen, Carlsbad, CA) and sequence verified as described 49. Purified cloned constructs were transfected into COS7 cells using PEI (polyethylenimine). After 48 hours of transfection, RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA) and single stranded cDNA (Clontech, Mountain View, CA) was synthesized. Primary PCR amplification of cDNA was performed using SD6 and SA2 vector primers, and amplified products were cloned in a TA-cloning vector (Invitrogen). At least ten bacterial clones for each construct were Sanger sequenced.
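For the tetra-primer ARMS assays mentioned in this section, genotypes are typically read from the band pattern on a gel: the two outer primers give a control product, and each allele-specific inner primer gives a product of a distinct size. The Python sketch below illustrates that general interpretation logic. The band sizes, the tolerance, and the function name are hypothetical and are not taken from Table S5 or from the authors' protocol.

```python
def call_genotype(bands, control_bp, wild_bp, mutant_bp, tol=10):
    """Interpret observed gel band sizes (bp) from a tetra-primer ARMS PCR.

    control_bp: expected size of the outer-primer (control) product
    wild_bp:    expected size of the wild-type allele-specific product
    mutant_bp:  expected size of the variant allele-specific product
    tol:        allowed sizing error in bp (hypothetical default)
    """
    def seen(expected):
        return any(abs(band - expected) <= tol for band in bands)

    if not seen(control_bp):
        return "failed PCR (no control band)"
    wild, mutant = seen(wild_bp), seen(mutant_bp)
    if wild and mutant:
        return "heterozygous carrier"
    if wild:
        return "homozygous wild type"
    if mutant:
        return "homozygous variant"
    return "uninterpretable (control band only)"


# Hypothetical product sizes for one assay, for illustration only.
CONTROL, WILD, MUTANT = 400, 250, 190

print(call_genotype([402, 251], CONTROL, WILD, MUTANT))
print(call_genotype([398, 249, 192], CONTROL, WILD, MUTANT))
print(call_genotype([401, 188], CONTROL, WILD, MUTANT))
```

Requiring the control band before making any call guards against reading a failed reaction as a genotype, which is one reason results are usually confirmed by Sanger sequencing on a subset of samples.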

Sanger sequencing and segregation analysis. Primers
Expression constructs, transfection and immunofluorescence. All expression constructs used for the immunofluorescence study were in an eGFP-tagged vector (Clontech). To generate mutant constructs, mutagenesis was performed with the QuikChange kit (Stratagene, La Jolla, CA), using the specific wild type eGFP construct as a template. Each construct was transiently transfected into melanocytes or HEK293 cells seeded on cover slips with Lipofectamine 3000 (Invitrogen). After 48 hours of transfection at either 37 °C or 31 °C, cells were fixed and permeabilized in 4% paraformaldehyde and 0.1% Triton X-100 in PBS, respectively. For visualization of the endoplasmic reticulum and nucleus, anti-calregulin antibody (Santa Cruz Biotechnology, Santa Cruz, CA) and DAPI staining were used. A Zeiss LSMDUO confocal microscope was used for imaging.