ACE2 variants underlie interindividual variability and susceptibility to COVID-19 in Italian population

In December 2019, an initial cluster of unexpected interstitial bilateral pneumonia emerged in Wuhan, Hubei province. Human-to-human transmission was immediately assumed, and a previously unrecognized entity, termed coronavirus disease 19 (COVID-19) and caused by a novel coronavirus (2019-nCov), was soon described. The infection has rapidly spread all over the world, and Italy has been the first European country to experience the epidemic wave, with unexpected clinical severity in comparison with Asian countries. It has recently been shown that 2019-nCov utilizes angiotensin converting enzyme 2 (ACE2) as host receptor, together with host proteases, for cell surface binding and internalization. Thus, a predisposing genetic background may account for interindividual differences in disease susceptibility and/or severity. Taking advantage of the Network of Italian Genomes (NIG), here we mined around 7000 exomes from 5 different centers looking for ACE2 variants. A number of variants with a potential impact on protein stability were identified. Among these, three missense changes, p.Asn720Asp, p.Lys26Arg and p.Gly211Arg (MAF 0.002 to 0.015), which have never been reported in the Eastern Asian population, were predicted to interfere with protein cleavage and stabilization. Rare truncating variants likely interfering with the internalization process and one missense variant, p.Trp69Cys, predicted to interfere with 2019-nCov spike protein binding, were also observed. These findings suggest that a predisposing genetic background may contribute to the observed interindividual clinical variability associated with COVID-19. They allow an evidence-based risk assessment, opening the way to personalized preventive measures and therapeutic options.


INTRODUCTION
In December 2019, a new infectious respiratory disease emerged in Wuhan, Hubei province, China (1)(2)(3). An initial cluster of infections, likely due to animal-to-human transmission, was rapidly followed by human-to-human transmission (4). The disease was recognized to be caused by a novel coronavirus (2019-nCov) and termed coronavirus disease 19 (COVID-19). The infection spread within China and all over the world, and it was declared a pandemic by the World Health Organization (WHO) on 11 March 2020. The symptoms of COVID-19 range from fever, dry cough, fatigue, congestion, sore throat and diarrhea to severe interstitial bilateral pneumonia with a ground-glass appearance on CT scan. While recent studies provide evidence of a high number of asymptomatic or paucisymptomatic patients, who represent the main reservoir for the progression of the infection, severe cases can rapidly evolve towards a respiratory distress syndrome which can be lethal (5). Although age and comorbidity have been described as the main determinants of disease progression towards severe respiratory distress, the high variation in clinical severity among middle-aged adults and children suggests a strong role for the host genetic background.
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. medRxiv preprint doi: https://doi.org/10.1101/2020.04.03.20047977
A high sequence homology has been shown between SARS-associated coronavirus (SARS-CoV) and 2019-nCov (6). Recent studies modelled the spike protein to identify the receptor for 2019-nCov and indicated that angiotensin converting enzyme 2 (ACE2) is the receptor for this novel coronavirus (7,8). Zhou et al. conducted virus infectivity studies and showed that ACE2 is essential for 2019-nCov to enter HeLa cells (9). Although the binding strength between 2019-nCov and ACE2 is weaker than that between SARS-CoV and ACE2, it is still considered to be above the threshold necessary for virus infection. The spike glycoprotein (S-protein), a trimeric glycoprotein on the virion surface (hence the name, from the Latin corona, crown), mediates receptor recognition through its receptor binding domain (RBD) and membrane fusion (10,11). Based on recent reports, the 2019-nCov spike protein binds to ACE2 through Leu455, Phe486, Gln493, Asn501, and Tyr505. It has been postulated that residues 31, 41, 82, 353, 355, and 357 of the ACE2 receptor map to the surface of the protein interacting with the 2019-nCov spike protein (12), as previously documented for SARS-CoV.
Following interaction, cleavage of the C-terminal segment of ACE2 by proteases, such as transmembrane protease serine 2 (TMPRSS2), enhances the spike protein-driven viral entry (13,14). Thus, it is possible, in principle, that genetic variability of the ACE2 receptor is one of the elements modulating virion intake and thus disease severity.
Taking advantage of the Network of Italian Genomes (NIG), a consortium established to generate a public database (NIG-db) containing aggregate variant frequency data for the Italian population (http://www.nig.cineca.it/), here we describe the genetic variation of ACE2 in the Italian population, Italy being one of the countries newly affected by the 2019-nCov outbreak causing COVID-19. Three common (p.Lys26Arg, p.Gly211Arg, p.Asn720Asp) and 30 rare missense variants were identified, 12 of which had not previously been reported in public databases. We show that p.Asn720Asp, which affects a residue located close to the TMPRSS2 cleavage sequence, likely affects cleavage-dependent virion intake. Along with the other two common variants, this substitution is significantly represented in the Italian and European populations but is extremely rare in the Asian population. We also show that three rare variants, namely p.Trp69Cys, p.Leu351Val and p.Pro389His, are predicted to cause conformational changes impacting RBD interaction. As uncertainty regarding the transmissibility and severity of the disease rises, we believe that a deeper characterisation of the host genetics and functional characterization of variants may help not only in understanding the pathophysiology of the disease but also in envisaging risk assessment.
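As a reading aid for the variant notation used above, the following is a minimal Python sketch, not part of the original analysis. It parses three-letter missense notation such as p.Asn720Asp and reports how far, in primary sequence, a variant lies from the ACE2 residues listed above as mapping to the spike-interacting surface. The function names are hypothetical, and sequence distance is only a crude stand-in for the structural proximity that the modelling in this work addresses.

```python
import re

# ACE2 residues reported in the text as mapping to the spike-interacting surface.
INTERFACE_RESIDUES = frozenset({31, 41, 82, 353, 355, 357})

# Three-letter missense notation, e.g. "p.Asn720Asp" (hypothetical helper pattern).
MISSENSE_PATTERN = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_missense(variant):
    """Split 'p.Asn720Asp' into (reference residue, position, alternate residue)."""
    match = MISSENSE_PATTERN.match(variant)
    if match is None:
        raise ValueError(f"not a three-letter missense variant: {variant!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def nearest_interface_distance(position):
    """Smallest difference in residue number to any listed interface residue.

    This is a distance along the primary sequence only; it says nothing
    about distance in the folded protein.
    """
    return min(abs(position - residue) for residue in INTERFACE_RESIDUES)


if __name__ == "__main__":
    for name in ("p.Trp69Cys", "p.Leu351Val", "p.Pro389His", "p.Asn720Asp"):
        ref, pos, alt = parse_missense(name)
        print(name, ref, pos, alt, nearest_interface_distance(pos))
```

Under this simple measure, p.Leu351Val sits two residues from position 353, whereas p.Asn720Asp is far from every listed interface residue, in line with the text assigning it to the TMPRSS2 cleavage region rather than to RBD binding.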

Population
Parents provided signed informed consent at each participating center for exome sequencing analysis, and for clinical and molecular data storage and usage, for both diagnostic and research purposes. The work was carried out in the context of NIG, with the contribution of the following centers: Azienda Ospedaliera Universitaria Senese, Azienda Ospedaliera-Universitaria Policlinico Sant'Orsola-Malpighi di Bologna, Città della Salute e della Scienza di Torino, Università della Campania "Luigi Vanvitelli", Ospedale Pediatrico Bambino Gesù. All subjects were unrelated, apparently healthy, and of Italian ancestry.

Whole Exome Sequencing
Targeted enrichment and massively parallel sequencing were performed on genomic DNA extracted from circulating leukocytes of 6984 individuals. Genomic DNA was [...]

Computational studies
The structure of native human angiotensin converting enzyme-related carboxypeptidase [...] structural stability. The graphs were plotted with the XMGrace software (23). MD simulations were performed on a Linux cluster with 660 CPUs on 21 different nodes, 190T of RAM, a 30T hard disk partition and 6 NVIDIA Tesla GPUs with CUDA support. PyMOL 2.3 was used as a molecular graphics interface. The protein structures were solvated in a triclinic box filled with TIP3P water molecules, and Na+/Cl- ions were added to neutralize the system. The whole systems were then minimized with a maximal force tolerance of 1000 kJ mol⁻¹ nm⁻¹ using the steepest descent algorithm. The optimized systems were gradually heated to 310 K in 1 ns in the NVT ensemble, followed by 10 ns of equilibration in the NPT ensemble at 1 atm and 310 K, using the V-Rescale thermostat and Berendsen barostat (24,25). Subsequently, a further 100 ns of MD simulation was performed for data analysis.
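To give a sense of the scale of the protocol described above, the following minimal Python sketch converts the stated phase durations (1 ns heating, 10 ns equilibration, 100 ns production) into integration step counts. The 2 fs timestep is an assumption made for illustration only, since the timestep is not reported in this section; the `Phase` class and `n_steps` function are hypothetical names, and the production ensemble is left unset because the text does not state it.

```python
from dataclasses import dataclass
from typing import Optional

FS_PER_NS = 1_000_000  # 1 ns = 1e6 fs


@dataclass(frozen=True)
class Phase:
    name: str
    ensemble: Optional[str]  # None where the text does not state the ensemble
    duration_ns: float


# Durations and ensembles as given in the text.
PROTOCOL = (
    Phase("heating to 310 K", "NVT", 1.0),
    Phase("equilibration at 1 atm and 310 K", "NPT", 10.0),
    Phase("production", None, 100.0),
)


def n_steps(duration_ns, timestep_fs=2.0):
    """Number of integration steps for a phase.

    The 2 fs default timestep is an assumption, not a value from the text.
    """
    if duration_ns < 0 or timestep_fs <= 0:
        raise ValueError("duration must be >= 0 and timestep must be > 0")
    return round(duration_ns * FS_PER_NS / timestep_fs)


if __name__ == "__main__":
    for phase in PROTOCOL:
        ensemble = phase.ensemble or "not stated"
        print(f"{phase.name} ({ensemble}): {n_steps(phase.duration_ns):,} steps")
```

With the assumed 2 fs timestep, the production phase alone corresponds to 50 million steps per simulated system; a different timestep would scale all counts proportionally.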

RESULTS
The extent of variability along the entire ACE2 coding sequence and flanking intronic stretches was assessed using 6984 Italian exomes. Identified variants and predicted effects on protein stability are summarized in Table 1, Table S1 and Table 2 [...] were found with a frequency of 0.001 (lower than the frequency in the European-non [...] Table 2); among these, p.Val506Ala is the only amino acid change reported in the European non-Finnish population (rs775181355; frequency: 0.000006561; CADD 27.2) and is predicted as probably damaging for the protein structure by PolyPhen and deleterious by SIFT.
Similarly, p.Trp69Cys, p.Leu351Val and p.Pro389His, which affect a highly hydrophobic core, were predicted to induce conformational changes influencing the interaction with the spike protein. The amino acid substitution p.Pro389His (rs762890235; European non-Finnish population allele frequency: 0.00002453; CADD 24.8) was [...]
[...] for the differential morbidity and lethality observed among different countries, population awareness and restrictive measures apart.
We integrated genomic data from 5 Italian centers (Siena, Naples, Turin, Bologna [...] Along with these more common variants, we also identified very rare variants, some of which have only been described in the non-Finnish European population. Some of them may affect a highly hydrophobic core, inducing a conformational change that impacts RBD interaction, and could account for a higher affinity for the 2019-nCov spike protein (Figure 2). Among these, p.Trp69Cys has never been reported before and is likely to be a rare variant private to the Italian population. Other rare variants are predicted to truncate the protein at different positions of the protease domain, thus likely acting on the internalization process. These rare variants could account for the interindividual clinical variability and may explain severity even in young adults. Notably, morbidity and lethality have been reported to be markedly higher in men than in women (~70% vs 30%, 20th March 2020 ISS report). Although several factors have been put forward to explain this difference, i.e. smoking, differences in ACE2 localization and/or density in alveolar cells, and hormonal status, it is noteworthy that ACE2 is located on chromosome X and that, given the low allele frequency of the identified variants, the rate of homozygous women is extremely low (see Results section).
Therefore, the impact of X-inactivation on the alternate expression of the two alleles would guarantee a heterogeneous population of ACE2 molecules, some of which would be protective against the infection, up to complete or almost complete protection in the case of X-inactivation skewed towards the allele less prone to 2019-nCov binding. This hypothesis could explain the high rate of asymptomatic or paucisymptomatic patients. ACE2 is definitely one of the main molecules whose genetic heterogeneity can modulate infection and disease progression; however, a deeper characterisation of the host genetics and of functional variants in other pathway-related genes may help in understanding the pathophysiology of the disease, opening the way to a stratified risk assessment and to tailored preventive measures and treatments.
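The argument about the rarity of homozygous women can be made concrete with a short calculation. The Python sketch below is illustrative only and was not part of the study: it assumes Hardy-Weinberg proportions and random mating for an X-linked locus, takes the minor allele frequency range quoted in the abstract (0.002 to 0.015) as input, and uses a hypothetical function name.

```python
def x_linked_genotype_frequencies(q):
    """Expected genotype frequencies at an X-linked locus.

    q is the minor (variant) allele frequency. Hardy-Weinberg proportions
    are assumed, which is an idealisation and not a result of the study.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = 1.0 - q
    return {
        "male_hemizygous_variant": q,          # males carry a single X
        "female_heterozygous": 2.0 * p * q,    # one variant, one reference allele
        "female_homozygous_variant": q * q,    # two variant alleles
    }


if __name__ == "__main__":
    # MAF bounds quoted in the abstract for the three common variants.
    for maf in (0.002, 0.015):
        freqs = x_linked_genotype_frequencies(maf)
        print(f"MAF {maf}:")
        for genotype, value in freqs.items():
            print(f"  {genotype}: {value:.3g}")
```

Under these assumptions, men carry a variant allele as their only ACE2 copy at the allele frequency itself, whereas variant-homozygous women are expected at the square of that frequency, so almost all female carriers are heterozygous and subject to X-inactivation mosaicism.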

Titles and legends to figures

Table 1. Identified variants and predicted effects on protein stability
[...] higher is the destabilizing effect, while a positive sign corresponds to a mutation predicted as stabilizing.

58,V59
Destabilizing