Shotgun transcriptome, spatial omics, and isothermal profiling of SARS-CoV-2 infection reveals unique host responses, viral diversification, and drug interactions

In less than nine months, the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) killed over a million people, including >25,000 in New York City (NYC) alone. The COVID-19 pandemic caused by SARS-CoV-2 highlights clinical needs to detect infection, track strain evolution, and identify biomarkers of disease course. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs and a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, viral, and microbial profiling. We applied these methods to clinical specimens gathered from 669 patients in New York City during the first two months of the outbreak, yielding a broad molecular portrait of the emerging COVID-19 disease. We find significant enrichment of a NYC-distinctive clade of the virus (20C), as well as host responses in interferon, ACE, hematological, and olfaction pathways. In addition, we use 50,821 patient records to find that renin–angiotensin–aldosterone system inhibitors have a protective effect for severe COVID-19 outcomes, unlike similar drugs. Finally, spatial transcriptomic data from COVID-19 patient autopsy tissues reveal distinct ACE2 expression loci, with macrophage and neutrophil infiltration in the lungs. These findings can inform public health and may help develop and drive SARS-CoV-2 diagnostic, prevention, and treatment strategies.

negative samples, not just those with alternative respiratory viruses. The biological significance of a difference in interferon response would be quite a bit less if the comparators do not all have viral infections. Distinguishing SARS-CoV-2 specific responses vs general antiviral responses is important, but not sufficiently clear as is.
-The authors have considerably expanded their analysis of ACEi/ARB use to incorporate more comorbidities and other antihypertensive comparators. This portion of the paper feels quite distinct from the remainder, both methodologically and thematically, and probably is best reviewed by a biostatistician, as the detailed analysis of their methods for correction of confounding is critical to assess their conclusions, but is beyond my expertise as a reviewer. The chance for additional confounding with these medications and risk factors for COVID severity remains high; the authors do note this in their discussion.
-HLA enrichment (new in this resubmission), Fig S12: rather than reporting p-values, reporting a statistical test that reflects multiple hypothesis testing (eg false discovery rate) would be preferred for HLA type enrichment; their permutation analysis provides an assessment of statistical confidence, but eliminates one allele (HLA-B*35) for which they report a significant p-value. Also, and perhaps more importantly, given disparities in both incidence and severity by race/ethnicity, the biological significance of this HLA enrichment is unclear as it could simply be a proxy for ethnicities that are at greater risk for exposure; this should be commented on.
-Discussion, p11: ACEi/ARBs do not target the ACE2 receptor; the end of the first paragraph on p11 seems to suggest that they do.
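The false discovery rate correction suggested in the HLA enrichment comment above can be illustrated with a minimal sketch of the Benjamini-Hochberg procedure. The allele names and raw p-values below are hypothetical placeholders, not values from the manuscript, and this is not necessarily the correction the authors applied.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five alleles (illustrative only).
raw = {"allele_A": 0.001, "allele_B": 0.02, "allele_C": 0.03,
       "allele_D": 0.04, "allele_E": 0.5}
q_values = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

for allele, q in q_values.items():
    print(f"{allele}: raw p = {raw[allele]:.3f}, BH-adjusted = {q:.3f}")
```

An allele with a nominally significant raw p-value can lose significance after adjustment, which is the situation the comment describes for HLA-B*35.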

MINOR CRITIQUES:
-Fig 4c legend still refers to an intersecting heatmap, but still displays a volcano plot
-Spatial transcriptomics (new in this resubmission), Fig 6c: the authors should clarify these "normal" samples, referred to in-text as "donors". Are these from autopsies of patients who died of nonpulmonary causes? From living donors undergoing lung biopsy? Other? Neither the text nor the Fig legend (Fig 6 or S11) makes this clear. A comparison to other viral lung infections would be optimal if possible, at least from the literature.
Reviewer #5 (Remarks to the Author): The authors have generated a very impressive amount of data and analysed it thoroughly. As a result of the review process the authors have revised their article and all the major issues have been responded to. The design model for their transcriptomics analysis and statistics are all appropriate. It is good to see all of the reviewers' suggestions/comments have been considered carefully.
Reviewer #6 (Remarks to the Author): I have been asked to review the statistical analysis of the clinical data. This task is complicated by the poor reporting by the authors.
It is not clear what patients were used for what analysis. The main text reports they have a cohort of n=50 821 patients in two hospitals. The discussion also mentions an analysis of >50 000 patient records. The methods section however describes a cohort of n=8 856 patients with suspected infection of whom n=4 829 were infected. And also data for n=90 989 patients from 2019, presumably before SARS-CoV-2 was present or widespread in New York. However, Table 1 says that n = 17 812 + 3 500 + 23 578 + 2 464 = 47 354 people were tested for infection, that n = 4 697 + 1 252 + 2 878 + 479 = 9 306 were confirmed to be infected. So: 50k or 90k patients? 9k or 5k infections? 47k or 9k tested?
In my view the paper can't even be reviewed until the reporting is done thoroughly. Who can tell what the results mean if the basics about who was analysed are not set out clearly? The Strobe checklist would be a good place to start this endeavour.
Other issues abound: What were the 2019 patients used for? The main analysis of clinical data has three parts: of risk of being a case, which is clearly 0 for 2019 patients regardless of their profiles, and then of outcomes among cases, again not possible for 2019 patients. So what are they contributing?
In the analysis, the authors say they have scaled quantitative variables to fall on the range 0 to 1 to allow for comparisons. I don't think this is really appropriate: if the distributions differ, the coefficients are not in fact comparable with such rescaling (instead, standardisation should aim to equalise the standard deviation, eg to set them to have mean 0 and sd 1).
The methods section needs to describe how correlated variables were removed.
Supplementary tables 6 to 9 were missing, which reportedly contain some of the results from the clinical data analysis, so I have not reviewed them.
Overall it seems like the focus on the cool transcriptome work led to neglect of the clinical data. I suggest they either remove this part or invest some time in reporting it properly.
In two minor unrelated points: the title is misleading since they have not done a spatial profiling of infection but a spatial transcriptome profiling of infection. Readers might expect to see maps of NY boroughs with the current phrasing. And, paragraph 3 in the intro contains a non-sequitur. "As NY was the epicentre… we did some science". The science doesn't seem to follow from a public health problem in a city.

SARS-CoV-2 Infection (NCOMMS-20-35521A)."
Reviewer #3 (Remarks to the Author): The authors have revised their ambitious and multi-pronged characterization of SARS-CoV-2 infection, including (1) a LAMP assay for rapid detection, (2) viral sequencing and phylogenetic analysis of 735 cases, (3) host transcriptional profiling from the same nasal swabs, and (4) epidemiological analysis of ACEi/ARB use. This work covers a tremendous amount of ground, even more so in this revision.
We thank the reviewer for their positive comments and review of our revised manuscript.
In this revision, they:
-better contextualized their phylogenetic analysis
-added additional comparators and comorbidity considerations into their epidemiological analysis of ACEi/ARB use
-added spatial transcriptomic profiling of autopsy samples
-added HLA type enrichment of COVID patients compared with those in their cohort who tested negative for SARS-CoV-2
We thank the reviewer for their positive comments and summary.
Figure 4b, R² = -0.74). As expected, the higher Ct values correlate with the lower LAMP output. Of note, the 95% in Fig. 1e was for technical reproducibility from controls, but indeed clinical samples were more variable (Supp Fig 3). Also, we have clarified the text to note that the performance of the assay is indeed a function of viral abundance, now stating: "Of note, we obtained similar performance on bulk oropharyngeal swab lysate, including increasing reaction sensitivity as a function of viral copy number, but with deteriorating performance at Ct>30." We note that, since our first submission, there has been a wealth of papers that now compare LAMP to RT-PCR (most of which use our exact same primer set), and we have
We also note that our RT-LAMP test has recently been authorized for home collection (in collaboration with Color) and COVID-19 testing: https://www.fda.gov/media/138249/download
-The authors comment that the interferon responses in SARS-CoV-2 samples were "significantly higher when compared to SARS-CoV-2 negative samples that harbored other respiratory viruses (Fig 4 a,
We have clarified this distinction in the text, and we now note that the comparison is just those without any known respiratory viruses. To detail virus-specific responses in depth, we think a follow-up paper with larger sample sizes for each viral infection would serve this question better, and we are collecting patients with these other viruses at our hospital, but this will not be ready in time for this revision.
-The authors have considerably expanded their analysis of ACEi/ARB use to incorporate more comorbidities and other antihypertensive comparators. This portion of the paper feels quite distinct from the remainder, both methodologically and thematically, and probably is best reviewed by a biostatistician, as the detailed analysis of their methods for correction of confounding is critical to assess their conclusions, but is beyond my expertise as a reviewer. The chance for additional confounding with these medications and risk factors for COVID severity remains high; the authors do note this in their discussion.
Please see below for a full set of statistical reviews and our responses and updates for Reviewer #6.
-HLA enrichment (new in this resubmission), Fig S12: rather than reporting p-values, reporting a statistical test that reflects multiple hypothesis testing (eg false discovery rate) would be preferred for HLA type enrichment; their permutation analysis provides an assessment of statistical confidence, but eliminates one allele (HLA-B*35) for which they report a significant p-value. Also, and perhaps more importantly, given disparities in both incidence and severity by race/ethnicity, the biological significance of this HLA enrichment is unclear as it could simply be a proxy for ethnicities that are at greater risk for exposure; this should be commented on.
We have clarified our explanation of HLA association findings to firmly distinguish first-pass candidate findings (previously reported as raw p-values) from those that withstood permutation-based correction for false discovery. Indeed, to account for any spurious enrichment from the two-tailed Fisher exact test, we used nonparametric permutation testing to generate empirical p-values for these loci, which is an established method for testing statistical significance. The HLA-B*35 locus only passed significance in one test, which was noted. In parallel, we have clarified our description of the permutation methods themselves, in the main text and supplemental figure.
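As a rough illustration of how an empirical p-value can be generated by permutation around a two-tailed Fisher exact test, consider the sketch below. It is a generic single-allele version under stated assumptions, not the exact procedure used in the manuscript: the function names, the carrier and infection labels, the number of permutations, and the seed are all hypothetical.

```python
import random
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def permutation_p_value(carrier, positive, n_perm=2000, seed=0):
    """Empirical p-value from shuffling infection labels across individuals."""

    def fisher_p(labels):
        a = sum(1 for x, y in zip(carrier, labels) if x and y)
        b = sum(1 for x, y in zip(carrier, labels) if x and not y)
        c = sum(1 for x, y in zip(carrier, labels) if not x and y)
        d = sum(1 for x, y in zip(carrier, labels) if not x and not y)
        return fisher_exact_two_sided(a, b, c, d)

    rng = random.Random(seed)
    observed = fisher_p(positive)
    shuffled = list(positive)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fisher_p(shuffled) <= observed * (1 + 1e-9):
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: allele carrier status and test result for 8 individuals.
carrier = [True] * 4 + [False] * 4
positive = [True, True, True, False, True, False, False, False]
print(fisher_exact_two_sided(3, 1, 1, 3), permutation_p_value(carrier, positive))
```

For a single allele, the shuffled-label distribution approximates the exact test itself; permutation becomes more informative when many alleles are tested together, for example by recording the smallest p-value across alleles in each shuffle.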
Also, we agree on the need for cautious interpretation, given all the possible confounding variables, and we have updated the testing for the HLA enrichment and added this to the results section: "We caution that these findings neglect linkage-based interdependence of type frequencies among distinct HLA genes; potential ancestry-confounded variability in infection incidence or ascertainment; imprecision of diplotype inference; and any unassayed infection- or transcription-relevant sequence variation distinctive to particular haplotypes within surveyed haplogroups, including in MHC genes other than class I HLA loci."
-Discussion, p11: ACEi/ARBs do not target the ACE2 receptor; the end of the first paragraph on p11 seems to suggest that they do.
We have edited the paragraph to clarify this point. Thank you.
Normal samples in these figures refer to tissues from patients who were lung transplant donors. We have clarified this both in-text and in legends, saying they are from "excess lung material from healthy lung transplant donors."
We thank the reviewer for the careful reading and helpful suggestions.
Reviewer #5 (Remarks to the Author): The authors have generated a very impressive amount of data and analysed it thoroughly. As a result of the review process the authors have revised their article and all the major issues have been responded to. The design model for their transcriptomics analysis and statistics are all appropriate. It is good to see all of the reviewers' suggestions/comments have been considered carefully.
We thank the reviewer for the positive comments and review of our revised manuscript.
Reviewer #6 (Remarks to the Author): I have been asked to review the statistical analysis of the clinical data. This task is complicated by the poor reporting by the authors.
It is not clear what patients were used for what analysis. The main text reports they have a cohort of n=50 821 patients in two hospitals. The discussion also mentions an analysis of >50 000 patient records. The methods section however describes a cohort of n=8 856 patients with suspected infection of whom n=4 829 were infected. And also data for n=90 989 patients from 2019, presumably before SARS-CoV-2 was present or widespread in New York. However, Table 1 says that n = 17 812 + 3 500 + 23 578 + 2 464 = 47 354 people were tested for infection, that n = 4 697 + 1 252 + 2 878 + 479 = 9 306 were confirmed to be infected. So: 50k or 90k patients? 9k or 5k infections? 47k or 9k tested?
• We apologize for our confusing reporting. We have now removed these errors and made our reporting more consistent throughout results, discussion, and methods. Table 1
o After reflecting on the interpretation and the possible issues raised by the reviewer, we decided to remove the comparison with a 2019 clinical cohort (see also below).
In my view the paper can't even be reviewed until the reporting is done thoroughly. Who can tell what the results mean if the basics about who was analyzed are not set out clearly? The Strobe checklist would be a good place to start this endeavour.
We agree with the reviewer. We have completed the STROBE checklist and included it with our resubmission.
Other issues abound: What were the 2019 patients used for? The main analysis of clinical data has three parts: of risk of being a case, which is clearly 0 for 2019 patients regardless of their profiles, and then of outcomes among cases, again not possible for 2019 patients. So what are they contributing?
We appreciate the reviewer's comment. We had been using these patients as a reference for patients without any recorded test. Upon reflection, however, these comparisons are unnecessary and difficult to interpret. They have been removed to help clarify our methods and the paper overall, to focus just on the COVID-19 cohort and timeframe.
In the analysis, the authors say they have scaled quantitative variables to fall on the range 0 to 1 to allow for comparisons. I don't think this is really appropriate: if the distributions differ, the coefficients are not in fact comparable with such rescaling (instead, standardization should aim to equalize the standard deviation, eg to set them to have mean 0 and sd 1).
• We agree with the suggestion that zero mean unit variance scaling is more useful for interpreting the coefficients of quantitative variables. However, instead of modifying the analysis, we removed estimates of covariate effect from our tables. Four reasons motivated this decision.
o First, while the analysis in our initial submission included various quantitative variables, age is the only quantitative variable in our current analysis, and age was not scaled. This confusion is due to our including additional paragraphs from the previous revision.
o Fourth, as our analysis was run separately at different institutions using separate electronic health record systems, restricted computational frameworks, and numerous compatibility challenges, a minor modification to the analysis entails a prohibitive time cost.
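The reviewer's distinction between rescaling to the range 0 to 1 and standardizing to mean 0 and sd 1 can be shown with a small sketch. The two covariates below are hypothetical values chosen only to contrast an evenly spread variable with a skewed one; they are not data from the study.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so they fall on the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical covariates (illustrative only).
age = [25, 40, 55, 70, 85]       # evenly spread
lab_value = [1, 1, 1, 2, 50]     # skewed, with one extreme value

for name, xs in [("age", age), ("lab_value", lab_value)]:
    sd_minmax = pstdev(min_max_scale(xs))
    sd_standard = pstdev(standardize(xs))
    print(f"{name}: sd after 0-1 scaling = {sd_minmax:.3f}, "
          f"sd after standardization = {sd_standard:.3f}")
```

After 0 to 1 scaling the two variables still have different spreads, so a one-unit change means something different for each; after standardization both have sd 1, which is what makes regression coefficients comparable across covariates.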
The methods section needs to describe how correlated variables were removed.
The final analysis we report did not include a removal of correlated variables. This language was part of an earlier revision and methods section and was unfortunately retained in the final version. We have removed this language and updated the section.
Supplementary tables 6 to 9 were missing, which reportedly contain some of the results from the clinical data analysis, so I have not reviewed them.
These tables were uploaded into the online system, but they did not transfer correctly. To simplify this, we have condensed these tables into one (Supp Table 6) and included it in the revision, as well as a new clinical analysis methods section (Supplemental Methods).
Overall it seems like the focus on the cool transcriptome work led to neglect of the clinical data. I suggest they either remove this part or invest some time in reporting it properly.
We hope these latest revisions help to clarify the reporting and our statistical analyses of the clinical data. We believe that linking the molecular data to the clinical data for this work provides a manuscript of greater depth and thoroughness.
In two minor unrelated points: the title is misleading since they have not done a spatial profiling of infection but a spatial transcriptome profiling of infection. Readers might expect to see maps of NY boroughs with the current phrasing. And, paragraph 3 in the intro contains a non-sequitur. "As NY was the epicentre… we did some science". The science doesn't seem to follow from a public health problem in a city.
We have updated the title to clarify that it was spatial omics (transcripts and protein) profiling, now stated as "Shotgun Transcriptome, Isothermal, and Spatial Omics Profiling of SARS-CoV-2 Infection."
For the introduction, we agree. We have changed the introductory clause for that third paragraph to now lead with: "To better understand the impact and progression of SARS-CoV-2 infection, we applied a multiplatform and molecular diagnostic approach to samples collected during the outbreak in NYC."