Strain-resolved analysis of hospital rooms and infants reveals overlap between the human and room microbiome

Preterm infants exhibit different microbiome colonization patterns relative to full-term infants, and it is speculated that the hospital room environment may contribute to infant microbiome development. Here, we present a genome-resolved metagenomic study of microbial genotypes from the gastrointestinal tracts of infants and from the neonatal intensive care unit (NICU) room environment. Some strains detected in hospitalized infants also occur in sinks and on surfaces, and belong to species such as Staphylococcus epidermidis, Enterococcus faecalis, Pseudomonas aeruginosa, and Klebsiella pneumoniae, which are frequently implicated in nosocomial infection and preterm infant gut colonization. Of the 15 K. pneumoniae strains detected in the study, four were detected in both infant gut and room samples. Time series experiments showed that nearly all strains associated with infant gut colonization can be detected in the room after, and often before, detection in the gut. Thus, we conclude that a component of premature infant gut colonization is the cycle of microbial exchange between the room and the occupant.

I have tried very hard to understand the diagrams on sampling, and the authors have clearly tried very hard to put their complex methodology into a visual form, but I am afraid it was impossible to understand. I think the authors need to put clearer details on this very important aspect into the paper.
It is clearly possible that the babies colonise the environment and not vice versa. Again, the data as presented did not, to me, justify why they feel that it is definitely environment to baby rather than baby to environment. In addition, as this is DNA, the authors have not shown viability of the strains, simply presence.
Reviewer #3 (Remarks to the Author): This manuscript intends to determine the direct link between environmental bacteria in the NICU and the bacteria colonizing preterm infants using metagenomic whole genome shotgun sequencing. It is an important contribution to the current understanding of bacterial colonization in preterm infants.
Several issues need to be considered:
1. It is critical to define the similarity of the strains from rooms and infants. Across the manuscript, different criteria were applied to define strains as having the same origin (L57, L63, L90). Can the authors explain why the same standard was not used?
2. From the total 50 infants, how many of them showed evidence of a direct link between the infants' bacteria and room bacteria? Which body site (stool, skin or oral) is affected more by the environment?
3. Can the authors comment on how much the room bacteria influence the infants' bacteria? From Extended Fig. 1, only 12 strains were shared by room and infants, which seems a very small number.
4. It is known that there is inter-subject strain variation; is there between-room variation too?
5. One caveat of the study is that although there is 99% identity between infant and room strains, it is hard to know how similar the infants' strains are to those from other potential sources such as caregivers or mothers. This makes the Staphylococcus epidermidis conclusion less solid.
6. The metadata of the infants is not provided.
7. There is no explanation of the meaning of each column in the supplementary tables. It would be helpful to provide this information for certain key columns.
8. Define 'bin'.

Reviewer #1 (Remarks to the Author):
Here the authors test the hypothesis that the hospital room environment may contribute to microbiome development in preterm babies. Using whole genomes assembled from metagenomic data, they compare 317 bacterial genomes from the baby feces with 231 bacterial genomes from NICU surfaces, and found extensive sharing, with the most persistent being nosocomial infection-related aero-tolerant organisms (Staphylococcus epidermidis, Enterococcus faecalis, Pseudomonas aeruginosa, and Klebsiella pneumoniae). The results have important implications in the search for measures that decrease preterm infections in hospital NICUs.
We thank the reviewer for the positive evaluation.
In this well-written manuscript, the results are very interesting and important, but there are some issues, particularly of clarification and acknowledgement of limitations of the study.
Below we address each of your concerns and have incorporated most of your suggestions into the manuscript.
1. It seems that the results are based on 1 NICU and 6 babies. This needs to be explained, because the 50 babies studied in 2011-2014 and the 1038 NICU samples falsely give the impression of a big study.
We agree that the cohort information and metadata could be expanded to provide better understanding of the infants and samples involved in the study. To this end, we have added Supplementary Table 2, which includes specific SRA accessions for accessing short read data and many additional metadata fields (e.g., day of life, cohort, infant, birth day synced to study day, gestational age, sex, birth mode, birth weight, hospital, city, NICU room etc.). We think the additional metadata fields greatly improve clarity and accessibility to this study's supporting data.
To directly address your comment, the study was based on 50 infants (line 56) and 622 fecal metagenomes derived from new and previously published data (line 55). These fecal samples are now detailed with metadata in the newly added Supplementary Table 2. The "1038 samples" reflects the number of room samples pooled in order to achieve enough biomass for deep shotgun sequencing. The details of these samples were in the initial submission, Supplementary Table 1.

How many years apart were the 6 babies separated?
Supplementary Fig.1 Table 2 for details.

The study lacks design for the time series collection of infant and NICU samples. This needs to be acknowledged or the design presented.
Please see comment 1 and the newly added Supplementary Table 2.

In our revised manuscript we conduct a second analysis with maximum stringency to improve confidence that two organisms are "the same". We designate organisms whose sequences have ≥ 99.999% average nucleotide identity (ANI) as "strains" and groups of sequences that share ≥ 99% ANI as subspecies. This terminology is important in the responses that follow.

Fig 1: this is based on 3 babies and 1 NICU? Or infants were in different NICU rooms? Of these 6 babies, which shared or not the same room? How many rooms in total?
There were 14 unique rooms housing infants in the S2_2013 cohort. This room information has been added to Supplementary

The cohort count (size of circles) is 10? Not 6? Please clarify.
There were 6 room-infant pairs for which there is metagenomics data for both room and infant.
The dots indicate subspecies found within each cohort and the size of the dot indicates how many infants (or rooms in column 4) that subspecies was detected in. For better clarity, we have changed the size to reflect the percent of infants within each cohort that contain a particular strain.
8. From 448 strain genomes assembled in the study (317 from babies and 131 from NICUs), there were 12 shared between NICU and babies. But how many were found in the total 50 babies or in the 6 resampled babies?
Supplementary Fig. 2.

How many NICUs? A single NICU in one hospital? Please clarify
The study was conducted in one NICU and each baby had its own room. Room and hospital columns have been added to the recently updated metadata. Additionally, a more detailed description of the NICU has been cited in the main body text (references 4, 9, 10, and 11). Please see comment 1 and the newly added Supplementary

The article was originally submitted in Letter format, which limited how far we could extend discussion and speculation of the findings. While we could speculate on these topics, we have ongoing research to directly answer many of these questions and feel that these considerations are outside the scope of the current study. However, we have added substantial new analysis and figures. We have added discussion of the room reservoirs and briefly speculate that deeper sequencing of the room might have revealed greater overlap (this would point to strong infant-based selection).

Reviewer #2 (Remarks to the Author):
Thank you for asking me to review this manuscript that attempts to address the potentially important issue of whether organisms really do stay in hospital environments and then colonise new babies. The main concerns I have are around the pooling of the DNA for the environmental samples. From this manuscript alone, I really can't tell how many samples were taken and pooled, over what time frame, etc.

As one of the key things the authors claim their data shows is that organisms really do stay in the environment, this is crucial data, as pooling over a long time period could really compound this issue. I have tried very hard to understand the diagrams on sampling, and the authors have clearly tried very hard to put their complex methodology into a visual form, but I am afraid it was impossible to understand. I think the authors need to put clearer details on this very important aspect into the paper.
It is clearly possible that the babies colonise the environment and not vice versa. Again, the data as presented did not, to me, justify why they feel that it is definitely environment to baby rather than baby to environment.
Concerning directionality, we agree with the Reviewer that microbial exchange is likely bidirectional. In lines 141-161 we note that of the twelve substrains in Infant 5, five were found in the room before detection in the infant. The remaining substrains were either detected at simultaneous time points or at later time points in the room. This suggests some are emitted from the infant to the room. The detection of these substrains long before their detection in the room (in infants from previous cohorts) suggests emission of strains from infant occupants to room surfaces may be the transmission route/cycle for these microbes.
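The timing comparison described above (room detection before, simultaneous with, or after first detection in the infant) can be illustrated with a minimal sketch. This is not the analysis code used in the study. The function name `room_timing`, the use of integer study days, and the label strings are assumptions made purely for illustration.

```python
from typing import Iterable, Set


def room_timing(first_gut_day: int, room_days: Iterable[int]) -> Set[str]:
    """Label each room detection relative to the first gut detection.

    first_gut_day: hypothetical study day on which a substrain was first
                   detected in the infant gut.
    room_days:     hypothetical study days on which the same substrain was
                   detected in room samples.
    Returns the set of timing labels observed for this substrain.
    """
    labels: Set[str] = set()
    for day in room_days:
        if day < first_gut_day:
            labels.add("before")        # room detection precedes gut detection
        elif day == first_gut_day:
            labels.add("simultaneous")  # same time point
        else:
            labels.add("after")         # room detection follows gut detection
    return labels


# Example with made-up days: detected in the room on days 4, 10 and 21,
# first detected in the gut on day 10.
print(room_timing(10, [4, 10, 21]))
```

A substrain labelled only "after" is compatible with emission from infant to room, whereas "before" is compatible with the room acting as a source. Neither label alone establishes direction.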

Similar to Reviewer 1's comments, we agree with both Reviewers that the experimental design was difficult to interpret. To provide clarity, we have added Supplementary
In response to this question, we re-evaluated the criteria used to establish that strains in different reservoirs were the same. This greatly improved the robustness of the analysis of strain detection (prior to, simultaneously with, and following room occupancy).

In addition as this is DNA the authors have not shown viability of the strains, simply presence.
You are correct that we present no data to support viability in this paper. Replication rates for organisms colonizing infant 5's skin, oral, and fecal samples were recently published (line 144, reference 8). In this study, replication rates were higher in skin and mouth samples compared to the gut. The replication rate analysis strengthens our inference based on repeated detection that there is a population of actively replicating cells.

Reviewer #3 (Remarks to the Author): This manuscript intends to determine the direct link between environmental bacteria in the NICU and the bacteria colonizing preterm infants using metagenomic whole genome shotgun sequencing. It is an important contribution to the current understanding of bacterial colonization in preterm infants.
We thank the reviewer for the positive evaluation.
Several issues need to be considered: 1. It is critical to define the similarity of the strains from rooms and infants. Across the manuscript, different criteria were applied to define strains as having the same origin (L57, L63, L90). Can the authors explain why the same standard was not used?
Thank you for this very important comment. This comment, along with input from Reviewer 2, inspired us to re-evaluate the criteria used for comparative analyses. We have removed the figure that included information based on the "species-level" 96.5% threshold.
The revised analysis now has two components. In the first, we use the same 99% similarity threshold, calculated with the gANI algorithm as implemented in dRep using default parameters. This is comparable to the stringency used in prior strain tracking analyses. We refer to these as subspecies (collections of exceedingly closely related strains) and note their relevance as potential colonizers of the infant microbiome.
The second, new analysis uses the maximum stringency threshold of 99.999% (where maximum is defined based on comparison of reads and the matched genome sequence). This maximum stringency approach gives us very high confidence that the same strain was detected in different samples.
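As an illustration of the two-tier terminology, the sketch below labels a pairwise ANI value using the two thresholds quoted in this response (≥ 99.999% for strains, ≥ 99% for subspecies). It is not dRep code and does not compute ANI. The function name `classify_pair` and the label strings are hypothetical.

```python
# Thresholds quoted in the response, expressed in percent ANI.
STRAIN_MIN_ANI = 99.999     # "strain" (maximum stringency) threshold
SUBSPECIES_MIN_ANI = 99.0   # "subspecies" threshold


def classify_pair(ani_percent: float) -> str:
    """Label a genome pair from a precomputed ANI value (in percent).

    The ANI value is assumed to come from an external tool; this function
    only applies the two cutoffs.
    """
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if ani_percent >= STRAIN_MIN_ANI:
        return "same strain"
    if ani_percent >= SUBSPECIES_MIN_ANI:
        return "same subspecies"
    return "different"


# Made-up ANI values for illustration only.
for ani in (99.9995, 99.5, 96.5):
    print(ani, classify_pair(ani))
```

Any pair labelled "same strain" also satisfies the subspecies cutoff, so the strain tier is a strict subset of the subspecies tier.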

From the total 50 infants, how many of them showed evidence of a direct link between the infants' bacteria and room bacteria? Which body site (stool, skin or oral) is affected more by the environment?
We directly address this question in the revised text (Line 107) and display the information.
"This analysis revealed 26 cases of identical strains present in both the room environment and infant gut (rows in Figure 4)." Due to the limited number of skin and oral samples, we cannot attempt to answer the question of what site is most affected, but we hope our next sampling campaign can help provide clarity on this point.

Can the authors comment on how much the room bacteria influence the infants' bacteria? From Extended Fig. 1, only 12 strains were shared by room and infants, which seems a very small number.
The premature infant gut is a very low diversity environment so it is not surprising that the majority of room-associated strains were not detected in the infant gut. Importantly, however, the majority of the infant-associated organism types were found in the room.

4. It is known that there is inter-subject strain variation; is there between-room variation too?
There are several cases where we found different strains of the same species in different rooms. The manuscript has been modified to clarify this. The information should now be apparent in Fig. 2.

5. One caveat of the study is that although there is 99% identity between infant and room strains, it is hard to know how similar the infants' strains are to those from other potential sources such as caregivers or mothers. This makes the Staphylococcus epidermidis conclusion less solid.
As noted in responses to the other reviewers, this and other reviewer comments motivated the addition of a maximum stringency analysis (at the ≥ 99.999 % ANI level) that allows us to discriminate extremely similar strains from each other and provide high confidence as to potential sources.
We agree that we cannot rule out human activity and have included this comment in the revised manuscript.

6. The metadata of the infants is not provided.
Please see the newly added Supplementary Table 2 for metadata details. The new table contains specific SRA accessions for accessing short read data and many additional metadata fields (e.g., day of life, cohort, infant, birth day synced to study day, gestational age, sex, birth mode, birth weight, hospital, city, NICU room etc.). We think the additional metadata fields greatly improve clarity and accessibility to this study's supporting data.

7. There is no explanation of the meaning of each column in the supplementary tables. It would be helpful to provide this information for certain key columns.
Supplementary Tables (2-11, Extended Data 1-9) were generated using dRep (line 83). dRep provides extensive documentation to help readers interpret the output. For tables not generated with dRep we have added a comment line that provides description for each column header.

Define 'bin'.
Bins are generated using the methods and software cited in lines 204-220.