Primate gastrulation and early organogenesis at single-cell resolution

Our understanding of human early development is severely hampered by limited access to embryonic tissues. Due to their close evolutionary relationship with humans, nonhuman primates are often used as surrogates to understand human development but currently suffer from a lack of in vivo datasets, especially from gastrulation to early organogenesis during which the major embryonic cell types are dynamically specified. To fill this gap, we collected six Carnegie stage 8–11 cynomolgus monkey (Macaca fascicularis) embryos and performed in-depth transcriptomic analyses of 56,636 single cells. Our analyses show transcriptomic features of major perigastrulation cell types, which help shed light on morphogenetic events including primitive streak development, somitogenesis, gut tube formation, neural tube patterning and neural crest differentiation in primates. In addition, comparative analyses with mouse embryos and human embryoids uncovered conserved and divergent features of perigastrulation development across species—for example, species-specific dependency on Hippo signalling during presomitic mesoderm differentiation—and provide an initial assessment of relevant stem cell models of human early organogenesis. This comprehensive single-cell transcriptome atlas not only fills the knowledge gap in the nonhuman primate research field but also serves as an invaluable resource for understanding human embryogenesis and developmental disorders.

Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code
Data collection
Single-cell libraries were constructed using the Single Cell 3' Library & Gel Bead Kit v3 (10x Genomics) and were sequenced on an Illumina HiSeq X Ten platform.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy
Raw data and processed data were uploaded to the NCBI Gene Expression Omnibus (GEO) database with the accession number GSE193007.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
The sample size of this study was determined by the availability of highly regulated primate embryo samples. In compliance with the 3R guidelines, we reduced the number of animals used to a minimum and obtained uteri from pregnant females at E20 (n=2), E22 (n=1), E23 (n=2), E26 (n=1), and E29 (n=1), which allowed us to obtain a high-coverage transcriptome for each cell type and to perform downstream analyses with confidence.
Data exclusions
First, single cells with more than 500 detected genes (nFeature_RNA) and more than 1,000 detected transcripts (nCount_RNA) were retained, in order to exclude apoptotic or dead cells. Then, doublets and multiplets were identified with Scrublet (Wolock et al., 2019), according to the recommended multiplet rate reference table from 10x Genomics. Next, a Seurat object was created independently for each sample from the expression matrix and metadata containing cell barcodes, cell status and assignment information identified by Souporcell, and cell multiplet information inferred by Scrublet; these Seurat v3 objects were then merged.
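The first exclusion step can be illustrated with a minimal sketch. The study itself used Seurat in R; the pure-Python version below only mirrors the two stated thresholds. The function names (`qc_metrics`, `retain_cells`), the toy matrix and the strict "greater than" comparison are assumptions made for illustration, not the authors' code.

```python
# Illustrative sketch of the stated retention rule (not the authors' pipeline).
# Thresholds are taken from the text; everything else is hypothetical.
MIN_GENES = 500    # nFeature_RNA must be above this value
MIN_COUNTS = 1000  # nCount_RNA must be above this value


def qc_metrics(counts_row):
    """Return (nFeature_RNA, nCount_RNA) for one cell's per-gene counts."""
    n_feature = sum(1 for c in counts_row if c > 0)  # genes with at least one count
    n_count = sum(counts_row)                        # total transcripts in the cell
    return n_feature, n_count


def retain_cells(matrix, barcodes, min_genes=MIN_GENES, min_counts=MIN_COUNTS):
    """Return barcodes of cells that are above both thresholds."""
    kept = []
    for barcode, row in zip(barcodes, matrix):
        n_feature, n_count = qc_metrics(row)
        if n_feature > min_genes and n_count > min_counts:
            kept.append(barcode)
    return kept


if __name__ == "__main__":
    # Toy cells-by-genes matrix with 600 genes per cell.
    toy = [
        [2] * 600,              # 600 genes, 1200 counts: retained
        [5] * 400 + [0] * 200,  # 400 genes, 2000 counts: too few genes
        [1] * 600,              # 600 genes, 600 counts: too few counts
    ]
    print(retain_cells(toy, ["cell_a", "cell_b", "cell_c"]))
```

Both conditions must hold for a cell to be retained, so a cell with many transcripts spread over few genes is still excluded. Doublet detection with Scrublet and genotype-based assignment with Souporcell are separate steps and are not covered by this sketch.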

Replication
Sequenced samples from two independent embryos of the same stage showed similar gene expression patterns.
Because the developmental stage of an embryo in utero cannot be controlled, and although we timed embryo collection by calculating days post fertilization in combination with B-mode ultrasound, a CS10 embryo was not successfully collected, so no analysis was performed for this stage.
The immunofluorescence (IF) experiments on mouse embryos were repeated in three independent biological samples. The stem cell experiments were independently repeated at least three times. All replication attempts were successful.
Randomization
Samples were not allocated into randomized groups. Randomization was not relevant to the study. All embryo samples were analyzed individually.

Blinding
Blinding of the investigators was not possible due to the study design and was not relevant to the study. It was not possible to blind the experiments during either embryo collection or single-cell collection. We performed lineage assignment in an unbiased way: we assigned samples to lineages based on their gene expression profiles and then validated our findings by their localization within the embryo.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.