Dynamic changes in the epigenomic landscape regulate human organogenesis and link to developmental disorders

How the genome activates or silences transcriptional programmes governs organ formation. Little is known in human embryos, undermining our ability to benchmark the fidelity of stem cell differentiation or cell programming, or to interpret the pathogenicity of noncoding variation. Here, we study histone modifications across thirteen tissues during human organogenesis. We integrate the data with transcription to build an overview of how the human genome differentially regulates alternative organ fates, including by repression. Promoters from nearly 20,000 genes partition into discrete states. Key developmental gene sets are actively repressed outside of the appropriate organ without obvious bivalency. Candidate enhancers, functional in zebrafish, allow imputation of tissue-specific and shared patterns of transcription factor binding. Overlaying more than 700 noncoding mutations from patients with developmental disorders allows correlation to unanticipated target genes. Taken together, the data provide a comprehensive genomic framework for investigating normal and abnormal human development.

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Please also see preceding section. We have provided these descriptions in the Methods, including the version, and referenced the source. This includes new code freely available on GitHub and fully referenced with a hyperlink in the manuscript.

Software and code
Imaging software used in the study: CellSens (Olympus) software; Fiji (ImageJ).

nature research | reporting summary

October 2018
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.
No sample size was chosen for the human embryo collection that contributed to the study. For the zebrafish transgenic experiments we ensured multiple stable transgenic founder lines were established (range: 3 to 7 according to the usual variation in the success of transgenesis). All founder lines were included in subsequent analyses of reporter gene expression. 100% of founder lines yielded correct reporter gene expression, strongly implying that sample sizes were sufficient. These details are included in Supplementary table 6 and referred to in the manuscript text.
For the example analysis of wildtype and mutant enhancer cardiomyocyte differentiation of hPSCs, a power calculation was not undertaken because it was not possible at the outset to infer an effect size. Therefore, we ensured an analysis of a large number of embryoid bodies, aiming at 30 per group. The control group was reduced to 29 because of one technical failure of cell culture. All EBs were included in the analyses of GFP fluorescence and gene expression across three independent experiments. Conclusive results were obtained for RBM24 gene expression.
The only data excluded are described in the Methods for the 1kb binning which pertains to the results from Figure 5 onwards: 'Reads from mitochondrial and unplaced chromosome annotations were removed. A further 697 bins were filtered out for possessing >10,000 reads in all samples or if the mean read count from input controls was >50% of the mean read count of all samples or for being situated in pericentromeric regions (using table ideogram from UCSC; listed in Supplementary table 8).' Mitochondrial reads were excluded because they emanate from a separate genome. Annotations that were unplaced were excluded because there is no reliability about their origin. The 697 bins with massively high read counts across all samples were excluded because of the near certainty of these reads being technical artifacts. These exclusion criteria were pre-established as part of commonplace analysis pipelines for ChIPseq data.
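The exclusion rules quoted above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The function names, the UCSC-style chromosome naming ("chrM" for mitochondrial, an underscore in the name for unplaced annotations), the data layout, and the reading of the input-control rule as a per-bin comparison of means are all assumptions made for illustration.

```python
from statistics import mean

BIN_SIZE = 1_000  # 1 kb bins, as described in the text


def is_excluded_chrom(chrom):
    # Assumption: UCSC-style names, where the mitochondrial genome is "chrM"
    # and unplaced or random contigs carry an underscore (e.g. "chrUn_...").
    return chrom == "chrM" or "_" in chrom


def overlaps(start, end, regions):
    # True if the half-open interval [start, end) overlaps any (r_start, r_end).
    return any(start < r_end and r_start < end for r_start, r_end in regions)


def keep_bin(chrom, start, sample_reads, input_reads, pericentromeric,
             max_reads=10_000, input_fraction=0.5):
    """Return True if a bin passes all exclusion rules.

    sample_reads / input_reads: non-empty lists of read counts for this bin.
    pericentromeric: dict mapping chromosome -> list of (start, end) regions
    (hypothetical layout; the study lists its regions in Supplementary table 8).
    """
    if is_excluded_chrom(chrom):
        return False
    # Rule 1: more than max_reads reads in every sample.
    if all(n > max_reads for n in sample_reads):
        return False
    # Rule 2 (assumed per-bin): input-control mean exceeds 50% of the sample mean.
    if mean(input_reads) > input_fraction * mean(sample_reads):
        return False
    # Rule 3: bin lies in a pericentromeric region.
    if overlaps(start, start + BIN_SIZE, pericentromeric.get(chrom, [])):
        return False
    return True
```

Under these assumptions, a call such as `keep_bin("chr1", 5000, [120, 80, 100], [10, 20], {})` keeps the bin, whereas any bin on "chrM" is dropped regardless of its read counts.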
All results were reproducible. For the ChIPseq data, replicates were undertaken for 11 out of the 13 tissues. Where investigation is restricted to the replicated samples, this is clearly stated in the manuscript Results text. For the promoter state analysis, both replicates gave equivalent results and the same categorization. For the zebrafish transgenics, the expected profile of GFP detection was observed in 100% of multiple founder lines. Major batch effect was excluded by hierarchical clustering (Supplementary figure 13). Correlation was proven between MACS peak calling and the 1kb binning approach and shown in Supplementary figure 14. Replication was ensured for the hPSC differentiation by undertaking independent experiments in triplicate with each group containing 10 embryoid bodies (except one control group with 9).
This section on Randomization does not feel relevant to the approaches that we undertook, which are described in the manuscript. Analysis was undertaken on specific tissues with no element of different treatments being undertaken (e.g. as would need randomization to avoid bias in drug treatment trials).

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. Note that full information on the approval of the study protocol must also be provided in the manuscript.

Human research participants
Policy information about studies involving human research participants
Population characteristics

Recruitment
Ethics oversight
Note that full information on the approval of the study protocol must also be provided in the manuscript.

ChIP-seq
Data deposition
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links
May remain private before publication.
Blinding is not relevant to the undertaking of the RNAseq and ChIPseq analyses because the genome-scale bioinformatic investigations are not user-defined, i.e. the opportunity for user bias is removed. For the zebrafish analysis the images are provided as the data. While blinding would theoretically be possible for the analysis of fluorescence in the wildtype and mutant embryoid bodies, in reality the entire EB was subject to automated fluorescence measurement in providing the data in Fig. 8e, such that user bias would not be possible.
The antibodies used are described in the 'ChIPseq antibody section' below.
These antibodies have been extensively used for ChIP-seq histone modification analyses.
Zebrafish (Danio rerio); embryos are injected at the 1-cell stage, before sex is known.
No wild animals were used.
No field-collected samples were used.
All protocols used have been approved by the Ethics Committee of the Andalusian Government (license numbers 450-1839 and 182-41106 for CABD-CSIC-UPO). This is stated in the Methods.
Embryonic material was collected from women undergoing termination of pregnancy in Manchester University NHS Foundation Trust. The women referred to this clinical service represent the diverse ethnicity and demographics of women of fertile age (over the age of 16) in the Greater Manchester region of the UK.
We approach all women who have given clinical consent within the confines of our ethical approval (over 16 years, without undue emotional distress). Our study population reflects the ethnically diverse population of Greater Manchester and so ascertainment bias is not anticipated compared to other human embryonic tissue sources. North West Regional Ethics Committee as stated in the manuscript.
ChIPseq and RNAseq datasets have been deposited in the European Genome-phenome Archive (EGA) under accessions EGAS00001003738 and EGAS00001003163. Supplementary tables 1-3 detail the human embryonic material contributing to