Introduction: human-specific features of brain development

Understanding the development of the human central nervous system is fundamental for unraveling its complex functions, evolutionary innovations, and pathophysiology of neuropsychiatric disorders. The central nervous system originates from the neural tube, which forms within the ectoderm in the dorsal part of the embryo. In early development, succeeding orthogonal waves of diffusible molecules, called morphogens, control the specification and differentiation of its broad domains (i.e., telencephalon, diencephalon, mesencephalon, and spinal cord), their subsequent subdivisions into regions (for instance, the distinction of hippocampus and neocortex in the dorsal telencephalon or the lateral, medial, and caudal ganglionic eminences in the basal telencephalon), and further refined suborganizations (such as the neuronal layers of the neocortex) [1]. This tissue organization is accompanied on the cellular scale by several transitions, starting from a transition of symmetrically dividing neuroepithelial cells into neurogenic radial glia (RG), the stem cells of the developing brain. RG generate over time and space sequential waves of neuronal progenitors, neurons, astrocytes, and oligodendrocytes. In parallel, cells migrate across different regions and form reciprocal interconnections through axonogenesis, finally resulting in neurons and glia developing network electrical activities and completing the process of brain development.

Compared with other mammals, the human brain has more neurons, larger neuronal diversity and pronounced morphological differences—with a complex pattern of gyri and sulci—and a greater brain lateralization compared with nonhuman primates. These features are the result of an increased duration of neurogenesis, an increased number and diversity of progenitors, and an increased complexity of cell fate programs. This divergence is supported by a longer gestational period and a postnatal maturation finishing only in the third decade of life (reviewed in [2]). The cerebral cortex is the region that has undergone the most remarkable number of changes at the anatomical, cellular, and molecular levels over the course of evolution. The human cerebral cortex has a six-layered laminar architecture, which is relatively conserved in mammals, but with a surface area 1000 times larger than that of a mouse [3]. Cortical areas, with their inherent neuronal architecture and connections, are much more diversified in humans, with many novel areas associated with cognitive functions [4, 5]. The cortex is an anatomical and functional map of our interactions with the external world, and, together with its interconnected regions, forms a biological entity responsible for higher human cognitive abilities and associated psychopathology.

The emerging discipline of functional genomics is crucial to understand the establishment of this cellular and functional architecture (see Box 1 for a glossary of important terms and technologies). Gene regulatory networks, composed of cascades of transcription factors (TFs) and epigenetic modifications, are central to the correct differentiation of every neural cell as the brain develops [6]. TFs bind to genomic noncoding regions called enhancers, and the 3D chromatin architecture enables their interactions with gene promoters and other regulatory proteins, even over long distances [7]. These transcriptional machineries control the expression of their target genes in space and time (Fig. 1b). Enhancer activity is regulated in part through epigenetic modification of nearby histones (i.e., acetylation, methylation, and hydroxylation). Altogether, enhancer activities are used as an integrative mechanism for the control of gene expression and are central to the establishment and maintenance of cell identity, affecting the fate of multiple classes of neural cells [8, 9]. As such, they are controlled on a more refined level than the expression of TFs [10, 11]. This fundamental role highlights the importance of identifying TF-enhancer-gene relationships as well as their epigenetic regulation to understand how external information is integrated by the cell. Enhancers, along with other regulatory regions, therefore constitute the main “data” necessary to understand the control of brain development. Finally, mutations within those regions, such as single-nucleotide polymorphisms (SNPs), can affect transcription by, for instance, modifying TF binding affinity or chromatin conformation. Therefore, mutations in noncoding regions can have a more subtle, precise, and context-dependent impact than mutations in coding regions, making them an important subject for understanding complex genetic diseases, including many neuropsychiatric diseases [12, 13].

Fig. 1: Integrative approaches to study human brain development.

a Illustration of two complementary models, postmortem human brain tissue and iPSC-derived brain organoids. b Potential of multi-omics approaches to connect/integrate genomic, transcriptomic, and epigenomic information generated or available through datasets of in vivo/in vitro studies. Enhancer activity (identified from ATAC-seq, ChIP-seq) influences transcription (RNA-seq) of gene/s through DNA looping (identified through Hi-C). Similarly, eQTL connects risk variants (discovered via DNA-seq, SNPs, GWAS) to gene expression. These variants may disrupt transcription factor binding sites (TFBS) within enhancers. Human-specific gene regulatory mechanisms can be validated in live cell systems, i.e., iPSC-derived organoids, or human progenitor cell lines, by high throughput methods (e.g., MPRA, STARR-seq). c Studying gene-enhancer dynamics over time both in vivo and in organoids (ORG) can reveal biological insight into the formation and evolution of the human brain (e.g., the hourglass shape of interregional diversity over time between cortex (CTX), striatum (STR), cerebellum (CRB)). d Single-cell omics can define those dynamics at the cellular levels, for instance revealing how gene regulation evolves between radial glia (RG) and neuronal derivatives (N1, N2).

Studying functional genomics of the developing human brain is challenging due to the limited availability of fetal tissue for science. The generation of embryonic stem cell (ESC) and human induced pluripotent stem cell (iPSC) lines has recently offered an alternative. Using those cell lines, several labs have developed 3D in vitro models of brain development generally referred to as brain organoids [14,15,16]. By mimicking the morphology of the embryonic nervous system, especially the apico-basal polarity of the RG in the ventricular zone (VZ) and the generation of the outer VZ typical of the human brain, organoids have the inherent capability of generating the multiple cellular lineages of the brain while reproducing in vivo cell-to-cell communications and organization [17]. The organoid model presents multiple advantages, from the ability to study individual variations using iPSCs from different human genetic backgrounds to the ability to conduct longitudinal analyses in the same genomic context. Brain organoids can be easily perturbed to model the effects of environmental and genetic factors on neural development in a controlled setting. While holding great promise, the brain organoid field is still emerging and many developments remain necessary to translate its discoveries into the clinical world (see below, “Exploiting scRNA-seq to study organoids cellular composition and gene networks” section).

In this review, we will describe how the application of functional genomics to the human brain has led to the development of invaluable resources and datasets. We will then focus on recent studies that leveraged those resources to characterize the unique features of the developing human brain, and how the boundaries have been pushed by the emergence of single-cell technologies. We present the applications of the brain organoid model and discuss how organoids compare and can be integrated with their in vivo counterpart, illustrating this with our recent work and the plethora of single-cell studies recently published (Table 1). Overall, we present a blueprint of integrative studies and techniques at the crossroads of developmental biology, stem cell biology, neuroscience, genomics, and systems biology to understand brain development and its disorders (Fig. 1).

Table 1 List of scRNA-seq and scATAC-seq studies in human brain organoids.

Omics and big datasets

Genomics and epigenomics

The development of cost-efficient sequencing in the last 10 years has drastically accelerated the reproducible generation of high throughput nucleic acid sequence data (hereafter referred to generally as Omics). These include genomic variant discovery, like SNPs, by whole-exome or whole-genome sequencing (WGS). Transcriptome analysis by RNA-sequencing (RNA-seq) profiles global expression patterns of distinct RNA species, including messenger RNA (mRNA), microRNA, or long noncoding RNA (lncRNA), and their eventual posttranscriptional modification, like alternative splicing or RNA editing. Finally, ChIP-seq directed at histone modifications, and Hi-C or ATAC-seq, which inform on chromatin structure, can be used to evaluate epigenomic changes (Fig. 1b) (Glossary in Box 1).

Large-scale Omics datasets are key to establishing exhaustive repertories of cellular and molecular features of normal and disease-affected brain development, estimating the validity and fidelity of in vitro models, and feeding in silico drug discovery. Several public–private research initiatives, national or international (e.g., NIH, EU, Simons Foundation, Allen Institute for Brain Science), have funded collaborative efforts to catalog and analyze genomic, transcriptomic, and epigenomic data of cells and tissue in human, nonhuman primates, and model organisms. Data have been uploaded to open repositories such as Gene Expression Omnibus or to user-friendly databases to allow further analysis by different groups and enable cross comparisons.

It is now known that many neurodevelopmental and neuropsychiatric disorders, such as autism spectrum disorders (ASD), bipolar disorder, or schizophrenia, are genetically heterogeneous and do not always follow a Mendelian inheritance [13, 18, 19]. Instead, such polygenic disorders arise from multiple causative variants, both common and rare, and complex environmental factors. Genome and exome sequencing studies in patient cohorts and normal individuals have identified potential causative genetic variations of all sizes, from SNPs to large structural variation in the DNA [13, 20, 21]; however, the major challenge is to interpret their functional impact. The importance of sharing genetic and phenotypic data has fostered global networks since 2015, like Matchmaker Exchange, Decipher and GeneMatcher [22, 23], in which patients with similar genetic variants and/or phenotypes were matched. Other essential resources have been developed and implemented, including databases of disease-associated genes, like OMIM and ORPHANET [24], or clinical interpretation of variation, like ClinVar or ClinGen [25, 26], initiating a move towards precision medicine and research. Population-wide cataloging of genetic variation, like gnomAD, has allowed the exclusion of variants that were too common at the population level to be plausible causes of diseases [27].
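The frequency-based exclusion described above can be illustrated with a minimal sketch. The `Variant` record, the variant identifiers, and the 0.001 allele-frequency cutoff are all assumptions made for illustration; they are not taken from gnomAD or from any cited study, and real pipelines choose cutoffs according to disease prevalence and inheritance model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    variant_id: str                  # hypothetical identifier
    population_af: Optional[float]   # reference-population allele frequency; None if never observed


def filter_rare(variants: List[Variant], max_af: float = 0.001) -> List[Variant]:
    """Keep variants that are absent from the reference catalog or at most max_af."""
    return [
        v for v in variants
        if v.population_af is None or v.population_af <= max_af
    ]


candidates = [
    Variant("var_common", 0.12),     # too common to be a plausible rare-disease cause
    Variant("var_rare", 0.00002),
    Variant("var_unseen", None),     # not present in the reference population
]
rare = filter_rare(candidates)
print([v.variant_id for v in rare])
```

Treating unobserved variants as rare is a design choice of this sketch; a stricter analysis might also require adequate sequencing coverage at the site in the reference catalog before drawing that conclusion.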

For complex diseases, numerous sequencing-based gene association studies have been done to link phenotype differences with variant allele frequencies. Genome-wide association studies (GWAS), the first dating back to the early 2000s, have been charting common genomic variants across the genome in individuals with or without the disease, initially using genome-wide SNP arrays and more recently WGS. GWAS have demonstrated that most such variants are found in noncoding regions of the genome, with 60% associated with enhancers and super-enhancers, and so are more likely to be involved in gene regulation [28]. This approach has led to the recent identification of common risk variants for schizophrenia, although it required more than 36,000 subjects and 110,000 controls [29]. A GWAS effort focused upon neuropsychiatric disorders is the Psychiatric Genomics Consortium, which covered 11 psychiatric disorders including attention-deficit/hyperactivity disorder, Alzheimer’s disease, ASD, bipolar disorder, eating disorders, major depressive disorder, obsessive-compulsive disorder/Tourette syndrome, post-traumatic stress disorder, schizophrenia, substance use disorders, and all other anxiety disorders [30]. Of note, the NHGRI-EBI published a catalog of GWAS (ebi.ac.uk/gwas). More focused consortia subsequently developed, for instance the Brain Somatic Mosaicism Network (BSMN), aimed at studying somatic mosaicism both in neurotypical postmortem human brains and in schizophrenia, ASD, bipolar disorder, Tourette syndrome, and epilepsy patient populations. The BSMN aims at cataloging the frequency and pattern of somatic mutations, which are not inherited but occur during the lifetime of an individual. To overcome the challenges inherent to discovering somatic mutations, which are present at low frequency in a subset of an individual’s cells, members integrate a variety of complementary approaches, which include clonal analyses, deep coverage DNA-sequencing, single-cell genomics, and cutting-edge bioinformatics, while the BSMN enables a cross-platform integrated analysis with other omic datasets [31,32,33].

Several brain transcriptomic studies have been performed in both human and nonhuman samples, offering the opportunity to probe the molecular basis of neuronal function, understand its developmental process, and discover conserved evolutionary mechanisms or diversity between species (reviewed in [34]). The analysis of transcripts has proved challenging, as RNA is more unstable than DNA, especially for postmortem samples. Yet, modification of transcriptional activity remains the core way to link genomic variants to alteration in gene expression through identification of expression quantitative trait loci (eQTL) and other analyses (Fig. 1). Pioneer transcriptome studies were performed in human, macaque, and mouse brain regions across lifespan by microarrays, highlighting time-dependent, layer-, region-, and species-specific features in cortical gene expression profiles [6, 35,36,37,38,39,40,41]. Larger transcriptomic datasets were soon generated using RNA-seq by several consortia (e.g., Allen Brain Map, BRAINSPAN) (see ALLEN BRAIN ATLAS data portal: https://portal.brain-map.org/) from multiple cell lines, human brain regions at mid-fetal and adult stages, and from embryonic and adult mouse cortex. The GTEx project [42, 43] generated transcriptome datasets across many “normal”, non-diseased tissues, and each donor was genotyped for common SNPs, creating one of the biggest eQTL studies. Such studies have offered not only the characterization of variation in gene expression levels, but also its link to genetic variants and the basic process of gene regulation. With its extension in 2017, Enhancing GTEx proposes to integrate previous data with telomere length, DNA accessibility, histone modifications, DNA and RNA methylation, somatic mutation, allele-specific expression, and protein quantification across individuals [44].
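The idea behind an eQTL, an association between genotype at a variant and expression of a gene, can be sketched as a simple linear regression of expression on allele dosage (0, 1, or 2 copies of the alternate allele). The data below are invented toy values, and this is not the method of GTEx or of any cited study: real eQTL analyses typically also model covariates, population structure, and multiple-testing correction.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data: one value per donor (hypothetical).
dosage = [0, 0, 1, 1, 2, 2]                   # alternate-allele count at the variant
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]   # normalized expression of one gene

slope, intercept = ols_slope_intercept(dosage, expression)
print(f"effect per allele: {slope:.3f}, baseline: {intercept:.3f}")
```

The slope is the estimated change in expression per additional copy of the alternate allele; a variant would be called an eQTL only if this effect were statistically significant across a sufficiently large cohort.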

Another important role of these multicentric studies was to investigate the exquisite gene regulatory mechanisms upstream of the transcriptome, through epigenomic studies. The Encyclopedia of DNA Elements (ENCODE) consortium offered the first functional annotation of regulatory elements in the genome, both coding and noncoding, systematically in human, mouse, fly, and worm. To date, ENCODE includes 10,868 projects and several bio-sample types. Through many assays (DNA binding, accessibility, methylation, transcription, RNA binding, replication timing, and 3D chromatin structure), ENCODE performed the first segmentation of the human genome into different categories of functional elements [45]. These include active enhancers, which are typically enriched in H3K27ac-labeled histones; poised enhancers, which exhibit H3K4me1; promoters, which are associated with H3K4me3; and repressed chromatin, associated with H3K27me3.
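The mark-to-element associations listed above can be written down as a small lookup, shown here only as a reading aid. The function name and the example inputs are hypothetical, and actual genome segmentation relies on statistical models over many signal tracks, which this lookup does not attempt to reproduce.

```python
# Associations exactly as stated in the text; everything else is illustrative.
MARK_TO_ELEMENT = {
    "H3K27ac": "active enhancer",
    "H3K4me1": "poised enhancer",
    "H3K4me3": "promoter",
    "H3K27me3": "repressed chromatin",
}


def annotate_region(marks):
    """Return the sorted element labels suggested by the histone marks in a region.

    Marks that are not in the lookup are ignored rather than guessed at.
    """
    return sorted({MARK_TO_ELEMENT[m] for m in marks if m in MARK_TO_ELEMENT})


print(annotate_region(["H3K4me3"]))
print(annotate_region(["H3K27ac", "H3K4me1", "unlisted_mark"]))
```

Returning every matching label, rather than a single call, reflects that a region can carry several marks at once; resolving such combinations is precisely what dedicated segmentation models are for.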

Finally, our group has taken part in an integrative omics analysis initiative called the Psychiatric Encyclopedia of DNA Elements (PsychENCODE), a consortium focused on understanding gene regulatory mechanisms in the human brain [46]. This is in contrast to ENCODE, which focused largely on human cell lines. The PsychENCODE consortium has generated a comprehensive online resource (http://www.psychencode.org/) of transcriptomic, epigenomic, and genomic data from postmortem developing and adult human brains, both normal and diseased (schizophrenia, ASD, and bipolar disorder), and human cellular model systems. Three main research areas were pursued: dissecting human brain development, studying disease transcriptomes and their regulation, and finally integrating bulk tissue and single-cell data with deep learning approaches to deconvolute the unique features of the human brain. In the following sections, we present some of the main findings of the consortium and related research.

Emergence of single-cell omics

The recent years have seen an explosion of technologies to study genome, transcriptome, and epigenome at the single-cell level [47]. This advancement was enabled by improvements in single-cell isolation and barcoding techniques, coupled to a reduction in sequencing costs. Many single-cell isolation methods now exist, each having different advantages and caveats [48,49,50,51]. Alternative methods that rely on combinatorial barcoding to identify single cells without requiring physical isolation have also been applied successfully to neural tissues [52].

Single-cell RNA-sequencing (scRNA-seq) generates transcriptomic signatures of hundreds to millions of single cells, revealing both cellular composition (Fig. 1d) and cell-type-specific gene networks in pluricellular structures such as brains or organoids (Table 1) [47]. The quality of scRNA-seq is highly dependent on correct cell isolation, to avoid doublets (two cells instead of one), and on unbiased transcript capture and amplification from each cell, to avoid representation artefacts. Indeed, most techniques capture only a limited fraction of the cell’s RNA content, leading to transcript dropout. This unavoidable stochastic loss of transcripts requires the aggregated analysis of multiple cells to recover statistically significant information. Consequently, scRNA-seq output describes the state of cell subpopulations and not of single cells per se [47]. Although sequencing coverage is an important parameter, it has been shown that low coverage scRNA-seq (i.e., 50,000 reads per cell) is enough to identify and reconstitute cell diversity in the developing cortex [48]. While most single-cell platforms analyze only the three-prime end of mRNA, some library preparations allow sequencing of the full-length mRNA transcript, improving sensitivity in isoform detection [53].
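Why dropout forces aggregation across cells can be seen in a small simulation. This is a sketch under stated assumptions: every cell truly contains 5 copies of a transcript, each copy is captured independently with probability 0.1, and 200 cells are profiled. None of these numbers come from the cited studies; they are chosen only to make the effect visible.

```python
import random


def capture(true_count, capture_prob, rng):
    """Binomial thinning: each transcript copy is captured independently."""
    return sum(1 for _ in range(true_count) if rng.random() < capture_prob)


rng = random.Random(0)    # fixed seed so the simulation is repeatable
true_count = 5            # assumed true copies per cell
capture_prob = 0.1        # assumed capture efficiency
n_cells = 200             # assumed number of cells in one subpopulation

observed = [capture(true_count, capture_prob, rng) for _ in range(n_cells)]

# Per cell, the gene is often missed entirely (a dropout)...
detected_fraction = sum(1 for c in observed if c > 0) / n_cells
# ...but pooling cells recovers a stable estimate of captured copies per cell.
pooled_count = sum(observed)
pooled_mean = pooled_count / n_cells

print(f"cells with the gene detected: {detected_fraction:.2f}")
print(f"pooled mean captured copies per cell: {pooled_mean:.2f}")
```

Under these assumptions a single cell reports the gene less than half of the time, whereas the pooled mean approaches the expected value of `true_count * capture_prob`, which is why conclusions are drawn at the level of cell subpopulations.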

Bioinformatic analyses of scRNA-seq have become highly complex and are still under active development (reviewed in [47, 54]). For best practice recommendations and a workflow in scRNA-seq data analysis, see [55]. While an extensive overview of current analytical steps is beyond the scope of this review, we wanted to highlight some commonly used tools. The Seurat package (developed in R by the Satija Lab) [56] has become the most commonly used in our field (referenced in 13 out of 19 scRNA organoid studies listed in Table 1) largely due to its centralized handling of the scRNA-seq analytical pipeline, including normalization, batch-effect correction, clustering, visualization, and multifeature integration [57]. Visualization is an important part of scRNA-seq interpretation and mainly relies on nonlinear dimensionality reduction (e.g., t-SNE or UMAP) to reduce the data to a 2D plot of single cells. In developmental datasets, such as fetal brain or organoids, cells are evolving along lineages (e.g., from RG to neurons). This hidden dimension can be revealed along a pseudo-time or pseudo-differentiation axis projection through trajectory analysis [58], appropriate visualization tools [59, 60] (see below, Fig. 1d and Table 1) and further verified by elegant methods such as RNA velocity analysis [61].
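As an illustration of the normalization step mentioned above, the sketch below applies a library-size normalization followed by a log transform, which is a widely used scheme for scRNA-seq counts. It is written in plain Python rather than with Seurat, the scale factor of 10,000 is an assumption of this sketch, and the two toy cells are invented.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Scale a cell's counts to a common library size, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # An empty cell carries no information; return zeros instead of dividing by zero.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Two hypothetical cells with the same composition but a tenfold
# difference in sequencing depth (counts for three genes).
cell_shallow = [10, 0, 90]
cell_deep = [100, 0, 900]

norm_shallow = log_normalize(cell_shallow)
norm_deep = log_normalize(cell_deep)
print(norm_shallow)
print(norm_deep)
```

After normalization the two cells have matching profiles, so downstream clustering and dimensionality reduction compare expression composition rather than sequencing depth.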

Improvements in scRNA-seq analytical tools are still required, especially to merge the increasing number of datasets generated across multiple technologies and studies, including integration with multiple other cellular features. Indeed, similar to bulk methods presented in the previous section, single-cell studies are progressively becoming multimodal, capturing multiple types of information from the same sample or even from the same cell, including spatial, epigenomic, morphological, immunophenotypic, DNA sequence or mutation, or even electrophysiological data [62, 63]. Spatial transcriptomics is a recent development that allows capturing transcriptomic data from a given location in a tissue slice while retaining spatial information close to single-cell resolution [64]. Similarly, profiling single-cell epigenetic information, such as open chromatin state through scATAC-seq or DNA methylation, opens a new feature for the classification of cell diversity [65]. Finally, the simultaneous collection of electrophysiological (e.g., patch-clamp or calcium imaging) and transcriptomic data from the same neural cells constitutes an important innovation for neuroscience [66, 67].

Genomics trajectories of the developing brain

Reconstituting human neural development from postmortem human tissues through transcriptomic, epigenomic, and integrative analyses: the PsychENCODE Consortium

The PsychENCODE project is aimed at defining a comprehensive map of functional regulatory genomic elements active in the human brain, unlike the ENCODE project, which mainly focused on peripheral and/or transformed cell lines [45]. The main success of PsychENCODE [46] has been the multi-omic approach that allowed a systematic characterization of noncoding elements, along with the transcriptome, in neurotypical developing and adult brains, in individuals with neuropsychiatric disorders, and in human cellular models [12, 68,69,70,71,72,73]. Coupled with other notable recent studies in developmental genomics [74, 75] and single-cell studies (discussed below), this resource provides new insights into the biology of brain development and its diseases.

Among the main findings, it was observed that the overall transcriptomic signature of all brain regions undergoes a sharp transition phase between mid-fetal and late-fetal stages, suggesting that major changes occur around birth [68]. This temporal trend was accompanied by a transient drop in interregional variability. This suggests that cortical regions become more similar around birth and that adult region-specific signatures arise mainly after the late infancy stages. Part of these dynamic changes could be explained by a regional variability in cell type composition, including differences in progenitor populations during the prenatal stages and differences in mature cell types and functional diversification during later postnatal stages [68, 69]. Different levels of alternative splicing contributed to the overall transcriptional variability over time and space. As splicing dysregulation has been shown to be involved in neurodevelopmental diseases, including ASD, schizophrenia, and bipolar disorder [71], this highlights the importance of studying alternative splicing in early development.

Spatiotemporal variability was also described at the epigenomic level. Major changes in chromatin accessibility, as assessed by ATAC-seq, between the germinal zone and the cortical plate in fetal cortical samples reflected the transcriptomic changes that happen during neurogenesis [74]. The study also associated putative enhancers with their corresponding TFs using binding site enrichment analysis, confirming that the germinal zone accessible regions are enriched in binding sites for TFs implicated in neural progenitor specification (e.g., PAX6, SOX2, ARX, EMX1/2, and LHX2), although the study stopped short of comprehensively defining actual enhancers. This is important because many studies, including large-scale chromatin conformation analysis in mid-gestation brain samples, have revealed that most enhancer–promoter interactions within topologically associating domains (TADs) were long range and not with the adjacent genes [75].

Overall, there is a good concordance between DNA methylation, histone marks, and gene expression over brain development [68]. For instance, enhancers active during the fetal period were associated with genes linked to neural development functions, and became hypermethylated over the postnatal period, heralding the expected decrease in target gene activity. Genome-wide, chromatin accessibility correlated relatively well with gene expression, both at transcription start sites (TSS, r = 0.417) and at regulatory regions (putative enhancers), especially when the latter were defined using Hi-C chromatin interactions (r = 0.456) [74, 75].
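For readers less familiar with the statistic, the r values quoted above are correlation coefficients between accessibility and expression. Assuming they are Pearson coefficients (the text does not specify the type), the computation can be sketched as follows; the five paired values are invented toy data and do not reproduce the reported figures.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data: one pair per gene (hypothetical accessibility and expression scores).
accessibility = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(accessibility, expression)
print(f"r = {r:.3f}")
```

A coefficient in the 0.4 to 0.5 range, as reported genome-wide, indicates a clear but far from deterministic relationship, consistent with accessibility being one of several layers controlling expression.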

All those studies constituted the basis of an integrative model for the discovery and interpretation of functional genomics of the adult human brain within the PsychENCODE consortium [69]. This model included adult brain bulk transcriptome, chromatin, genotype, Hi-C, and single-cell datasets from major human brain regions, and merged these datasets with others available through GTEx, ENCODE, and Roadmap Epigenomics (see Fig. 1). All the datasets were uniformly processed to create many fundamental resources, including a list of brain-expressed genes, co-expression modules, 79,000 brain-active enhancers and their putative targets, and more than 2.5 million eQTLs, including relationships with splice isoforms, cell fractions, and chromatin activity. The study generated a brain gene regulatory network where TFs, enhancers, and target genes are linked to each other, based on QTLs, element-activity correlation, and Hi-C data. Disease genes were linked to GWAS variants for psychiatric disorders. The regulatory network was used as an input for a machine-learning model to predict psychiatric phenotypes, giving back a threefold increase in prediction compared with other models, highlighting the value of having both epigenome and transcriptome data. This integration remains to be extended to the developmental brain and functionally validated. Interestingly, the integrative model revealed that cell composition is the major contributor to the overall developmental trajectory signature of the human brain, a central result that was only possible to obtain by using single-cell resources.
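The structure of such a regulatory network, in which TFs are linked to enhancers and enhancers to target genes, can be sketched with two mappings. All TF, enhancer, and gene names below are placeholders, and the links are invented; this is a toy data structure, not the network built by the consortium, where each link is supported by QTL, activity correlation, or Hi-C evidence.

```python
# Hypothetical toy network: every name and link is a placeholder.
tf_to_enhancers = {
    "TF_A": ["enh_1", "enh_2"],
    "TF_B": ["enh_2"],
}
enhancer_to_genes = {
    "enh_1": ["GENE_X"],
    "enh_2": ["GENE_X", "GENE_Y"],
}


def targets_of(tf):
    """Genes reachable from a TF through any enhancer it binds."""
    genes = set()
    for enhancer in tf_to_enhancers.get(tf, []):
        genes.update(enhancer_to_genes.get(enhancer, []))
    return sorted(genes)


def regulators_of(gene):
    """TFs that bind at least one enhancer linked to the gene."""
    return sorted(
        tf
        for tf, enhancers in tf_to_enhancers.items()
        if any(gene in enhancer_to_genes.get(e, []) for e in enhancers)
    )


print(targets_of("TF_B"))
print(regulators_of("GENE_Y"))
```

Keeping enhancers as explicit intermediate nodes, rather than linking TFs directly to genes, is what allows a noncoding variant located in one enhancer to be traced to the specific genes it may affect.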

A parallel longitudinal in vitro study of iPSC-derived organoids and fetal brains (described in more detail below) generated a dataset of roughly 96,000 enhancer elements active in early brain development and linked to genes by chromatin conformation analyses, and demonstrated a good correlation between enhancer activity and gene expression along neural differentiation (see below, Fig. 1 and [70]). Altogether, this integration between transcriptome and epigenomic studies demonstrated the validity of multi-omic approaches to deconvolute complex processes like neurodevelopment.

Single-cell transcriptomics of postmortem developing brain samples

Single-nucleus analyses of adult human cortex have been successfully applied to decipher human-specific diversity and organization (e.g., [76, 77]), such as the transcriptomic signatures of multiple cortical areas [78] and integrated analyses between transcriptomic and epigenomic signatures [79]. In addition, scRNA-seq has revealed cell-type-specific alterations in multiple human brain disorders, like ASD [80], glioblastoma [81], multiple sclerosis [82, 83], or Alzheimer’s disease [84].

Single-nucleus analyses of fetal neocortex brain tissue from different post-conceptional weeks (PCW) and gestational weeks (GW) were described in a seminal series of publications by the Kriegstein group and others (GW 16 in Pollen et al. [48]; PCW12-13 in Camp et al. [85]; PCW 6-37, in Nowakowski et al. [86]; GW6-22 in Bhaduri et al. [87]). Other groups have refined and extended the single-cell spatiotemporal dynamics (GW 16-18, Darmanis et al. [77]; GW8-26 in Zhong et al. [88]; PCW 22-23 in Fan et al. [89]; 5-20 PCW in Li et al. [68]; GW17-18 in Polioudakis et al. [90]). Although the neocortex captures most of the attention, similar valuable datasets have been produced for other regions, such as the hippocampus (GW 16-27, [91]) or the embryonic ventral midbrain (PCW 6-11, La Manno et al. [92]). The quality, number of cells, and depth of analysis in those studies have followed the advances in single-cell technology. Many of these studies confirm decades of neurodevelopmental research: from the organization into ventricular, subventricular, subplate, and mantle zones, to the order in which neurons of each cortical layer are specified over time, to the switch towards gliogenesis in the later stage. The full diversity of cell type transcriptomes is being characterized, covering not only RG, intermediate progenitor cells, inhibitory, and excitatory neurons, but also astrocytes and oligodendrocyte precursor cells, microglia, choroid plexus cells, mural cells, and endothelial cells [86].

In addition to cataloging the transcriptomic signatures of every single population, scRNA-seq brings the promise of refining or giving a new take on important unresolved questions of forebrain development biology in human brain, and specifically the most evolved regions such as the cerebral cortex.

Identification of the outer and truncated RG subtypes (oRG, tRG) has received particular attention for their hypothesized impact on human-specific brain size and morphology. Multiple studies have now established their molecular profiles, such as the oRG marker HOPX [93,94,95,96]. Interestingly, among the RG subpopulations, oRG present an enrichment of the mTOR pathway, which has been implicated in pathological conditions related to dysplastic growth and defective cortical migration with co-morbid epilepsy, such as focal cortical dysplasia type 2 and hemimegalencephaly [86, 97,98,99]. Another interesting aspect highlighted by these recent studies is that human-gained enhancers (referred to as HGE), which increased in activity over the course of human evolution, seem to preferentially target genes expressed in the oRG and regulate their growth during neurogenesis [74].

Another important question that has been debated in the mouse literature is the establishment of cortical patterning following either a premitotic model (specialized progenitors) or a postmitotic model (common progenitor but specialized neurons). By comparing visual and prefrontal cortex areas, Nowakowski et al. [86] noted that RG do not present regional transcriptional signatures but show evidence of a progressive divergence in gene networks during neurogenesis. They also noted that those two areas seem to mature at different speeds. This asynchronicity was also observed in another study analyzing 20 different areas, where the authors also observed different proportions of interneurons across the cortical regions [89]. This is in apparent contrast to a more recent study that found cortical areal signatures already present in RG progenitor cells [87].

The development of cortical layers and subpopulations of excitatory neurons seems to go through a phase where immature neurons express combinations of genes, in particular TFs, that are known to be expressed by distinct cortical layers in adult neurons [68]. An example is the co-expression in embryonic and mid-fetal excitatory cells of BCL11B (CTIP2) and FEZF2 (both known markers of layers V/VI) with CUX2, an upper layer marker (layers II/IV). Another example is the co-expression of RELN and PCP4, specific for layer I and deep layers, respectively. This suggests that human neuronal cell types could be very malleable during early postmitotic differentiation and that their molecular identities are not completely resolved before the end of mid-fetal development.

The origin and establishment of interneuron diversity in the neocortex have received major attention, and there is still controversy regarding the capacity of dorsal cortical RG to generate interneurons [100,101,102,103]. It seems that, similarly to excitatory neurons, different proportions of interneuron subtypes populate different cortical areas [89]. GABAergic progenitors and SST+ and CALB2+ interneurons are present early in the cortex [88, 89], and one study suggested that some SST-expressing cells could originate from the cortex around GW7, although without entering the cell cycle [88]; however, there is no evidence of interneuron progenitors later in the neocortex, at GW17 [90]. Hence, generation of interneurons from the human neocortical neuroepithelium remains controversial.

Using GW13 to 23 samples, Liu et al. [104] cataloged the expression of lncRNAs in human neocortex development and applied this catalog to detect 1400 lncRNAs in single cells from four neocortex samples (19–23 GW) and in previously published samples (Pollen et al. [93]). They resolved cell-type-specific expression (e.g., higher expression of the lncRNAs MEG3 or DLX6-AS in interneurons compared with excitatory neurons), showed that some lncRNAs that are barely detectable in bulk tissue are enriched only in certain cell types, and showed examples of a lncRNA regulating proliferation in RG.

The availability of scATAC-seq allows the mapping of regulatory elements in a cell-type-specific fashion. This is particularly important since regulatory elements exhibit far more cell-type specificity than genes [105, 106]. Open DNA (i.e., DNA accessible to transcriptional regulation) and the transcriptome can then be intersected across single cells [107] and used to infer gene regulatory networks at the cell level [90]. This information can be integrated with cell-type enrichment of TFs and co-factors, and intersected with published epigenomic datasets [74]. Using disease-variant datasets and scATAC-seq datasets, it is now possible to identify cell-type enrichment not only of neurodevelopmental disorder risk genes, but also of SNPs within enhancers active in particular cell types [90, 108].
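
As an illustration of the last point, the following minimal Python sketch intersects SNP positions with cell-type-specific accessible regions. It is not the method of the cited studies: the peak coordinates, cell-type labels, SNP identifiers, and function names are all hypothetical toy values, and a real analysis would add a statistical enrichment test against a matched background.

```python
from bisect import bisect_right
from collections import defaultdict

def build_index(peaks):
    """Group peaks by chromosome and sort them by start coordinate.

    peaks: iterable of (chrom, start, end, cell_type), half-open [start, end).
    """
    index = defaultdict(list)
    for chrom, start, end, cell_type in peaks:
        index[chrom].append((start, end, cell_type))
    for chrom in index:
        index[chrom].sort()
    return index

def cell_types_for_snp(index, chrom, pos):
    """Return the set of cell types with an accessible peak containing the SNP."""
    intervals = index.get(chrom, [])
    # Every peak whose start is <= pos lies in intervals[:hi].
    hi = bisect_right(intervals, (pos, float("inf"), ""))
    return {ct for start, end, ct in intervals[:hi] if start <= pos < end}

# Hypothetical toy data (not taken from any dataset).
peaks = [
    ("chr1", 100, 200, "oRG"),
    ("chr1", 150, 300, "interneuron"),
    ("chr2", 50, 80, "excitatory neuron"),
]
snps = {"rs_toy1": ("chr1", 160), "rs_toy2": ("chr1", 250), "rs_toy3": ("chr2", 90)}

index = build_index(peaks)
hits = {name: cell_types_for_snp(index, chrom, pos)
        for name, (chrom, pos) in snps.items()}
for name in sorted(hits):
    print(name, sorted(hits[name]))
```

Counting, per cell type, how many disease-associated SNPs fall in its peaks gives the raw numbers on which an enrichment analysis could then be built.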

Finally, it is now possible to link transcriptomic information to functional heterogeneity, as Mayer et al. [67] demonstrated by coupling scRNA-seq with calcium imaging in dissociated cells from mid-gestational (PCW14-22) cortical plate, subplate, and germinal zone.

Altogether, multi-omics and single-cell data from developing brain samples constitute an invaluable resource. In addition to allowing the dynamics of brain development to be reconstructed, they represent a necessary reference to validate and improve results obtained using human in vitro models, such as brain organoids.

Brain organoids, an in vitro model to validate features of human brain development

Although a diversity of in vitro models relying on human cells exists, we will focus on recent developments in the organoid field, presenting the incremental improvements of protocols, the multi-omics characterization and integration with brain data, and the main conclusions from 5 years of single-cell studies on many aspects of organoid biology (Table 1). Finally, we review innovative approaches for the characterization of noncoding elements which, in our opinion, could leverage the power of organoids to answer long-standing questions of human genetics posed in the first part of this review.

Diversity of organoid protocols and applications

Organoid protocols can be separated into three major types (undirected, directed, and patterned), which reflect the extent to which molecular cues are used to guide neural differentiation. Each protocol has benefits and limitations for modeling different aspects of brain development. Undirected protocols rely on the capacity of neural progenitors to self-organize and yield multiple regional fates [16]. In contrast, directed organoids take advantage of morphogen agonists or antagonists to mimic the developmental cues that guide cell fates during neural tube patterning in vivo [15, 109,110,111,112]. These cues encompass canonical signaling pathways such as BMPs, TGFβ, Wnts, or SHH. Although at first many directed protocols focused on obtaining telencephalon and neocortex in particular [17, 110, 111], there now exists a full repertoire of protocols to generate regional organoids, including hippocampus [113], cerebellum [114], midbrain [115], thalamus [116], and others [117]. Finally, spatial patterning of organoids is a recent addition to the field, in which a local molecular signal, for instance SHH, allows long-distance spatial organization inside the organoid, a process mimicking the morphogen gradients fundamental to establishing positional identity during development [118].

Maintenance of healthy organoids over a long period of time is crucial for the emergence of spontaneous neuronal activity and network oscillation patterns [119]. This time requirement and the aspiration to mimic other in vivo aspects have led to inventive new methodologies: moving from static towards dynamic culture systems (e.g., spinning bioreactors [112] or SpinΩ [117]), growing organoids at the air–liquid interface [120], or even incorporating engineered microfilaments [121]. Recently, Qian et al. demonstrated that cultivating thick organoid sections, instead of culturing whole organoids, improves nutrient access, decreases necrosis in the organoid core, and results in an extended formation of most human neocortical layers [122]. In parallel, other groups have focused their efforts on obtaining all neural cell types, including astrocytes [123] or oligodendrocytes [124, 125]. Vascularization, which is important for favoring neuronal maturation and energy exchanges, has been modeled by transplanting organoids into mouse brains [126] or by incorporating external mesodermal sources [127, 128]. Proper angiogenesis and the reproduction of vascular cues will certainly improve both the longevity and the fidelity of neural organoids, since in mouse they strongly influence neurogenesis dynamics in a region-dependent manner [129]. Microglial cells, another mesodermal cell type, are key players in neural development and are difficult to obtain in brain organoids. While it has been reported that an undirected protocol can yield some microglia [130], others have proposed relying on an external source [131,132,133]. Finally, to study interregional migration and connectivity over development, several studies have fused different regionally directed organoids into so-called assembloids [116, 134, 135]. Interestingly, human cells within organoids can form electrophysiologically active connections with mouse cells in xenografts, and nerve tracts have been modeled using co-culture with mouse spinal cord-muscle explants [120, 126]. Altogether, these studies show that the organoid research field is developing new approaches at a rapid pace to explore what can be done using this model system.

In addition to studying normal development, the organoid model has also been successfully used to characterize the neurological impact of many conditions, including Zika virus infection [117], drug exposure [136], syndromic Mendelian gene mutations (e.g., in Rett syndrome [137] or Timothy syndrome [138]), and common idiopathic neuropsychiatric disorders such as ASD [110, 139] and schizophrenia [140, 141]. For this purpose, organoids offer the unprecedented advantage of allowing pathological mechanisms to be modeled and manipulated in a controlled human neural system, so that their impact on a large range of features can be deciphered in a dynamic manner (Fig. 1a, c), which is difficult to achieve in other relevant models such as postmortem human brain.

Multi-omics integration of organoids with the developing human brain

The correspondence between organoids and normal brain development has been difficult to investigate, in part due to scarcity of human brain samples, especially at the early stage of development, as well as genetic heterogeneity. The analysis of gene regulation in organoids has been pioneered by Amiri et al., who performed an integrative analysis of enhancer activity and gene expression in organoids derived from fetal skull fibroblasts of three individuals, and compared it to the isogenic fetal human brains [70]. Gene expression dynamics were assessed by RNA-seq during the transition from stem cell to neuronal progenitor cells and from progenitor to neurons, and reflected cell cycle exit, increase in neuronal differentiation, transcriptional regulation in cortical precursor cells and increase in synaptic transmission, cell adhesion, and axon guidance. Transcriptome comparison between organoids, isogenic brains and the PsychEncode developmental dataset showed that organoids mapped more closely to human cortex before 15 PCW.

The study also included cortical brain tissue from the same subjects (15–17 PCW) as internal reference tissue. In comparison with organoids, brain samples were enriched in more mature neuronal transcripts while depleted in transcripts related to RG and cell division. Noncoding elements (enhancers, promoters, and repressed chromatin) were mapped by ChIP-seq for several histone posttranslational modifications. Comparison of enhancer number and activity between organoids and brains revealed 1.8-fold more enhancers in organoids, as 59% of the enhancers still active in organoids were already inactivated in the mid-fetal brain samples.

Overall, this study identified over 300,000 putative enhancers active in organoids and fetal brains during development [70]. Proximity and chromatin conformation analyses were used to link these putative enhancers with their target genes. About 30% (96,375) of the enhancers, among which 10% were novel, could be associated with protein coding genes. Among the ~96,000 gene-linked enhancers, 35% were shared with the isogenic human cortex. Based on correlations between enhancer activity and the expression level of their associated genes, enhancers were cataloged into potentially activating (A-reg) or repressing (R-reg) regulators of gene expression. This classification was reflected in A-reg and R-reg being significantly enriched in genes respectively upregulated or downregulated over time.
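
The correlation-based classification described above can be sketched as follows. This is only an illustrative Python example: the choice of Pearson correlation, the cutoff of 0.8, the function names, and the toy values are assumptions made here and are not taken from the study, which should be consulted for the actual statistic and thresholds.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def classify_enhancer(activity, expression, threshold=0.8):
    """Label an enhancer-gene pair from the sign and strength of their correlation.

    threshold is a hypothetical cutoff chosen for illustration only.
    """
    r = pearson(activity, expression)
    if r >= threshold:
        return "A-reg"   # activity rises and falls with expression
    if r <= -threshold:
        return "R-reg"   # activity moves opposite to expression
    return "unclassified"

# Toy profiles across three differentiation time points (invented values).
gene_expression = [10.0, 19.0, 33.0]
enhancers = {
    "enh_up": [1.0, 2.0, 3.0],
    "enh_down": [3.0, 2.0, 1.0],
    "enh_flat": [1.0, 3.0, 1.0],
}
labels = {name: classify_enhancer(act, gene_expression)
          for name, act in enhancers.items()}
print(labels)
```

With only a few time points, such correlations are noisy, which is one reason a real analysis would pool samples across individuals and correct for multiple testing.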

Gene expression and enhancer activities were then analyzed by weighted gene co-expression network analysis (WGCNA), yielding 54 co-expressed gene modules and 29 co-active enhancer modules with specific activity profiles and biological annotations consistent with the organoid’s developmental trajectory [70]. Over 24% of the SFARI ASD-associated genes were differentially expressed in the organoids over time, and 80% were associated with enhancers active in both organoids and fetal brain. Genes associated with ASD by postmortem transcriptome analyses were significantly overrepresented in three gene modules related to synapse development and the regulation of cell proliferation. Similar enrichment was observed in the corresponding gene-associated enhancer modules, most of them showing an upregulated activity across development.
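
Over-representation of a disease gene set in a module is commonly assessed with a one-sided hypergeometric test. The text above does not state which test was used in the study, so the Python sketch below should be read as one standard option under that assumption; the gene counts are invented toy numbers.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric.

    N: number of genes in the background
    K: number of disease-associated genes in the background
    n: number of genes in the module
    k: observed number of disease-associated genes in the module
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy example: 20 background genes, 5 disease genes, a module of 4 genes
# containing 3 disease genes.
p_value = hypergeom_sf(k=3, N=20, K=5, n=4)
print(round(p_value, 4))
```

When many modules are tested, as with the 54 gene modules and 29 enhancer modules mentioned above, the resulting p-values would need correction for multiple comparisons.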

Organoids have also been shown to be a promising system to study the genetic mechanisms driving human brain evolution. Over 60% of the human-gained enhancers [142], the set of enhancers with increased activity in early human brain development compared with rhesus macaque and mouse brains, were active in organoids, particularly at the earliest stages, and showed decreasing activity along differentiation. This evidence suggests that organoids can capture dynamic gene regulatory events that are potentially involved in neurodevelopmental disorders.

The findings by Amiri et al. provide the most comprehensive integrative analysis of gene-enhancer interactions in human brain organoids so far. Enhancers were identified by a combination of peak calling (ChIP-seq) and chromatin segmentation, and interacting gene-enhancer pairs were then identified by cross-reference with Hi-C data from human fetal brain. One of the most notable results was the definition of a convergent gene and enhancer network defining global patterns of expression and activity along trajectories of neural cell differentiation and maturation. These enhancers harbored mutations found in ASD probands from external datasets, suggesting that organoids may provide a system to better understand the functional impact of disease-associated risk variants located in noncoding regions of the human genome and their potential to disrupt certain TF binding sites [70].

Exploiting scRNA-seq to study organoid cellular composition and gene networks

Brain organoids are a multi-cellular 3D model by definition. Therefore, the characterization of organoid models depends on identification of cell types and structures obtained, often by immunohistochemistry using antibodies against known markers of neural development and regional identities. The scRNA-seq technology was used early on to unbiasedly identify cellular diversity and lineage in organoid models [85] and has now become a standard for the field. This has led to a recent increase in the organoid single-cell transcriptomic data available. In the 19 studies that we have reviewed and listed in Table 1, the field has accumulated transcriptomic information over close to 800,000 cells, encompassing multiple relevant stages and types of human organoids.

Overall, organoid models seem to reproduce the mechanisms and temporal dynamics of neural system development, which can be observed both in cellular composition over time and in gene expression networks (Table 1). Notably, the systematic presence of cell clusters with expression signatures typical of RG (marked by PAX6, NESTIN, SOX2, etc.), intermediate precursor cells (marked by EOMES/TBR2), and different subtypes of excitatory neurons (marked by TBR1, BCL11B, SATB2, CUX1, etc.) confirms the presence of cortical neurogenesis in most models. Interestingly, the ability to clearly distinguish between multiple subtypes of cortical neurons, including Cajal–Retzius cells/layer I neurons and lower/upper layer projection neurons, seems to be dependent on a sufficient period of culture [122]. As in vivo, astrocytes and oligodendrocytes require an extensive period of development to be generated (at least 8 months for “mature astrocytes”) [123,124,125, 135]. Surprisingly, given the subpallial origin of interneurons, a GABAergic interneuron lineage is also observed in cortically driven organoids [110, 119, 135, 143, 144], although such cells become more abundant when a ganglionic eminence fate is specified. Finally, other lineages seem to emerge in many models, such as choroid plexus cells and ependymocytes of the VZ, although there is less consensus on their annotation (Table 1). Since many studies specifically use cortical organoids, a consensus on annotation strategies and reference markers could be established in future work.

Despite this overall fidelity in reproducing brain cellular lineages and gene networks, Bhaduri et al. [87] recently exposed a divergence in the specification and maturation of different cell subtypes between organoids and brain samples (GW6-22). They link decreased neuronal maturation with an aberrant activation of oxidative and stress pathways in vitro, including glycolysis and endoplasmic reticulum stress, which could suggest improvements of organoid culture conditions (e.g., modifying glucose concentration or oxygen level).

Variability remains a major limitation of organoid models. Undirected brain organoids seem to present a higher intra- and inter-organoid regional variability, perhaps due to stochastic organization, with not only forebrain identity, but also regions such as retina, spinal cord, and others [16, 145]. The study of cortical development in those models often requires micro-dissection of cortex-like areas that can be visually distinguished from the rest of the organoid, although extracted regions can end up having non-cortical identity [85]. Dorsal forebrain patterned organoids seem to present less organoid-to-organoid and line-to-line variability [143, 144], although some variation, for instance in cortical areal identity pattern, has been reported [87]. The assessment of variability in organoid cell composition, both between lines and between batches, is an important step relevant to statistical modeling and differential gene expression analysis. Notably, many published scRNA-seq studies are based on only 1 or 2 biologically distinct ESC or iPSC lines (14 out of 19 datasets in Table 1), and intra-line variability is rarely estimated.
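
One simple way to quantify such compositional variability is to compute, for each annotated cell type, the coefficient of variation of its proportion across organoids. The Python sketch below illustrates this under stated assumptions: the sample names, cell-type labels, and counts are invented, and the coefficient of variation is a descriptive choice made here, not a method prescribed by the cited studies.

```python
from collections import Counter
from statistics import mean, stdev

def proportions(labels, cell_types):
    """Fraction of cells of each type in one sample."""
    counts = Counter(labels)
    total = len(labels)
    return {ct: counts.get(ct, 0) / total for ct in cell_types}

def composition_cv(samples):
    """Coefficient of variation (sd / mean) of each cell type's proportion.

    samples: dict mapping sample id -> list of per-cell type labels
             (at least two samples are required by stdev).
    """
    cell_types = sorted({ct for labels in samples.values() for ct in labels})
    props = {s: proportions(labels, cell_types) for s, labels in samples.items()}
    cv = {}
    for ct in cell_types:
        values = [props[s][ct] for s in samples]
        cv[ct] = stdev(values) / mean(values)
    return cv

# Invented toy annotations for three organoids of 100 cells each.
samples = {
    "organoid_1": ["RG"] * 40 + ["neuron"] * 60,
    "organoid_2": ["RG"] * 50 + ["neuron"] * 50,
    "organoid_3": ["RG"] * 60 + ["neuron"] * 30 + ["interneuron"] * 10,
}
cv = composition_cv(samples)
for cell_type, value in cv.items():
    print(cell_type, round(value, 3))
```

Grouping the same calculation by cell line or by batch would separate line-to-line from batch-to-batch variability, which requires more than the 1 or 2 lines used in most published datasets.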

Integration efforts, especially across the diversity of existing protocols, should generate meaningful transversal conclusions. One limitation of the integration is the batch correction step when dealing with diverse datasets in terms of isolation technology, library preparation, and read depth level per cell, with data origin often driving the clustering [87]. Using canonical correlation analysis, Tanaka et al. recently integrated eight datasets from different studies containing both directed and undirected protocols and found similar cellular composition and gene expression per lineage [146]. With recent improvements in batch correction and integration methods [65, 147, 148], such transversal analysis should become more accessible, and lead to establishing a common single-cell transcriptomics organoid atlas defined through common marker genes and reference datasets. Such integration will lead to a clear transcriptomic definition of the cellular space of in vitro brain organoids, leading to a protocol-independent definition of artefactual cell types and structuring the diverse lineage trajectories. Such integration is vitally important to determine inter- and intra-protocol variability in cell fate.

Finally, evolution has received particular interest in single-cell organoid studies, owing to the capacity to generate iPSCs and brain organoids from multiple primate species, including chimpanzee, orangutan, and rhesus macaque [149,150,151,152]. It was demonstrated that, compared with those of other apes, human organoids present a delayed maturation, with less mature neuronal signatures and a lower presence of astrocytes at equivalent stages, which agrees with the longer gestational period in humans [152]. Interestingly, transcriptomic divergence in the telencephalic lineage seems to consist mainly of gains of gene expression in humans, with related functions spanning proliferation of RG, neuron migration, and neurite formation [152]. Using scATAC-seq to complement scRNA-seq, Kanton et al. also showed that differentially accessible peaks between human and chimpanzee have a cell-type-specific pattern and are enriched in single-nucleotide evolutionary changes. Consistently, evolution seems to affect gene networks differently in different neural lineages, with major changes occurring in astrocytes [152].

Functional validation of noncoding elements in vitro

Identifying the physical location of putative gene regulatory elements does not represent definitive proof of their functional activity in regulating gene expression. This is true even if the degree of open chromatin, quantified by ATAC-seq or ChIP-seq signals, correlates well with gene expression. There is, therefore, a need to combine the biochemical annotation-based techniques, aimed at assessing both the accessibility of chromatin (DNaseI-seq and ATAC-seq) and its interactions (Hi-C and variant technologies such as capture Hi-C [153], PLAC-seq [154], and HiChIP [155]), with orthogonal assays aimed at demonstrating the activity of regulatory elements and their effect on target genes.

In this regard, massively parallel reporter assays (MPRA) [156, 157] (Box 1) allow testing thousands of regulatory elements in a single experiment. A synthesized library of candidate sequences is cloned into a vector, wherein each candidate element is upstream of a basal promoter and a reporter gene. Each reporter gene is associated with a unique barcode providing a quantitative readout of the cognate candidate enhancer activity. Because it permits the artificial introduction of SNPs [158], eQTLs [159], or variants potentially disrupting TF binding sites [160], MPRA is a powerful system for studying the effect of trait- or disease-associated genomic variation on the functionality of regulatory regions. A shortcoming of this technique is the limited size of the tested fragments, which might not represent the entire enhancer region. Furthermore, neither episomal nor lentiviral (lentiMPRA [161, 162]) reporter vectors reflect enhancer function in its endogenous genomic context. Finally, MPRA is a good tool to measure enhancer activity but fails to identify the endogenous target gene(s).
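
The barcode-based readout can be illustrated with a minimal Python sketch in which the activity of each candidate element is summarized as the median log2 ratio of normalized RNA to DNA barcode counts. The pseudocount, the counts-per-million normalization, the use of the median, and all names and numbers are assumptions made for illustration; published MPRA pipelines differ in these details.

```python
from collections import defaultdict
from math import log2
from statistics import median

def element_activity(barcode_to_element, rna_counts, dna_counts, pseudocount=1.0):
    """Median log2(RNA/DNA) across the barcodes tagging each candidate element.

    rna_counts: barcode reads from reporter transcripts
    dna_counts: barcode reads from the input (plasmid or integrated) library
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    per_element = defaultdict(list)
    for barcode, element in barcode_to_element.items():
        dna = dna_counts.get(barcode, 0)
        if dna == 0:
            continue  # barcode absent from the input library: no basis for a ratio
        rna = rna_counts.get(barcode, 0)
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        per_element[element].append(log2(rna_cpm / dna_cpm))
    return {element: median(values) for element, values in per_element.items()}

# Invented toy library: two barcodes per allele of one candidate enhancer.
barcode_to_element = {
    "b1": "enhancer_ref", "b2": "enhancer_ref",
    "b3": "enhancer_alt", "b4": "enhancer_alt",
}
dna_counts = {"b1": 100, "b2": 100, "b3": 100, "b4": 100}
rna_counts = {"b1": 200, "b2": 200, "b3": 50, "b4": 50}

activity = element_activity(barcode_to_element, rna_counts, dna_counts)
effect = activity["enhancer_alt"] - activity["enhancer_ref"]
print(activity, effect)
```

Here the difference between the two alleles (`effect`) is the kind of quantity used to ask whether a variant alters regulatory activity; a real analysis would use many more barcodes per element and replicate experiments to estimate its uncertainty.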

An alternative method for enhancer validation is STARR-seq (self-transcribing active regulatory region sequencing [163]) (Box 1). Unlike in MPRA, the candidate regulatory sequences are placed downstream of the minimal promoter and reporter gene and are therefore transcribed as “enhancer RNA”, such that each enhancer sequence works as its own barcode. Like MPRA, STARR-seq makes it possible to investigate the functional activity of regions of interest selected by other predictive analyses (CapSTARR-seq [164], ChIP STARR-seq [165]) or to test the functional effect of disease-associated SNPs, but it removes any epigenetic contextualization.

CRISPR/Cas9 technology [166] (Box 1) is complementary to MPRA and STARR-seq, as it makes it possible to perturb the sequences of interest in their native context and can also reveal the cognate gene(s) regulated by an enhancer. Many studies use active Cas9 and sgRNA libraries to “destroy” specific noncoding regions (up to hundreds of kilobases) in order to affect the expression of target genes [167,168,169,170,171]. An alternative approach alters the epigenomic landscape rather than the genomic sequence of the target region (CRISPR-epigenome editing). A catalytically inactive Cas9 (dCas9), fused with functional repressor or activator domains, triggers, respectively, repressive (CRISPR interference, or CRISPRi) or activating (CRISPR activation, or CRISPRa) chromatin modifications [172, 173]. Alongside CRISPR-mediated strategies based on detectable features (expression of a genetic reporter [171], drug resistance [169], or growth assays [174, 175]), recent works have combined CRISPRi-based enhancer perturbation with scRNA-seq in order to evaluate genome-wide changes in the transcriptome at single-cell resolution [176, 177]. However, CRISPR-based screens are not exempt from technical issues, such as false positives and false negatives, or inefficiency of the Cas9-fused repressors/activators at certain enhancers [178].

The still unexplored combination of the above-mentioned methods (MPRA, STARR-seq, and CRISPR) with the powerful system of the cerebral organoids can open new avenues for a deeper understanding of the regulatory network involved in brain development and disorders.

Future research directions

There is an impressive amount of data available for the developmental neuroscience field, both from postmortem brain samples and in vitro models, which are largely derived from genome-scale sequencing efforts at both DNA and RNA levels. As demonstrated by some of the most recent studies presented here, we argue that the future lies in integration of these different levels of analyses. Impactful results can come from the intersections between transcriptomic information, epigenomic context, and genomic variation, which could be made even more compelling by integration of imaging and electrophysiological results. Targets relevant to the clinic can be obtained by intersecting those results with GWAS, WGS, exome and other databases to begin to understand disease pathophysiology during development of the brain. We tentatively summarize this multimodal approach in Fig. 1.

Like all new technologies, brain organoids come with obstacles and challenges. Although organoids resemble the embryonic to early fetal human brain at the molecular level and can generate most of the neural lineages found in humans, their neurons are still less mature than adult neurons. Moreover, this system lacks the capacity to fully recapitulate features of human brain development such as gyrification, fully distinct cortical neuronal layers, gliogenesis, and complex neuronal circuitry formation. Questions that will undoubtedly be explored in future years are how to promote vascularization and proper morphogenetic patterning in order to achieve better fidelity to in vivo development and to recapitulate later stages of fetal development. Meanwhile, human iPSC-derived brain organoids have opened new ways to analyze brain development for a specific individual and in a longitudinal fashion. They promise to advance functional genomics, including the definition of enhancer–gene relationships and other regulatory mechanisms that govern brain development (Fig. 1b).

Multimodal studies, incorporating genome-scale analyses with other biological features of both brain tissue and organoids, have become more complex to interpret. Developmental neurobiology is undergoing a progressive transformation from traditional one gene-one function studies to integrative “big data” studies, and the shift towards single-cell analysis has exponentially increased the amount of information available. We predict that transversal analyses and meta-analyses will reveal more than originally meets the eye, and that computational biology and machine-learning techniques will allow a deeper understanding of brain development.

Funding and disclosure

The authors declare no competing financial interests related to this article. FMV, DC, and SS are members of the PsychEncode consortium. This work was funded by NIH grants #R01 MH109648, #R56 MH114911, #U01 MH103365, and Simons Foundation grant #632742.