Periodic formation of epithelial somites from human pluripotent stem cells

During embryonic development, epithelial cell blocks called somites are periodically formed according to the segmentation clock, becoming the foundation for the segmental pattern of the vertebral column. The process of somitogenesis has recently been recapitulated with murine and human pluripotent stem cells. However, an in vitro model for human somitogenesis coupled with the segmentation clock and epithelialization is still missing. Here, we report the generation of human somitoids, organoids that periodically form pairs of epithelial somite-like structures. Somitoids display clear oscillations of the segmentation clock that coincide with the segmentation of the presomitic mesoderm. The resulting somites show anterior-posterior and apical-basal polarities. Matrigel is essential for epithelialization but dispensable for the differentiation into somite cells. The size of somites is rather constant, irrespective of the initial cell number. The amount of WNT signaling instructs the proportion of mesodermal lineages in somitoids. Somitoids provide a novel platform to study human somitogenesis.

S omites are transient blocks of cells that give rise to a variety of tissues, including the vertebrae, rib cage, skeletal muscle, and part of the skin, in vertebrate embryos [1][2][3] . Bilateral pairs of somites periodically bud off from the presomitic mesoderm (PSM) along the anterior-posterior axis: as the mesenchymal cells migrate from the posterior PSM region near the tailbud to the anterior PSM region, they undergo a mesenchymal-epithelial transition (MET), acquiring the apical-basal polarity and elongated shapes. The anterior PSM cells eventually form spherical epithelial somites surrounding a core of mesenchymal cells. The somite formation, or somitogenesis, starts around day 20 after fertilization (Carnegie stage 9) in human embryos, and a total of 40 pairs of somites are formed 4 . The timing of sequential somitogenesis is controlled by the segmentation clock, a molecular oscillator that peaks its activity every 5-6 hours in humans 2,[5][6][7][8][9] .
Several aspects of somitogenesis have recently been recapitulated in vitro with pluripotent stem cells. PSM-like flat tissues made from mouse embryonic stem cells (ESCs) display the segmentation clock accompanied by tissue boundary formation 10 . Gastruloids are ESC-derived embryonic organoids that mimic early developmental events, including three-germ layer differentiation and axis patterning [11][12][13] , and mouse gastruloids embedded in an extracellular matrix (ECM) surrogate, Matrigel, form a string of single somites 14 . Mouse ESC-derived trunk-like structures (TLSs) form both the neural tube and bilateral somites 15 . Regarding human somitogenesis, several groups have induced PSM and somite cell fates from human pluripotent stem cells in 2D cultures and recapitulated the segmentation clock [6][7][8][9][16][17][18] . However, somites or epithelial structures are not formed in those human stem cell-derived models. Although 3D human 'somitoids' have recently been reported, the somite-like structures are not sequentially formed 19 . Thus, there is no in vitro model of human somitogenesis so far that recapitulates both the periodic somite formation coupled with the segmentation clock and the maturation into epithelial structures.
In this study, we report human embryonic organoids that periodically form pairs of somite-like structures with the mature epithelial organization along the anterior-posterior axis.

Results
Human iPSCs self-organize into somite-like structures. To make human organoids that form somite-like structures (hereafter referred to as somitoids), we first made aggregates of human induced pluripotent stem cells (iPSCs) in a low attachment U-bottom plate (Fig. 1a). The aggregates were treated for 2 days with a cocktail of signaling molecules and inhibitors that has been used previously to induce the human PSM cell fate in 2D culture conditions 7,8,17 . The cocktail comprises CHIR99021 (a WNT signaling activator through the inhibition of GSK3β), bFGF, SB431542 (a TGFβ signaling inhibitor), and DMH1 (a BMP signaling inhibitor), mimicking WNT and FGF activation as well as TGFβ and BMP inhibition seen in the presumptive PSM region of mouse and chick embryos 17 . After 2-day treatment, the cocktail was gradually diluted by medium changes. The aggregates became oval-shaped around days 3-4 and then further elongated, reminiscent of gastruloids and TLSs [11][12][13][14][15] (Fig. 1b; Supplementary  Fig. 1a). Matrigel was added at a 10% concentration on day 4, and the somite-like ball structures first appeared around days 4-5 ( Supplementary Fig. 1a). Approximately 10 (pairs of) somites were found per somitoid on day 7 (Fig. 1c-e).
The average length (along the anterior-posterior axis) and width of newly formed single somites were 116 µm and 157 µm, respectively, while those of paired somites, measured as individual somites, were 110 µm and 104 µm, respectively ( Supplementary  Fig. 2a). Namely, the somites formed as pairs were smaller than the single somites. Even though precisely measuring the size of somites of human embryos is challenging due to the limited resources, the somites of somitoids were in a comparable size range of in vivo human somites or in vitro mouse somite-like structures 15,20 (Supplementary Fig. 2). The somite width and the somite-to-somite distance showed a gradual increase from the posterior (newly formed) somites to the anterior (older) somites (Fig. 1f), as reported in chicken embryos and mouse gastruloids 14,21 . The older somites also tended to be more spherical (Supplementary Fig. 3; see Fig. 1c, for example), possibly due to dominant tissue surface tension in the absence of constraints from neighboring tissues 15,22,23 .
To further examine the spatial gene expression patterns, we visualized the marker genes with immunostaining and in situ hybridization chain reaction (HCR) 29 . The co-expression of BRACHYURY (a mesodermal marker) and SOX2 (a neural marker) in somitoids indicates the existence of NMPs, bipotent progenitor cells that give rise to both mesodermal and neural lineages [24][25][26] (Supplementary Fig. 5). The NMP region at the posterior end of somitoids was surrounded by PSM marker (TBX6 and HES7)-positive cells, and the PSM region extended slightly more anteriorly (Supplementary Fig. 5; Fig. 1g). By contrast, a somite marker UNCX4.1 was expressed in somite-like structures in the anterior regions of somitoids (Fig. 1g). The boundary between HES7 and UNCX4.1 corresponds to the newest somite formed from PSM. TBX18, a marker for the rostral halves of somites 1 , showed stripe patterns in somites as reported in the somite-like structures of mouse gastruloids 14 . The alternating patterns of UNCX4.1 (a marker for the caudal halves of somites) and TBX18 indicate the rostral-caudal patterning in individual somites 1 (Fig. 1g, 30 . These results demonstrate that the NMPs, PSM, and somites are properly located along the posterior-anterior axis of somitoids and that each somite has rostral and caudal compartments. To examine the maturity of somites, we stained markers for epithelialization 31 . A tight junction marker ZO-1 showed clear localization to the inner surface of somites (Fig. 1h, i), demonstrating the establishment of the apical-basal polarity in somite cells and the formation of the apical lumen in somites. The F-actin staining with Phalloidin also showed a stronger signal on the apical surface of somites (Fig. 1h, i) and demarcated the elongated shapes of epithelial somite cells (Fig. 1j, k, Area 3). The nuclei aligned along the basal surface of mature somites (Fig. 1j, Area 3). By contrast, the F-actin and nuclear stainings showed more random patterns in the PSM/NMP regions, and those immature mesenchymal cells had rounded morphologies (Fig. 1j, k, Area 1). These results demonstrate that a MET happens while PSM cells differentiate into somite cells along the posterior-anterior axis of somitoids and that the formed epithelial somites are mature enough to establish the apical-basal polarity. Between paired epithelial somites, we sometimes observed a line of cells ( Supplementary Fig. 7a). However, those cells did not display any continuous structure that resembles a neural tube or notochord. Some cells at the center were BRACHYURY and SOX2 double-positive (Supplementary Fig. 7b; Supplementary Movie 3), suggesting that they are NMPs. Thus, we concluded that somitoids display transcriptional and morphological hallmarks of somitogenesis but not those of neural tube or notochord formation and that the formed somites possess organized epithelial structures (Fig. 1l). This conclusion supports the idea that human somitogenesis is largely a tissue-autonomous process that does not require adjacent tissues, including the structured neural tube and notochord, consistent with previous results of mouse explant cultures and gastruloids 14,32 .  Supplementary Fig. 9b). In living embryos, the ventral and dorsal portions of somites further become the sclerotome and dermomyotome, respectively 1,3 . A couple of sclerotome markers, including TWIST1 and PAX9, and dermomyotome markers, including DMRT2 and PAX3, were expressed in the late somite cell population of somitoids 15 (Fig. 2c, d; Supplementary Fig. 9e). The sclerotome and dermomyotome markers tended to show mutually exclusive patterns (Fig. 2d). These results suggest that somites in somitoids may have started differentiating into sclerotome and dermomyotome cells, even though somitoids do not show morphological features of sclerotomes and dermomyotomes.
We further characterized the developmental stage and cell proliferation patterns of somitoids by using expression patterns of HOX gene clusters and PCNA, respectively. The PSM, somite, and late somite cells in somitoids showed the continuous expression of HOX genes until HOX9-10, suggesting that somitoids mostly recapitulate thoracic and lumbar somitogenesis ( Supplementary  Fig. 9d). The cell proliferation marker PCNA was expressed in most regions of somitoids, but the expression was weaker in late somite cells (Supplementary Fig. 9c). Consistent with this, EdU incorporation into somitoids demonstrated that cells still proliferate in the regions of NMP, PSM, and the first 1-2 somites, while cell proliferation is not active in older somites ( Supplementary Fig. 10).
Somite formation is coupled with the segmentation clock. Multiple somite-like structures can be induced even in the absence of the segmentation clock when signaling pathways are modulated 33,34 . To test whether the somitogenesis in somitoids is regulated by the segmentation clock, we monitored the expression patterns of HES7, a core gene of the molecular oscillator 27,35 . The HES7 promoter-luciferase reporter 7,8 exhibited clear collective oscillations immediately after the medium change on day 2 ( Supplementary Fig. 11a). Postponing the medium change postponed the onset of collective oscillations, suggesting that the medium change entrains the oscillation phases of individual cells ( Supplementary Fig. 11a). The HES7 reporter also showed traveling waves from the posterior PSM to the anterior PSM of somitoids (Fig. 3a, b; Supplementary Movie 4). The HES7 waves were easier to be detected on day 4 when somitogenesis was about to start (Fig. 3a, c): the HES7 oscillations in the posterior PSM preceded those in the anterior PSM, demonstrating traveling waves (Fig. 3c, d; Supplementary Fig. 11b). The oscillations and waves were disrupted by treatment with a NOTCH signaling inhibitor DAPT ( Supplementary Fig. 12), consistent with the in vivo segmentation clock 2,3,36 .
On day 6, when somitogenesis was actively happening, the waves became less visible, but the oscillations were still clear ( Fig. 3b, e). Importantly, the timings of HES7 oscillations coincided with those of somite formation: one somite or one pair of somites was produced during one oscillation cycle (Fig. 3e). The period of HES7 oscillations was~5 hours (Fig. 3f), a typical period of the human segmentation clock 2,5-9 . The period calculated from the somite formation timings was also~5 hours (Fig. 3f). These results indicate that the somite-like structures in somitoids are periodically formed according to the human segmentation clock.
Matrigel is crucial for somite formation. We next explored the conditions to generate successful somitoids. Matrigel is a mixture of ECM components, and Matrigel embedding has been demonstrated to be crucial for somite formation in mouse stem cell-derived models 14,15 . The first somites were formed in somi-toids~10 hours after Matrigel addition on day 4 ( Supplementary  Fig. 13a, b). When Matrigel was added 4 hours later than the usual timing, the first somites were formed~4 hours later than usual, suggesting that Matrigel addition is the trigger of the first somite formation ( Supplementary Fig. 13a, b). In both conditions, the first somites were formed at the~20% position of somitoids along their midline ( Supplementary Fig. 13c, d).
In the absence of Matrigel, our human somitoid protocol (  (Fig. 4b). However, the cell clusters with apical-basal polarities were small and sporadic, and they did not form sequential epithelial organizations with a consistent apicalbasal polarity (Fig. 4b, c).
Interestingly, somitoids without Matrigel still displayed a proper anterior-posterior axis, expressing a somite marker UNCX4.1 on the opposite side to a PSM marker HES7 (Fig. 4d). Another somite marker TBX18 showed a localized expression in the neck region of the hourglass. The expression levels of a MET marker N-CADHERIN 37 checked with qRT-PCR analysis were also comparable between the absence and presence of Matrigel ( Supplementary Fig. 14b). Our scRNA-seq analyses further confirmed that all the major cell types identified in the control somitoids (10% Matrigel), putative NMP, PSM, somite, and late somite cells, existed in the somitoids without Matrigel (Supplementary Fig. 15). Even some sclerotome and dermomyotome markers were similarly expressed in the absence of Matrigel ( Supplementary Fig. 15a, e). These results suggest that Matrigel is dispensable for the axis elongation and somite cell differentiation but necessary for the epithelialization and the establishment of a consistent apical-basal polarity. In embryos, the PSM and somites are surrounded by the surface ectoderm, and the ECM produced from the surface ectoderm, particularly fibronectin, is essential for somite formation 15,32,38,39 . Matrigel may act as a surrogate for the surface ectoderm that signals the apical-basal polarity to somiteforming cells. Even though fibronectin is not a major component of Matrigel 40  The somites have a preferred size. As another crucial condition in many types of organoids is the initial cell number 11

DMRT2 TWIST1
TWIST / DMRT2  Supplementary Fig. 17). Surprisingly, however, the size of somites did not increase accordingly, but it was rather constant (Fig. 4e, f; see Somite morphometry). This suggests that the somites have a preferred size, which might be determined by local cell-cell interactions 33 , the segmentation clock, or other mechanisms. Note that, however, the somitoids with large initial cell numbers tended to form multiple axes and sometimes failed to form somites (Fig. 4g), indicating an optimal initial number of 350-500 cells.
WNT signaling instructs lineage specification. The amount of WNT signaling is known to instruct NMPs to become either the neural tube or PSM/somites: 15,25,26,45 WNT activation promotes mesodermal lineage differentiation at the expense of neural lineages. Thus, we modulated the dosage of the WNT signaling activator CHIR during the initial 2 days of the somitoid protocol ( Fig. 5a). High doses of CHIR (8-10 µM) of the original protocol resulted in usual somitoids with strings of somites ( Fig. 5b; Supplementary Fig. 18a). Lowering the dosage decreased the possibility of somite formation (Fig. 5b-d; Supplementary  Fig. 18). With 5 µM CHIR, most somite-like structures either disappeared or became sporadic. Instead, an elongated, large epithelial structure was observed ( Fig. 5b; Supplementary  Fig. 18a). The elongated epithelial structure was SOX2-positive (Fig. 5e, f), suggesting that it may be closer to the neural tube. With intermediate CHIR doses (6-7 µM), both the elongated epithelial structure and somites were formed (Fig. 5b, e; Supplementary Fig. 18a). The elongated epithelial structure was occasionally flanked by bilateral somites (Fig. 5e, 6-7 µM), reminiscent of the neural tube and bilateral somites formed in mouse TLSs 15 . The elongated epithelial structure was indeed positive for neural tube markers (SOX2, SOX1, and PAX6) 15 , whereas the somite-like structures were negative ( Supplementary  Fig. 19). Note that, however, long strings of somites were rarely observed with lower CHIR doses, and sporadic or short rows of somites were often formed (Fig. 5c, d; Supplementary Fig. 18b, c, 5-7 µM). The WNT sensitivity of somitoids was so high that even a 1 µM increase in the CHIR dosage dramatically changed the outcome (Supplementary Fig. 18a). These results suggest that somite formation in somitoids needs a high amount of WNT signaling to make mesodermal lineages, namely the PSM and somites, at the expense of neural lineages (Fig. 5f).
To confirm the effect of WNT signaling on the cell lineage specification in somitoids, we barcoded the somitoids that were created with 5 µM, 7 µM, and 10 µM CHIR and performed MULTI-seq analyses 46  . As expected, the populations of putative NMP, PSM, somite, and late somite cells were identified as the major cell types in the somitoids with the high CHIR dose (Supplementary Fig. 20b, c, 10 µM). With the low CHIR dose, by contrast, neural tube markers SOX1 and PAX6 were expressed in the somitoids ( Supplementary Fig. 21, 5 µM), and the putative neural cells and NMPs were the major cell types in addition to the minor somite cell population ( Supplementary  Fig. 20b, c, 5 µM). RNA velocity demonstrated the differentiation of the neural cells from NMPs ( Supplementary Fig. 20e). The neural cell population was not detected in the somitoids with the high CHIR dose (Supplementary Fig. 20b, c, 10 µM). All these cell types, including both neural and mesodermal lineages, were detected in somitoids with the intermediate CHIR dose (Supplementary Fig. 20b, c, 7 µM). Gene ontology (GO) terms related to neural and mesodermal differentiation were enriched in the genes upregulated by 5 µM and 10 µM CHIR, respectively (Supplementary Fig. 20d; Supplementary Data 1). Thus, the MULTI-seq comparison clearly depicted the bias towards neural or mesodermal lineages depending on the low or high amount of WNT signaling, respectively, during the initial 2 days of the somitoid protocol ( Supplementary Figs. 21-22). The day 3 somitoids created with 10 µM CHIR already expressed a PSM marker TBX6 while those with 5 µM CHIR did not (Supplementary Fig. 23), suggesting that the bias in the lineage specification in somitoids already starts on day 3 when somites are yet to be formed.

Discussion
We created human somitoids that recapitulate periodic somite formation coupled with the oscillations of the segmentation clock. The somitoids elongated, and the anterior PSM cells rhythmically budded off to make pairs of somites. The resulting somite tissues were mature enough to display the caudal-rostral compartments and apical-basal polarity.
Although the somite cells in somitoids expressed a couple of sclerotome and dermomyotome markers, we did not notice morphological changes corresponding to the ventral sclerotomes and dorsal dermomyotomes of in vivo embryos. We reason that for further somite maturation, inductive signals coming from the neighboring tissues, such as the neural tube, notochord, and lateral plate mesoderm, should be essential 1 . Treating somitoids with additional growth factors and inhibitors may be a way to make more mature sclerotomes and dermomyotomes. As our somitoids displayed the neural tube-like structure flanked by somites with lower doses of the WNT activator, making several neighboring tissues altogether with somites may be an alternative way.
Like many other epithelial organoids, somitoids needed Matrigel to form somites. Matrigel is currently a 'magic powder' in the organoid field, and its precise role remains unclear. Since PSM cells differentiated into the somite fate even in the absence of Matrigel, the ECM surrogate may play a role mainly in cell epithelialization and apical-basal polarization. In addition, the fact that epithelialization itself happened with a wide range of Matrigel concentrations (5-50%) suggests that the accurate stiffness of the gel may not be important for epithelialization. Another unexpected finding regarding the somitoid protocol was that the size of somite was relatively constant irrespective of the initial cell number. This was surprising because the somite size has been proposed to scale with the size of the PSM region [47][48][49] . Further investigation of the effect of cell number on the segmentation clock, cell differentiation, cell movement, and tissue mechanics will be necessary to address the question 23,33,39,50,51 .
Even though the somitoid protocol was reproducible, it was sensitive to experimental materials and cell states. For instance, we found that the combination of a particular iPSC line, culture medium, and even U-bottom plate was crucial for somitoids. The morphology of iPSC colonies was also important. The incompatible materials and troubleshooting are summarized in a separate protocol 52 . Human somitoids should provide a novel platform to study congenital abnormalities in vertebral segmentation, including spondylocostal dysostosis (SCD) and congenital scoliosis 5,53 . We previously modeled SCD phenotypes by recapitulating the human segmentation clock from patient-derived iPSCs 7 . Whereas the in vitro segmentation clock revealed congenital defects in oscillation patterns, the new somitoid model will enable us to study defects related to somite formation and epithelialization. As somitogenesis is sensitive to developmental environments, such as hypoxia 54 , somitoids may also be useful for high-throughput assessments of environmental factors and teratogens. Another interesting direction is the inter-species comparison of somitogenesis. We have previously compared the human and mouse segmentation clocks, demonstrating the oscillation period difference between the species 8,55 . The new human somitoid model will enable the comparative study on how the tissue dynamics and morphogenesis of somite formation are regulated across species 56 . were seeded into an iMatrix-coated 3.5 cm dish in the StemFit medium containing 10 µM ROCK inhibitor, Y27632 (Sigma, Y0503). The cells were incubated at 37°C in a humidified atmosphere of 5% CO 2 . The medium was changed the next day into the StemFit medium without Y27632. The cells were passaged every 5-6 days (upon 70-80% confluency). The HES7 promoter-luciferase reporter line was described previously 7 . The fluorescent nuclear reporter (pCAG-mCherry-NLS) was introduced into the cells by using a piggyBac vector 58 and a 4D-Nucleofector (Lonza). After antibiotics selection, a stable clone was isolated. The cells were regularly tested and reported negative for mycoplasma contamination.
Generation of human somitoids. Ethical approval for this project was granted by Departament de Salut de la Generalitat de Catalunya (Carlos III Program). Once human iPSCs reached 50-70% confluency, the cells were washed with PBS(-) twice and incubated with 2 ml 0.5 mM EDTA in PBS at 37°C for 6-7 min. The cells were mechanically dissociated by pipetting and transferred into 8 ml of the N2B27 medium described below. The cell suspension was centrifuged at 152 × g for 3 min and resuspended into 10 ml of the N2B27 medium. After 1 more wash with the N2B27 medium, the supernatant was completely removed. The cell pellet was resuspended into 150-500 μl of the somitoid induction medium described below with 10 µM Y27632. The cell concentration was adjusted to 7 cells/ml with an adequate amount of the pre-warmed induction medium with Y27632. Then, 50 µl of the cell suspension (=350 cells) was aliquoted into each well of a U-Shaped-Bottom, 96-well-plate (Nunclon Sphera 96U-well plate, Thermo Scientific, 174925) by using a multichannel pipette. The 96-well-plate was centrifuged at 152 × g for 2 min for the cells to settle down on the bottom of the plate. One day after aggregate formation, 150 µl of the fresh somitoid induction medium without Y27632 was added to each well. On days 2 and 3, 150 µl of the medium was carefully removed and replaced with 150 µl of the fresh N2B27 medium without disturbing the aggregate. On day 4, 150 µl of the medium was removed from the well and replaced with 150 µl of the fresh N2B27 medium containing 10% Matrigel (growth factor reduced, Corning, 356231). The medium was not changed after the Matrigel addition. The step-by-step instructions and typical problems can be also found in the Protocol Exchange 52 .
Quantitative RT-PCR. Total RNA was extracted with RNeasy Micro Kit (Qiagen, 74004). Then cDNA was synthesized from 0.5 μg of the total RNA with Quantitect Reverse Transcription kit (Qiagen, 205311). Quantitative PCR was performed with LightCycler 480 SYBR Green I Master (Roche, 4707516001) and the gene-specific primers 9,16,59 (Supplementary Data 2) with LightCycler 480 II (Roche). The expression levels of the target genes were normalized by GAPDH expression.
Analysis of HCR somitoid images was performed using a custom-written napari 60 plugin which is available via GitHub (Fig. 5f). Images were loaded in the plugin, and the binarized image for each available channel was computed using the Otsu thresholding algorithm. Next, the masks were visually inspected to confirm the accurate segmentation and occasionally correct segmentation errors that were caused, for instance, by the low fluorescence signals in some of the images. To compute the overall volume of the somitoid, we made use of the fact that the genes chosen displayed patterns of expression that jointly covered the whole somitoid structure. An artificial channel which was the sum of all the individual fluorescence channels for the same somitoid was created, and it was binarized using the Otsu algorithm. Given the anisotropic resolution of the 3D stack (Z = 2 μm, XY = 0.391 μm), the positive pixel values within each mask were counted, and the pixel counts were converted into volumes.
EdU labeling. EdU staining was performed using the Click-iT EdU Cell Proliferation Kit (Invitrogen, C10337). Somitoids were collected with wide-bore tips into a 3.5 cm dish and incubated with 10 μM EdU in the N2B27 medium for 2 hrs in a cell culture incubator. The samples were collected with wide-bore tips into a 2 ml Eppendorf tube. After 2 washes with PBS(-), they were stained for SOX2, TBX6, and DAPI according to the IHC protocol described above. After the final step of PBST wash, they were incubated with the Click-iT reaction cocktail at room temperature for 30 min, protected from light. The samples were washed 3 times with PBST for 15 min each, and images were taken with an Opera Phenix HSC system.
Live imaging of somitoids. Bright-field images of live somitoids were taken with an Opera Phenix HSC system in the wide-field mode using a 10× air objective (NA 0.3). The focus was manually adjusted. For time-lapse imaging, the incubator module was set at 37°C and 5% CO 2 , and images were taken every 10 min.
Time-lapse imaging and quantification of HES7 oscillation. The HES7 promoter-luciferase reporter line was used 7 . A glass-bottom 3.5 cm dish was coated with 2% w/v lipidure (Amsbio, CM5206) in ethanol twice at room temperature for 5 min each to prevent somitoids from attaching to the dish. After the second coating, the remaining lipidure solution was removed, and the dish was dried at room temperature until the residual ethanol evaporated. To monitor HES7 oscillations under the microscope (Fig. 3; Supplementary Fig. 12), we put day 4 somitoids on the lipidure-coated dish with 2 ml of the fresh N2B27 medium containing 0.5 mM luciferin and 10% Matrigel. Live-luminescence imaging was performed with an LV200 luminescent microscope (Olympus, cellSens Dimension software) using a 10x objective lens (UPLXAPO10X). Images were taken every 5 or 10 min with 100 ms exposure for the bright field and 1.5 min exposure for the bioluminescence signal. To monitor a collective luciferase signal from day 1 to day 4 ( Supplementary Fig. 11a), we collected 8 somitoids and put them on a non-coated 3.5 cm dish with 2 ml of the fresh N2B27 medium containing 0.5 mM luciferin. The luciferase activity was recorded using Kronos Dio Luminometer (Atto, Kronos control software v2.3) every 10 min with 1 min exposure.
For image analyses, the ImageJ software and a Python script were used as previously described 10 . Briefly, kymographs were generated from time-lapse images filtered by Median filter and Tissue aliment filter to keep somitoids at the same position in a frame 61 . Resliced stack images were arranged from top to bottom in a temporal order. The detrended intensity and phase dynamics were calculated by pyBOAT 62 with a 50-frame window. The peak-to-peak period of HES7 oscillation was measured with Python functions detecting peaks and bottoms of HES7 oscillation. The period of somite formation was manually calculated from timelapse bright-field images.
Somite morphometry. DAPI images taken with an FV3000 confocal microscope or an Opera Phenix HSC system were used. Somite morphometric analysis was performed in a semi-automated manner using custom-written Python scripts which is available via GitHub. First, images were loaded and resized to obtain a pixel size of 1 μm and subsequently binarized, choosing an algorithm that could accurately segment the somitoid. The algorithm chosen depends on the pixel intensity histogram, and either the triangle 63 or the yen 64 algorithm worked well for all images. Next, holes and debris smaller than 10000 pixels in the binarized image were removed, and the mask was smoothened using the morphological opening operation with a disk-shaped footprint of 5 pixels radius. Then the smoothened mask was used to compute the euclidean distance transform and find its local maxima, which resulted in a disordered collection of 2D points roughly located along the midline of the somitoid. To order the points and find a smooth curve along the longitudinal axis of the somitoid, we manually selected the posterior position and the anterior position of the somitoid. The ordered points were used to generate a B-spline representation with a smoothening value of 10000, and the width of the somitoid at every point along the spline curve was computed. This approach allowed us to generate a characteristic width profile of each somitoid, in which minima and maxima correspond to the somite-to-somite edges and the maximum somite width, respectively (Fig. 1f, top). Somite morphometric parameters were then computed using the minima-to-minima distance (inter-somite distance) and the maxima of the width profile (width) (Fig. 1f, bottom). Somite area and circularity were calculated by using the somite-to-somite distance and the somite width ( Supplementary Fig. 3). Only somitoids with single somites were used for this measurement. The graphs were plotted with Python's boxplot function with the default setting. The middle line of the boxes indicates the median, and the box edges are the 25th and 75th percentiles. The maximum and minimum values are indicated by whiskers.
Comparison of somite morphometry. Confocal images of somitoids labeled with the fluorescent nuclear reporter were used. The newest somites were manually segmented, and the morphometric parameters were calculated using the ImageJ software. For the somites of human embryos, the images were obtained from the Virtual Human Embryo project 20 and the Kyoto collection, and the first 4-5 rows of somites were measured. For the somites of mouse trunk-like structures (TLSs) and CHIR99021-and LDN193189-treated TLSs (TLS-CLs) 15 , the DAPI images were provided by the authors. The somites were randomly selected and measured, as the newest somites were difficult to define in TLSs and TLS-CLs. The graphs were plotted with Python's boxplot function with the default setting. The middle line of the boxes indicates the median, and the box edges are the 25th and 75th percentiles. Values located 1.5 times outside the quartile range were defined as outliers and plotted with dots. The maximum and minimum values, excluding outliers, are indicated by whiskers (Supplementary Fig. 2a).
Cell morphometry. Phalloidin and DAPI staining image of a day 6 somitoid was taken by a confocal microscope. Nine cells were randomly chosen in 3 regions (Area 1, 2, and 3) of the image, and the cells were manually segmented. The morphometric parameters were calculated using the ImageJ software. The graphs were plotted with Python's boxplot function with the default setting. The middle line of the boxes indicates the median, and the box edges are the 25th and 75th percentiles. Values located 1.5 times outside the quartile range were defined as outliers and plotted with dots. The maximum and minimum values, excluding outliers, are indicated by whiskers (Fig. 1k).
Size measurements of somitoids with different initial cell numbers. Somitoids were created from 350 to 1000 human iPSCs that were labeled with the fluorescent nuclear reporter or DAPI, and images were taken by a confocal microscope. The body, which includes NMP and PSM regions, and the newest somites in somitoids were manually segmented. When somitoids showed multiple axes, the longest string of somites was chosen. The morphometric parameters were measured using the ImageJ software. The graphs were plotted with Python's boxplot function with the default setting. The middle line of the boxes indicates the median, and the box edges are the 25th and 75th percentiles. Values located 1.5 times outside the quartile range were defined as outliers and plotted with dots. The maximum and minimum values, excluding outliers, are indicated by whiskers (Fig. 4f).
Measurement of first somite formation. Time-lapse bright-field images of somitoids were analyzed using MOrgAna 65 . Briefly, approximately 3% of the images were manually annotated and were fed into MOrgAna to generate a classifier neural network. After the application of the network to the time-lapse images, a segmentation mask, together with morphological parameters, such as area, length, and midline, of the somitoids were obtained for every image frame in the movie (Supplementary Fig. 13a). Subsequently, the left-right (LR) position at which the first somite is formed was annotated manually in the movie. The annotated image frame determined the timing of the first somite formation ( Supplementary  Fig. 13b). The intersection point P between the LR segment and the midline of the somitoid was used to obtain the relative position of somite formation (Supplementary Fig. 13c). Finally, the angle of the first somite formation was defined as the angle between the vector orthogonal to the LR segment and the tangent vector in the intersecting position P along the midline (Supplementary Fig. 13d). MULTI-seq sample preparation. The somitoids created with different CHIR concentrations were labeled with MULTI-seq barcode oligonucleotides for sample multiplexing as described   46 . Briefly, 5 × 10 5 cells from the prepared single-cell suspension were resuspended in Cell Prep Buffer (PBS(-) containing 0.1% poly(vinyl alcohol) (PVA) and 1 mM EDTA). First, a 1:1 mixture of the cholesterol-conjugated Anchor-oligonucleotides (Anchor CMO, synthesized by Integrated DNA Technologies) and Barcode oligonucleotides with a distinct barcode for each sample (final concentration, 0.2 µM) was added and incubated on ice for 5 min. Then, the same concentration of Co-Anchor CMO (synthesized by Integrated DNA Technologies) was added and incubated for another 5 min, followed by rigorous washes with PBS containing 1% BSA 3 times. After washing, we counted the number of cells in each sample and combined them so that the multiplexed suspension contained the same numbers of cells from each sample. The combined sample was filtered through a 35 µm cell strainer and counted again before barcoding below. The MULTI-seq barcode sequences used in this study are listed in Supplementary Data 3.
Barcoding and sequencing. Transcripts of each cell in the single-cell suspension were barcoded with Chromium Controller (10× Genomics, firmware version 4.00). The reagent system used in this study was Chromium Single Cell 3′ GEM, Library & Gel Bead Kit v3 (10× Genomics) and a Chromium Single Cell B Chip Kit (10× Genomics). Barcoding and cDNA library construction were performed according to the manufacturer's instructions. After the cDNA amplification step, the barcode fraction was collected, amplified, and single-indexed with KAPA HiFi HotStart ReadyMix (Roche, #KK2601) for sequencing. Both finished cDNA and MULTI-seq-barcode libraries were sequenced with NextSeq500 (Illumina). We read 8 base pairs (bp) for TruSeq Indices, 28 bp for 10× barcodes and unique molecular identifiers (UMIs), and 56 bp for both fragmented cDNA and MULTI-seq barcodes.
scRNA-seq data analyses Read alignment. Sequenced reads were aligned to the human genome (GRCh38) and counted to generate the feature-barcode matrices with the CellRanger pipeline (v.6.1.1, 10× Genomics). The reads having the same UMI were collapsed as a single count. The basic statistics of sequencing results are shown in Supplementary Data 3.
Quality control (QC). The quality control of the single-cell transcriptome data was performed with R version 4.1.2 66 with the package Seurat (v.4.0.5) 67 as follows: First, we excluded the droplet barcodes that had less than 200 UMIs and the undetected genes that were found less than 3 times in an entire sample. Then, we identified cell barcodes based on scatter plots of detected gene counts against the proportion of mitochondrial gene expression in each cell ( Supplementary Fig. 24). The areas demarcated by the red polygons were filtered as cells and used for further analyses. After this initial filtering, potential cell doublets were removed with Scrublet (v.0.2.3) 68 except the MULTI-seq sample. Doublets in the MULTI-seq sample were removed based on the barcode information. The basic statistics of the filtered cells are shown in Supplementary Data 3.
Demultiplexing of the MULTI-seq sample. We counted MUTLI-seq barcode reads associated with the filtered cell barcodes by utilizing the R package deMULTIplex (v.1.0.2) 46 , and assigned each cell to one of the following classes: a singlet of one of the original 3 samples, a doublet, or a label-negative cell with Seurat. The summary of the demultiplexing results is shown in Supplementary Data 3. We kept only singlets for further analysis.
Data normalization. The raw UMI counts of the QC-filtered cells were normalized in each sample according to a deconvolution approach based on pool-based size factors 69 with the R package scran (v.1.22.1) 70 . Briefly, cells were clustered with approximation and pooled as blocks, and the size factor to correct the library size of each cell was calculated with the calculateSumFactors function, taking into account the size factor of the block where the cell belonged. Finally, raw counts of each cell were normalized based on the size factor and Log2-transformed with the logNormCounts function. These scores appear as 'Log expression level'. To average the expression levels within cell populations, we used the R packages scuttle (v.1.4.0) 71 for heatmaps and scater (v.1.22.0) 71 for dot plots.
Identification of highly variable genes (HVGs). After normalization, highly variable genes were chosen with the modelGeneVarByPoisson function in the scran package, which decomposed the total variance of each gene's expressions into biological and technical components by assuming the Poisson distribution for the technical noise. Genes with the biological variance > 0.1 were chosen as HVGs for further dimension reduction and data integration. The number of HVGs were 1,618, 1,567, and 1,304 for 10% Matrigel, W/O Matrigel, and CHIR MULTI-seq samples, respectively.

Data integration and visualization.
To integrate all three data sets, we first scaled the normalized data according to the whole library size of each sample with the multiBatchNorm function in the R package batchelor (v.1.10.0) 72 . HVGs across data sets were selected as the genes with mean biological variances > 0.1 (n = 1,486) for principal component analysis (PCA). We utilized a mutual nearest neighbor approach by fastMNN 72 in the batchelor package for data integration considering the first 20 dimensions of PCA. The corrected results were used for visualization of all samples in a single plot of uniform manifold approximation and projection (UMAP) 73 with the Seurat package. This UMAP embedding was also used for all plots of individual samples simply by splitting the plot sample by sample. The gene expression levels shown in UMAP plots are Log2-transformed normalized values of individual samples or the scaled values with themultiBatchNorm function when multiple samples were involved.
Clustering and marker gene detection. To define the cell population clusters, a rankweighted shared-nearest neighbor graph was constructed for the integrated data with the R bluster (v1.4.0) 74 package, and the Leiden algorithm 75 was applied for modularity optimization. Marker genes of each cluster were detected with the scran package by calculating an area under the curve (AUC) for each gene as an index of the performance to distinguish two clusters in a pairwise comparison, and the minimum AUC against other clusters was taken as a summarizing score of each gene. The genes with the highest minimum AUC were chosen as marker genes of each cluster, and we annotated each cluster based on them and also the expression of other known marker genes. The top 10 markers of each cluster were shown in Supplementary Fig. 8b. This annotation, based on the integrated data set across the three samples, was maintained in the analysis of individual samples to keep consistency in the paper.
Gene ontology (GO) enrichment analysis. We calculated AUC by comparing the CHIR 5 µM and 10 µM groups and chose 100 upregulated genes with the highest AUC from each group as inputs. The GO enrichment analysis and visualization of the results were performed with the clusterProfiler package (v.4.2.2) 78 in R.
Statistics and reproducibility. The number of samples and independent experiments are given in the figure legends. No statistical method was used to predetermine sample size. If cells did not aggregate on day 1 of the somitoid protocol due to cell culture conditions, the experiment was discontinued. The experiments were not randomized. The Investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The scRNA-seq data used in this study are available in the ArrayExpress database under accession code MTAB-11334. Source data are provided with this paper.

Code availability
The custom Python scripts used for the somite morphometry and HCR volume analysis are available from Nikoula86's Github [github.com/Nikoula86/2022_Somitoids_ Analysis].