Large-scale genome sequencing is poised to provide a substantial increase in the rate of discovery of disease-associated mutations, but the functional interpretation of such mutations remains challenging. Here we show that deletions of a sequence on human chromosome 16 that we term the intestine-critical region (ICR) cause intractable congenital diarrhoea in infants (refs. 1, 2). Reporter assays in transgenic mice show that the ICR contains a regulatory sequence that activates transcription during the development of the gastrointestinal system. Targeted deletion of the ICR in mice caused symptoms that recapitulated the human condition. Transcriptome analysis revealed that an unannotated open reading frame (Percc1) flanks the regulatory sequence, and the expression of this gene was lost in the developing gut of mice that lacked the ICR. Percc1-knockout mice displayed phenotypes similar to those observed upon ICR deletion in mice and patients, whereas an ICR-driven Percc1 transgene was sufficient to rescue the phenotypes found in mice that lacked the ICR. Together, our results identify a gene that is critical for intestinal function and underscore the need for targeted in vivo studies to interpret the growing number of clinical genetic findings that do not affect known protein-coding genes.
All RNA-seq data used in this study have been deposited in the GEO repository (National Center for Biotechnology Information). The files are accessible through the GEO accession number GSE94245. The cDNA and predicted protein sequence for Percc1 are available in GenBank (record KY964488). All other relevant data are available from the corresponding authors on request.
Avery, G. B., Villavicencio, O., Lilly, J. R. & Randolph, J. G. Intractable diarrhea in early infancy. Pediatrics 41, 712–722 (1968).
Straussberg, R. et al. Congenital intractable diarrhea of infancy in Iraqi Jews. Clin. Genet. 51, 98–101 (1997).
Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).
Canani, R. B. & Terrin, G. Recent progress in congenital diarrheal disorders. Curr. Gastroenterol. Rep. 13, 257–264 (2011).
Qu, H. & Fang, X. A brief review on the Human Encyclopedia of DNA Elements (ENCODE) project. Genomics Proteomics Bioinformatics 11, 135–141 (2013).
Calo, E. & Wysocka, J. Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837 (2013).
Eeckhoute, J. et al. Cell-type selective chromatin remodeling defines the active subset of FOXA1-bound enhancers. Genome Res. 19, 372–380 (2009).
Pennacchio, L. A. et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499–502 (2006).
Dimaline, R. & Varro, A. Novel roles of gastrin. J. Physiol. 592, 2951–2958 (2014).
Barker, N. et al. Lgr5+ve stem cells drive self-renewal in the stomach and build long-lived gastric units in vitro. Cell Stem Cell 6, 25–36 (2010).
Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).
Helander, H. F. & Fändriks, L. The enteroendocrine “letter cells” – time for a new nomenclature? Scand. J. Gastroenterol. 47, 3–12 (2012).
Spence, J. R. et al. Directed differentiation of human pluripotent stem cells into intestinal tissue in vitro. Nature 470, 105–109 (2011).
Mellitzer, G. et al. Loss of enteroendocrine cells in mice alters lipid absorption and glucose homeostasis and impairs postnatal survival. J. Clin. Invest. 120, 1708–1721 (2010).
Thiagarajah, J. R. et al. Advances in evaluation of chronic diarrhea in infants. Gastroenterology 154, 2045–2059 (2018).
Osterwalder, M. et al. Enhancer redundancy provides phenotypic robustness in mammalian development. Nature 554, 239–243 (2018).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Ge, D. et al. SVA: software for annotating and visualizing sequenced human genomes. Bioinformatics 27, 1998–2000 (2011).
Zhu, M. et al. Using ERDS to infer copy-number variants in high-coverage genomes. Am. J. Hum. Genet. 91, 408–421 (2012).
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protocols 7, 562–578 (2012).
Bockenhauer, D. et al. Epilepsy, ataxia, sensorineural deafness, tubulopathy, and KCNJ10 mutations. N. Engl. J. Med. 360, 1960–1970 (2009).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Huang, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protocols 4, 44–57 (2009).
Lindemann, S. R. et al. The epsomitic phototrophic microbial mat of Hot Lake, Washington: community structural responses to seasonal cycling. Front. Microbiol. 4, 323 (2013).
Wang, H. et al. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910–918 (2013).
Yang, H. et al. One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering. Cell 154, 1370–1379 (2013).
Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).
Kvon, E. Z. et al. Progressive loss of function in a limb enhancer during snake evolution. Cell 167, 633–642 (2016).
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 41, D991–D995 (2013).
Warlich, E. et al. Lentiviral vector design and imaging approaches to visualize the early stages of cellular reprogramming. Mol. Ther. 19, 782–789 (2011).
McCracken, K. W., Howell, J. C., Wells, J. M. & Spence, J. R. Generating human intestinal tissue from pluripotent stem cells in vitro. Nat. Protocols 6, 1920–1928 (2011).
Glusman, G., Caballero, J., Mauldin, D. E., Hood, L. & Roach, J. C. Kaviar: an accessible system for testing SNV novelty. Bioinformatics 27, 3216–3217 (2011).
The authors thank the patients and their families for their cooperation and support. This work was supported by grants to D.L. from the SysKid EU FP7 project (241544), the Wolfson Family Charitable Trust and the Crown Human Genome Center at the Weizmann Institute of Science. A.V. and L.A.P. were supported by NHLBI grant R24HL123879 and NHGRI grants R01HG003988, U54HG006997 and UM1HG009421; J.M.W. and Y.A. were supported by the Cincinnati Children’s Hospital and Sheba Medical Center’s Joint Research Fund; J.M.W. and M.F.K. were supported by NIH grants 1R01DK092456 and 1U18NS080815 as well as a digestive disease center grant (P30 DK0789392); R.K. was supported by the David and Elaine Potter Charitable Foundation; B.L.B. was supported by NIH grants HL089707, HL064658 and HL136182; and I. Barozzi was funded through an Imperial College Research Fellowship. Research was conducted at the E. O. Lawrence Berkeley National Laboratory and performed under the Department of Energy contract DE-AC02-05CH11231 (University of California). iPS cell lines were generated in collaboration with the Cincinnati Children’s Pluripotent Stem Cell Facility. This work was performed in partial fulfilment of the requirements for a PhD degree for D.O.-L. (Weizmann Institute of Science, Rehovot, Israel) and I.B.-J. (The Sackler Faculty of Medicine, Tel Aviv University, Israel).
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Filled black symbols represent affected individuals, and deletion genotypes are indicated in red. WES was done for individuals 1.1, 2.1, 3.1, 4.1 and 4.2, WGS was done for individual 2.1 and transcriptome analysis was done for individuals 2.1 and 2.4. Patient 1.1 (marked with an asterisk) was found to have uniparental disomy.
a, Analysis of the SNP genotyping that was performed on 6 of the patients in families 1–5 and their 22 relatives detected a single significant telomeric linkage interval on chr16 with a maximum LOD score of 4.26. Haplotype reconstruction confirmed this interval, with flanking marker rs207435 (chr16: 2,984,868), and showed two distinct disease haplotypes either in a homozygous setting (in affected individuals for disease allele 1 (that is, ΔL) in families 2, 3, 5) or in a compound heterozygous setting (in affected individuals for disease alleles 1 and 2 (that is, ΔS) in family 4). All the affected individuals who carried disease allele 1 showed an identical disease haplotype from rs533184 (chr16: 1,155,025) to rs397435 (chr16: 2,010,138). b, Schematic of reads covering exons in the C16orf91 gene, for the five exome-sequenced patients and for three unaffected controls who underwent sequencing under identical conditions. The first three patients (individuals 1.1, 2.1 and 3.1), who had a chr16ΔL/ΔL genotype, had zero coverage in the three upstream exons (right). The last two patients (individuals 4.1 and 4.2), who had a chr16ΔL/ΔS genotype, had non-zero coverage in these exons, but coverage was lower than in controls. All subjects had high coverage in the downstream exons (left). Numbers indicate the scale in sequencing reads per base.
a, Overview of targeting approach. See Methods for details. b, Genotyping results obtained from genomic DNA (n = 554) isolated from the tails of homozygous and heterozygous ICR-knockout (ΔICR) mice, compared to a wild-type control. See Methods for primers and details. c, Percc1 expression derived from RNA-seq from control littermates (left) and knockout mice (right). Tissues and time points are indicated to the left of each plot.
a, Modified intestinal content in wild-type mice (left) compared to chr17ΔICR/ΔICR mice (right; n = 45) at P10. b–d, ICR deletion causes changes in intestinal and faecal microbiome composition. Microbial communities in different intestinal compartments and faeces were analysed by 16S rRNA-based sequence profiling. b, Family-level relative abundance profiles of the top 15 most abundant prokaryotic families for wild-type (n = 22) and chr17ΔICR/ΔICR (n = 21) intestinal and faecal samples, organized by sample type. The most pronounced changes were observed in colon and faecal samples. c, Heat map of log-transformed read counts for those genera that exhibited the greatest variance (top 60%) across all faecal samples. The abundance profiles exhibit perfect clustering of the faecal samples (rows) into wild-type (n = 6) and chr17ΔICR/ΔICR (n = 7) groups. d, Bar charts of Shannon’s diversity for all faecal samples from b, grouped into wild-type and chr17ΔICR/ΔICR samples.
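As an illustration of the diversity measure named in panel d, the following is a minimal sketch of Shannon's index computed from per-taxon read counts. The function name and the example counts are hypothetical and are not taken from the study; the natural logarithm is assumed, since the caption does not state the base.

```python
import math

def shannon_diversity(counts):
    """Shannon's H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero reads contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical read counts per taxon (illustrative only, not study data).
even_sample = [25, 25, 25, 25]   # evenly distributed community
skewed_sample = [97, 1, 1, 1]    # community dominated by one taxon

print(shannon_diversity(even_sample))
print(shannon_diversity(skewed_sample))
```

An evenly distributed community gives the maximum value for its number of taxa (ln 4 for four taxa), whereas a community dominated by one taxon gives a value closer to zero.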
Extended Data Fig. 5 Gastrointestinal X-gal staining of ICR-reporter transgenic embryos compared to Percc1 mRNA in situ hybridization.
a, b, Cross-sections of E14.5 mouse tissues with a β-galactosidase ICR-driven transgene. c, d, Percc1 mRNA in situ hybridization analysis on E14.5 wild-type sections. For X-gal staining and in situ hybridization experiments, two embryos for each experiment and each condition were collected at E14.5 and a minimum of three sections from each embryo were examined. Representative sections are shown.
a, Comparison of weight in Percc1-knockout mice (n = 38; red) and littermate controls (n = 25; blue), showing that Percc1-knockout mice have reduced body weight. Percc1-knockout mice were generated in an FVB/N genetic background. b, Percc1 transgenic rescue of the body weight phenotype that is found in chr17ΔICR/ΔICR mice. An 8.5-kb Percc1 mini gene was constructed (Supplementary Table 10) and used to generate a Percc1 mouse line that overexpressed PERCC1. When this transgene was introduced into the chr17ΔICR/ΔICR mouse genetic background, we observed the rescue of all the phenotypes (including the severe reduction in body weight) that were found in chr17ΔICR/ΔICR mice. Chr17ΔICR/ΔICR mice were generated in a mixed 129/C57Bl6 background. P values were determined using a two-tailed t-test; n.s. indicates a P value of 0.8–1.0. Lines show the mean and shaded areas represent ±1 s.d.
a, Western blot analysis of PERCC1–mCherry fusion protein. Two stable transgenic lines (B3269 and B3309) were established through standard pronuclear microinjection of fertilized mouse eggs. Protein extracts from juvenile mice (P13–P14) were separated by SDS–PAGE and transferred for western hybridization. Lanes: 1, molecular mass marker (M); 2 and 3, line B3269; 4 and 5, line B3309; 6, wild-type control; 7, mCherry positive control; 8, molecular mass marker. mCherry is predicted to be 28.8 kDa and the PERCC1–mCherry fusion protein is predicted to be 59 kDa, with both proteins running about 5 kDa larger. Line B3309 does not express the fusion protein, in contrast to line B3269 (probably owing to a position effect). These experiments were performed four times. b–e, Identification of cells with PERCC1+ identity, and the effect of PERCC1 ablation in gastrointestinal tissues. b, A subpopulation of PERCC1+ cells (red) in the corpus epithelium (mucosa) expresses SYP (green) at P8. Arrowheads mark double-positive cells. Arrows mark a minor fraction of PERCC1+ cells that were detected in longitudinal smooth muscle (lSM). DAPI-stained nuclei are shown in blue. c, Dispersed PERCC1+ cells (red) are observed in the villi of the duodenum at P8. Top, cross-section through villi illustrates the absence of endocrine identity (green) in these cells. Bottom, sagittal section showing the distribution of PERCC1+ cells in the epithelium of villi (CDH1; green). d, Top left, schematic depicting the anatomical compartments of the distal stomach and the location of sections used for cell counting. Top right, reduction of the fraction of G cells observed predominantly in the pyloric antrum of Percc1-deficient (chr17ΔICR/ΔICR) mice at P8. Box plots indicate median (centre line), interquartile values (box limits), range (whiskers), outliers (circled dots) and individual biological replicates (dots). P values were determined using an unpaired two-tailed t-test.
Bottom, comparative immunofluorescence analysis illustrating the reduced number of gastrin-expressing cells (red) in the absence of Percc1 (chr17ΔICR/ΔICR) in the antrum at P8. SYP-expressing endocrine cells are green and nuclei are grey. e, Immunofluorescence from HIOs derived from control (ICR+/+) and patient (ICRΔL/ΔL) iPS cell lines. Detection of anti-FOXA2 (blue) and anti-CDH1 (red) was used to visualize the HIO epithelium, and EECs were localized at 21 and 42 days on the basis of SYP expression (green) and counted. The average number of SYP+ cells (NSYP+) per 1,000 epithelial (Epi) cells from cell counts in n = 2 technical replicates from independent HIO preparations is indicated (P = 1.75 × 10⁻¹⁸ for reduced number of SYP+ cells in ICRΔL/ΔL HIOs; Fisher’s exact test). n represents independent biological replicates with similar results. Scale bars, 50 μm.
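To make the cell-count comparison in panel e concrete, the sketch below applies Fisher's exact test to a 2 × 2 table of SYP+ and SYP− epithelial cells. The counts are hypothetical, the function name is an illustrative choice, and a two-sided test is assumed; the caption does not state which alternative was used or give the underlying counts.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    tol = 1 + 1e-9  # tolerance for floating-point ties
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * tol)
    return min(1.0, total)

# Hypothetical counts (illustrative only, not study data):
# rows = control HIOs, patient HIOs; columns = SYP+, SYP- epithelial cells.
p_value = fisher_exact_two_sided(30, 970, 2, 998)
print(p_value)
```

The exact test is a natural choice here because one cell of the table (SYP+ cells in patient-derived HIOs) is expected to be very small, where large-sample approximations are unreliable.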
a, Left, bar chart showing the fraction of the total cells profiled in a previous study (ref. 11; n = 11,665) that was assigned to each one of the major cell types identified. Right, the same information, but limited to those cells that express Percc1 (n = 8). b, Same as a but limited to EECs. P values were calculated using a chi-squared test, using data from the corresponding left panel as reference (a, b). EP, epithelial; TA, transit-amplifying. c, Box plots showing the distributions of the normalized gene-expression values for known EEC-associated transcription factors and hormones in the eight Percc1-positive cells from a. Box plots indicate median (centre line), interquartile range (IQR; box limits) and 1.5 × IQR (whiskers).
Extended Data Fig. 9 Validation of human RNA-seq data by RT–qPCR in duodenal tissue from two different patients and control tissue.
Pairwise comparison of the relative gene-expression levels of six peptide hormones (cholecystokinin (CCK), gastrin (GAST), glucagon (GCG), gastric inhibitory polypeptide (GIP), neurotensin (NTS) and somatostatin (SST)) in duodenal tissue from patients and normal duodenal tissue (control; represented as 1). Relative expression levels for patients represent the average between two patients (patients 1.1 and 5.1).
a, HIOs generated from an affected patient, a carrier and an unaffected sibling all show normal morphology. Differentiation into HIOs was performed in duplicate with qPCR and histological analyses that yielded similar results. b, iPS cell lines from an affected patient, a carrier and an unaffected sibling display a normal karyotype. This was a single experiment for each sample, as an assessment of quality control.
This file contains additional information on IDIS patients including genetic and transcriptome results and Supplementary Tables 5-11.
Differentially expressed genes (DEGs) in stomach and intestine of P10 mice and human patients. For each species and organ under consideration, a list of up- and down-regulated genes is provided. For each DEG, the gene symbol along with the RPKM (Reads Per Kilobase per Million mapped reads) in KO and WT individuals, the log2-fold-change, the p-value (estimated using Cuffdiff, n = 1 biological replicate) and the q-value (Benjamini-Hochberg) are reported.
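The quantities reported in this table can be illustrated with a short sketch of the standard definitions of RPKM, log2 fold change and the Benjamini-Hochberg adjustment. This is not Cuffdiff's implementation; the function names, the pseudocount and the example numbers are assumptions introduced for illustration.

```python
import math

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def log2_fold_change(ko, wt, pseudocount=1.0):
    """log2(KO / WT); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((ko + pseudocount) / (wt + pseudocount))

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical values (illustrative only, not study data).
print(rpkm(100, 2000, 10_000_000))
print(log2_fold_change(3.0, 1.0))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```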
Intersection of DEGs in the mouse with the corresponding murine orthologs of the human DEGs. For each organ and separately for up- and down-regulated genes, those DEGs common to both species are listed along with those unique for each species. p-values indicating the probability of observing an equal or better overlap by chance are provided (hypergeometric test).
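The overlap probability described here corresponds to the upper tail of a hypergeometric distribution. A minimal sketch follows, assuming a shared gene universe of known size; the function name, the universe size and the set sizes are hypothetical and are not taken from the table.

```python
from math import comb

def hypergeom_overlap_pvalue(universe, set_a, set_b, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(N=universe, K=set_a, n=set_b).

    universe: number of genes that could appear in either list
    set_a:    number of DEGs in one species
    set_b:    number of DEGs (as orthologues) in the other species
    overlap:  number of DEGs shared by both lists
    """
    max_k = min(set_a, set_b)
    denom = comb(universe, set_b)
    numer = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, max_k + 1)
    )
    return numer / denom

# Hypothetical sizes (illustrative only, not study data).
print(hypergeom_overlap_pvalue(universe=15000, set_a=200, set_b=150, overlap=12))
```

The result depends strongly on the choice of universe, so the set of genes with one-to-one orthologues that are expressed in the tissue is usually a more defensible background than the whole genome.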
Functional enrichment analysis of DEGs. For each species and separately for up- and down-regulated genes, functional enrichment was run using DAVID (https://david.ncifcrf.gov). The results of Functional Annotation Clustering (default parameters) are provided. The most representative GO and KEGG terms are highlighted in red. These terms were used for the prioritization strategy reported in Table 1. P-value estimated using the modified Fisher’s exact test (EASE Score).
Clinical characteristics of congenital diarrhea patients with deleted enhancer.