Anorectal malformations (ARMs) constitute a variety of congenital defects involving improper formation of the anus and rectum during embryogenesis, with a reported population frequency of 2–5 per 10,000 births [1]. Mortality due to ARMs is considered low (estimated at 3–16%, with the highest rates among children born with ARM and other congenital anomalies), but the co-morbid conditions that accompany survival can be extensive and persist throughout life [2]. ARMs may present as an isolated birth defect (estimated at 35–55% of cases [1]), as a feature of established genetic syndromes and chromosome imbalances, or as a component of the “VACTERL” complex of associated birth defects (Vertebral anomalies, Anorectal malformations, Cardiac anomalies, TracheoEsophageal fistula, Renal anomalies, and Limb anomalies) [1]. VACTERL is defined as the presence of at least three of these birth defects, occurs in 1 in 10,000–40,000 births [3, 4], and has been estimated to occur in 10–15% of all ARM cases [2]. Very few genetic mutations have been associated with ARMs outside of syndromes with known genetic origins [2].

Estimates of familial recurrence of ARMs and associated anomalies have been derived from several populations and generally indicate higher-than-population risk for first-degree relatives of probands [1, 2, 5,6,7]. The findings of these studies provide strong evidence of a genetic, and in some instances heritable, component to ARMs. However, the lack of robust genealogical data to define relationships between subjects is likely to bias these estimates. Here, we applied well-established familial aggregation methods to a population genealogy of Utah to provide unbiased familial risk estimates for ARMs and associated anomalies, with the aims of guiding future investigation of the genetic basis of ARMs and improving clinical interpretation of recurrence risk for this defect.

Materials and methods

Utah Population Data Base (UPDB)

The UPDB contains a genealogy of the pioneers who settled in Utah in the mid-1800s and their descendants. The genealogy was originally compiled from ~180,000 three-generation family group sheets collected by the Church of Jesus Christ of Latter-day Saints (i.e., Mormons) in the 1970s and included ~1.6 M individuals spanning six generations [8]. The genealogy has since been extended to the present via Utah Vital Statistics records and now includes ~3 million individuals with at least three generations of genealogy, who were used for the genealogical analyses. The UPDB genealogy has been record-linked to the Enterprise Data Warehouses (EDWs) of the two largest healthcare providers in Utah, the University of Utah Health Sciences Center (UUHSC) and Intermountain Healthcare (IM), with electronic health records available from 1994–2014. Approximately 760,000 University of Utah patients and 1.7 million Intermountain Healthcare patients overlap with the ~3 million individuals with three generations of genealogy in the UPDB [9].

ARM cases and co-morbid anomalies

International Classification of Diseases, Ninth Revision (ICD-9) diagnostic codes from UUHSC and IM EDW records spanning 1994–2014 were used to identify individuals diagnosed with ARMs and frequently associated birth defects. Table 1 lists the ICD-9 codes and the number of cases with linked genealogy data identified for each. Codes consistent with the VACTERL complex of associated anomalies (V, vertebral; A, anorectal malformation; C, cardiac; TE, tracheoesophageal; R, renal; L, limb) are marked in Table 1 with the corresponding letter. This study was approved by the University of Utah Institutional Review Board.

Table 1 International Classification of Diseases, Ninth Revision (ICD-9) codes used to identify patients with anorectal malformations and classically associated diseases for individuals with linked genealogy data

Genealogical index of familiality

The genealogical index of familiality (GIF) is a well-established test for excess relatedness (familial clustering) in a set of individuals [9,10,11]. The GIF statistic is the average pair-wise kinship coefficient over every pair of individuals in a set and is used to estimate the average relatedness of sets of subjects. The kinship coefficient is defined as the probability that randomly selected homologous alleles, one from each of two individuals, are identical by descent from a common ancestor; it is estimated from the number of meioses (i.e., birth events) separating a pair of individuals according to the genealogy data. The GIF statistic is multiplied by 10^5 for ease of presentation. To test for excess relatedness, the GIF statistic is computed for case subjects and then compared with the distribution of GIF statistics derived from 1000 randomly selected sets of controls sampled from the UPDB genealogy and matched to cases on 5-year birth cohort, sex, and place of birth (in or out of Utah). Comparing the case GIF with the distribution for the 1000 sets of matched controls provides an empirical test for excess relatedness of case subjects. A more restrictive test for excess relatedness ignores first- and second-degree relationships and compares only more distant relationships (referred to as the distant GIF, or dGIF, test), among which fewer shared environmental factors are assumed to be present [9]. The contribution to the GIF statistic at each genetic distance, defined as the number of meioses separating a pair of individuals, can be plotted for cases and compared with the average of the matched control sets, where a genetic distance of 1 corresponds to parent/offspring, 2 primarily to siblings, 3 primarily to aunt/uncle and nephew/niece, 4 primarily to first cousins, and so forth (see Fig. 1).
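The GIF, dGIF, and per-distance contributions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the UPDB implementation: the input format (a hypothetical table giving, for each related pair, the number of meioses on every path through a common ancestor) is our own; kinship is computed as the sum of (1/2)^(m+1) over paths, assuming non-inbred common ancestors; and for the dGIF we assume that pairs closer than four meioses (all first- and second-degree relationships) contribute zero while remaining in the denominator. Control matching and sampling are not shown; `empirical_p` simply reports the share of matched control sets whose GIF equals or exceeds the case GIF.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical input: for each related pair, the meiosis count of every path
# through a common ancestor (full siblings -> [2, 2], one path per parent).
PathTable = Dict[FrozenSet[str], List[int]]


def kinship(path_meioses: Sequence[int]) -> float:
    """Kinship coefficient: sum of (1/2)**(m + 1) over ancestor paths.

    Simplifying assumption: common ancestors are not themselves inbred.
    """
    return sum(0.5 ** (m + 1) for m in path_meioses)


def gif(subjects: Sequence[str], paths: PathTable, min_distance: int = 1) -> float:
    """Average kinship over all pairs of subjects, multiplied by 10**5.

    Pairs whose genetic distance (fewest meioses on any path) is below
    min_distance contribute 0 but stay in the denominator, so
    min_distance=4 gives a dGIF-style statistic. Needs >= 2 subjects.
    """
    pairs = list(combinations(subjects, 2))
    total = 0.0
    for a, b in pairs:
        p = paths.get(frozenset((a, b)), [])
        if p and min(p) >= min_distance:
            total += kinship(p)
    return 1e5 * total / len(pairs)


def contribution_by_distance(subjects: Sequence[str], paths: PathTable) -> Dict[int, float]:
    """Split the GIF into the part contributed at each genetic distance (cf. Fig. 1)."""
    pairs = list(combinations(subjects, 2))
    contrib: Dict[int, float] = defaultdict(float)
    for a, b in pairs:
        p = paths.get(frozenset((a, b)), [])
        if p:
            contrib[min(p)] += 1e5 * kinship(p) / len(pairs)
    return dict(contrib)


def empirical_p(case_gif: float, control_gifs: Sequence[float]) -> float:
    """Share of matched control sets whose GIF is at least the case GIF."""
    return sum(g >= case_gif for g in control_gifs) / len(control_gifs)
```

For example, three subjects in which a and b are full siblings ([2, 2]) and each is a first cousin of c ([4, 4]) give a GIF of 12,500 and a dGIF of about 4,167, because the sibling pair is dropped from the numerator of the distant statistic.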

Fig. 1

Contribution to the GIF statistic by pair-wise genetic distance (number of meioses) for 255 anorectal malformation cases (solid line) compared to the average of 1000 sets of matched controls (dashed line)

Relative risk in relatives

The relative risk (RR) statistic for a specified set of relatives (first-degree relatives, second-degree relatives, etc.) is the ratio of the observed number of affected relatives of cases to the expected number of affected relatives [10,11,12,13]. Care is taken to count each affected relative only once, avoiding duplication. The expected number of affected relatives is based on the population rate of disease, adjusted for 5-year birth window, sex, and place of birth (Utah or not). Disease rates are estimated by assigning all individuals with genealogical and medical record data to cohorts defined by 5-year birth window, sex, and place of birth (Utah or not); a cohort-specific disease rate is then estimated as the number of cases in each cohort divided by the number of individuals in that cohort. The expected number of affected relatives is calculated by multiplying the total number of relatives of cases in each cohort by the cohort-specific disease rate and summing over all cohorts. Assuming that the observed number of affected relatives follows a Poisson distribution with mean equal to the expected number [12] provides a test of significance for the RR statistic and a 95% confidence interval. Relative risks for ARMs were estimated for various categories of relatives, including all first-, second-, third-, and fourth-degree relatives, as well as specific types of first-degree relatives such as siblings and children. Relative risks for ARMs and for various birth defects previously associated with ARMs were estimated for first-, second-, and third-degree relatives. Nominal p-values are reported without correction for multiple tests.
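A minimal sketch of the RR calculation follows. The cohort keys, function names, and toy inputs are hypothetical. The text specifies a Poisson test and a 95% confidence interval but not their exact form, so we assume a one-sided upper-tail p-value, P(X ≥ observed), and an exact (Garwood) Poisson interval for the observed count, divided by the expected count to give an interval for the RR.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Tuple

Cohort = Hashable  # hypothetical key, e.g. (5-year birth window, sex, born in Utah)


def cohort_rates(population: Iterable[Tuple[Cohort, bool]]) -> Dict[Cohort, float]:
    """Cohort-specific disease rate: cases in the cohort / individuals in the cohort."""
    n: Counter = Counter()
    cases: Counter = Counter()
    for cohort, affected in population:
        n[cohort] += 1
        cases[cohort] += int(affected)
    return {c: cases[c] / n[c] for c in n}


def expected_affected(relative_cohorts: Iterable[Cohort], rates: Dict[Cohort, float]) -> float:
    """Relatives per cohort (each distinct relative listed once) x cohort rate, summed."""
    counts = Counter(relative_cohorts)
    return sum(k * rates[c] for c, k in counts.items())


def _poisson_term(i: int, mu: float) -> float:
    """P(X = i) for X ~ Poisson(mu), computed in log space to avoid overflow."""
    return math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k)."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    return sum(_poisson_term(i, mu) for i in range(k + 1))


def poisson_sf(k: int, mu: float) -> float:
    """P(X >= k), summed upward so tiny tail probabilities keep their precision."""
    if k <= 0:
        return 1.0
    if mu == 0:
        return 0.0
    total, i = 0.0, k
    while True:
        term = _poisson_term(i, mu)
        total += term
        if i > mu and term <= total * 1e-17:  # remaining terms shrink geometrically
            return total
        i += 1


def _bisect(f: Callable[[float], float], target: float, increasing: bool,
            lo: float, hi: float) -> float:
    """Solve f(mu) = target for a monotone f on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def relative_risk(observed: int, expected: float, alpha: float = 0.05):
    """Return (RR, one-sided p-value, lower CI, upper CI) under a Poisson model."""
    rr = observed / expected
    p_value = poisson_sf(observed, expected)
    hi = observed + 10 * math.sqrt(observed + 1) + 10  # comfortably above the upper limit
    lower = 0.0 if observed == 0 else _bisect(
        lambda mu: poisson_sf(observed, mu), alpha / 2, True, 0.0, hi)
    upper = _bisect(lambda mu: poisson_cdf(observed, mu), alpha / 2, False, 0.0, hi)
    return rr, p_value, lower / expected, upper / expected
```

The upper tail is summed directly rather than computed as 1 minus the CDF, so very small p-values do not collapse to zero through floating-point cancellation.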

Results

Of the 255 ARMs cases with genealogy data, 130 (51%) were male. Ninety-four (37%) ARMs occurred as isolated defects, 92 (36%) had one associated anomaly, and 68 (27%) had three or more anomalies (range 3–9) consistent with the VACTERL syndrome. In addition, 44 ARM cases (17%) occurred with other genetic syndromes.

Genealogical index of familiality

The GIF test identified significant excess pair-wise relatedness among ARM cases (case GIF = 4.03, mean control GIF = 1.48, p < 0.001). The dGIF test, which ignores first- and second-degree relationships, did not identify excess relatedness among ARM cases (case dGIF = 1.08, mean control dGIF = 1.18, p = 0.559), indicating that the observed excess relatedness occurred among close relatives. The contribution to the GIF statistic at different genetic distances is plotted in Fig. 1, which shows a clear excess of pair-wise relationships among cases at genetic distances 2 (siblings or grandparent/grandchild) and 4 (mostly first cousins); at the remaining genetic distances, pair-wise relatedness among cases is similar to that observed for controls. No two cases presenting with ARM alone were related more closely than eight meioses (births) apart; the close relationships that were observed involved patients with other anomalies (data not shown).

Relative risk

RR estimates for ARMs among relatives of ARM cases are shown in Table 2 for various categories of relationship. RR estimates in first-degree relatives of cases, including siblings, ranged from 14.4 to 20.8. A significantly elevated RR was also observed for third-degree relatives of cases as a whole (RR = 6.5, p = 1.2e-3). No affected second-degree relatives were identified. In addition, no children of the 255 ARM cases were observed, most likely because of censoring imposed by the limited diagnostic window (1994–2014).

Table 2 Estimated relative risks (RR) in relatives for anorectal malformations (n = 255)

The RR statistic can be interpreted as the factor by which risk exceeds population-level risk as a result of being related to an individual with ARM. Taking “siblings” as an example, the RR estimate for a sibling of an ARM case is 16.1, indicating that a sibling of an ARM case is 16.1 times as likely to have an ARM as a similar individual (matched on birth year, sex, and place of birth) selected at random from the population. Using the population frequency of ARMs (1 in 5000, or 0.02%) as the population-level risk, siblings of ARM cases have an approximately 0.3% (0.02% × 16.1) probability of having an ARM themselves.
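The sibling calculation is simply population-level risk multiplied by the RR. As a sketch (the helper name is ours, and the product is only a reasonable approximation while the resulting probability stays small):

```python
def absolute_recurrence_risk(population_risk: float, relative_risk: float) -> float:
    """Approximate a relative's absolute risk as population risk x RR."""
    return population_risk * relative_risk


# Population frequency of 1 in 5000 and the sibling RR of 16.1 from the text.
sibling_risk = absolute_recurrence_risk(1 / 5000, 16.1)
print(f"{sibling_risk:.2%}")  # prints 0.32%
```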

Relative risks of other associated anomalies in ARM cases

Table 3 shows RR estimates in ARMs cases for birth defects that have been previously reported to co-occur with ARMs. Significantly elevated risks were observed for all birth defects tested (p < 0.001) with the exception of omphalocele.

Table 3 Relative risks (RRs) for other birth defects in 255 anorectal malformation cases with linked genealogy data

Relative risks of other associated anomalies in first-degree relatives of ARM cases

Table 4 shows RR estimates in first-degree relatives of ARM cases for birth defects previously reported to co-occur with ARMs. Although risk to first-degree relatives of ARM cases is elevated for some of these birth defects, none of the elevated risks remained significant after correcting for multiple tests (threshold p < 0.001).

Table 4 Relative risks (RRs) for birth defects in 883 first-degree relatives of 255 anorectal malformations cases

High-risk pedigrees

We identified all clusters of ARM cases occurring among the descendants of a pair of common ancestors (founders) in the UPDB genealogy. Most clusters containing multiple ARM cases contained only distantly related cases (at least six meioses apart), a portion of the relatedness spectrum in which the dGIF test found no statistical excess of familial clustering for ARMs. Several sets of close relatives were identified: two pairs of siblings with no other affected relatives, and one pair of siblings with two affected cousins.
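The text does not detail how clusters were enumerated. One straightforward way to group cases by founder couple is sketched below; the pedigree representation (a hypothetical dict mapping each individual to a (father, mother) pair, with None for unknown parents) and the function name are our own assumptions, and the sketch does not assess how closely the clustered cases are related to one another.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Parents = Tuple[Optional[str], Optional[str]]  # (father, mother); None if unknown


def founder_clusters(parents: Dict[str, Parents], cases: Set[str],
                     min_cases: int = 2) -> Dict[Tuple[str, str], List[str]]:
    """Map each founder couple to the sorted cases among its descendants,
    keeping only couples with at least min_cases affected descendants."""
    children_of: Dict[str, List[str]] = defaultdict(list)
    couple_children: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for child, (father, mother) in parents.items():
        for parent in (father, mother):
            if parent is not None:
                children_of[parent].append(child)
        if father is not None and mother is not None:
            couple_children[(father, mother)].append(child)

    def is_founder(person: str) -> bool:
        # Anyone with no recorded parents counts as a founder.
        return parents.get(person, (None, None)) == (None, None)

    clusters: Dict[Tuple[str, str], List[str]] = {}
    for (father, mother), kids in couple_children.items():
        if not (is_founder(father) and is_founder(mother)):
            continue
        seen: Set[str] = set()
        stack = list(kids)
        while stack:  # depth-first walk down the genealogy
            person = stack.pop()
            if person not in seen:
                seen.add(person)
                stack.extend(children_of[person])
        affected = sorted(seen & cases)
        if len(affected) >= min_cases:
            clusters[(father, mother)] = affected
    return clusters
```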

Discussion

Our analysis of the familial clustering of 255 ARM cases, conducted with a population genealogy of Utah record-linked to the two largest healthcare providers in Utah, provides evidence of a genetic contribution to ARMs, statistically confirms the association of ARMs with many birth defects previously reported to co-occur with them, and provides unbiased risk estimates for various categories of relatives of ARM cases. While the GIF analysis clearly indicates excess relatedness among cases, the dGIF analysis (ignoring first- and second-degree relatives) indicates that this excess occurs among close relatives and not among more distant relatives, a pattern reinforced by the diminishing RR estimates for ARMs among categories of more distant relatives. Taken together, these results suggest that a heritable component may exist for some ARM cases, whereas others are more likely explained by de novo mutations or environmental factors.

ARMs are strongly associated with many genetic syndromes, including Townes-Brocks syndrome, Currarino triad, Pallister-Hall syndrome, Johanson-Blizzard syndrome, McKusick-Kaufman syndrome, and X-linked heterotaxy [1]. ARMs also frequently co-occur with chromosome abnormalities, most often trisomy 21 (Down syndrome), followed by 5p deletion syndrome (cri-du-chat syndrome), trisomy 18, trisomy 13, and several less frequently encountered abnormalities [14]. Our analysis identified 44 ARM cases (17% of the total) with other confirmed genetic syndromes, which is higher than the 4.5–11% previously reported [14].

VACTERL has been estimated to occur in 10–15% of all ARM cases, where VACTERL is defined as having at least three of its component features [2]. By this definition, 68 (27%) of the 255 ARM cases analyzed here met criteria for the VACTERL complex. In a recent survey of 203 ARM patients of European origin, 72% had at least one VACTERL-associated anomaly [15]. In our study, only 36% of ARM cases had a single associated anomaly, which suggests either differential manifestation of VACTERL in different populations or differences in the sensitivity of data-capture methods between studies.

An interesting finding of our study was the association of ARM with Hirschsprung’s disease. Eighteen ARM cases (7%) were also diagnosed with Hirschsprung’s disease, providing significant evidence for this association (p = 1.4e-18). This frequency was higher than the 3.4% reported by Hofmann et al. [16]. We also observed an elevated, although non-significant, risk of Hirschsprung’s disease among first-degree relatives of ARM cases. These results may indicate a shared genetic component contributing to both ARM and Hirschsprung’s disease.

Population estimates of the frequency of ARM as an isolated defect vary widely: 9% [15], 35–55% [14], and up to 90% of cases [2]. Our analysis identified 94 ARM cases (37%) occurring as an isolated defect. The variation in this estimate between studies may be due to differences in data censoring. The male-to-female ratio of ARMs in our study population was approximately 1:1 (51% male), similar to previous reports [2, 15].

Two previous attempts have been made to estimate recurrence risks of ARM in relatives. An analysis of the Colorectal Center for Children of Cincinnati Children’s Hospital Medical Center database identified several instances of familial aggregation of ARMs, including ten sibling pairs (with no other affected relatives) and nine other multigenerational families with multiple affected individuals. From these data, the authors estimated a recurrence risk for siblings of 4% [1]. The estimated recurrence risk for siblings in the Utah population was much lower, at 0.3%. Another study of 1.9 million people with one or more relatives in the Danish Family Relations Database, among whom 764 had anal/rectal defects, found that individuals with a first-degree family history of anal/rectal defects had a much higher risk of a similar defect (RR = 10.3; 95% CI 2.6–41.1) [5], comparable to our first-degree RR estimate (15.6).

Potential limitations must be noted. First, reliance on ICD-9 diagnosis codes from medical encounters to identify ARM cases creates the possibility of bias due to the manner in which different clinicians assign codes to patients. A further limitation of the ICD coding scheme is its inability to provide granularity across the phenotypic spectrum of ARMs. The use of ICD-9 codes identified 255 ARM cases with genealogy data across the years 1994–2014, which is close to the number expected over this interval: ~30 new cases are seen annually in Utah, and typically only about half of all identified cases in a cohort link to the UPDB genealogy, giving ~300 expected linked cases. Thus, ascertainment of ARMs by ICD-9 codes appears to be at least moderately sensitive. Second, since the founding population is of Northern European extraction, results of this analysis may not generalize well to other populations. Third, the unavailability of electronic medical records before 1994 creates a “window of analysis” that is particularly apparent for rare phenotypes present at birth, such as ARMs; examples are the absence of affected second-degree relatives compared with first- or third-degree relatives, and the lack of documented children of ARM cases. Data censoring due to the window of analysis should not bias the RR estimates, because it affects the expected number of ARM cases as much as the observed number, but it may reduce statistical power. Fourth, the genealogy itself may contain censored data in the form of non-linked patient medical records, missing data, or genealogical relationships that do not correspond to biological relationships.
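The ascertainment check is back-of-the-envelope arithmetic. A sketch, assuming the 1994–2014 window is treated as roughly 20 years and the linkage fraction as exactly one half (both approximations of the figures given in the text); the function name is hypothetical, and the resulting ratio of observed to expected linked cases is only a rough indicator, not a formal sensitivity estimate:

```python
def expected_linked_cases(cases_per_year: float, years: float, link_fraction: float) -> float:
    """New cases over the window that would be expected to link to the genealogy."""
    return cases_per_year * years * link_fraction


expected = expected_linked_cases(30, 20, 0.5)  # ~30 cases/year, ~20 years, ~half link
print(expected, round(255 / expected, 2))  # prints 300.0 0.85
```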

This analysis provides strong evidence for the presence of heritable factors influencing some cases of ARMs. In addition, the unbiased risk estimates produced by the analysis may have clinical value in the form of recurrence risk estimates for prospective parents who are either themselves affected by this birth defect, or who have an affected child.