The evolutionary origin of jaw yaw in mammals

Theria comprises all but three living mammalian genera and is one of the most ecologically pervasive clades on Earth. Yet, the origin and early history of therians and their close relatives (i.e., cladotherians) remain surprisingly enigmatic. A critical biological function that can be compared among early mammal groups is mastication. Morphometrics and modeling analyses of the jaws of Mesozoic mammals indicate that cladotherians evolved musculoskeletal anatomies that increase mechanical advantage during jaw rotation around a dorsoventrally-oriented axis (i.e., yaw) while decreasing the mechanical advantage of jaw rotation around a mediolaterally-oriented axis (i.e., pitch). These changes parallel molar transformations in early cladotherians that indicate their chewing cycles included significant transverse movement, likely produced via yaw rotation. Thus, I hypothesize that cladotherian molar morphologies and musculoskeletal jaw anatomies evolved concurrently with increased yaw rotation of the jaw during chewing cycles. The increased transverse movement resulting from yaw rotation may have been a crucial evolutionary prerequisite for the functionally versatile tribosphenic molar morphology, which underlies the molars of all therians and is retained by many extant clades.

single-branch lineage that is not nested within any group of this study (Martin et al. 2015). Vincelestes is a stem cladotherian that is also recovered as a single-branch lineage that is not nested within any group of this study. In some phylogenetic analyses it is recovered outside of Zatheria (Theria + peramurans) (Krause et al. 2014, Martin et al. 2015), but in other analyses it is recovered within Zatheria (Rougier 2012, O'Meara and Thompson 2014).
An additional issue with Fruitafossor and Vincelestes is that they appear to have derived dental and jaw features that may be due to adaptations for specialist diets. Fruitafossor has tubular molars that are indicative of obligate insectivory (Luo and Wible 2005). The jaw of Fruitafossor is unique in possessing a distinct (yet diminutive) and inflected AP, and a small coronoid process that is at the approximate elevation of the raised jaw joint. This combination of dental and jaw features is not present in any mammal group of this study. In addition, Vincelestes demonstrates characters indicative of carnivory, such as very large canines and a short tooth row (Rougier 1993). Like Fruitafossor, it possesses jaw traits that are unlike those of any other group in this study. The coronoid process is extremely elevated relative to the molar row, which is common for many modern and extinct carnivorous mammals (including eutriconodontans such as Repenomamus). However, unlike most carnivores, it also has a relatively elevated jaw joint and, unlike eutriconodontans, it has a distinct AP. The tooth row relative to the length of the jaw is also shorter than in the crown mammals of this study, apart from several multituberculates. Thus, if Fruitafossor and Vincelestes were included with any group of this study they would be outliers in morphometric analyses, and they are unlikely to be ideal representatives of morphologies of any group.

PART B. EXTENDED METHODS
Specimens. Images of stem mammaliaform and early crown mammal jaws were collected from the primary literature. Sources, geologic ages, and specimen information are provided in Supplementary Table S1. If more than one jaw specimen is known for a genus, the best-preserved (or most complete) specimen was chosen to represent the genus. All specimens are from the latest Triassic through Early Cretaceous (i.e., ~210-100.5 Ma). This time period was chosen because it captures all phylogenetic nodes of interest for this study and includes a substantial sample of taxa. In addition, only two major groups of mammals (i.e., cimolodontan multituberculates and crown therians) remain diverse after the mid-Cretaceous (Grossnickle and Polly 2013), at least in Laurasian landmasses, and both groups experience an ecomorphological radiation in the Late Cretaceous in which evolutionary changes in jaw shape are likely associated with increased dietary diversity (Wilson et al. 2012, Grossnickle and Polly 2013, Grossnickle and Newham 2016). Thus, I believe that truncating the study at the mid-Cretaceous is important for examining the primitive morphologies of therians and multituberculates.

| Taxon | Image source | Specimen | Age |
|---|---|---|---|
| Stem Mammaliaforms | | | |
| Agilodocodon | Meng et al. 2015 | Reconstruction based on BMNH001138A and B | M. Jurassic |
| Arboroharamiya | Zheng et al. 2013 | Reconstruction based on STM33-9 | M. or L. Jur. |
| Castorocauda | Ji et al. 2006 | Reconstruction based on JZMP 04-117 | M. Jurassic |
| Docodon | Simpson 1929 | Reconstruction based primarily on YPM 11826. | L. Jurassic |
| Docofossor | Luo et al. 2015B | Composite reconstruction using micro-CT scans of BMNH131735A and B | L. Jurassic |
| Haramiyavia | Luo et al. 2015A | Composite reconstruction using micro-CT scans of MCZ7/95A and B. | L. Triassic |
| Kuehneotherium | Gill et al. 2014 | Composite reconstruction based on micro-CT scans of multiple specimens. | E. Jurassic |
| Morganucodon | Gill et al. 2014 | Composite reconstruction based on micro-CT scans of multiple specimens. | L. Triassic-E. Jurassic |
| Sinoconodon | Zhang et al. 1998 | | |
| | Hahn 1969 | Reconstruction based on V. J. 1-155 | L. Jurassic |
| Plagiaulax | Simpson 1929 | Composite reconstruction based on multiple specimens. | M. Jurassic |
| Rugosodon | Yuan et al. 2013 | Composite reconstruction based on BMNH1143A and B | L. Jurassic |
| Sinobaatar | Hu & Wang 2002 | IVPP V12517 | E. Cretaceous |
| Zofiabaatar | Bakker & Carpenter 1990 | | L. Jurassic |
| Eutriconodontans | | | |
| Amphilestes | Simpson 1928 | Composite reconstruction based on multiple specimens. | M. Jurassic |
| Argentoconodon | | Reconstruction based largely on MPEF-PV2363 | E. Jurassic |
| Gobiconodon | Jenkins & Schaff 1988 | MCZ 19965 | E. Cretaceous |
| Jeholodens | Ji et al. 1999 | Drawing of GMV 2139 | E. Cretaceous |
| Liaoconodon | Meng et al. 2011 | Reconstruction based on IVPP V16051 | E. Cretaceous |
| Phascolotherium | Simpson 1928 | Composite reconstruction based on multiple specimens. | M. Jurassic |
| Priacodon | Kielan-Jaworowska et al. 2004 | Composite reconstruction based on multiple specimens. | L. Jurassic |
| Repenomamus | Wang et al. 2001 | IVPP V12549 | E. Cretaceous |
| Triconodon | Simpson 1928 | Composite reconstruction based on multiple specimens. | E. Cretaceous |
| Trioracodon | Simpson 1928 | Composite reconstruction based on multiple specimens. | L. Jur.-E. Cret. |
| Volaticotherium | Meng et al. 2006 | Reconstruction | |

Jaw measurements. The linear jaw measurements and location of geometric morphometrics (GM) landmarks of this study are briefly explained in the main text. In addition, the region of landmarks for the angular process (AP) shape analysis is illustrated in Figure 3, and elevation measurements for the coronoid process and jaw joint (i.e., condylar process) are shown in Figure 4. However, additional details are provided here.
To measure jaw joint elevation (or depression), an extended line was first drawn from the alveolar margin (i.e., dorsal edge of the jaw body at the base of the molar row) posteriorly to the condylar process. (I chose to use this line rather than a line from the molar cusps because molars are not always preserved in fossil jaws. Further, molar cusps are often worn or include multiple occlusal surfaces at different elevations (e.g., tribosphenic lower molars include elevated shearing crests on the trigonid and wear surfaces on the depressed talonid basin), making it especially difficult to determine a horizontal line based on the molars.) From the initial line, a perpendicular line was drawn to the jaw articulation surface (i.e., jaw joint), or posterodorsal-most point of the condylar process. I used the midpoint of the articulation surface for genera with an extensive articulation surface (e.g., multituberculates). The length of this line was measured as the jaw joint elevation (or depression), and dividing this value by jaw length standardized the measurements.
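
The elevation measurement described above amounts to a signed point-to-line distance divided by jaw length. The following is a minimal Python sketch of that calculation, not the procedure actually used to take the measurements. The function name and the digitized points are hypothetical, and it assumes the jaw is drawn with anterior to the left and the y coordinate increasing dorsally.

```python
import math

def jaw_joint_elevation(alveolar_anterior, alveolar_posterior, joint, jaw_length):
    """Signed perpendicular distance of the jaw joint from the alveolar line,
    divided by jaw length.

    Positive values mean the joint lies above (dorsal to) the line, assuming
    anterior is to the left and y increases dorsally. Negative values mean
    the joint is depressed below the line.
    """
    (x1, y1), (x2, y2) = alveolar_anterior, alveolar_posterior
    dx, dy = x2 - x1, y2 - y1
    # 2D cross product of the line direction with the vector to the joint
    cross = dx * (joint[1] - y1) - dy * (joint[0] - x1)
    perpendicular = cross / math.hypot(dx, dy)
    return perpendicular / jaw_length

# Hypothetical digitized points in arbitrary image units (not specimen data).
elevated = jaw_joint_elevation((0.0, 0.0), (5.0, 0.0), (10.0, 1.26), 10.0)
depressed = jaw_joint_elevation((0.0, 0.0), (5.0, 0.0), (10.0, -0.5), 10.0)
tilted = jaw_joint_elevation((2.0, 0.0), (6.0, 0.4), (10.0, 1.8), 10.0)
```

Because the distance is taken perpendicular to the alveolar line, the result does not depend on how the jaw image happens to be rotated on the page.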
I measured the length of the tooth row as a means of testing whether the typical out-lever length (i.e., distance from the axis of rotation to the bite point) was likely to vary significantly among mammal groups. My assumption is that similar tooth row lengths relative to jaw length will result in similar bite point locations. Conversely, if one mammal group has significantly longer tooth rows in which molars are more posteriorly positioned, I would expect that the typical bite points during molar occlusion are also relatively posterior in position.
The tooth row length was measured from the base of the anterior-most incisor to the posterior edge of the ultimate molar, and this was then standardized by dividing by the length of the jaw. For a few genera, the anterior tips of jaws are not preserved, and in these cases the jaw and tooth row lengths were estimated. Results for this measurement are not shown in the main text but are provided in the Extended Results of the Supplementary Information.
Geometric morphometrics (GM). For the GM analysis of the AP region of the jaw (Fig. 3), one landmark was placed between the ultimate and penultimate molars. This landmark served to maintain the correct polarity for the outline during the Procrustes analysis (otherwise, the Procrustes analysis might flip outlines vertically to better align morphologies), and to capture variance associated with thickness of the jaw body and elevation/depression of the AP. The semilandmarks begin along the ventral edge of the jaw, at a spot that is perpendicular to the base of the molar row (i.e., the horizontal line at the alveolar margin that was also used in the linear morphometric measurements). See the discussion below for reasons why semilandmarks were used instead of sliding semilandmarks. The landmarks end at the ventral-most point of the articulation surface (or head of the condylar process).
Ideally, landmarks and semilandmarks of GM analyses should represent homologous points of a structure, but this is unlikely to be the case for many landmarks/semilandmarks in this study. For example, the number of molars varies among mammal groups, so the single landmark between the ultimate and penultimate molars is unlikely to be homologous among all taxa. In addition, the presence/absence of the AP varies among mammal groups, meaning that the semilandmarks cannot possibly capture homologous points if some taxa have structures that other taxa lack. However, I consider these landmarks to still fall within the 'Type III' landmark category of Bookstein (1997) because they are outlining the same region of the jaw in all taxa and likely capturing homologous muscle insertion locations for the SM and MP. Also, the goal of the GM analysis of the AP region is simply to quantify the shape of the posteroventral region of the jaw to examine the variation among early mammal groups. In particular, the results help demonstrate that the position of the AP in stem mammaliaforms (i.e., a relatively anterior position) is different than the position of the AP in cladotherians (i.e., a relatively posterior position) (Fig. 3). This can be demonstrated through qualitative descriptions of the jaws as well, but the GM analysis provides strong support for this observation with quantitative evidence.
The landmark and semilandmark coordinates were subjected to standard geometric morphometrics procedures, which include a Procrustes analysis and a principal components analysis (PCA). The Procrustes analysis realigns shapes (in this case, sets of x and y coordinates representing shapes) to eliminate variation associated with size (i.e. scaling), translation, and rotation. This results in shapes (represented by Procrustes values) that only vary in terms of shape differences. The Procrustes values are then ordinated using a PCA. Results for the first two principal components (i.e., PC1 and PC2) are shown in Figure 3 (and Supplementary Figure S4 of the Extended Results). In the Figure 3 plots, PC1 is the horizontal axis and accounts for 54.5% of variance, and PC2 is the vertical axis and accounts for 17.4% of variance. The mean shape for the APs was calculated for each mammalian group using all Procrustes values, and the position of these are marked on the PC1 and PC2 morphospace plots of Figure 3 as large, black points. The thin plate splines of the average shapes for groups are shown in the middle column of Figure 3. To determine the locations for the SM and MP muscle insertion in the jaw models (Fig. 6), the thin plate splines were overlaid atop one another (see below).
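
The Procrustes step described above can be illustrated with a short Python sketch. This is not the Polly Morphometrics code used in the study. It is a generic, rotation-only generalized Procrustes superimposition for 2D outlines, with each point stored as a complex number (x + iy). The function names and the example outline are hypothetical, and the subsequent PCA of the aligned coordinates is omitted.

```python
import cmath
import math

def center_and_scale(shape):
    """Move the centroid to the origin and scale to unit centroid size."""
    centroid = sum(shape) / len(shape)
    centered = [z - centroid for z in shape]
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered]

def rotate_onto(shape, reference):
    """Rotate a centered shape to its least-squares fit on the reference.

    The optimal rotation is the unit complex number with the same argument
    as sum(conj(shape_k) * reference_k). Reflections are never applied.
    """
    inner = sum(z.conjugate() * r for z, r in zip(shape, reference))
    rotation = inner / abs(inner)
    return [z * rotation for z in shape]

def generalized_procrustes(shapes, iterations=10):
    """Align all shapes to an iteratively updated mean shape."""
    aligned = [center_and_scale(s) for s in shapes]
    reference = aligned[0]
    for _ in range(iterations):
        aligned = [rotate_onto(s, reference) for s in aligned]
        mean = [sum(points) / len(aligned) for points in zip(*aligned)]
        reference = center_and_scale(mean)
    return aligned, reference

# Hypothetical outline and a scaled, rotated, translated copy of it.
outline = [0 + 0j, 2 + 0j, 3 + 1j, 2 + 2j, 0 + 1j]
copy = [z * cmath.rect(3.0, 0.7) + (5 - 2j) for z in outline]
aligned, mean_shape = generalized_procrustes([outline, copy])
residual = max(abs(a - b) for a, b in zip(*aligned))
```

In this example the two inputs differ only in size, position, and orientation, so `residual` should be zero up to floating-point error, which is the sense in which the aligned coordinates "only vary in terms of shape differences".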
The Procrustes analysis, PCA, calculation of group means, and production of thin plate splines (Fig. 3) were performed using Wolfram's Mathematica and Polly Morphometrics for Mathematica (2016). See the User's Guide for Polly Morphometrics for Mathematica (available at http://mypage.iu.edu/~pdpolly/) and citations within for additional information.
Semilandmarks versus sliding semilandmarks. It is common for geometric morphometric studies that use equally spaced semilandmarks to adjust the semilandmarks by 'sliding' them along tangents of the shape outline to minimize 'bending energy' or Procrustes distances (e.g., Gunz et al. 2005, Grossnickle and Newham 2016), helping to increase the likelihood of capturing homologous points along an outline. However, I did not use sliding semilandmarks in this study for a couple of reasons. First, the PCA results for analyses using sliding semilandmarks and non-sliding semilandmarks are very similar. Supplementary Figure S1 shows PC1 and PC2 results from my original analysis (left) and results using sliding semilandmarks (right) produced using the Geomorph package (Adams and Otárola-Castillo 2013) for R 3.2.4 (R Core Team 2016). Dryolestoids (red) and therians (orange) are highlighted. Note that dryolestoids are outliers along PC1 (horizontal axis) in both analyses, which is a key result for my conclusions. The morphospace positions of multituberculates and stem mammaliaforms are most affected by using sliding semilandmarks, but those two groups are of less interest in this study.

Supplementary Figure S1. Principal component (PC) analysis results for PC1 (horizontal axis) and PC2 (vertical axis) from the original analysis with non-sliding semilandmarks (left; Fig. 3) and a repeated analysis using sliding semilandmarks (right). Dryolestoids are red and therians (and close kin) are orange. Note the similar morphospace positions for the polygons between the two analyses. See Figure 3 for additional details.
Second, sliding the semilandmarks disrupts the original shapes by moving the semilandmarks along tangents of the outline. In the analysis of this study, semilandmarks on the posterior tip of prominent angular processes are 'slid' away from the tip to minimize Procrustes distance (the default method in Geomorph). Gunz et al. (2005) note a similar problem with sliding semilandmarks (Figure 8a in Gunz et al. 2005) and suggest that this error can be corrected by adding an additional landmark at the tip of the extended region. However, this correction is not possible in my analysis because of a lack of clear homologous points due to many taxa (multituberculates, spalacotherioids, and eutriconodontans) not having an angular process. Thus, the resulting thin plate splines using sliding semilandmarks do not capture the shape of the angular process as well as non-sliding semilandmarks. I demonstrate this in Supplementary Figure S2 with thin plate splines for the mean shape of dryolestoids from the original analysis (left) and from the analysis using sliding semilandmarks (right). This issue is especially problematic because the average shapes produced with sliding semilandmark analyses are less informative (and may not be accurate due to the sliding) for inferring the coordinates used in the jaw models. For example, it is difficult to know from the sliding semilandmark spline how posteriorly extended the angular process is, and this posterior point is important for estimating the muscle insertion locations for the medial pterygoid and superficial masseter. Also, the average angular process shape would simply not be illustrated as clearly in Figure 3 for the reader if sliding semilandmarks were used (Supplementary Figure S2).

Supplementary Figure S2. Thin plate splines for the mean shape of dryolestoid angular processes from the original analysis with non-sliding semilandmarks (left; Fig. 3) and from a repeated analysis using sliding semilandmarks (right). See Figure 3 for additional details.
Norman MacLeod explains these two problems (and an additional issue) with sliding semilandmarks in more detail and provides further examples: http://cdn.palass.org/palaeomath_101/moribund/downloads/Landmarks_Semilandmarks.pdf. The 'relaxation step' described by Gunz et al. (2005) may help alleviate the second issue by returning the semilandmark from the tangent line to the original outline, but I do not think this would help analyses of this study because the similar problem that they address (Figure 8a from Gunz et al. 2005) still occurred even with the relaxation step.
Two-dimensional analyses. One potential concern with the morphometrics analyses of this study is that the methods only capture two dimensions of the jaw. This concern is highlighted by the fact that marsupials (i.e., metatherians) tend to possess a prominent, medially inflected AP that cannot be well represented by two-dimensional (2D) landmarks. However, 2D analyses of jaw function allow for a much greater sample size and are commonplace in the literature. Therefore, it is expected that broad evolutionary patterns and functional considerations can be obtained from the 2D jaw morphometric analyses performed in this study. Further, only one jaw of a metatherian, Sinodelphys, is preserved from the time period of this study, and it does not possess an inflected AP (Luo et al. 2003).
Jaw models - measurements. To examine functional changes in therians and their close ancestors, 3D jaw models were constructed for eutriconodontans, spalacotherioids, dryolestoids, and early therians (and close kin) (Figs. 5-6). These were produced using Wolfram's Mathematica. See Figure 2 for the x, y, and z coordinate system used for the models.
In the jaw models, the base of the tooth row (i.e., dorsal edge of the jaw body, or alveolar margin) is set as a horizontal line at y = 0. The x-axis length from the mandibular symphysis to the jaw joint is 10 distance units (d.u.), and this is kept constant for all jaw models. For each mammal group, the average (median) jaw joint elevation above (or depression below) the base of the molar row (Fig. 4) is used to assign the vertical (i.e., y-axis) jaw joint locations (Fig. 6, Supplementary Fig. S3). For instance, the average jaw joint elevation for dryolestoids is 12.6% of the jaw length. Thus, the y-axis value for the dryolestoid jaw joint in the model is 1.26 d.u. Similarly, the average (median) coronoid process elevations (Fig. 4) were used for each group and represent the approximate muscle insertion locations for the TM. (It is recognized that the TM includes an extended attachment along the dorsal edge of the coronoid process, and the y-axis location for the center of the force vector could vary among groups. However, examining these variables is beyond the scope of this study.) See the Extended Results (Supplementary Table S3) for additional median values of jaw joint elevation and coronoid process elevation that were used in the models.
To determine the x and y coordinates for the SM and MP muscle insertion locations, thin plate splines from the GM analysis of AP shape (Fig. 3) were overlaid atop one another with the single landmark (which is between the ultimate and penultimate molars) aligned. The splines were horizontally stretched so that the length between the single landmark and the posterior-most semilandmark is equal for all splines. For the dryolestoid and therian groups, the SM and MP insertion location is assigned to the posterior-most point along the edge of the AP. For spalacotherioids and eutriconodontans, which do not have a distinct AP, considering the location of the posterior portion of the masseteric fossa assisted in determining the central location of the insertion site.
It is worth noting that although the AP is relatively gracile in many early cladotherians, this does not necessarily indicate that only a small amount of muscle attaches to the AP. For instance, the MP and SM of modern mammals tend to produce a large, muscular sling that wraps around the AP and the posteroventral region of the jaw (Turnbull 1970). Even taxa such as shrews that have an elongate and very thin AP maintain a considerable amount of muscle mass in the AP region.
To determine jaw dimensions (i.e., length and posterior width) to use for the models, direct measurements of specimens could not be used for early mammal genera for a couple of reasons. First, preserved fossils of intact lower jaws with both hemimandibles are extremely rare. Further, it is unlikely that those that are preserved maintain a reliable angle of attachment at the mandibular symphysis since many fossils are flattened or distorted. Second, even if both hemimandibles are preserved intact, the angle at the symphysis remains unreliable because early mammals are believed to have had unfused symphyses, which allows for 'wishboning' of the jaw and changes to the angle at the symphysis.
Thus, jaw length and width for the models were determined using the average dimensions of six extinct mammaliaforms (including three crown mammals) and 25 extant mammal genera (Supplementary Table S2). The extant mammals chosen for measurements are primarily small insectivores or omnivores that are appropriate analogs of Mesozoic mammals. In addition, they are taxonomically diverse, representing several orders of eutherians and metatherians. Measurements of extant mammals were taken at the Field Museum of Natural History (FMNH), and measurements of extinct mammals were obtained from images in the published literature (sources given in Supplementary Table S2). Since many mammalian species have unfused mandibular symphyses, museum preparators often glue hemimandibles together, meaning that the angle at the symphysis (and width of the jaw) is not reliable for many specimens. Thus, jaw width measurements were taken as the distance between the centers of the articular surfaces of the glenoid fossae of the skull. These measurements were then divided by the jaw length. Results indicate that the average jaw width at the jaw joint (or glenoid fossae) is approximately 60% of jaw length. Hence, the jaw width for the models was set at 6 d.u. and the length was set at 10 d.u. These values were kept constant for eutriconodontans, spalacotherioids, dryolestoids, and early therians (and close kin).

Supplementary Table S2. Measurements of mammaliaform and modern mammal jaws and skulls. Average jaw measurements are used for determining the length and posterior width of the jaw models (Fig. 6, Supplementary Fig. S3). Average skull measurements are used to determine the estimated position of the muscle origins of the superficial masseter (SM) and medial pterygoid (MP) (see text). Posterior jaw widths at the jaw joints are often estimated from the glenoid fossae of the skull (see text). 'Pterygoid to pterygoid width' and 'zygoma to zygoma width' values are divided by the total jaw width at the jaw joints. Length measurements (i.e., 'anterior tip of jaw to pterygoid' and 'anterior tip of jaw to zygoma') are divided by the total jaw length. Thus, table values represent ratios of the measurements to the total width or length of the jaw. Values in bold and italics are averages (means) for each group, and overall means, medians, and standard deviations are given at the bottom. Field Museum of Natural History (FMNH) specimen numbers are given in parentheses after the genus name.

As with jaw dimensions, x and z coordinates for the SM and MP muscle origins on the skull could not be obtained by direct measurement of specimens, largely because 3D skull fossils of early mammals are extremely rare. Thus, the same specimens of modern and extinct mammals that were used for jaw dimension calculations were also used to determine approximate locations of the SM and MP origins (Supplementary Table S2). The SM origin was measured as the approximate location of the anterior zygoma, and the MP origin was measured as the center of the pterygoid process of the sphenoid bone. The distance between the anterior right zygoma and left zygoma was measured and divided by the width of the jaw. On average, the width between anterior zygomae is 96.24% of the jaw width (6 d.u.) (Supplementary Table S2), and therefore the width at the SM origins was set at 5.8 d.u., or 96.7% of the jaw width. Similarly, the distance from pterygoid process to pterygoid process is 24.0% of the width of the jaw, and therefore the width at the MP origins was set at 1.44 d.u., or 24.0% of the jaw width.
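
The conversion from measured ratios to model dimensions is simple arithmetic, and the sketch below reproduces it in Python as a check. The ratios are those quoted in the text. The function name, the rounding precision, and the assumption that the midline lies at z = 0 (so each origin sits at half the corresponding width) are illustrative choices rather than details stated by the study.

```python
JAW_LENGTH = 10.0  # d.u., fixed for all models in the text

def model_dimension(ratio, reference, ndigits):
    """Scale a measured ratio by a reference dimension and round it."""
    return round(ratio * reference, ndigits)

# Jaw width at the joints: about 60% of jaw length.
jaw_width = model_dimension(0.60, JAW_LENGTH, 1)

# Width between anterior zygomae: 96.24% of jaw width, rounded to 0.1 d.u.
sm_origin_width = model_dimension(0.9624, jaw_width, 1)

# Width between pterygoid processes: 24.0% of jaw width.
mp_origin_width = model_dimension(0.240, jaw_width, 2)

# Assumption: the midline is at z = 0, so each origin is at half the width.
sm_origin_z = sm_origin_width / 2
mp_origin_z = mp_origin_width / 2
```

Rounding 5.774 d.u. up to 5.8 d.u. explains why the modeled SM origin width corresponds to 96.7% of the jaw width rather than the measured 96.24%.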
Because of the curvature of the TM around the braincase, the origin of the TM was not estimated from linear measurements of skulls. In the models, the TM force vector is directed posteriorly and slightly dorsally as it is in many extant mammals (Turnbull 1970). The TM is treated as a single muscle rather than being split into the anterior temporalis and posterior temporalis. However, merging the anterior and posterior regions of the temporalis as a single muscle vector is common in the literature (e.g. Turnbull 1970, Davis et al. 2010, Law et al. 2016). Further, the anterior and posterior portions of the TM contract concurrently in Didelphis (Crompton and Hylander 1986), and Didelphis is viewed as having the primitive condition of Triplet muscle groups that was likely present in the earliest therians (Weijs 1994, Williams et al. 2011). Thus, separating the TM into different portions for the modeling analyses would be unlikely to have a significant effect on results.
Based on the various measurements and considerations discussed here, 3D coordinates were produced for muscle origins, muscle insertions, and jaw joints. Muscle insertion and jaw joint coordinates (x, y, z) are provided in Supplementary Figure S3, which includes the same jaw models as in Figure 6 except that individual models are shown for each mammal group. The coordinate system is based on that shown in Figure 2. The mandibular symphysis and muscle origin coordinates are kept constant in all models. The muscle origin coordinates used in the models are not given in Supplementary Figure S3, but are provided here: SM (4.35, 1.1, 2.9), MP (1.75, 0.75, 0.72), and TM (-1.5, 2.9, -2.7). The working-side SM, working-side MP, and balancing-side TM are used in the model because these represent a Triplet muscle group that contracts synchronously during the power stroke of a chewing cycle (Weijs 1994). As previously discussed, the TM muscle vector is truncated due to the curvature of the braincase. Thus, the TM origin is not expected to reflect the true location of the muscle origin.

Supplementary Figure S3. Jaw models for early mammal groups with coordinates (x, y, z) for the jaw joints and muscle insertions. The SM and MP force vectors end at the muscle origins, the coordinates for which are provided in the text. Force vector lengths do not represent the relative strengths of the vectors. The x, y, and z coordinate system is based on that shown in Figure 2.

Muscle forces. Assigned muscle forces (F in Equation 1 of the main text) are based on relative muscle masses reported in Turnbull (1970), and these values are provided in the Methods and Figure 6 key. Turnbull expresses these as percentages of the overall mass of jaw musculature. Analyses are run with relative muscle masses for three mammalian species to help capture the range of relative muscle forces that might be found in early mammals. Thus, even if muscle masses are not perfect proxies for muscle forces, by including a variety of force values the analyses capture a range of potential results for the mammal groups. It is worth noting that results remain relatively similar when using the different force assignments (Fig. 6), suggesting that potential variation in muscle forces among mammal groups is unlikely to alter the broad trends seen in this study.
Rather than relative muscle masses, physiological cross-sectional area (PCSA) is often used in analyses of jaw mechanics to estimate muscle forces. PCSA is calculated using an equation that includes muscle mass, muscle density, fiber length, and fiber pennation angle (see Davis et al. 2010 and citations within). Unfortunately, this equation cannot be implemented in this study because information such as fiber lengths (for extinct taxa and most modern analogs) is unknown. However, a muscle density value near one and a pennation angle of zero degrees are often used in the PCSA equation for all jaw muscles (e.g. Davis et al. 2010, Law et al. 2016). Thus, these factors should have little effect on calculations of PCSA, and muscle masses are expected to be an appropriate proxy for PCSA in this study. Alternatively, PCSA can be estimated based on the area of the infratemporal fossa. However, Davis et al. (2010) demonstrate that PCSAs based on infratemporal fossa area severely overestimate the contribution of the medial pterygoid and superficial masseter. Thus, estimates from muscle masses seem more appropriate.
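
To make the proxy argument concrete, the sketch below uses the commonly cited form of the PCSA equation, PCSA = (mass x cos(pennation angle)) / (density x fiber length). It is an illustration only: the muscle masses are hypothetical, the density of 1.06 g/cm^3 is an illustrative value close to one, and the equal fiber lengths are an added assumption, since fiber lengths are unknown for the taxa in this study.

```python
import math

def pcsa(mass_g, fiber_length_cm, density_g_cm3=1.06, pennation_deg=0.0):
    """Physiological cross-sectional area (cm^2) from the standard equation."""
    cosine = math.cos(math.radians(pennation_deg))
    return mass_g * cosine / (density_g_cm3 * fiber_length_cm)

def as_fractions(values):
    """Express each value as a fraction of the total."""
    total = sum(values)
    return [v / total for v in values]

# Hypothetical muscle masses (g), not values from Turnbull (1970).
masses = [5.0, 3.0, 2.0]

# Assumption: every muscle has the same fiber length (1 cm here).
areas = [pcsa(m, fiber_length_cm=1.0) for m in masses]

relative_mass = as_fractions(masses)
relative_pcsa = as_fractions(areas)
```

With a shared density, zero pennation, and equal fiber lengths, the relative PCSA values equal the relative masses. Any real difference in fiber length among muscles would break that equality, which is the main caveat of using mass as a proxy.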
Jaw models - moment calculations. The general method for calculating moment values for the musculoskeletal jaw configurations of early mammal groups is provided in the Methods section of the main text. This includes an explanation for why a single Triplet muscle group (as defined by Weijs 1994) is used for calculations (Figs. 2 and 6). However, some additional information is discussed here.
Moment calculations based on the jaw models of this study extend beyond conventional 2D jaw mechanics analyses by incorporating the third dimension (i.e., z-axis of Figure 2). Including the additional dimension is considerably beneficial by allowing for analyses of yaw rotation and roll rotation (Figs. 2 and 6). These analyses would not be possible if only 2D models based on lateral jaw images were used. However, it is worth noting that the z-axis values are estimated based on average measurements of many extant mammals and are kept constant among mammal groups. Thus, future studies should attempt to examine how variations in width of the jaw and muscle insertion/origin locations affect results (although see the Extended Results section for an analysis of variation in origin location), especially since these are likely to vary among early mammal groups.
Moment arm lengths are critical components of moment calculations. Moment arms are the perpendicular distances from individual force components to the axis of rotation. These are not visualized in the models of Figure 6 and Supplementary Figure S3. However, Figure 5 shows an example of moment arms (black lines, labeled Lx and Ly) in 2D for the eutriconodontan SM for pitch rotation around an axis through the jaw joints. The x- and y-axis components for the SM force vector are also shown (gray arrows, labeled Fx and Fy), although they are scaled to the length of the muscle and not the size of the force vector. Moment arm lengths were calculated for all directional components of all muscle vectors. These are multiplied by the force components and then summed to calculate moment (i.e. torque) values (see Methods). See equations in the Methods section of the main text for further information on how these moment arm lengths were applied to calculate moment.
Mammals typically have unilateral mastication, meaning they have chewing cycles that include molars of a single hemimandible occluding with upper molars. Thus, only the working-side hemimandible needs to be considered when examining the power stroke of the chewing cycle in which molars are in occlusion. This is especially relevant for calculations of roll rotation, since only one hemimandible is used for this analysis (Fig. 6b).
Calculations of moment values can be simplified because not all force vector components contribute to rotation around a given axis. For instance, if the axis of rotation is mediolaterally oriented through both jaw joints (for pitch), the axis is parallel to the z-axis. In this case, the x- and y-axis force components contribute to the moment while the z-axis component does not. Hence, for calculations involving pitch (Fig. 6a), z-axis force components can be ignored. Similarly, y-axis force components do not contribute to yaw (Fig. 6c). Although the axis of rotation for roll is oblique for each group, the axis is rotated prior to moment calculations so that it falls along the x-axis (see below). Thus, the x-axis force component does not contribute to roll after rotation of the axis.
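The simplification described above can be illustrated with a minimal Python sketch. It uses the standard vector formulation (the moment about an axis is the component of r × F along that axis), which is equivalent to summing force components multiplied by their perpendicular moment arms; it is not the workflow used for the jaw models of this study. The coordinates and force values are hypothetical and chosen only for illustration.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def moment_about_axis(axis_point, axis_dir, force_point, force):
    """Scalar moment of `force` (applied at `force_point`) about the axis
    that passes through `axis_point` with unit direction `axis_dir`."""
    r = tuple(p - q for p, q in zip(force_point, axis_point))
    return dot(cross(r, force), axis_dir)


# Hypothetical values (arbitrary units), not taken from the jaw models.
joint = (0.0, 0.0, 0.0)        # point on the axis of rotation
insertion = (1.0, -0.5, 0.2)   # muscle insertion point
force = (-0.8, 0.5, 0.3)       # muscle force vector (Fx, Fy, Fz)

Z_AXIS = (0.0, 0.0, 1.0)       # pitch axis is parallel to the z-axis
Y_AXIS = (0.0, 1.0, 0.0)       # yaw axis is parallel to the y-axis

# Pitch: dropping the z-axis force component leaves the moment unchanged.
pitch_full = moment_about_axis(joint, Z_AXIS, insertion, force)
pitch_no_z = moment_about_axis(joint, Z_AXIS, insertion, (force[0], force[1], 0.0))

# Yaw: dropping the y-axis force component leaves the moment unchanged.
yaw_full = moment_about_axis(joint, Y_AXIS, insertion, force)
yaw_no_y = moment_about_axis(joint, Y_AXIS, insertion, (force[0], 0.0, force[2]))
```

In this sketch, `pitch_full` equals `pitch_no_z` and `yaw_full` equals `yaw_no_y`, which is the reason the component parallel to the axis can be ignored.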
Moment calculations for roll rotation present a unique challenge in that the axes of rotation are oblique. The axis for roll is defined here as the line from the mandibular symphysis to the jaw joint, and this axis varies among mammal groups due to differences in jaw joint elevation (Fig. 6, Supplementary Table S3). For pitch and yaw, the axes of rotation are parallel to the z-axis and y-axis, respectively, and calculations can be simplified by only considering the two relevant force vector components, as described above. However, due to the oblique axis for roll, all three force components (i.e., x-, y-, and z-axis components) must be considered for roll analyses. To avoid the additional computation necessary for this analysis, I simply rotated the hemimandible model (and all associated points such as muscle origins) so that the axis of rotation was parallel to the x-axis. This included two rotations, which were performed using the RotationTransform function in Wolfram's Mathematica. First, the model was rotated horizontally so that all points on the jaw were in the sagittal plane and had a z-axis value of zero. Second, the model was rotated vertically so that the axis of rotation was parallel to the x-axis. This second rotation was repeated individually for the models of each mammal group, since each group possessed a different axis of rotation that depends on the elevation of the jaw joint. After the rotations, moments for roll rotation were calculated in the same manner as for pitch and yaw.
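The two-step rotation can be sketched in plain Python as a stand-in for the Mathematica workflow. This is an illustration under stated assumptions, not the code used in this study: the function names and coordinates are hypothetical, the symphysis is translated to the origin first, the horizontal rotation is taken about the y-axis, and the vertical rotation is taken about the z-axis.

```python
import math


def rotate_about_y(p, theta):
    """Rotate point p = (x, y, z) about the y-axis by angle theta (radians)."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + z * s, y, -x * s + z * c)


def rotate_about_z(p, phi):
    """Rotate point p = (x, y, z) about the z-axis by angle phi (radians)."""
    x, y, z = p
    c, s = math.cos(phi), math.sin(phi)
    return (x * c - y * s, x * s + y * c, z)


def align_roll_axis_with_x(points, symphysis, joint):
    """Rotate all points so the symphysis-to-joint axis lies along the x-axis.

    Step 1 (horizontal): rotate about y until the axis has z = 0.
    Step 2 (vertical): rotate about z until the axis has y = 0.
    """
    shifted = [tuple(a - b for a, b in zip(p, symphysis)) for p in points]
    axis = tuple(a - b for a, b in zip(joint, symphysis))
    theta = math.atan2(axis[2], axis[0])
    axis_step1 = rotate_about_y(axis, theta)
    phi = -math.atan2(axis_step1[1], axis_step1[0])
    return [rotate_about_z(rotate_about_y(p, theta), phi) for p in shifted]


# Hypothetical coordinates (arbitrary units), not taken from the jaw models.
symphysis = (0.0, 0.0, 0.0)
joint = (4.0, 1.0, 1.5)          # elevated and laterally offset jaw joint
muscle_origin = (3.0, 2.0, 2.0)  # an associated point rotated with the model

aligned = align_roll_axis_with_x([joint, muscle_origin], symphysis, joint)
aligned_joint, aligned_origin = aligned
```

After alignment, `aligned_joint` has y and z values of zero (to floating-point precision), so roll moments can be computed from the y- and z-axis force components alone. Because the vertical rotation depends on jaw joint elevation, it would be recomputed for each group's model.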
Mechanical advantage. A commonly calculated performance metric for jaw mechanics is mechanical advantage. Mechanical advantage of a system is calculated as the in-lever (i.e., moment arm) length divided by the out-lever length. This can then be multiplied by the force vector components and summed for multiple muscles, as done for calculations of moment. For the jaw models in this study, the out-lever length would be the distance from the axis of rotation to the bite point. However, this measurement is disregarded here for several reasons: the molars are missing or worn in many taxa; the elevation of the bite point could vary for molars that have wear facets of different elevations (e.g., tribosphenic molars have elevated trigonid cusps and a depressed talonid basin); and the relative length of the tooth row and position of the ultimate molar from the axes of rotation appear to be fairly consistent among mammal groups (see Extended Results, Supplementary Table S3), indicating that the out-lever distance would not change significantly among mammal groups. If the bite point is kept constant among mammal groups, the out-lever distance would remain the same for all calculations of yaw because the axes of rotation do not vary among groups (Fig. 6c). (Thus, 'moment' and 'mechanical advantage' are used interchangeably in the main text of this study.) However, the out-lever length will change in the calculations for pitch and roll since the axes of rotation are altered among mammal groups, although the effect of this change is expected to be minor. Therefore, future studies could explore the effects of this variable to a greater extent, especially for well-preserved taxa in which common bite points can be accurately identified. Results may also vary when mechanical advantage is calculated for other teeth besides the molars, such as the canines and incisors.
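The reasoning for treating 'moment' and 'mechanical advantage' interchangeably when the out-lever is constant can be shown with a short Python sketch. The group names and lever lengths are hypothetical placeholders, not measurements from this study.

```python
def mechanical_advantage(in_lever, out_lever):
    """Return in-lever (moment arm) length divided by out-lever length."""
    if out_lever <= 0:
        raise ValueError("out-lever length must be positive")
    return in_lever / out_lever


# Hypothetical in-lever lengths (arbitrary units) for two groups.
in_levers = {"group_a": 0.20, "group_b": 0.30}

# Assumption: the bite point and axis of rotation are the same for both
# groups, so the out-lever length is shared.
shared_out_lever = 0.60

advantages = {group: mechanical_advantage(length, shared_out_lever)
              for group, length in in_levers.items()}

# With a shared out-lever, between-group ratios are identical for the
# in-lever lengths and for the mechanical advantages.
ratio_in_levers = in_levers["group_b"] / in_levers["group_a"]
ratio_advantages = advantages["group_b"] / advantages["group_a"]
```

Because the shared out-lever cancels in the ratio, relative differences among groups are the same whether in-lever-based moments or mechanical advantages are compared. This no longer holds exactly once the out-lever differs among groups, as noted above for pitch and roll.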

PART C. EXTENDED RESULTS AND DISCUSSION
Angular process (AP). Results of the AP geometric morphometrics (GM) analysis are shown in Figure 3. Results include three morphospace plots of the first two principal component analysis (PCA) axes, with PC1 along the horizontal axis and PC2 along the vertical axis. The three plots are identical but are replicated to highlight results for different mammal groups, which are designated by polygons. These plots are reproduced in Supplementary Figure S4 with labels for individual genera. As with Figure 3, the images are replicates of a single PCA plot for PC1 and PC2, but different mammal groups are highlighted by polygons in the replicates. These analyses were performed with semilandmark outlines of AP jaw regions, but see the Extended Methods and Supplementary Figures S1 and S2 for consideration of results when using sliding semilandmarks. A considerable shift in average jaw morphology occurs early in the crown mammalian tree. Compared to stem mammaliaforms (which have an anteriorly positioned AP), the most notable morphological change is the lack of an AP in three of the early crown mammal groups: multituberculates, eutriconodontans, and spalacotherioids (Fig. 3 and Supplementary Fig. S4). These are the 'blue' lineages in Figures 1, 3-6. However, it is worth noting that this lack of an AP does not necessarily indicate reduced medial pterygoid and superficial masseter muscles, since these taxa often possess deep fossae allowing for considerable muscle attachment.
At the cladotherian node, a second major shift in average jaw morphology occurs: cladotherians evolve a prominent, posterior AP. Cladotherians include dryolestoids and early therians, which are the 'red' lineages in Figures 1, 3-6. In dryolestoids (i.e., the earliest branching cladotherians), the distal end of the AP is located directly ventral to the jaw joint, whereas the APs in stem mammaliaforms are located anterior to the jaw joint. Thus, dryolestoids do not overlap in PCA morphospace with earlier lineages that also possess an AP (Fig. 3 and Supplementary Fig. S4). Early therians also possess prominent APs, but they are not as posteriorly positioned as the APs of dryolestoids. As noted in the main text, the results of the AP analysis support previous suggestions that the APs of non-mammalian cynodonts (which have been referred to as "pseudangular" processes) are not homologous to the APs of therians (e.g., Jenkins et al. 1983). This is in contrast to recent studies that have argued that the APs of non-mammalian cynodonts and mammals are homologous, based in part on the observation that the APs serve as insertion sites for the same jaw muscles in all groups (Abdala and Damiani 2004, Rougier et al. 2015). This debate over the homology of the AP seems to focus in part on whether the non-mammalian cynodont AP is homologous to the AP of the earliest crown mammals, australosphenidans. Here, the results support homology of the AP between these two groups, especially since they partially overlap in the GM morphospace (Fig. 3). This is congruent with the conclusions of Rougier et al. (2015). However, results of this study also conflict with conclusions of Rougier et al. (2015) by suggesting that the APs of australosphenidans and cladotherians are not homologous. Additional research and considerations may be needed to help resolve this issue.
Jaw joint elevation - implications. Eutriconodontans possess depressed jaw joints (Fig. 4, Supplementary Table S3). In addition, they lack an AP and have very little curvature of the posterior-ventral region of the jaw (Fig. 3). These traits are comparable to modern mammalian carnivores, which tend to possess a very reduced AP and a depressed condylar process. One eutriconodontan, Repenomamus, is a confirmed carnivore based on the presence of embryonic dinosaur bones found in the stomach region (Hu et al. 2005B). These results suggest that the jaw features of eutriconodontans evolved as adaptations for a carnivorous diet. Spalacotherioids tend to be smaller than eutriconodontans and therefore more likely to have been insectivores. Their jaw joints are less depressed than those of eutriconodontans and the posteroventral region of the lower jaw often forms a "bulge," although a true AP is not present (Figs. 3, 4). Thus, they appear to be morphological intermediates between eutriconodontans and cladotherians.
Supplementary Table S3. Jaw joint elevation, coronoid process elevation, and tooth row length of jaw specimens of this study (see Supplementary Table S1). The values are ratios, calculated by dividing the measurements by the length of the jaw. Median values for groups are in bold and italics. Jaw joint and coronoid process elevation results are shown in Figure 4. In the notes column, "molar landmark" refers to the single landmark between the penultimate and ultimate molars (Fig. 3), and "AP shape analysis" refers to the geometric morphometrics analysis of the angular process (AP) region of the jaw (Fig. 3).
Jaws with joints at the same elevation as the molar row may be better adapted for slicing (analogous to scissors), and jaws with an elevated jaw joint may be better adapted for grinding by allowing multiple molars to occlude simultaneously (analogous to tongue-and-groove pliers) (Maynard Smith and Savage 1959, Greaves 1974).
This concept may be relevant to early cladotherians due to the appearance of the talonid shelf of the molars. The very elevated jaw joints in dryolestoids (Fig. 4) may permit the paracones of multiple upper molars to occlude simultaneously with the talonid shelves of multiple lower molars as the jaw moves mediodorsally. However, if this was the sole reason for the elevated jaw joint then it could be expected that early therians, which possess grinding surfaces on talonid basins, would have jaw articulations with the same elevation, and this is not the case (Fig. 4). The jaw joint elevation in early therians is not as great as dryolestoids. Thus, additional considerations must account for the elevated jaw articulations of dryolestoids.
It is possible that the elevated jaw joint in dryolestoids is associated with the posterior position of the AP in this group (Fig. 3). Elevating the jaw joint provides space for the AP in the posteroventral region of the lower jaw, and it helps maintain the lengths of moment arms (i.e., in-levers) between the jaw joint (i.e., fulcrum) and force vectors of the SM and MP during pitch rotation.
Coronoid process. In this study, morphological changes to the coronoid process are emphasized to a lesser extent than changes to the AP. This is largely due to the observation that the major morphological change in early cladotherian jaws is the appearance of a prominent, posterior AP (Fig. 3). Hence, changes to the force vectors of the MP and SM, which insert on the AP, are a major focus of this study. However, changes to the coronoid process (the insertion point for the TM) are considered by measuring the elevations of the coronoid process above the molar row (see Methods and Supplementary Extended Methods). The general conclusion is that the coronoid elevation does not vary considerably among groups, especially relative to changes in AP shape (Fig. 3) and jaw joint elevation (Fig. 4). Thus, the TM force vectors that are shown in Figure 6 (i.e., purple arrows) are kept constant for the four mammal groups for which models were produced. However, the slight differences in coronoid elevation among groups (Supplementary Table S3, Supplementary Fig. S3) are incorporated into the actual calculations of moment values. Average (median) coronoid elevations relative to jaw length for the early mammal groups are: eutriconodontans, 22.6%; spalacotherioids, 21.5%; dryolestoids, 25.2%; and therians (and close kin), 21.9% (Supplementary Table S3). The most notable result is that dryolestoids have coronoid processes with greater elevations than those of the other mammal groups. This could be related to the elevated jaw joint in this group, since a concurrent elevation of the coronoid process (and TM force vector) would help maintain the length of the moment arm for pitch rotation.
Further analyses of the coronoid process shape (such as a GM analysis that is similar to the one performed on AP shape) were not performed here for several reasons. First, the coronoid process in early mammal fossils is not as commonly preserved as the condylar or angular processes, meaning the sample is smaller for this jaw process (e.g., the australosphenidan group only includes coronoid elevation results for two genera). Second, because of the poor preservation of this process, the shape of the coronoid is often reconstructed with less confidence in publications (i.e., it is often drawn with a dashed line outline, indicating that the shape is uncertain). Third, there is considerable within-group variation in coronoid process shape. This makes it more difficult to discern broad evolutionary trends and blurs the differences between early mammal groups.
Tooth row length. In the jaw model analyses, the locations of the bite points are not incorporated. The importance of this variable becomes apparent when considering morphological and functional differences between pre-mammalian synapsids and mammaliaforms (Crompton and Hylander 1986). Pre-mammalian synapsids such as cynodonts tend to have shorter tooth rows (relative to total jaw length) than mammaliaforms. Thus, the typical bite points along the tooth row for cynodonts are expected to be more anteriorly located than the bite points of mammaliaforms. Variation in the bite point position is likely to alter the mechanical advantages and bite forces of different groups. For instance, calculation of mechanical advantage includes dividing the in-lever (i.e., moment arm) length by the out-lever length. The out-lever in the jaw models is the distance between the bite point and the axis of rotation, and therefore this value will vary among taxa with different bite point locations.
To help address whether bite point locations are expected to vary considerably among early mammal groups, I measured the relative tooth row lengths for the genera of this study (Supplementary Table S3). My assumption is that taxa with similar tooth row lengths will have similar bite point locations, on average, especially for the molars. Results are similar among the mammal groups for which I produced jaw models (Supplementary Table S3). Average (median) tooth row lengths relative to jaw length: eutriconodontans, 63.4%; spalacotherioids, 63.9%; dryolestoids, 63.4%; and therians (and close kin), 60.7% (Supplementary Table S3). Based on these results, the lengths between axes of rotation and the bite points (i.e., out-lever lengths) are not expected to vary considerably among mammal groups and are not examined in more detail.
Moment results. As the posterior processes of the lower jaw underwent evolutionary changes among early mammal groups, performance metrics (e.g., torque and mechanical advantage) for the musculoskeletal jaw configurations would have been altered for various movements. The 3D jaw models (Fig. 6, Supplementary Fig. S3) help assess differences in moment (i.e., torque or moment of force) for three types of jaw rotation (i.e., pitch, roll, and yaw). Results for these analyses are shown in Figure 6 and Supplementary Table S4. See the Results and Discussion section of the main text for discussion of the moment values for the mammal groups.
Supplementary Table S4. Moment values associated with the jaw model results in Figure 6. Results are given for the two yaw axes of rotation shown in Figure 6c. Results were calculated using multiple force assignments for the superficial masseter (SM), medial pterygoid (MP) and temporalis (TM) muscles (see text). These are based on the relative muscle weights reported by Turnbull (1970) for Didelphis virginiana (D. v.) (TM, 0.57; SM, 0.14; and MP, 0.07), Echinosorex gymnurus (E. g.) (TM, 0.61; SM, 0.11; and MP, 0.09), and Canis familiaris (C. f.) (TM, 0.67; SM, 0.10; and MP, 0.03).
Deep masseter. The deep masseter muscle is not included in functional analyses primarily because it is not considered a member of the Triplet muscle groups, as defined by Weijs (1994). Also, it does not insert on the jaw processes that are a focus of the morphometric analyses of this study, making it more difficult to examine evolutionary changes to the muscle vector. Further, estimation of deep masseter position in 3D is especially difficult without a distinct process for the muscle insertion (in contrast to the other muscles of this study) and preservation of the zygomatic arches for muscle origins.
Because the deep masseter is not one of the Triplet muscles, it is not expected to reach maximum contraction concurrently with the Triplet muscles. Instead of including the deep masseter with Triplet muscle groups, Weijs (1994) described the working-side deep masseter and balancing-side deep masseter as 'Vertical Closers.' However, Crompton et al. (2011) included the working-side deep masseter in their 'Group 1' muscles and balancing-side deep masseter in their 'Group 2' muscles for Didelphis, although they also note that the deep masseter is involved in the fast close and fast open stages of the chewing cycle. They depict Group 1 and Group 2 muscles contracting concurrently, similarly to the Triplet muscles of Weijs (1994). Thus, if the deep masseter was included in analyses, the balancing-side deep masseter would be included with the Triplet II muscles that are modeled in Figure 6.
Unlike the SM and MP, which have a large x-axis (horizontal) component (Figs. 5 and 6), the deep masseter is directed dorsolaterally (i.e., it has significant z- and y-axis components). The dorsally-directed component of the force vector is parallel to the axes of rotation during yaw, meaning that it does not affect yaw rotation. Similarly, the laterally-directed component is parallel to the axis of rotation during pitch, meaning that it does not affect pitch rotation. Thus, incorporation of the deep masseter into moment calculations for yaw and pitch may have a minimal effect on results for these types of rotation. In contrast, the deep masseter may have a considerable effect on results for roll rotation (Fig. 6b). It connects the masseteric fossa of the jaw to the zygomatic arch, and the force vector likely has a relatively large moment arm for the axis of rotation for roll, especially if the zygomatic arch extends laterally from the skull. However, the deep masseter does not insert on the AP and is not expected to experience as great a change in vector as that of the MP and SM among mammal groups. Therefore, including it in analyses is likely to alter results for all mammal groups in a similar manner, which is not expected to change the broad trends seen here.
As noted in the main text, some mammals such as carnivorans may produce transverse molar movements via mediolateral translation (along the z-axis of this study) rather than via yaw rotation. The deep masseter may be especially involved in this movement since it has a large lateral (i.e., z-axis) component. This is supported by results in Davis (2014), which show evidence of a late-contracting deep masseter causing significant mediolateral translation in the kinkajou.
Gape. The moment results from modeling analyses (Fig. 6) suggest that morphological jaw changes in early cladotherians (i.e., dryolestoids and early therians) may be adaptations associated with greater yaw rotation during mastication. However, it is possible that additional performance metrics played a role in the morphological changes in cladotherian jaws. For instance, musculoskeletal configurations of the jaw can have a considerable impact on gape, both in terms of the amount of gape allowed and the force produced by adductor muscles during varying degrees of gape. In general, the more a muscle is stretched (i.e., put in tension), the greater the reduction in force output (Herring and Herring 1974). Therefore, muscles that stretch to a greater degree during jaw opening will produce less force during a wide gape. In the jaw models of this study (Fig. 6, Supplementary Fig. S3), the SM and MP are longer in dryolestoids and therians than in eutriconodontans and spalacotherioids due to the posteriorly extended AP. Thus, it is expected that the jaw muscles in early cladotherians would experience relatively less stretch during jaw opening. This assumption suggests that increased gape and greater muscle forces during wide gape may be possible in early cladotherians, and that this may have been an additional factor in the evolution of an extended AP.
However, additional considerations indicate that gape is unlikely to have played a role in the evolution of an extended AP in early cladotherians. First, stem cladotherians were likely insectivores, based on their relatively small body sizes (especially compared to eutriconodontans), dentitions that are analogous to zalambdodont dentitions of modern "insectivorans" such as tenrecs, and jaw morphologies (Grossnickle and Polly 2013). Consuming insects is unlikely to require increased gape or strong forces during wide gape. Second, lengths and directions of muscle fibers affect the force-tension curve of muscles and the amount of gape possible, and these variables are unknown for fossil taxa. Thus, additional analyses are necessary to further examine the effect of potential muscle fiber lengths and directions on gape in early mammals. Third, the AP of dryolestoids (and some insectivorans such as shrews) tends to be long and thin (Fig. 3), with the long axis of the AP pointed anteriorly in the direction of the inferred superficial masseter origin (near the anterior zygoma) during jaw closure (e.g., Turnbull 1970). Thus, the muscle force vectors are assumed to be largely parallel to the long axis of the AP (and mostly horizontal) when the teeth are in occlusion. However, during wide gape the posterior tip of the AP will rotate dorsally behind the jaw joint, bringing the muscle vectors of the SM and MP closer to the jaw joint and significantly reducing their moment arm lengths for pitch rotation (around an axis through the jaw joints). This suggests much weaker bite forces for pitch during wide gape. Also, as the AP rotates during wide gape its long axis is no longer pointed toward the muscle origins (instead it will point ventromedially). If the SM and MP place a strong tensile force on this gracile AP while it is not aligned with (or parallel to) the direction of force, it is likely to put considerable strain on the AP.
Gape is more likely to have been a factor in the evolution of the eutriconodontan jaw morphology rather than the cladotherian jaw. For instance, modern taxa like carnivorans that require a wide gape for large food items tend to have a small AP that is positioned close to the jaw joint, and the jaw joint tends to be depressed (Herring and Herring 1974, Grossnickle and Polly 2013). Similarly, eutriconodontans have a depressed jaw joint, and the MP and SM insert close to the jaw joint. Eutriconodontans are some of the largest Mesozoic mammals and at least one genus, Repenomamus, has been shown to be carnivorous based on fossilized gut contents (Hu et al. 2005B). This convergent musculoskeletal configuration of carnivorans and eutriconodontans likely results in minimal muscle stretch during wide gape and/or maintains moment arm lengths for the muscles during pitch rotation (unlike taxa with a posteriorly extended AP), allowing a strong bite force to be maintained even during wide gape.
Jaws, middle ears, and macroevolution. The evolutionary transition that resulted in the origin of the definitive mammalian middle ear (DMME; i.e., a single dentary-squamosal jaw joint and middle ear elements that are detached from the jaw) is somewhat paradoxical. That is, bones expected to receive considerable compression and tension at the jaw joint during mastication (i.e., the quadrate and articular bones) transitioned to delicate sound-transmitting ossicles (i.e., the incus and malleus, respectively) in the middle ear. It has been suggested that in taxa with bones serving dual roles as jaw and ear components, musculoskeletal configurations must have minimized forces at the jaw joint while preserving strong bite forces (e.g., Crompton and Hylander 1986). Although a diversity of jaw morphologies satisfies these requirements (Reed et al. 2016), it is expected that some morphologies and functions would not be plausible. For example, an elevated jaw articulation relative to the molar row can result in greater reaction forces at the jaw joint (especially when muscle forces are anteriorly directed), meaning that a substantially elevated jaw joint is unlikely in taxa with dual functioning ear and jaw components (Reed et al. 2016). Thus, it has been hypothesized that constraints on musculoskeletal configurations of the jaw diminished when ear components were relieved of jaw joint functions (i.e., the dentary-squamosal joint became the sole jaw joint), resulting in a diversification of morphologies in early mammal groups (Crompton and Parker 1978). This hypothesis is supported by the observed morphological and taxonomic diversifications in Jurassic mammaliaforms (Fig. 1; Luo 2007, Grossnickle and Polly 2013, Close et al. 2015), in addition to the variety of jaw morphologies documented in this study.
Although pre-mammaliaform cynodonts are not included in this study, it is worth considering their jaw and ear morphologies when examining macroevolutionary patterns of stem mammaliaforms and early crown mammals. The cynodont AP tends to be anteriorly positioned relative to that of crown mammals (e.g., Crompton and Hylander 1986). This is similar to stem mammaliaforms, which also have an anteriorly positioned AP. However, unlike stem mammaliaforms, the AP is ventrally deeper (due to a ventrally extended AP and/or a deeper mandibular body) on average. An issue with an anteriorly positioned AP is that the attached adductor muscles may impede gape, since muscle stretch will be greater (if all other variables are kept constant) when the muscles are further from the axis of rotation. By extending the angular process ventrally, cynodonts lengthen the jaw adductor muscles and likely lessen the forces lost to stretched muscles during gape. In addition, cynodonts appear to have a shortened tooth row relative to the length of the jaw (e.g., Crompton and Hylander 1986), which means the typical bite point (of molars) is expected to be more anteriorly positioned relative to that of mammaliaforms. The combination of the anterior AP and anterior bite point may have helped lessen the reactionary forces at the jaw joint during pitch rotation (Crompton and Hylander 1986, Reed et al. 2016).
In stem mammaliaforms in which the ear elements no longer perform a load-bearing function at the jaw joint, evolutionary constraints on the musculoskeletal configurations of the jaw may have been reduced. As noted above, this may have allowed for jaw joints that were significantly elevated. Further, it may have permitted jaw configurations with greater reactionary forces on the jaw joint because the delicate ear elements were no longer attached. Thus, the bite point and AP could move posteriorly, likely raising reactionary forces but benefitting the taxa in additional ways (e.g., allowing for a longer tooth row). See Crompton and Hylander (1986) and Reed et al. (2016) for additional considerations.
Early crown mammals such as eutriconodontans and spalacotherioids possess a bony connection between the middle ear and jaw via an ossified Meckel's cartilage (Fig. 1; Wang et al. 2001, Meng et al. 2003, Luo et al. 2007A, Ji et al. 2009, Luo 2011), but this connection does not appear to be present in early cladotherians. It is worth noting that mandibles of early cladotherians often possess a Meckel's groove (e.g., Davis 2012, Close et al. 2016), indicating that a cartilaginous ear-jaw connection may be maintained in adults. However, the lack of fossil evidence for a strong ear-jaw attachment suggests that any connection that remained was not rigid. Thus, the possible loss of a strong attachment between the jaw and middle ear in cladotherians may have permitted further diversification of jaw morphologies and functions, although this is speculative due to the limited fossil evidence.
Chewing cycles of stem mammaliaforms and early mammals undoubtedly involved significant pitch rotation to produce orthal occlusion. Since middle ears remain attached to the jaws of many of these groups (including spalacotherioids and eutriconodontans), it appears that the presence of attached middle ears did not hinder pitch rotation. However, this may not be the case for yaw rotation. During yaw rotation, the condylar processes must protract and retract relative to the glenoid fossa (Fig. 2). Rather than simply rotating the middle ear elements, as occurs during pitch, protraction during yaw might create tension in the ear elements and retraction during yaw might put stress on the ear elements. Thus, I posit that significant yaw rotation is unlikely in jaws that maintain a solid attachment to the middle ear. However, additional studies and fossil discoveries are necessary to further examine this hypothesis.
Cladotherian evolution. Mammaliaforms experienced an evolutionary radiation in the Jurassic that was marked by the appearance and diversification of numerous clades (Fig. 1; Luo 2007, Grossnickle and Polly 2013, Close et al. 2015). Most of these groups persisted for tens of millions of years and achieved considerable ecomorphological diversity (e.g., Luo 2007, Grossnickle and Polly 2013, Meng et al. 2015, Chen and Wilson 2015). However, many mammaliaform and early crown mammalian clades went extinct or were greatly diminished during the Cretaceous Terrestrial Revolution (KTR) at ~125-80 million years ago (Ma) and the K-Pg mass extinction event at 66 Ma, periods of considerable environmental perturbation and ecological change (e.g., Alvarez et al. 1980, Labandeira et al. 2002, Lloyd et al. 2008, Bond and Scott 2010, Tobin et al. 2012, Grossnickle and Polly 2013). Cladotherians survived the KTR and K-Pg extinction event and, led by therians, diversified after both events (Alroy 1996, Benson et al. 2013, Grossnickle and Polly 2013, Wilson 2014, Grossnickle and Newham 2016). Cladotheria includes "eupantotherians" (composed primarily of Dryolestida, Amphitheriida, and "peramurans"), which were abundant in the Late Jurassic and regionally diverse in South America in the Cretaceous, and therians, which began a diversification event in the Late Cretaceous (Grossnickle and Newham 2016) and now comprise all but five species of living mammals. Thus, cladotherians have been globally diverse for over 150 million years, and examining their early history is critical to understanding the origins of modern mammalian diversity.
The masticatory changes in early cladotherians that are described in this study may have resulted in greater occlusal precision, more efficient food processing, and greater dietary diversity. For instance, the appearance of a crushing function in the tribosphenic molars of therians may have allowed for improved mastication of plant matter. Thus, this consideration offers a possible explanation for the differential survival of cladotherians during periods of ecological perturbations and elevated extinction rates, such as the KTR and K-Pg extinction event. This conclusion is supported by evidence suggesting that early mammals with generalist diets were less prone to extinction than dietary specialists (Simpson 1944, Smits 2015, Grossnickle and Newham 2016). However, since diet is not directly tested in this study, further examination is needed to test this hypothesis.
Australosphenidans. Australosphenidans are not the focus of this study, due in part to a scarcity of fossils, questions concerning their phylogenetic affinities (see the Mammal Groups section of the Extended Methods), and the dominance of modern mammalian faunas by therians. However, morphological results for this group do lend some support to the hypothesized link between molar morphology, jaw morphology, and jaw yaw. Some australosphenidans possess jaw morphologies that are similar to those of therians (Figs. 3 and 4, Supplementary Table S3). Further, australosphenidans are believed to have convergently evolved tribosphenic molar morphologies and detachment of the middle ear from the jaw (Ramírez-Chaves et al. 2016). This suggests convergent adaptations in these clades, and it supports the predicted functional link between jaw, molar, and ear morphologies.
It is worth noting that the only additional extant mammalian group to survive the KTR and K-Pg extinction events is Australosphenida (including monotremes), providing support to the hypothesis that the evolutionary changes to the molars, jaws, ears, and chewing cycles may have assisted in mammalian survival during these events.
Docodonts. Docodonts are a diverse group of mammaliaforms that are included within the 'stem mammaliaforms' group of this study. They are an additional group (besides therians and australosphenidans) with tribosphenic-like molars that suggest mediolateral movement during occlusion (e.g., Pfretzschner et al. 2005). They often have a distinct AP (e.g., Rougier et al. 2015), although it is not as posteriorly positioned as that of early therians.
In some respects, the results for docodonts conflict with the conclusions of this study. For instance, the molar morphologies suggest mediolateral movement during occlusion (Pfretzschner et al. 2005) like that seen in early cladotherians, but the jaw joints are not as elevated and the AP is not as posteriorly positioned as those of early cladotherians. Thus, the musculoskeletal configurations in docodonts are not expected to be as ideal for yaw, at least in comparison to early cladotherians. However, two considerations may help explain these conflicting results. First, docodont jaws maintain attached middle ear elements, which may constrain the jaw morphologies that can evolve. For instance, the middle ear elements may prohibit posterior migration of the angular process, and if taxa need to maintain small resultant forces at the jaw joint region because of attached ear elements, then a depressed jaw joint may be necessary (Crompton and Hylander 1986). Thus, if docodonts are experiencing yaw rotation during occlusion, ideal musculoskeletal jaw configurations (like those in early cladotherians) may simply not be possible due to additional factors. Second, there is a possibility that the transverse molar movements in docodonts are produced by mediolateral translation (along the z-axis) rather than by yaw. For instance, the molar morphologies of docodonts are complex in shape and may direct molar movement during occlusion, meaning that jaw muscle control of yaw may not be necessary. Prominent X and Y cusps (of the medial portion of the upper molars) appear to truncate medial movement of lower molars during occlusion (Pfretzschner et al. 2005). Also, the medial region of the upper molars (analogous to the trigon of tribosphenic molars) is often directed posteromedially (e.g., Luo and Martin 2007). This is in contrast to early cladotherians and tribosphenic taxa, in which the trigon is often directed medially or anteromedially.
This suggests that a relatively unique type of occlusal movement may be occurring in docodonts, such as posteromedial movement of lower molars (Gingerich 1973). For instance, yaw rotation may be occurring around a vertical axis of rotation that is at or near the working side jaw joint (rather than the balancing side jaw joint as modeled in Figure 6), and yaw around an axis in this position was not tested in this study. Thus, the docodont musculoskeletal jaw configuration may be more ideal for yaw around an axis in this position, although this possibility will need to be explored in future studies.
Muscle origins - sensitivity test. The focus of the jaw model analyses is on evolutionary changes to muscle insertion locations and jaw joint elevations (Figs. 2-6). However, an additional variable that is expected to have a considerable effect on muscle vectors and moments is the location of muscle origins. Muscle origin locations are based primarily on measurements of skulls of modern analogs and well-preserved mammaliaform skulls (see Extended Methods and Supplementary Table S2), and these locations are kept constant in the jaw models of Figure 6 and Supplementary Figure S3.
It is possible that concurrent evolutionary changes to muscle insertion and muscle origin locations would negate the changes in moment values that are reported here. For instance, the SM muscle vector angles in Figure 5 are shown as being closer to horizontal in dryolestoids (in comparison to eutriconodontans and spalacotherioids) due to the posterior extension of the AP. However, if the SM origin evolved posteriorly with the concurrent evolution of a more posterior muscle insertion, this would negate the inferred change in muscle vector angle (Figs. 5 and 6). This could alter the moment value trends reported in Figure 6.
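The geometric point above can be illustrated with a minimal sketch. It assumes a two-dimensional sagittal-plane simplification (x anterior, y dorsal) and uses hypothetical coordinates in arbitrary units; the function name and all numbers are illustrative and are not taken from the jaw models of this study.

```python
import math

def vector_angle_deg(origin, insertion):
    """Angle (degrees) between the insertion-to-origin line and the horizontal."""
    dx = origin[0] - insertion[0]
    dy = origin[1] - insertion[1]
    return math.degrees(math.atan2(abs(dy), abs(dx)))

# Hypothetical coordinates (x anterior, y dorsal), not measured values.
base = vector_angle_deg(origin=(3.0, 2.0), insertion=(0.0, 0.0))

# Insertion moves posteriorly by 1 unit while the origin stays fixed:
# the muscle vector becomes closer to horizontal.
insertion_only = vector_angle_deg(origin=(3.0, 2.0), insertion=(-1.0, 0.0))

# Origin moves posteriorly by the same amount as the insertion:
# the original vector angle is restored.
both_shifted = vector_angle_deg(origin=(2.0, 2.0), insertion=(-1.0, 0.0))

print(base, insertion_only, both_shifted)
```

In this simplified case, an equal posterior shift of origin and insertion leaves the vector angle unchanged, which is the scenario the sensitivity test is designed to address.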
To examine the effect of potential variation in muscle origin locations on moment calculations, a sensitivity test was performed. This involved altering muscle origin locations for the three muscles of this study (i.e., the working-side SM, working-side MP, and balancing-side TM) and repeating moment value calculations. The SM and MP muscle origins were altered along the x-axis and y-axis, and the TM was altered along the y-axis and z-axis (Supplementary Fig. S5). For the SM and MP, muscle origins were shifted two standard deviations in both directions away from the original origin location (Fig. 6, Supplementary Fig. S3). The standard deviations are those reported in Supplementary Table S2 and are therefore based on variation seen in modern analogs and fossil skulls. Since the locations were moved two standard deviations, approximately 95% of the variation seen in these taxa is captured in the analyses. For the TM, muscle origin location was not based on measurements of skulls (see Extended Methods), so the muscle vector was moved approximately the same amount in each direction as was done for the SM (Supplementary Fig. S5). Muscle force assignments in all calculations are based on Didelphis muscle masses (see Methods and Extended Methods). Moment calculations were repeated using the same methodology as described previously (see the Methods and Extended Methods). However, roll rotation was excluded because results for roll remained low in all previous calculations (Fig. 6b) and are not expected to change significantly with different muscle origin locations, especially since the TM is not included in roll calculations (see main text). Results for the muscle origin sensitivity test are provided in Supplementary Figure S5, with model images for spalacotherioids shown with muscle vectors based on the varying muscle origins (i.e., multiple blue arrows from the same muscle insertion location).
Supplementary Figure S5. Sensitivity test in which muscle origin locations have been altered to examine variation associated with potential evolutionary changes to these locations. The model images (left) are spalacotherioids, although moment analyses (right) were performed for the four mammal groups: eutriconodontans (blue, 1), spalacotherioids (cyan, 2), dryolestoids (red, 3), and therians and close kin (orange, 4). For the superficial masseter and medial pterygoid, origin locations were moved two standard deviations in both directions from the original muscle origin, and these new vectors are denoted with additional arrows in the model images. (The standard deviations are based on results of measurements in Supplementary Table S2.) Only pitch and yaw were examined (see text), and the muscle force assignments in all analyses are based on muscle masses of Didelphis. As in analyses shown in Figure 6, there are two sets of results for yaw that are based on two potential locations for the axis of rotation. The greater values always correspond to the vertical axis that is just medial to the balancing side jaw joint. See Figure 6 for locations of these axes and additional information.
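The logic of the sensitivity test described above can be sketched as follows. This is a simplified illustration rather than the procedure used to generate Supplementary Figure S5: it assumes each muscle is a single force vector acting at the insertion and pointing toward the origin, it computes the moment as the component of r x F along the rotation axis, and it assumes normally distributed variation when relating two standard deviations to approximately 95% coverage. All function names, coordinates, and values are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)

def moment_about_axis(insertion, origin, force, axis_point, axis_dir):
    """Scalar moment about an axis of a force applied at the insertion
    and directed toward the origin (assumed single-vector muscle)."""
    f_vec = tuple(force * c for c in _unit(_sub(origin, insertion)))
    r = _sub(insertion, axis_point)  # lever from a point on the axis
    return _dot(_unit(axis_dir), _cross(r, f_vec))

def origin_sensitivity(insertion, origin, sd, coord, force,
                       axis_point, axis_dir, n_sd=2.0):
    """Moments with the origin shifted by -n_sd*sd, 0, and +n_sd*sd
    along one coordinate (coord = 0, 1, or 2)."""
    out = {}
    for k in (-n_sd, 0.0, n_sd):
        shifted = list(origin)
        shifted[coord] += k * sd
        out[k] = moment_about_axis(insertion, tuple(shifted), force,
                                   axis_point, axis_dir)
    return out

def normal_coverage(n_sd):
    """Fraction of a normal distribution within +/- n_sd standard deviations."""
    return math.erf(n_sd / math.sqrt(2.0))

# Hypothetical numbers only (arbitrary units), not values from this study.
yaw = origin_sensitivity(insertion=(-1.0, 0.0, 1.0), origin=(1.0, 1.5, 0.5),
                         sd=0.2, coord=0, force=1.0,
                         axis_point=(0.0, 0.0, -1.0), axis_dir=(0.0, 1.0, 0.0))
print(yaw)
print(normal_coverage(2.0))
```

Comparing the three returned moments for each muscle and each axis shows how strongly a result depends on the assumed origin location. The same comparison can be repeated for each mammal group to check whether the between-group pattern is preserved.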
In the original moment calculations, early cladotherians (i.e., dryolestoids and therians) show relatively small moments for pitch and relatively large moments for yaw (Fig. 6). This pattern remains in all analyses of the sensitivity test in which muscle origin locations are altered (Supplementary Fig. S5). In addition, for some calculations (e.g., mediolateral variation of the MP) the results for pitch and yaw remain nearly unchanged when the origin location is altered. Results for pitch rotation are especially consistent for all calculations of the sensitivity test (Supplementary Fig. S5). This suggests that the pattern shown for pitch in the original analysis (Fig. 6) is unlikely to have been altered by evolutionary changes to the muscle origin locations among mammal groups. This is likely due to pitch results being based largely on jaw joint elevation and coronoid process elevation, since the distance between these points roughly represents the moment arm length for the large TM during pitch. Thus, changes to the angle of the TM force vector are unlikely to alter pitch results significantly unless the TM vector changes considerably more than the amount that is tested here. This provides additional evidence for the conclusion that musculoskeletal jaw configurations of early cladotherians were less ideal for pitch rotation than those of eutriconodontans and spalacotherioids.
Compared to pitch, results for yaw show greater variation when muscle origin locations are altered (Supplementary Fig. S5). However, this variation is still minimal for many of the calculations. The greatest variation is for anteroposterior changes to the MP origin and mediolateral changes to the TM origin. Thus, moment results for yaw in Figure 6 should be considered with some caution, although it is worth reiterating that the general trends for the four mammal groups are not altered when the muscle origins are varied. Considerable evolutionary change in muscle origin would have had to occur among early mammal groups to disrupt the original results pattern from Figure 6.
The sensitivity analysis tests the variation in muscle origin locations of modern taxa, but it cannot be ruled out that early mammal groups had muscle origin locations that are outside of the range of variation seen in the modern analogs (and fossil mammals) that were measured for this study (Supplementary Table S2). However, there is little or no evidence to suggest that this is the case for these taxa, especially since the skull material of early mammals in this study (e.g., Yanoconodon, Maotherium, Juramaia, Vincelestes, Eomaia, Sinodelphys) does not indicate significant morphological divergence beyond the variation seen in modern analogs.