The structure of first-cousin marriages in Brazil

This paper deals with the frequency and structure of first-cousin marriages, by far the most important and frequent type of consanguineous mating in human populations. Based on the analysis of large amounts of data from the world literature and from large, recently collected Brazilian samples, we suggest some explanations for the asymmetry of sexes among the parental sibs of first-cousin marriages. We also suggest a simple way to correct the method that uses population surnames to assess the different Wright fixation indexes $F_{IS}$, $F_{ST}$, and $F_{IT}$, taking into account not only alternative modes of surname transmission but also the asymmetries that are almost always observed in the distribution of sexes among the parental sibs of first cousins.


Scientific Reports | (2020) 10:15573 | https://doi.org/10.1038/s41598-020-72366-z | www.nature.com/scientificreports/
to a selection process much stronger than the corresponding one for autosomes, because they are in the hemizygous state in males, and therefore occur at a much lower frequency than their autosomal counterparts [5]. Crow and Mange [6] introduced a method that uses the frequencies of population surnames and of marriages between persons with the same surname to estimate the different fixation indexes $F_{IT}$, $F_{IS}$, and $F_{ST}$. The original method can be applied only to literate populations with a fixed manner of surname transmission and with no asymmetry in types A, B, C, and D. The method was revisited by several authors, including Cabello and Krieger [7], who made room in it for the asymmetry of patrilineal subtype D.
The first (and main) objective of the present paper is to explain the frequency asymmetry of types A, B, C, and D, a topic not yet resolved in a completely satisfactory manner in the literature.
The second objective of this work is to adapt the isonomy method to take into account not only distortions in the frequencies of types A, B, C, and D but also the variable mode of surname transmission in poorly acculturated populations without fixed rules.

Results
Asymmetry of first-cousin subtypes A, B, C, and D. To obtain a global view of the distribution of subtypes A, B, C, and D, we drastically reduced the number of population samples shown in Tables IS to IVS of the supplementary material by pooling all regional groups shown to be homogeneous by standard chi-squared heterogeneity tests (results detailed in supplementary Table VS). The samples from four countries (Israel, Japan, Jordan, and Spain) resisted this process of coalescence. The 18 groups from Japan could be coalesced into only two subpopulations, rural (14 groups) and urban (four groups). In the case of Spain, two groups (B-C) out of the four could be considered as one population. With this procedure, the total of 126 individual samples listed in Tables IS to IVS (supplementary material) was reduced to just 28, as shown in Table 1, which summarizes the descriptive analysis of the coalesced samples.
[Table 1]
The influence of ruralization, urbanization, and other factors, such as different mobility rates among males and females, on the frequency of first-cousin marriage subtypes A, B, C, and D has been discussed in the literature [1,8], but without results that can be generalized, probably because the concepts of what is rural or urban differ among populations in time, geography, and culture. For example, a community considered rural in present-day Belgium probably corresponds to urban life in a fairly developed large city of a poorly developed and poorly industrialized country, whereas an area categorized as rural in such a country today surely corresponds to a rural area of a fully developed and industrialized country many decades ago. Given this reasoning, pooling rural or urban areas from geopolitically distinct regions makes no sense. The Japanese reports (as well as other studies carried out in Japan by American geneticists), on the other hand, contain an astonishing amount of reliable information on the rural or urban character of the samples, although their data were generally used with the main aim of comparing the viability, morbidity, and mortality of the offspring of subtypes A, B, C, and D. The heterogeneity test on 14 samples collected in rural areas (Hirado A, Hoshino, Ina, Kurogi, Kyushu, Mishima, Nansei, Nanto, Okayama, Onodani, Oshima, Shizuoka A, Shizuoka B, Shizuoka C) showed that they could be coalesced, and the same was true of four urban localities (Fukuoka, Hirado B, Hiroshima A, Hiroshima B) (Table VS). Testing the hypothesis that the distribution of subtypes A, B, C, and D is random (expected frequency of each class = 1/4) in the coalesced urban and rural samples (four and 14 subsamples, respectively), we found that the observed frequencies differ significantly from those expected by chance (p-value < 0.0001).
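The test against equal frequencies described above can be sketched as follows. This is only an illustration: the counts are hypothetical (the Japanese counts are in the supplementary tables and are not reproduced here), and the function names are ours. The p-value uses the closed-form upper tail of the chi-squared distribution with 3 degrees of freedom, which is the case that applies to four classes.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def equal_frequency_test(counts):
    """Goodness-of-fit test of four subtype counts against 1/4 for each class."""
    if len(counts) != 4:
        raise ValueError("expected counts for subtypes A, B, C, and D")
    expected = sum(counts) / 4.0
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts)
    return chi2, chi2_sf_df3(chi2)

# Hypothetical counts for subtypes A, B, C, D (NOT data from the paper).
hypothetical_counts = [40, 20, 20, 20]
chi2, p_value = equal_frequency_test(hypothetical_counts)
print(f"chi2 = {chi2:.3f}, p = {p_value:.5f}")
```

With real data, the four counts of a coalesced sample would replace `hypothetical_counts`.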
By comparing both coalesced samples through a chi-squared test on a contingency table, a highly significant chi-squared value (p-value < 0.0001) was obtained; the analysis of the adjusted residuals (Pearson/Haberman test) showed a significant excess of subtype A and a significant deficit of the other subtypes (B, C, and D) in the urban communities, and the opposite in the rural ones.
If the observed population frequency of couples with the same surname is $P$, the approximate value of $F_{IT}$ can be obtained from $F_{IT} = P/4$. If $p_i$ and $q_i$ are the proportions of men and women with a given surname, the expected population frequency of couples with this surname will be, under the hypothesis of random mating, $p_i q_i$, and its contribution to the inbreeding coefficient will be $p_i q_i/4$; the contribution of all surnames to the random inbreeding coefficient will be $F_{ST} = \sum p_i q_i/4$. From $F_{IT} = F_{ST} + F_{IS} - F_{IS} F_{ST}$ we obtain the estimate of the non-random inbreeding coefficient, $F_{IS} = (F_{IT} - F_{ST})/(1 - F_{ST}) = (P - \sum p_i q_i)/(4 - \sum p_i q_i)$, from the analysis of surname population frequencies. Under the hypothesis that surnames occur with the same frequencies among males and females ($p_i = q_i$), it follows that $p_i q_i = p_i^2$, and the above formula simplifies to $F_{IS} = (P - \sum p_i^2)/(4 - \sum p_i^2)$. The authors of the method called attention to the problems of its application in poorly acculturated populations, without fixed rules of surname transmission.
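The isonomy formulas above translate directly into a few lines of code. The sketch below is a minimal illustration with hypothetical surname proportions (they are not data from Brejo dos Santos), and the function name is ours. It also checks that the two expressions given for the non-random component agree.

```python
def isonomy_fixation_indexes(isonymous_freq, male_freqs, female_freqs):
    """Return (F_IT, F_ST, F_IS) from surname data, using the uncorrected factor 1/4.

    isonymous_freq: P, the observed frequency of couples with the same surname.
    male_freqs, female_freqs: dicts mapping surname -> proportion among men / women.
    """
    # Expected frequency of same-surname couples under random mating: sum of p_i * q_i.
    random_isonymy = sum(p * female_freqs.get(name, 0.0)
                         for name, p in male_freqs.items())
    f_it = isonymous_freq / 4.0
    f_st = random_isonymy / 4.0
    f_is = (f_it - f_st) / (1.0 - f_st)
    return f_it, f_st, f_is

# Hypothetical proportions for three surnames (illustration only).
men = {"Silva": 0.5, "Santos": 0.3, "Lima": 0.2}
women = {"Silva": 0.4, "Santos": 0.4, "Lima": 0.2}
P = 0.40  # hypothetical observed frequency of isonymous couples

f_it, f_st, f_is = isonomy_fixation_indexes(P, men, women)
sum_pq = sum(men[name] * women[name] for name in men)
closed_form = (P - sum_pq) / (4.0 - sum_pq)  # second expression for F_IS in the text
print(f"F_IT = {f_it:.4f}, F_ST = {f_st:.4f}, F_IS = {f_is:.4f}")
```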
In the lines below we adapt the method to the population of the rural municipality of Brejo dos Santos, in the state of Paraíba in the Brazilian NE region, taking into account (1) the non-random distribution of sexes among the pairs of parental sibs of the consanguineous couples; and (2) the irregular mode of surname transmission in the population, which can take place (a) strictly through the father; (b) strictly through the mother; (c) in an uncertain mode (when both parents have the same surname, which is transmitted to their offspring); or (d) with the offspring bearing surnames different from those of both parents.
The studied sample was composed of 538 individuals with twice-checked information: in 243 of them the surname was received only from the father; in 112 it was transmitted only by the mother; in 95 the surname did not originate from either parent; and in 88 the transmission mode could not be ascertained (parental pair with the same surname). The data from 90 women who bore the married surname of their spouses, without reliable information as to their family surname, had previously been excluded from the sample. Assuming that in the 88 cases of couples with the same surname the transmission took place patrilineally or matrilineally in the proportions 243:112, we obtain the corrected proportions of patrilineal and matrilineal transmission, respectively 243(1 + 88/355)/538 = 0.5636 and 112(1 + 88/355)/538 = 0.2598.
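The allocation of the 88 uncertain cases can be reproduced with the counts quoted above. The helper name is ours. The sketch simply distributes the uncertain cases between the paternal and maternal classes in proportion to the resolved counts and divides by the full sample of 538.

```python
# Counts quoted in the text for the 538 individuals of Brejo dos Santos.
paternal, maternal, neither, uncertain = 243, 112, 95, 88
total = paternal + maternal + neither + uncertain

def corrected_share(count, other_count, uncertain_count, total_count):
    """Share of a transmission mode after allocating the uncertain cases
    in proportion to the two resolved counts."""
    resolved = count + other_count  # 243 + 112 = 355
    return count * (1 + uncertain_count / resolved) / total_count

patrilineal = corrected_share(paternal, maternal, uncertain, total)
matrilineal = corrected_share(maternal, paternal, uncertain, total)
print(round(patrilineal, 4), round(matrilineal, 4))
```

The two shares add up to (355 + 88)/538; the remainder corresponds to the 95 cases in which the surname came from neither parent.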
Surname distribution in the population and in the couples. In the calculations shown below, we considered 233 marriages, which exclude the 90 women without reliable information as to their family surname. Some men and some women from this population married two or more times; this explains why the totals of men and of women who married are smaller than the total number of marriages (233). Taking the data from Table VIS and applying to them, without any correction, the formulas proposed by Crow and Mange [6], we obtain [...].
As shown in Table IS, in the municipality of Brejo dos Santos the frequencies of subtypes A, B, C, and D of first-cousin marriages were respectively 0.2169, 0.2530, 0.2048, and 0.3253. Taking into account that in this locality the frequencies with which the surname is transmitted patrilineally and matrilineally are 0.5636 and 0.2598, respectively, a rough correction of the factor 0.25 that occurs in the formulas of Crow and Mange can be obtained from 0.2169 × 0.2598 + 0.3253 × 0.5636 = 0.2397 ≈ 0.24. The estimates of the three fixation indexes $F_{IT}$, $F_{ST}$, and $F_{IS}$ then become [...].
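The corrected factor of 0.2397 can be checked numerically from the four subtype frequencies and the two transmission rates. Subtype D is described in the text as the patrilineal subtype; pairing subtype A with matrilineal transmission is our reading of the formula, not something the text states explicitly. The variable names are ours.

```python
# Frequencies of first-cousin subtypes in Brejo dos Santos, as quoted in the text.
subtype_freq = {"A": 0.2169, "B": 0.2530, "C": 0.2048, "D": 0.3253}

# Corrected surname transmission rates, as quoted in the text.
patrilineal, matrilineal = 0.5636, 0.2598

# Assumption: cousins of subtype A share a surname when it is passed on by the
# mothers, and cousins of subtype D when it is passed on by the fathers.
corrected_factor = (subtype_freq["A"] * matrilineal
                    + subtype_freq["D"] * patrilineal)

original_factor = 0.25  # value used in the uncorrected formulas
relative_change = (corrected_factor - original_factor) / original_factor
print(f"corrected factor = {corrected_factor:.4f}")
print(f"relative change versus 1/4 = {relative_change:.1%}")
```

In the special case of strictly patrilineal transmission and equal subtype frequencies, the same expression gives 0.25 × 0 + 0.25 × 1 = 0.25, the factor of the original method.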

Rates of intentional (inbreeding strictly speaking) and random consanguineous matings in NE Brazil.
The importance of the surname method is that it enables the indirect estimation of the factors responsible for consanguineous marriages. Since all the estimates obtained for $F_{IT}$, $F_{ST}$, and $F_{IS}$ are positive, we can obtain the approximate proportion of first-cousin marriages due to intentional factors (inbreeding), which is 0.47 or about 50%. Table 2 (adapted from Weller et al. [9]) shows the estimates of population size, frequency of consanguineous marriages, and the average value of the inbreeding coefficient $F_{IT}$ for a set of rural localities in the state of Paraíba in NE Brazil. The originally published table contains data from 39 localities, not including the data from Gurjão and Lagoa Seca (shown in Table IS), which appear in Table 2 among the localities with a population size of less than 10,000. Since the surname method indicated that about 50% of consanguineous marriages take place due to intentional factors (inbreeding in the strict sense), an inverse correlation (or at least a tendency) is expected between population size and the frequency of consanguineous mating. To verify this point, the data from Table 2 (frcm and n) were submitted to a simple linear regression analysis, after applying to the dependent variable (frcm) the transformation $\arcsin\sqrt{x}$ with the aim of normalizing its distribution. The results of this analysis revealed a tendency, with the values of frcm being roughly estimated by the formula frcm = 0.2688 − 0.0056n; $r^2$ = 0.6675; F(1;3) = 6.021 (p-value = 0.09), as shown in the graph of Fig. 4.
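The reported regression statistics can be cross-checked against each other. In a simple linear regression the F statistic follows from the coefficient of determination as F = r²(k − 2)/(1 − r²), where k is the number of data pairs; the degrees of freedom F(1;3) imply k = 5. The sketch below, with function names of our own, recomputes F and its p-value from the reported r² only; it does not refit the regression, since the Table 2 values are not reproduced here.

```python
import math

def f_from_r2(r2, n_points):
    """F statistic of a simple linear regression computed from r^2."""
    df_resid = n_points - 2
    return r2 / (1.0 - r2) * df_resid, df_resid

def two_sided_p_t3(t):
    """Two-sided p-value of a Student t statistic with exactly 3 degrees of freedom."""
    u = abs(t) / math.sqrt(3.0)
    return 1.0 - (2.0 / math.pi) * (u / (1.0 + u * u) + math.atan(u))

f_stat, df_resid = f_from_r2(0.6675, 5)
# An F(1, 3) statistic is the square of a t statistic with 3 degrees of freedom.
p_value = two_sided_p_t3(math.sqrt(f_stat))
print(f"F(1;{df_resid}) = {f_stat:.3f}, p = {p_value:.3f}")
```

The recomputed values agree with the reported F(1;3) = 6.021 and p-value = 0.09 up to rounding of r².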

Discussion and conclusions
Consanguineous marriages still take place at significant proportions worldwide (at least 10% of all human unions), being a frequent practice not only among Muslim populations (in the Middle East, North Africa, and West Asia) and in some regions of India but also in many areas (especially rural, isolated, and underdeveloped ones) throughout the world [10-14]. By far the most common type of consanguineous marriage is the first-cousin one, which generally accounts for proportions as high as 30% to 50% or even more of all matings taking place between relatives in a given population [11,13,14]. Although there is currently no general consensus about the reasons that favor the occurrence of consanguineous mating at such high rates, probably because of its very complicated, multifaceted nature, the issue has been the subject of a large number of well-planned studies, an important list of which is found at www.consang.net [14]. These studies suggested explanatory factors related to social, religious, cultural, political, and economic status, as well as the smaller size of isolated rural populations [13,15]. For example, in Arab populations consanguineous marriages are favoured because unions between non-relatives are thought to be less stable, since in marital disputes the husband's family would side with the consanguineous wife [11,16-18].
An important issue is why particular types of first-cousin unions are strongly favored in some major populations, e.g. patrilineal parallel cousin marriage (type D) in Muslim Arab communities, but prohibited in others [19], a result also observed by us. The scatter diagram of Fig. 2 draws immediate attention to this fact. In more than half of the samples we selected (numbered as 12, 13, 17, 18, and 22), corresponding to Near and Middle East populations with more rigid and strict traditional patriarchal rules, subtype D is more frequent than A. This tendency might be explained by the simple fact that in strongly patriarchal populations or communities the father of the bride or bridegroom, when arranging a consanguineous marriage within his family, will preferentially choose the partner cousin among the offspring of his brothers, to the clear detriment of the offspring of his sisters. This tendency is strikingly weaker in the majority of other population groups, as the graph of Fig. 2 clearly shows. When Southern Indian Hindus are considered, for example, the situation is reversed, because parallel cousin unions (types A and D) are forbidden in Hindu society [20]. This simple fact probably explains per se the lower frequency (about 40%) of types A and D in the Indian samples (123 out of 290 first-cousin marriages) we included in the present study.
Table 2. Estimated population sizes (n), observed frequencies of consanguineous mating (frcm), and average fixation index ($F_{IT}$) in localities from NE Brazil (state of Paraíba). All data were adapted and condensed from Weller et al. [9].
The separate analysis of the four Brazilian samples (Table 1 and Fig. 3) indicates a marked excess of subtype D in the old NE sample, above the overall average value of D for both geographical regions (NE and SE), while subtype A lies far below its overall average value for the same regions. This predominance of D is fairly suggestive of a more stringent application of patriarchal rules, which was common in old rural areas of NE Brazil. The situation is completely inverted in present-day SE Brazil (recent sample from the Laboratory of Human Genetics at USP), with an increase of subtype A and a corresponding decrease of subtype D. The values for the population samples collected recently in the state of Paraíba (NE Brazil) and for the old data from S/SE Brazil are very similar, lying exactly midway between the overall average values of A and D. This finding seems to be strongly correlated with the geopolitical transformations that took place in NE and SE Brazil over time.
To verify these facts globally, we performed a correlation analysis comparing all A and D percentage values from Table 1 with the corresponding 2017 HDI (human development index, taken from the website of the United Nations Development Programme) of all countries and regions listed there. We observed a marginally non-significant (p-value = 0.0573) positive correlation (Spearman's rank correlation coefficient r = 0.3635) between HDI and the frequency of subtype A, and a significant (p-value = 0.0122) negative correlation (Spearman's rank correlation coefficient r = −0.4669) between HDI and the frequency of subtype D (Fig. 1S of supplementary material).
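The two p-values can be cross-checked from the reported coefficients with the usual t approximation for a rank correlation, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch assumes n = 28 data pairs (the number of coalesced samples in Table 1), which is our assumption, and whether this approximation is the procedure the analysis actually used is not stated. The p-value uses the finite series for the Student t distribution that holds for an even number of degrees of freedom; the function names are ours.

```python
import math

def two_sided_p_t_even_df(t, df):
    """Two-sided p-value of a Student t statistic; finite series valid for even df."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    theta = math.atan(abs(t) / math.sqrt(df))
    cos2 = math.cos(theta) ** 2
    term, series = 1.0, 1.0
    for k in range(1, df // 2):  # k = 1 .. (df - 2) / 2
        term *= (2 * k - 1) / (2 * k) * cos2
        series += term
    return 1.0 - math.sin(theta) * series

def correlation_p_value(r, n_pairs):
    """t statistic and two-sided p-value for a correlation coefficient r."""
    df = n_pairs - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, two_sided_p_t_even_df(t, df)

N_PAIRS = 28  # assumption: one data pair per coalesced sample of Table 1
for label, r in (("A", 0.3635), ("D", -0.4669)):
    t, p = correlation_p_value(r, N_PAIRS)
    print(f"subtype {label}: r = {r:+.4f}, t = {t:+.3f}, p = {p:.4f}")
```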
All these results are corroborated, at least partly, by the behavior of A and (B + C + D) subtypes in rural and urban Japanese communities.
The simple, rough method we proposed to correct the method of Crow and Mange [6] for asymmetries in the distribution of the four subtypes of first-cousin marriages and for the absence of fixed population rules of surname transmission provided a factor of 0.2397 (estimated from the observed frequencies of subtypes A, B, C, and D and the corrected rates of paternal and maternal surname transmission in Brejo dos Santos), which is not significantly different from the factor 1/4 = 0.25 used in the original method. This agreement was coincidental and probably applies only to that population aggregate. For other population samples with any asymmetry in the distribution of subtypes A, B, C, and D and without fixed rules of surname transmission, the method of Crow and Mange should always be corrected as proposed, even though our calculations did not take into account the complication of adopted surnames, a phenomenon that is fairly common in some countries.
The estimated value of $F_{IT}$ in Brejo dos Santos, obtained from the genealogical analysis of the sampled population, was 0.00504 [9], significantly smaller than the value estimated by the surname method. Some discrepancy is expected because our estimate used only information from first-cousin marriages. As the estimated frequency of all types of consanguineous marriages in Brejo dos Santos is 0.1948, if all consanguineous marriages in the locality took place only among first cousins (coefficient of inbreeding 1/16 = 0.0625), the value of the total fixation index would be at least $F_{IT}$ = 0.1948 × 0.0625 = 0.0122; then at least 1 − 0.00504/0.0122 = 0.5869, or about 60%, of the consanguineous marriages occurring in the region take place between relatives with a degree of biological relationship much smaller than the one prevailing between first cousins. A better, more plausible explanation for this discrepancy is the fact that the surname method estimates the degree of relationship even when a fraction of the interviewed individuals do not know that they are actually married to relatives.
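The arithmetic of this argument can be retraced step by step. The sketch uses only the three figures quoted in the paragraph and rounds the intermediate value to four decimals, as the text does; the variable names are ours.

```python
freq_consanguineous = 0.1948   # frequency of all consanguineous marriages
f_first_cousin = 1 / 16        # inbreeding coefficient of first-cousin offspring
f_it_genealogical = 0.00504    # F_IT obtained from the genealogical analysis

# F_IT if every consanguineous marriage were between first cousins.
f_it_all_first_cousins = round(freq_consanguineous * f_first_cousin, 4)

# Fraction of the hypothetical F_IT not accounted for by the genealogical value.
share_more_remote = 1 - f_it_genealogical / f_it_all_first_cousins
print(f_it_all_first_cousins, round(share_more_remote, 4))
```

Without the intermediate rounding, the product is 0.012175 and the final share changes only in the third decimal place.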
The real importance of the method, however, is that it enabled the indirect estimation of the relative frequencies of consanguineous marriages due to strictly inbreeding causes (intentional or stricto sensu inbreeding) and to random inbreeding (caused by small effective population numbers or other random phenomena). For the studied population group the estimated rate of intentional inbreeding was about 50%. If this is true, there should be an association (or at least some tendency) between population sizes and their corresponding frequencies of consanguineous mating. Using data provided by our group in a recently published paper (based on practically the same set of populations studied in the state of Paraíba in NE Brazil) and a simple regression model, we showed that the frequencies of consanguineous mating (frcm) in NE Brazil can be roughly estimated from the corresponding sizes of the populations (n) in which they occur by the formula frcm = 0.2688 − 0.0056n; $r^2$ = 0.6675; F(1;3) = 6.021 (p-value = 0.09). The statistical non-significance of the test indicates just a tendency, which could be explained by the paucity of coalesced data pairs available for the regression analysis (just five pairs, as shown in the graph of Fig. 4).

Subjects and methods
The present work was based on large data sets personally collected by us (1) on a routine basis from the identification archives of the genetic counseling service of the Human Genetics Laboratory (Department of Genetics, Institute of Biosciences, University of São Paulo, São Paulo, SE Brazil) from 1979 to 2010, totaling anonymously collected information on 989 marriages between first cousins; and (2) recently, in 35 municipalities of the state of Paraíba in NE Brazil, totaling 909 marriages between first cousins [21]. We also performed a comprehensive review of reliable published data from a significant number of reports in the literature and reanalyzed all this material. All these data have been thoroughly double-checked.
All data were analyzed by standard statistical methods used in population genetics and described in basic college textbooks such as Weir [22] and Zar [23]. For this analysis we prepared computer programs/scripts in R (Copyright 2018 The R Foundation for Statistical Computing) and Liberty Basic (Copyright 1992-2010 Shoptalk Systems) languages.
All data we used in the present paper, together with their descriptive analysis, are detailed in the supplementary Tables IS, IIS, IIIS, and IVS. Tables IS and IIS contain the original material of the present paper.
Ethics approval and consent to participate. The data sampling protocol and the consent procedure were reviewed and approved by the State University of Paraíba Ethics committee (CAAE: 67426017.6.0000.5187) and the State University of São Paulo Ethics committee. The study was conducted in accordance with the principles of Resolution 466/12 of the Brazilian National Health Council. All participants or their guardians received verbal and written explanations regarding the study procedures, and when they agreed, they signed the informed consent form and institutional declaration of approval.