Absolute abundance of southern bluefin tuna estimated by close-kin mark-recapture

Southern bluefin tuna is a highly valuable, severely depleted species, whose abundance and productivity have been difficult to assess with conventional fishery data. Here we use large-scale genotyping to look for parent–offspring pairs among 14,000 tissue samples of juvenile and adult tuna collected from the fisheries, finding 45 pairs in total. Using a modified mark-recapture framework where ‘recaptures' are kin rather than individuals, we can estimate adult abundance and other demographic parameters such as survival, without needing to use contentious fishery catch or effort data. Our abundance estimates are substantially higher and more precise than previously thought, indicating a somewhat less-depleted and more productive stock. More broadly, this technique of ‘close-kin mark-recapture' has widespread utility in fisheries and wildlife conservation. It estimates a key parameter for management—the absolute abundance of adults—while avoiding the expense of independent surveys or tag-release programmes, and the interpretational problems of fishery catch rates.

EXP row shows expected frequencies if there were no null alleles; OBS row shows observed. The discrepancy is typically small but signicant, and can be accounted for by estimating a null-allele frequency.  The question to consider is: can any unmeasured covariate aect both the probability that an adult can be sampled, and the expected number of its ospring (i.e. the chance of it being tagged by virtue of an ospring being caught in the juvenile sample)? For example, if the more fecund sh in the right-hand panel of Figure 1 are also more likely to be caught, then applying the naive 2/N-based estimate would give a negatively-biased estimate of adult abundance; the naive version fails to fully condition on the covariate data, because the mere fact that an adult occurs in the sample now increases its expected number of ospring in the juvenile sample. Similar bias arises in conventional mark-recapture when some individuals are (or become)`trap-happy' 2 . Note that the phenomenon has to occur twice, both at`marking' and at`recapture', to cause any bias. Since CKMR involves dierent animals on each occasion, the risk is somewhat mitigated compared to individual mark-recapture, but cases such as stock structure could certainly cause bias unless correctly allowed for when probabilities are computed In the case of SBT, we assume that adult length aects residence time, which proportionally aects both selectivity and annual fecundity (the latter being also aected by daily fecundity, which is known for females). Age is assumed not to be directly related to residence time (i.e. given length, age is irrelevant). However, it is still necessary to take account of age in equation (2), because the historical length of an adult (in the year when a juvenile was born) is aected not just by its current length, but also by whether it is a fast-or slow-growing sh overall, which can be assessed by taking into account its age. Although length (and sex) is measured for all sampled adults, age is not, so for adults where age is not measured, the posterior distribution of age given length and sex must be used to integrate equation (2).

Supplementary Note 2 Precision and design
The precision of a CKMR abundance estimate (or, say, the mean of a time-series of abundance estimates) will depend somewhat on which other parameters need to be estimated and what data is available, but primarily on the number of POPs (P ). For a large population, this will approximately follow a Poisson distribution so that the lower bound on CV N adult is 1/ E [P ]; thus, a 15% CV on abundance requires at least 50 expected POPs.
In a scaled-up version of Figure 1, where E [P ] = 2m A m J /N adult , this entails about 10 N adult total samples if split optimally (equally) between adults and juveniles. For large N , this means that sampling will be sparse (i.e. most samples will not be part of a kin-pair, and there will be a negligible proportion of kin-triads), so that pairwise comparisons are almost statistically independent (Supplementary Note 3).
These simple formulae can be useful for study design, since E [P ] can be simulated easily using guesstimates of demographic parameters without needing to t any statistical models. Sample collection is often cheaper than genotyping, so a wise strategy is to collect more samples than thought necessary, and simply to stop genotyping when an adequately precise estimate has been obtained.
In longer-term studies, where the aim is time-series monitoring rather than just a one-o estimate, the annual sample size requirement would be lower, because the expected number of POPs is roughly quadratic in the number of years (since all pairwise comparisons are used). Eventually an equilibrium will be reached because, in pairs separated by many years, the chance of a parent having died before being sampled renders the information content minimal; but in our SBT study, simulations indicate that the quadratic benet will continue to apply for several more years.

Supplementary Note 3 Approximate independence for variance estimation
This section examines when it is reasonable to treat the set of pairwise comparisons as statistically independent for purposes of variance estimation. Formally, this means checking that the joint expected Fisher information of all pairs is approximately equal to the sum of expected Fisher information from individual pairs. Equality will never apply exactly, because each juvenile cannot have more than one mother nor one father (so, once that parent is found, then comparisons of that juvenile against other adults of the same sex are uninformative), and because some juveniles may be siblings (so, if the adult happens to be the parent of one of the juveniles, it will also be be the parent of the other; the two comparisons are clearly not independent). Nevertheless, approximate equality will hold if the population is large enough relative to the sample sizes, as shown below. A similar situation applies in conventional single-sample mark-recapture: each animal in the second sample of m 2 can match at most one animal in the rst sample of m 1 , so sampling is without replacement and it is not strictly correct to say that there are m 1 m 2 independent comparisons. However, for large populations sampled sparsely, the true hypergeometric distribution of the number of recaptures is well-approximated by a Binomial distribution, which corresponds to an assumption of independence.
For CKMR, it is easiest to examine independence for the cartoon scenario in Figure 1 but restricted to female adults only, so that the a priori chance of a MOP (Mother-Ospring Pair) is 1/F where F is the number of adults (implicitly all female). For brevity, assume equal sample size m for juveniles and adults. Clearly, if each juvenile and each adult was involved in just one comparison, all comparisons would be independent; however, each animal is involved in many comparisons, so independence is not guaranteed. It suces to consider two scenarios, in each case computing the true and approximate joint Fisher information: 1. The same juvenile sample j * is compared to all m adults, versus comparing a dierent juvenile to each adult.
2. The same adult sample a * is compared to all m juveniles, versus comparing a dierent adult to each juvenile.
Let C aj be the outcome of a pairwise comparison between adult a and juvenile j: 1 if MOP, or 0 otherwise.
Let C J a C aJ be the total number of MOPs found comparing one particular juvenile J to all adults; clearly C J ∈ {0, 1} since animals have only one mother. We have The log-likelihood Λ J for the Bernoulli-distributed observation c J is a function of the unknown parameter F , given by The independence version consists of m IID comparisons each with probability-of-success F −1 , and it can be shown that the Fisher information is −mF −3 + O mF −4 , with the same leading term as equation (S4). The second-order term does dier, and is larger in equation (S4) by a factor m, but will be negligible in ecientlydesigned surveys of large populations. To see this, recall from the Introduction that the required sample size m is something like 15 √ F (adjusted to female adults only). The constant of proportionality hidden by the O (.) notation turns out to be 3, so the ratio mF −1 between the rst and second terms might be around 45/ √ F . Thus, as long as F is, say, more than 100000, the correct second term in equation (S4) is much smaller than the rst, and the independence approximation (which uses a too-small approximation to the second term) will be adequate.
In case 2, provided that none of the juveniles are full siblings or maternally-linked half-siblings, then at most one of the comparisons can yield a MOP, and exactly the same sampling-with/without-replacement argument can be applied. If some of the juveniles share a mother, so that the m juveniles have only m * < m distinct mothers, then the Fisher information about F is only that of m * independent (actually, sampling-without-replacement) comparisons. To compute the expected information, we need to know what m * is likely to be. In a cartoon world, the chance that a second randomly-chosen juvenile will have the same mother as a rst juvenile is 1/F , and the chance that a third juvenile will have again the same mother is negligible, so m comparisons will yield on average about m * = m 1 − F −1 distinct mothers. Unless F is small, this source of non-independence will not matter, even in the cartoon. In more realistic settings, the expected value of m * would decrease due to reproductive variability among females, and to non-independence in juvenile samples (e.g. if siblings commonly school together at the age they are caught). The larger F is, the less those issues are likely to matter (because they only change the multiplier on F −1 , which is still much less than 1), but an a priori argument must be made on a case-by-case basis. A posteriori analysis of the SBT results rules out high levels of sibship among the juvenile sample. In the lower rows, there is clear separation between the POPs and the morass of unrelated pairs, which also lets us assess false-negatives arising from genotyping error. If false-negatives were common, then some of the true POPs in the lower left-hand corner would be shifted into the neighbouring column(s), which has not occurred. While the possibility of false-negatives cannot be entirely ruled out in the upper rows where no gap would be seen, detailed statistical analysis indicates an upper 95% CI of 1 false-negative (see nal subsection below). Table 3 shows the same process applied only to the juvenile samples, which cannot contain any true POPs; the leftmost column, which contained 45 pairs in Table 2, has just 1 pair in Supplementary Table 3, and that pair was scored at relatively few loci so its false-positive probability is higher.

For comparison, Supplementary
The observed and predicted column totals match well in both tables, and in particular with no excess in the number of pairs with one or two parentage-excluding loci, so the calculations appear sound and there is no reason to expect that the number of POPs has been substantially underestimated due to false-negatives, nor overestimated because of false-positives from unrelated pairs and/or non-POP close-kin.
Bias in POPs is thus not a concern, but in principle overdispersion could still arise if half-or full-sibs are common among the sampled juveniles, since comparisons of one adult against members of a sib-group are not statistically independent. There are too few loci to detect half-sibling pairs directly, but a substantial incidence of full-or half-sibs would lead to excess of observed over expected in the rst few column totals of Supplementary Table 3. adult-juvenile GGPs and HSPs must be less common than POPs because of the many years of extra adult survival required by the grandparent or parent respectively i.e., the same reason that just one parent-ospring pair was found among 11,000,000 adult-adult comparisons alone. Aside from scarcity, HSPs/GGPs are also intrinsically unlikely to give a false-positive POP in this study simply because of the number of loci used; on average, a GGP or HSP will coinherit a shared allele at half the loci (though this will vary between pairs), and in a typical 20-locus comparison in this study, only 0.002% of HSP/GGPs that coinherit at 10 loci would fail to exclude parentage at any of the remaining 10 by chance. A solitary adult-juvenile HSP or GGP does nevertheless seem the most likely explanation for the (C25,X2) entry; this has since been checked using an independent panel of SNP markers, conrming that the pair was clearly not a POP, and that the level of allele-sharing was consistent with GGP or HSP.
The next-closest pairing is of the (half )-uncle-niece type; these could be commoner than POPs on demographic grounds, but their degree of kinship is much weaker than for HSP/GGP, and the false-positive POP probability is negligible.
In practice, if there was an appreciable number of false-positives from non-POP close-kin, then there would also be much larger numbers of near-misses that would be apparent in the X1 and X2 columns of Supplementary to values considerably higher than observed; e.g., with 1000 half-sibling pairs and no full-siblings, the probability of seeing as few as the 6 observed X1 pairs is 0.015.
The impact of juvenile sibship on variance estimates (by reducing the eective number of independent comparisons) depends not just on the number of pairs, but also on how the groups are arranged; for example, 1000 juvenile half-sib pairs could come from anywhere between one group of about 45 juveniles sharing one common parent, or from 2000 juveniles in 1000 pairs. The latter would obviously be the worse scenario for non-independence, since about 1/8 of all juveniles would be involved, but even then it turns out that the overall CV computed assuming independence would still only be about 12% too low (not 12 percentage points). A further bound on the extent of sibship comes from the fact that no triads were encountered among the 45 POPs found; if the number of HS-groups was as high as 800, for example, then we would expect to see 4 triads, and the probability of encountering zero would be just 0.014.

Rigorous bounds on false-negatives
Genotyping error rates, insofar as they apply to (and matter in) parentage studies, are not easy to estimate in advance; certain types of error may be heritable in the sense that, if that error is made when scoring the parent, it then it is more likely to be made when scoring an ospring of that parent. However, once a parentage study has revealed enough clear POPs which may, as here, require large sample sizes then a post hoc analysis can provide information on likely error rates and the implications for accuracy of the POP count.
We distinguish between large-scale errors, which could aect many sh at once, medium-scale errors which could aect many loci for a single sh, and small-scale errors, which can happen independently to individual loci for individual sh. Large-scale errors could arise from chimeras and mass failures of PCR on a run plate, but our protocols were able to detect and therefore eliminate these problems. Medium-scale errors could arise from contamination (easy to see, with more than two peaks at a locus) or perhaps from poor-quality DNA leading to some peaks being overlooked; however, this would likely lead to an excess of recorded homozygotes in sh with poor-quality DNA, and we found no evidence of such a relationship among sh with enough scorable loci to be included in the pairwise comparisons.  Table 2 would smudge into the next column or two; this would be visible in the lowest rows of the Table, where the non-POPs on the right-hand side of the Table do not reach the left-hand columns. However, any smudging would not be obvious in the upper rows, where non-POPs do reach as far as the X1 column. We examine this formally in the rest of this section.
The data used for bounding false-negatives are a pre-re-scoring version of the data (Table 5), which includes some small-scale genotyping errors that were later manually rescored and corrected. The pre-rescoring data in Table 5 are very similar to Table 2, the main dierence being that the C23 row starts (3,1) rather than (4,0). This is one case where a scoring error did initially cause a false-negative, but the error was detected and xed on re-scoring. The other dierences did not aect POP status of any pairs.
Likelihood for estimating false-negative rate Let θ be the probability that a pair of sh will be a POP (so θ is inversely related to abundance, etc), and let e be the probability that one shared locus in a POP will fail the parent-ospring compatibility test 1 , either through mis-scoring or mutation. Assuming scoring errors at dierent loci are independent 2 and equally likely 3 , then the probability of f loci failing in a POP where c loci are compared, is a simple Binomial probability. Also, for a non-POP pair where c loci are being compared, let p NON cf be the probability that f of the loci will fail the test. For any given pair, this actually depends on the particular loci involved, and is already calculated to form the basis for the expected values in Table 2. Any given pair with c loci compared is either a POP or not, and the probability p cf that the pair will fail at f loci is therefore Therefore, if n c denotes the number of comparisons using c loci in Supplementary Table 2, the expected value of cell (c, f ) is n c p cf . Strictly, the distribution within each row is Multinomial, but in the rst few columns the multinomial size index is enormous (millions) and p cf is small, so a Poisson approximation is accurate. If y cf denotes the observed number of pairs in the (c, f ) entry of Table 3, then the likelihood of the rst few columns up up to a multiplicative constant. The term p cf involves the parameters θ and e, which can be estimated via maximum likelihood.
The bulk of the information on false-negative rates is contained in the X1 column of Supplementary  Table shown here), the noise from the increasingly large numbers of almost-false-positives swamps any signal related to false-negatives with 2, 3, etc number of failures, which will be increasingly rare.

Condence intervals on actual false-negatives
Although the Hessian from the above likelihood could be used in the standard way to derive a condence interval for the expected number of false-negatives in a hypothetical replicate of this study, that would be solving the wrong problem. Our interest lies in the actual number in this particular study; so, if false-negatives were very unlikely beyond the X1 column, then the number of false-negatives would be capped above by the total number of X1s actually seen, regardless of how many might be found if the study was repeated. This makes quite a dierence in practice. A Bayesian argument is required to get the answer we need.
We need the probability distribution of the number of false-negatives #F N given the observed data, i.e. P [#F N |y] where y = (y cf : c ∈ 11 · · · 25, f ∈ 0 · · · 1) is the observed numbers in the X0 and X1 and possibly X2 columns (X3 onward are irrelevant because the chances of 3 or more scoring errors is negligible). For simplicity of argument, say for now that we neglect the X2 column as well. Obviously, the maximum possible value of #F N is the observed number of X1s, in this case 12. Each of these X1 pairs is either a near-false-positive or a false-negative.
The probability that an X1 pair with c loci compared is actually a false-negative rather than a near-false-positive, One implication is that a (C12,X1) sh is much more likely to be a near-false-positive than a (C25,X1) is, because (i) the probability of a non-POP matching by chance at 11 of 12 loci is much higher than for 24 of 25, and (ii) the chance of a scoring error is about twice as high with 25 loci as with 12.
The false-negative status of the pairs are independent given θ and e, so the total number of X1 pairs that are false-negative is the sum of (in this case) 12 independent Bernoulli (0/1) random variables, with probabilities depending on the number of loci involved. There is an algorithm due to Butler 3 for calculating the Bernoulli-sum probability distribution, which is already used in the expected-false-positive calculations. Hence, given a pair of values (θ * , e * ), we can easily compute P [#F N = x|y, θ * , e * ] for x ∈ 0 · · · 12. What we actually need, though, is which can be estimated by repeatedly drawing pairs θ * j , e * j from the posterior distribution of (θ, e|y) via importance-sampling, and then averaging the P #F N = x|y, θ * j , e * j across all the draws. This requires a prior for (θ, e), which we took to be independent uniform on log θ and logite, plus of course the likelihood from equation (S6). A fully-conditioned condence interval on #F N |y can then be found simply by inverting the cumulative distribution of #F N |y.

Outcome of false-negative checks
We ran the above algorithms on the X0 & X1 columns of Supplementary In addition, the ABI system uses an internal size standard added to each sample from which the size curve is extrapolated for estimating allele peak length relative to the standard curve. ABI states that variation using this system ensures +/-0.5bp accuracy from run to run. Furthermore, the GeneMapper program analyses each individual size curve for peak quality and general t to the theoretical ideal size curve. Any discrepancies detected by the software raise ags in the analysis window and can be scrutinized in further detail. We also examined each size curve analysis as well as the individual peaks that were used to generate the size curve for each individual in a run plate to ensure another level of QC in addition to that used by the GeneMapper software.
GeneMapper uses a standard set of allele size bins used to smooth out further subtle variation and ensured easy comparison among alleles from dierent individuals and provided another level of QC among plates. Bin sets are developed for each locus to permit automated genotyping using the GeneMapper software. Individual bins represent a value range centred on the median length value of each allele as ascertained following sizing of an initial set of individuals. Preliminary bin sets were developed following detailed analysis of about 500 sh. These sets were designed to encompass slight variations to permit detection of gross deviations from the norm greater than +/-1.0 bp. After genotyping about 5000 sh, the bin sets were re-assessed for consistent allele calls, and a nal consensus adjustment was determined. Bins permit assignment of an integer value to the continuous-valued allele length based on the GENESCAN size standard, and permit simple comparison of allele identities among individual genotypes.
A gap of one to three base pairs between bins ensures that an objective decision rule can be consistently applied to a genotype for inclusion of an allele into a designated integer bin. Alleles falling in the gap were rare and presumed to be a result of an insertion or deletion event on an individual's DNA. These were scored as unknown genotype but the real value could still be used for conrmation of parentage should it be required to conrm identity (not required with our samples to date).
The use of automated genotyping with a single set of GeneMapper bin-sets allowed us to detect if peaks were consistently falling outside of predetermined bins and would highlight a general problem with the running of a plate (eg. old buer or polymer in the sequencer leading to general failure of proper electrophoresis and inconsistent separation). Runs where problems were found were re-run with new buer and polymer; this rectied the problems in every case.

Avoidance of chimeras
Chimeric genotypes are (in this study) a composition of DNA from more than one sh, rather than (as in some other studies) DNA proles resulting from multiple DNA in a well (two or more contaminated DNA leading to more than two alleles present for each locus). There are only two possible sources. First, a chimeric error will result from turning a run plate 180 degrees, whereby e.g. the A1 position became the H12 position. This error produces what looks like a legitimate DNA prole but made up of some loci from sh A1 mixed with the remainder of loci from H12 from the run plates that were not rotated. Second, if two run plates are swapped, the loci for those panels (but not for the other panels on the same sh) will be swapped. Clearly, these errors will lead to any POP members on the plate being overlooked, aecting 100-200 sh at a time, so it is important to catch them. Fortunately, once one is aware of these possibilities, it is fairly easy to write QC software using the check-plate results and/or the controls to detect and x the problem. We did nd both types of chimera in this study (rarely), but thanks to the QC protocols we were able to detect and x them.