The relationship of passerines (such as the well-studied zebra finch) to non-passerine birds is one of the great enigmas of avian phylogenetic research, because decades of extensive morphological and molecular studies yielded highly inconsistent results between and within data sets. Here we show the first application of the virtually homoplasy-free retroposon insertions to this controversy. Our study examined ~200,000 retroposon-containing loci from various avian genomes and retrieved 51 markers resolving early bird phylogeny. Among these, we obtained statistically significant evidence that parrots are the closest and falcons the second-closest relatives of passerines, together constituting the Psittacopasserae and the Eufalconimorphae, respectively. Our new and robust phylogenetic framework has substantial implications for the interpretation of various conclusions drawn from passerines as model organisms. This includes insights of relevance to human neuroscience, as vocal learning (that is, birdsong) probably evolved in the psittacopasseran ancestor, >30 million years earlier than previously assumed.
Birds are important model organisms in many fields1,2, but ever since the time of Darwin, numerous attempts to reconstruct their phylogenetic relationships have yielded at least as many controversies3,4,5,6,7,8,9,10,11,12,13. In recent years, however, some morphological3,4 and most molecular studies7,8,9,10,11,12,13 have found congruence regarding the earliest chapters of bird evolution14. The root of extant birds lies between Palaeognathae (ratites and tinamous) and Neognathae, the latter comprising Galloanserae (chicken and ducks) and Neoaves (all remaining birds).
Despite immense efforts, most of the basal relationships among Neoaves remain unsolved. This includes one issue of great interdisciplinary relevance1,2: the discovery of the putative sister group of passerines (>50% of all bird species, including all songbirds), one of the most-studied groups of animals2. Morphological studies indicated a close affinity either to woodpeckers4 and rollers3 or to cuckoos5, whereas a more basal position among Neoaves was suggested by DNA hybridization data6. On the other hand, nucleotide sequences of mitochondrial genomes placed passerines as the sister group to all remaining Neoaves7,10, to a woodpecker/roller/trogon clade8 or to cuckoos9, whereas nuclear sequence analyses proposed a relation to woodpeckers and rollers11 or to parrots13, falcons and seriemas12.
A promising approach to overcome the present phylogenetic ambiguities is the use of retroposon insertions. Retroposons, jumping genetic elements that copy via RNA intermediates and insert nearly randomly anywhere in the genome (although some biases of insertion and retention have been proposed15), provide (by inheritance) virtually homoplasy-free evidence of relatedness16 that is detectable for more than 100 million years. Because parallel insertions or exact excisions are highly unlikely16, presence/absence patterns of retroposons at orthologous genomic loci are powerful, clear-cut phylogenetic markers capable of resolving long-standing uncertainties17,18,19,20.
In this study, we present an improved resolution of bird evolution using retroposon insertions, a marker system that rarely undergoes homoplasy and is fully independent from previous approaches (for example, morphology, DNA hybridization or nucleotide sequence analyses). We provide the first statistically significant phylogenetic evidence for the early branching events in the avian tree of life, including the identification of the so far enigmatic sister group of passerines. Additionally, we reconstruct the chronological impact of retroposons on the avian genome during the Mesozoic Era of bird evolution.
Reconstructing the avian tree of life using retroposon insertions
From the more than 200,000 retroposed elements (REs) present in the chicken and zebra finch genomes1, we selected the two most numerous fractions (>97% of all REs1), namely, the chicken repeat 1 (CR1) family of long interspersed elements (LINEs) and the long terminal repeat elements (LTRs) of endogenous retroviruses. Utilizing three different search strategies (Methods), we extracted 131 CR1 and 75 LTR loci that were experimentally tested via high-throughput PCR, leading to the identification of 51 phylogenetically informative markers. For each marker, representatives of the key avian lineages13,14 were sampled, sequenced and aligned using standard procedures21. To measure the strength of support for all recovered branches, we calculated P values using the likelihood ratio test of Waddell et al.22 for retroposon data. Under this test, statistically significant retroposon evidence (P<0.05) is reached with three conflict-free markers (P=0.0370, (3 0 0)). Because of the aforementioned strength and clarity of retroposon markers, our resultant maximum parsimony-based phylogenetic tree (Fig. 1, branches A–L) is effectively a maximum likelihood estimation23.
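The P values quoted for conflict-free marker patterns in this article, from (3 0 0) up to (7 0 0), are numerically consistent with (1/3)^n for n supporting markers. The following minimal sketch reproduces them under that assumption, namely that under the null hypothesis each marker supports any of the three possible groupings with equal probability. The function name is hypothetical, and the general form of the test for patterns that include conflicting markers is not reproduced here.

```python
from fractions import Fraction


def conflict_free_p_value(n_markers: int) -> float:
    """P value for n markers supporting one branch with no conflicting markers.

    Assumption (not stated in the text): under the null hypothesis, each
    marker supports each of the three alternative groupings with
    probability 1/3, so the pattern (n 0 0) has probability (1/3)**n.
    """
    if n_markers < 0:
        raise ValueError("n_markers must be non-negative")
    return float(Fraction(1, 3) ** n_markers)


# Marker counts that appear as (n 0 0) patterns in the text.
for n in (3, 4, 5, 6, 7):
    print(f"({n} 0 0): P = {conflict_free_p_value(n):.4f}")
```

Under the same assumption, two conflict-free markers give P = 1/9, which is above 0.05; this agrees with the statement that three conflict-free markers are needed for significance.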
Resolving early bird phylogeny
Our retroposon markers are located on 14 different chromosomes, significantly clarifying more than the well-established3,4,7,8,9,10,11,12,13,14 avian relationships. We obtained six retroposon insertions that are shared among paleognaths and neognaths, corroborating the monophyly of extant birds (Fig. 1, branch A). These retroposon insertions feature a unique, diagnostic deletion present only in some avian CR1 elements (subtypes CR1-Y and CR1-Z; this deletion is absent in crocodilian and all other avian CR1 elements), and can therefore be regarded as bird-specific REs (Supplementary Fig. S1). Additionally, the root of living birds is located between the significantly supported Neognathae (Fig. 1, branch B; five REs, P=0.0041, (5 0 0), likelihood ratio test22) and Palaeognathae (Fig. 1, branch C; four REs, P=0.0123, (4 0 0), likelihood ratio test22). Significant support was also found for the monophyly of Neoaves (Fig. 1, branch D; six REs, P=0.0014, (6 0 0), likelihood ratio test22), Galloanserae (Fig. 1, branch E; four REs, P=0.0123, (4 0 0), likelihood ratio test22) and Passeriformes (Fig. 1, branch L; six REs, P=0.0014, (6 0 0), likelihood ratio test22).
Resolving the neoavian radiation
Within the hitherto largely unresolved7,8,9,10,11,12,13,14 radiation of Neoaves, we obtained four markers whose insertion patterns seem inconsistent with one another (Fig. 1, label F; Supplementary Fig. S2; Supplementary Table S1). As CR1 and LTR retroposons exhibit no or very short (<6 bp) target site duplications, exact excisions as proposed for primate Alu short interspersed elements24 cannot have occurred25. Because of the nearly 1.2 billion1 potential insertion sites in the avian genome, parallel insertions (featuring exactly the same target site, retroposon type, orientation and truncation) should be extremely rare. Therefore, the incongruent patterns among the four retroposon insertions are most likely a result of incomplete lineage sorting (leading to hemiplasy)26 of retroposon presence/absence dimorphisms that persisted during the very beginning of the neoavian radiation and were randomly fixed (that is, one of the two alleles was lost) in each of the descendant lineages (Supplementary Fig. S2). This complex evolutionary phenomenon was previously revealed by retroposons (for example, in the rapid radiations of cichlid fishes27 and placental mammals17,20) and is a further indication that the earliest period of the rapid radiation of Neoaves is a putative polytomy28.
The remaining retroposon evidence within Neoaves exhibits no incongruent presence/absence patterns. We recovered the previously reported12,13 'landbird' assemblage (Fig. 1, branch G; two REs), a novel clade consisting of all 'landbirds' to the exclusion of mousebirds (Fig. 1, branch H; two REs) and a close affinity12,13 among seriemas, falcons, parrots and passerines (Fig. 1, branch I; two REs). Statistical testing22 of the support for these three branches is not applicable, as some of the above incongruent presence/absence patterns are also inconsistent with these (Supplementary Fig. S2; Supplementary Table S1).
Unexpectedly, we obtained a wealth of conflict-free retroposon markers for two branches that were previously proposed by the Hackett et al.13 study of nuclear intronic sequences, and which received relatively moderate bootstrap support in their study. Seven retroposon insertions are exclusively present in falcons, parrots and passerines, but absent in hawks, woodpeckers and other 'landbirds' (Fig. 1, branch J; seven REs, P=0.0005, (7 0 0), likelihood ratio test22); we therefore suggest the new name Eufalconimorphae (true Falconimorphae) for this significantly supported monophylum. Most strikingly, the shared presence of three retroposon insertions solely in parrots and passerines (Fig. 1, branch K; three REs, P=0.0370, (3 0 0), likelihood ratio test22; see also Fig. 2 for sequence alignments) provides statistically significant evidence of parrots as the living sister group of the Passeriformes. To make this new phylogenetic resolution easily comprehensible, we propose the new name Psittacopasserae (parrots and passerines). It is worth noting that with this evidence, for the first time, passerines can be confidently placed within the avian tree of life.
Although our exhaustive zebra finch-based retroposon screening did not detect any evidence for incomplete lineage sorting within Eufalconimorphae, we cannot completely exclude the possibility of its occurrence in this part of the neoavian tree. Considering this, we expect that, once the genome sequence of a parrot or a falcon is available, parrot- or falcon-based retroposon screenings will permit an even stronger resolution of this issue and a reevaluation of the conflict-free support for Psittacopasserae reported here.
Reconstructing the chronology of Mesozoic retroposon activity
In addition to resolving phylogenetic controversies, our markers enabled us to reconstruct the temporal retroposon impact on the avian genome during early bird phylogeny via the comparison of these experimentally verified insertion events with computational estimates of retroposon activity. To determine a computational chronology of retroposon activities, 995 nested retroposons (retroposons that inserted into other retroposons) were extracted from the zebra finch genome and their coordinates were implemented in the transposition in transposition (TinT) model25,29. Because the insertion of a younger (active) RE subtype into an older (inactive) RE can be expected to occur more likely than the opposite situation, the genome-wide quantitative distribution of different subtypes of retroposons nested within other RE subtypes enables a reliable estimation of relative retroposon activity periods29. As some RE subtypes were active during relatively short periods, it is possible to plot the resulting TinT pattern against a chronogram of molecular divergence times30, yielding a congruent estimate of retroposon successions during the Mesozoic evolution of birds12,30 (Fig. 3). For instance, both approaches indicate that during the shared evolutionary history of the chicken and zebra finch (in the lineage leading to Aves and Neognathae), several retroposons (CR1-Y2_Aves, CR1-Y1_Aves and TguLTR5e) were active (see Supplementary Fig. S3 for a TinT pattern of the chicken genome). Subsequently, other REs (CR1-E_Pass, CR1-J2_Pass, TguLTR5a and TguLTR5d) were active in the ancestor of Neoaves and within the neoavian radiation. Considering that most of the identified retroposon markers that were inserted during the neoavian radiation are LTRs (including all evidence for Eufalconimorphae and Psittacopasserae), we assume that this period of extensive and accelerated speciation events was accompanied by an increased activity of endogenous retroviruses. 
This conclusion coincides with the observation that the zebra finch genome harbours about three times as many LTRs as the chicken genome1. Moreover, our zebra finch TinT pattern indicates that the greatest retroposon diversity was present during and bordering the neoavian radiation, including many different short-lived subtypes of REs. On the basis of these insights, future retroposon studies can easily select the REs that were active during an evolutionary chapter of interest to resolve the remaining uncertainties regarding the earliest divergences within Neoaves.
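The premise behind the nested-retroposon analysis, that younger subtypes tend to insert into older ones more often than the reverse, can be illustrated with a deliberately naive score. The sketch below is not the TinT likelihood model; it only tallies how often each subtype appears as the inserted element versus the host. The function names, the score and the toy input (with placeholder subtype names) are all assumptions made for illustration and are not data from this study.

```python
from collections import Counter


def nesting_counts(events):
    """Count (inserted_subtype, host_subtype) pairs from nested-RE events."""
    return Counter(events)


def relative_age_order(events):
    """Order subtypes from oldest to youngest by a naive score.

    Score = (times seen as inserted element) / (times seen in either role).
    A subtype that is mostly a host scores low (older); one that is mostly
    the inserted element scores high (younger). Illustration only.
    """
    inserted = Counter(inner for inner, _ in events)
    host = Counter(outer for _, outer in events)
    subtypes = set(inserted) | set(host)
    score = {s: inserted[s] / (inserted[s] + host[s]) for s in subtypes}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(subtypes, key=lambda s: (score[s], s))


# Toy input: each tuple is (inserted subtype, host subtype). Invented values.
toy_events = [
    ("young", "old"), ("young", "old"), ("young", "mid"),
    ("mid", "old"), ("mid", "old"), ("old", "mid"),
]
print(relative_age_order(toy_events))
```

With this toy input, the ordering places the subtype that mostly serves as a host first and the subtype that only ever appears as the inserted element last.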
Our results have far-reaching implications from more than an ornithological point of view. In addition to the reconstruction of speciation events in early bird phylogeny, we have established a calibrated chronology of retroposon activity during the Mesozoic Era of bird evolution. We identified retroposons that were inserted at the very beginning of the neoavian radiation and were probably subjected to incomplete lineage sorting, a phenomenon that likely accounts for some of the incongruent results from sequence-based phylogenies. Retroposons constitute unique tools for understanding such complex and otherwise irresolvable evolutionary scenarios27. Furthermore, we have determined a statistically significant resolution of a later part of the neoavian radiation, namely, the sister group relationship of passerines and parrots (Psittacopasserae) and their mutual affinity to falcons (Eufalconimorphae). Our retroposon evidence can serve as a robust prior hypothesis for future studies focusing on these bird taxa. As such, parrots and passerines not only share the ability to learn vocalization2, but also have a direct common ancestor. Although hummingbirds are also vocal learners31, our phylogeny indicates that they are only distantly related to Psittacopasserae; therefore it is most parsimonious to assume that their vocal learning capability evolved after the divergence of hummingbirds and swifts (Fig. 4). Nevertheless, the phylogenetic resolution of Psittacopasserae raises the question as to what extent the striking neuroanatomical and gene expression parallels2 (for example, the anterior-medial vocal pathway32) between parrots and oscine passerines (songbirds) are homologous and thus evolved in their shared ancestor (Fig. 4).
Behavioural and neuroanatomical data on 'non-oscine' passerines (Suboscines and Acanthisittidae) are scarce33 and, to our knowledge34, limited to New World Suboscines, suggesting that some representatives do not learn vocalizations (that is, Tyrannidae35,36,37), whereas others possibly do (that is, the earlier-branching38 Cotingidae33 and Pipridae39). Thus, to assume that vocal learning evolved in the psittacopasseran ancestor (with a secondary loss in at least one lineage of suboscine passerines) seems more parsimonious than hypothesizing four independent evolutions of vocal learning within Psittacopasserae. Accordingly, the emergence of vocal learning of songbirds would have happened at least 30 million years30 earlier than evident from the previous assumption of the independent evolution of cerebral vocal nuclei40 in parrots and in (oscine) passerines. Thorough reevaluation of this issue will impact various conclusions drawn from passerines and might thereby change our current understanding of the evolution of vocal learning in general.
We used three different search strategies to computationally screen over 200,000 REs present in the chicken and zebra finch genomes (see Supplementary Table S1 for information on the contribution of each strategy to the 51 phylogenetically informative markers). On the basis of their suitability for cross-species PCR amplification (that is, only retroposon insertions situated in well-conserved intronic or intergenic regions smaller than 1.5 kb were considered), we identified 131 CR1 and 75 LTR candidate RE-containing loci. These loci were then experimentally screened in a reduced taxon sampling (comprising Nestor, Falco, Picus, Buteo, Ciconia and Columba for zebra finch REs; in the case of chicken and emu REs, the reduced taxon sampling consisted of the representatives of Galloanserae and Palaeognathae), revealing our 51 phylogenetically informative markers.
In silico screening
In the first strategy (part a), genomic three-way alignments (comprising emu, chicken, and zebra finch) were compiled by MAFFT41 (FFT-NS-2, version 6, http://mafft.cbrc.jp/alignment/server/index.html) using ~2.55 million bp of emu genomic contigs available in GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) and the corresponding regions in the chicken and zebra finch genomes (assemblies galGal3 and taeGut1 in Genome Browser42, http://genome.ucsc.edu/cgi-bin/hgBlat). REs were annotated using CENSOR (http://www.girinst.org/censor/index.php), and retroposon insertion loci situated in well-conserved intronic or intergenic regions were chosen for primer generation. To identify additional candidate loci (first strategy, part b), all avian sequences available in GenBank were screened for REs and (if a retroposon was present) aligned to the corresponding regions in the chicken and zebra finch genomes using MAFFT (E-INS-I, version 6). In the second strategy, based on the insights gained from the first strategy into the phylogenetic informativeness of representatives of certain CR1 and LTR subfamilies for our phylogenetic questions of interest, whole-genome in silico screenings for selected retroposons were conducted. This was done by extracting retroposon insertions including their flanking sequences (1 kb of each flank) from chicken or zebra finch genomes and BLAST screening these against chicken annotated unique exonic sequences to obtain well-conserved loci (<1.5 kb). Alternatively, retroposon consensus sequences from Repbase (http://www.girinst.org/repbase/index.html) were BLAT43 screened against chicken or zebra finch genomes and well-conserved loci in introns (of any size) or intergenic regions were chosen for primer generation. In the third strategy, a CR1-enriched retroposon library of emu genomic DNA was constructed via a protocol utilizing digestion and circularization of genomic DNA and subsequent inverse PCR44.
A total of 242 clones were sequenced and BLAT screened against chicken and zebra finch genomes to find CR1 insertions (situated in well-conserved regions) specific to the lineage leading to the emu and suitable for experimental presence/absence screening.
Our whole taxon sampling (voucher numbers of the samples in the LWL-DNA- und Gewebearchiv of the Museum für Naturkunde Münster are specified) consisted of representatives of the key lineages13,14 within Palaeognathae (Struthio camelus (LWL00446), Pterocnemia pennata (LWL00447), Eudromia elegans (LWL00448), Dromaius novaehollandiae (LWL00449)), Galloanserae (Dendrocygna viduata (LWL00450), Anas crecca (LWL00451), Alectura lathami (LWL00452), Gallus gallus (LWL00453)) and Neoaves (Chrysolampis mosquitus (LWL00458), Apus apus (LWL00459), Opisthocomus hoazin (LWL00457), Phoenicopterus ruber roseus (LWL00454), Tachybaptus ruficollis (LWL00455)/Podiceps cristatus (LWL00456), Columba palumbus (LWL00408), Carpococcyx renauldi (LWL00460)/Cuculus canorus (LWL00461), Balearica pavonina (LWL00462), Larus ridibundus (LWL00463), Ciconia ciconia (LWL00464)/C. boyciana, Urocolius macrourus (LWL00465), Cathartes aura (LWL00466)/Gymnogyps californianus, Buteo lagopus (LWL00467)/Gyps fulvus (LWL00468), Trogon viridis (LWL00469), Picus viridis (LWL00470), Alcedo atthis (LWL00105), Asio otus (LWL00417), Cariama cristata (LWL00474), Falco sparverius (LWL00471), Nestor notabilis (LWL00472), Acanthisitta chloris (LWL00475) and Taeniopygia guttata (LWL00473)). Species identity was confirmed by direct sequencing of a fragment of the mitochondrial ND2 gene using the published primers L5216+H6313 (courtesy of Michael D. Sorenson, Boston University) listed in Supplementary Table S2, and subsequent BLAST screening against GenBank's nucleotide collection and our own unpublished mitochondrial sequences. If no sequence or only the sequence of a closely related species was publicly available, we deposited the respective new ND2 sequence in GenBank.
In vitro screening
The marker candidates selected using our three in silico screening strategies were experimentally tested for their phylogenetic informativeness (see Supplementary Table S1 for presence/absence patterns of the 51 phylogenetically informative markers) using a taxon sampling that is essential for a phylogenetic conclusion. Genomic DNA was isolated from blood or muscle tissue using conventional phenol–chloroform extraction, whereas contour feathers were processed either via the QIAamp DNA Micro kit (Qiagen) using a modified protocol45 or using a rapid simple alkaline extraction46. Each 25-μl PCR reaction contained 0.5 U ThermoPrime Taq DNA Polymerase (ABgene), 75 mM Tris–HCl, pH 8.8, 20 mM (NH4)2SO4, 0.01% (v/v) Tween 20, 2.5 mM MgCl2, 0.1 mM of each deoxyribonucleotide triphosphate, 10 pmol of each primer (see Supplementary Table S2 for primer sequences) and >5 ng of genomic DNA. PCRs were carried out using the touchdown PCR strategy; 2 min at 94 °C were followed by 10 cycles of 30 s at 94 °C, 30 s at 55 °C (decreasing by 1 °C per cycle) and 80 s at 72 °C. The final 26 cycles of 30 s at 94 °C, 30 s at 45 °C and 80 s at 72 °C were followed by 120 s at 72 °C. Subsequent to agarose gel electrophoresis, all PCR products were immediately purified or excised from agarose gels and then purified. Sequencing of the samples was conducted either directly using the specific PCR primers or indirectly using standard M13 forward and reverse primers after ligation into the pDrive Cloning Vector (Qiagen) and electroporation into TOP10 cells (Invitrogen).
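The touchdown PCR profile described above can be written out step by step as a check on the cycle counts and annealing temperatures. The sketch below assumes that the first touchdown cycle anneals at 55 °C and the tenth at 46 °C, so that the remaining 26 cycles at 45 °C continue the series; this is one reading of "decreasing by 1 °C per cycle". The function name and the step labels are hypothetical.

```python
def touchdown_program():
    """Return the thermocycler steps as (label, temperature_C, seconds)."""
    steps = [("initial denaturation", 94, 120)]
    # Touchdown phase: 10 cycles, annealing 55 °C down to 46 °C (assumed reading).
    for cycle in range(10):
        steps += [
            ("denature", 94, 30),
            ("anneal", 55 - cycle, 30),
            ("extend", 72, 80),
        ]
    # Constant phase: 26 cycles with annealing at 45 °C.
    for _ in range(26):
        steps += [("denature", 94, 30), ("anneal", 45, 30), ("extend", 72, 80)]
    steps.append(("final extension", 72, 120))
    return steps


program = touchdown_program()
anneal_temps = [temp for label, temp, _ in program if label == "anneal"]
total_seconds = sum(seconds for _, _, seconds in program)

print(anneal_temps[:10])   # annealing temperatures of the touchdown phase
print(total_seconds / 60)  # programmed hold time in minutes, excluding ramping
```

Summing the programmed holds gives 5,280 s (88 min) for the 36 cycles plus the initial and final steps; ramping time between temperatures is not included.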
All nucleotide sequences were deposited in GenBank (accession numbers JF915895–JF916445). To complete our taxon sampling, we also used previously published sequences available in Genome Browser (assemblies galGal3 (ref. 47) and taeGut1 (ref. 1)) and GenBank (accession numbers AB112956, AB235826, AB235829, AC153776, AC158282, AC158284–AC158286, AC160232, AF525979, AF525980, DP000685, DP000802, JF279549–JF279555, JF279558–JF279573 and JF279576–JF279590). Some of the sequence data48,49,50 (emu BAC sequences AC153776, AC158282, AC158284–AC158286, AC160232, DP000685 and DP000802; alligator BAC sequences DP000795 and DP000976) were generated by the National Institutes of Health Intramural Sequencing Center (http://www.nisc.nih.gov). The lizard genome sequence (assembly anoCar1 in Genome Browser) was generated by the Broad Institute (http://www.broadinstitute.org).
All sequences of each marker were first automatically aligned using MAFFT (E-INS-I, version 6) and then manually realigned (see Supplementary Data for 51 full sequence alignments). Each alignment was carefully inspected and the retroposon insertion considered a phylogenetically informative marker if, in all species sharing this RE, it featured an identical orthologous genomic insertion point (target site), identical RE orientation, identical RE subtype, identical target site duplications (direct repeats, if present) and a clear absence in other species. Candidate markers exhibiting an RE flanked by >10 bp of nearly identical, low-complexity sequences were excluded from the analysis to minimize the possibility of inconsistencies caused by precise RE excision as reported by van de Lagemaat et al.24
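The inspection criteria listed above can be summarised as a simple predicate over per-species calls. The sketch below is an illustration only: the record type, its field names and the function are hypothetical, and the additional requirements that at least two species share the insertion and at least one species shows a clear absence are assumptions about what makes a marker phylogenetically informative. Ambiguous or missing calls are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Insertion:
    """Features of an RE insertion observed in one species (hypothetical fields)."""
    target_site: int    # alignment coordinate of the insertion point
    orientation: str    # "+" or "-"
    subtype: str        # RE subtype name
    tsd: Optional[str]  # target site duplication sequence, None if absent


def is_informative_marker(calls: Dict[str, Optional[Insertion]]) -> bool:
    """calls maps species -> Insertion if present, or None for a clear absence."""
    present = [ins for ins in calls.values() if ins is not None]
    n_absent = len(calls) - len(present)
    # All species sharing the RE must agree on every recorded feature.
    identical = len(set(present)) == 1
    return len(present) >= 2 and n_absent >= 1 and identical


# Invented example values, for illustration only.
shared = Insertion(target_site=412, orientation="+", subtype="LTR_x", tsd=None)
example = {"species_a": shared, "species_b": shared, "species_c": None}
print(is_informative_marker(example))
```

A locus in which two species carry insertions that differ in orientation, subtype, target site or target site duplication would fail this check, mirroring the manual exclusion described in the text.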
In the case of CR1 retroposon insertions shared among all the investigated bird lineages (markers A-1 to A-6), we initially aligned the avian retroposon flanks to the corresponding BAC sequences of the alligator available in GenBank (DP000795 and DP000976). Because of the ~220 million years of bird/crocodilian sequence divergence51, a classical presence/absence situation could not be ascertained. Although CR1 elements are also found in the genomes of other non-mammalian amniotes48,49,50, we consider these retroposon insertions to be suitable markers for the monophyly of birds, because each of them exhibits a diagnostic 6-nt deletion that is only present in a few bird-specific CR1 subtypes (that is, CR1-Y and CR1-Z) but not in CR1 elements of other amniotes (that is, all BLAST and BLAT search hits of avian CR1 against available genome or BAC sequences of alligator, lizard, turtle, platypus and human were inspected by eye; see Supplementary Fig. S1 for a structural comparison of the well-conserved terminal regions52 of amniote CR1 retroposons including lineage-specific diagnostic insertions or deletions). The majority-rule consensus sequences of the previously unrecognized CR1 subtypes 'ALL-LINEa', 'ALL-LINEb' and 'ANO-LINE' were derived from 17, 25 and 10 BLAST hits, respectively.
On the basis of the presence/absence matrix of our 51 phylogenetically informative markers (Supplementary Table S1), our phylogenetic tree was drawn by hand considering maximum parsimony and independently verified by a maximum parsimony analysis of a 1/0-coded version of our presence/absence matrix (Supplementary Software) in PAUP* (version 4.0b10; using the irrev.up option of character transformation, heuristic search with 1000 random sequence additions, and TBR branch swapping). This yielded one strict consensus parsimony tree (Fig. 1, consistency index=0.895 and tree length=57) derived from 577 equally parsimonious trees.
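The reported consistency index can be checked arithmetically from the figures given above. The sketch below assumes that each of the 51 markers is a binary character requiring at least one step, so that the minimum possible tree length is 51; the consistency index is then the minimum number of steps divided by the observed tree length. The function and variable names are hypothetical.

```python
def consistency_index(min_steps: int, tree_length: int) -> float:
    """Ensemble consistency index: minimum possible steps / observed steps."""
    if tree_length <= 0:
        raise ValueError("tree_length must be positive")
    return min_steps / tree_length


n_markers = 51    # binary presence/absence characters (one step minimum each, assumed)
tree_length = 57  # tree length reported in the text

ci = consistency_index(n_markers, tree_length)
extra_steps = tree_length - n_markers

print(f"CI = {ci:.3f}")
print(f"steps beyond the minimum = {extra_steps}")
```

Under this assumption, 51/57 rounds to the reported value of 0.895, and the tree requires six steps beyond the minimum.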
To determine a chronology of retroposon activity periods, we used the web-based TinT application29 (http://www.compgen.uni-muenster.de/tools/tint/). As input data, the precomputed RepeatMasker files (hosted on the server) from chicken or zebra finch were selected. Only the retroposon subtypes present in the respective figures (see Fig. 3 for the zebra finch TinT of 995 nested REs or Supplementary Fig. S3 for the chicken TinT of 2355 nested REs) were included in the analysis (but note that, in the case of the zebra finch TinT, the retroposons CR1-YB1_Tgu and TguLTR5c were added to the analysis but excluded from Fig. 3) using default parameters. The resultant graph of normal distributions of retroposon activity (ovals represent 75%, vertical lines 95% and horizontal lines 99% of the probable activity period) was plotted on a simplified chronogram30 using the experimentally verified retroposon insertions of Figure 1 as calibration points (for example, the succession of TguLTR5e to TguLTR5d activity in the zebra finch ancestor's genome after the divergence of Galloanserae and Neoaves). For this purpose, we considered the chronogram by Pereira and Baker30 to be most suitable, as it includes molecular divergence times for the Crocodylia/Aves split, the Palaeognathae/Neognathae split, the neoavian radiation, and the Acanthisitta/oscine Passeriformes split (other analyses of molecular divergence times8,12 have only investigated a few of these dates).
Accession codes: The nucleotide sequences have been deposited in the GenBank database under accession numbers JF915895–JF916445.
How to cite this article: Suh, A. et al. Mesozoic retroposons reveal parrots as the closest living relatives of passerine birds. Nat. Commun. 2:443 doi: 10.1038/ncomms1448 (2011).
We thank Judith Brockhues for help with in vitro experiments and Werner Beckmann, Timm Spretke (Zoo Halle), Stephanie Hodges, Sharon Birks (Burke Museum), Sandra Silinski (Zoo Münster), Renate van den Elzen (Museum Koenig), Ommo Hüppop, Nils Anthes, Michael Wink, the LWL-DNA- und Gewebearchiv, Joes Custers, Holger Schielzeth, Gerald Mayr (Senckenberg Museum), Elisabeth Suh, Christoph Bleidorn and Andrew Fidler for providing feather, blood and tissue samples. Jón Baldur Hlíðberg provided the bird paintings and Marsha Bundman helped with editing. We thank Gerald Mayr for valuable comments. This research was funded by the Deutsche Forschungsgemeinschaft (KR3639 to J.O.K. and J.S.) and the Medizinische Fakultät der Westfälischen Wilhelms-Universität Münster.
Presence/absence matrix as a 1/0-code version in PAUP*.