Study on attractors during organism evolution

An important question in the study of organism evolution is whether evolution should be treated as a continuous process or whether groups of organisms fall into 'local' attractors during evolution. A similar question arises when considering the development of cells after cancer transformation. Answers to these questions can provide a better understanding of how normal and transformed organisms evolve. So far, no satisfactory answers have been found. To find the answers and demonstrate that organisms get trapped in 'local' attractors during evolution, an artificial neural network supported by a semihomologous approach and the unified cell bioenergetics concept has been used in this work. A new universal model of cancer transformation and cancer development has been established and is presented to highlight the differences between the development of transformed cells and normal organisms. An unequivocal explanation of cancer initialization and development has not been discovered so far; the proposed model should therefore shed new light on the evolution of transformed cells.

Random genetic mutations and natural selection are the factors that, according to existing theories, act as driving forces of organism evolution 1,2. These phenomena may point to a continuous or a discontinuous character of evolution. If evolution is a discontinuous process, large differences between the genomes of evolving groups of organisms should be visible. In this case, it can be said that whole groups of organisms are trapped in different attractors and evolve inside them. The term 'attractor' means a configuration (a set of values for the variables) towards which the system evolves over time. After attaining an attractor, a given configuration of a system is sufficiently stable to return to the original state once an eventual perturbation has disappeared 3. In this work, the attractors which trap genomes of organisms are termed 'genome attractors'. For all cells of an organism trapped in a genome attractor, special gene expression programs (termed 'cell-fates') are activated that bring the whole organism to life and keep it alive.
Most animal studies have used single mitochondrial DNA genes to evaluate population or low-level taxonomic relationships [4][5][6][7]. One of the most useful genes for phylogenetic reconstruction is cytochrome b, which is commonly used in systematic research to address divergences at many taxonomic levels 8,9. Cytochrome b, alone or supported by other data sets (for example the nuclear ribosomal rRNA gene, the cytochrome oxidase I gene (COI), complete mitochondrial DNA), can yield phylogenetic trees that are in agreement with well-established phylogeny [9][10][11][12][13][14][15][16]. Research indicates that cytochrome b is superior to COI when one locus is to be used as a standard for mammalian species phylogeny and identification 16. Cytochrome b is very useful as a 'fingerprint' of organisms because it can harbour very few mutations due to the stringent structural and physiological constraints it obeys. This implies that all the observed mutations are 'function preserving' and thus 'fitness independent', so that they can be considered evolutionarily neutral mutations that work only as a 'time-keeping' measure of the evolutionary time separating two species. For these reasons, cytochrome b sequences of the selected groups of organisms have been examined in this work to check whether the evolving groups of organisms are trapped in genome attractors. Especially important for checking genome attractors is the ability to use cytochrome b sequence variability to compare organisms in the same genus or the same family 17,18. One example of using cytochrome b alone as a molecular marker is the establishment of phylogenetic relationships at various levels within the fish family Cichlidae 9. The trees obtained from the analysis based on cytochrome b alone were consistent with the trees obtained from the extended (total) analysis 9. Other authors have shown that a partial DNA sequence of cytochrome b can be sufficient for animal identification. This has been demonstrated on the example of the identification of the remains of endangered animals and species endemic to Taiwan (i.e. clouded leopards, leopard cats, lions, tigers, water buffalos and selected Formosans) 19.
The reconstruction of evolution on genetic levels usually requires applying a distance correction to take into account the impact of intermediate/invisible stages of evolution. For this reason, the selection of a stochastic
Design and teaching the artificial neural network. The artificial neural network (ANN) has been designed as a full-synapse three-layer neural network and it has been taught in a way similar to that presented in 18. These three layers (i.e. the input, hidden and output layer) are composed of neurons, all having the same characteristics. Each layer has been implemented as a sigmoid layer and transfers the input pattern to the output pattern by executing a sigmoid transfer function (y = 1/(1 + exp(-x))) that smoothly limits the output to the range between 0 and 1 30. The layers are connected using synapses that permit a pattern to be passed from the input layer to the hidden layer and then from the hidden layer to the output layer. In the implemented neural network, all the nodes of the input layer are connected with all the nodes of the hidden layer and all the nodes of the hidden layer are connected with all the nodes of the output layer. Because the number of amino-acids (AA) in the cytochrome b sequences is usually not bigger than 400 for almost all organisms, it has been assumed that the length of the sequences at the input of the neural network is equal to 400 AA. In order to obtain a length equal to 400, the sequences have been aligned by adding "−" characters at the end of the sequences or have been cut. After the alignment, each sequence has been converted to binary form by changing each character to a randomly generated five-position binary number. After the conversion, the new binary form of the sequence (which is entered into the neural network input layer) has a length of 2000 and the number of neurons in the neural network input layer is equal to 2000 (i.e. n = 2000).
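The alignment and binary coding described above can be sketched as follows. This is a minimal Python illustration, not the EvolutionXXI implementation (which is written in Java): the alphabet `ALPHABET`, the helper names `make_coding` and `encode`, the use of a seed, the ASCII "-" as padding character, and the choice to give every character a distinct code are all assumptions made for the example. The input string is an arbitrary short amino-acid string, not a real cytochrome b record.

```python
import random

SEQ_LEN = 400        # input length in amino acids, as stated in the text
BITS_PER_CHAR = 5    # five-position binary number per character
# Assumed alphabet: the 20 standard amino acids plus the "-" padding character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def make_coding(seed):
    """Draw one random 5-bit code per character (hypothetical helper).

    Codes are sampled without replacement so that no two characters share
    a code; the text does not state this, it is an assumption.
    """
    rng = random.Random(seed)
    values = rng.sample(range(2 ** BITS_PER_CHAR), len(ALPHABET))
    return {ch: [int(b) for b in format(v, "05b")]
            for ch, v in zip(ALPHABET, values)}


def encode(sequence, coding):
    """Pad with "-" or cut to 400 characters, then expand to 2000 bits."""
    fixed = sequence[:SEQ_LEN].ljust(SEQ_LEN, "-")
    return [bit for ch in fixed for bit in coding[ch]]


coding = make_coding(seed=1)
pattern = encode("MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLG", coding)
print(len(pattern))  # 2000
```

Calling `make_coding` with 50 different seeds would give 50 different codings, one per taught network version. Characters outside the assumed alphabet raise a `KeyError` in this sketch.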
The number of neurons in the output layer (k) is equal to the number of organisms used for ANN teaching (i.e. k = 36). The number of neurons in the hidden layer (m) is calculated by the geometric pyramid rule 31 , i.e. m = sqrt(n*k) neurons (in this work m = round(sqrt(n*k)) = 268).
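The layer sizes and the sigmoid forward pass can be illustrated with a short Python sketch. It is not the Joone-based network used in this work: the helper names, the random initial weight range and the omission of bias terms are assumptions made to keep the example small, and the input pattern is an arbitrary bit vector.

```python
import math
import random


def sigmoid(x):
    """Transfer function from the text: y = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_size(n, k):
    """Geometric pyramid rule: m = round(sqrt(n * k))."""
    return round(math.sqrt(n * k))


def make_weights(n_in, n_out, rng):
    """Full synapse: one weight from every input node to every output node.

    The initial range [-0.05, 0.05] is an assumption for this sketch.
    """
    return [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
            for _ in range(n_out)]


def layer_output(weights, inputs):
    """Weighted sum per neuron followed by the sigmoid (biases omitted)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]


def network_output(pattern, w_hidden, w_output):
    """Pass a pattern through the input, hidden and output sigmoid layers."""
    activated = [sigmoid(x) for x in pattern]  # the input layer is sigmoid too
    hidden = layer_output(w_hidden, activated)
    return layer_output(w_output, hidden)


n, k = 2000, 36
m = hidden_size(n, k)
print(m)  # 268

rng = random.Random(0)
w_hidden = make_weights(n, m, rng)
w_output = make_weights(m, k, rng)
outputs = network_output([0, 1] * (n // 2), w_hidden, w_output)
print(len(outputs))  # 36
```

Every output lies strictly between 0 and 1, which is what allows the 36 outputs to be read as similarity values.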
Sequences of cytochrome b of 36 organisms have been used to teach the neural network (see point A.1 in Appendix): https://github.com/biopgms/bioattr/blob/main/teaching_sequences.xml. These 36 organisms have been selected from a wide spectrum of evolution, from primitive organisms to evolutionarily advanced organisms. The 36 cytochrome b sequences selected in this way can be considered as the pattern sequences necessary to recognize the other, examined organisms. The supervised learning technique with the on-line backpropagation algorithm has been used for teaching the neural network. This technique allows the neural network to be taught very effectively 30. The teaching output for the first sequence (i.e. the cytochrome b sequence of Bacteria {#1}) was equal to "0000…0001". For the other sequences used for teaching, "1" was shifted to the left, i.e. the teaching output for the second sequence (i.e. for the cytochrome b sequence of Green alga {#2}) was equal to "0000…0010". The teaching output for the 36th sequence (i.e. the cytochrome b sequence of Four-horned antelope {#36}) was equal to "1000…0000". In this way, the results obtained at each of the 36 outputs are in the range [0, 1] and inform about the recognized similarities of an examined organism to the 36 organisms used to teach the ANN (where value 0 means the minimum recognized similarity and value 1 means the maximum recognized similarity). The ANN was taught (with learning rate = 0.3 and momentum = 0.1) until the Root Mean Squared Error (RMSE) was less than 0.001. The teaching time needed to decrease the RMSE below 0.001 was approximately 6 days on an IBM HS23 computer (2CPU Xeon E5-2650 × 4 Core, 2.00 GHz, RAM = 12 GB, HDD = 50 GB). The teaching process has been carried out in parallel 50 times, each time for different randomly generated five-position binary numbers that code each character in the sequences used to teach the neural network.
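The teaching outputs and the stopping criterion described above can be written down in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the exact way the RMSE is aggregated over outputs and patterns in the Joone framework is not given in the text, so the plain definition below is an assumption.

```python
import math

RMSE_TARGET = 0.001  # teaching stops once the RMSE falls below this value


def teaching_output(index, k=36):
    """Target string for the organism numbered `index` (1-based).

    Organism #1 has its "1" at the far right; each following organism
    shifts the "1" one position to the left.
    """
    if not 1 <= index <= k:
        raise ValueError("index must be between 1 and k")
    return "0" * (k - index) + "1" + "0" * (index - 1)


def rmse(outputs, targets):
    """Root mean squared error between network outputs and targets."""
    squared = [(o - t) ** 2 for o, t in zip(outputs, targets)]
    return math.sqrt(sum(squared) / len(squared))


print(teaching_output(1)[-4:])   # 0001
print(teaching_output(2)[-4:])   # 0010
print(teaching_output(36)[:4])   # 1000

targets = [int(c) for c in teaching_output(2)]
print(rmse(targets, targets) < RMSE_TARGET)  # True
```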
In this way, 50 versions of the neural network have been obtained (they can be downloaded from: http://staff.uz.zgora.pl/akaspers/bioattr/saved_NNs.zip), each version taught using a different amino-acid coding. Then, each examined
Semihomologous approach. The semihomologous method assumes that the one-point mutation in the codons of compared amino-acids is the most frequent mechanism occurring in homologous proteins. This method posits close relations between amino-acids and their codons for the analysis of various relationships between proteins. Compared to the standard homologous approach, the semihomologous approach improves the accuracy of protein sequence comparison, which helps to avoid misinterpretation of results [23][24][25][26][27]. The semihomologous algorithm assumes the existence of the following position types when comparing two amino-acids:

Results and discussion
In this section, the results of two approaches used to examine the evolution of selected organisms are presented. These two approaches use the artificial neural network (ANN) and the semihomologous Dot-Matrix method, which are implemented as the EvolutionXXI and dotPicker programs.
Human evolution-ANN approach. Human evolution has been examined taking into account the evolution of monkeys (i.e. Tree shrews, Prosimians, New World Monkeys (NWM), Old World Monkeys (OWM)), Other hominoids (here: hominoids except for Old humans) and Old humans (here: Homo heidelbergensis, Homo sapiens ssp. Denisova, Homo sapiens neanderthalensis). The evolutionary similarities (recognized using ANN) between selected organisms from these groups and Homo sapiens are presented in Tables 1-6 (see Appendix). Analysis of the results points out the evolutionary distances between these groups, i.e. it appears that the organisms of these groups are trapped in local genome attractors. Assuming that these attractors are in orbits (additionally see Remark 1), the distance between the orbits of the Old human attractor and the Homo sapiens attractor is very small, with a distance factor equal to 1.0013 (i.e. Homo sapiens attractor orbit/Old human attractor orbit ≈ 1.0013). The distance between the orbits of the Other hominoid attractor and the Old human attractor is bigger, with a distance factor equal to 1.1. The distances between the other orbits are much bigger, with distance factors equal, respectively, to 3.2, 12.4, 17.8 and 9.3 (Fig. 1). The small arrows pointing from the genome attractors to the outside of the orbits schematically represent disturbances of the attractor orbits (i.e. attractions to the other organisms to which similarities have been recognized by ANN) (Fig. 1).
It should be noted that the average ("#/$") factor increases during the evolution of these organisms; it is less than 1 for the attractors of Tree shrews, Prosimians and NWM, and bigger than 1 for the attractors of OWM, Other hominoids and Old humans (Fig. 3).
It should also be noted that using only the semihomologous approach it is impossible to separate most groups of organisms (i.e. Tree shrews, Prosimians, NWM, OWM) because the number of homologous and semihomologous positions and positions with two and three point mutations is almost the same for these groups (see Remark 2).
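The position categories counted by the semihomologous approach (homologous, semihomologous, two-point and three-point mutation positions) can be illustrated with a small Python sketch. It is not the dotPicker algorithm: it assumes that a pair of amino-acids is classified by the minimum number of nucleotide differences between any of their codons, and it uses the standard genetic code, whereas cytochrome b is encoded by mitochondrial DNA, whose code differs in a few codons. All names are hypothetical.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every amino acid to the list of codons that encode it.
CODONS = {}
for triplet, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(triplet))

LABELS = {
    0: "homologous",
    1: "semihomologous",
    2: "two-point mutation",
    3: "three-point mutation",
}


def min_point_mutations(aa1, aa2):
    """Smallest number of nucleotide changes turning a codon of aa1 into one of aa2."""
    return min(
        sum(a != b for a, b in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )


def position_type(aa1, aa2):
    """Classify one aligned pair of amino-acids."""
    return LABELS[min_point_mutations(aa1, aa2)]


def count_positions(seq1, seq2):
    """Count each position type over two aligned sequences of equal length."""
    counts = dict.fromkeys(LABELS.values(), 0)
    for a, b in zip(seq1, seq2):
        counts[position_type(a, b)] += 1
    return counts


print(position_type("F", "L"))  # semihomologous
print(position_type("M", "Y"))  # three-point mutation
```

Two groups of organisms with nearly identical counts from `count_positions` would be indistinguishable by these totals alone, which is the situation described above for Tree shrews, Prosimians, NWM and OWM.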

Remark 2
One of the key properties of neural networks is generalization, which allows for the correct recognition and classification of previously unseen/unknown patterns (i.e. sequences that have not been used for teaching, when referring to the considerations presented in this article). The advantage of using ANN (compared to the semihomologous approach) for the separation of organisms into groups that represent attractors may be due to the fact that, during recognition, ANN takes into account not only amino-acid similarities/dissimilarities but also the distribution of these similarities/dissimilarities, i.e. recognition using ANN may resemble recognizing images formed ("painted") by sequences.
(Tables 7-10 in Appendix). Analysis of the results points out the evolutionary distances between these groups, i.e. it appears (as in the case of human evolution) that the organisms of these groups are trapped in local genome attractors. Assuming that these attractors are in orbits (additionally see Remark 1), the distance between the orbits of the Saccharomyces attractor and the Saccharomyces cerevisiae attractor can be considered as average, with a distance factor equal to 1.4 (i.e. Saccharomyces cerevisiae attractor orbit/Saccharomyces attractor orbit ≈ 1.4). The distances between the other orbits are bigger, with distance factors equal, respectively, to 2.2, 2.2 and 11.7 (Fig. 4).
(Fig. 6). It should be noted that in this case ANN enables a much clearer separation of organisms into individual groups compared to the semihomologous approach (see Remark 2).
Study on evolution of the other, selected organisms. The ANN approach has not detected a clear direction in the evolution of bats, hippopotamuses, sirenians, rhinoceroses and squirrels (https://github.com/biopgms/bioattr/blob/main/other_organisms.pdf). This may indicate that these organisms are trapped in local genome attractors (bearing in mind that organisms from a wide range of evolution have been used to teach the neural network). The recognition of evolution using ANN points out that the orbit of the Bat attractor (checked for exemplary bats:

Attractors during development of transformed cells. Cells after cancer transformation display a series of paradoxes; in this view, there is a need for a more system-based framework to understand the complex phenomena associated with the development of cancer 32. This section presents a new system approach (based on the attractor concept) to the universal explanation of cancer transformation and cancer development. It is known that mitochondria play an especially important role during the development of cancer: in certain cancer settings, they can act as neoplastic drivers by generating high levels of oncometabolites that are able to change the genomic and epigenomic landscape of the cell 33. Taking into account that the cancer genome can be considered as a complex network of mutually regulating genes, this network can lose stability and can also, under certain conditions, produce hundreds of stable equilibrium states termed attractors [34][35][36]. In this view, the new universal model of cancer transformation and development has been established and is schematically presented in Fig. 7. This new model can be considered a significant extension, improvement and unification of the proposal presented in 37. In accordance with this new model, two types of attractors can be distinguished (i.e. cancer cell-fate attractors and genome attractors) (Fig. 7).
In accordance with Fig. 7, two types of cancer development can be distinguished, i.e. the vertical and the horizontal development of cancer. Vertical cancer development is based on step-by-step changes of cell-fate attractors. Horizontal cancer development is based on step-by-step changes of genome attractors. Vertical development always occurs during cancer development, while horizontal development is optional, i.e. it may or may not occur (additionally see the schemas of cancer transformation and development presented in Figs. 8, 9, 10, 11). This indicates that cancer development can occur as purely vertical development (during which only vertical development occurs). Cancer development can also occur as a mix of vertical and horizontal development; during that type of development, horizontal development is always followed by vertical development (see Figs. 10, 11).
Cancer development, presented in Fig. 7 as vertical (i.e. from top to bottom) development, occurs through step-by-step changes of cell-fates by cancer clones. This type of cancer development occurs without a change of genome attractor and can occur without DNA mutations (see Remark 3).

Remark 3
Vertical development of cancer is driven by 'non-genetic instability', i.e. it is driven by instability of the phenotype 38. Vertical cancer development occurs as an adaptation of cell-fates to external (environmental) factors. It can also occur as an adaptation of cell-fates to internal factors, among others, random changes of the genome (but without the occurrence of genome re-organization) caused by an elevated level of ROS. That means that vertical cancer development occurs without a change of genome attractor and can occur without DNA mutations (additionally see the schemas of cancer development (Figs. 10, 11)).
After destabilization of the current cell-fate (i.e. destabilization of the current gene expression program), the processes of establishing a new cell-fate (i.e. establishing a new gene expression program) and its stabilization are activated 39. The process of cell-fate stabilization is schematically presented in Fig. 7 as auto-transformation to a cell-fate attractor (i.e. establishing a stable cell-fate means that the cell-fate attains a cell-fate attractor). From this point of view, a cell-fate attractor can be considered as a bioenergetic state toward which a running gene expression program (coded in DNA) strives in order to attain stability. In view of unified cell bioenergetics (UCB), overenergization of mitochondria is one of the reasons for cancer transformation and then cancer development (for details see 37,40,41). Overenergization can cause the switch of the current cell-fate to a cancerous/atavistic cell-fate 37. The aim of the activation of the cancerous/atavistic cell-fate is to protect overenergized mitochondria against an excessive amount of ROS (additionally see Remark 4). The reversal of cancer cells towards early protists was suggested previously [42][43][44][45][46] and formulated by some authors as the atavistic theory of cancer [47][48][49][50][51][52][53]
disruption from the body, occurrence of the Warburg effect and emergence of the other cancer hallmarks that include: sustained proliferative signaling, evasion of growth suppressors, resistance to cell death, enablement of replicative immortality, reprogramming of energy metabolism, evasion of immune destruction, inducement of angiogenesis, and the activation of invasion and metastasis 54. Cancer development also leads to the outgrowth of a clonally derived population of cancer cells. Moreover, tumors contain a repertoire of recruited cells that contribute to the acquisition of the aforementioned hallmark traits by creating a 'tumor microenvironment' 54.

Remark 4
In accordance with unified cell bioenergetics (UCB), a stimulation of aerobic fermentation can inhibit the increase of mitochondrial NADH (mtNADH) and, as a result, inhibit ROS production. From this point of view, the Warburg effect represents a cellular defense strategy that reduces the oxidative stress status of the cells 55. According to current observations, the Warburg effect occurs even in the presence of completely functioning mitochondria, and the glycolytic contribution to total ATP production does not generally exceed 50-60% 56,57. For this reason, OXPHOS (i.e. oxidative phosphorylation) substantially contributes to ATP production after cancer transformation 58. This phenomenon allows the charging of mitochondria with NADH to be maintained in many cancer cell types. Moreover, gaining energy through highly intensive aerobic glycolysis, which occurs after cancer transformation, can additionally inhibit the discharge of overenergized mitochondria. As a result, ROS levels are increased in many types of cancer cells, which is consistent with other research findings 59.
An example of cancer development through changes of cell-fate attractors is the development which occurs after stimulation of the MCF-7 breast-cancer cell line (trapped in a genome attractor) by HRG (heregulin) 39. It is known that HRG stimulation induces cell differentiation 60. HRG activates the ErbB receptor with sustained extracellular signal regulated kinase (ERK) activity. As a result, after stimulation of MCF-7 breast-cancer cells by HRG, the destabilization of the current cell-fate occurs, which induces the cells to leave the current cell-fate attractor and to be trapped in another cell-fate attractor, with stabilization of a new cell-fate as an outcome 39. This type of cancer development can be considered as an adaptation (by cell-fate changes) to changes in the environment. From Fig. 7 it is visible that many different, stable phenotypes can be obtained for each genotype. This hallmark of the way cancer develops can be described as one-genotype-many-phenotypes, with a paradigm of 1:n mapping. The ability of the regulatory control structures of a system to produce more than one stable system state is called multi-stability 38.
Cancer genome attractors. Cancer development, presented in Fig. 7 as horizontal (i.e. from right to left) development, is based on step-by-step changes of genome attractors by cancer clones. This type of cancer development is optional (it may or may not occur). The occurrence of this type of development can be considered as a supporting (important) component of the development of some cancers, leading to, among others, polyploidy and aneuploidy. Horizontal cancer development is associated with genome re-organization and is driven by genome instability (GIN). In accordance with unified cell bioenergetics (UCB), cancerous mitochondria generate an excessive amount of reactive oxygen species (ROS) (for details see 37,40,41). Thus, cancer cells exhibit increased levels of ROS compared to normal cells. A high level of ROS causes an increase in the number of random DNA mutations 61. As a result, the probability of changes (by these mutations) of DNA fragments that code the mechanisms responsible for monitoring the integrity of the genome increases 61 62. Both CIN and CSI are associated with advanced stages of cancer development (characterized by, among others, increased resistance to chemotherapy and invasiveness) 62. The chromosomal changes induced by CIN and CSI provide the driving mechanism that allows cancer cells to sample the genomic landscape 62. The aim of the sampling is to find an aneuploid karyotype that may be transformative or best suited for growth in stressful environments 62. This sampling is supported by the cloning mechanism. It should be noted that cancers are clonal for aneuploidy above a threshold 68. Aneuploidy is a ubiquitous feature of cancer 69. Generally, aneuploidy can be described as numerical or structural, depending on whether whole chromosomes or portions of chromosomes are gained or lost.
Both of these are distinct from polyploidy, in which cells contain more than two complete sets of chromosomes, but always contain an exact multiple of the haploid number, so the chromosomes remain balanced 70. Aneuploidy and polyploidy occur frequently in tumors 70. Polyploid cells are known to display a greater capacity for adaptation to environmental challenge compared to their diploid counterparts 47. GIN leads to the occurrence of genome chaos 71. Genome chaos is a process of complex, rapid genome re-organization that results in the formation of unstable genomes, which is followed by the potential to establish stable genomes 72. The process of genome stabilization is schematically presented in Fig. 7 as auto-transformation to a genome attractor. Establishing a stable genome means that the cell has been trapped in a genome attractor. From this point of view, the cancer genome attractor can be considered as a physical state in which a genome attains stability. The occurrence of genome chaos and then genome stabilization cause the emergence of a re-organized genome, i.e. it can be said that a modified organism of the same type is emerging. This modified organism is then kept alive by establishing a new cell-fate (i.e. horizontal cancer development is followed by vertical cancer development). In view of the presented information, the schema and block schema of cancer transformation can be depicted as shown in Figs
(i) many carcinogens do not mutate genes; (ii) there is no functional proof that mutant genes cause cancer; (iii) mutation is fast but carcinogenesis is exceedingly slow.
In accordance with the presented schemas, cancer transformation (Figs. 8, 9) and development (Figs. 10, 11) can occur without mutations, only as a result of subsequent cell-fate destabilizations (issues (i) and (ii)). It should be added that cancer transformation and then development can also occur as a result of random mutations changing DNA fragments which code mechanisms responsible for monitoring the integrity of the genome, leading to GIN and consequently to genome chaos (with genome re-organization) followed by a change of genome attractor (see the "Cancer genome attractors" section). After cancer transformation, cancer development can also occur as horizontal cancer development (as a result of subsequent genome destabilizations) followed by vertical cancer development (as a result of subsequent cell-fate destabilizations) (Figs. 7, 10, 11). After destabilization of the current cell-fate of a normal cell as a result of fast occurring mutations, the cells can undergo cancer transformation and, as a result, attain a cancerous/atavistic cell-fate, but only with a very small probability (Figs. 8, 9); for this reason cancer transformation requires a long time (issue (iii)).

Conclusions
This work presents new approaches to the separation of organisms into groups that represent attractors (i.e. artificial neural network, semihomologous Dot-Matrix method and unified cell bioenergetics concept). The analyses carried out point out that pattern recognition by a neural network allows for very effective and clear organism separation (see Remark 2). The semihomologous Dot-Matrix method confirms the results and is a good method for detailed attractor analyses. Analysis of the development of normal exemplary organisms (i.e. human and yeasts) points out that the organisms get trapped in local attractors during evolution (Figs. 1, 4)
level in normal cells is moderate; for this reason ROS can stimulate living processes without a big impact on changes of genome attractors. Compared to attractors of normal cells, cancer attractors are very unstable. In accordance with the proposed new universal model, cancer transformation and then development can occur without genome re-organization (vertical development in Fig. 7), i.e. as step-by-step cancer development from one cell-fate attractor to the next cell-fate attractor (see Figs. 8, 9, 10, 11 and Remark 3). However, a higher level of ROS in cancer cells (see Remark 4) can also lead to repeated occurrences of genome chaos and, as a result, permanent changes of genome attractors during cancer development, leading to instability of current gene expressions and (as a result) changes and stabilization of new cell-fates. When viewed from outside, there can be an impression that cancer cells want to escape from the internal ROS flame through permanent changes of genome attractors followed by an adaptation of gene expression to the re-organized genome by attaining new cell-fate attractors. In sum, considering this case, cancer transformation and then development can also occur as a result of genome re-organizations (horizontal development in Fig. 7), i.e. as step-by-step cancer development from one genome attractor to the next genome attractor followed by vertical development (see Figs. 8, 9, 10, 11).

Data availability
Cytochrome b amino-acid sequences selected for this study were taken from the protein databases NCBI and Protein BLAST. All data generated or analyzed during this study are included in this article.

Code availability
All calculations presented in this article have been made using the EvolutionXXI and dotPicker programs written by the authors (freely available at https://github.com/biopgms/bioattr). The EvolutionXXI program has been written in Java using the Joone framework. This program contains the implemented neural network. The EvolutionXXI program can be run on any platform with an installed Java Virtual Machine (JVM). The dotPicker program has been written in C#. This program contains the implemented multidimensional semihomologous Dot-Matrix method. The dotPicker program can be run on Windows with an installed .NET Framework.