Table 1 Features used in the machine-learning framework.

From: Harnessing machine learning to guide phylogenetic-tree search algorithms

| Feature # | Feature name | Details | Represented action | Tree considered |
|---|---|---|---|---|
| 1 | Total branch lengths | The sum of branch lengths in the starting tree | Shared for pruning and regrafting | Initial tree (a in Fig. 1) |
| 2 | Longest branch | The length of the longest branch in the starting tree | Shared for pruning and regrafting | Initial tree (a in Fig. 1) |
| 3–4 | Branch length | The length of the branch that was being pruned or regrafted | Both pruning and regrafting | Initial tree (a in Fig. 1) |
| 5 | Topology distance from the pruned node | The number of branches in the path between the regrafting and the pruning branches, not including these branches | Regrafting only | Initial tree (a in Fig. 1) |
| 6 | Branch length distance from the pruned node | The sum of branch lengths in the path between the regrafting and the pruning branches, not including these branches | Regrafting only | Initial tree (a in Fig. 1) |
| 7 | New branch length | The approximated length of the new branch formed due to pruning (see Supplementary Note 1 for feature extraction details) | Regrafting only | Initial tree (a in Fig. 1) |
| 8–11 | Number of species | The number of leaves in the four subtrees | Both pruning and regrafting | Each of the four subtrees (b, c, c1, c2 in Fig. 1) |
| 12–15 | Total branch lengths | The sum of branch lengths in the four subtrees | Both pruning and regrafting | Each of the four subtrees (b, c, c1, c2 in Fig. 1) |
| 16–19 | Longest branch | The length of the longest branch in the four subtrees | Both pruning and regrafting | Each of the four subtrees (b, c, c1, c2 in Fig. 1) |

The table lists the 19 features on which the machine-learning algorithm is based, extracted for each data point. Features 1–7 are extracted from the starting tree, while the remaining features are extracted from the four subtrees in Fig. 1. Features 1 and 2 are not affected by SPR moves.
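
To illustrate how several of the tabulated features could be computed, the following is a minimal Python sketch; it is not the authors' implementation. The toy tree, the branch-dictionary representation, and the function names (`whole_tree_features`, `subtree_features`, `path_features`) are assumptions introduced here for illustration. In particular, the table does not state whether the branch connecting a subtree to the rest of the tree counts toward that subtree's totals; this sketch excludes it. Feature 7 depends on Supplementary Note 1 and is not attempted.

```python
from collections import deque

# Hypothetical toy tree (not from the paper): undirected branches with lengths.
# A, B, C, D are leaves; u and v are internal nodes.
BRANCHES = {
    ("A", "u"): 0.1,
    ("B", "u"): 0.2,
    ("u", "v"): 0.3,
    ("C", "v"): 0.4,
    ("D", "v"): 0.5,
}


def adjacency(branches):
    """Build a node -> {neighbour: branch length} map."""
    adj = {}
    for (a, b), length in branches.items():
        adj.setdefault(a, {})[b] = length
        adj.setdefault(b, {})[a] = length
    return adj


def whole_tree_features(branches):
    """Features 1-2: total and longest branch length of the starting tree."""
    lengths = list(branches.values())
    return {"total_length": sum(lengths), "longest": max(lengths)}


def subtree_features(adj, root, parent):
    """Features 8-19 for one subtree: the part of the tree reached from
    `root` without crossing the branch to `parent`.
    Assumption: the branch (parent, root) itself is not counted."""
    lengths, n_leaves = [], 0
    stack = [(root, parent)]
    while stack:
        node, prev = stack.pop()
        children = [n for n in adj[node] if n != prev]
        if not children:
            n_leaves += 1  # nothing further down: this node is a leaf
        for child in children:
            lengths.append(adj[node][child])
            stack.append((child, node))
    return {
        "n_leaves": n_leaves,
        "total_length": sum(lengths),
        "longest": max(lengths, default=0.0),
    }


def path_features(adj, prune, regraft):
    """Features 5-6: count and summed length of the branches strictly
    between the pruning branch and the regrafting branch."""
    start, goal = prune[0], regraft[0]
    prev = {start: None}
    queue = deque([start])
    while queue:  # breadth-first search; paths in a tree are unique
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    # Walk back from goal to start, collecting the branches on the path.
    path, node = [], goal
    while prev[node] is not None:
        path.append((prev[node], node))
        node = prev[node]
    # Drop the pruning and regrafting branches themselves if they are on the path.
    excluded = {frozenset(prune), frozenset(regraft)}
    between = [e for e in path if frozenset(e) not in excluded]
    return {
        "topology_distance": len(between),
        "branch_length_distance": sum(adj[a][b] for a, b in between),
    }


adj = adjacency(BRANCHES)
print(whole_tree_features(BRANCHES))
print(subtree_features(adj, root="v", parent="u"))
print(path_features(adj, prune=("A", "u"), regraft=("C", "v")))
```

In this sketch a subtree is identified by a node and the neighbour it is viewed from, so the same function can be applied to each of the four subtrees once the pruning and regrafting branches are chosen.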