The tree height of the leaves, i.e. the sum of the lengths of all branches connecting a leaf with the root node of the tree, was slightly but significantly (at the 1.0e-40 level) negatively correlated with both bRPD and uRPD. Although this behavior is in apparent conflict with the second design goal, the correlation between tree height and the number of nodes between root and leaf must be taken into account (Table 2). If the effect of the number of nodes is corrected for by replacing the tree height with the residuals from a regression with the number of nodes as explanatory variable, the correlation with bRPD and uRPD becomes moderately strong and positive.

Table 2 Correlations between the balanced (bRPD) and the unbalanced (uRPD) variant of the score for each leaf (“Height”).
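The correction described above can be illustrated with a minimal sketch. It assumes a simple least-squares regression of height on node count and uses Kendall's tau-b for the correlation. The function names and the toy values are hypothetical and are not taken from the study's data or software.

```python
from itertools import combinations
from math import sqrt


def ols_residuals(x, y):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def kendall_tau_b(a, b):
    """Kendall's tau-b, computed by direct comparison of all pairs."""
    concordant = discordant = tied_a_only = tied_b_only = 0
    for (a1, b1), (a2, b2) in combinations(zip(a, b), 2):
        da, db = a1 - a2, b1 - b2
        if da * db > 0:
            concordant += 1
        elif da * db < 0:
            discordant += 1
        elif da == 0 and db != 0:
            tied_a_only += 1
        elif db == 0 and da != 0:
            tied_b_only += 1
        # pairs tied in both variables contribute to neither term
    untied_in_a = concordant + discordant + tied_b_only
    untied_in_b = concordant + discordant + tied_a_only
    denom = sqrt(untied_in_a * untied_in_b)
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical toy values for six leaves (illustration only).
nodes = [3, 5, 5, 8, 10, 12]                     # nodes between root and leaf
heights = [0.30, 0.42, 0.55, 0.61, 0.80, 0.85]   # root-to-leaf branch length sums
scores = [0.09, 0.05, 0.08, 0.03, 0.04, 0.02]    # stand-in for a per-leaf score

# Height with the effect of the node count removed.
residuals = ols_residuals(nodes, heights)

print("tau(height, score):  ", kendall_tau_b(heights, scores))
print("tau(residual, score):", kendall_tau_b(residuals, scores))
```

By construction the residuals are uncorrelated with the node count in the least-squares sense, so any remaining association with the score reflects branch lengths rather than the number of branches.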

Based on these results, we concluded that both measures comply with design goals (i) and (ii) but ultimately preferred bRPD, because its correlations with the indicator of topological isolation on the one hand and with the independent effect of the branch lengths on the other hand were better balanced than those of uRPD. The differences between the two measures were not pronounced, however, particularly regarding the top-scoring species; in addition to Table 2, this is shown in the scatter plot in Figure 2 and in Table 3.

Figure 2 Scatterplot showing the relationship between the two examined variants of the phylogenetic scoring, bRPD (x-axis) and uRPD (y-axis). In addition to the fact that the overall correlation between the two measures is high (see also Table 2), it is obvious …

Table 3 Selection results for the 20 LTP strains with the highest bRPD scores.

The number of nodes between the root and each leaf (“# nodes”) and the residuals of a linear regression with the number of nodes as explanatory and the height as dependent variable (“Residual”). These residuals represent the average impact of the branch lengths, independent of the number of branches that contribute to the height. The lower left triangle shows Kendall’s correlation coefficients, the upper right triangle shows the corresponding p values.

Selection of targets for genome sequencing

In addition to the close correspondence between the two measures, Figure 2 demonstrates that the distributions of both bRPD and uRPD are strongly asymmetric: comparatively few strains (close to the upper right corner) display very high values, whereas the bulk of the strains show at most moderately high bRPD and uRPD values (close to the lower left corner). This behavior is confirmed by Figure 3, which shows that 50% saturation regarding bRPD would already be obtained if only about 2,000 of the 8,029 strains were genome sequenced.

Figure 3 Saturation plot for the bRPD measure.
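The idea of a saturation plot can be sketched as follows. The text does not define how saturation is computed, so this example assumes it is the cumulative share of the summed score covered when strains are selected in descending order of score. The function names and toy scores are hypothetical.

```python
def saturation_curve(scores):
    """Cumulative fraction of the total score, taking strains in descending order.

    Assumption: saturation = running sum of selected scores / sum of all scores.
    """
    ordered = sorted(scores, reverse=True)
    total = sum(ordered)
    running = 0.0
    curve = []
    for score in ordered:
        running += score
        curve.append(running / total)
    return curve


def strains_needed(scores, fraction):
    """Smallest number of top-scoring strains whose scores reach `fraction`."""
    for count, covered in enumerate(saturation_curve(scores), start=1):
        if covered >= fraction:
            return count
    return len(scores)


# Hypothetical, strongly skewed toy scores (illustration only).
toy_scores = [8.0, 4.0, 2.0, 1.0, 1.0]
print(strains_needed(toy_scores, 0.5))
```

With a skewed distribution such as this one, a small fraction of the strains already covers half of the total score, which is the pattern the saturation plot is meant to reveal.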