Tree Building & Transformations

The tree building methods listed here ensure that the nodes of the tree(s) they build are fully initialized, i.e. they have a unique number, binary representation and a height. Therefore there is no need to run initialize_tree! or update_tree! after running them.

Matrix Representation

MCPhyloTree.leave_incidence_matrix - Method
function leave_incidence_matrix(root::G)::Matrix{Float64} where {G<:AbstractNode}

Calculate the leaf incidence matrix of the tree whose root node is root. For a tree with $m$ leaves and $n$ vertices, this function returns an $m \times n$ matrix $L$, where $L_{ij} = 1$ if vertex $j$ is on the path from leaf $i$ to the root of the tree and $0$ otherwise.

Returns the leaf incidence matrix.

  • root : Root node of the tree
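The definition above can be illustrated without the package. The following is a minimal sketch on a toy encoding: the parent vector, the vertex numbering and the helper toy_leaf_incidence are all hypothetical and are not MCPhyloTree's node type or implementation. It also assumes that the leaf and the root themselves count as lying on the path.

```julia
# Hypothetical toy encoding: parent[j] is the parent of vertex j, 0 marks the root.
# Vertices 1, 2, 3 are the leaves A, B, C; vertex 4 joins A and B; vertex 5 is the root.
#
#        5 (root)
#       / \
#      4   3
#     / \
#    1   2
parent = [4, 4, 5, 5, 0]
leaves = [1, 2, 3]

function toy_leaf_incidence(parent::Vector{Int}, leaves::Vector{Int})
    L = zeros(Float64, length(leaves), length(parent))
    for (i, leaf) in enumerate(leaves)
        v = leaf
        while v != 0            # walk from the leaf up to and including the root
            L[i, v] = 1.0
            v = parent[v]
        end
    end
    return L
end

L = toy_leaf_incidence(parent, leaves)
```

Each row marks the vertices visited when walking from one leaf to the root, so the root column contains only ones.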
MCPhyloTree.to_covariance - Method
to_covariance(tree::N, blv::Array{T})::Array{T,2} where {N<:AbstractNode,T<: Real}

Calculate the variance-covariance matrix from tree. An entry (i,j) of the matrix is defined as the length of the path connecting the most recent common ancestor of leaves i and j with the root of the tree.

Returns an Array of Real numbers.

  • tree : Node in tree of interest.

  • blv : branch length vector of tree.

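One way to read this definition: the path from the root to the most recent common ancestor of two leaves consists of exactly the branches that both leaf-to-root paths share. The sketch below computes that on a hard-coded toy tree, ((A:1,B:1):1,C:2). The matrix L, the vector blv and their vertex ordering are assumptions made for illustration and need not match the ordering to_covariance expects.

```julia
using LinearAlgebra

# Toy leaf incidence matrix: rows are the leaves A, B, C; columns are vertices 1..5,
# where vertex 4 is the parent of A and B and vertex 5 is the root.
L = [1.0 0.0 0.0 1.0 1.0;
     0.0 1.0 0.0 1.0 1.0;
     0.0 0.0 1.0 0.0 1.0]

# Assumed ordering: blv[j] is the length of the branch above vertex j (0 for the root).
blv = [1.0, 1.0, 2.0, 1.0, 0.0]

# Entry (i, j) sums the lengths of the branches shared by the two leaf-to-root paths.
C = L * Diagonal(blv) * L'
```

Here C[1, 2] is 1.0, the length of the single branch shared by A and B, and each diagonal entry is the distance from the root to the corresponding leaf.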
MCPhyloTree.to_df - Method

to_df(root::GeneralNode)::Tuple{Array{Float64}, Vector{String}}

This function returns a matrix representation of the tree structure and a vector with the column names. The entry mat[i,j] is the length of the edge connecting node i with node j. Returns a Tuple containing the matrix and a vector of names.

  • root : root of tree used to create matrix representation.
MCPhyloTree.to_distance_matrix - Method
to_distance_matrix(tree::T)::Array{Float64,2} where T <:AbstractNode

Calculate the distance matrix over the set of leaves.

Returns an Array of Floats.

  • tree : root node of tree used to perform calculation.
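For intuition, leaf-to-leaf distances are related to the covariance matrix described under to_covariance: the path between two leaves runs from each leaf up to their most recent common ancestor. The sketch below uses a hard-coded toy covariance matrix for the tree ((A:1,B:1):1,C:2). It illustrates the relationship only and is not a description of how to_distance_matrix is implemented.

```julia
# Toy covariance matrix for the leaves A, B, C (values chosen by hand for illustration).
C = [2.0 1.0 0.0;
     1.0 2.0 0.0;
     0.0 0.0 2.0]

n = size(C, 1)

# Distance = (root to leaf i) + (root to leaf j) - 2 * (root to their common ancestor).
D = [C[i, i] + C[j, j] - 2 * C[i, j] for i in 1:n, j in 1:n]
```

The result is symmetric with a zero diagonal; for example, A and B are 2.0 apart, and each of them is 4.0 away from C.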

Newick Parsing

MCPhyloTree.ParseNewick - Method
ParseNewick(s::String)::Union{GeneralNode, Array{GeneralNode, 1}}

This function takes a string, either a filename or a newick string, and parses it into one or more trees (represented as Node objects). A file should consist solely of newick tree representations, one per line. The function checks for proper newick formatting and raises an error if the string or file is incorrectly formatted.

Newick string input: Returns the root of the tree represented by the newick string. Filename input: Returns an Array of Nodes; each Node is the root of the tree represented by a newick string in the file.

  • s : newick string or name of file containing newick strings to parse.
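A short usage sketch, assuming MCPhyloTree is installed and that the example string is accepted as valid newick. Only functions documented on this page are called; the newick string itself is made up for illustration.

```julia
using MCPhyloTree

# A newick string (rather than a filename) yields the root node of a single tree.
tree = ParseNewick("((A:1.0,B:1.0):1.0,C:2.0);")

# The parsed tree can be passed directly to the matrix functions documented above.
D = to_distance_matrix(tree)
```

Because the tree building functions return fully initialized trees, no call to initialize_tree! or update_tree! is needed before using the result.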

Build Trees from Matrices

MCPhyloTree.from_df - Function
function from_df(df::Array{T,2}, name_list::Vector{String})::GeneralNode{T, Int64} where T<:Real

This function takes an adjacency matrix and a vector of names and turns it into a tree. No checks are performed.

Returns the root node of the tree.

  • df : matrix with edge weights
  • name_list : a list of names such that they match the column indices of the matrix
MCPhyloTree.create_tree_from_leaves - Function
function create_tree_from_leaves(leaf_nodes::Vector{String}, rooted::Bool=false)

Build a random tree from a list of leaf names. The tree is unrooted by default.

Returns the root node of the new tree, a subtype of AbstractNode.

  • leaf_nodes : A list of strings which are used as the names of the leaves.

  • rooted : Boolean indicating if the tree should be rooted

function create_tree_from_leaves(rng::Random.AbstractRNG, leaf_nodes::Vector{String}, rooted::Bool=false)

Build a random tree from a list of leaf names, using the supplied random number generator. The tree is unrooted by default.

Returns the root node of the new tree, a subtype of AbstractNode.

  • rng : Random number generator used to build the tree.

  • leaf_nodes : A list of strings which are used as the names of the leaves.

  • rooted : Boolean indicating if the tree should be rooted

MCPhyloTree.cov2tree - Function
function cov2tree(covmat::Array{<:T, 2}, names::Vector{<:AbstractString}, numbers::Vector{Int64}; tol::Real=1e-7)::GeneralNode{T, Int64} where T<:Real

This function reconstructs a tree from a covariance matrix. It takes a covariance matrix, a vector of leaf names and a vector of node numbers as mandatory arguments. The order of the two vectors must correspond to the order of rows and columns in the covariance matrix. Optionally, the tol parameter sets the threshold below which all values are treated as zero.

Returns the root node of the tree corresponding to the supplied covariance matrix.

  • covmat : covariance matrix
  • names : a list of names such that they match the column/row indices of the matrix
  • numbers : a list of Integers such that they match the column/row indices of the matrix
  • tol : cut off value below which all values are treated as zero
MCPhyloTree.from_leave_incidence_matrix - Function
from_leave_incidence_matrix(lm::A, names) where A<:AbstractArray{<:Real, 2}

Build the tree which is specified through a leaf incidence matrix. The function leave_incidence_matrix from this package creates such a matrix.

Returns the root node of the tree built from the matrix.

  • lm : leaf incidence matrix
  • names : list of names for the leaves (in order of the rows)
from_leave_incidence_matrix(lm::A, names, blv::Vector{<:AbstractFloat}) where A<:AbstractArray{<:Real, 2}

Build the tree which is specified through a leaf incidence matrix. The function leave_incidence_matrix from this package creates such a matrix. This function additionally takes a vector of branch lengths, which are assigned to the reconstructed tree.

Returns the root node of the tree built from the matrix.

  • lm : leaf incidence matrix
  • names : list of names for the leaves (in order of the rows)
  • blv : vector of branch lengths used for this tree

Tree Estimation from Matrices

MCPhyloTree.neighbor_joining - Function
neighbor_joining(dm::Array{Float64,2}, leaf_names::Array{String,1})

This function returns a phylogenetic tree by using neighbor-joining based on a given distance matrix and an array of leaf names.

Returns a node of the resulting tree, from which it can be traversed.

  • dm : Matrix used to create Tree.

  • leaf_names : Array containing names of leaf nodes.

neighbor_joining(dm::Array{Float64,2})

This function returns a phylogenetic tree by using neighbor-joining based on a given distance matrix. Creates an array of nodes to be used as leaves.

Returns a node of the resulting tree, from which it can be traversed.

  • dm : Matrix from which to create tree.
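For orientation, the pair-selection step of the textbook neighbor-joining algorithm can be sketched as follows. This is a standalone illustration of the general method, not the package's code; the helper nj_q_matrix and the example distances are hypothetical.

```julia
# Textbook neighbor-joining criterion: Q[i, j] = (n - 2) * d[i, j] - r[i] - r[j],
# where r[i] is the sum of row i of the distance matrix. The pair with the
# smallest Q value is joined first.
function nj_q_matrix(dm::Matrix{Float64})
    n = size(dm, 1)
    r = vec(sum(dm, dims=2))
    return [i == j ? 0.0 : (n - 2) * dm[i, j] - r[i] - r[j] for i in 1:n, j in 1:n]
end

# Hypothetical distances between five taxa a, b, c, d, e.
dm = [0.0  5.0  9.0  9.0 8.0;
      5.0  0.0 10.0 10.0 9.0;
      9.0 10.0  0.0  8.0 7.0;
      9.0 10.0  8.0  0.0 3.0;
      8.0  9.0  7.0  3.0 0.0]

Q = nj_q_matrix(dm)
```

In this example the smallest entry of Q belongs to the pair (a, b), so those two taxa would be joined first, even though d and e have the smallest raw distance.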
MCPhyloTree.upgma - Function
upgma(dm::Array{Float64,2}, leaf_names::Array{String,1})

This function returns a phylogenetic tree by using UPGMA based on a given distance matrix and an array of leaf names.

Returns a node of the resulting tree, from which it can be traversed.

  • dm : Matrix from which to create the tree.

  • leaf_names : array of strings containing names of leaf nodes.

upgma(dm::Array{Float64,2})

This function returns a phylogenetic tree by using UPGMA based on a given distance matrix. Creates an array of nodes to be used as leaves.

Returns a node of the resulting tree, from which it can be traversed.

  • dm : Matrix from which to create the tree.
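The general idea behind UPGMA can be sketched in a few lines: repeatedly merge the two closest clusters and replace their distances by a size-weighted average. The helper toy_upgma below is hypothetical and only returns the topology as a newick-style string without branch lengths; it is not the package's implementation and does not build node objects.

```julia
function toy_upgma(dm::Matrix{Float64}, names::Vector{String})
    D = copy(dm)
    labels = copy(names)
    sizes = ones(Int, length(names))
    while length(labels) > 1
        n = length(labels)
        # Find the closest pair of clusters (a < b).
        best, a, b = Inf, 0, 0
        for i in 1:n-1, j in i+1:n
            if D[i, j] < best
                best, a, b = D[i, j], i, j
            end
        end
        # Size-weighted average distance from the merged cluster to every cluster.
        merged = [(sizes[a] * D[a, k] + sizes[b] * D[b, k]) / (sizes[a] + sizes[b]) for k in 1:n]
        D[a, :] = merged
        D[:, a] = merged
        D[a, a] = 0.0
        labels[a] = "(" * labels[a] * "," * labels[b] * ")"
        sizes[a] += sizes[b]
        # Drop cluster b, which is now part of cluster a.
        keep = [k for k in 1:n if k != b]
        D = D[keep, keep]
        labels = labels[keep]
        sizes = sizes[keep]
    end
    return labels[1] * ";"
end

dm = [0.0 2.0 4.0;
      2.0 0.0 4.0;
      4.0 4.0 0.0]
topology = toy_upgma(dm, ["A", "B", "C"])
```

With these distances, A and B are merged first and C is attached last.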

Consensus Tree Computation

MCPhyloTree.majority_consensus_tree - Function
majority_consensus_tree(trees::Vector{T}, percentage::Float64=0.5)::T where T<:AbstractNode

Construct the majority rule consensus tree from a set of trees that share the same leaf set. By default, the output tree includes clusters that occur in more than 50% of the trees; this threshold can be changed through the percentage argument. The function returns the root node of the majority consensus tree, from which it can be traversed. The algorithm is based on sections 3 and 6.1 of:

Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. 2016. Improved algorithms for constructing consensus trees. J. ACM 63, 3, Article 28 (June 2016), 24 pages. https://dl.acm.org/doi/pdf/10.1145/2925985

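The majority rule itself can be illustrated by counting clusters. In the sketch below, each toy tree is represented only by its set of non-trivial clusters (sets of leaf names). This representation and the helper majority_clusters are hypothetical; the sketch is a naive count, not the algorithm from the cited paper, and it does not assemble the clusters into a tree.

```julia
# Three toy trees on the leaves A, B, C, D, each given by its non-trivial clusters.
trees = [
    Set([Set(["A", "B"]), Set(["A", "B", "C"])]),   # (((A,B),C),D)
    Set([Set(["A", "B"]), Set(["C", "D"])]),        # ((A,B),(C,D))
    Set([Set(["A", "C"]), Set(["A", "B", "C"])]),   # (((A,C),B),D)
]

function majority_clusters(trees, percentage::Float64=0.5)
    counts = Dict{Set{String},Int}()
    for clusters in trees, c in clusters
        counts[c] = get(counts, c, 0) + 1
    end
    # Keep the clusters found in strictly more than the given share of the trees.
    return Set(c for (c, k) in counts if k / length(trees) > percentage)
end

kept = majority_clusters(trees)
```

The clusters {A, B} and {A, B, C} each occur in two of the three trees and are kept; {C, D} and {A, C} occur only once and are dropped.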
MCPhyloTree.loose_consensus_tree - Function
loose_consensus_tree(trees::Vector{T})::T where T<:AbstractNode

Construct the loose consensus tree from a set of trees that share the same leaf set, i.e. a tree containing all the clusters that appear in at least one tree and are compatible with all trees. Returns the root node of the loose consensus tree, from which it can be traversed. This algorithm is based on sections 4 and 6.1 of:

Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. 2016. Improved algorithms for constructing consensus trees. J. ACM 63, 3, Article 28 (June 2016), 24 pages. https://dl.acm.org/doi/pdf/10.1145/2925985

MCPhyloTree.greedy_consensus_tree - Function
greedy_consensus_tree(trees::Vector{T})::T where T<:AbstractNode

Construct the greedy consensus tree from a set of trees that share the same leaf set. Returns the root node of the greedy consensus tree, from which it can be traversed. This algorithm is based on sections 5 and 6.1 of:

Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. 2016. Improved algorithms for constructing consensus trees. J. ACM 63, 3, Article 28 (June 2016), 24 pages. https://dl.acm.org/doi/pdf/10.1145/2925985


Tree Ladderizing

MCPhyloTree.ladderize_tree! - Method
ladderize_tree!(root::T, ascending::Bool=true) where T<:AbstractNode

This function ladderizes a tree in place, i.e. sorts the nodes on all levels by the count of their descendants.

  • root : root Node of tree.

  • ascending : Boolean, determines whether to sort in ascending (true) or descending (false) order.

MCPhyloTree.ladderize_tree - Method
ladderize_tree(root::T, ascending::Bool=true)::T where T<:AbstractNode

This function returns a ladderized copy of a tree, i.e. a copy with all the nodes on all levels sorted by the count of their descendants.

  • root : root Node of tree.

  • ascending : Boolean, determines whether to sort in ascending (true) or descending (false) order.

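The effect of ladderizing can be sketched on a toy representation in which a leaf is a String and an inner node is a Vector of its children. The helpers count_leaves and toy_ladderize are hypothetical and, as a simplifying assumption, order children by their number of leaves; the package functions operate on node objects instead. Like ladderize_tree, the sketch returns a sorted copy and leaves its input unchanged.

```julia
# A leaf counts as one; an inner node counts the leaves below it.
count_leaves(t::String) = 1
count_leaves(t::Vector) = sum(count_leaves, t)

# Leaves are returned unchanged; inner nodes get their (ladderized) children sorted.
toy_ladderize(t::String, ascending::Bool=true) = t
function toy_ladderize(t::Vector, ascending::Bool=true)
    children = [toy_ladderize(c, ascending) for c in t]
    return sort(children; by=count_leaves, rev=!ascending)
end

tree = [[["A", "B"], "C"], "D"]
ladderized = toy_ladderize(tree)
```

In ascending order the single leaf D moves in front of the three-leaf subtree, and within that subtree C moves in front of the pair A, B.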