Tree Building & Transformations
The tree building methods listed here ensure that the nodes of the tree(s) they build are fully initialized, i.e. each node has a unique number, a binary representation, and a height. There is therefore no need to run initialize_tree! or update_tree! after calling them.
Matrix Representation
MCPhyloTree.leave_incidence_matrix (Method)
leave_incidence_matrix(root::G)::Matrix{Float64} where {G<:AbstractNode}
Calculate the leaf incidence matrix of the tree whose root node is root.
For a tree with $m$ leaves and $n$ vertices, this function returns an $m \times n$ matrix $L$, where $L_{ij} = 1$ if vertex $j$ is on the path from leaf $i$ to the root of the tree and $0$ otherwise.
Returns the leaf incidence matrix.
root: root node of the tree.
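A minimal usage sketch, assuming the package is installed and that ParseNewick (documented under Newick Parsing) accepts the Newick string shown. The example tree is made up for illustration, and no particular ordering of rows or columns is assumed.

```julia
using MCPhyloTree

# Hypothetical example tree with three leaves (A, B, C).
tree = ParseNewick("((A:1.0,B:1.0):1.0,C:2.0);")

# Per the description above: one row per leaf, entries 1 where a vertex
# lies on that leaf's path to the root and 0 otherwise.
L = leave_incidence_matrix(tree)

println(size(L))
# Row sums count how many vertices lie on each leaf's path to the root.
println(sum(L, dims=2))
```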
MCPhyloTree.to_covariance (Method)
to_covariance(tree::N, blv::Array{T})::Array{T,2} where {N<:AbstractNode,T<: Real}
Calculate the variance-covariance matrix from tree. An entry (i,j) of the matrix is defined as the length of the path connecting the most recent common ancestor of i and j with the root of the tree.
Returns an Array of Real numbers.
tree: node in the tree of interest.
blv: branch length vector of tree.
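As a worked illustration of this definition, take the rooted tree ((A:1,B:2):3,C:4); (the tree and its branch lengths are made up for the example). The most recent (latest) common ancestor of A and B lies at distance $3$ from the root, the common ancestor of C with either A or B is the root itself (distance $0$), and each diagonal entry is the distance from the root to the leaf. With rows and columns ordered A, B, C (an assumed ordering), the matrix is:

$$
\Sigma =
\begin{pmatrix}
4 & 3 & 0 \\
3 & 5 & 0 \\
0 & 0 & 4
\end{pmatrix}
$$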
MCPhyloTree.to_df (Method)
to_df(root::GeneralNode)::Tuple{Array{Float64}, Vector{String}}
This function returns a matrix representation of the tree structure and a vector with the column names. The entry mat[i,j] is the length of the edge connecting node i with node j.
Returns a Tuple containing the matrix and a vector of names.
root: root of the tree used to create the matrix representation.
MCPhyloTree.to_distance_matrix (Method)
to_distance_matrix(tree::T)::Array{Float64,2} where T <:AbstractNode
Calculate the distance matrix over the set of leaves.
Returns an Array of Floats.
tree: root node of the tree used to perform the calculation.
Newick Parsing
MCPhyloTree.ParseNewick (Method)
ParseNewick(s::String)::Union{GeneralNode, Array{GeneralNode, 1}}
This function takes a string, either a filename or a newick string, and parses the file or string into one or more trees (represented as Node objects). A file should consist solely of newick tree representations, one per line. The function checks for proper newick formatting and signals an error if the string or file is incorrectly formatted.
Newick string input: returns the root of the tree represented by the newick string.
Filename input: returns an Array of Nodes; each Node is the root of the tree represented by one newick string in the file.
s: newick string or name of a file containing newick strings to parse.
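The two input modes can be sketched as follows. This is an illustration only: the trees and the temporary file are made up, and the file is written with one newick string per line as described above.

```julia
using MCPhyloTree

# Newick string input: the result is the root node of a single tree.
root = ParseNewick("((A:1.0,B:1.0):1.0,C:2.0);")

# Filename input: write two made-up trees, one per line, to a temporary file.
path = tempname()
write(path, "((A:1.0,B:1.0):1.0,C:2.0);\n((A:1.0,C:1.0):1.0,B:2.0);")

# The result is an array holding one root node per line of the file.
trees = ParseNewick(path)
println(length(trees))

# Clean up the temporary file.
rm(path)
```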
Build Trees from Matrices
MCPhyloTree.from_df (Function)
from_df(df::Array{T,2}, name_list::Vector{String})::GeneralNode{T, Int64} where T<:Real
This function takes an adjacency matrix and a vector of names and turns them into a tree. No checks are performed on the input.
Returns the root node of the tree.
df: matrix with edge weights.
name_list: a list of names, ordered to match the column indices of the matrix.
MCPhyloTree.create_tree_from_leaves (Function)
create_tree_from_leaves(leaf_nodes::Vector{String}, rooted::Bool=false)
Build a random tree from a list of leaf names. The tree is unrooted by default.
Returns the root node of the new tree (a subtype of AbstractNode).
leaf_nodes: a list of strings which are used as the names of the leaves.
rooted: Boolean indicating whether the tree should be rooted.
create_tree_from_leaves(rng::Random.AbstractRNG, leaf_nodes::Vector{String}, rooted::Bool=false)
Build a random tree from a list of leaf names, drawing its randomness from rng. The tree is unrooted by default.
Returns the root node of the new tree (a subtype of AbstractNode).
rng: random number generator used to build the tree.
leaf_nodes: a list of strings which are used as the names of the leaves.
rooted: Boolean indicating whether the tree should be rooted.
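A short sketch of both methods, assuming the signatures listed above. The leaf names and the seed are arbitrary, and MersenneTwister is just one AbstractRNG from Julia's Random standard library.

```julia
using MCPhyloTree
using Random

leaf_names = ["A", "B", "C", "D", "E"]

# Unrooted random tree (the default).
unrooted = create_tree_from_leaves(leaf_names)

# Rooted random tree drawn from a seeded generator, so repeated runs
# with the same seed should be reproducible.
rng = MersenneTwister(42)
rooted = create_tree_from_leaves(rng, leaf_names, true)
```

Passing an explicit generator is the usual way to make randomized code repeatable in tests and simulations.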
MCPhyloTree.cov2tree (Function)
cov2tree(covmat::Array{<:T, 2}, names::Vector{<:AbstractString}, numbers::Vector{Int64}; tol::Real=1e-7)::GeneralNode{T, Int64} where T<:Real
This function reconstructs a tree from a covariance matrix. It takes a covariance matrix, a vector of leaf names and a vector of node numbers as mandatory arguments. The order of the two vectors must correspond to the order of rows and columns in the covariance matrix. Optionally, the tol parameter sets the threshold below which all values are treated as zero.
Returns the root node of the tree corresponding to the supplied covariance matrix.
covmat: covariance matrix.
names: a list of names, ordered to match the column/row indices of the matrix.
numbers: a list of Integers, ordered to match the column/row indices of the matrix.
tol: cut-off value below which all values are treated as zero.
MCPhyloTree.from_leave_incidence_matrix (Function)
from_leave_incidence_matrix(lm::A, names) where A<:AbstractArray{<:Real, 2}
Build the tree specified by a leaf incidence matrix. The function leave_incidence_matrix from this package creates such a matrix.
Returns the root node of the tree built from the matrix.
lm: leaf incidence matrix.
names: list of names for the leaves (in the order of the rows).
from_leave_incidence_matrix(lm::A, names, blv::Vector{<:AbstractFloat}) where A<:AbstractArray{<:Real, 2}
Build the tree specified by a leaf incidence matrix. The function leave_incidence_matrix from this package creates such a matrix. This method additionally takes a vector of branch lengths, which are assigned to the reconstructed tree.
Returns the root node of the tree built from the matrix.
lm: leaf incidence matrix.
names: list of names for the leaves (in the order of the rows).
blv: vector of branch lengths used for this tree.
Tree Estimation from Matrices
MCPhyloTree.neighbor_joining (Function)
neighbor_joining(dm::Array{Float64,2}, leaf_names::Array{String,1})
This function builds a phylogenetic tree by neighbor-joining from a given distance matrix and an array of leaf names.
Returns a node of the resulting tree, from which it can be traversed.
dm: matrix from which to create the tree.
leaf_names: array containing the names of the leaf nodes.
neighbor_joining(dm::Array{Float64,2})
This function builds a phylogenetic tree by neighbor-joining from a given distance matrix. It creates an array of nodes to be used as leaves.
Returns a node of the resulting tree, from which it can be traversed.
dm: matrix from which to create the tree.
MCPhyloTree.upgma (Function)
upgma(dm::Array{Float64,2}, leaf_names::Array{String,1})
This function builds a phylogenetic tree by UPGMA from a given distance matrix and an array of leaf names.
Returns a node of the resulting tree, from which it can be traversed.
dm: matrix from which to create the tree.
leaf_names: array of strings containing the names of the leaf nodes.
upgma(dm::Array{Float64,2})
This function builds a phylogenetic tree by UPGMA from a given distance matrix. It creates an array of nodes to be used as leaves.
Returns a node of the resulting tree, from which it can be traversed.
dm: matrix from which to create the tree.
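Both estimation functions take the same inputs, so they can be sketched together. The distance matrix and leaf names below are made up for illustration; the only assumption about the API is the set of signatures listed above.

```julia
using MCPhyloTree

# Made-up symmetric distance matrix over four leaves, with a zero diagonal.
dm = [0.0 2.0 6.0 6.0;
      2.0 0.0 6.0 6.0;
      6.0 6.0 0.0 4.0;
      6.0 6.0 4.0 0.0]
leaf_names = ["A", "B", "C", "D"]

# Trees estimated from the same distances with the two methods.
nj_tree = neighbor_joining(dm, leaf_names)
upgma_tree = upgma(dm, leaf_names)

# Without names, the leaf nodes are created by the function itself.
nj_unnamed = neighbor_joining(dm)
```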
Consensus Tree Computation
MCPhyloTree.majority_consensus_tree (Function)
majority_consensus_tree(trees::Vector{T}, percentage::Float64=0.5)::T where T<:AbstractNode
Construct the majority rule consensus tree from a set of trees that share the same leaf set. By default, the output tree includes the clusters that occur in more than 50% of the trees; a different threshold can be set through the percentage argument. The function returns the root node of the majority consensus tree, from which it can be traversed. The algorithm is based on sections 3 and 6.1 of:
Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. 2016. Improved algorithms for constructing consensus trees. J. ACM 63, 3, Article 28 (June 2016), 24 pages. https://dl.acm.org/doi/pdf/10.1145/2925985
MCPhyloTree.loose_consensus_tree (Function)
loose_consensus_tree(trees::Vector{T})::T where T<:AbstractNode
Construct the loose consensus tree from a set of trees that share the same leaf set, i.e. the tree containing all clusters that appear in at least one input tree and are compatible with all input trees. Returns the root node of the loose consensus tree, from which it can be traversed. This algorithm is based on sections 4 and 6.1 of:
Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. 2016. Improved algorithms for constructing consensus trees. J. ACM 63, 3, Article 28 (June 2016), 24 pages. https://dl.acm.org/doi/pdf/10.1145/2925985
MCPhyloTree.greedy_consensus_tree (Function)
greedy_consensus_tree(trees::Vector{T})::T where T<:AbstractNode
Construct the greedy consensus tree from a set of trees that share the same leaf set. Returns the root node of the greedy consensus tree, from which it can be traversed. This algorithm is based on sections 5 and 6.1 of:
Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. 2016. Improved algorithms for constructing consensus trees. J. ACM 63, 3, Article 28 (June 2016), 24 pages. https://dl.acm.org/doi/pdf/10.1145/2925985
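The three consensus functions share the same calling pattern, sketched here on three made-up trees over the leaf set {A, B, C, D}. The example assumes that ParseNewick accepts newick strings without branch lengths and that the comprehension yields a vector with a concrete node element type.

```julia
using MCPhyloTree

# Three made-up trees over the same leaf set; two of them agree.
newicks = ["((A,B),(C,D));", "((A,B),(C,D));", "((A,C),(B,D));"]
trees = [ParseNewick(nwk) for nwk in newicks]

# Majority rule with the default threshold of 0.5.
majority = majority_consensus_tree(trees)
# A higher threshold, passed as the optional second argument.
majority_75 = majority_consensus_tree(trees, 0.75)

# Loose and greedy consensus take only the vector of trees.
loose = loose_consensus_tree(trees)
greedy = greedy_consensus_tree(trees)
```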
Tree Ladderizing
MCPhyloTree.ladderize_tree! (Method)
ladderize_tree!(root::T, ascending::Bool=true) where T<:AbstractNode
This function ladderizes a tree in place, i.e. it sorts the nodes on all levels by the count of their descendants.
root: root node of the tree.
ascending: Boolean that determines whether to sort in ascending (true) or descending (false) order.
MCPhyloTree.ladderize_tree (Method)
ladderize_tree(root::T, ascending::Bool=true)::T where T<:AbstractNode
This function returns a ladderized copy of a tree, i.e. a copy in which the nodes on all levels are sorted by the count of their descendants.
root: root node of the tree.
ascending: Boolean that determines whether to sort in ascending (true) or descending (false) order.
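The difference between the copying and the in-place variant can be sketched as follows. The tree is made up, and ParseNewick is assumed to accept the string shown.

```julia
using MCPhyloTree

tree = ParseNewick("(((A,B),C),(D,E));")

# Non-mutating variant: returns a ladderized copy of the tree.
ascending_copy = ladderize_tree(tree)
descending_copy = ladderize_tree(tree, false)

# Mutating variant: reorders the nodes of `tree` itself.
ladderize_tree!(tree)
```

The copying variant is the safer choice when the original ordering is still needed, for example to compare a tree before and after ladderizing.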