Output
Chains
MCPhylo.Chains
— TypeChains(iters::Integer, params::Integer;
start::Integer=1, thin::Integer=1, chains::Integer=1,
names::Vector{T}=AbstractString[]) where {T<:AbstractString}
Chains(value::Array{T, 3},
start::Integer=1, thin::Integer=1,
names::Vector{W}=AbstractString[], chains::Vector{V}=Int[], moves::Vector{V}=Int[0])
where {T<:Real, U<:AbstractString, V<:Integer, W <: AbstractString}
Chains(value::Array{T, 3},
value2::Array{U,3};
start::Integer=1, thin::Integer=1,
names::Vector{W}=AbstractString[], chains::Vector{V}=Int[], moves::Vector{V}=Int[0])
where {T<:Real, U<:AbstractString, V<:Integer, W <: AbstractString}
Chains(value::Matrix{T};
start::Integer=1, thin::Integer=1,
names::Vector{U}=AbstractString[], chains::Integer=1)
where {T<:Real, U<:AbstractString}
Chains(value::Vector{T};
start::Integer=1, thin::Integer=1,
names::U="Param1", chains::Integer=1) where {T<:Real, U <: AbstractString}
Construct a Chains
object that stores MCMC sampler output.
Returns an object of type Chains
.
iters
: total number of iterations in each sampler run, of whichlength(start:thin:iters)
outputted iterations will be stored in the object.params
: number of parameters to store.value
: array whose first, second (optional), and third (optional) dimensions index outputted iterations, parameter elements, and runs of an MCMC sampler, respectively.start
: number of the first iteration to be stored.thin
: number of steps between consecutive iterations to be stored.chains
: number of simulation runs for which to store output, or indices to the runs (default: 1, 2, …).names
: names to assign to the parameter elements (default:"Param1"
,"Param2"
, …).
File I/O
Base.read
— MethodBase.read(name::AbstractString, ::Type{T}) where {T<:AbstractChains}
Read a chain from an external file.
Returns an AbstractChains
subtype read from an external file.
name
: file to read or write. Recommended convention is for the file name to be specified with a.jls
extension.T
: chain type to read.c
: chain to write.
Base.write
— MethodBase.write(name::AbstractString, c::AbstractChains)
Write a chain to an external file.
Returns a written external file containing a subtype.
name
: file to read or write. Recommended convention is for the file name to be specified with a.jls
extension.T
: chain type to read.c
: chain to write.
MCPhylo.readcoda
— Methodreadcoda(output::AbstractString, index::AbstractString)
Read MCMC sampler output generated in the CODA format by OpenBUGS. The function only retains those sampler iterations at which all model parameters were monitored.
Returns a Chains
object containing the read sampler output.
output
: text file containing the iteration numbers and sampled values for the model parameters.index
: text file containing the names of the parameters, followed by the first and last rows in which their output can be found in theoutput
file.
MCPhylo.to_file
— Methodto_file(model::ModelChains, outpath::AbstractString)
This function writes the results of the MCMC runs into files. The destination of the files is specified using outpath
. It will create a files for each chain. A params_x.log
file storing each parameter sample. In this case x
specifies the index of the chain. The file is compatible with MCMC analysis tools like Tracer
(http://tree.bio.ed.ac.uk/software/tracer/). If in addition trees are sampled, they are stored in newick format in a file called trees_x.nwk
, where x
again specifies the index of the respective chain.
Discrete Diagnostics
MCPhylo.discretediag
— Methoddiscretediag(c::AbstractChains; frac::Real=0.3,
method::Symbol=:weiss, nsim::Int=1000)
Compute the convergence diagnostic for a discrete variable. Several options are available by choosing method to be one of :hangartner, :weiss, :DARBOOT, MCBOOT, :billinsgley, :billingsleyBOOT. The first four are based off of Pearson’s chi-square test of homogeneity. The diagnostic tests whether the proportion of the categories of the discrete variable are similar in each chain. The last two methods test whether the transition probabilities between each category are similar between each chain. Along with a between chain assessment of convergence, a within-chain assessment is carried out by comparing a specified fraction (frac), or window, of the beginning of a chain to the specified fraction of the end of the chain. For within-chain assessment, users should ensure that there is sufficient separation between the windows to assume that their samples are independent. A non-significant test p-value indicates convergence. Significant p-values indicate non-convergence and the possible need to discard initial samples as a burn-in sequence or to simulate additional samples.
A ChainSummary
type object with parameters contained in the rows of the value
field. The first three columns correspond to the test statistic, degrees of freedom, and p-value of the between-chain assessment. The next columns are the test statistic, degrees of freedom, and p-value for each chain of the within-chain assessment.
c
: sampler output on which to perform calculations.frac
: proportion of iterations to include in the first window.method
: Specify which method to use. One of :hangartner, :weiss, :DARBOOT, MCBOOT, :billinsgley, :billingsleyBOOT`.nsim
: For the bootstrap methods (:DARBOOT, :MCBOOT, and :billingsleyBOOT) the number of bootstrap simulations.
Gelman Diagnostic
MCPhylo.gelmandiag
— Methodgelmandiag(c::AbstractChains; alpha::Real=0.05, mpsrf::Bool=false,
transform::Bool=false)
Compute the convergence diagnostics of Gelman, Rubin, and Brooks for MCMC sampler output. The diagnostics are designed to asses convergence of posterior means estimated with multiple autocorrelated samples (chains). They does so by comparing the between and within-chain variances with metrics called potential scale reduction factors (PSRF). Both univariate and multivariate factors are available to assess the convergence of parameters individually and jointly. Scale factors close to one are indicative of convergence. As a rule of thumb, convergence is concluded if the 0.975 quantile of an estimated factor is less than 1.2. Multiple chains are required for calculations. It is recommended that at least three chains be generated, each with different starting values chosen to be diffuse with respect to the anticipated posterior distribution. Use of multiple chains in the diagnostic provides for more robust assessment of convergence than is possible with single chain diagnostics.
Returns a ChainSummary
type object of the form:
struct ChainSummary
value::Array{Float64, 3}
rownames::Vector{AbstractString}
colnames::Vector{AbstractString}
header::AbstractString
end
with parameters contained in the rows of the value
field, and scale reduction factors and upper-limit quantiles in the first and second columns.
c
: sampler output on which to perform calculations.alpha
: quantile (1 - alpha / 2
) at which to estimate the upper limits of scale reduction factors.mpsrf
: whether to compute the multivariate potential scale reduction factor. This factor will not be calculable if any one of the parameters in the output is a linear combination of others.transform
: whether to apply log or logit transformations, as appropriate, to parameters in the chain to potentially produce output that is more normally distributed, an assumption of the PSRF formulations.
Geweke Diagnostic
MCPhylo.gewekediag
— Functiongewekediag(x::Vector{T}; first::Real=0.1, last::Real=0.5,
etype=:imse, args...) where {T<:Real}
gewekediag(c::AbstractChains; first::Real=0.1, last::Real=0.5,
etype=:imse, args...)
Compute the convergence diagnostic of Geweke [37] for MCMC sampler output. The diagnostic is designed to asses convergence of posterior means estimated with autocorrelated samples. It computes a normal-based test statistic comparing the sample means in two windows containing proportions of the first and last iterations. Users should ensure that there is sufficient separation between the two windows to assume that their samples are independent. A non-significant test p-value indicates convergence. Significant p-values indicate non-convergence and the possible need to discard initial samples as a burn-in sequence or to simulate additional samples.
Returns a ChainSummary
type object with parameters contained in the rows of the value
field, and test Z-scores and p-values in the first and second columns. Results are chain-specific.
c
: sampler output on which to perform calculations.first
: proportion of iterations to include in the first window.last
: proportion of iterations to include in the last window.etype
: method for computing Monte Carlo standard errors. Seemcse()
for options.args...
: additional arguments to be passed to theetype
method.x
: vector on which to perform calculations.
Heidel Diagnostic
MCPhylo.heideldiag
— Functionheideldiag(x::Vector{T}; alpha::Real=0.05, eps::Real=0.1,
etype=:imse, start::Integer=1, args...) where {T<:Real}
heideldiag(c::AbstractChains; alpha::Real=0.05, eps::Real=0.1,
etype=:imse, args...)
Compute the convergence diagnostic of Heidelberger and Welch for MCMC sampler output. The diagnostic is designed to assess convergence of posterior means estimated with autocorrelated samples and to determine whether a target degree of accuracy is achieved. A stationarity test is performed for convergence assessment by iteratively discarding 10% of the initial samples until the test p-value is non-significant and stationarity is concluded or until 50% have been discarded and stationarity is rejected, whichever occurs first. Then, a halfwidth test is performed by calculating the relative halfwidth of a posterior mean estimation interval as $z_{1 - \alpha / 2} \hat{s} / |\bar{\theta}|$; where $z$ is a standard normal quantile, $\hat{s}$ is the Monte Carlo standard error, and $\bar{\theta}$ is the estimated posterior mean. If the relative halfwidth is greater than a target ratio, the test is rejected. Rejection of the stationarity or halfwidth test suggests that additional samples are needed.
Returns a ChainSummary
type object with parameters contained in the rows of the value
field, and numbers of burn-in sequences to discard, whether the stationarity tests are passed (1 = yes, 0 = no), their p-values ($p > \alpha$ implies stationarity), posterior means, halfwidths of their $(1 - \alpha) 100\%$ estimation intervals, and whether the halfwidth tests are passed (1 = yes, 0 = no) in the columns. Results are chain-specific.
c
: sampler output on which to perform calculations.alpha
: significance level for evaluations of stationarity tests and calculations of relative estimation interval halfwidths.eps
: target ratio for the relative halfwidths.etype
: method for computing Monte Carlo standard errors. Seemcse()
for options.args...
: additional arguments to be passed to theetype
method.x
: vector on which to perform calculations.start
: ??
Raftery and Lewis Diagnostic
MCPhylo.rafterydiag
— Functionrafterydiag(x::Vector{T}; q::Real=0.025, r::Real=0.005,
s::Real=0.95, eps::Real=0.001,
range::AbstractRange=1:1:length(x)) where {T<:Real}
rafterydiag(c::AbstractChains; q::Real=0.025, r::Real=0.005,
s::Real=0.95, eps::Real=0.001)
Compute the convergence diagnostic of Raftery and Lewis for MCMC sampler output. The diagnostic is designed to determine the number of autocorrelated samples required to estimate a specified quantile $\theta_q$, such that $\Pr(\theta \le \theta_q) = q$, within a desired degree of accuracy. In particular, if $\hat{\theta}_q$ is the estimand and $\Pr(\theta \le \hat{\theta}_q) = \hat{P}_q$ the estimated cumulative probability, then accuracy is specified in terms of r and s, where $\Pr(q - r < \hat{P}_q < q + r) = s$. Thinning may be employed in the calculation of the diagnostic to satisfy its underlying assumptions. However, users may not want to apply the same (or any) thinning when estimating posterior summary statistics because doing so results in a loss of information. Accordingly, sample sizes estimated by the diagnostic tend to be conservative (too large).
Returns a ChainSummary
type object with parameters contained in the rows of the value field, and thinning intervals employed, numbers of samples to discard as burn-in sequences, total numbers $(N)$ to burn-in and retain, numbers of independent samples that would be needed $(Nmin)$, and dependence factors $(N / Nmin)$ in the columns. Results are chain-specific.
c
: sampler output on which to perform calculations.q
: posterior quantile of interest.r
: margin of error for estimated cumulative probabilities.s
: probability for the margin of error.eps
: tolerance within which the probabilities of transitioning from initial to retained iterations are within the equilibrium probabilities for the chain. This argument determines the number of samples to discard as a burn-in sequence and is typically left at its default value.x
: vector on which to perform calculations.range
: ??
Average Standard Deviation of Split Frequencies
MCPhylo.ASDSF
— FunctionASDSF(args::String...; freq::Int64=1, check_leaves::Bool=true,
min_splits::Float64=0.1, show_progress::Bool=true)::Vector{Float64}
Calculate the average standard deviation of split frequencies for two or more files containing newick representations of trees. Default frequency is 1 and by default only trees with the same leafsets are supported. The default minimal splits threshold is 0.1. The progress bar is activated by default.
ASDSF(args::Vector{String}...; freq::Int64=1, check_leaves::Bool=true,
min_splits::Float64=0.1, show_progress::Bool=true)::Vector{Float64}
Calculate the average standard deviation of split frequencies for two or more Vectors containing newick representations of trees. Default frequency is 1 and by default only trees with the same leafsets are supported. The default minimal splits threshold is 0.1. The progress bar is activated by default.
ASDSF(model::ModelChains; freq::Int64=1, check_leaves::Bool=true,
min_splits::Float64=0.1, show_progress::Bool=true
)::Vector{Vector{Float64}}
Calculate the average standard deviation of split frequencies for the trees in different chains in a ModelChains object. Default frequency is 1 and by default only trees with the same leafsets are supported. The default minimal splits threshold is 0.1. The progress bar is activated by default.
ASDSF(r_channels::Vector{RemoteChannel}, n_trees::Int64,
tree_dims::UnitRange{Int64}, min_splits::Float64
)::Tuple{Vector{Vector{Float64}}, ConvergenceStorage}
–- INTERNAL –- Calculates - on-the-fly - the average standard deviation of split frequencies for the trees generated by MCMC draws from a model. Takes a vector of remote channels (where the generated trees are stored during the mcmc simulation) and the total number of trees in each chain as arguments. The default minimal splits threshold is 0.1.
Monte Carlo Standard Errors
MCPhylo.mcse
— Functionmcse(x::Vector{T}, method::Symbol=:imse; args...) where {T<:Real}
Compute Monte Carlo standard errors.
Returns the numeric standard error value.
x
: time series of values on which to perform calculations.method
: method used for the calculations. Options are:bm
: batch means, with optional argumentsize::Integer=100
determining the number of sequential values to include in each batch. This method requires that the number of values inx
is at least 2 times the batch size.:imse
: initial monotone sequence estimator.:ipse
: initial positive sequence estimator.
args...
: additional arguments for the calculation method.
ModelChains
MCPhylo.ModelChains
— TypeModelChains(c::Chains, m::Model)
See Chains()
.
Model-Based Inference
Distributions.logpdf
— Functionlogpdf(mc::ModelChains,
nodekeys::Vector{Symbol}=keys(mc.model, :stochastic))
Distributions.logpdf
— Methodlogpdf(mc::ModelChains, f::Function, nodekeys::Vector{Symbol})
Compute the sum of log-densities at each iteration of MCMC output for stochastic nodes.
Returns a ModelChains
object of resulting summed log-densities at each MCMC iteration of the supplied chain.
mc
: sampler output from a model fit with themcmc()
` function.nodekey/nodekeys
: stochastic model node(s) over which to sum densities (default: all).f
: ??
MCPhylo.dic
— Methoddic(mc::ModelChains)
Compute the Deviance Information Criterion (DIC) of Spiegelhalter et al. and Gelman et al. from MCMC sampler output.
Returns a ChainSummary
type object with DIC results from the methods of Spiegelhalter and Gelman in the first and second rows of the value
field, and the DIC value and effective numbers of parameters in the first and second columns; where
$\text{DIC} = -2 \mathcal{L}(\bar{\Theta}) + 2 p,$
such that $\mathcal{L}(\bar{\Theta})$ is the log-likelihood of model outputs given the expected values of model parameters $\Theta$, and $p$ is the effective number of parameters. The latter is defined as $p_D = -2 \bar{\mathcal{L}}(\Theta) + 2 \mathcal{L}(\bar{\Theta})$ for the method of Spiegelhalter and as $p_V = \frac{1}{2} \operatorname{var}(-2 \mathcal{L}(\Theta))$ for the method of Gelman. Results are for all chains combined.
mc
: sampler output from a model fit with themcmc()
function.
StatsAPI.predict
— Functionpredict(mc::ModelChains,
nodekeys::Vector{Symbol}=keys(mc.model, :output))
Generate MCMC draws from a posterior predictive distribution.
Returns a ModelChains
object of draws simulated at each MCMC iteration of the supplied chain. For observed data node $y$, simulation is from the posterior predictive distribution
$p(\tilde{y} | y) = \int p(\tilde{y} | \Theta) p(\Theta | y) d\Theta,$
where $\tilde{y}$ is an unknown observation on the node, $p(\tilde{y} | \Theta)$ is the data likelihood, and $p(\Theta | y)$ is the posterior distribution of unobserved parameters $\Theta$.
mc
: sampler output from a model fit with themcmc()
` function.nodekey/nodekeys
: observed Stochastic model node(s) for which to generate draws from the predictive distribution (default: all observed data nodes).
Plot
MCPhylo.bar_int
— Methodbar_int(c::AbstractChains, indeces::Vector{Int64}; args...)::Plots.Plot
–- INTERNAL –- Helper function for creating barplots.
MCPhylo.check_filename
— Methodcheck_filename(filename, fmt, plots)
–- INTERNAL –- Helper function that checks if a user-given filename is valid, and saves the created plots to that file.
MCPhylo.check_vars
— Methodcheckvars(simnames::Vector{AbstractString}, vars::Vector{String})::Vector{Int64}
–- INTERNAL –- Helper function that returns a list of indeces that correspond to specific variables. Only those variables are then plotted in the following steps.
MCPhylo.groupedbar_fillrange
— Methodgroupedbar_fillrange(y)
–- INTERNAL –- Helper function for the groupbar plot
MCPhylo.mixeddensityplot
— Methodmixeddensityplot(c::AbstractChains,, indeces::Vector{Int64};
barbounds::Tuple{Real, Real}=(0, Inf), args...):Plots.Plot
–- INTERNAL –- Helper function that creates a barplot for each discrete and a density for each continuous variable.
MCPhylo.plot_asdsf
— Methodplot_asdsf(c::AbstractChains; args...)::Plots.Plot
–- INTERNAL –- Plot the ASDSF values of an Abstract Chains object.
RecipesBase.apply_recipe
— Methodf(vals::ASDSF_trace)
–- INTERNAL –- Recipe for trace plots of ASDSF values
RecipesBase.apply_recipe
— Methodf(acor::Autocor)
–- INTERNAL –- Recipe for Autocor plots
RecipesBase.apply_recipe
— Methodf(cont::Contour)
–- INTERNAL –- Recipe for contour plots
RecipesBase.apply_recipe
— Methodf(dens::Density)
–- INTERNAL –- Recipe for density plots
RecipesBase.apply_recipe
— Methodf(g::GroupBar; spacing=0)
–- INTERNAL –- Recipe for a grouped bar plot.
RecipesBase.apply_recipe
— Methodf(mean::Mean)
–- INTERNAL –- Recipe for mean plots
RecipesBase.apply_recipe
— Methodf(trace::Trace)
–- INTERNAL –- Recipe for trace plots
RecipesBase.plot
— Functionplot(c::AbstractChains, ptype::Vector{Symbol}=[:trace, :density];
<keyword arguments>)::Array{Plots.Plot}
Function that takes a MCMC chain and creates various different plots (trace & density by default).
Arguments
'vars::Vector{String}=String[]': specifies the variables of the chain that are plotted. Trick: Instead of inputting [varname[1], varname[2], varname[3], ...] you can simply write [varname]. The function will understand that you want all variables with that name.
'filename::String=""': when given, the plots will be saved to a file
'fmt::Symbol=:pdf': Format of the output file
'fuse::Bool=false': Fuse all of the plots into one big plot, instead of displaying each of the different plot types separately
'f_layout=nothing': Layout for the fused plot.
'fsize::Tuple(Number, Number)=(0,0)': Size of the fused plot.
'force::Bool=false': Force plotting of more than 20 variables.
'args...': This includes specific arguments for the different plot types , like the number of bins for the contourplot or if the barplots bars should be stacked or not. Check the specific plot functions to use these arguments.
Plot attributes from the Plots.jl package can also be passed here: I.e. try passing background_color=:black for a black background on your plots. List of attributes here, https://docs.juliaplots.org/latest/generated/attributes_series/ https://docs.juliaplots.org/latest/generated/attributes_plot/ https://docs.juliaplots.org/latest/generated/attributes_subplot/ https://docs.juliaplots.org/latest/generated/attributes_axis/ though not all are supported: https://docs.juliaplots.org/latest/generated/supported/
Posterior Summary Statistics
DataAPI.describe
— Methoddescribe(io::IO, c::AbstractChains;
q::Vector=[0.025, 0.25, 0.5, 0.75, 0.975], etype=:bm, args...)
Compute summary statistics for MCMC sampler output.
Returns results from calls to summarystats(c, etype, args...)
and quantile(c, q)
are printed for all chains combined, and a value of nothing
is returned.
c
: sampler output on which to perform calculations.q
: probabilities at which to calculate quantiles.etype
: method for computing Monte Carlo standard errors. Seemcse()
for options.args...
: additional arguments to be passed to theetype
method.
MCPhylo.changerate
— Methodchangerate(c::AbstractChains)
Estimate the probability, or rate per iteration, $\Pr(\theta^i \ne \theta^{i-1})$ of a state space change for iterations $i = 2, \ldots, N$ in MCMC sampler output. Estimation is performed for each parameter univariately as well as for the full parameter vector multivariately. For continuous output generated from samplers, like Metropolis-Hastings, whose algorithms conditionally accept candidate draws, the probability can be viewed as the acceptance rate.
Returns a ChainSummary
type object with parameters in the rows of the value
field, and the estimated rates in the column. Results are for all chains combined.
c
: sampler output on which to perform calculations.
MCPhylo.hpd
— Methodhpd(c::AbstractChains; alpha::Real=0.05)
MCPhylo.hpd
— Methodhpd(x::Vector{T}; alpha::Real=0.05) where {T<:Real}
Compute highest posterior density (HPD) intervals of Chen and Shao [16] for MCMC sampler output. HPD intervals have the desirable property of being the smallest intervals that contain a given probability. However, their calculation assumes unimodal marginal posterior distributions, and they are not invariant to transformations of parameters like central (quantile-based) posterior intervals.
Returns a ChainSummary
type object with parameters contained in the rows of the value field, and lower and upper intervals in the first and second columns. Results are for all chains combined.
c
: sampler output on which to perform calculations.alpha
: the100 * (1 - alpha)
% interval to compute.x
: vector on which to perform calculations.
Statistics.cor
— Methodcor(c::AbstractChains)
Compute cross-correlations for MCMC sampler output.
Returns a ChainSummary
type object with the first and second dimensions of the value
field indexing the model parameters between which correlations. Results are for all chains combined.
c
: sampler output on which to perform calculations.
Statistics.quantile
— Methodquantile(c::AbstractChains; q::Vector=[0.025, 0.25, 0.5, 0.75, 0.975])
Compute posterior quantiles for MCMC sampler output.
Returns a ChainSummary
type object with parameters contained in the rows of the value
field, and quantiles in the columns. Results are for all chains combined.
c
: sampler output on which to perform calculations.q
: probabilities at which to compute quantiles.
StatsBase.autocor
— Methodautocor(c::AbstractChains; lags::Vector=[1, 5, 10, 50],
relative::Bool=true)
Compute lag-k autocorrelations for MCMC sampler output.
Returns a ChainSummary
type object with model parameters indexed by the first dimension of value
, lag-autocorrelations by the second, and chains by the third.
c
: sampler output on which to perform calculations.lags
: lags at which to compute autocorrelations.relative
: whether the lags are relative to the thinning interval of the output (true
) or relative to the absolute iteration numbers (false
).
StatsBase.summarystats
— Methodsummarystats(c::AbstractChains; etype=:bm, args...)
Compute posterior summary statistics for MCMC sampler output.
Returns a ChainSummary
type object with parameters in the rows of the value
field; and the sample mean, standard deviation, standard error, Monte Carlo standard error, and effective sample size in the columns. Results are for all chains combined.
c
: sampler output on which to perform calculations.etype
: method for computing Monte Carlo standard errors. Seemcse()
for options.args...
: additional arguments to be passed to theetype
method.