Output

Chains

MCPhylo.ChainsType
Chains(iters::Integer, params::Integer;
start::Integer=1, thin::Integer=1, chains::Integer=1,
names::Vector{T}=AbstractString[]) where {T<:AbstractString}
source
Chains(value::Array{T, 3},
start::Integer=1, thin::Integer=1,
names::Vector{W}=AbstractString[], chains::Vector{V}=Int[], moves::Vector{V}=Int[0])
where {T<:Real, U<:AbstractString, V<:Integer, W <: AbstractString}
source
Chains(value::Array{T, 3},
value2::Array{U,3};
start::Integer=1, thin::Integer=1,
names::Vector{W}=AbstractString[], chains::Vector{V}=Int[], moves::Vector{V}=Int[0])
where {T<:Real, U<:AbstractString, V<:Integer, W <: AbstractString}
source
Chains(value::Matrix{T};
start::Integer=1, thin::Integer=1,
names::Vector{U}=AbstractString[], chains::Integer=1)
where {T<:Real, U<:AbstractString}
source
Chains(value::Vector{T};
start::Integer=1, thin::Integer=1,
names::U="Param1", chains::Integer=1) where {T<:Real, U <: AbstractString}

Construct a Chains object that stores MCMC sampler output.

Returns an object of type Chains.

  • iters: total number of iterations in each sampler run, of which length(start:thin:iters) outputted iterations will be stored in the object.

  • params: number of parameters to store.

  • value: array whose first, second (optional), and third (optional) dimensions index outputted iterations, parameter elements, and runs of an MCMC sampler, respectively.

  • start: number of the first iteration to be stored.

  • thin: number of steps between consecutive iterations to be stored.

  • chains: number of simulation runs for which to store output, or indices to the runs (default: 1, 2, …).

  • names: names to assign to the parameter elements (default: "Param1", "Param2", …).

source

File I/O

Base.readMethod
Base.read(name::AbstractString, ::Type{T}) where {T<:AbstractChains}

Read a chain from an external file.

Returns an AbstractChains subtype read from an external file.

  • name : file to read or write. Recommended convention is for the file name to be specified with a .jls extension.

  • T : chain type to read.

  • c : chain to write.

source
Base.writeMethod
Base.write(name::AbstractString, c::AbstractChains)

Write a chain to an external file.

Returns a written external file containing a subtype.

  • name : file to read or write. Recommended convention is for the file name to be specified with a .jls extension.

  • T : chain type to read.

  • c : chain to write.

source
MCPhylo.readcodaMethod
readcoda(output::AbstractString, index::AbstractString)

Read MCMC sampler output generated in the CODA format by OpenBUGS. The function only retains those sampler iterations at which all model parameters were monitored.

Returns a Chains object containing the read sampler output.

  • output : text file containing the iteration numbers and sampled values for the model parameters.

  • index : text file containing the names of the parameters, followed by the first and last rows in which their output can be found in the output file.

source
MCPhylo.to_fileMethod
to_file(model::ModelChains, outpath::AbstractString)

This function writes the results of the MCMC runs into files. The destination of the files is specified using outpath. It will create a files for each chain. A params_x.log file storing each parameter sample. In this case x specifies the index of the chain. The file is compatible with MCMC analysis tools like Tracer (http://tree.bio.ed.ac.uk/software/tracer/). If in addition trees are sampled, they are stored in newick format in a file called trees_x.nwk, where x again specifies the index of the respective chain.

source

Discrete Diagnostics

MCPhylo.discretediagMethod
discretediag(c::AbstractChains; frac::Real=0.3,
              method::Symbol=:weiss, nsim::Int=1000)

Compute the convergence diagnostic for a discrete variable. Several options are available by choosing method to be one of :hangartner, :weiss, :DARBOOT, MCBOOT, :billinsgley, :billingsleyBOOT. The first four are based off of Pearson’s chi-square test of homogeneity. The diagnostic tests whether the proportion of the categories of the discrete variable are similar in each chain. The last two methods test whether the transition probabilities between each category are similar between each chain. Along with a between chain assessment of convergence, a within-chain assessment is carried out by comparing a specified fraction (frac), or window, of the beginning of a chain to the specified fraction of the end of the chain. For within-chain assessment, users should ensure that there is sufficient separation between the windows to assume that their samples are independent. A non-significant test p-value indicates convergence. Significant p-values indicate non-convergence and the possible need to discard initial samples as a burn-in sequence or to simulate additional samples.

A ChainSummary type object with parameters contained in the rows of the value field. The first three columns correspond to the test statistic, degrees of freedom, and p-value of the between-chain assessment. The next columns are the test statistic, degrees of freedom, and p-value for each chain of the within-chain assessment.

  • c : sampler output on which to perform calculations.

  • frac : proportion of iterations to include in the first window.

  • method : Specify which method to use. One of :hangartner, :weiss, :DARBOOT, MCBOOT, :billinsgley, :billingsleyBOOT`.

  • nsim : For the bootstrap methods (:DARBOOT, :MCBOOT, and :billingsleyBOOT) the number of bootstrap simulations.

source

Gelman Diagnostic

MCPhylo.gelmandiagMethod
gelmandiag(c::AbstractChains; alpha::Real=0.05, mpsrf::Bool=false,
            transform::Bool=false)

Compute the convergence diagnostics of Gelman, Rubin, and Brooks for MCMC sampler output. The diagnostics are designed to asses convergence of posterior means estimated with multiple autocorrelated samples (chains). They does so by comparing the between and within-chain variances with metrics called potential scale reduction factors (PSRF). Both univariate and multivariate factors are available to assess the convergence of parameters individually and jointly. Scale factors close to one are indicative of convergence. As a rule of thumb, convergence is concluded if the 0.975 quantile of an estimated factor is less than 1.2. Multiple chains are required for calculations. It is recommended that at least three chains be generated, each with different starting values chosen to be diffuse with respect to the anticipated posterior distribution. Use of multiple chains in the diagnostic provides for more robust assessment of convergence than is possible with single chain diagnostics.

Returns a ChainSummary type object of the form:

struct ChainSummary
  value::Array{Float64, 3}
  rownames::Vector{AbstractString}
  colnames::Vector{AbstractString}
  header::AbstractString
end

with parameters contained in the rows of the value field, and scale reduction factors and upper-limit quantiles in the first and second columns.

  • c : sampler output on which to perform calculations.

  • alpha : quantile (1 - alpha / 2) at which to estimate the upper limits of scale reduction factors.

  • mpsrf : whether to compute the multivariate potential scale reduction factor. This factor will not be calculable if any one of the parameters in the output is a linear combination of others.

  • transform : whether to apply log or logit transformations, as appropriate, to parameters in the chain to potentially produce output that is more normally distributed, an assumption of the PSRF formulations.

source

Geweke Diagnostic

MCPhylo.gewekediagFunction
gewekediag(x::Vector{T}; first::Real=0.1, last::Real=0.5,
            etype=:imse, args...) where {T<:Real}
source
gewekediag(c::AbstractChains; first::Real=0.1, last::Real=0.5,
            etype=:imse, args...)

Compute the convergence diagnostic of Geweke [37] for MCMC sampler output. The diagnostic is designed to asses convergence of posterior means estimated with autocorrelated samples. It computes a normal-based test statistic comparing the sample means in two windows containing proportions of the first and last iterations. Users should ensure that there is sufficient separation between the two windows to assume that their samples are independent. A non-significant test p-value indicates convergence. Significant p-values indicate non-convergence and the possible need to discard initial samples as a burn-in sequence or to simulate additional samples.

Returns a ChainSummary type object with parameters contained in the rows of the value field, and test Z-scores and p-values in the first and second columns. Results are chain-specific.

  • c : sampler output on which to perform calculations.

  • first : proportion of iterations to include in the first window.

  • last : proportion of iterations to include in the last window.

  • etype : method for computing Monte Carlo standard errors. See mcse() for options.

  • args... : additional arguments to be passed to the etype method.

  • x : vector on which to perform calculations.

source

Heidel Diagnostic

MCPhylo.heideldiagFunction
heideldiag(x::Vector{T}; alpha::Real=0.05, eps::Real=0.1,
            etype=:imse, start::Integer=1, args...) where {T<:Real}
source
heideldiag(c::AbstractChains; alpha::Real=0.05, eps::Real=0.1,
            etype=:imse, args...)

Compute the convergence diagnostic of Heidelberger and Welch for MCMC sampler output. The diagnostic is designed to assess convergence of posterior means estimated with autocorrelated samples and to determine whether a target degree of accuracy is achieved. A stationarity test is performed for convergence assessment by iteratively discarding 10% of the initial samples until the test p-value is non-significant and stationarity is concluded or until 50% have been discarded and stationarity is rejected, whichever occurs first. Then, a halfwidth test is performed by calculating the relative halfwidth of a posterior mean estimation interval as $z_{1 - \alpha / 2} \hat{s} / |\bar{\theta}|$; where $z$ is a standard normal quantile, $\hat{s}$ is the Monte Carlo standard error, and $\bar{\theta}$ is the estimated posterior mean. If the relative halfwidth is greater than a target ratio, the test is rejected. Rejection of the stationarity or halfwidth test suggests that additional samples are needed.

Returns a ChainSummary type object with parameters contained in the rows of the value field, and numbers of burn-in sequences to discard, whether the stationarity tests are passed (1 = yes, 0 = no), their p-values ($p > \alpha$ implies stationarity), posterior means, halfwidths of their $(1 - \alpha) 100\%$ estimation intervals, and whether the halfwidth tests are passed (1 = yes, 0 = no) in the columns. Results are chain-specific.

  • c : sampler output on which to perform calculations.

  • alpha : significance level for evaluations of stationarity tests and calculations of relative estimation interval halfwidths.

  • eps : target ratio for the relative halfwidths.

  • etype : method for computing Monte Carlo standard errors. See mcse() for options.

  • args... : additional arguments to be passed to the etype method.

  • x : vector on which to perform calculations.

  • start : ??

source

Raftery and Lewis Diagnostic

MCPhylo.rafterydiagFunction
rafterydiag(x::Vector{T}; q::Real=0.025, r::Real=0.005,
                  s::Real=0.95, eps::Real=0.001,
                  range::AbstractRange=1:1:length(x)) where {T<:Real}
source
rafterydiag(c::AbstractChains; q::Real=0.025, r::Real=0.005,
                 s::Real=0.95, eps::Real=0.001)

Compute the convergence diagnostic of Raftery and Lewis for MCMC sampler output. The diagnostic is designed to determine the number of autocorrelated samples required to estimate a specified quantile $\theta_q$, such that $\Pr(\theta \le \theta_q) = q$, within a desired degree of accuracy. In particular, if $\hat{\theta}_q$ is the estimand and $\Pr(\theta \le \hat{\theta}_q) = \hat{P}_q$ the estimated cumulative probability, then accuracy is specified in terms of r and s, where $\Pr(q - r < \hat{P}_q < q + r) = s$. Thinning may be employed in the calculation of the diagnostic to satisfy its underlying assumptions. However, users may not want to apply the same (or any) thinning when estimating posterior summary statistics because doing so results in a loss of information. Accordingly, sample sizes estimated by the diagnostic tend to be conservative (too large).

Returns a ChainSummary type object with parameters contained in the rows of the value field, and thinning intervals employed, numbers of samples to discard as burn-in sequences, total numbers $(N)$ to burn-in and retain, numbers of independent samples that would be needed $(Nmin)$, and dependence factors $(N / Nmin)$ in the columns. Results are chain-specific.

  • c : sampler output on which to perform calculations.

  • q : posterior quantile of interest.

  • r : margin of error for estimated cumulative probabilities.

  • s : probability for the margin of error.

  • eps : tolerance within which the probabilities of transitioning from initial to retained iterations are within the equilibrium probabilities for the chain. This argument determines the number of samples to discard as a burn-in sequence and is typically left at its default value.

  • x : vector on which to perform calculations.

  • range : ??

source

Average Standard Deviation of Split Frequencies

MCPhylo.ASDSFFunction
ASDSF(args::String...; freq::Int64=1, check_leaves::Bool=true,
      min_splits::Float64=0.1, show_progress::Bool=true)::Vector{Float64}

Calculate the average standard deviation of split frequencies for two or more files containing newick representations of trees. Default frequency is 1 and by default only trees with the same leafsets are supported. The default minimal splits threshold is 0.1. The progress bar is activated by default.

source
ASDSF(args::Vector{String}...; freq::Int64=1, check_leaves::Bool=true,
      min_splits::Float64=0.1, show_progress::Bool=true)::Vector{Float64}

Calculate the average standard deviation of split frequencies for two or more Vectors containing newick representations of trees. Default frequency is 1 and by default only trees with the same leafsets are supported. The default minimal splits threshold is 0.1. The progress bar is activated by default.

source
ASDSF(model::ModelChains; freq::Int64=1, check_leaves::Bool=true,
      min_splits::Float64=0.1, show_progress::Bool=true
      )::Vector{Vector{Float64}}

Calculate the average standard deviation of split frequencies for the trees in different chains in a ModelChains object. Default frequency is 1 and by default only trees with the same leafsets are supported. The default minimal splits threshold is 0.1. The progress bar is activated by default.

source
ASDSF(r_channels::Vector{RemoteChannel}, n_trees::Int64,
      tree_dims::UnitRange{Int64}, min_splits::Float64
      )::Tuple{Vector{Vector{Float64}}, ConvergenceStorage}

–- INTERNAL –- Calculates - on-the-fly - the average standard deviation of split frequencies for the trees generated by MCMC draws from a model. Takes a vector of remote channels (where the generated trees are stored during the mcmc simulation) and the total number of trees in each chain as arguments. The default minimal splits threshold is 0.1.

source

Monte Carlo Standard Errors

MCPhylo.mcseFunction
mcse(x::Vector{T}, method::Symbol=:imse; args...) where {T<:Real}

Compute Monte Carlo standard errors.

Returns the numeric standard error value.

  • x : time series of values on which to perform calculations.

  • method : method used for the calculations. Options are

    • :bm : batch means, with optional argument size::Integer=100 determining the number of sequential values to include in each batch. This method requires that the number of values in x is at least 2 times the batch size.

    • :imse : initial monotone sequence estimator.

    • :ipse : initial positive sequence estimator.

  • args... : additional arguments for the calculation method.

source

ModelChains

Model-Based Inference

Distributions.logpdfMethod
logpdf(mc::ModelChains, f::Function, nodekeys::Vector{Symbol})

Compute the sum of log-densities at each iteration of MCMC output for stochastic nodes.

Returns a ModelChains object of resulting summed log-densities at each MCMC iteration of the supplied chain.

  • mc : sampler output from a model fit with the mcmc()` function.

  • nodekey/nodekeys : stochastic model node(s) over which to sum densities (default: all).

  • f : ??

source
MCPhylo.dicMethod
dic(mc::ModelChains)

Compute the Deviance Information Criterion (DIC) of Spiegelhalter et al. and Gelman et al. from MCMC sampler output.

Returns a ChainSummary type object with DIC results from the methods of Spiegelhalter and Gelman in the first and second rows of the value field, and the DIC value and effective numbers of parameters in the first and second columns; where

$\text{DIC} = -2 \mathcal{L}(\bar{\Theta}) + 2 p,$

such that $\mathcal{L}(\bar{\Theta})$ is the log-likelihood of model outputs given the expected values of model parameters $\Theta$, and $p$ is the effective number of parameters. The latter is defined as $p_D = -2 \bar{\mathcal{L}}(\Theta) + 2 \mathcal{L}(\bar{\Theta})$ for the method of Spiegelhalter and as $p_V = \frac{1}{2} \operatorname{var}(-2 \mathcal{L}(\Theta))$ for the method of Gelman. Results are for all chains combined.

  • mc : sampler output from a model fit with the mcmc() function.
source
StatsAPI.predictFunction
predict(mc::ModelChains,
         nodekeys::Vector{Symbol}=keys(mc.model, :output))

Generate MCMC draws from a posterior predictive distribution.

Returns a ModelChains object of draws simulated at each MCMC iteration of the supplied chain. For observed data node $y$, simulation is from the posterior predictive distribution

$p(\tilde{y} | y) = \int p(\tilde{y} | \Theta) p(\Theta | y) d\Theta,$

where $\tilde{y}$ is an unknown observation on the node, $p(\tilde{y} | \Theta)$ is the data likelihood, and $p(\Theta | y)$ is the posterior distribution of unobserved parameters $\Theta$.

  • mc : sampler output from a model fit with the mcmc()` function.

  • nodekey/nodekeys : observed Stochastic model node(s) for which to generate draws from the predictive distribution (default: all observed data nodes).

source

Plot

MCPhylo.bar_intMethod
bar_int(c::AbstractChains, indeces::Vector{Int64}; args...)::Plots.Plot

–- INTERNAL –- Helper function for creating barplots.

source
MCPhylo.check_filenameMethod
check_filename(filename, fmt, plots)

–- INTERNAL –- Helper function that checks if a user-given filename is valid, and saves the created plots to that file.

source
MCPhylo.check_varsMethod

checkvars(simnames::Vector{AbstractString}, vars::Vector{String})::Vector{Int64}

–- INTERNAL –- Helper function that returns a list of indeces that correspond to specific variables. Only those variables are then plotted in the following steps.

source
MCPhylo.mixeddensityplotMethod
mixeddensityplot(c::AbstractChains,, indeces::Vector{Int64};
                 barbounds::Tuple{Real, Real}=(0, Inf), args...):Plots.Plot

–- INTERNAL –- Helper function that creates a barplot for each discrete and a density for each continuous variable.

source
MCPhylo.plot_asdsfMethod

plot_asdsf(c::AbstractChains; args...)::Plots.Plot

–- INTERNAL –- Plot the ASDSF values of an Abstract Chains object.

source
RecipesBase.plotFunction
plot(c::AbstractChains, ptype::Vector{Symbol}=[:trace, :density];
   <keyword arguments>)::Array{Plots.Plot}

Function that takes a MCMC chain and creates various different plots (trace & density by default).

Arguments

  • 'vars::Vector{String}=String[]': specifies the variables of the chain that are plotted. Trick: Instead of inputting [varname[1], varname[2], varname[3], ...] you can simply write [varname]. The function will understand that you want all variables with that name.

  • 'filename::String=""': when given, the plots will be saved to a file

  • 'fmt::Symbol=:pdf': Format of the output file

  • 'fuse::Bool=false': Fuse all of the plots into one big plot, instead of displaying each of the different plot types separately

  • 'f_layout=nothing': Layout for the fused plot.

  • 'fsize::Tuple(Number, Number)=(0,0)': Size of the fused plot.

  • 'force::Bool=false': Force plotting of more than 20 variables.

  • 'args...': This includes specific arguments for the different plot types , like the number of bins for the contourplot or if the barplots bars should be stacked or not. Check the specific plot functions to use these arguments.

          Plot attributes from the Plots.jl package can also be passed here:
          I.e. try passing background_color=:black for a black background on
          your plots. List of attributes here,
          https://docs.juliaplots.org/latest/generated/attributes_series/
          https://docs.juliaplots.org/latest/generated/attributes_plot/
          https://docs.juliaplots.org/latest/generated/attributes_subplot/
          https://docs.juliaplots.org/latest/generated/attributes_axis/
          though not all are supported:
          https://docs.juliaplots.org/latest/generated/supported/
source

Posterior Summary Statistics

DataAPI.describeMethod
describe(io::IO, c::AbstractChains;
              q::Vector=[0.025, 0.25, 0.5, 0.75, 0.975], etype=:bm, args...)

Compute summary statistics for MCMC sampler output.

Returns results from calls to summarystats(c, etype, args...) and quantile(c, q) are printed for all chains combined, and a value of nothing is returned.

  • c : sampler output on which to perform calculations.

  • q : probabilities at which to calculate quantiles.

  • etype : method for computing Monte Carlo standard errors. See mcse() for options.

  • args... : additional arguments to be passed to the etype method.

source
MCPhylo.changerateMethod
changerate(c::AbstractChains)

Estimate the probability, or rate per iteration, $\Pr(\theta^i \ne \theta^{i-1})$ of a state space change for iterations $i = 2, \ldots, N$ in MCMC sampler output. Estimation is performed for each parameter univariately as well as for the full parameter vector multivariately. For continuous output generated from samplers, like Metropolis-Hastings, whose algorithms conditionally accept candidate draws, the probability can be viewed as the acceptance rate.

Returns a ChainSummary type object with parameters in the rows of the value field, and the estimated rates in the column. Results are for all chains combined.

  • c : sampler output on which to perform calculations.
source
MCPhylo.hpdMethod
hpd(x::Vector{T}; alpha::Real=0.05) where {T<:Real}

Compute highest posterior density (HPD) intervals of Chen and Shao [16] for MCMC sampler output. HPD intervals have the desirable property of being the smallest intervals that contain a given probability. However, their calculation assumes unimodal marginal posterior distributions, and they are not invariant to transformations of parameters like central (quantile-based) posterior intervals.

Returns a ChainSummary type object with parameters contained in the rows of the value field, and lower and upper intervals in the first and second columns. Results are for all chains combined.

  • c : sampler output on which to perform calculations.

  • alpha : the 100 * (1 - alpha)% interval to compute.

  • x : vector on which to perform calculations.

source
Statistics.corMethod
cor(c::AbstractChains)

Compute cross-correlations for MCMC sampler output.

Returns a ChainSummary type object with the first and second dimensions of the value field indexing the model parameters between which correlations. Results are for all chains combined.

  • c : sampler output on which to perform calculations.
source
Statistics.quantileMethod
quantile(c::AbstractChains; q::Vector=[0.025, 0.25, 0.5, 0.75, 0.975])

Compute posterior quantiles for MCMC sampler output.

Returns a ChainSummary type object with parameters contained in the rows of the value field, and quantiles in the columns. Results are for all chains combined.

  • c : sampler output on which to perform calculations.

  • q : probabilities at which to compute quantiles.

source
StatsBase.autocorMethod
autocor(c::AbstractChains; lags::Vector=[1, 5, 10, 50],
         relative::Bool=true)

Compute lag-k autocorrelations for MCMC sampler output.

Returns a ChainSummary type object with model parameters indexed by the first dimension of value, lag-autocorrelations by the second, and chains by the third.

  • c : sampler output on which to perform calculations.

  • lags : lags at which to compute autocorrelations.

  • relative : whether the lags are relative to the thinning interval of the output (true) or relative to the absolute iteration numbers (false).

source
StatsBase.summarystatsMethod
summarystats(c::AbstractChains; etype=:bm, args...)

Compute posterior summary statistics for MCMC sampler output.

Returns a ChainSummary type object with parameters in the rows of the value field; and the sample mean, standard deviation, standard error, Monte Carlo standard error, and effective sample size in the columns. Results are for all chains combined.

  • c : sampler output on which to perform calculations.

  • etype : method for computing Monte Carlo standard errors. See mcse() for options.

  • args... : additional arguments to be passed to the etype method.

source