| Title: | Tools for 'outbreaker2' |
|---|---|
| Description: | Streamlines the post-processing, summarization, and visualization of 'outbreaker2' output via a suite of helper functions. Facilitates tidy manipulation of posterior samples, integration with case metadata, and generation of diagnostic plots and summary statistics. |
| Authors: | Cyril Geismar [aut, cre] |
| Maintainer: | Cyril Geismar <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.1 |
| Built: | 2026-06-08 08:18:33 UTC |
| Source: | https://github.com/cygei/o2ools |
For each case in linelist, appends summary statistics of selected parameters
from an outbreaker_chains object (e.g. infection times, number of generations).
augment_linelist(
  out,
  linelist,
  params = c("t_inf", "kappa"),
  summary_fns = list(
    mean = function(x) mean(x, na.rm = TRUE),
    q25 = function(x) quantile(x, 0.25, na.rm = TRUE),
    q75 = function(x) quantile(x, 0.75, na.rm = TRUE)
  )
)
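| out | An outbreaker_chains object. |
|---|---|
| linelist | A data frame with one row per case. |
| params | Character vector of parameter prefixes to summarise (e.g. "t_inf", "kappa"). |
| summary_fns | A named list of summary functions. Each function takes a numeric vector and returns a single value. Example: list(mean = function(x) mean(x, na.rm = TRUE)). |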
The input linelist, with new columns named
<param>_<fn> (e.g. t_inf_mean, kappa_q25).
augmented_linelist <- augment_linelist(
  out,
  linelist,
  params = c("t_inf", "kappa"),
  summary_fns = list(
    median = function(x) median(x, na.rm = TRUE),
    q25 = function(x) quantile(x, 0.25, na.rm = TRUE),
    q75 = function(x) quantile(x, 0.75, na.rm = TRUE)
  )
)
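The `<param>_<fn>` naming scheme can be illustrated without the package. The following is a minimal base R sketch, not the o2ools implementation: it assumes a toy chain whose columns `t_inf_1`, `t_inf_2` hold the posterior samples for each case, and the helper name `summarise_param` is hypothetical.

```r
# Toy posterior sample: 4 iterations, 2 cases (illustrative values only)
chain <- data.frame(
  t_inf_1 = c(1, 2, 3, 4),
  t_inf_2 = c(5, 5, 7, 7)
)

# Hypothetical helper: apply each summary function to every column whose
# name starts with "<param>_", returning one row per case
summarise_param <- function(chain, param, summary_fns) {
  cols <- grep(paste0("^", param, "_"), names(chain), value = TRUE)
  res <- lapply(summary_fns, function(f) {
    unname(vapply(chain[cols], f, numeric(1)))
  })
  names(res) <- paste(param, names(summary_fns), sep = "_")
  as.data.frame(res)
}

fns <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  q25  = function(x) quantile(x, 0.25, na.rm = TRUE)
)
case_summary <- summarise_param(chain, "t_inf", fns)

# Append the per-case summaries to a toy linelist with one row per case
toy_linelist <- data.frame(id = c("A", "B"))
augmented <- cbind(toy_linelist, case_summary)
```

Each summary function must return a single value per case, which is why `vapply()` is used with `numeric(1)`: a function returning several values would fail early rather than silently produce misaligned columns.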
Mask target columns whenever a parameter column fails a threshold test.
filter_chain(out, param, thresh, comparator = "<=", target = "alpha")
| out | A data frame of class outbreaker_chains. |
|---|---|
| param | Name of the parameter prefix (e.g. "kappa"). |
| thresh | Numeric threshold. |
| comparator | A string comparator, e.g. "<=" (the default). |
| target | Name of the target prefix to mask (e.g. "alpha"). |
An outbreaker_chains data frame with target_* entries set to NA
wherever param_* comparator thresh is FALSE.
# Mask alpha_i whenever kappa_i > 1
filter_chain(out, param = "kappa", thresh = 1, comparator = "<=", target = "alpha")
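The masking rule can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: the toy chain, the variable names, and the use of `match.fun()` to turn the comparator string into a function are all choices made for this example.

```r
# Toy chain: 3 iterations, 2 cases (illustrative values only)
chain <- data.frame(
  alpha_1 = c(2L, 2L, 2L),
  alpha_2 = c(1L, 1L, 1L),
  kappa_1 = c(1L, 3L, 1L),
  kappa_2 = c(2L, 1L, 1L)
)

param <- "kappa"
target <- "alpha"
thresh <- 1
cmp <- match.fun("<=")  # turn the comparator string into a function

param_cols  <- grep(paste0("^", param, "_"), names(chain), value = TRUE)
target_cols <- sub(paste0("^", param, "_"), paste0(target, "_"), param_cols)

# Set the target entry to NA wherever the test on the parameter is FALSE
masked <- chain
for (j in seq_along(param_cols)) {
  fails <- !cmp(chain[[param_cols[j]]], thresh)
  masked[[target_cols[j]]][fails] <- NA
}
```

In this toy chain, `alpha_1` is masked in iteration 2 (where `kappa_1` is 3) and `alpha_2` in iteration 1 (where `kappa_2` is 2); the `kappa_*` columns themselves are left unchanged.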
Accuracy is defined as the proportion of correctly assigned ancestries, computed for each tree in the posterior sample.
get_accuracy(out, true_tree)
| out | An object of class outbreaker_chains. |
|---|---|
| true_tree | A data frame with the true transmission tree, including 'from' and 'to' columns. |
A numeric vector of accuracy values for each posterior tree.
true_tree <- data.frame(
  from = as.character(outbreaker2::fake_outbreak$ances),
  to = linelist$id
)
get_accuracy(out, true_tree)
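A per-tree accuracy of this kind can be computed by hand on a toy chain. The base R sketch below is not the package's implementation, and it assumes that a case with no infector in both the truth and the posterior tree (NA in both) counts as a correct assignment; the helper `is_match` is hypothetical.

```r
# Toy chain: 3 posterior trees for 3 cases (case 1 has no infector)
chain <- data.frame(
  alpha_1 = rep(NA_character_, 3),
  alpha_2 = c("1", "1", "3"),
  alpha_3 = c("2", "1", "2"),
  stringsAsFactors = FALSE
)
true_from <- c(NA, "1", "2")  # true infector of cases 1, 2 and 3

# Hypothetical helper: TRUE where inferred and true infector agree,
# treating NA versus NA as agreement (an assumption of this sketch)
is_match <- function(x, y) {
  (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
}

# One accuracy value per posterior tree (row of the chain)
accuracy <- vapply(seq_len(nrow(chain)), function(i) {
  inferred <- unlist(chain[i, ], use.names = FALSE)
  mean(is_match(inferred, true_from))
}, numeric(1))
```

The first tree reproduces the truth exactly, while the second and third each misassign one of the three cases.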
Computes the most frequent ancestor for each case across the posterior sample.
get_consensus(out)
| out | An object of class outbreaker_chains. |
|---|---|
A data frame showing the most frequent ancestor for each case.
get_consensus(out)
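The idea of a most frequent ancestor can be shown on a toy chain. This base R sketch is illustrative only: the helper `modal_value`, the output column names, and the decision to treat NA (no infector) as a category in its own right are assumptions, not documented behaviour of the package.

```r
# Toy chain: 4 posterior samples for 3 cases
chain <- data.frame(
  alpha_1 = rep(NA_character_, 4),
  alpha_2 = c("1", "1", "3", "1"),
  alpha_3 = c("2", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent value, counting NA as its own category
modal_value <- function(x) {
  counts <- table(x, useNA = "ifany")
  names(counts)[which.max(counts)]
}

consensus <- data.frame(
  to   = sub("^alpha_", "", names(chain)),
  from = vapply(chain, modal_value, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
```

Note that `which.max()` returns the first maximum, so ties are broken by the sort order of the infector labels in this sketch.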
Computes the mean entropy of transmission trees from outbreaker2, quantifying uncertainty in inferred infectors.
By default, entropy is normalised between 0 (complete certainty) and 1 (maximum uncertainty).
get_entropy(out, normalise = TRUE)
| out | A data frame of class outbreaker_chains. |
|---|---|
| normalise | Logical. If TRUE (the default), entropy is normalised between 0 and 1. |
Entropy quantifies uncertainty in inferred infectors across posterior samples using the Shannon entropy formula:

$$H = -\sum_{i=1}^{K} p_i \log(p_i)$$

where $p_i$ is the proportion of times infector $i$ is inferred. If normalise = TRUE, entropy is scaled by its maximum possible value, $\log(K)$, where $K$ is the number of distinct inferred infectors:

$$H^{*} = \frac{H}{\log(K)}$$

This normalisation ensures values range from 0 to 1:
0: Complete certainty — the same infector is inferred across all samples.
1: Maximum uncertainty — all infectors are equally likely.
A numeric value representing the mean entropy of transmission trees across posterior samples.
# High entropy
out <- data.frame(
  alpha_1 = sample(c("2", "3"), 100, replace = TRUE),
  alpha_2 = sample(c("1", "3"), 100, replace = TRUE)
)
class(out) <- c("outbreaker_chains", class(out))
get_entropy(out)

# Low entropy
out <- data.frame(
  alpha_1 = sample(c("2", "3"), 100, replace = TRUE, prob = c(0.95, 0.05)),
  alpha_2 = sample(c("1", "3"), 100, replace = TRUE, prob = c(0.95, 0.05))
)
class(out) <- c("outbreaker_chains", class(out))
get_entropy(out)
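The normalised Shannon entropy can also be worked through by hand in base R. This is a minimal sketch of the calculation for a single case, not the package's code: the helper name `case_entropy` is hypothetical, and returning 0 when only one infector is ever inferred (instead of dividing by log(1) = 0) is an assumption made here.

```r
# Hypothetical helper: Shannon entropy of the infector frequencies of one case
case_entropy <- function(x, normalise = TRUE) {
  p <- prop.table(table(x))   # proportion of samples per inferred infector
  h <- -sum(p * log(p))
  k <- length(p)              # number of distinct inferred infectors
  # Assumption: with a single infector the entropy is 0 and is left unscaled
  if (normalise && k > 1) h <- h / log(k)
  as.numeric(h)
}

certain   <- rep("2", 100)               # same infector in every sample
uncertain <- rep(c("2", "3"), each = 50) # two infectors, equally likely

h_certain   <- case_entropy(certain)
h_uncertain <- case_entropy(uncertain)

# Mean entropy across the cases of a toy chain
toy_chain <- data.frame(alpha_1 = certain, alpha_2 = uncertain)
mean_entropy <- mean(vapply(toy_chain, case_entropy, numeric(1)))
```

The two extreme inputs reproduce the end points of the normalised scale: complete certainty gives 0 and two equally likely infectors give 1, so the mean over the two toy cases is 0.5.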
This function computes the number of secondary infections caused by each individual from outbreaker2 MCMC chains. For each MCMC iteration, it counts how many times each individual appears as an infector (alpha parameter).
get_Ri(out)
| out | An object of class outbreaker_chains. |
|---|---|
A data frame where:
Each row represents an MCMC iteration
Each column represents an individual (named by their identifier)
Values represent the reproduction number (Ri) for that individual in that iteration
out_id <- identify(out, ids = linelist$name)
Ri <- get_Ri(out_id)
str(Ri)
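The counting step described above can be sketched in base R on a toy chain. This is an illustration only, not the package's implementation; the helper `count_infections` is hypothetical and case identifiers are taken from the `alpha_*` column names.

```r
# Toy chain: 3 iterations for 3 cases (case 1 has no infector)
chain <- data.frame(
  alpha_1 = rep(NA_character_, 3),
  alpha_2 = c("1", "1", "3"),
  alpha_3 = c("1", "2", "1"),
  stringsAsFactors = FALSE
)
ids <- sub("^alpha_", "", names(chain))

# Hypothetical helper: number of times each id appears as an infector
count_infections <- function(infectors, ids) {
  vapply(ids, function(id) sum(infectors == id, na.rm = TRUE), integer(1))
}

# One row per iteration, one column per individual
Ri_mat <- t(vapply(seq_len(nrow(chain)), function(i) {
  count_infections(unlist(chain[i, ], use.names = FALSE), ids)
}, integer(length(ids))))
colnames(Ri_mat) <- ids

Ri <- as.data.frame(Ri_mat)
```

Every row sums to the number of cases with an infector in that iteration (two here), which is a quick sanity check on the counts.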
The serial interval is the time between the onset of symptoms in an infector-infectee pair. This function computes the serial interval statistics from a list of transmission trees.
get_si(
  trees,
  date_suffix = "date",
  stats = list(
    mean = mean,
    lwr = function(x) quantile(x, 0.025, na.rm = TRUE),
    upr = function(x) quantile(x, 0.975, na.rm = TRUE)
  )
)
| trees | A list of data frames, generated by get_trees. |
|---|---|
| date_suffix | A string giving the suffix shared by the date-of-onset columns in each tree. Default is "date". |
| stats | A list of functions to compute statistics. Default is: list(mean = mean, lwr = function(x) quantile(x, 0.025, na.rm = TRUE), upr = function(x) quantile(x, 0.975, na.rm = TRUE)). Each function should take a numeric vector as input and return a single numeric value. |
A data frame with serial interval statistics.
See also get_trees for generating a list of transmission trees.
trees <- get_trees(out, date = linelist$onset)
si_stats <- get_si(trees)
str(si_stats)
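The serial interval calculation can be sketched on hand-built trees. This base R example is an illustration only: the column names `from_date` and `to_date` are an assumption about how onset dates are stored in each tree, and the `stats` list uses `max` in place of the quantile defaults to keep the arithmetic easy to check.

```r
# Toy trees with infector and infectee onset dates
# (from_date / to_date column names are assumed for this sketch)
tree1 <- data.frame(
  from = c("A", "A"), to = c("B", "C"),
  from_date = as.Date(c("2024-01-01", "2024-01-01")),
  to_date   = as.Date(c("2024-01-04", "2024-01-06"))
)
tree2 <- data.frame(
  from = c("A", "B"), to = c("B", "C"),
  from_date = as.Date(c("2024-01-01", "2024-01-03")),
  to_date   = as.Date(c("2024-01-03", "2024-01-07"))
)
trees <- list(tree1, tree2)

stats <- list(mean = mean, max = max)

# One row of statistics per tree
si_stats <- do.call(rbind, lapply(trees, function(tree) {
  si <- as.numeric(tree$to_date - tree$from_date)  # serial intervals in days
  as.data.frame(lapply(stats, function(f) f(si)))
}))
```

Computing the statistics per tree, rather than pooling all pairs, keeps one value per posterior sample, so the spread across rows reflects posterior uncertainty in the transmission tree.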
Generates a list of data frames representing posterior transmission trees from an
outbreaker_chains object. Each tree contains 'from' and 'to' columns, and may
optionally include kappa, t_inf, and user-supplied columns.
get_trees(out, kappa = FALSE, t_inf = FALSE, ...)
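| out | A data frame of class outbreaker_chains. |
|---|---|
| kappa | Logical. If TRUE, kappa is included in each tree. |
| t_inf | Logical. If TRUE, t_inf is included in each tree. |
| ... | Additional vectors to include as columns in the output. Each vector must follow the case order used to produce out. |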
A list of data frames, one per posterior sample. Each data frame has at least 'from' and 'to' columns.
get_trees(
  out,
  id = linelist$id,
  name = linelist$name,
  group = linelist$group,
  onset = linelist$onset
)
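The conversion from a chain to a list of trees can be sketched as follows. This is a minimal base R illustration, not the package's code: it assumes that the `alpha_*` columns hold integer case indices and that a user-supplied vector named `onset` yields columns `from_onset` and `to_onset`.

```r
# Toy chain: 2 posterior samples for 3 cases, infectors as integer indices
chain <- data.frame(
  alpha_1 = c(NA_integer_, NA_integer_),
  alpha_2 = c(1L, 3L),
  alpha_3 = c(2L, 1L)
)
# Extra vector, in the same case order as the chain columns
onset <- as.Date(c("2024-01-01", "2024-01-04", "2024-01-06"))

# One data frame per posterior sample (row of the chain)
trees <- lapply(seq_len(nrow(chain)), function(i) {
  from <- unlist(chain[i, ], use.names = FALSE)
  to <- seq_along(from)
  data.frame(
    from = from,
    to = to,
    from_onset = onset[from],  # NA where the case has no infector
    to_onset = onset[to]
  )
})
```

Indexing `onset` by the integer infector index is what makes the ordering requirement matter: a vector in a different order would attach the wrong dates to each pair.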
Replaces the integers in outbreaker2 output with unique identifiers.
identify(out, ids)
| out | A data frame of class outbreaker_chains. |
|---|---|
| ids | A vector of IDs from the original linelist. |
A data frame of class outbreaker_chains with integers replaced by the corresponding IDs.
identify(out, ids = linelist$name)
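The relabelling amounts to using each integer as an index into the vector of IDs. The base R sketch below shows this on a toy chain; it is not the package's implementation, it only relabels the values of the `alpha_*` columns, and the names used are made up for the example.

```r
# Toy chain: infectors stored as integer case indices
chain <- data.frame(
  alpha_1 = c(NA_integer_, NA_integer_),
  alpha_2 = c(1L, 3L),
  alpha_3 = c(2L, 1L)
)
ids <- c("anna", "ben", "cleo")  # illustrative IDs, one per case, in case order

# Replace each integer index by the corresponding ID (NA stays NA)
relabelled <- chain
alpha_cols <- grep("^alpha_", names(chain), value = TRUE)
relabelled[alpha_cols] <- lapply(chain[alpha_cols], function(x) ids[x])
```

Because the lookup is positional, `ids` must be in the same order as the cases were supplied to outbreaker2.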
A simulated linelist derived from fake_outbreak, where cases are assigned to the patient or hcw group.
First names are randomly generated using the randomNames package.
linelist
A data frame with 30 rows and 5 columns:
Case ID
Simulated first name
Group label: "patient" or "hcw"
Date of symptom onset
Date of sample collection
head(linelist)
The outbreaker2 result generated from the example in the
outbreaker2 vignette.
This dataset was produced by running outbreaker() on the fake_outbreak data.
out
An outbreaker_chains object.
https://www.repidemicsconsortium.org/outbreaker2/articles/introduction.html
This function samples rows from an object of class outbreaker_chains.
sample.outbreaker_chains(out, ...)
| out | A data frame of class outbreaker_chains. |
|---|---|
| ... | Additional arguments passed on to the underlying sampling call. |
An object of class outbreaker_chains, with sampled rows.
Generates a contingency table based on 'from' (infector) and 'to' (infectee) vectors.
ttable(from, to, levels = NULL, ...)
| from | A vector of infectors. |
|---|---|
| to | A vector of infectees. |
| levels | Optional. A vector of factor levels. Defaults to the unique, sorted values of 'from' and 'to'. |
| ... | Additional arguments passed to the underlying tabulation call. |
A contingency table of infectors (rows) and infectees (columns).
from <- c("A", "A", NA, "C", "C", "C")
to <- c("A", "B", "B", "C", "C", "C")
ttable(from, to)
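A comparable contingency table can be built with base R's `table()` by giving both vectors the same factor levels. This is a sketch of the idea rather than the function's actual code, and it assumes that pairs with a missing infector are dropped, which is the default behaviour of `table()`.

```r
from <- c("A", "A", NA, "C", "C", "C")
to   <- c("A", "B", "B", "C", "C", "C")

# Shared levels: unique, sorted values of both vectors (sort() drops NA)
lv <- sort(unique(c(from, to)))

# Using the same levels on both sides gives a square table,
# including rows for individuals who never appear as infectors
tab <- table(
  from = factor(from, levels = lv),
  to   = factor(to, levels = lv)
)
```

Sharing the levels is what keeps the row for "B" in the table even though "B" never infects anyone, so the result can be compared cell by cell across different trees.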