Title: | A Statistical Framework for Comparing Sets of Trees |
---|---|
Description: | Apply hypothesis testing methods to assess differences between sets of trees. |
Authors: | Cyril Geismar [aut, cre, cph] |
Maintainer: | Cyril Geismar <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1 |
Built: | 2025-03-06 05:31:30 UTC |
Source: | https://github.com/cygei/mixtree |
The Abouheif distance between two nodes is the product of the numbers of direct descendants of the nodes on the path connecting them. It serves as a measure of the number of transmission events separating the two nodes.
abouheif(tree)
Argument | Description |
---|---|
`tree` | A data frame representing a transmission tree, with the first column containing the infector IDs and the second the infectee IDs. |
A square, symmetric matrix of Abouheif distances between nodes.
```r
tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
abouheif(tree)
```
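The definition can be illustrated with a minimal base-R sketch. It is not the package's implementation: `abouheif_sketch` is a hypothetical name, the input is assumed to have `from`/`to` columns, and the product is assumed to run over the out-degrees of the nodes strictly between the two endpoints, so adjacent nodes get 1 (an empty product) and the diagonal is set to 0.

```r
# Hypothetical sketch, not the package's code.
# Assumption: the product runs over the out-degrees of the nodes strictly between
# the two endpoints; adjacent nodes therefore get 1 and the diagonal is 0.
abouheif_sketch <- function(tree) {
  ids <- as.character(sort(unique(c(tree$from, tree$to))))
  parent <- setNames(rep(NA_character_, length(ids)), ids)
  parent[as.character(tree$to)] <- as.character(tree$from)
  # number of direct descendants (out-degree) of every node; leaves have 0
  outdeg <- setNames(numeric(length(ids)), ids)
  counts <- table(as.character(tree$from))
  outdeg[names(counts)] <- as.numeric(counts)
  # path from each node up to the source, the node itself first
  paths <- lapply(ids, function(node) {
    path <- node
    while (!is.na(parent[[node]])) {
      node <- parent[[node]]
      path <- c(path, node)
    }
    path
  })
  names(paths) <- ids
  d <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  for (a in ids) for (b in ids) {
    if (a == b) next
    pa <- paths[[a]]
    pb <- paths[[b]]
    mrca <- pa[pa %in% pb][1]  # most recent common ancestor of a and b
    on_path <- unique(c(pa[seq_len(match(mrca, pa))], pb[seq_len(match(mrca, pb))]))
    inner <- setdiff(on_path, c(a, b))  # drop the two endpoints
    d[a, b] <- prod(outdeg[inner])
  }
  d
}

tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
abouheif_sketch(tree)
```

In the example tree, nodes 4 and 6 are connected through nodes 2, 1 and 3, each of which has two infectees, so the sketch gives 2 * 2 * 2 = 8 for that pair.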
This function computes the Euclidean distance between the lower triangular parts of two given matrices.
euclidean(mat1, mat2)
Argument | Description |
---|---|
`mat1` | A numeric matrix. |
`mat2` | A numeric matrix. |
A numeric value representing the Euclidean distance between the lower triangular parts of `mat1` and `mat2`.
```r
mat1 <- matrix(c(1, 2, 3, 4), 2, 2)
mat2 <- matrix(c(4, 3, 2, 1), 2, 2)
euclidean(mat1, mat2)
```
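A minimal base-R sketch of this computation follows. It is an illustration rather than the package's code: `euclidean_sketch` is a hypothetical name, and it assumes the strict lower triangle is compared (the diagonal is excluded).

```r
# Hypothetical sketch, not the package's code.
# Assumption: only the strict lower triangle is compared (diagonal excluded).
euclidean_sketch <- function(mat1, mat2) {
  stopifnot(all(dim(mat1) == dim(mat2)))
  lower <- lower.tri(mat1)  # logical mask, TRUE strictly below the diagonal
  sqrt(sum((mat1[lower] - mat2[lower])^2))
}

mat1 <- matrix(c(1, 2, 3, 4), 2, 2)
mat2 <- matrix(c(4, 3, 2, 1), 2, 2)
euclidean_sketch(mat1, mat2)  # only entry [2, 1] is compared: |2 - 3| = 1
```

For a 2 x 2 matrix the strict lower triangle is a single entry, so the result is 1. If the diagonal were included instead (`lower.tri(mat1, diag = TRUE)`), the same matrices would give `sqrt(19)`, so the convention matters when comparing results.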
Kendall's distance measures the depth of the most recent common infector (MRCI) for each pair of nodes with respect to the source (patient 0).
kendall(tree)
Argument | Description |
---|---|
`tree` | A data frame representing a transmission tree, with the first column containing the infector IDs and the second the infectee IDs. |
A square, symmetric matrix of Kendall's distances between nodes.
Reference: Kendall M (2018). A Metric to Compare Transmission Trees.
```r
tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
kendall(tree)
```
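The MRCI depth can be sketched in base R by walking each node's chain of infectors back to the source. This is a hypothetical illustration (`kendall_sketch` is not a package function) that assumes `from`/`to` columns, gives the source depth 0, treats a node as its own ancestor (so if i infected j, their MRCI is i), and stores each node's own depth on the diagonal.

```r
# Hypothetical sketch, not the package's code.
# Assumptions: `from`/`to` columns, the source has depth 0, a node counts as its
# own ancestor, and the diagonal holds each node's own depth.
kendall_sketch <- function(tree) {
  ids <- as.character(sort(unique(c(tree$from, tree$to))))
  parent <- setNames(rep(NA_character_, length(ids)), ids)
  parent[as.character(tree$to)] <- as.character(tree$from)
  # path from each node up to the source, the node itself first
  paths <- lapply(ids, function(node) {
    path <- node
    while (!is.na(parent[[node]])) {
      node <- parent[[node]]
      path <- c(path, node)
    }
    path
  })
  names(paths) <- ids
  d <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  for (a in ids) for (b in ids) {
    pa <- paths[[a]]
    mrci <- pa[pa %in% paths[[b]]][1]     # first shared node = MRCI
    d[a, b] <- length(paths[[mrci]]) - 1  # transmissions from the source to the MRCI
  }
  d
}

tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
kendall_sketch(tree)
```

Here the MRCI of cases 4 and 5 is case 2 (depth 1), while the MRCI of cases 4 and 6 is the source, case 1 (depth 0).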
Creates a transmission tree with a specified number of cases and branches per case. The tree can be generated with fixed or Poisson-distributed branching factors.
make_tree(n_cases, R = 2, stochastic = FALSE, plot = FALSE)
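Argument | Description |
---|---|
`n_cases` | Integer. The total number of cases (nodes) in the tree. |
`R` | Integer. The fixed number of branches per case when `stochastic = FALSE`. |
`stochastic` | Logical. If `TRUE`, the number of branches per case is Poisson-distributed instead of fixed. |
`plot` | Logical. If `TRUE`, the generated tree is plotted. |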
An igraph object representing the transmission tree.
```r
# Generate a deterministic transmission tree
deterministic_tree <- make_tree(n_cases = 15, R = 2, stochastic = FALSE, plot = TRUE)

# Generate a stochastic transmission tree
random_tree <- make_tree(n_cases = 15, R = 2, stochastic = TRUE, plot = TRUE)
```
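One way to generate such a tree is breadth-first: each case, in order of appearance, infects a fixed `R` new cases (or a Poisson draw with mean `R`) until `n_cases` is reached. The base-R sketch below is a hypothetical illustration (`make_tree_sketch` is not a package function); it returns a `from`/`to` data frame rather than an igraph object, and the breadth-first order and the use of `R` as the Poisson mean are assumptions.

```r
# Hypothetical sketch, not the package's code: returns an edge list, not igraph.
# Assumptions: cases infect in breadth-first order; with stochastic = TRUE the
# offspring count is drawn from Poisson(R).
make_tree_sketch <- function(n_cases, R = 2, stochastic = FALSE) {
  from <- integer(0)
  to <- integer(0)
  queue <- 1L  # case 1 is the source (patient 0)
  n <- 1L      # number of cases created so far
  while (n < n_cases && length(queue) > 0) {
    infector <- queue[1]
    queue <- queue[-1]
    k <- if (stochastic) rpois(1, R) else R
    k <- min(k, n_cases - n)  # never exceed n_cases
    if (k > 0) {
      new_ids <- seq.int(n + 1, length.out = k)
      from <- c(from, rep(infector, k))
      to <- c(to, new_ids)
      queue <- c(queue, new_ids)
      n <- n + k
    }
  }
  data.frame(from = from, to = to)
}

deterministic <- make_tree_sketch(15, R = 2)
set.seed(1)
stochastic <- make_tree_sketch(15, R = 2, stochastic = TRUE)
```

With `R = 2` and 15 cases the deterministic version is a full binary tree: cases 1 to 7 each infect two others, giving 14 edges. In the stochastic version the chain can die out early if every remaining case draws zero offspring, in which case this sketch simply returns a smaller tree.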
The patristic distance is the number of generations separating any two nodes in a transmission tree.
patristic(tree)
Argument | Description |
---|---|
`tree` | A data frame representing a transmission tree, with the first column containing the infector IDs and the second the infectee IDs. |
A square, symmetric matrix of patristic distances between nodes.
```r
tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
patristic(tree)
```
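In a tree, the number of generations between nodes i and j equals depth(i) + depth(j) - 2 * depth(MRCA(i, j)), where depth counts transmissions from the source. The base-R sketch below uses that identity; `patristic_sketch` is a hypothetical name, and the input is assumed to have `from`/`to` columns.

```r
# Hypothetical sketch, not the package's code.
# Distance = steps from i up to the MRCA + steps from j up to the MRCA.
patristic_sketch <- function(tree) {
  ids <- as.character(sort(unique(c(tree$from, tree$to))))
  parent <- setNames(rep(NA_character_, length(ids)), ids)
  parent[as.character(tree$to)] <- as.character(tree$from)
  # path from each node up to the source, the node itself first
  paths <- lapply(ids, function(node) {
    path <- node
    while (!is.na(parent[[node]])) {
      node <- parent[[node]]
      path <- c(path, node)
    }
    path
  })
  names(paths) <- ids
  d <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  for (a in ids) for (b in ids) {
    pa <- paths[[a]]
    pb <- paths[[b]]
    mrca <- pa[pa %in% pb][1]  # first shared node = most recent common ancestor
    d[a, b] <- (match(mrca, pa) - 1) + (match(mrca, pb) - 1)
  }
  d
}

tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
patristic_sketch(tree)
```

For example, cases 4 and 6 are linked by the chain 4, 2, 1, 3, 6, so they are four generations apart, while siblings 4 and 5 are two apart.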
Randomly shuffles the IDs of the nodes in a given graph and optionally plots the shuffled graph.
shuffle_graph_ids(g, plot = FALSE)
Argument | Description |
---|---|
`g` | An igraph object representing the graph. |
`plot` | Logical. If `TRUE`, the shuffled graph is plotted. |
An igraph object with shuffled node IDs.
```r
# Create an example graph
g <- make_tree(n_cases = 10, R = 2)

# Shuffle the node IDs
shuffled_graph <- shuffle_graph_ids(g, plot = TRUE)
```
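Shuffling IDs amounts to applying a random permutation to the node labels while keeping every edge, so the topology is unchanged and only the names move. The sketch below shows this on a `from`/`to` edge list in base R; it is a hypothetical illustration (`shuffle_ids_sketch` is not a package function), whereas the package function operates on igraph objects.

```r
# Hypothetical sketch on an edge list, not the package's igraph-based code.
shuffle_ids_sketch <- function(tree) {
  ids <- sort(unique(c(tree$from, tree$to)))
  perm <- ids[sample.int(length(ids))]         # random permutation of the existing IDs
  lookup <- setNames(perm, as.character(ids))  # old ID -> new ID
  data.frame(from = unname(lookup[as.character(tree$from)]),
             to = unname(lookup[as.character(tree$to)]))
}

set.seed(42)
tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
shuffle_ids_sketch(tree)
```

Because only labels change, properties such as the number of edges and the multiset of out-degrees are preserved, which is what makes shuffled trees useful as a label-randomized comparison.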
Performs a statistical test to assess whether there are significant differences between sets of transmission trees.
Supports PERMANOVA (via `vegan::adonis2`), the Chi-Square test, or Fisher's Exact Test.
tree_test( ..., method = c("permanova", "chisq", "fisher"), within_dist = patristic, between_dist = euclidean, test_args = list() )
Argument | Description |
---|---|
`...` | Two or more sets of transmission trees. Each set must be a list of data frames with columns `from` and `to`. |
`method` | A character string specifying the test method. Options are `"permanova"`, `"chisq"`, or `"fisher"`. |
`within_dist` | A function to compute pairwise distances within a tree for PERMANOVA. Takes a data frame, returns a square matrix. Default is `patristic`. |
`between_dist` | A function to compute the distance between two trees for PERMANOVA. Takes two matrices, returns a numeric value. Default is `euclidean`. |
`test_args` | A list of additional arguments to pass to the underlying test function (for example, `vegan::adonis2` when `method = "permanova"`). |
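The PERMANOVA route chains the two distance arguments: `within_dist` turns each tree into a node-by-node matrix, `between_dist` turns each pair of matrices into one number, and the resulting tree-by-tree distance matrix is tested for differences between sets. The base-R sketch below illustrates that pipeline under stated assumptions: all names are hypothetical, every tree is assumed to cover the same node IDs, and a simple permutation pseudo-F (Anderson 2001) is swapped in for `vegan::adonis2`, so p-values will not match the package exactly.

```r
# Hypothetical sketch of the PERMANOVA route, not the package's code.
# Assumptions: trees are `from`/`to` data frames over the SAME node IDs; a base-R
# permutation pseudo-F stands in for vegan::adonis2.

# Within-tree distance: generations between every pair of nodes
patristic_sketch <- function(tree) {
  ids <- as.character(sort(unique(c(tree$from, tree$to))))
  parent <- setNames(rep(NA_character_, length(ids)), ids)
  parent[as.character(tree$to)] <- as.character(tree$from)
  paths <- lapply(ids, function(node) {
    path <- node
    while (!is.na(parent[[node]])) {
      node <- parent[[node]]
      path <- c(path, node)
    }
    path
  })
  names(paths) <- ids
  d <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  for (a in ids) for (b in ids) {
    mrca <- paths[[a]][paths[[a]] %in% paths[[b]]][1]
    d[a, b] <- match(mrca, paths[[a]]) + match(mrca, paths[[b]]) - 2
  }
  d
}

# Between-tree distance: Euclidean distance between strict lower triangles
euclidean_sketch <- function(m1, m2) {
  m2 <- m2[rownames(m1), colnames(m1)]  # align node order before comparing
  sqrt(sum((m1[lower.tri(m1)] - m2[lower.tri(m2)])^2))
}

# Pseudo-F from a distance matrix D and a factor of group labels
pseudo_f <- function(D, groups) {
  N <- length(groups)
  a <- nlevels(groups)
  ss_total <- sum(D[lower.tri(D)]^2) / N
  ss_within <- sum(vapply(split(seq_len(N), groups), function(idx) {
    Dg <- D[idx, idx, drop = FALSE]
    sum(Dg[lower.tri(Dg)]^2) / length(idx)
  }, numeric(1)))
  ((ss_total - ss_within) / (a - 1)) / (ss_within / (N - a))
}

permanova_sketch <- function(sets, within_dist = patristic_sketch,
                             between_dist = euclidean_sketch, n_perm = 999) {
  trees <- unlist(sets, recursive = FALSE)  # flatten to one list of trees
  groups <- factor(rep(seq_along(sets), lengths(sets)))
  mats <- lapply(trees, within_dist)        # one distance matrix per tree
  n <- length(mats)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    D[i, j] <- D[j, i] <- between_dist(mats[[i]], mats[[j]])
  }
  f_obs <- pseudo_f(D, groups)
  f_perm <- replicate(n_perm, pseudo_f(D, sample(groups)))  # shuffle set labels
  list(F = f_obs, p.value = (sum(f_perm >= f_obs) + 1) / (n_perm + 1))
}

setA <- list(data.frame(from = c(1, 1, 1, 1), to = c(2, 3, 4, 5)),
             data.frame(from = c(1, 1, 1, 2), to = c(2, 3, 4, 5)),
             data.frame(from = c(1, 1, 1, 3), to = c(2, 3, 4, 5)))
setB <- list(data.frame(from = c(1, 2, 3, 4), to = c(2, 3, 4, 5)),
             data.frame(from = c(1, 2, 3, 3), to = c(2, 3, 4, 5)),
             data.frame(from = c(1, 2, 2, 4), to = c(2, 3, 4, 5)))
set.seed(1)
permanova_sketch(list(setA, setB), n_perm = 199)
```

Passing different functions as `within_dist` (for example a Kendall-style MRCI depth) changes which topological features drive the comparison, while the testing step stays the same.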
This function compares sets of transmission trees using one of three statistical tests:

- PERMANOVA: evaluates whether the topological distribution of transmission trees differs between sets.
  - Null hypothesis (H0): transmission trees in all sets are drawn from the same distribution, implying similar topologies.
  - Alternative hypothesis (H1): at least one set of transmission trees comes from a different distribution.
- Chi-Square or Fisher's Exact Test: evaluates whether the distribution of infector-infectee pairs differs between sets.
  - Null hypothesis (H0): the frequency of infector-infectee pairs is consistent across all sets.
  - Alternative hypothesis (H1): the frequency of infector-infectee pairs differs between at least two sets.
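For the Chi-Square and Fisher routes, the natural input is a contingency table with one row per set and one column per infector-infectee pair, each cell counting how often that pair occurs across the set's trees. The base-R sketch below builds such a table and passes it to `stats::chisq.test` and `stats::fisher.test`; `pair_table_sketch` is a hypothetical name, and both the table layout and the use of Monte Carlo p-values (`simulate.p.value = TRUE`, helpful for sparse counts) are assumptions rather than the package's documented behaviour.

```r
# Hypothetical sketch of the pair-frequency route, not the package's code.
# Assumption: each tree is a data frame with `from` (infector) and `to` (infectee).
pair_table_sketch <- function(sets) {
  long <- do.call(rbind, lapply(seq_along(sets), function(s) {
    # label every edge of every tree in set s as "infector->infectee"
    pairs <- unlist(lapply(sets[[s]], function(tr) paste(tr$from, tr$to, sep = "->")))
    data.frame(set = s, pair = pairs)
  }))
  table(set = long$set, pair = long$pair)  # rows: sets, columns: pairs, cells: counts
}

setA <- list(data.frame(from = c(1, 1, 1, 1), to = c(2, 3, 4, 5)),
             data.frame(from = c(1, 1, 1, 2), to = c(2, 3, 4, 5)),
             data.frame(from = c(1, 1, 1, 3), to = c(2, 3, 4, 5)))
setB <- list(data.frame(from = c(1, 2, 3, 4), to = c(2, 3, 4, 5)),
             data.frame(from = c(1, 2, 3, 3), to = c(2, 3, 4, 5)),
             data.frame(from = c(1, 2, 2, 4), to = c(2, 3, 4, 5)))

tab <- pair_table_sketch(list(setA, setB))
set.seed(1)
chisq.test(tab, simulate.p.value = TRUE, B = 2000)
fisher.test(tab, simulate.p.value = TRUE, B = 2000)
```

Both calls return `htest` objects. Unlike the PERMANOVA route, this view ignores how pairs are arranged within each tree and only compares how often each transmission link appears.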
For `"permanova"`: the object returned by `vegan::adonis2`, containing the test results.
For `"chisq"` or `"fisher"`: an `htest` object with the test results.
```r
set.seed(1)

# Generate example sets
setA <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 2, stochastic = TRUE)
), simplify = FALSE)
setB <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 2, stochastic = TRUE)
), simplify = FALSE)
setC <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 4, stochastic = TRUE)
), simplify = FALSE)

# PERMANOVA test
tree_test(setA, setB, setC, method = "permanova")

# Chi-Square test
tree_test(setA, setB, setC, method = "chisq")
```
Checks whether a transmission tree meets the topology criteria required for testing: the tree must be a directed acyclic graph (DAG), be weakly connected, and have at most one infector per node.
validate_tree(tree)
Argument | Description |
---|---|
`tree` | A data frame with columns `from` (infector IDs) and `to` (infectee IDs). |
Invisibly returns `TRUE` if the tree is valid; throws an error otherwise.
```r
good_tree <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))
validate_tree(good_tree)

# Node 2 has two infectors (1 and 3), so validation fails
bad_tree <- data.frame(from = c(1, 2, 3), to = c(2, 3, 2))
try(validate_tree(bad_tree))
```
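The three criteria can be checked with base R alone. If every node has at most one infector and following infectors never revisits a node, the graph is a forest; it is then weakly connected exactly when there is a single root. The sketch below is a hypothetical illustration of that reasoning (`validate_tree_sketch` is not a package function, and its error messages are invented), assuming `from`/`to` columns.

```r
# Hypothetical sketch, not the package's code.
validate_tree_sketch <- function(tree) {
  from <- as.character(tree$from)
  to <- as.character(tree$to)
  # criterion 1: at most one infector per node
  if (anyDuplicated(to)) stop("a node has more than one infector")
  ids <- unique(c(from, to))
  parent <- setNames(rep(NA_character_, length(ids)), ids)
  parent[to] <- from
  # criterion 2: acyclic; walking up the infector chain must never revisit a node
  for (node in ids) {
    cur <- node
    seen <- character(0)
    while (!is.na(cur)) {
      if (cur %in% seen) stop("the tree contains a cycle")
      seen <- c(seen, cur)
      cur <- parent[[cur]]
    }
  }
  # criterion 3: an acyclic graph with one infector per node is a forest,
  # so it is weakly connected exactly when it has a single root
  if (sum(is.na(parent)) != 1) stop("the tree is not weakly connected")
  invisible(TRUE)
}

validate_tree_sketch(data.frame(from = c(1, 2, 3), to = c(2, 3, 4)))
try(validate_tree_sketch(data.frame(from = c(1, 3), to = c(2, 4))))  # two separate chains
```

Checking the single-infector rule first keeps the cycle walk simple, since each node then has a unique chain of infectors to follow.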