Using treeducken for cophylogenetic simulation

Setting up

In this post, I will give a brief exploration of the features available for cophylogenetic simulation in my new R package: treeducken. First, you will need to install the package and then load it:

# install.packages("treeducken")
library(treeducken)
## Loading required package: ape

Set parameters and simulate!

First I need to set some parameters. Currently, there is only a way to simulate to a pre-specified time. We need to set six parameters: host speciation and extinction rate, symbiont speciation and extinction rate, host expansion rate, and cospeciation rate. These rates are set relative to the time; for example setting the speciation to 0.1 translates to 1 speciation every 10 time units. To assist with determining these parameters, we provide functions that calculate the expected numbers of tips under the SSA. In this code chunk we draw some random parameters for our host and symbiont birth rates, and calculate their expected numbers of tips. We then use these randomly drawn parameters to simulate under the cophylogenetic process. Figure 3 shows the output of one such simulation.

set.seed(54)
lambda_H <- rexp(n=1)
mu_H <- lambda_H / 2
lambda_C <- rexp(n=1)
time <- 1.0

lambda_S <- rexp(n=1)
mu_S <- lambda_S / 2

H_tips <- treeducken::calculate_expected_leaves_sptree(lambda = lambda_H + lambda_C,
                                                       mu = mu_H,
                                                       t = time)

S_tips <- treeducken::calculate_expected_leaves_sptree(lambda = lambda_S + lambda_C,
                                                       mu = mu_S,
                                                       t = time)
cophy_obj <- treeducken::sim_cophylo_bdp(hbr = lambda_H, hdr = mu_H,
                                         sbr = lambda_S, sdr = mu_S,
                                         cosp_rate = lambda_C, host_exp_rate = 0.0,
                                         time_to_sim = time, numbsim = 1)
plot(cophy_obj[[1]], col = "orange", lty = "dotted")

Other features

Cophylogenetic objects are output as an S3 object of class cophy with many S3 generic functions implemented including summary, plot (see above), c() and print. The structure of this class is largely based on the cophylo S3 object of PhyTools, and we modified code written for generic function definitions from the APE package. Here I’ve provided some examples of the generic functions, more are provided within the documentation.

See print() below:

print(cophy_obj, details = TRUE)
## 1 cophylogenetic set.
## Cophylogenetic set 1 :
##  Symbiont tree has  8  tips.
##  Host tree has  11  tips.
summary(cophy_obj[[1]])
## Null tree set.
## Set of host and symbiont tree: cophy_obj[[1]] 
## 
## 
## Host tree:   Number of tips: 11 
##   Number of nodes: 10 
##   Branch lengths:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## 0.005784829 0.068360370 0.188179409 0.438411131 0.933831013 
##   Root edge: 0.03004373 
##   First ten tip labels: H1 
##                         X2
##                         H3
##                         H4
##                         H5
##                         H6
##                         H7
##                         H8
##                         H9
##                         H10
##   No node labels.
## 
## 
## Symb tree:   Number of tips: 8 
##   Number of nodes: 7 
##   Branch lengths:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## 0.006135764 0.044184039 0.241735072 0.398700667 0.933831013 
##   Root edge: 0.03004373 
##   Tip labels: S1 
##               X2
##               S3
##               S4
##               S5
##               S6
##               S7
##               X8
##   No node labels.
## 
## 
## Association Matrix
##  There are  6  rows (i.e. extant symbionts.
##  There are  10  cols (i.e. extant hosts.Event history summary: 
## 
##  Min.   : 1.000   1st Qu.: 2.000   Median : 5.000   Mean   : 6.459   3rd Qu.:11.000   Max.   :15.000   Min.   : 1.00   1st Qu.: 7.00   Median :11.00   Mean   :11.68   3rd Qu.:17.00   Max.   :21.00   Length:85          Class :character   Mode  :character   NA NA NA Min.   :0.0000   1st Qu.:0.3667   Median :0.7583   Mean   :0.6062   3rd Qu.:0.9255   Max.   :0.9494

Or summary:

summary(cophy_obj[[1]])
## Null tree set.
## Set of host and symbiont tree: cophy_obj[[1]] 
## 
## 
## Host tree:   Number of tips: 11 
##   Number of nodes: 10 
##   Branch lengths:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## 0.005784829 0.068360370 0.188179409 0.438411131 0.933831013 
##   Root edge: 0.03004373 
##   First ten tip labels: H1 
##                         X2
##                         H3
##                         H4
##                         H5
##                         H6
##                         H7
##                         H8
##                         H9
##                         H10
##   No node labels.
## 
## 
## Symb tree:   Number of tips: 8 
##   Number of nodes: 7 
##   Branch lengths:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## 0.006135764 0.044184039 0.241735072 0.398700667 0.933831013 
##   Root edge: 0.03004373 
##   Tip labels: S1 
##               X2
##               S3
##               S4
##               S5
##               S6
##               S7
##               X8
##   No node labels.
## 
## 
## Association Matrix
##  There are  6  rows (i.e. extant symbionts.
##  There are  10  cols (i.e. extant hosts.Event history summary: 
## 
##  Min.   : 1.000   1st Qu.: 2.000   Median : 5.000   Mean   : 6.459   3rd Qu.:11.000   Max.   :15.000   Min.   : 1.00   1st Qu.: 7.00   Median :11.00   Mean   :11.68   3rd Qu.:17.00   Max.   :21.00   Length:85          Class :character   Mode  :character   NA NA NA Min.   :0.0000   1st Qu.:0.3667   Median :0.7583   Mean   :0.6062   3rd Qu.:0.9255   Max.   :0.9494

We can also calculate (a limited amount of) summary statistics for our treeset. There is a function to calculate some for all of the cophy objects in a list, or some are available separately (e.g. parafit_test):

df <- treeducken::cophy_summary_stat(cophy_obj)
## Calculating for replicate  1
D <- treeducken::parafit_stat(cophy_obj[[1]]$host_tree,
                              cophy_obj[[1]]$symb_tree,
                              cophy_obj[[1]]$association_mat)
treeducken::parafit_test(cophy_obj[[1]]$host_tree,
                              cophy_obj[[1]]$symb_tree,
                              cophy_obj[[1]]$association_mat,
                         D = D,
                         reps = 99)
## [1] 0.01

The event numbers such as “Cospeciations” and “Host_Speciations” here are only calculated if the Event_DF member of the cophy object is present. This will always be there if you simulate data using sim_cophylo_bdp(), but won’t be if you use the built-in functions to setup a cophylogenetic object. I provide an example of doing that with the classic Hafner and Page gopher-louse dataset below:

gopher_lice_map <- read.table(system.file("extdata",
                                          "gopher_lice_mapping.txt",
                                          package = "treeducken"),
                              stringsAsFactors = FALSE, header = TRUE)

gopher_lice_assoc_matrix <- convert_assoc_table_to_matrix(gopher_lice_map)
gopher_tree <- ape::read.nexus(system.file("extdata",
                                           "gophers_bd.tre",
                                           package = "treeducken"))
lice_tree <- ape::read.nexus(system.file("extdata",
                                         "lice_bd.tre",
                                         package = "treeducken"))
gopher_lice_cophylo <- convert_to_cophy(hostTree = gopher_tree,
                                          symbTree = lice_tree,
                                          assocMat = gopher_lice_assoc_matrix)
print(gopher_lice_cophylo)
## 
##  Host Tree:
## 
## 
## Phylogenetic tree with 15 tips and 14 internal nodes.
## 
## Tip labels:
##   Cratogeomys_castanops_L32685, Cratogeomys_merriami_L32688, Geomys_breviceps_L32683, Geomys_bursarius_L32693, Geomys_bursarius_L32694, Geomys_personatus_L32689, ...
## 
## Rooted; includes branch lengths.
## 
## 
##  Symbiont Tree:
## 
## 
## Phylogenetic tree with 17 tips and 16 internal nodes.
## 
## Tip labels:
##   Geomydoecus_actuosi_L32665, Geomydoecus_chapini_L32667, Geomydoecus_cherriei_L32668, Geomydoecus_costaricensis_L32669, Geomydoecus_ewingi_L32671, Geomydoecus_expansus_L32670, ...
## 
## Rooted; includes branch lengths.
## 
##  Association Matrix: 
## 
## 
##  There are  17  rows (i.e. extant symbionts).
##  There are  15  cols (i.e. extant hosts).
cophy_summary_stat(gopher_lice_cophylo)
##   Cospeciations Host_Speciations Host_Extinctions Symbiont_Speciations
## 1             0                0                0                    0
##   Symbiont_Extinctions Parafit_Stat Parafit_P-value
## 1                    0     2.720485            0.12
plot(gopher_lice_cophylo,
     fsize = 0.5,
     show_tip_label = FALSE,
     gap = 1,
     col = "purple",
     lty = "dashed")

Using treeducken with PACo

I can use the treeducken output with the paco package. If there are extinct tips in the host or symbiont tree these need to be pruned off prior to being used with the paco package. Pruning can be done with the drop_extinct function.

library(paco)
host_tree_pruned <- drop_extinct(cophy_obj[[1]]$host_tree)
symb_tree_pruned <- drop_extinct(cophy_obj[[1]]$symb_tree)
A <- association_mat(cophy_obj[[1]])
host_dist <- cophenetic(host_tree_pruned)
symb_dist <- cophenetic(symb_tree_pruned)
links <- t(A) # paco wants associations with rows as hosts 

# we need to name rows and columns for paco
rownames(links) <- host_tree_pruned$tip.label
colnames(links) <- symb_tree_pruned$tip.label 
D <- paco::prepare_paco_data(H = host_dist, P = symb_dist, HP = links)
D <- paco::add_pcoord(D)
D <- paco::PACo(D, nperm=100, seed = 11, method="r0")
print(D$gof)
## $p
## [1] 0
## 
## $ss
## [1] 0.02307293
## 
## $n
## [1] 100

References

Hutchinson, M. C., Cagua, E. F., Balbuena, J. A., Stouffer, D. B., & Poisot, T. (2017). paco: implementing Procrustean Approach to Cophylogeny in R. Methods in Ecology and Evolution, 8(8), 932-940.

Hafner, M. S., & Page, R. D. (1995). Molecular phylogenies and host-parasite cospeciation: gophers and lice as a model system. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 349(1327), 77-83.

Wade Dismukes
Wade Dismukes
Graduate Candidate

My research interests include phylogenetic theory, host-symbiont cophylogenetics and Bayesian phylogenetics. I am also interested in gene tree-species tree evolution.