The phylogenies in the treeblock are generated according to an imputation scheme described in Rabosky et al. (2018). We generated 100 stochastic character maps for each of 100 trees; the figure shows the average number of gains and losses per tree (i.e., average gains/losses across the stochastic character maps for each tree).
As described in the main text, a continuous-time stochastic process for three binary traits has a total of 24 possible transitions where a single one of the three states changes. The figure below shows the full and reduced transition matrices. In the axis tick labels, ag0_pc0_sc1
(for example) refers to the state without AGs (ag0
), without male care (pc0
), and with group spawning (sc1
, loosely “sperm competition present”1). Parameter 1 quantifies the log-hazard of a switch to pair spawning in this state (i.e., to ag0_pc0_sc0
).
The assignment of numbers to rates is arbitrary, numbered in sequence in column-wise order. Knowing the assignment is only necessary when doing low-level computations based on the vector of parameters. The take-home information is which sets of transitions are constrained to be identical in the reduced model; these rates have the same number . In the reduced model (righthand plot), transitions that are assumed to have equal rates - because we assume that all transitions in spawning mode, and all transitions in male care, are independent of the other states - have the same ID number. Colours denote the estimated log-hazard for each parameter when the models are fitted by maximum likelihood. Below, we denote parameter 1 (in the reduced model, which occupies four different cells in the transition matrix) as sc_loss
, because it quantifies the loss rate of sperm competition regardless of the states of the other traits; we refer to parameter 3 (for example) as loss.ag_pc0_sc1
, the loss rate of AGs in the absence of male care and presence of sperm competition.
The transitions denoted A-D all represent gains of accessory glands, under different combinations of male care/spawning mode. Arrows A and B both represent AG gains in states with pair spawning (sc0
); C and D both represent states in states with group spawning (sc1
). Thus rate(pair) = (rate(A) + rate(B))/2 is the average rate of AG gain under pair spawning; rate(group) = (rate(C) + rate(D))/2 is the average rate under group spawning; and rate(group)-rate(pair) is the contrast, which we can think of as “the effect of group vs pair spawning on the rate of AG gain”. We can similarly compute other combinations such as rate(no male care) (pc0
) = (rate(B) + rate(D))/2; rate(male care) (pc1
) = (rate(A) + rate(C))/2; and contrast(male care) = rate(male care) - rate(no male care).
The contrast matrix connects the parameters we estimate (e.g. loss.ag_pc0_sc0
, loss rate of accessory glands when parental care and sperm competition are both absent) to the biological effects we are interested in (e.g. pc_loss
, the effect of parental care on the loss rate of accessory glands). For example, the intercept_loss
effect is the (unweighted) average of the four different loss terms; the pc_loss
effect is the difference between the average loss rate when male care is present (second row, red squares) and the average loss rate when male care is absent (second row, blue squares).
Here is an equivalent numeric matrix:
\[ \left[ \begin{array}{rrrrrrrr} 1/4 & 1/4 & 1/4 & 1/4 & . & . & . & . \\ -1/2 & -1/2 & 1/2 & 1/2 & . & . & . & . \\ 1/2 & -1/2 & -1/2 & 1/2 & . & . & . & . \\ -1/2 & 1/2 & -1/2 & 1/2 & . & . & . & . \\ . & . & . & . & 1/4 & 1/4 & 1/4 & 1/4 \\ . & . & . & . & -1/2 & 1/2 & -1/2 & 1/2 \\ . & . & . & . & -1/2 & -1/2 & 1/2 & 1/2 \\ . & . & . & . & 1/2 & -1/2 & -1/2 & 1/2 \\ \end{array} \right] \]
Here we illustrate our general strategy for constructing priors: pick sensible lower and upper “bounds” (not really bounds, but values we would consider extreme) and use a Normal prior on the log-hazard scale (equivalent to a log-Normal prior on the hazard scale) with a mean halfway between the lower and upper bounds and a standard deviation \(\sigma = (\log(\textrm{upper})-\log(\textrm{lower}))/(2 \cdot \textrm{range})\) In our model we use a range of 3 (i.e. the lower and upper bounds are at \(\pm 3 \sigma\)), to denote that the lower and upper bounds we specify are extreme (99.7% ranges). The figure below uses range = 2 (which would correspond to 95.4% ranges) so the tails are more clearly visible.
By sampling directly from the prior and computing contrasts, we can confirm that the priors on the contrasts (ratio of AG gain/loss rates depending on male care and sperm competition) are indeed neutral (centered at 1.0) and reasonable (95% confidence intervals extend from 10x slower to 10x faster). The intercept terms for gain and loss represent the average evolutionary rates across all states, and are measured in units of expected numbers of events across the entire tree. For example, the prior expected number of gains of AG across the whole tree is 33.4 (95% CI: 10.6-104).
This section presents the quantitative results in more detail.
Here are the estimated posterior median and confidence intervals, along with \(p_\textrm{MCMC}\) (twice the minimum tail probability) and the “probability of direction” (\(p_d\)), the probability that the parameter has the same sign as the median (\(p_d\) is an index of how clearly the sign of the parameter is known: \(p_\textrm{MCMC} = 2(1-p_d)\)) (Makowski et al. 2019; Shi and Yin 2021).
contrast | rate | median | lwr | upr | pMCMC | pd |
---|---|---|---|---|---|---|
intercept | gain | 18.137 | 8.834 | 34.901 | NA | NA |
pc | gain | 5.755 | 1.352 | 23.492 | 0.019 | 0.991 |
pcxsc | gain | 0.511 | 0.119 | 2.118 | 0.355 | 0.823 |
sc | gain | 0.254 | 0.060 | 1.056 | 0.061 | 0.970 |
intercept | loss | 68.239 | 27.553 | 174.638 | NA | NA |
pc | loss | 0.966 | 0.133 | 5.667 | 0.970 | 0.515 |
pcxsc | loss | 0.496 | 0.074 | 2.832 | 0.443 | 0.779 |
sc | loss | 0.687 | 0.117 | 4.698 | 0.684 | 0.658 |
Here are the marginal posterior distributions for all of the contrasts, including fits to both the full tree-block data set and the subset of the data that includes only species with genetic data (“fishphylo”). (The latter is useful for comparison with the MLE results, which cannot use the Bayesian method of sampling across treeblock phylogenies.) The treeblock estimates are slightly more precise (i.e. the confidence intervals are narrower), as expected since they can make use of a larger data set. Points show the posterior median, Line ranges represent 95% credible intervals.
For any pair of nested models we can do a likelihood ratio test,
comparing the model fits and number of parameters. We fit restricted models
with AG evolution independent of both sperm competition and male care
(indep
); dependent on male care but not sperm competion (pc
);
dependent on sperm competition only (sc
); depending additively on both traits (i.e., the effects of the trait status on evolutionary rates are added,
on the log scale - this model has 10 parameters); and with an interaction between traits
(the full 12-parameter model that is our primary focus).
Testing (most) nested pairs gives this diagram (we omit the
comparison between pcsc
and indep
):
The p-values quoted in the paper are from comparing the full model
with interactions to the pc
and sc
models. These comparisons are
analogous to “type 3” tests in ANOVA (Fox and Weisberg 2018), which are the
most commonly used approach in the biological literature. “Type 2”
tests (as implemented in the car
package for R) instead compare
the additive model to the models with only one main effect:
these tests suggest that the effects of both sperm competition
and male care are statistically significant, although the evidence
for the effect of male care is still stronger.
desc | ddf | ddev | pval | |
---|---|---|---|---|
pcsc | interaction | 2 | 0.040 | 0.980 |
sc3 | sc (type III) | 4 | 8.519 | 0.074 |
sc2 | sc (type II) | 2 | 8.479 | 0.014 |
pc3 | pc (type III) | 4 | 9.918 | 0.042 |
pc2 | pc (type II) | 2 | 9.878 | 0.007 |
(ddf
: difference in number of parameters between models;
ddev
: change in deviance (\(-2 \log(L)\)); pval
: likelihood ratio
test p-value)
We can look at the parameter estimates from maximum likelihood fits, although most of the confidence intervals of the unregularized fits are undetermined because of the failure of the Wald estimates. All confidence intervals more extreme than \(\exp(\pm 20)\) are set to 0 or \(\infty\).
Profile confidence intervals could give slightly better results, but these turds may be not worth polishing very much …
method | term | estimate | lwr | upr |
---|---|---|---|---|
pcsc | pc_loss | 0.273 | 0.000 | Inf |
pcsc | sc_loss | 3.668 | 0.000 | Inf |
pcsc | pcxsc_loss | 0.000 | 0.000 | Inf |
pcsc | pc_gain | 2.664 | 0.000 | Inf |
pcsc | sc_gain | 0.003 | 0.000 | Inf |
pcsc | pcxsc_gain | 0.375 | 0.000 | Inf |
pcsc_prior | pc_loss | 0.973 | 0.170 | 5.550 |
pcsc_prior | sc_loss | 0.654 | 0.117 | 3.652 |
pcsc_prior | pcxsc_loss | 0.612 | 0.109 | 3.452 |
pcsc_prior | pc_gain | 4.156 | 1.075 | 16.069 |
pcsc_prior | sc_gain | 0.297 | 0.075 | 1.177 |
pcsc_prior | pcxsc_gain | 0.729 | 0.181 | 2.940 |
pcsc_add | pc_loss | 9362.489 | 0.000 | Inf |
pcsc_add | sc_loss | 0.001 | 0.000 | Inf |
pcsc_add | pc_gain | 8.349 | 2.030 | 34.339 |
pcsc_add | sc_gain | 0.001 | 0.000 | Inf |
Alternatively, we can make comparisons based on information-theoretic criteria such as AICc, AIC, or BIC. These three criteria differ in the kind and strength of penalty that they use to avoid overly complex models. Researchers choose different criteria depending on their goals - strictly speaking AIC and its variants are for maximizing predictive accuracy while BIC is for choosing among hypotheses. AICc is often recommended for analyses where \(n<40\). Using AICc presents a challenge: for this data set (a phylogeny of 607 species with approximately 20 independent origins of accessory glands), it is difficult to quantify the effective number of observations (the \(n\) in the rule of thumb above, and in the formula for the AICc, is based on a simpler data format where we can assume that every observation is independent). We chose \(n=30\) for this comparison, as a guess at the effective sample size (larger \(n\) will tend to select larger models as better).
Negative log-likelihood, which measures the unpenalized goodness-of-fit, is included for completeness/comparison.
df | ΔAICc | ΔAIC | ΔBIC | ΔnegLL | |
---|---|---|---|---|---|
AG dep only on PC | 8 | 0.00 | 4.48 | 1.68 | 4.26 |
AG dep on PC, SC additively | 10 | 0.24 | 0.00 | 0.00 | 0.02 |
AG dep only on SC | 8 | 1.40 | 5.88 | 3.08 | 4.96 |
AG evol indep of PC, SC | 6 | 8.38 | 16.06 | 10.46 | 12.05 |
AG dep on PC, SC & interaction | 12 | 10.98 | 3.96 | 6.76 | 0.00 |
As stated in the main text, we can see that AICc gives similar conclusions to the Bayesian and likelihood-ratio-test analyses: the best model has AG evolution depending on male care only, followed closely by the additive model, and fairly closely by sperm competition only (models with \(\Delta\) AICc < 2 are usually interpreted as “roughly equivalent”). In contrast, using AIC selects the additive model as best, and considerably better than the single-factor models. BIC, which like AICc is generally more conservative than AIC, also selects the additive model as best, similar (\(\Delta\) BIC < 2) to the male-care-only model.
We performed a range of different analyses to test the sensitivity of our conclusions to technical choices, and to explore the difference between frequentist (maximum likelihood estimate = MLE), constrained frequentist (maximum a posteriori = MAP), and full Bayesian (MCMC) fitting methods.
mcmc_tb
: Bayesian MCMC using neutral priors on the transition rates and biologically informed priors on the gain/loss rates; this is the primary model presented in the main textmcmc_0
: as mcmc_tb
, but using only the completely resolved phylogeny (i.e., only including species for which genetic data is available)mcmc_tb_nogainloss
: as mcmc_tb
, but using only the priors on the transition rates, omitting the priors on the gain/loss ratesfull
: a model estimating all 24 transition rates, i.e. allowing for the rates of evolution of male care and sperm competition to depend on each other and on AGmodel_pcsc
: MLE fit of the 12-parameter modelmodel_pcsc_add
: MLE fit of the additive (10-parameter) modelmodel_pcsc_prior
: as model_pcsc
, but adding the priors used in the Bayesian model (this is a regularized or MAP estimate)Points at the edge of the graph (sc_gain
, pc_loss
, pcxsc_loss
) indicate that the point estimates fell outside of the plotted range.
Intercepts are excluded (as being less biologically interesting, and using a different scale from the contrasts: hazards rather than hazard ratios), as are the estimates of the rates of gain and loss of the non-focal traits (spawning mode and male care).
We can draw a variety of conclusions from these results.
mcmc_tb
vs. mcmc_0
: as illustrated above, the estimates from the full tree-block data set and the reduced (“fishphylo”) data set are similar, although the tree-block estimates are slightly more precise.mcmc_tb
vs. mcmc_tb_nogainloss
: leaving out the gain/loss priors doesn’t make much difference; it slightly weakens the SC estimate and strengthens the PC estimatemcmc_tb
vs. full
: although the results from the full model are generally of the same sign as from the reduced (12-parameter) model, they are much less certain. None of the signs of the effects are statistically clear (\(p_{\textrm{MCMC}} > 0.05\) for all estimates).mcmc_tb
vs. model_pcsc
: the 12-parameter model is too poorly constrained to draw conclusions from the parameter estimates; this agrees with the likelihood ratio and IC tests above, which indicate that the additive model is preferred to the interaction model.model_pcsc
vs. model_pcsc_add
: constraining the model slightly by removing the interaction terms helps a little bit - now we get a point estimate and confidence intervals for the pc_gain
parameter that agrees reasonably well with the full Bayesian fits - but most of the other parameters of interest are still unidentifiable (large confidence intervals/failure of the Wald approximation). The estimates of the rates of gain and loss of the non-focal traits are well identified, but not of interest.mcmc_tb
vs. model_pcsc_prior
: when we add the prior information to the MLE fit, we get point estimates, and CIs, that are very close to those from the primary (mcmc_tb
) model.Bouckaert, Remco R. 2010. “DensiTree: Making Sense of Sets of Phylogenetic Trees.” Bioinformatics 26 (10): 1372–3. https://doi.org/10.1093/bioinformatics/btq110.
Fox, John, and Sanford Weisberg. 2018. An R Companion to Applied Regression. 3rd edition. Los Angeles: SAGE Publications, Inc.
Makowski, Dominique, Mattan S. Ben-Shachar, S. H. Annabel Chen, and Daniel Lüdecke. 2019. “Indices of Effect Existence and Significance in the Bayesian Framework.” Frontiers in Psychology 10. https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02767.
Rabosky, Daniel L., Jonathan Chang, Pascal O. Title, Peter F. Cowman, Lauren Sallan, Matt Friedman, Kristin Kaschner, et al. 2018. “An Inverse Latitudinal Gradient in Speciation Rate for Marine Fishes.” Nature 559 (7714): 392–95. https://doi.org/10.1038/s41586-018-0273-1.
Shi, Haolun, and Guosheng Yin. 2021. “Reconnecting P-Value and Posterior Probability Under One- and Two-Sided Tests.” The American Statistician 75 (3): 265–75. https://doi.org/10.1080/00031305.2020.1717621.
Hereafter we refer to pair spawning as “sperm competition absent” (sc0
) and group spawning as “sperm competition present” (sc1
), recognizing that this is an oversimplification.↩︎