1 Descriptive statistics

1.1 Data

Numbers of species with each set of traits  (`0` = absent, `1` = present, `?` = unknown; `pc` = male parental care; `sc` = sperm competition (i.e. group spawning or ARTs); `ag` = accessory glands), depending on whether we use the partial (but fully genetically resolved) tree (`fishphylo`) or the imputed trees from @rabosky_inverse_2018 (`treeblock`).

Figure 1.1: Numbers of species with each set of traits (0 = absent, 1 = present, ? = unknown; pc = male parental care; sc = sperm competition (i.e. group spawning or ARTs); ag = accessory glands), depending on whether we use the partial (but fully genetically resolved) tree (fishphylo) or the imputed trees from Rabosky et al. (2018) (treeblock).

The phylogenies in the treeblock are generated according to an imputation scheme described in Rabosky et al. (2018). We generated 100 stochastic character maps for each of 100 trees; the figure shows the average number of gains and losses per tree (i.e., average gains/losses across the stochastic character maps for each tree).

average number of gains and losses of AGs simulated for each tree in the tree block

Figure 1.2: average number of gains and losses of AGs simulated for each tree in the tree block

densitree-style mapping of tree blocks/phylogenetic uncertainty [@bouckaertDensiTree2010]

Figure 1.3: densitree-style mapping of tree blocks/phylogenetic uncertainty (Bouckaert 2010)

2 Parameter definitions

2.1 Transition matrices

As described in the main text, a continuous-time stochastic process for three binary traits has a total of 24 possible transitions where a single one of the three states changes. The figure below shows the full and reduced transition matrices. In the axis tick labels, ag0_pc0_sc1 (for example) refers to the state without AGs (ag0), without male care (pc0), and with group spawning (sc1, loosely “sperm competition present”1). Parameter 1 quantifies the log-hazard of a switch to pair spawning in this state (i.e., to ag0_pc0_sc0).

The assignment of numbers to rates is arbitrary, numbered in sequence in column-wise order. Knowing the assignment is only necessary when doing low-level computations based on the vector of parameters. The take-home information is which sets of transitions are constrained to be identical in the reduced model; these rates have the same number . In the reduced model (righthand plot), transitions that are assumed to have equal rates - because we assume that all transitions in spawning mode, and all transitions in male care, are independent of the other states - have the same ID number. Colours denote the estimated log-hazard for each parameter when the models are fitted by maximum likelihood. Below, we denote parameter 1 (in the reduced model, which occupies four different cells in the transition matrix) as sc_loss, because it quantifies the loss rate of sperm competition regardless of the states of the other traits; we refer to parameter 3 (for example) as loss.ag_pc0_sc1, the loss rate of AGs in the absence of male care and presence of sperm competition.

Transition matrix for full and reduced (independent evolution of spawning mode and male care) models. Colours of grid squares represent the estimated evolutionary rates (log-hazards) for each transition. Note that the rates for 24-parameter model are extremely uncertain; see sensitivity analysis (last section).

Figure 2.1: Transition matrix for full and reduced (independent evolution of spawning mode and male care) models. Colours of grid squares represent the estimated evolutionary rates (log-hazards) for each transition. Note that the rates for 24-parameter model are extremely uncertain; see sensitivity analysis (last section).

2.2 Transitions

Transitions for the reduced (12-parameter) model. As in Figure 1 in the main text: Sets of connections drawn with the same colours are constrained to identical evolutionary rates (e.g. light gray downward arrows, representing loss of male parental care – all transitions from “pc1” to the corresponding “pc0” state). Top-bottom and left-right transitions, representing loss/gain of parental care and transitions between pair spawning and group spawning, are assumed to be independent of other trait values. Loss/gain of accessory glands (diagonal arrows), which are our primary interest, are all estimated separately (8 different colours for arrows representing loss/gain of accessory glands under the 4 different combinations of the other two traits, parental  care and spawning mode).

Figure 2.2: Transitions for the reduced (12-parameter) model. As in Figure 1 in the main text: Sets of connections drawn with the same colours are constrained to identical evolutionary rates (e.g. light gray downward arrows, representing loss of male parental care – all transitions from “pc1” to the corresponding “pc0” state). Top-bottom and left-right transitions, representing loss/gain of parental care and transitions between pair spawning and group spawning, are assumed to be independent of other trait values. Loss/gain of accessory glands (diagonal arrows), which are our primary interest, are all estimated separately (8 different colours for arrows representing loss/gain of accessory glands under the 4 different combinations of the other two traits, parental care and spawning mode).

The transitions denoted A-D all represent gains of accessory glands, under different combinations of male care/spawning mode. Arrows A and B both represent AG gains in states with pair spawning (sc0); C and D both represent states in states with group spawning (sc1). Thus rate(pair) = (rate(A) + rate(B))/2 is the average rate of AG gain under pair spawning; rate(group) = (rate(C) + rate(D))/2 is the average rate under group spawning; and rate(group)-rate(pair) is the contrast, which we can think of as “the effect of group vs pair spawning on the rate of AG gain”. We can similarly compute other combinations such as rate(no male care) (pc0) = (rate(B) + rate(D))/2; rate(male care) (pc1) = (rate(A) + rate(C))/2; and contrast(male care) = rate(male care) - rate(no male care).

2.3 Contrast matrix

The contrast matrix connects the parameters we estimate (e.g. loss.ag_pc0_sc0, loss rate of accessory glands when parental care and sperm competition are both absent) to the biological effects we are interested in (e.g. pc_loss, the effect of parental care on the loss rate of accessory glands). For example, the intercept_loss effect is the (unweighted) average of the four different loss terms; the pc_loss effect is the difference between the average loss rate when male care is present (second row, red squares) and the average loss rate when male care is absent (second row, blue squares).

Contrast matrix defining translation from internal rate parameters (columns, corresponding to transition rates in Figure 1.3) to meaningful contrasts (see figures below). Gain and loss terms for group spawning and male care not shown, as these are not translated into contrasts.

Figure 2.3: Contrast matrix defining translation from internal rate parameters (columns, corresponding to transition rates in Figure 1.3) to meaningful contrasts (see figures below). Gain and loss terms for group spawning and male care not shown, as these are not translated into contrasts.

Here is an equivalent numeric matrix:

\[ \left[ \begin{array}{rrrrrrrr} 1/4 & 1/4 & 1/4 & 1/4 & . & . & . & . \\ -1/2 & -1/2 & 1/2 & 1/2 & . & . & . & . \\ 1/2 & -1/2 & -1/2 & 1/2 & . & . & . & . \\ -1/2 & 1/2 & -1/2 & 1/2 & . & . & . & . \\ . & . & . & . & 1/4 & 1/4 & 1/4 & 1/4 \\ . & . & . & . & -1/2 & 1/2 & -1/2 & 1/2 \\ . & . & . & . & -1/2 & -1/2 & 1/2 & 1/2 \\ . & . & . & . & 1/2 & -1/2 & -1/2 & 1/2 \\ \end{array} \right] \]

3 Priors

Here we illustrate our general strategy for constructing priors: pick sensible lower and upper “bounds” (not really bounds, but values we would consider extreme) and use a Normal prior on the log-hazard scale (equivalent to a log-Normal prior on the hazard scale) with a mean halfway between the lower and upper bounds and a standard deviation \(\sigma = (\log(\textrm{upper})-\log(\textrm{lower}))/(2 \cdot \textrm{range})\) In our model we use a range of 3 (i.e. the lower and upper bounds are at \(\pm 3 \sigma\)), to denote that the lower and upper bounds we specify are extreme (99.7% ranges). The figure below uses range = 2 (which would correspond to 95.4% ranges) so the tails are more clearly visible.

Schematic of prior definition in terms of lower and upper tails of a Gaussian distribution. Left, linear scale; right, log scale

Figure 3.1: Schematic of prior definition in terms of lower and upper tails of a Gaussian distribution. Left, linear scale; right, log scale

By sampling directly from the prior and computing contrasts, we can confirm that the priors on the contrasts (ratio of AG gain/loss rates depending on male care and sperm competition) are indeed neutral (centered at 1.0) and reasonable (95% confidence intervals extend from 10x slower to 10x faster). The intercept terms for gain and loss represent the average evolutionary rates across all states, and are measured in units of expected numbers of events across the entire tree. For example, the prior expected number of gains of AG across the whole tree is 33.4 (95% CI: 10.6-104).

95% CIs for prior distributions of contrasts

Figure 3.2: 95% CIs for prior distributions of contrasts

4 Statistical summaries

This section presents the quantitative results in more detail.

4.1 Bayesian

Here are the estimated posterior median and confidence intervals, along with \(p_\textrm{MCMC}\) (twice the minimum tail probability) and the “probability of direction” (\(p_d\)), the probability that the parameter has the same sign as the median (\(p_d\) is an index of how clearly the sign of the parameter is known: \(p_\textrm{MCMC} = 2(1-p_d)\)) (Makowski et al. 2019; Shi and Yin 2021).

contrast rate median lwr upr pMCMC pd
intercept gain 18.137 8.834 34.901 NA NA
pc gain 5.755 1.352 23.492 0.019 0.991
pcxsc gain 0.511 0.119 2.118 0.355 0.823
sc gain 0.254 0.060 1.056 0.061 0.970
intercept loss 68.239 27.553 174.638 NA NA
pc loss 0.966 0.133 5.667 0.970 0.515
pcxsc loss 0.496 0.074 2.832 0.443 0.779
sc loss 0.687 0.117 4.698 0.684 0.658

Here are the marginal posterior distributions for all of the contrasts, including fits to both the full tree-block data set and the subset of the data that includes only species with genetic data (“fishphylo”). (The latter is useful for comparison with the MLE results, which cannot use the Bayesian method of sampling across treeblock phylogenies.) The treeblock estimates are slightly more precise (i.e. the confidence intervals are narrower), as expected since they can make use of a larger data set. Points show the posterior median, Line ranges represent 95% credible intervals.

4.2 Frequentist

4.2.1 Likelihood ratio tests

For any pair of nested models we can do a likelihood ratio test, comparing the model fits and number of parameters. We fit restricted models with AG evolution independent of both sperm competition and male care (indep); dependent on male care but not sperm competion (pc); dependent on sperm competition only (sc); depending additively on both traits (i.e., the effects of the trait status on evolutionary rates are added, on the log scale - this model has 10 parameters); and with an interaction between traits (the full 12-parameter model that is our primary focus).

Testing (most) nested pairs gives this diagram (we omit the comparison between pcsc and indep):

likelihood ratio test comparisons

Figure 4.1: likelihood ratio test comparisons

The p-values quoted in the paper are from comparing the full model with interactions to the pc and sc models. These comparisons are analogous to “type 3” tests in ANOVA (Fox and Weisberg 2018), which are the most commonly used approach in the biological literature. “Type 2” tests (as implemented in the car package for R) instead compare the additive model to the models with only one main effect: these tests suggest that the effects of both sperm competition and male care are statistically significant, although the evidence for the effect of male care is still stronger.

desc ddf ddev pval
pcsc interaction 2 0.040 0.980
sc3 sc (type III) 4 8.519 0.074
sc2 sc (type II) 2 8.479 0.014
pc3 pc (type III) 4 9.918 0.042
pc2 pc (type II) 2 9.878 0.007

(ddf: difference in number of parameters between models; ddev: change in deviance (\(-2 \log(L)\)); pval: likelihood ratio test p-value)

4.2.2 Parameter estimates/CIs

We can look at the parameter estimates from maximum likelihood fits, although most of the confidence intervals of the unregularized fits are undetermined because of the failure of the Wald estimates. All confidence intervals more extreme than \(\exp(\pm 20)\) are set to 0 or \(\infty\).

Profile confidence intervals could give slightly better results, but these turds may be not worth polishing very much …

method term estimate lwr upr
pcsc pc_loss 0.273 0.000 Inf
pcsc sc_loss 3.668 0.000 Inf
pcsc pcxsc_loss 0.000 0.000 Inf
pcsc pc_gain 2.664 0.000 Inf
pcsc sc_gain 0.003 0.000 Inf
pcsc pcxsc_gain 0.375 0.000 Inf
pcsc_prior pc_loss 0.973 0.170 5.550
pcsc_prior sc_loss 0.654 0.117 3.652
pcsc_prior pcxsc_loss 0.612 0.109 3.452
pcsc_prior pc_gain 4.156 1.075 16.069
pcsc_prior sc_gain 0.297 0.075 1.177
pcsc_prior pcxsc_gain 0.729 0.181 2.940
pcsc_add pc_loss 9362.489 0.000 Inf
pcsc_add sc_loss 0.001 0.000 Inf
pcsc_add pc_gain 8.349 2.030 34.339
pcsc_add sc_gain 0.001 0.000 Inf

4.2.3 Information-theoretic comparisons

Alternatively, we can make comparisons based on information-theoretic criteria such as AICc, AIC, or BIC. These three criteria differ in the kind and strength of penalty that they use to avoid overly complex models. Researchers choose different criteria depending on their goals - strictly speaking AIC and its variants are for maximizing predictive accuracy while BIC is for choosing among hypotheses. AICc is often recommended for analyses where \(n<40\). Using AICc presents a challenge: for this data set (a phylogeny of 607 species with approximately 20 independent origins of accessory glands), it is difficult to quantify the effective number of observations (the \(n\) in the rule of thumb above, and in the formula for the AICc, is based on a simpler data format where we can assume that every observation is independent). We chose \(n=30\) for this comparison, as a guess at the effective sample size (larger \(n\) will tend to select larger models as better).

Negative log-likelihood, which measures the unpenalized goodness-of-fit, is included for completeness/comparison.

df ΔAICc ΔAIC ΔBIC ΔnegLL
AG dep only on PC 8 0.00 4.48 1.68 4.26
AG dep on PC, SC additively 10 0.24 0.00 0.00 0.02
AG dep only on SC 8 1.40 5.88 3.08 4.96
AG evol indep of PC, SC 6 8.38 16.06 10.46 12.05
AG dep on PC, SC & interaction 12 10.98 3.96 6.76 0.00

As stated in the main text, we can see that AICc gives similar conclusions to the Bayesian and likelihood-ratio-test analyses: the best model has AG evolution depending on male care only, followed closely by the additive model, and fairly closely by sperm competition only (models with \(\Delta\) AICc < 2 are usually interpreted as “roughly equivalent”). In contrast, using AIC selects the additive model as best, and considerably better than the single-factor models. BIC, which like AICc is generally more conservative than AIC, also selects the additive model as best, similar (\(\Delta\) BIC < 2) to the male-care-only model.

5 Sensitivity analyses

We performed a range of different analyses to test the sensitivity of our conclusions to technical choices, and to explore the difference between frequentist (maximum likelihood estimate = MLE), constrained frequentist (maximum a posteriori = MAP), and full Bayesian (MCMC) fitting methods.

Points at the edge of the graph (sc_gain, pc_loss, pcxsc_loss) indicate that the point estimates fell outside of the plotted range.

Intercepts are excluded (as being less biologically interesting, and using a different scale from the contrasts: hazards rather than hazard ratios), as are the estimates of the rates of gain and loss of the non-focal traits (spawning mode and male care).

We can draw a variety of conclusions from these results.

References

Bouckaert, Remco R. 2010. “DensiTree: Making Sense of Sets of Phylogenetic Trees.” Bioinformatics 26 (10): 1372–3. https://doi.org/10.1093/bioinformatics/btq110.

Fox, John, and Sanford Weisberg. 2018. An R Companion to Applied Regression. 3rd edition. Los Angeles: SAGE Publications, Inc.

Makowski, Dominique, Mattan S. Ben-Shachar, S. H. Annabel Chen, and Daniel Lüdecke. 2019. “Indices of Effect Existence and Significance in the Bayesian Framework.” Frontiers in Psychology 10. https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02767.

Rabosky, Daniel L., Jonathan Chang, Pascal O. Title, Peter F. Cowman, Lauren Sallan, Matt Friedman, Kristin Kaschner, et al. 2018. “An Inverse Latitudinal Gradient in Speciation Rate for Marine Fishes.” Nature 559 (7714): 392–95. https://doi.org/10.1038/s41586-018-0273-1.

Shi, Haolun, and Guosheng Yin. 2021. “Reconnecting P-Value and Posterior Probability Under One- and Two-Sided Tests.” The American Statistician 75 (3): 265–75. https://doi.org/10.1080/00031305.2020.1717621.


  1. Hereafter we refer to pair spawning as “sperm competition absent” (sc0) and group spawning as “sperm competition present” (sc1), recognizing that this is an oversimplification.↩︎