1 Descriptive statistics
- 1.1 Data
2 Parameter definitions
3 Priors
4 Statistical summaries
- 4.1 Bayesian
- 4.2 Frequentist
5 Sensitivity analyses
References

1 Descriptive statistics

1.1 Data

Numbers of species with each set of traits (`0` = absent, `1` = present, `?` = unknown; `pc` = male parental care; `sc` = sperm competition (i.e. group spawning or ARTs); `ag` = accessory glands), depending on whether we use the partial (but fully genetically resolved) tree (`fishphylo`) or the imputed trees from @rabosky_inverse_2018 (`treeblock`).

Figure 1.1: Numbers of species with each set of traits (0 = absent, 1 = present, ? = unknown; pc = male parental care; sc = sperm competition (i.e. group spawning or ARTs); ag = accessory glands), depending on whether we use the partial (but fully genetically resolved) tree (fishphylo) or the imputed trees from Rabosky et al. (2018) (treeblock).

The phylogenies in the treeblock are generated according to an imputation scheme described in Rabosky et al. (2018). We generated 100 stochastic character maps for each of 100 trees; the figure shows the average number of gains and losses per tree (i.e., average gains/losses across the stochastic character maps for each tree).

Figure 1.2: average number of gains and losses of AGs simulated for each tree in the tree block

densitree-style mapping of tree blocks/phylogenetic uncertainty [@bouckaertDensiTree2010]

Figure 1.3: densitree-style mapping of tree blocks/phylogenetic uncertainty (Bouckaert 2010)

2 Parameter definitions

2.1 Transition matrices

As described in the main text, a continuous-time stochastic process for three binary traits has a total of 24 possible transitions where a single one of the three states changes. The figure below shows the full and reduced transition matrices. In the axis tick labels, ag0_pc0_sc1 (for example) refers to the state without AGs (ag0), without male care (pc0), and with group spawning (sc1, loosely “sperm competition present”¹). Parameter 1 quantifies the log-hazard of a switch to pair spawning in this state (i.e., to ag0_pc0_sc0).

The assignment of numbers to rates is arbitrary, numbered in sequence in column-wise order. Knowing the assignment is only necessary when doing low-level computations based on the vector of parameters. The take-home information is which sets of transitions are constrained to be identical in the reduced model; these rates have the same number . In the reduced model (righthand plot), transitions that are assumed to have equal rates - because we assume that all transitions in spawning mode, and all transitions in male care, are independent of the other states - have the same ID number. Colours denote the estimated log-hazard for each parameter when the models are fitted by maximum likelihood. Below, we denote parameter 1 (in the reduced model, which occupies four different cells in the transition matrix) as sc_loss, because it quantifies the loss rate of sperm competition regardless of the states of the other traits; we refer to parameter 3 (for example) as loss.ag_pc0_sc1, the loss rate of AGs in the absence of male care and presence of sperm competition.

Transition matrix for full and reduced (independent evolution of spawning mode and male care) models. Colours of grid squares represent the estimated evolutionary rates (log-hazards) for each transition. Note that the rates for 24-parameter model are extremely uncertain; see sensitivity analysis (last section).

Figure 2.1: Transition matrix for full and reduced (independent evolution of spawning mode and male care) models. Colours of grid squares represent the estimated evolutionary rates (log-hazards) for each transition. Note that the rates for 24-parameter model are extremely uncertain; see sensitivity analysis (last section).

2.2 Transitions

Transitions for the reduced (12-parameter) model. As in Figure 1 in the main text: Sets of connections drawn with the same colours are constrained to identical evolutionary rates (e.g. light gray downward arrows, representing loss of male parental care – all transitions from “pc1” to the corresponding “pc0” state). Top-bottom and left-right transitions, representing loss/gain of parental care and transitions between pair spawning and group spawning, are assumed to be independent of other trait values. Loss/gain of accessory glands (diagonal arrows), which are our primary interest, are all estimated separately (8 different colours for arrows representing loss/gain of accessory glands under the 4 different combinations of the other two traits, parental care and spawning mode).

Figure 2.2: Transitions for the reduced (12-parameter) model. As in Figure 1 in the main text: Sets of connections drawn with the same colours are constrained to identical evolutionary rates (e.g. light gray downward arrows, representing loss of male parental care – all transitions from “pc1” to the corresponding “pc0” state). Top-bottom and left-right transitions, representing loss/gain of parental care and transitions between pair spawning and group spawning, are assumed to be independent of other trait values. Loss/gain of accessory glands (diagonal arrows), which are our primary interest, are all estimated separately (8 different colours for arrows representing loss/gain of accessory glands under the 4 different combinations of the other two traits, parental care and spawning mode).

The transitions denoted A-D all represent gains of accessory glands, under different combinations of male care/spawning mode. Arrows A and B both represent AG gains in states with pair spawning (sc0); C and D both represent states in states with group spawning (sc1). Thus rate(pair) = (rate(A) + rate(B))/2 is the average rate of AG gain under pair spawning; rate(group) = (rate(C) + rate(D))/2 is the average rate under group spawning; and rate(group)-rate(pair) is the contrast, which we can think of as “the effect of group vs pair spawning on the rate of AG gain”. We can similarly compute other combinations such as rate(no male care) (pc0) = (rate(B) + rate(D))/2; rate(male care) (pc1) = (rate(A) + rate(C))/2; and contrast(male care) = rate(male care) - rate(no male care).

2.3 Contrast matrix

The contrast matrix connects the parameters we estimate (e.g. loss.ag_pc0_sc0, loss rate of accessory glands when parental care and sperm competition are both absent) to the biological effects we are interested in (e.g. pc_loss, the effect of parental care on the loss rate of accessory glands). For example, the intercept_loss effect is the (unweighted) average of the four different loss terms; the pc_loss effect is the difference between the average loss rate when male care is present (second row, red squares) and the average loss rate when male care is absent (second row, blue squares).

Contrast matrix defining translation from internal rate parameters (columns, corresponding to transition rates in Figure 1.3) to meaningful contrasts (see figures below). Gain and loss terms for group spawning and male care not shown, as these are not translated into contrasts.

Figure 2.3: Contrast matrix defining translation from internal rate parameters (columns, corresponding to transition rates in Figure 1.3) to meaningful contrasts (see figures below). Gain and loss terms for group spawning and male care not shown, as these are not translated into contrasts.

Here is an equivalent numeric matrix:

\[ \left[ \begin{array}{rrrrrrrr} 1/4 & 1/4 & 1/4 & 1/4 & . & . & . & . \\ -1/2 & -1/2 & 1/2 & 1/2 & . & . & . & . \\ 1/2 & -1/2 & -1/2 & 1/2 & . & . & . & . \\ -1/2 & 1/2 & -1/2 & 1/2 & . & . & . & . \\ . & . & . & . & 1/4 & 1/4 & 1/4 & 1/4 \\ . & . & . & . & -1/2 & 1/2 & -1/2 & 1/2 \\ . & . & . & . & -1/2 & -1/2 & 1/2 & 1/2 \\ . & . & . & . & 1/2 & -1/2 & -1/2 & 1/2 \\ \end{array} \right] \]

3 Priors

Here we illustrate our general strategy for constructing priors: pick sensible lower and upper “bounds” (not really bounds, but values we would consider extreme) and use a Normal prior on the log-hazard scale (equivalent to a log-Normal prior on the hazard scale) with a mean halfway between the lower and upper bounds and a standard deviation \(\sigma = (\log(\textrm{upper})-\log(\textrm{lower}))/(2 \cdot \textrm{range})\) In our model we use a range of 3 (i.e. the lower and upper bounds are at \(\pm 3 \sigma\)), to denote that the lower and upper bounds we specify are extreme (99.7% ranges). The figure below uses range = 2 (which would correspond to 95.4% ranges) so the tails are more clearly visible.

Figure 3.1: Schematic of prior definition in terms of lower and upper tails of a Gaussian distribution. Left, linear scale; right, log scale

By sampling directly from the prior and computing contrasts, we can confirm that the priors on the contrasts (ratio of AG gain/loss rates depending on male care and sperm competition) are indeed neutral (centered at 1.0) and reasonable (95% confidence intervals extend from 10x slower to 10x faster). The intercept terms for gain and loss represent the average evolutionary rates across all states, and are measured in units of expected numbers of events across the entire tree. For example, the prior expected number of gains of AG across the whole tree is 33.4 (95% CI: 10.6-104).

Figure 3.2: 95% CIs for prior distributions of contrasts

4 Statistical summaries

This section presents the quantitative results in more detail.

4.1 Bayesian

Here are the estimated posterior median and confidence intervals, along with \(p_\textrm{MCMC}\) (twice the minimum tail probability) and the “probability of direction” (\(p_d\)), the probability that the parameter has the same sign as the median (\(p_d\) is an index of how clearly the sign of the parameter is known: \(p_\textrm{MCMC} = 2(1-p_d)\)) (Makowski et al. 2019; Shi and Yin 2021).

contrast	rate	median	lwr	upr	pMCMC	pd
intercept	gain	18.137	8.834	34.901	NA	NA
pc	gain	5.755	1.352	23.492	0.019	0.991
pcxsc	gain	0.511	0.119	2.118	0.355	0.823
sc	gain	0.254	0.060	1.056	0.061	0.970
intercept	loss	68.239	27.553	174.638	NA	NA
pc	loss	0.966	0.133	5.667	0.970	0.515
pcxsc	loss	0.496	0.074	2.832	0.443	0.779
sc	loss	0.687	0.117	4.698	0.684	0.658

Here are the marginal posterior distributions for all of the contrasts, including fits to both the full tree-block data set and the subset of the data that includes only species with genetic data (“fishphylo”). (The latter is useful for comparison with the MLE results, which cannot use the Bayesian method of sampling across treeblock phylogenies.) The treeblock estimates are slightly more precise (i.e. the confidence intervals are narrower), as expected since they can make use of a larger data set. Points show the posterior median, Line ranges represent 95% credible intervals.

4.2 Frequentist

4.2.1 Likelihood ratio tests

For any pair of nested models we can do a likelihood ratio test, comparing the model fits and number of parameters. We fit restricted models with AG evolution independent of both sperm competition and male care (indep); dependent on male care but not sperm competion (pc); dependent on sperm competition only (sc); depending additively on both traits (i.e., the effects of the trait status on evolutionary rates are added, on the log scale - this model has 10 parameters); and with an interaction between traits (the full 12-parameter model that is our primary focus).

Testing (most) nested pairs gives this diagram (we omit the comparison between pcsc and indep):

Figure 4.1: likelihood ratio test comparisons

The p-values quoted in the paper are from comparing the full model with interactions to the pc and sc models. These comparisons are analogous to “type 3” tests in ANOVA (Fox and Weisberg 2018), which are the most commonly used approach in the biological literature. “Type 2” tests (as implemented in the car package for R) instead compare the additive model to the models with only one main effect: these tests suggest that the effects of both sperm competition and male care are statistically significant, although the evidence for the effect of male care is still stronger.

	desc	ddf	ddev	pval
pcsc	interaction	2	0.040	0.980
sc3	sc (type III)	4	8.519	0.074
sc2	sc (type II)	2	8.479	0.014
pc3	pc (type III)	4	9.918	0.042
pc2	pc (type II)	2	9.878	0.007

(ddf: difference in number of parameters between models; ddev: change in deviance (\(-2 \log(L)\)); pval: likelihood ratio test p-value)

4.2.2 Parameter estimates/CIs

We can look at the parameter estimates from maximum likelihood fits, although most of the confidence intervals of the unregularized fits are undetermined because of the failure of the Wald estimates. All confidence intervals more extreme than \(\exp(\pm 20)\) are set to 0 or \(\infty\).

Profile confidence intervals could give slightly better results, but these turds may be not worth polishing very much …

method	term	estimate	lwr	upr
pcsc	pc_loss	0.273	0.000	Inf
pcsc	sc_loss	3.668	0.000	Inf
pcsc	pcxsc_loss	0.000	0.000	Inf
pcsc	pc_gain	2.664	0.000	Inf
pcsc	sc_gain	0.003	0.000	Inf
pcsc	pcxsc_gain	0.375	0.000	Inf
pcsc_prior	pc_loss	0.973	0.170	5.550
pcsc_prior	sc_loss	0.654	0.117	3.652
pcsc_prior	pcxsc_loss	0.612	0.109	3.452
pcsc_prior	pc_gain	4.156	1.075	16.069
pcsc_prior	sc_gain	0.297	0.075	1.177
pcsc_prior	pcxsc_gain	0.729	0.181	2.940
pcsc_add	pc_loss	9362.489	0.000	Inf
pcsc_add	sc_loss	0.001	0.000	Inf
pcsc_add	pc_gain	8.349	2.030	34.339
pcsc_add	sc_gain	0.001	0.000	Inf

4.2.3 Information-theoretic comparisons

Alternatively, we can make comparisons based on information-theoretic criteria such as AICc, AIC, or BIC. These three criteria differ in the kind and strength of penalty that they use to avoid overly complex models. Researchers choose different criteria depending on their goals - strictly speaking AIC and its variants are for maximizing predictive accuracy while BIC is for choosing among hypotheses. AICc is often recommended for analyses where \(n<40\). Using AICc presents a challenge: for this data set (a phylogeny of 607 species with approximately 20 independent origins of accessory glands), it is difficult to quantify the effective number of observations (the \(n\) in the rule of thumb above, and in the formula for the AICc, is based on a simpler data format where we can assume that every observation is independent). We chose \(n=30\) for this comparison, as a guess at the effective sample size (larger \(n\) will tend to select larger models as better).

Negative log-likelihood, which measures the unpenalized goodness-of-fit, is included for completeness/comparison.

	df	ΔAICc	ΔAIC	ΔBIC	ΔnegLL
AG dep only on PC	8	0.00	4.48	1.68	4.26
AG dep on PC, SC additively	10	0.24	0.00	0.00	0.02
AG dep only on SC	8	1.40	5.88	3.08	4.96
AG evol indep of PC, SC	6	8.38	16.06	10.46	12.05
AG dep on PC, SC & interaction	12	10.98	3.96	6.76	0.00

As stated in the main text, we can see that AICc gives similar conclusions to the Bayesian and likelihood-ratio-test analyses: the best model has AG evolution depending on male care only, followed closely by the additive model, and fairly closely by sperm competition only (models with \(\Delta\) AICc < 2 are usually interpreted as “roughly equivalent”). In contrast, using AIC selects the additive model as best, and considerably better than the single-factor models. BIC, which like AICc is generally more conservative than AIC, also selects the additive model as best, similar (\(\Delta\) BIC < 2) to the male-care-only model.

5 Sensitivity analyses

We performed a range of different analyses to test the sensitivity of our conclusions to technical choices, and to explore the difference between frequentist (maximum likelihood estimate = MLE), constrained frequentist (maximum a posteriori = MAP), and full Bayesian (MCMC) fitting methods.

mcmc_tb: Bayesian MCMC using neutral priors on the transition rates and biologically informed priors on the gain/loss rates; this is the primary model presented in the main text
mcmc_0: as mcmc_tb, but using only the completely resolved phylogeny (i.e., only including species for which genetic data is available)
mcmc_tb_nogainloss: as mcmc_tb, but using only the priors on the transition rates, omitting the priors on the gain/loss rates
full: a model estimating all 24 transition rates, i.e. allowing for the rates of evolution of male care and sperm competition to depend on each other and on AG
model_pcsc: MLE fit of the 12-parameter model
model_pcsc_add: MLE fit of the additive (10-parameter) model
model_pcsc_prior: as model_pcsc, but adding the priors used in the Bayesian model (this is a regularized or MAP estimate)

Points at the edge of the graph (sc_gain, pc_loss, pcxsc_loss) indicate that the point estimates fell outside of the plotted range.

Intercepts are excluded (as being less biologically interesting, and using a different scale from the contrasts: hazards rather than hazard ratios), as are the estimates of the rates of gain and loss of the non-focal traits (spawning mode and male care).

We can draw a variety of conclusions from these results.

mcmc_tb vs. mcmc_0: as illustrated above, the estimates from the full tree-block data set and the reduced (“fishphylo”) data set are similar, although the tree-block estimates are slightly more precise.
mcmc_tb vs. mcmc_tb_nogainloss: leaving out the gain/loss priors doesn’t make much difference; it slightly weakens the SC estimate and strengthens the PC estimate
mcmc_tb vs. full: although the results from the full model are generally of the same sign as from the reduced (12-parameter) model, they are much less certain. None of the signs of the effects are statistically clear (\(p_{\textrm{MCMC}} > 0.05\) for all estimates).
mcmc_tb vs. model_pcsc: the 12-parameter model is too poorly constrained to draw conclusions from the parameter estimates; this agrees with the likelihood ratio and IC tests above, which indicate that the additive model is preferred to the interaction model.
model_pcsc vs. model_pcsc_add: constraining the model slightly by removing the interaction terms helps a little bit - now we get a point estimate and confidence intervals for the pc_gain parameter that agrees reasonably well with the full Bayesian fits - but most of the other parameters of interest are still unidentifiable (large confidence intervals/failure of the Wald approximation). The estimates of the rates of gain and loss of the non-focal traits are well identified, but not of interest.
mcmc_tb vs. model_pcsc_prior: when we add the prior information to the MLE fit, we get point estimates, and CIs, that are very close to those from the primary (mcmc_tb) model.

References

Bouckaert, Remco R. 2010. “DensiTree: Making Sense of Sets of Phylogenetic Trees.” Bioinformatics 26 (10): 1372–3. https://doi.org/10.1093/bioinformatics/btq110.

Fox, John, and Sanford Weisberg. 2018. An R Companion to Applied Regression. 3rd edition. Los Angeles: SAGE Publications, Inc.

Makowski, Dominique, Mattan S. Ben-Shachar, S. H. Annabel Chen, and Daniel Lüdecke. 2019. “Indices of Effect Existence and Significance in the Bayesian Framework.” Frontiers in Psychology 10. https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02767.

Rabosky, Daniel L., Jonathan Chang, Pascal O. Title, Peter F. Cowman, Lauren Sallan, Matt Friedman, Kristin Kaschner, et al. 2018. “An Inverse Latitudinal Gradient in Speciation Rate for Marine Fishes.” Nature 559 (7714): 392–95. https://doi.org/10.1038/s41586-018-0273-1.

Shi, Haolun, and Guosheng Yin. 2021. “Reconnecting P-Value and Posterior Probability Under One- and Two-Sided Tests.” The American Statistician 75 (3): 265–75. https://doi.org/10.1080/00031305.2020.1717621.

Hereafter we refer to pair spawning as “sperm competition absent” (sc0) and group spawning as “sperm competition present” (sc1), recognizing that this is an oversimplification.↩︎

Supplementary material for ‘Parental care drives the evolution of reproductive accessory glands in ray-finned fishes’

Ben Bolker and Lucas Eckert

27 November 2022