Sources

R packages

"ape" (Paradis et al 2004)

"MCMCglmm" (Hadfield, 2010a)

Data

Phenotypic data ("data_repeat.txt"), Data frame containing the phenotypic data (with multiple measurement)

Phylogeny ("phylo.nex"), Phylogeny file (NEXUS file)

Codes

Let's build up on the example from the last section by adding multiple measurements per species:

  library(ape)
  library(MCMCglmm)
  phylo<-read.nexus("phylo.nex")
  data<-read.table("data_repeat.txt",header=TRUE)
  head(data)

##     phen cofactor species phylo
## 1 107.42   11.224    sp_1  sp_1
## 2 109.16    9.806    sp_1  sp_1
## 3  91.89   10.308    sp_1  sp_1
## 4 121.54    8.355    sp_1  sp_1
## 5 105.32   11.855    sp_1  sp_1
## 6  65.00    4.314    sp_2  sp_2

How can we analyse such a dataset using a phylogenetic mixed model? We only have to add a new random effect taking into account the fact that each species has a ``multiple measurement effect''. Note the new column species, which is identical to the phylo one and will be used for that purpose. First, we will try to fit the model described in Eqn. (6) in the main text, using the specific mean of the cofactor:

  data$spec_mean_cf<-sapply(split(data$cofactor,data$phylo),mean)[data$phylo]

The code is a bit complex because we need to overcome the fact that R is trying to sort alphabetically the species name (which we don't want!), but we are simply calculating the mean of the cofactor for each species and pulling it in a new entry of the data frame. Now, because we have a new random effect, we also need to change our priors. The model would thus be:

inv.phylo<-inverseA(phylo,nodes="TIPS",scale=TRUE)
prior2<-list(G=list(G1=list(V=1,nu=0.02),G2=list(V=1,nu=0.02)),
   R=list(V=1,nu=0.02))
model_repeat1<-MCMCglmm(phen~spec_mean_cf,random=~phylo+species,
   family="gaussian",ginverse=list(phylo=inv.phylo$Ainv),
   prior=prior2,data=data,nitt=5000000,burnin=1000,thin=500)

Since the columns phylo and species have the same content and are called in the same fashion in the model, one would wonder if they are not accounting for the same thing. Actually, a careful inspection of the ginverse argument would show that we are providing the phylogenetic variance-covariance for the phylo effect, but not for the species one. This means that species is here to account for any specific effect that would be independent from the phylogenetic relationship between species (e.g. environmental/niche effects). This is also visible in Eqn. (4) in the main text. Here, because we distinguish these effects from the residual variance, the residual variance units now actually corresponds to the intra-specific variance (which is assumed equal across species). The output summary follows:

summary(model_repeat1)

## 
##  Iterations = 1001:4999501
##  Thinning interval  = 500
##  Sample size  = 9998 
## 
##  DIC: 7187 
## 
##  G-structure:  ~phylo
## 
##       post.mean l-95% CI u-95% CI eff.samp
## phylo       278      153      403     8804
## 
##                ~species
## 
##         post.mean l-95% CI u-95% CI eff.samp
## species      24.4     8.48     41.7     9444
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units      65.8     59.5     72.3     9343
## 
##  Location effects: phen ~ spec_mean_cf 
## 
##              post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)      38.32    23.02    54.50     9998 <1e-04 ***
## spec_mean_cf      5.10     4.89     5.29     9253 <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because our sampling of individuals within species was totally unbiased, the results are similar, except we now have an estimate of the intra-specific variance, which is represented by the residual variance units. We did not, however, use the whole dataset in the previous model. We totally ignored the intra-specific variability of the cofactor. In order to get an estimate for the ``between-species'' and ``within-species'' (see Eqn. (7) in the main text), we need to use the within-group centring technique:

data$within_spec_cf<-data$cofactor-data$spec_mean_cf

We can now use a slightly more elaborate model:

model_repeat2<-MCMCglmm(phen~spec_mean_cf+within_spec_cf,
    random=~phylo+species,family="gaussian",
    ginverse=list(phylo=inv.phylo$Ainv),prior=prior2,data=data,
    nitt=5000000,burnin=1000,thin=500)

The fixed effect spec_mean_cf corresponds to the between-species slope (just as the previous model) and the fixed effect within_spec_cf corresponds to the within-species slope. As usual, we can have a look at the results:

summary(model_repeat2)

## 
##  Iterations = 1001:4999501
##  Thinning interval  = 500
##  Sample size  = 9998 
## 
##  DIC: 7189 
## 
##  G-structure:  ~phylo
## 
##       post.mean l-95% CI u-95% CI eff.samp
## phylo       278      159      409     9998
## 
##                ~species
## 
##         post.mean l-95% CI u-95% CI eff.samp
## species      24.4      8.3     41.7     9998
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units      65.8     59.3     72.4     9998
## 
##  Location effects: phen ~ spec_mean_cf + within_spec_cf 
## 
##                post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)      38.3001  22.5414  53.8096     9998 <1e-04 ***
## spec_mean_cf      5.0988   4.8918   5.2982     9998 <1e-04 ***
## within_spec_cf   -0.0576  -0.4284   0.3061     9998   0.75    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results are almost unchanged, with apparently no relationship between the phenotype phen and cofactor on the intra-specific level. Finally, we can again calculate $\lambda$ using:

lambda <- model_repeat2$VCV[,'phylo']/
 (model_repeat2$VCV[,'phylo']+model_repeat2$VCV[,'species']+
 model_repeat2$VCV[,'units'])