Sources

R packages

"ape" (Paradis et al 2004)

"MCMCglmm" (Hadfield, 2010a)

Data

Count data ("data_pois_missing.txt"), Data frame containing the Poisson data with missing values

Phylogeny ("phylo.nex"), Phylogeny file (NEXUS file)

Codes

We will use the same data set as in the OPM section 11.4, but this time, we are missing one half of trait data phen_pois (100 out of 200 species). However, we have a complete data set for the cofactor, which, in this case, was related to missingness; we deleted the 50 trait values that had the 50 lowest values for the cofactor. Therefore, missing data in this data set is MAR. Let's look at the data:

 library(ape)
 library(MCMCglmm)
 phylo<-read.nexus("phylo.nex")
 data<-read.table("data_pois_missing.txt",header=TRUE)
 head(data)
##   phen_pois cofactor phylo
## 1        NA   7.8703  sp_1
## 2        NA   3.4691  sp_2
## 3        NA   2.5479  sp_3
## 4        14  18.2287  sp_4
## 5        NA   2.5303  sp_5
## 6        NA   0.5146  sp_6
NA indicates missing values. It is almost too easy to run a model dealing with missing data in the response variable. We just run an identical model as in the OPM section 11.4. The only difference is that we will use the default node="ALL" rather than node="TIPS":
inv.phylo<-inverseA(phylo,nodes="TIPS",scale=TRUE)
prior<-list(G=list(G1=list(V=1,nu=0.02)),R=list(V=1,nu=0.02))
model_missing<-MCMCglmm(phen_pois~cofactor,random=~phylo,
  family="poisson",ginverse=list(phylo=inv.phylo$Ainv),
  prior=prior,data=data,nitt=5000000,burnin=1000,thin=500)
If you finish running this model, you will get this warning message:
Warning message:
In MCMCglmm(phen_pois ~ cofactor, random = ~phylo, family = "poisson",  :
some combinations in phylo do not exist and 198 missing records have been generated
This is not the data augmentation we are intending to show, but MCMCglmm is using its data augmentation technique to estimate all ancestral values at the internal nodes of the tree (phylo) apart from the root1, assuming MAR2. We now look at the output from this analysis:
summary(model_missing)
## 
##  Iterations = 1001:4999501
##  Thinning interval  = 500
##  Sample size  = 9998 
## 
##  DIC: 615.5 
## 
##  G-structure:  ~animal
## 
##        post.mean l-95% CI u-95% CI eff.samp
## animal    0.0467  0.00239    0.128     8679
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units    0.0399  0.00281   0.0923     8954
## 
##  Location effects: phen_pois ~ cofactor 
## 
##             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)    -2.246   -2.723   -1.775     9050 <1e-04 ***
## cofactor        0.260    0.234    0.287     9432 <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Compare the estimate of cofactor between this and one in the OPM section 11.4 (0.2508, 95% CI $ =$ 0.2287 to 0.2732). Notably, in this MAR missingness in the bivariate data context, complete-case analysis will not result in biased estimates; this is shown in Nakagawa and Freckleton (2008). Thus, we will not run such analysis.

Footnotes

... root1
This tree phylo has 200 tips and so it has 199 nodes ( $ n_{node}=n_{tip}-1$). MCMCglmm augmented 198 missing values, which corresponds to $ 199-1$ nodes (the number of nodes without the root).
... MAR2
MCMCglmm uses the MAR assumption in this case, by assuming missingness depends on phylogeny (see section 11.3.2 in the main text).