To explain the backtracking: I caught a cold just as we moved into generalized linear models, and as I read the notes and try to understand them, it’s all a bit Through the Looking-Glass. So, while I play catch-up on those notes and this week’s reading, I’d like to look back at using strong priors when fitting linear models with Markov chain Monte Carlo.

One of the things that struck me in class is that if you use a strong prior, the credible interval on the slope shrinks. However, this shrinking occurs whether the strong prior is a good fit or a bad fit, with similar credible interval widths even when the prior mean for the slope is off by more than a factor of 10 (350 versus -11 in the code below). If you have the wolf data to play around with:

library(MCMCglmm)

wolves <- read.csv("./Class notes/wolves.csv")

stronggoodprior <-list(B=list(mu=c(0,-11), V=diag(c(1e10, 1))))

strongbadprior <-list(B=list(mu=c(0,350), V=diag(c(1e10, 1))))

wolf_mcmc_goodprior <- MCMCglmm(pups ~ inbreeding.coefficient, data=wolves, verbose=FALSE, prior=stronggoodprior)

wolf_mcmc_badprior <- MCMCglmm(pups ~ inbreeding.coefficient, data=wolves, verbose=FALSE, prior=strongbadprior)

wolf_mcmc<- MCMCglmm(pups ~ inbreeding.coefficient, data=wolves, verbose=FALSE)

summary(wolf_mcmc)$solutions

summary(wolf_mcmc_goodprior)$solutions

summary(wolf_mcmc_badprior)$solutions

# No explicit prior (MCMCglmm default): width of the 95% credible interval on the slope

summary(wolf_mcmc)$solutions[2,3] - summary(wolf_mcmc)$solutions[2,2]

# Good prior

summary(wolf_mcmc_goodprior)$solutions[2,3] - summary(wolf_mcmc_goodprior)$solutions[2,2]

# Bad prior

summary(wolf_mcmc_badprior)$solutions[2,3] - summary(wolf_mcmc_badprior)$solutions[2,2]

Now, I’m sure that with the bad prior you could easily see that the fitted line doesn’t fit the data at all, but it still seems at least a little concerning that the bad prior should shrink the credible interval so much.
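
One way to see why the interval width barely depends on where the prior is centered is a simplified conjugate sketch. This is not what MCMCglmm does internally: it assumes a single slope, a known residual variance, and it summarizes the data by a least-squares estimate with sampling variance s^2. The symbols below (mu_0, tau_0, s, mu_n, tau_n) are introduced here only for illustration.

```latex
\begin{aligned}
\beta &\sim \mathcal{N}(\mu_0,\ \tau_0^2)
  && \text{(prior on the slope)} \\
\hat{\beta} \mid \beta &\sim \mathcal{N}(\beta,\ s^2)
  && \text{(what the data say)} \\
\beta \mid \hat{\beta} &\sim \mathcal{N}(\mu_n,\ \tau_n^2)
  && \text{(posterior)} \\[4pt]
\tau_n^2 &= \left( \frac{1}{\tau_0^2} + \frac{1}{s^2} \right)^{-1},
\qquad
\mu_n = \tau_n^2 \left( \frac{\mu_0}{\tau_0^2} + \frac{\hat{\beta}}{s^2} \right)
\end{aligned}
```

The prior mean mu_0 appears only in the posterior mean, never in the posterior variance. So under these assumptions a tight prior (small tau_0^2) narrows the interval by the same amount whether it is centered at -11 or at 350; only the location of the interval moves. In the real model the residual variance is estimated too, so the widths should come out similar rather than identical.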

The big question I have about priors (and light internet digging has proven unhelpful) is how you set the standard deviation for something like a slope parameter. Let’s say I want to predict the number of mangoes I will harvest, given the number of mango trees I have. The idea that I think the slope is 5 mangoes/tree is clear enough, and the standard deviation can be determined from the variability of mangoes/tree.

But what about when the parameter isn’t such an easily measured real-world quantity? A good ecological example is metabolic theory: Rate = B*Mass^(Scaling factor). Many say that the scaling factor is 3/4, from the idea of how energy dissipates in a branched network. Others say the factor is 2/3, based on surface area/volume differences in energy dissipation (the real arguments are more fleshed out, but let’s go with that for the moment). If I attempted to use a prior in examining metabolic rate data, how would I set the standard deviation of this parameter? Also, given that 2/3 and 3/4 are not radically different from one another, isn’t using a prior basically playing the game with loaded dice?

Indeed, careful application of priors is difficult! I’m a tad unclear on what you mean by setting the prior SD. Do you mean the SD of a prior? There, the best guidance is to set the SD so that the values you feel are reasonable receive high prior probability. So, if you feel strongly that a scaling factor will be somewhere from 2/3 to 3/4, setting a prior distribution so that values from, say, 1/2 to 7/8 are all highly probable would be the way to go. This assumes that you are using a normal distribution for your prior. Remember, this need not be the case: any distribution can be used. So, you may want to set a uniform distribution from 1/3 to 7/8, for example. This has the advantage of excluding values outside of that range, and of giving equal prior weight to your two alternative hypotheses.
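
As a rough sketch of how the "1/2 to 7/8" range above could be turned into numbers for a normal prior, here is one possibility in R. It assumes that "highly probable" means the range should hold about 95% of the prior mass, and that the scaling factor is the slope of a log(Rate) ~ log(Mass) regression; both are assumptions made for illustration, and the name scaling_prior is hypothetical.

```r
# Range of scaling factors we want to be well covered (from the text above).
lower <- 1/2
upper <- 7/8

# ASSUMPTION: the range should contain ~95% of a normal prior's mass.
prior_mean <- (lower + upper) / 2                 # centre of the range: 0.6875
prior_sd   <- (upper - lower) / (2 * qnorm(0.975))
prior_var  <- prior_sd^2                          # MCMCglmm's V is a variance, not an SD

# Sanity check: how much prior mass sits inside the chosen range,
# and how much sits between the two competing hypotheses (2/3 and 3/4)?
mass_inside     <- pnorm(upper, prior_mean, prior_sd) - pnorm(lower, prior_mean, prior_sd)
mass_hypotheses <- pnorm(3/4, prior_mean, prior_sd) - pnorm(2/3, prior_mean, prior_sd)

# Same list structure as the wolf priors: flat-ish intercept, informative slope.
# ASSUMPTION: the second fixed effect is the log(Mass) slope.
scaling_prior <- list(B = list(mu = c(0, prior_mean),
                               V  = diag(c(1e10, prior_var))))

print(c(mean = prior_mean, sd = prior_sd, inside = mass_inside,
        between_hypotheses = mass_hypotheses))
```

Changing the 95% to some other coverage only changes the qnorm() quantile. A uniform prior, as far as I can tell, cannot be expressed through the B = list(mu, V) structure used here, which describes a normal prior, so that route would need different tooling.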

But, again, buyer beware, and be ready to justify the heck out of your choice of prior!