By way of explanation for the backtracking: I caught a cold just as we moved into generalized linear models, and as I read the notes and try to understand them, it’s all a bit Through the Looking-Glass. So, while I play catch-up on those notes and the reading for this week, I’d like to take a look back at using strong priors when fitting linear models with Markov chain Monte Carlo.
One of the things that struck me in class is that if you use a strong prior, the credible interval on the slope shrinks. However, this shrinking occurs whether the strong prior is a good fit or a bad fit, with similar credible interval widths even when the prior slope is off by a factor of 10. If you have the wolf data to play around with:
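library(MCMCglmm)
wolves <- read.csv("./Class notes/wolves.csv")
# Both priors put a tight variance of 1 on the slope (second element); only the prior mean differs
stronggoodprior <- list(B=list(mu=c(0,-11), V=diag(c(1e10, 1))))
strongbadprior <- list(B=list(mu=c(0,350), V=diag(c(1e10, 1))))
wolf_mcmc_goodprior <- MCMCglmm(pups ~ inbreeding.coefficient, data=wolves, verbose=FALSE, prior=stronggoodprior)
wolf_mcmc_badprior <- MCMCglmm(pups ~ inbreeding.coefficient, data=wolves, verbose=FALSE, prior=strongbadprior)
# No prior argument, so the package defaults are used, for comparison
wolf_mcmc <- MCMCglmm(pups ~ inbreeding.coefficient, data=wolves, verbose=FALSE)
# Width of the 95% credible interval on the slope (upper bound minus lower bound)
summary(wolf_mcmc_goodprior)$solutions[2,3] - summary(wolf_mcmc_goodprior)$solutions[2,2]
summary(wolf_mcmc_badprior)$solutions[2,3] - summary(wolf_mcmc_badprior)$solutions[2,2]
summary(wolf_mcmc)$solutions[2,3] - summary(wolf_mcmc)$solutions[2,2]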
Now, I’m sure that with the bad prior you could easily see that the fitted line doesn’t match the data at all, but it still seems at least a little concerning that the bad prior should shrink the credible interval so much.
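One way to see why the shrinkage might not care whether the prior is good or bad is the textbook conjugate case. This is only a sketch under simplifying assumptions that are mine, not what MCMCglmm actually does: a single slope with no intercept, a known residual variance \sigma^2, and a normal prior on the slope with mean \mu_0 and variance \tau^2 (the symbols are mine as well).

```latex
% Model (assumed): y_i = \beta x_i + \varepsilon_i, \varepsilon_i \sim N(0, \sigma^2), with \sigma^2 known
% Prior (assumed): \beta \sim N(\mu_0, \tau^2)
\begin{align*}
\beta \mid y &\sim N(\mu_n, \tau_n^2) \\
\frac{1}{\tau_n^2} &= \frac{1}{\tau^2} + \frac{\sum_i x_i^2}{\sigma^2} \\
\mu_n &= \tau_n^2 \left( \frac{\mu_0}{\tau^2} + \frac{\sum_i x_i y_i}{\sigma^2} \right)
\end{align*}
```

In this simplified setting the prior mean \mu_0 appears only in the posterior mean \mu_n, never in the posterior variance \tau_n^2. A prior variance of 1 on the slope, as in both priors above, caps the posterior variance at 1 whether the prior mean is -11 or 350. In the real model the residual variance is estimated too, so a badly placed prior can still change the interval somewhat, which may be why the widths come out similar rather than identical.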
The big question I have about priors (light internet digging has proven unhelpful) is how you set the standard deviation for something like a slope parameter. Let’s say I want to predict the number of mangoes I will harvest, given the number of mango trees I have. The idea that I think the slope is 5 mangoes/tree is clear enough, and the standard deviation can be determined from the variability of mangoes/tree.
But what about when the parameter isn’t such an easily measured real-world quantity? A good ecological example is metabolic theory: Rate = B*Mass^(Scaling factor). Many say that the scaling factor is 3/4, from the idea of how energy dissipates in a branched network. Others say the factor is 2/3, based on surface area/volume differences in energy dissipation (the real arguments are more fleshed out, but let’s go with that for the moment). If I attempted to use a prior in examining metabolic rate data, how would I set the standard deviation of this parameter? Also, given that 2/3 and 3/4 are not radically different from one another, isn’t using a prior basically playing the game with loaded dice?
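To make the loaded-dice worry concrete, here is a small worked sketch. Taking logs turns the scaling factor into the slope of a linear model, so a prior on it is the same kind of slope prior as in the wolf example. The symbol k for the scaling factor, the normal prior centered on 3/4, and the standard deviation values s are all hypothetical choices of mine for illustration, not recommendations.

```latex
% Log-linear form of Rate = B * Mass^k (k = scaling factor, my notation)
\begin{align*}
\log(\text{Rate}) &= \log B + k \,\log(\text{Mass}) \\
% Assumed prior: k \sim N(3/4, s^2). Distance of the rival value 2/3 in prior standard deviations:
z &= \frac{3/4 - 2/3}{s} = \frac{1}{12\,s} \\
s = 0.05 &\;\Rightarrow\; z \approx 1.67 \\
s = 0.01 &\;\Rightarrow\; z \approx 8.33
\end{align*}
```

With s = 0.05 the prior centered on 3/4 still treats 2/3 as fairly plausible, while with s = 0.01 it has all but ruled 2/3 out before seeing any data. So how loaded the dice are seems to depend almost entirely on the standard deviation, which is exactly the quantity I don’t know how to choose.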