So I have to admit, after reading O’Hara 2009, How to make models add up- a primer on GLMMs, I was about ready to give up. I was thinking to myself that it was all just gibberish and that I’d never make sense of all those Greek letters. But this week has given me new hope! Maybe some of that gibberish made it into my brain after all! Or maybe it was a week in class of ANOVAs… Either way, the chapter this week made sense and maybe even cleared up some of what O’Hara was trying to say to me.
So it turns out all these big long complicated models are just modified linear regressions, nothing more, nothing scary. A model, any model, just represents the relationship between some response variable and one or more explanatory variables. This is easy to see when we are doing a linear regression, but when we get to things like Jen’s model on how different nitrogen inputs affect nitrogen loads in Waquoit Bay, taking into account atmospheric deposition, wastewater, removal in septic tanks, animal wastes, well that simplicity gets a little buried in all the details, but it’s still there, it’s still y=ax+b.
Whitlock and Schluter simply leave out all the additional slopes that are built into their models when they write them out, and this makes the relationship that the model is trying to show much more straightforward. O’Hara, on the other hand, gave me a headache with all of the slopes for each different categorical variable. I see now, however, that you can simply set each of those other variables to zero (rendering their troublesome slopes obsolete!) to look at the relationship between the response variable and a single categorical variable.
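Here is a minimal sketch of that "set the other variables to zero" trick, assuming a single categorical variable with three levels coded as 0/1 indicators (treatment coding). The data, the level names, and the `predict` helper are all made up for illustration; none of them come from O’Hara or from Whitlock and Schluter.

```python
from statistics import mean

# Made-up illustrative data: a response measured under three treatment levels.
data = {
    "control": [4.0, 5.0, 6.0],
    "low": [6.0, 7.0, 8.0],
    "high": [9.0, 10.0, 11.0],
}

# With treatment coding and one factor, the least-squares intercept is the
# baseline (control) mean, and each "slope" is that group's mean minus the
# baseline mean.
intercept = mean(data["control"])
slopes = {g: mean(v) - intercept for g, v in data.items() if g != "control"}


def predict(low=0, high=0):
    """y = intercept + b_low * low + b_high * high, with 0/1 indicators."""
    return intercept + slopes["low"] * low + slopes["high"] * high


print(predict())        # both indicators zero: the baseline group
print(predict(low=1))   # only the 'low' slope is switched on
print(predict(high=1))  # only the 'high' slope is switched on
```

Setting an indicator to zero drops its slope out of the sum, so each call isolates the relationship between the response and one level of the categorical variable.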
I’m also starting to see how ANOVAs and linear models are really paired. An ANOVA represents a series of different F-tests where we eliminate one of the explanatory variables from our model and call that our null. We then compare the fit of that model to a model that incorporates the variable of interest. The F-statistic and the p-value associated with it tell us how significant the change in model fit is when this variable is incorporated. I still want to know: how did O’Hara get those estimates in the tables? I thought these were from ANOVAs, but that’s not exactly true… It looks like those are t-tests.
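To make the "compare the null model to the fuller model" idea concrete, here is a rough sketch with made-up numbers, assuming the simplest case: a null model with only a grand mean versus a model with one categorical variable. The variable names are mine, and turning F into a p-value would need the F distribution, which I leave out here.

```python
from statistics import mean

# Made-up illustrative data: a response measured in three groups.
groups = {
    "control": [4.0, 5.0, 6.0],
    "low": [6.0, 7.0, 8.0],
    "high": [9.0, 10.0, 11.0],
}

all_values = [y for ys in groups.values() for y in ys]
n = len(all_values)  # number of observations
k = len(groups)      # number of groups

# Null model: every observation is predicted by the grand mean.
grand_mean = mean(all_values)
rss_null = sum((y - grand_mean) ** 2 for y in all_values)

# Full model: every observation is predicted by its own group mean.
rss_full = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

# Residual degrees of freedom for each model.
df_null = n - 1
df_full = n - k

# F = (drop in residual SS per extra parameter) / (residual variance of full model)
f_stat = ((rss_null - rss_full) / (df_null - df_full)) / (rss_full / df_full)
print(f_stat)  # roughly 19 for these numbers
```

A large F means the group variable removes far more residual variation than we would expect from its extra parameters alone.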
Another thing I learned: why you use n-1 degrees of freedom in an ANOVA. Every other statistics course I have ever taken has simply told me, “In an ANOVA use n-1 degrees of freedom,” but now I finally understand why. The degrees of freedom actually mean something: how many things are “free” to move, i.e. how many values there are that are not “fixed.” In an ANOVA we set a single group to be our “baseline” and look at how all the other groups change in relation to it. This one group is therefore fixed and we lose it as a degree of freedom.
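One way to see the baseline idea is to count indicator columns. This is only a sketch, assuming treatment (dummy) coding and a hypothetical factor with three levels; `treatment_code` is a helper I made up for illustration, not a function from any library.

```python
def treatment_code(labels, baseline):
    """Return the non-baseline levels and one row of 0/1 indicators per observation."""
    levels = sorted(set(labels) - {baseline})
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return levels, rows


# Hypothetical factor with k = 3 levels; "control" is chosen as the baseline.
labels = ["control", "low", "high", "control", "low", "high"]
levels, rows = treatment_code(labels, baseline="control")

k = len(set(labels))
print(levels)               # the levels that get their own indicator column
print(len(levels), k - 1)   # number of free columns equals k - 1
for label, row in zip(labels, rows):
    print(label, row)       # baseline rows are all zeros
```

The baseline gets no column of its own, so a factor with k levels uses up k-1 degrees of freedom. The same logic gives n-1 for the total: once the grand mean is fixed, only n-1 of the deviations from it are free to vary.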
While I did appreciate Whitlock and Schluter explaining generalized linear models in plain English that I could understand, I thought they took a rather simplistic route to explain this topic. They could have taken out a lot of extra words by eliminating the separate examples of categorical variables vs. blocking vs. factorial design and instead given us a more general overview of “multiple explanatory variables.” It seems to me that the whole idea is to add in all the terms that are a part of your experimental design, be they blocks that you established as part of your experimental setup or interaction terms that are inherent in a factorial design. There’s really no need to explain these analyses separately; they are actually the same analysis, just with more or fewer variables. In fact, the idea of a covariate is also the same. This is something that is not included in our experimental setup, but that we know is unavoidably going to affect our data and therefore needs to be accounted for in our model. We test to see how the covariate affects our model (i.e. does mass really affect energy expenditure? If so include it, if not, don’t. Is there an interaction term? If so include it, if not, don’t.) A more general statement regarding generalized linear models, with a detailed description of some of the extra terms you could include and why, might make this explanation more comprehensive and streamlined. It could also eliminate possible confusion for the reader, since the treatment of these different types of experiments is presented as if they were different tests.
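As a sketch of the "does the covariate earn its place?" test, the code below compares a model with only a treatment term against one that also includes mass. Everything here is an assumption for illustration: the numbers are invented, `fit_rss` is a small helper I wrote using the normal equations, and a real analysis would use a statistics package rather than hand-rolled algebra.

```python
def fit_rss(X, y):
    """Least-squares fit via the normal equations; returns the residual sum of squares."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


# Made-up data: energy expenditure for two treatment groups, with body mass
# recorded as a covariate.
treatment = [0, 0, 0, 0, 1, 1, 1, 1]
mass = [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5]
energy = [2.1, 3.9, 6.2, 7.8, 4.0, 6.1, 7.9, 10.2]
n = len(energy)

# Reduced model: intercept + treatment. Full model: intercept + treatment + mass.
X_reduced = [[1.0, t] for t in treatment]
X_full = [[1.0, t, m] for t, m in zip(treatment, mass)]

rss_reduced = fit_rss(X_reduced, energy)
rss_full = fit_rss(X_full, energy)

# One extra parameter in the full model; n - 3 residual degrees of freedom.
f_stat = ((rss_reduced - rss_full) / 1) / (rss_full / (n - 3))
print(rss_reduced, rss_full, f_stat)
```

An interaction term could be tested the same way, by adding a `t * m` column to the full model and comparing again. The design is always the same: fit with and without the term, then ask whether the drop in residual variation is big enough to justify keeping it.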
Finally, I want to put in some reminders to myself. These reflections are really just my rewriting of the points I found most useful from any one week’s reading, so here’s a few more points that I want to address:
Things that are built into an experimental design should stay in a model whether they improve the fit or not. If something wasn’t a part of the design, however, take it out if it doesn’t enhance the fit.
Be careful of random factors- things that are randomly sampled and not fixed add more error to the model and need to be treated as such in the model fit. This affects how the F-test is run and needs to be specified before calculation. I wonder how it changes it though…
Finally, there are always assumptions to any test. Make sure you’re meeting them! ALWAYS!