For this post, I am going to talk about one of our readings for this week.
In a long-winded essay, Hurlbert and Lombardi (2009) assert their opinions on the role of P-values in null hypothesis significance testing (NHST). They argue for moving away from traditional interpretations of P under the Fisherian (named “paleoFisherian” by the authors) and Neyman-Pearsonian decision frameworks. Such frameworks set critical values and use the terms “significant” and “non-significant.” There is also much debate about whether a null hypothesis can ever be accepted based solely on P-values.
Hurlbert and Lombardi’s so-called neoFisherian significance assessments (NFSA) call for some changes in statistical interpretation. Here, I state each component of their NFSA and then include my own two cents.
- Do not set a standard α
- Using a critical value such as α=0.05 may cause issues in interpretation, especially when P-values are close to the critical value
- P-values of 0.045 and 0.055 should be interpreted similarly; with α=0.05, however, they are interpreted dramatically differently (one rejects the null and one does not)
- To standardize or not to standardize? Some argue in favor of critical values to standardize research and make it easier for journals to choose which articles to accept
- Without a critical value, why use a P-value at all?
- If the interpretation of P-values becomes subjective, hypothesis testing may succumb to the biases of scientists towards their own research. Interpreting P-values becomes a grey area as P approaches the critical value. Unfortunately, the world does not work in such a black and white manner and we shouldn’t either.
- Without a critical value, how do you test the null hypothesis?
- Do away with the terms “significant” and “non-significant”
- Statistical significance does not imply biological significance
- Note: I find it amusing that the authors want to abolish these terms, yet they include the word “significance” in the title of their newly proposed decision framework
- Is this just semantics?
- When they are gone, new terms will arise. We need some way to express our interpretation of data. Maybe some other set of words will articulate it better someday, but for now, we have a system that scientists are familiar with.
- Do not accept null hypotheses based on P-values
- A high P-value should not make you confident in rejecting the null hypothesis, but rather than accepting the null, you pass no judgment at all
- Use evidence other than P-values to help make a determination about hypotheses (e.g. confidence intervals, effect size)
- Interpret significance tests using “three-valued logic”
- Either the difference between groups is positive, it is negative, or no judgment can be made
- Can you ever determine there is no difference when using a P-value?
- Once again, the authors are uncomfortable with accepting null hypotheses based only on P-values
- Effect sizes should always be reported with significance tests
- Don’t use P-values alone to interpret your data
- Use confidence intervals where appropriate
- Once again, don’t rely solely on P-values!
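To make a couple of these points concrete, here is a minimal sketch in Python. It is my own illustration, not code from Hurlbert and Lombardi. The function names `dichotomous_verdict` and `summarize_difference` are hypothetical, the measurements are made up, and I use a normal (z) approximation so that only the standard library is needed. With samples this small, a t-based test and interval would be the usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def dichotomous_verdict(p, alpha=0.05):
    """Classic fixed-alpha decision: the rule NFSA argues against."""
    return "significant" if p < alpha else "non-significant"


def summarize_difference(a, b, conf_level=0.95):
    """Report the difference in means with an effect size, CI, and P-value.

    Assumption: a normal (z) approximation is used so the sketch needs only
    the standard library; small samples usually call for a t-based version.
    """
    diff = mean(a) - mean(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    se = sqrt(var_a / len(a) + var_b / len(b))

    # Two-sided P-value for the null hypothesis of no difference
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))

    # Confidence interval for the difference in means
    z_crit = NormalDist().inv_cdf(0.5 + conf_level / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Cohen's d: the difference scaled by the pooled standard deviation
    pooled_sd = sqrt(
        ((len(a) - 1) * var_a + (len(b) - 1) * var_b) / (len(a) + len(b) - 2)
    )
    return {"difference": diff, "cohens_d": diff / pooled_sd, "ci": ci, "p": p}


# Nearly identical evidence, opposite verdicts under alpha = 0.05
for p in (0.045, 0.055):
    print(p, dichotomous_verdict(p))

# Made-up measurements for two hypothetical groups
treatment = [5.1, 4.8, 5.6, 5.9, 5.3, 5.0, 5.7, 5.4]
control = [4.9, 4.6, 5.2, 5.0, 4.7, 5.1, 4.8, 5.3]
print(summarize_difference(treatment, control))
```

The loop shows the cliff at α=0.05: 0.045 is labeled "significant" and 0.055 "non-significant" even though the evidence is nearly the same. The summary, by contrast, returns the P-value alongside the size of the difference and an interval around it, and leaves the judgment to the reader.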
This neoFisherian framework gets at a lot of the problems scientists face in significance testing. Hurlbert and Lombardi’s take-home message seems to be that P-values can be useful, but researchers shouldn’t rely too heavily on them and should interpret them with care. While the authors make a lot of good points, parts of this neoFisherian paradigm may be difficult to implement.