For this post, I am going to talk about one of our readings for this week.

In a long-winded essay, Hurlbert and Lombardi (2009) lay out their views on the role of P-values in null hypothesis significance testing (NHST). They argue for moving away from traditional interpretations of P under the Fisherian (named “paleoFisherian” by the authors) and Neyman-Pearson (a.k.a. Neyman-Pearsonian) decision frameworks. Such frameworks set critical values and use the terms “significant” and “non-significant.” There is also much debate about whether a null hypothesis can ever be accepted based solely on P-values.

Hurlbert and Lombardi’s so-called neoFisherian significance assessments (NFSA) call for some changes in statistical interpretation. Here, I state each component of their NFSA and then include my own two cents.

- Do not set a standard α
  - Using a critical value such as α=0.05 may cause issues in interpretation, especially when P-values are close to the critical value
  - P-values of 0.045 and 0.055 should be interpreted similarly; however, with α=0.05, these values are interpreted dramatically differently (one rejects the null and one does not)
  - To standardize or not to standardize? Some argue in favor of critical values to standardize research and make it easier for journals to choose which articles to accept
  - Without a critical value, why use a P-value at all?
  - If the interpretation of P-values becomes subjective, hypothesis testing may succumb to the biases of scientists towards their own research. Interpreting P-values becomes a grey area as P approaches the critical value. Unfortunately, the world does not work in such a black and white manner and we shouldn’t either.
  - Without a critical value, how do you test the null hypothesis?
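
To make the 0.045 versus 0.055 point concrete, here is a minimal sketch in Python. It assumes a two-sided test with a standard normal test statistic (my assumption, not something Hurlbert and Lombardi specify), and the helper name `z_for_two_sided_p` is made up for illustration. It converts each P-value back into the |z| statistic that would have produced it.

```python
from statistics import NormalDist


def z_for_two_sided_p(p: float) -> float:
    """Return the |z| statistic whose two-sided normal P-value equals p.

    Assumption: the test statistic is standard normal under the null.
    """
    return NormalDist().inv_cdf(1 - p / 2)


z_low_p = z_for_two_sided_p(0.045)   # just under the usual cutoff
z_high_p = z_for_two_sided_p(0.055)  # just over the usual cutoff
z_crit = z_for_two_sided_p(0.05)     # the conventional critical value

print(f"P = 0.045 -> |z| = {z_low_p:.3f}")
print(f"P = 0.055 -> |z| = {z_high_p:.3f}")
print(f"alpha = 0.05 -> critical |z| = {z_crit:.3f}")
print(f"gap between the two statistics = {z_low_p - z_high_p:.3f}")
```

Under these assumptions the two test statistics differ by less than a tenth of a standard error, yet a fixed α=0.05 places them on opposite sides of the reject/fail-to-reject line.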

- Do away with the terms “significant” and “not-significant”
  - Statistical significance does not imply biological significance
  - Note: I find it amusing that the authors want to abolish these terms, yet they include the word “significance” in the title of their newly proposed decision framework
  - Is this just semantics?
  - When they are gone, new terms will arise. We need some way to express our interpretation of data. Maybe some other set of words will articulate it better someday, but for now, we have a system that scientists are familiar with.

- Do not accept null hypotheses based on P-values
  - A high P-value gives you no grounds to reject the null hypothesis, but rather than accepting the null, you pass no judgment at all
  - Use evidence other than P-values to help make a determination about hypotheses (e.g., confidence intervals, effect size)

- Interpret significance tests using “three-valued logic”
  - The difference between groups is either positive, negative, or no judgment can be made
  - Can you ever determine there is no difference when using a P-value?
  - Once again, the authors are uncomfortable with accepting null hypotheses based only on P-values
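
One way to picture the three possible outcomes is the rough sketch below. It is my own illustration, not the authors' procedure: I assume you already have an interval estimate for the difference between groups (at whatever confidence level you choose to report), and the function name `three_valued_call` is hypothetical. Note that there is no "accept the null" branch.

```python
def three_valued_call(ci_low: float, ci_high: float) -> str:
    """Classify a difference between groups using three-valued logic.

    Assumption (mine, for illustration): the decision is read off an
    interval estimate of the difference. There is deliberately no
    outcome that says "the difference is zero".
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low > 0:
        return "positive"      # whole interval lies above zero
    if ci_high < 0:
        return "negative"      # whole interval lies below zero
    return "no judgment"       # interval includes zero: suspend judgment


print(three_valued_call(0.2, 1.1))    # positive
print(three_valued_call(-1.4, -0.3))  # negative
print(three_valued_call(-0.5, 0.9))   # no judgment
```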

- Effect sizes should always be reported with significance tests
  - Don’t use P-values alone to interpret your data

- Use confidence intervals where appropriate
  - Once again, don’t rely solely on P-values!
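
As a sketch of what reporting these quantities together might look like, the Python snippet below computes a difference in means, Cohen's d, an approximate confidence interval, and a P-value for two groups. Everything here is an assumption for illustration: the data are invented, the function name `summarize_difference` is hypothetical, and the interval and P-value use a normal approximation rather than a t distribution, which is crude for samples this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize_difference(a, b, level=0.95):
    """Report effect size, interval, and P-value for mean(a) - mean(b).

    Normal approximation throughout (an assumption; a t-based interval
    would be more appropriate for small samples).
    """
    n_a, n_b = len(a), len(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    diff = mean(a) - mean(b)

    # Cohen's d: difference in means scaled by the pooled standard deviation
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    cohens_d = diff / pooled_sd

    # Standard error of the difference (unequal variances allowed)
    se = sqrt(var_a / n_a + var_b / n_b)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)

    # Two-sided P-value for the null hypothesis of zero difference
    p_value = 2 * (1 - NormalDist().cdf(abs(diff) / se))

    return {
        "difference": diff,
        "cohens_d": cohens_d,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
        "p_value": p_value,
    }


# Invented measurements, purely for illustration
treatment = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
control = [4.8, 4.6, 5.0, 5.2, 4.9, 4.7]

result = summarize_difference(treatment, control)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The point of returning all five numbers together is that the reader sees the size of the effect and its uncertainty next to the P-value, instead of the P-value alone.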

This neoFisherian framework gets at a lot of the problems scientists are faced with in significance testing, and it seems like Hurlbert and Lombardi’s take-home message is that P-values can be useful, but researchers shouldn’t rely too heavily on them. Also, be careful with interpretation. While the authors make a lot of good points, parts of this neoFisherian paradigm may be difficult to implement.

Why would it be difficult to implement? What do YOU think is the right way forward?

I think eliminating the critical value would be the most difficult part to implement. Without a critical value, it will be challenging not only to reject a null hypothesis, but also to defend that decision. As I mention briefly above, this may allow researchers' biases toward their own work to take over. Instead, I suggest that we retain a critical value, but keep it as a soft guideline. If a P-value is close to the critical value, the researcher should rely on other evidence to make a decision regarding the null hypothesis.

Moving forward, I think the most important thing is to supplement the P-value with other evidence. Use other methods to evaluate your data. Show confidence intervals, standard errors, means, effect sizes, etc., where appropriate. Use multiple lines of evidence to support your interpretations.