In perfect timing for our Bayes discussion, one of our Senior Science Advisors at work sent along this article by Gelman and Loken (see here: http://www.americanscientist.org/issues/feature/2014/6/the-statistical-crisis-in-science/2), questioning the value of the statistically significant comparisons many of us researchers fall into making. Calling it the "statistical crisis in science," Gelman and Loken examine several published papers reporting statistical significance and ask how much of that significance is shaped by the researchers' predetermined predictions and by the particular data set they analyze. They discuss the issue dubbed "p-hacking": researchers arrive at a statistically significant result but never check whether other relationships or comparisons in the same data would have come out significant as well. They look only at the hypothesis at hand, not at the many other paths through the data that could have produced an equally good p-value. As the authors put it, "the mistake is in thinking that, if the particular path that was chosen yields statistical significance, this is strong evidence in favor of the hypothesis."

While I agree we need to consider all possible relationships or hypotheses so that we don't miss the true importance of what is going on, how often can we realistically examine every possibility that drives us to a result? The very basis of most of our studies comes from a driving question, which focuses the analysis on one particular relationship.

The idea of statistical significance and its role in our analyses is still clearly a matter of debate within the community. A coworker of mine published a paper whose co-author, a statistician, insisted that she include p-values in her analysis, even though she felt the p-values were not relevant to the analysis being discussed and made little sense when included.

It raises the point Jarrett made in class yesterday: it all really comes down to what questions we're trying to answer. Really, we can find a relationship (whether strong or weak) with almost anything. But depending on the story we're trying to tell, the tests we use to support our results must be relevant to our study.


1 Response to P-hacking

  1. Judging by the discussion in class, it seems as though "p-hacking" can be a common occurrence in ecology and environmental biology research publications. The majority of my experience is in industry, and although it is a much different field, the problem can be even more prevalent there. When companies invest millions of dollars in certain products, they will sometimes do anything for a patent or clinical trial approval. In the past, when reading patents or clinical trial reports, I have found myself thinking of almost exactly the following quote from the article.

    “There are many roads to statistical significance; if data are gathered with no preconceptions at all, statistical significance can obviously be obtained even from pure noise by the simple means of repeatedly performing comparisons, excluding data in different ways, examining different interactions, controlling for different predictors, and so forth.”

    Researchers in pharmaceutical companies will sometimes "massage" the data until it shows statistical significance, seemingly proving that a drug works with high enough efficacy for a patent, and perhaps earning a foot in the door toward FDA approval. So I agree: claims of "statistical significance" should often be taken with a grain of salt, in many settings.
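    The quote's claim that repeated comparisons on pure noise reliably produce "significant" results is easy to check with a small simulation. The sketch below (my own illustration, not from the article; the sample sizes and the 20-comparisons-per-study figure are arbitrary assumptions) runs many "studies" on data that is nothing but Gaussian noise, performs 20 two-sample comparisons per study, and counts how often at least one comparison crosses the usual |t| > 1.96 threshold.

    ```python
    import random
    import statistics

    random.seed(42)

    def t_stat(a, b):
        # Welch's t statistic for two independent samples
        ma, mb = statistics.mean(a), statistics.mean(b)
        va, vb = statistics.variance(a), statistics.variance(b)
        return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

    def study_finds_significance(n_comparisons=20, n=50):
        # One "study": pure-noise data, many comparisons; report success
        # if ANY comparison looks significant (|t| > 1.96, ~5% chance each)
        for _ in range(n_comparisons):
            a = [random.gauss(0, 1) for _ in range(n)]
            b = [random.gauss(0, 1) for _ in range(n)]
            if abs(t_stat(a, b)) > 1.96:
                return True
        return False

    n_studies = 1000
    hits = sum(study_finds_significance() for _ in range(n_studies))
    # Theory: P(at least one hit) = 1 - 0.95**20, roughly 64%
    print(f"{hits / n_studies:.0%} of pure-noise studies found a 'significant' result")
    ```

    Even though there is no real effect anywhere in the data, well over half of these simulated studies can report a statistically significant finding, which is exactly why a single significant comparison, chosen after seeing the data, is weak evidence.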
