Replication Schmeplication?

In 2010, James Prosser dropped a bomb on the microbial ecology community. He searched through hundreds of articles in the leading microbiological journals, such as Environmental Microbiology and FEMS Microbiology Ecology, on the lookout for papers where scientists were examining microbial diversity using molecular techniques. What he found was an embarrassment to the scientific process and statistical standards.

He found that only 29% of the papers characterizing diversity used true replication.

As pyrosequencing and other high-throughput sequencing methods pave the way for the future of molecular studies, many researchers apparently treat the sheer number of sequences these techniques produce as a proxy for replication, when in reality they are analyzing one sample.

This is like polling one person from Wisconsin on their knowledge of cheese, doing the same for one person in Massachusetts, and concluding that Wisconsinites are more or less expert on this dairy product than Massachusetts residents (although we all know that Wisconsinites would win). You just can’t do this study and expect to be treated like a scientist. However, if you were to take one cheese sample from Wisconsin and one from Massachusetts, perform high-throughput sequencing, which generates a massive list of sequences, and calculate relative abundance and diversity measures, you might be able to publish something in Microbial Ecology about the differences between cheeses in these two states.

There seems to be a belief among some scientists that conducting sequencing studies exempts them from the replication standards in place everywhere else in science.

However, Jay Lennon (2011) put forth the idea that this lack of replication can be made up for by doing proper statistics. You just have to ask the right questions.

Bringing it back to the cheese example, you could do some statistical analyses that examine the sequences derived from your non-replicated samples. These analyses could be something like randomization procedures that test whether the sequences were obtained from the same statistical population.
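To make the randomization idea concrete, here is a minimal sketch of a permutation test on two unreplicated samples. It assumes each sequence has already been assigned to a taxon, and it uses Bray-Curtis dissimilarity as the test statistic; that choice, the function names, and the read counts are all illustrative assumptions for this post, not the procedure Lennon describes.

```python
import random
from collections import Counter


def bray_curtis(reads_a, reads_b):
    """Bray-Curtis dissimilarity between two lists of taxon labels,
    computed on relative abundances (0 = identical, 1 = no shared taxa)."""
    counts_a, counts_b = Counter(reads_a), Counter(reads_b)
    taxa = set(counts_a) | set(counts_b)
    # Each relative abundance profile sums to 1, so the usual
    # sum|a - b| / sum(a + b) reduces to half the summed differences.
    return 0.5 * sum(
        abs(counts_a[t] / len(reads_a) - counts_b[t] / len(reads_b))
        for t in taxa
    )


def permutation_test(reads_a, reads_b, n_perm=999, seed=0):
    """Test whether two read sets look like draws from one pooled population.

    Returns the observed dissimilarity and a one-sided p-value: the share of
    random relabelings that are at least as dissimilar as the real split.
    """
    rng = random.Random(seed)
    observed = bray_curtis(reads_a, reads_b)
    pooled = list(reads_a) + list(reads_b)
    n_a = len(reads_a)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign reads to the two samples
        if bray_curtis(pooled[:n_a], pooled[n_a:]) >= observed:
            as_extreme += 1
    # Count the observed split as one of the permutations.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Made-up read counts for one cheese sample per state (hypothetical data).
wisconsin = ["Lactococcus"] * 70 + ["Brevibacterium"] * 30
massachusetts = (
    ["Lactococcus"] * 40 + ["Brevibacterium"] * 30 + ["Lactobacillus"] * 30
)

observed, p_value = permutation_test(wisconsin, massachusetts)
print(f"Bray-Curtis = {observed:.2f}, p = {p_value:.3f}")
```

The reads are the units being shuffled here, which is exactly why a small p-value speaks only to these two samples.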

With this type of analysis, though, you are not allowed to look beyond those two cheese samples. You can only say that the two samples are statistically distinct (and infer that Wisconsin is better).

Still, both Lennon and Prosser find the lack of replication in microbiology alarming, and both believe it can be remedied by proper training in the scientific method and in biostatistics.


Prosser, J. I. (2010). Replicate or lie. Environmental Microbiology, 12(7), 1806–10.

Lennon, J. T. (2011). Replication, lies and lesser-known truths regarding experimental design in environmental microbiology. Environmental Microbiology, 13(6), 1383–6.

About calynum

I am an invertebrate and microbe enthusiast currently pursuing my PhD studying microbial ecology and nitrogen cycling processes in salt marshes. Queer scientist located in Boston.

2 Responses to Replication Schmeplication?

  1. justin4301 says:

    It’s interesting how certain fields will get stuck in certain practices that aren’t totally correct or appropriate if the majority of the community is using them. It always seems to take an individual or a group to step back from the field and ask “why are we doing this?” Similar to the P-value discussion we had the other day in class, it is necessary to step away from whatever work is being done and make sure it is the best way to find whatever truth is being determined. Just because the rest of the community reports data or does things one way does not mean that it is the best way to go about it.

  2. I agree that this essentially false reporting of high replication is too common, and it happens specifically in genetics research. I couldn’t help thinking about our p-value discussion as well when I was reading the post, and Justin brought up a good point about this. I feel like it goes back to that issue even further and relates to the obsession with a high n and the effects of an obese n, as many researchers continue to use a p-value approach strictly based on their old-school statistics training. This ingrains, especially in a laboratory setting, the desire to always repeat, repeat, repeat. This urge to always repeat experiments, controlled ones at that, I think carries over into a thought process that claims replication in a large database where there isn’t any. Like you said, it should all come back to the question being asked.
