Nearing Convergence

I’ve been rereading our class blog – so many great pieces and smart colleagues doing interesting work – and I’m dangerously close to getting nostalgic as we come to the end of the term here. Some final thoughts:

As much as I continue to believe R gets its name from its tendency to make one shout “Argh!!” when once again some cryptic error message appears, it is clear from wandering the Internet that R has claimed the statistical throne within the realm of biological/ecological sciences. Having begun the R learning process (which obviously never ends given that the language is a moving target, harrumphs Lucy) will stand us all in good stead as we move forward in our research.

As Sean wrote about last week, I too am struggling with design issues, hoping to be able to do solid research work this summer. When I told my advisor Bob Chen how I was trying to learn enough not to discover I had “done it all wrong” after the summer, without missing a beat Bob matter-of-factly said “Oh, you will [do it all wrong]. Count on it.” It was not a criticism or warning, just the blunt observation of a professor who has done and seen a lot of research.

So I’m plugging ahead with my goal of good study design, but doing so knowing that inevitably I’ll discover the weaknesses after the fact, and that the real education begins at that point. I find encouragement in the knowledge that failure is inevitably a part of the journey; that others, far smarter researchers than I, have fallen face-first into the mud, learned from it, picked themselves up, salvaged something from the wreckage, as Jarrett so wryly commented, and carried on to better work. We all will too. But I think we now come strengthened with some powerful knowledge and insights from 607.

Best wishes to all my classmates – may your algorithms converge, may your models fit, and may those wretched p-values allow you to reject the null hypothesis.

climate

Nate Silver is obviously incredibly smart when it comes to modeling/statistics, but after reading the chapter about climate change, I admire more than ever his ability to communicate effectively with his audience. I would like to naively think that climate change deniers just have not had their “ah-ha” moment. They haven’t read the right piece of literature that clarifies everything for them, or found that bit of information that brings everything to light. I think Silver’s chapter on climate change should be assigned reading for every politician who will ever vote on anything even remotely related to carbon emissions, alternative energy, or measures for dealing with sea level rise. On the day they are sworn into office, I’m assuming each politician gets a fancy pen and a bottle of champagne; now add a copy of Silver’s book to the gift bag. Once again, Silver is able to eloquently point out the noise confounding the signal. Now, I’m assuming that he is able to separate the noise from the signal with such apparent simplicity partly because (1) he is extremely intelligent in his field, but also because (2) hindsight is 20/20. Once you’ve been able to test a climate model, and can see that it’s off by 0.5 degrees a century, then looking back at your work it may be easier to find the error of your ways.

Recently, I read an editorial in the Globe by John Sununu, an occasional contributor. Unfortunately for John, he is my most recent scapegoat, but hey, he did it to himself. The article was discussing the Keystone Oil Pipeline, but what sticks out in my mind was his opinion that climate change should not be part of the Keystone debate, and to further his agenda he references the blizzard in Buffalo. I swear, the first person this winter who says “climate change isn’t happening because it’s cold outside” is going to get smacked. This is exactly the noise that Silver is trying to warn us against when he is discussing initial condition uncertainty. Long-term climate change phenomena can be masked by day-to-day events (like a volcanic eruption, or a blizzard in November). Nate also alludes to self-canceling predictions as a reason why some climate change models may be incorrect. The models were built under a doomsday scenario, assuming carbon emissions would continue to increase. Luckily, there has been some effort by the European Union to mitigate carbon emissions, which is the likely explanation for the errors in the IPCC’s climate change forecast.

Frozen in indecision

I wanted to talk a little about designing experiments, given the Hurlbert paper and our class discussion. The problem I keep coming up against is that the experimental methodology I lay down in the next few weeks is the one that has to be replicated over the next 3-4 years. I find myself frozen with inaction – asking questions at every turn (Is this a good reference site to compare with? How far apart do the sites need to be to constitute independence? Is my sampling design adequate to cover the entire population? Is this all just pseudoreplication?). There is also the goal of the statistical ideal – which isn’t always apparent ahead of time – and even when it is, it runs headlong into the real world of time/money/bureaucratic regulations. It doesn’t help to know that the examination of a time-based phenomenon (in this case, an intensive system-wide restoration) is a one-shot deal – screw it up and you have nothing worth comparing at all.

On the other hand, there’s a weirdly comforting thought that accompanies our discussion of ANCOVA: the notion that even if things aren’t ideal, even if your sites aren’t completely independent, statistics is a living field which can help you to work with what you have and get the most out of the data you collected. While this isn’t a license to half-ass a design, it does help you to breathe and put the pencil back to the paper – start with your question, describe your conceptual model, and design for the best-case scenario. Most importantly, be candid about the aspects in which your design is good and in which it is weak.

Methodology seems terrifying to me because in published papers it is presented as an extremely polished, clear-cut design that seems to have been self-evident to the authors. In reality, it was probably the result of significant thought and recognition of real-world limitations. Thinking that a bad design can dash your hopes and dreams can be paralyzing to the point of inaction, but the counterpoint is trusting that good practices and good faith can lead to some form of a result, even if it’s not as powerful as you might have hoped when starting out.

Surfing the Pareto Principles of Our Lives

I’d like to comment on the philosophy of effort, the statistics of diminishing returns, and the intractable conundrum of the intellectual equivalent of quantum superposition.  I should explain.  I recently came across a page in “The Signal and the Noise” that seemed to epitomize everything about why I’ve been banging my head against the wall for the last month, chasing after rabbits with pocket-watches underground and getting as far as the Red Queen running across an endless chessboard.

On page 312, Figure 10-6, Mr. 538 Uber-Nerd shows us what he calls the Pareto Principle of Prediction. The basic idea is that with 20% of the effort, or experience, 80% of the accuracy, or skill, can be acquired. With increasing effort comes increased accuracy, or skill, at diminishing returns, yielding a relatively steep log curve. Mr. Silver uses this basic idea to explain his experience with online poker, the possibility of beating the stock market, the fortune to be slightly more prescient than the political geniuses on the McLaughlin Group, and Billy Beane’s ability to field a winning team that deserves to be described by its own nickname (“Athletics”). Unfortunately, he forgot another important example: the ability to beat the poor odds of passing all of your classes while finishing your RA (or TA) work and theoretically writing your MS or PhD proposal. And before you think I’m about to go off on a tirade about how hard my (our) life (lives) is (are), that’s not quite it. We’re all fortunate to be in an incredibly challenging, occasionally inspiring community of mutual learning, teaching and discovering.

The issue is diminishing returns, and the wisdom to properly divert limited resources to achieve diverse goals. Every day we are all confronted with choices, armed with a wide variety of predictor variables (the length of the reading for class, the length of time necessary to properly research this paper we have to write, the number of days until a particular assignment is due) to properly achieve a set of response variables (earn a grade, impress a professor, keep open the possibility of one day walking out of here with a degree). What combination of predictor variables do we need to put into the model in order to accurately achieve our goals? Is Nate suggesting that if we put in 20% of the effort in everything, then we can squeeze by with a B- average in everything, making it just above water level (Figure 10-7)? Is it worth our time to put more effort into a particular area, yielding diminishing returns, and potentially dropping below the water line in a different class, project, thesis?

And therein lies my biggest personal problem. All of us are here because we’re able to achieve intellectually at a fairly high level, and are competent enough to divide our time between academics, life goals and the occasional White Russian on the weekend (the statistician Abides). But maybe some things in life are worth more than a 20% effort, more than just sliding by. When I study something, research something, invest in something, I want to know and experience the root of the problem, to really understand the complexity of the issue. When Jarrett gives us a problem set with five extra credit problems modeling the number of atoms in the universe using R code that we haven’t really mastered yet, I really, really want to figure it out. I want to know the answer. Not for any specific ROI grade objective, really, but because that piece of knowledge reveals something interesting and unique about the universe (or at least I hope). When Dr. Hannigan gives us advanced problem sets in analytical chemistry (I took Natural Waters last semester), I really DO want to be able to calculate the exact atmospheric carbon dioxide concentration during the early Cenozoic based on the chemical equilibria within a single water drop trapped within an Arctic glacial moraine. The problem is that by attempting to attack the problem, I’m putting myself above the 80-20 rule, where increased effort yields progressively less impressive results. I keep chasing harder and harder problems down the rabbit hole and across the chessboard without appearing to get any closer to checkmate. Meanwhile, Robyn knows exactly which assumptions she can make to negate the majority of differential equations required to solve equilibria conditions, and Jarrett knows exactly which line of code to use to smoothly and efficiently approximate a complex model, while I’m left handing in five homework pages of excessive for loops and fully saturated (and weakly predictive) MLRs.

And so I feel torn between the 17th century French polymath Blaise Pascal and the Eastern guru, his holiness, the 14th Dalai Lama.

“Since we cannot know all that there is to be known about anything, we ought to know a little about everything.” – Blaise Pascal

“We have bigger houses but smaller families:
We have more degrees but less sense;
more knowledge but less judgements;
more experts but more problems;
more medicines, but less healthiness.
We’ve been all the way to the moon and back,
but we have trouble crossing the street
to meet the new neighbor.
We build more computers
to hold more information,
to produce more copies than ever,
but we have less communication.
We have become long on quantity
but short on quality.
These are times of fast foods,
but slow digestion;
tall man, but short character;
steep profits, but shallow relationships.
It is time when there is much in the window
but nothing in the room.”

— The Dalai Lama

Should we learn a little about everything as Pascal suggests? Less than 20% of every aspect of our lives in order to stay above the waterline in all of the diverse goals to which we aspire? Or should we pursue the path suggested by the Dalai Lama, who laments modern attempts to do, experience and own everything all at once, leaving one’s center unbalanced and poorly connected? Is it worth attempting all of the extra credit assignments when the regular homework problems are still poorly executed? Those are the questions that stare me in the face when I look at Figure 10-6. I want to know the underlying framework of the scientific field in which I am engaged; I’m sure all of us feel the same way to some extent. But at what cost? Is it worth spending hours and hours and hours working on a single line of R code to rise above the 80-20 margin, while leaving laser ablation, dissertation proposals and half-finished manuscripts on the back burner?

Which brings me back to the final enigmatic metaphor that I used way back in the first paragraph of this personal exploration. Quantum theory dictates that a single particle occupies all of its possible quantum states simultaneously. Said another way, it literally exists everywhere in the universe, all at the exact same time. Crazy. But even crazier is the notion of wave function collapse, where a superposition collapses to a single state (location) of existence after interaction with an observer. As soon as the particle is observed, it no longer exists everywhere; it only exists in one location in space, as is observed in everyday life. Is that our choice, that we can try to understand everything all at once, to invest a tiny, tiny bit in everything all across the spectrum of the universe, rather than focusing on one specific thing entirely? And can that attempt, like superposition, only exist in the absence of interaction, of being observed? As when we so often try to delve into the deepest, most complex theories and problems in our labs at 2 AM when everyone else has gone home, and get lost in the vast plateau of excruciatingly incremental achievement above the 80-20 line? And maybe interaction with colleagues, like an observed quantum particle, forces us to singularity, where we begin focusing again on more fruitful efforts, below the 80-20 line where incremental increases in effort lead to large leaps in achievement. Maybe I just need to stop working on fifty lines of R code past midnight on a Friday alone in my office.

What I really need, maybe what we all need, is to learn to successfully navigate the 80-20 conundrum of our lives. How should we divide our time? Should we try to do more, with less, or to invest in less, with more effort? Can we balance, like a sub-atomic particle, popping instantaneously between superposition and singularity? Can we dream of becoming leading researchers in our fields while also going home early to cook our own meals in our own homes and spending quality time with the people who mean the most to us in our lives? Maybe it’s not worth trying to jump above the waterline like a breaching humpback whale when treading water at the 80-20 line allows us to do many diverse things, just OK. And maybe winning the poker hand (grant?) depends on a steady pool of “fish” (failing undergrads, uncreative researchers) as Figure 10-8A suggests. Or maybe it’s not a zero-sum game. Or maybe living above the 80-20 line (becoming the best in the world at what you do) has its own hidden costs:

“It’s better to burn out
Than to fade away
My, my, hey, hey.”

- Neil Young, Kurt Cobain

Thetis: If you stay in Larissa, you will find peace. You will find a wonderful woman, and you will have sons and daughters, who will have children. And they’ll all love you and remember your name. But when your children are dead, and their children after them, your name will be forgotten… If you go to Troy, glory will be yours. They will write stories about your victories in thousands of years! And the world will remember your name. But if you go to Troy, you will never come back… for your glory walks hand-in-hand with your doom. And I shall never see you again.

I know, I know.  Now I’m just getting ridiculous.  But food for thought.  And best wishes navigating your own 80-20 curves, and all of the many sacrifices, and gifts, life has to offer.  Good luck!

R Immersion

As the end of the semester comes near I have found myself reflecting on how far I’ve come since the first class. This was the first graduate class I have taken, and after a two-year hiatus from school I had to make quite the adjustment, since I work a full-time research job in tandem with class. I had no coding experience at all, and I thought I knew a sufficient amount of statistics from my data preparation and embarrassing p-value conclusions in Excel. When the R coding conversations kicked off and I was still having trouble knitting a PDF output, I became very nervous.

I realized that coding in R was like learning a new language, and it reminded me of stories my father told me about moving to Italy in the 1970s without knowing a word of Italian, when English was not very common there. He took a language immersion class, in which they spoke only Italian, to learn the language as quickly as possible. He became a fluent Italian speaker in only a matter of months because you essentially force yourself to learn it when it’s the only option. Anyway, I tried to treat our class and my time working on homework in this style, thinking in code and ultimately putting the code and new statistics knowledge together piece by piece.

Most of my research experience is in the biotech realm of protein and molecular biology. However, under the same premise as learning code, I have actually learned a lot about ecology and environmental biology just from the related readings and the ecologists surrounding me in class. I have noticed that ecology studies can have a lot more variables at play, and this often requires the researcher to take a step back and assess the best way to analyze the data. In my research, I have often found myself pushed in the direction of regimented data analysis to determine a “significant” or “insignificant” difference based on a p-value threshold of 0.05, regardless of the situation. Although data generated in an in vitro laboratory study is usually more controlled and intentionally contains fewer variables than most field studies, it is still valuable to think about the relationships and different models.

This default data analysis style I think is dangerous and incorrect because, just like science, statistics is an evolving and improving field. This was said perfectly in the introduction of the Ecology Special Section on P Values Forum: “We also need to remember that ‘statistics’ is an active research discipline, not a static tool-box to be opened once and used repeatedly… Continual new developments in statistics allow not only for reexamination of existing data sets and conclusions drawn from their analysis, but also for inclusion of new data in drawing more informative scientific inferences.”

Prior to this class I admittedly thought, like many people in immunology and molecular biology, including well-published scientists, that statistics always has one right way to do things. Thankfully, being surrounded by ecologists and thinking about complicated data sets has allowed me to break free of this thought process and the robot analysis style.

Pseudo-UH-OH

In a convenient follow-up to Chris’ post “replication schmeplication,” and in preparation for our Hurlbert conversation next week, I thought I’d discuss a paper from 2012 that really got me thinking about climate change manipulation experiments. I have always been interested in ocean acidification (OA), which (in case you are unfamiliar) is when excess CO2 from the atmosphere is absorbed into the ocean, leading to changes in carbonate chemistry and a lower pH. Scott Doney from WHOI calls this “The Other CO2 Problem” – and it has recently been wreaking havoc in Puget Sound, where particularly acidic water is being upwelled into shellfish hatcheries.

An informative video can be found here: http://www.pmel.noaa.gov/co2/story/Acidifying+Water+Takes+Toll+On+Northwest+Shellfish

Despite the harmful effects of OA, the topic made it to the top of the list of important stories largely ignored by the media. But scientists haven’t ignored it. From 2000 to 2009, acidification experiments represented >60% of the marine climate change experiments, likely due to its novel and alarming implications (even though geologists have been aware of this phenomenon for decades – sorry, that’s the geochemist in me speaking).

In a similar tone to Hurlbert’s 1984 comment on pseudoreplication, Wernberg et al. (2012) wrote “A decade of climate change experiments on marine organisms: procedures, patterns, and problems” to address the challenges scientists face in marine climate change experiments (henceforth referred to as MCCEs). They compiled papers that made an explicit reference to climate change from 2000 to 2009 (total = 110), and commented on five major issues that need to be improved in order to further understand the potential effects of climate change.

One of the biggest problems was experimental design – 49% of the studies were diagnosed as having “issues with their experimental procedures”, and of those studies, 91% showed some form of pseudoreplication. This is a problem, in short, because it limits the inference space and the ability of researchers to extrapolate results.

So back to the OA problem: Most studies test, at the tank level, the effects of OA on, say, physiology, calcification, behavior, etc. Depending on funding and resources (which doesn’t look good considering the new role of our favorite senator), it is difficult to ensure appropriate replication. It’s neither easy nor cheap to manipulate the carbonate chemistry of multiple tanks, and so lots of studies will instead place multiple tanks into one large vat of acidified/treated water – hence, pseudoreplication. A colleague of mine, though, who did his dissertation on OA effects on clownfish, was able to rig up a unique system that specifically addresses this issue at a fairly low cost: http://mobile.tube.aslo.net/lomethods/free/2013/0485.pdf
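To make the tank-in-a-vat problem concrete, here is a minimal Python sketch of one common remedy: collapsing measurements to one mean per experimental unit before counting replicates. Everything in it is an assumption for illustration (the `unit_means` helper, the vat labels, and the made-up response values); it is not taken from Wernberg et al. or from any real dataset.

```python
from collections import defaultdict
from statistics import mean

def unit_means(records):
    """Average measurements within each experimental unit.

    records: iterable of (treatment, vat_id, value) tuples, one per tank.
    Returns {treatment: [one mean per independently treated vat]}.
    """
    per_vat = defaultdict(list)
    for treatment, vat_id, value in records:
        per_vat[(treatment, vat_id)].append(value)

    per_treatment = defaultdict(list)
    for (treatment, _vat_id), values in sorted(per_vat.items()):
        per_treatment[treatment].append(mean(values))
    return dict(per_treatment)

# Hypothetical design A: three tanks per treatment, all sitting in one shared vat.
shared_vat = [
    ("control", "vat1", 1.02), ("control", "vat1", 0.98), ("control", "vat1", 1.00),
    ("acidified", "vat2", 0.81), ("acidified", "vat2", 0.79), ("acidified", "vat2", 0.80),
]

# Hypothetical design B: each tank has its own independently manipulated water.
independent = [
    ("control", "vat1", 1.02), ("control", "vat2", 0.98), ("control", "vat3", 1.00),
    ("acidified", "vat4", 0.81), ("acidified", "vat5", 0.79), ("acidified", "vat6", 0.80),
]

for name, design in [("shared vat", shared_vat), ("independent", independent)]:
    replicates = {t: len(v) for t, v in unit_means(design).items()}
    print(name, replicates)
```

In design A each treatment collapses to a single value, so there is nothing left to estimate among-unit variance from; in design B three values per treatment survive. The raw numbers are identical in both designs, which is the point: the replication lives in the design, not in the count of tanks.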

I don’t want to jump the gun too much on the potential upcoming discussion in regard to Hurlbert’s piece, but I wanted to gauge the class’ experience with pseudoreplication…

(And to briefly allude to factorial design, single factor experiments (e.g. testing the effect of warmth, or OA, or Sea Level Rise) accounted for 65% of all MCCEs. But as we know, there are likely concurrent, multiple stressors that are non-additive, and cannot be well-understood in isolation from one another). Interaction effects!

Reference:

Wernberg, Thomas, Dan A. Smale, and Mads S. Thomsen. “A Decade of Climate Change Experiments on Marine Organisms: Procedures, Patterns and Problems.” Global Change Biology 18, no. 5 (May 02, 2012): 1491–98. doi:10.1111/j.1365-2486.2012.02656.x.

Replication Schmeplication?

In 2010, James Prosser dropped a bomb on the microbial ecology community. He searched through hundreds of articles in the leading microbiological journals, such as Environmental Microbiology and FEMS Microbiology Ecology, on the lookout for papers where scientists were examining microbial diversity using molecular techniques. What he found was an embarrassment to the scientific process and statistical standards.

He found that only 29% of the papers characterizing diversity used true replication.

As pyrosequencing/high-throughput sequencing pave the way for the future of molecular studies, many researchers apparently think that the sheer number of sequences produced by these techniques is a proxy for replication, when in reality they are analyzing one sample.

This is like polling one person from Wisconsin on their knowledge of cheeses, doing the same for one person in Massachusetts, and coming to the conclusion that Wisconsinites are more or less expert on this dairy product than Massachusettsans (although we all know that Wisconsinites would win). You just can’t do this study and expect to be treated like a scientist. However, if you were to take one cheese sample from Wisconsin and one from Massachusetts, perform high-throughput sequencing, which generates a massive list of sequences, and determine relative abundance and diversity measures, you may be able to publish something in Microbial Ecology about your findings on the differences between cheeses in these two states.

There seems to be a belief among some scientists that conducting these studies makes them exempt from these standards set in place everywhere else in science.

However, Jay Lennon (2011) put forth the idea that this lack of replication can be made up for by doing proper statistics. You just have to ask the right questions.

Bringing it back to the cheese example, you could do some statistical analyses that examine the sequences derived from your non-replicated samples. These analyses could be something like randomization procedures that test whether the sequences were obtained from the same statistical population.

With this type of analysis, however, you are not allowed to look beyond those two cheese samples. You can only say the two samples are statistically distinct (and infer that Wisconsin is better).
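The randomization procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Lennon’s actual method: the taxon names and counts are invented, Bray-Curtis dissimilarity on relative abundances is just one plausible choice of test statistic, and `bray_curtis` and `permutation_test` are hypothetical helper names.

```python
import random
from collections import Counter

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two lists of taxon labels,
    computed on relative abundances (0 = identical, 1 = no shared taxa)."""
    ca, cb = Counter(a), Counter(b)
    taxa = set(ca) | set(cb)
    # Each profile sums to 1, so the dissimilarity is half the L1 distance.
    return 0.5 * sum(abs(ca[t] / len(a) - cb[t] / len(b)) for t in taxa)

def permutation_test(sample_a, sample_b, n_perm=2000, seed=1):
    """Shuffle sequences between the two samples and ask how often the
    shuffled dissimilarity is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = bray_curtis(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if bray_curtis(pooled[:n_a], pooled[n_a:]) >= observed:
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Invented sequence assignments for one cheese sample from each state.
wisconsin = ["Lactococcus"] * 60 + ["Streptococcus"] * 30 + ["Brevibacterium"] * 10
massachusetts = ["Lactococcus"] * 30 + ["Streptococcus"] * 30 + ["Penicillium"] * 40

observed, p_value = permutation_test(wisconsin, massachusetts)
print(f"Bray-Curtis = {observed:.2f}, permutation p = {p_value:.4f}")
```

A small p-value here says only that the sequences in these two particular samples are unlikely to have been drawn from one shared pool. The shuffling happens among sequences within two samples, so the result carries no information about cheeses, or states, that were never sampled.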

Yet both Lennon and Prosser believe that this lack of replication in microbiology is alarming, and that it can be remedied by proper training in the scientific method and in biostatistics.

References:

Prosser, J. I. (2010). Replicate or lie. Environmental Microbiology, 12(7), 1806–10.

Lennon, J. T. (2011). Replication, lies and lesser-known truths regarding experimental design in environmental microbiology. Environmental Microbiology, 13(6), 1383–6.
