Nearing Convergence

I’ve been rereading our class blog – so many great pieces and smart colleagues doing interesting work – and I’m dangerously close to getting nostalgic as we come to the end of the term here. Some final thoughts:

As much as I continue to believe R gets its name from its tendency to make one shout “Argh!!” when once again some cryptic error message appears, it is clear from wandering the Internet that R has claimed the statistical throne within the realm of biological/ecological sciences. Having begun the R learning process (which obviously never ends given that the language is a moving target, harrumphs Lucy) will stand us all in good stead as we move forward in our research.

As Sean wrote about last week, I too am struggling with design issues, hoping to be able to do solid research work this summer. When I told my advisor Bob Chen how I was trying to learn enough not to discover, after the summer, that I had “done it all wrong,” Bob replied without missing a beat, matter-of-factly: “Oh, you will [do it all wrong]. Count on it.” It was not a criticism or a warning, just the blunt observation of a professor who has done and seen a lot of research.

So I’m plugging ahead with my goal of good study design, but doing so knowing that inevitably I’ll discover the weaknesses after the fact, and that the real education begins at that point. I find encouragement in the knowledge that failure is inevitably a part of the journey; that others, far smarter researchers than I, have fallen face-first into the mud, learned from it, picked themselves up, salvaged something from the wreckage, as Jarrett so wryly commented, and carried on to better work. We all will too. But I think we now come strengthened with some powerful knowledge and insights from 607.

Best wishes to all my classmates – may your algorithms converge, may your models fit, and may those wretched p-values allow you to reject the null hypothesis.

Posted in Uncategorized | Leave a comment

Climate

Nate Silver is obviously incredibly smart when it comes to modeling and statistics, but after reading the chapter about climate change, more so than before, I admire his ability to communicate effectively with his audience. I would like to naively think that climate change deniers just have not had their “ah-ha” moment. They haven’t read the right piece of literature that clarifies everything for them, or found that bit of information that brings everything to light. I think Silver’s chapter on climate change should be assigned reading for every politician who will ever vote on anything even remotely related to carbon emissions, alternative energy, or measures for dealing with sea level rise. On the day they are sworn into office, I assume each politician gets a fancy pen and a bottle of champagne – now add a copy of Silver’s book to the gift bag. Once again, Silver is able to eloquently point out the noise confounding the signal. I suspect part of the reason he can find the noise surrounding the signal with such apparent simplicity is (1) that he is extremely intelligent in his field, but (2) maybe a little bit of it is hindsight being 20/20. Once you’ve been able to test a climate model, and can see that it’s off by 0.5 degrees a century, then looking back at your work it may be easier to find the errors in your ways.

Recently, I read an editorial in the Globe by John Sununu, an occasional contributor. Unfortunately for John, he is my most recent scapegoat, but hey, he did it to himself. The article discussed the Keystone Oil Pipeline, but what sticks out in my mind was his opinion that climate change should not be part of the Keystone debate – and, to further his agenda, he referenced the blizzard in Buffalo. I swear, the first person this winter who says “climate change isn’t happening because it’s cold outside” is going to get smacked. This is exactly the noise that Silver is trying to warn us against when he discusses initial condition uncertainty. Long-term climate change phenomena can be masked by day-to-day events (like a volcanic eruption, or a blizzard in November). Nate also alludes to self-canceling predictions as a reason why some climate change models may be incorrect. The models were built under a doomsday scenario, assuming carbon emissions would continue to increase. Luckily, there has been some effort by the European Union to mitigate carbon emissions, which is likely part of the explanation for the errors in the IPCC’s climate change forecast.

Posted in Uncategorized | 2 Comments

Frozen in indecision

I wanted to talk a little about designing experiments, given the Hurlbert paper and our class discussion. The problem I keep coming up against is that the experimental methodology I lay down in the next few weeks is the one that has to be replicated over the next 3-4 years. I find myself frozen with indecision – asking questions at every turn (Is this a good reference site to compare with? How far apart do the sites need to be to constitute independence? Is my sampling design adequate to cover the entire population? Is this all just pseudoreplication?). There is also the goal of the statistical ideal – which isn’t always apparent ahead of time – and even when it is, it runs headlong into the real world of time, money, and bureaucratic regulations. It doesn’t help to know that the examination of a time-based phenomenon (in this case, an intensive system-wide restoration) is a one-shot deal – screw it up and you have nothing worth comparing at all.

On the other hand, there’s a weirdly comforting thought that accompanies our discussion of ANCOVA: the notion that even if things aren’t ideal – even if your sites aren’t completely independent – statistics is a living field that can help you work with what you have and get the most out of the data you collected. While this isn’t a license to half-ass a design, it does help you breathe and put the pencil back to the paper – start with your question, describe your conceptual model, and design for the best-case scenario. Most importantly, be candid about the aspects in which your design is strong and those in which it is weak.

Methodology seems terrifying to me because in published papers it is presented as an extremely polished, clear-cut design that seems to have been self-evident to the authors. In reality, it was probably the result of significant thought and recognition of real-world limitations. Thinking that a bad design can dash your hopes and dreams can paralyze you into inaction, but the counterpoint is trusting that good practices and good faith can yield some form of result, even if it’s not as powerful as you had hoped when starting out.

Posted in Uncategorized | 4 Comments

Surfing the Pareto Principles of Our Lives

I’d like to comment on the philosophy of effort, the statistics of diminishing returns, and the intractable conundrum of the intellectual equivalent of quantum superposition.  I should explain.  I recently came across a page in “The Signal and the Noise” that seemed to epitomize everything about why I’ve been banging my head against the wall for the last month, chasing after rabbits with pocket-watches underground and getting as far as the Red Queen running across an endless chessboard.

[Screenshot: Figure 10-6 from The Signal and the Noise – the Pareto Principle of Prediction]

On page 312, Figure 10-6, Mr. 538 Uber-Nerd shows us what he calls the Pareto Principle of Prediction. The basic idea is that with 20% of the effort, or experience, 80% of the accuracy, or skill, can be acquired. Further effort brings further accuracy, but at diminishing returns, yielding a relatively steep log-like curve. Mr. Silver uses this basic idea to explain his experience with online poker, the possibility of beating the stock market, the fortune of being slightly more prescient than the political geniuses on the McLaughlin Group, and Billy Beane’s ability to field a winning team that deserves to be described by its own nickname (“Athletics”). Unfortunately, he forgot another important example: the ability to beat the poor odds of passing all of your classes while finishing your RA (or TA) work and theoretically writing your MS or PhD proposal. And before you think I’m about to go off on a tirade about how hard my (our) life (lives) is (are), that’s not quite it. We’re all fortunate to be in an incredibly challenging, occasionally inspiring community of mutual learning, teaching and discovering.
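(As a purely illustrative sketch – my own toy curve, not Silver’s actual figure – the shape he describes is easy to draw in R: calibrate a power curve so that 20% effort yields roughly 80% accuracy and watch the returns flatten out.)

effort <- seq(0.01, 1, by = 0.01)
accuracy <- effort ^ (log(0.8) / log(0.2))   # exponent chosen so 20% effort gives ~80% accuracy
plot(effort, accuracy, type = "l",
     xlab = "Effort / experience", ylab = "Accuracy / skill")
abline(v = 0.2, h = 0.8, lty = 2)            # mark the 80-20 point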

The issue is diminishing returns, and the wisdom to properly divert limited resources to achieve diverse goals. Every day we are all confronted with choices, armed with a wide variety of predictor variables (the length of the reading for class, the time necessary to properly research the paper we have to write, the number of days until a particular assignment is due) to achieve a set of response variables (earn a grade, impress a professor, keep open the possibility of one day walking out of here with a degree). What combination of predictor variables do we need to put into the model in order to accurately achieve our goals? Is Nate suggesting that if we put in 20% of the effort on everything, then we can squeeze by with a B- average across the board, just above water level (Figure 10-7)? Is it worth our time to put more effort into a particular area, yielding diminishing returns, and potentially dropping below the water line in a different class, project, or thesis?

[Screenshot: Figure 10-7 from The Signal and the Noise]

And therein lies my biggest personal problem. All of us are here because we’re able to achieve intellectually at a fairly high level, and we are competent enough to divide our time between academics, life goals, and the occasional White Russian on the weekend (the statistician abides). But maybe some things in life are worth more than a 20% effort, more than just sliding by. When I study something, research something, invest in something, I want to know it, to experience the root of the problem, to really understand the complexity of the issue. When Jarrett gives us a problem set with five extra-credit problems modeling the number of atoms in the universe using R code that we haven’t really mastered yet, I really, really want to figure it out. I want to know the answer. Not for any specific ROI grade objective, really, but because that piece of knowledge reveals something interesting and unique about the universe (or at least I hope). When Dr. Hannigan gives us advanced problem sets in analytical chemistry (I took Natural Waters last semester), I really DO want to be able to calculate the exact atmospheric carbon dioxide concentration during the early Cenozoic based on the chemical equilibria within a single water drop trapped within an Arctic glacial moraine. The problem is that by attacking the problem this way, I’m putting myself above the 80-20 line, where increased effort yields progressively smaller returns. I keep chasing harder and harder problems down the rabbit hole and across the chessboard without appearing to get any closer to checkmate. Meanwhile, Robyn knows exactly which assumptions she can make to negate the majority of differential equations required to solve equilibria conditions, and Jarrett knows exactly which line of code will smoothly and efficiently approximate a complex model, while I’m left handing in five homework pages of excessive for loops and fully saturated (and weakly predictive) MLRs.

And so I feel torn between the 17th-century French polymath Blaise Pascal and the Eastern guru, His Holiness the 14th Dalai Lama.

“Since we cannot know all that there is to be known about anything, we ought to know a little about everything.” – Blaise Pascal

“We have bigger houses but smaller families:
We have more degrees but less sense;
more knowledge but less judgements;
more experts but more problems;
more medicines, but less healthiness.
We’ve been all the way to the moon and back,
but we have trouble crossing the street
to meet the new neighbor.
We build more computers
to hold more information,
to produce more copies than ever,
but we have less communication.
We have become long on quantity
but short on quality.
These are times of fast foods,
but slow digestion;
tall man, but short character;
steep profits, but shallow relationships.
It is time when there is much in the window
but nothing in the room.”

— The Dalai Lama

Should we learn a little about everything, as Pascal suggests – less than 20% of every aspect of our lives in order to stay above the waterline in all of the diverse goals to which we aspire? Or should we pursue the path suggested by the Dalai Lama, who laments modern attempts to do, experience, and own everything all at once, leaving one’s center unbalanced and poorly connected? Is it worth attempting all of the extra-credit assignments when the regular homework problems are still poorly executed? Those are the questions that stare me in the face when I look at Figure 10-6. I want to know the underlying framework of the scientific field in which I am engaged; I’m sure all of us feel the same way to some extent. But at what cost? At what cost come the hours and hours spent working on a single line of R code to rise above the 80-20 margin, while laser ablation, dissertation proposals, and half-finished manuscripts sit on the back burner?

Which brings me back to the final enigmatic metaphor from the first paragraph of this personal exploration. Quantum theory dictates that a single particle occupies all of its possible quantum states simultaneously. Said another way, it effectively exists everywhere in the universe at the exact same time. Crazy. But even crazier is the notion of wave function collapse, where a superposition collapses to a single state (location) of existence after interaction with an observer. As soon as the particle is observed, it no longer exists everywhere; it exists in only one location in space, as we observe in everyday life. Is that our choice – that we can try to understand everything all at once, to invest a tiny, tiny bit in everything across the spectrum of the universe, rather than focusing entirely on one specific thing? And that attempt, like superposition, can only exist in the absence of interaction, of being observed – as when we delve into the deepest, most complex theories and problems in our labs at 2 AM after everyone else has gone home, lost in the vast plateau of excruciatingly incremental achievement above the 80-20 line? And maybe interaction with colleagues, like observation of a quantum particle, forces us back to a single state, where we begin focusing again on more fruitful efforts, below the 80-20 line, where incremental increases in effort lead to large leaps in achievement. Maybe I just need to stop working on fifty lines of R code past midnight on a Friday, alone in my office.

What I really need – maybe what we all need – is to learn to successfully navigate the 80-20 conundrum of our lives. How should we divide our time? Should we try to do more with less, or invest in less with more effort? Can we balance, like a sub-atomic particle, popping instantaneously between superposition and a single state? Can we dream of becoming leading researchers in our fields while also going home early to cook our own meals in our own homes and spending quality time with the people who mean the most to us? Maybe it’s not worth trying to jump above the waterline like a breaching humpback whale when treading water at the 80-20 line allows us to do many diverse things just OK. And maybe winning the poker hand (grant?) depends on a steady pool of “fish” (failing undergrads, uncreative researchers), as Figure 10-8A suggests. Or maybe it’s not a zero-sum game. Or maybe the cost of living above the 80-20 line (becoming the best in the world at what you do) carries its own hidden costs:

“It’s better to burn out
Than to fade away
My, my, hey, hey.”

– Neil Young, Kurt Cobain

Thetis: If you stay in Larissa, you will find peace. You will find a wonderful woman, and you will have sons and daughters, who will have children. And they’ll all love you and remember your name. But when your children are dead, and their children after them, your name will be forgotten… If you go to Troy, glory will be yours. They will write stories about your victories in thousands of years! And the world will remember your name. But if you go to Troy, you will never come back… for your glory walks hand-in-hand with your doom. And I shall never see you again.

I know, I know.  Now I’m just getting ridiculous.  But food for thought.  And best wishes navigating your own 80-20 curves, and all of the many sacrifices, and gifts, life has to offer.  Good luck!

Posted in Uncategorized | Leave a comment

R Immersion

As the end of the semester nears, I have found myself reflecting on how far I’ve come since the first class. This was the first graduate class I had taken, and after a two-year hiatus from school of any kind I had to make quite an adjustment, working a full-time research job in tandem with class. I had no experience with coding at all, and I thought I knew a sufficient amount of statistics from my data preparation and embarrassing p-value conclusions in Excel. When the R coding conversations kicked off and I was still having trouble knitting a PDF output, I became very nervous.

I realized that coding in R was like learning a new language, and it reminded me of stories my father told me about moving to Italy in the 1970s without knowing a word of Italian, at a time when English was not very common there. He took an immersion class, taught entirely in Italian, to learn the language as quickly as possible. He became a fluent Italian speaker in a matter of months, because you essentially force yourself to learn when it’s the only option. Anyway, I tried to treat our class and my time working on homework in the same spirit: thinking in code, and ultimately putting the code and the new statistics knowledge together piece by piece.

Most of my research experience is in the biotech realm of protein and molecular biology. However, on the same premise as learning to code, I have actually learned a lot about ecology and environmental biology just from the related readings and the ecologists surrounding me in class. I have noticed that ecology studies can have a lot more variables at play, and this often requires the researcher to take a step back and assess the best way to analyze the data. In my own research, I have often found myself pushed in the direction of regimented data analysis: determine a “significant” or “insignificant” difference based on a 0.05 p-value, regardless of the situation. Although data generated in an in vitro laboratory study is usually more controlled and intentionally contains fewer variables than most field studies, it is still valuable to think about the relationships and the different models.

I think this default data-analysis style is dangerous and incorrect because, just like science, statistics is an evolving and improving field. This was said perfectly in the introduction to the Ecology Special Section on P Values Forum: “We also need to remember that ‘statistics’ is an active research discipline, not a static tool-box to be opened once and used repeatedly… Continual new developments in statistics allow not only for reexamination of existing data sets and conclusions drawn from their analysis, but also for inclusion of new data in drawing more informative scientific inferences.”

Prior to this class I admittedly thought of statistics as always having one right way to do things, like many people in immunology and molecular biology, including well-published scientists. Thankfully, being surrounded by ecologists and thinking about complicated data sets has allowed me to break free of this mindset and the robot style of analysis.

Posted in Uncategorized | 3 Comments

Pseudo-UH-OH

In a convenient follow-up to Chris’ post “replication schmeplication,” and in preparation for our Hurlbert conversation next week, I thought I’d discuss a paper from 2012 that really got me thinking about climate change manipulation experiments. I have always been interested in ocean acidification (OA), which (in case you are unfamiliar) occurs when excess CO2 from the atmosphere is absorbed into the ocean, leading to changes in carbonate chemistry and more acidic conditions. Scott Doney from WHOI calls this “The Other CO2 Problem” – and it has recently been wreaking havoc in Puget Sound, where particularly acidic water is being upwelled into shellfish hatcheries.

An informative video can be found here: http://www.pmel.noaa.gov/co2/story/Acidifying+Water+Takes+Toll+On+Northwest+Shellfish

Despite the harmful effects of OA, the topic made it to the top of the list of important stories largely ignored by the media. But scientists haven’t ignored it. From 2000 to 2009, acidification experiments represented >60% of the marine climate change experiments, likely due to its novel and alarming implications (even though geologists have been aware of this phenomenon for decades – sorry, that’s the geochemist in me speaking).

In a similar tone to Hurlbert’s 1984 comment on pseudoreplication, Wernberg et al. (2012) wrote “A decade of climate change experiments on marine organisms: procedures, patterns, and problems” to address the challenges scientists face in marine climate change experiments (henceforth referred to as MCCEs). They compiled papers that made an explicit reference to climate change from 2000 to 2009 (total = 110), and commented on five major issues that need to be improved in order to further understand the potential effects of climate change.

One of the biggest problems was experimental design – 49% of the studies were diagnosed as having “issues with their experimental procedures”, and of those studies, 91% portrayed some form of pseudoreplication. This is a problem, in short, because it limits the inference space and the ability of researchers to extrapolate results.

So back to the OA problem: most studies test, at the tank level, the effects of OA on, say, physiology, calcification, or behavior. Depending on funding and resources (which don’t look good considering the new role of our favorite senator), it is difficult to ensure appropriate replication. It is neither easy nor cheap to manipulate the carbonate chemistry of many tanks independently, so lots of studies will instead place multiple tanks into one large vat of acidified/treated water – hence, pseudoreplication. Though a colleague of mine, who did his dissertation on OA effects on clownfish, was able to jazz up a unique system that specifically addresses this issue at a fairly low cost: http://mobile.tube.aslo.net/lomethods/free/2013/0485.pdf
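(For a purely hypothetical sketch of what acknowledging that shared-vat structure might look like in R – invented data, and not a substitute for true replication at the vat level – one option is to treat the vat as a random effect, so the treatment gets judged against vat-to-vat variation rather than tank-to-tank noise:)

library(lme4)
set.seed(42)
# Hypothetical layout: 6 vats (3 ambient, 3 acidified), 4 tanks per vat
vat <- factor(rep(1:6, each = 4))
treatment <- factor(rep(c("ambient", "acidified"), each = 12))
vat_noise <- rnorm(6, sd = 0.3)[as.integer(vat)]       # variation shared by tanks in the same vat
growth <- 2 - 0.5 * (treatment == "acidified") + vat_noise + rnorm(24, sd = 0.2)

mod <- lmer(growth ~ treatment + (1 | vat))
summary(mod)  # the treatment effect is now tested against vat-level variation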

I don’t want to jump the gun too much on our upcoming discussion of Hurlbert’s piece, but I wanted to gauge the class’s experience with pseudoreplication…

(And to briefly allude to factorial design: single-factor experiments (e.g., testing the effect of warming, or OA, or sea level rise alone) accounted for 65% of all MCCEs. But as we know, there are likely concurrent, multiple stressors that are non-additive and cannot be well understood in isolation from one another.) Interaction effects!
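(A toy illustration of why that matters – made-up numbers, not anything from Wernberg et al.: cross two stressors in a factorial design, and the interaction term captures exactly the non-additive piece that two separate single-factor experiments could never see.)

set.seed(1)
warming <- factor(rep(c("ambient", "warm"), each = 20))
oa      <- factor(rep(rep(c("ambient", "acidified"), each = 10), times = 2))
# simulate a synergistic (non-additive) response to the two stressors combined
resp <- 10 - 1 * (warming == "warm") - 1 * (oa == "acidified") -
        2 * (warming == "warm" & oa == "acidified") + rnorm(40)

mod <- lm(resp ~ warming * oa)
anova(mod)  # the warming:oa row is the interaction a single-factor design cannot estimate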

Reference:

Wernberg, Thomas, Dan A. Smale, and Mads S. Thomsen. “A Decade of Climate Change Experiments on Marine Organisms: Procedures, Patterns and Problems.” Global Change Biology 18, no. 5 (May 2012): 1491–98. doi:10.1111/j.1365-2486.2012.02656.x.

Posted in Uncategorized | Leave a comment

Replication Schmeplication?

In 2010, James Prosser dropped a bomb on the microbial ecology community. He searched through hundreds of articles in the leading microbiological journals, such as Environmental Microbiology and FEMS Microbiology Ecology, on the lookout for papers where scientists were examining microbial diversity using molecular techniques. What he found was an embarrassment to the scientific process and statistical standards.

He found that only 29% of the papers characterizing diversity used true replication.

As pyrosequencing and other high-throughput sequencing methods pave the way for the future of molecular studies, many researchers apparently think that the sheer number of sequences produced by these techniques is a proxy for replication – when, in reality, they are analyzing one sample.

This is like polling one person from Wisconsin on their knowledge of cheeses, doing the same for one person in Massachusetts, and concluding that Wisconsinites are more (or less) expert on this dairy product than Massachusettsans (although we all know that Wisconsinites would win). You just can’t do this study and expect to be treated like a scientist. However, if you were to take one cheese sample from Wisconsin and one from Massachusetts, perform high-throughput sequencing – which generates a massive list of sequences – and determine relative abundance and diversity measures, you may be able to publish something in Microbial Ecology about the differences between cheeses in these two states.

There seems to be a belief among some scientists that conducting these studies makes them exempt from these standards set in place everywhere else in science.

However, Jay Lennon (2011) put forth the idea that this lack of replication can be made up for by doing proper statistics. You just have to ask the right questions.

Bringing it back to the cheese example, you could do some statistical analyses that examine the sequences derived from your non-replicated samples. These analyses could be something like randomization procedures that test whether the sequences were obtained from the same statistical population.
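(A minimal sketch of what such a randomization procedure might look like, with OTU counts I made up for the two cheese samples – not real data, and not necessarily the specific test Lennon has in mind: pool the sequences, reshuffle them between the two samples, and ask whether the observed difference in Shannon diversity is bigger than chance alone would produce.)

set.seed(7)
wi <- c(40, 25, 15, 10, 5, 3, 2)   # hypothetical OTU counts, Wisconsin cheese sample
ma <- c(60, 20, 10, 5, 3, 1, 1)    # hypothetical OTU counts, Massachusetts cheese sample

shannon <- function(x) { p <- x / sum(x); -sum(p[p > 0] * log(p[p > 0])) }
obs_diff <- shannon(wi) - shannon(ma)

# pool all sequences (labeled by OTU) and reshuffle them between the two samples
pool <- c(rep(seq_along(wi), wi), rep(seq_along(ma), ma))
null_diff <- replicate(999, {
  shuffled <- sample(pool)
  s1 <- tabulate(shuffled[1:sum(wi)], nbins = length(wi))
  s2 <- tabulate(shuffled[-(1:sum(wi))], nbins = length(ma))
  shannon(s1) - shannon(s2)
})
p_val <- (sum(abs(null_diff) >= abs(obs_diff)) + 1) / (length(null_diff) + 1)
p_val  # a small value makes it hard to believe both sets of sequences came from one population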

Doing this type of analysis, though, you are not allowed to look beyond those two cheese samples. You can only say the two samples are statistically distinct (and infer that Wisconsin is better).

Yet both Lennon and Prosser believe this lack of replication in microbiology is alarming, and that it can be remedied by proper training in the scientific method as well as in biostatistics.

References:

Prosser, J. I. (2010). Replicate or lie. Environmental Microbiology, 12(7), 1806–10.

Lennon, J. T. (2011). Replication, lies and lesser-known truths regarding experimental design in environmental microbiology. Environmental Microbiology, 13(6), 1383–6.

Posted in Uncategorized | 2 Comments

Bringing it from the real world

The notion of using prior models or data with Bayesian statistics, or of finding the best-fit model for your data with AIC, seems so necessary for scientific relevance that it is a wonder not everyone is using it. First, let me back up and bring us to the perspective of ocean data. Working with any marine mammal presents special challenges on multiple levels, especially when trying to look at a species as a whole. For one thing, if we are trying to study a specific functionality – for example, the hearing range of a sperm whale – how do we even start to collect that data? We cannot bring these animals into a lab for testing the way we can for, say, dogs. Even where we miraculously can, as with dolphins and killer whales in captivity, we are then only testing the few individuals that have been in captivity, animals living in a completely different environment from wild animals in their natural habitat and exposed to a very different set of conditions.

OK, so let’s say we somehow invent a great way to test the hearing of an individual whale in the wild – in the great wide ocean. Now we have the challenge of finding them. We’re talking about trying to first find, and then test, animals that spend the majority of their time underwater. Not to mention that most marine mammals migrate and move great distances, with habitats spanning distances greater than entire continents. While the hearing test is a bit unrealistic, the same questions apply to realistic applications such as stock assessments. When we talk about modeling stock assessments to estimate how many individuals there are in a species or population, it seems necessary to incorporate these newer statistical methods to get even remotely close to the real numbers. We cannot (and do not) go out and model population numbers based on one survey and one survey only, and so we must use past surveys and past models to help us arrive at truthful estimates.
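(As a toy sketch of the idea – the numbers are invented, and this is a huge simplification of a real stock assessment – a Bayesian update lets a single noisy survey be weighed against what past surveys already told us:)

prior_mean <- 12000; prior_sd <- 1500   # hypothetical estimate carried over from past assessments
survey_mean <- 9000; survey_sd <- 3000  # a single new, noisier survey

# conjugate normal-normal update: a precision-weighted average of prior and new data
post_prec <- 1 / prior_sd^2 + 1 / survey_sd^2
post_mean <- (prior_mean / prior_sd^2 + survey_mean / survey_sd^2) / post_prec
post_sd   <- sqrt(1 / post_prec)
round(c(posterior_mean = post_mean, posterior_sd = post_sd))
# the posterior sits between the prior and the new survey, pulled toward whichever is more precise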

Even with incorporating prior knowledge, data, and models, I still wonder how we can ever know for sure whether what we model is close to the truth. It’s often overwhelming to try to incorporate all the factors that address all the questions appropriately. In my marine mammal acoustic research, the biological and logistical factors to consider seem endless: first, are the animals even present and passing through the area where we are collecting data? Are they vocalizing? Are they vocalizing loudly enough to be picked up by the recorders? Is the individual or group I am hearing typical of the population?

As science and technology evolve, the ease and ability of collecting data improve. Yet to arrive close to the truth in our models, our need to incorporate and consider the past and the present cannot be ignored. In many cases, like our ocean, I am not sure we’ll ever be able to get at the absolute truth, but we can certainly build upon past models and continue to move forward to get better at our predictions.

Posted in Uncategorized | Leave a comment

Statistical Musings

With kudos to Tim and Lynn for tackling AIC, I am deferring Jarrett’s challenge to us to consider the philosophical aspects of AIC and p-values for now, but will share some recent musings that fall somewhere under the “philosophy of statistics” heading.

The only area of BIOL 607 in which I have found myself consistently ahead of the assignments has been the Nate Silver readings and, in truth, it is not I but the Boston-area traffic that can claim credit for this state of affairs. To my delight, The Signal and The Noise was available from Audible – meaning that I could actually do homework *and* keep my sanity through an hour-and-a-half or longer commute each way to and from UMass Boston. Listening to audiobooks directs my brain’s main attention to the substance of the text while leaving the autopilot part of my brain to deal with the “do not run into the car in front of you and watch out for the bozo cutting in from the right” kind of work. It has been a most successful partnership between the higher-level thinking processes and the gut survival instincts, resulting in long rides that neither raise my blood pressure nor leave me wanting to abandon my car in the middle of the road due to an attack of traffic-induced claustrophobia. So I finished The Signal and the Noise some weeks back, along with Greenberg’s Four Fish, Corson’s The Secret Life of Lobsters, and Safina’s Song for the Blue Ocean, all of which I strongly recommend.

Back to statistics – I found myself thinking that Silver takes a more benign view of both the Wall Street financial players and the climate change skeptics than I do. In his first chapter, on the financial meltdown, Silver’s analysis at times borders on a whitewash, given his focus on the hidden risks, poor assessment of probabilities, and statistical errors in the models used by financial firms and ratings agencies in the lead-up to the 2008 crash. So here is where the philosophy part comes in: to what degree does one attribute the Lehman meltdown, the AIG meltdown, etc. primarily to ignorance – in the form of poor modeling, poor risk assessment, and so on – and to what degree does one think that the majority of the very smart people involved knew quite well the inherent risks, knew their models were bogus, but figured they could get out, or get away with it, or at least not be left without a chair when the music inevitably stopped playing? I feel Silver is at times so focused on the probabilities, statistical models, and process that he downplays the influence of deliberate, calculated human choices in determining outcomes. Not all bad choices derive from ignorance or poor models – many times less flattering aspects of human behavior are at work, and subsequent evidence has uncovered just how aware most of the traders and firms involved really were of what they were doing.

Jumping ahead in The Signal and the Noise (spoiler alert!) to Chapter 12, on climate change modeling, I feel this same focus on statistical models and the intellectual challenges of statistics blinds Silver a bit in his treatment of climate change skeptics, most notably Scott Armstrong. Silver considers one of Armstrong’s books to be a seminal work within the statistics field, and that may be why he gives Armstrong’s arguments more respect and legitimacy than I feel is due. As Silver acknowledges, Armstrong, like many of the prominent climate skeptics, hails from outside any scientific field directly related to understanding the earth’s climate system and the likely impacts of changes within that system (Armstrong is an economist, and he proclaimed his ignorance of climate science almost proudly before a Senate hearing).

Apart from any role that personal/professional ambition or ego might play, Armstrong has a particular economic, social, and political view of the world that strongly influences his views on climate change and, more importantly, his views on those in society advocating laws, regulatory changes, and government action to combat climate change. But Armstrong’s jabs against anthropogenic climate change are not the product of scientific knowledge, and one has to do only a little digging to realize Armstrong is strongly aligned with the hardcore climate change deniers. He is one of the signatories of the now-infamous full-page ad from the Cato Institute (founded and funded by the Koch brothers) that appeared in prominent newspapers around the country in 2009 [read the ad at http://www.cato.org/special/climatechange/cato_climate.pdf ]. Silver’s serious treatment of Armstrong and other skeptics gives them an unearned legitimacy. It makes me think of the segment on Last Week Tonight with John Oliver in which, with the help of Bill Nye, Oliver makes the point that the news media try so hard to be “neutral” – giving both sides airtime and treating both sides of the matter equally – that, ironically, they end up presenting a distorted view to the public, one that makes it seem like the reality of climate change is still being seriously debated.

Ultimately, truth wins out. But in the short term, the “truthiness” of statistics and statistical modeling lies very much in the choices made by the human practitioners, and sometimes truth and the advancement of human knowledge take a back seat to stronger drivers.

Posted in Uncategorized | 2 Comments

AIC Insecurity (Plus a Black Hole Simulation)

Following the AIC readings, I can’t help but feel the same level of insecurity regarding just how much our models can tell us about the world around us. Even though AIC allows us to choose the best model from a set of candidates, this measure is still limited by our a priori understanding, and the old adage “garbage in, garbage out” applies once more. AIC can only tell us the best relative model; if we fail to include any good models, it will only be able to tell us which of our bad models comes closest to describing the data. This is sci-existentially terrifying. Even with this fancy new tool, we can only be so certain that the particular model and/or method we use to analyze our data is correct. I’m sure my thoughts on the subject will change over the course of next week, although I suppose my insecurity over choosing the right candidate models for AIC will remain.
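(A quick sketch of that “relative, not absolute” point in R, with simulated data and a deliberately arbitrary candidate set: AIC happily ranks whatever models we hand it, and says nothing about a better model we never thought to include.)

set.seed(2)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)

m_null <- lm(y ~ 1)           # intercept only
m_line <- lm(y ~ x)           # linear
m_quad <- lm(y ~ poly(x, 2))  # quadratic

AIC(m_null, m_line, m_quad)   # lowest AIC wins, but only among these three candidates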

On a completely unrelated note, I came across an interesting example of the usefulness of simulations for revealing new insights into scientific phenomena. I’ll leave a link to the article below, but in short, a group of astrophysicists and special-effects artists generated a physically and mathematically correct simulated black hole that turned out to be the most accurate visualization of a black hole to date. In fact, some aspects of the visualization that were initially thought to be bugs in the program turned out to make sense in a physics context, and led to new insights into the appearance of black holes. Overall, it’s definitely an interesting article that I would recommend checking out.

http://www.wired.com/2014/10/astrophysics-interstellar-black-hole/

Posted in Uncategorized | 1 Comment