Bringing it from the real world

The notion of using prior models or data with Bayesian statistics, or choosing the best-fit model for your data with AIC, seems so necessary for scientific relevance that it is a wonder not everyone is using it.  First, let me back up and bring us to the perspective of ocean data.  Working with any marine mammal presents special challenges on multiple levels, especially when trying to study a species as a whole.  For one thing, if we are trying to examine a specific function, for example, the hearing range of a sperm whale, how do we even start to collect that data? We cannot bring these animals into a lab for testing the way we can for, say, dogs.  Even in the cases where we miraculously can, such as with dolphins and killer whales in captivity, we are then testing only the few individuals that have been in captivity, animals living in a completely different environment from their wild counterparts and exposed to a very different set of conditions.

OK, so then let’s say we somehow invent a great way to test the hearing of an individual whale in the wild, in the great wide ocean.  Now we have the challenge of finding them.  We’re talking about trying to first find, and then test, animals that spend the majority of their time underwater.  Not to mention that most marine mammals migrate great distances, with habitats spanning beyond continental ranges.  While the testing scenario is a bit unrealistic, the same questions still apply to realistic applications such as stock assessments. When we model stock assessments to estimate how many individuals there are in a species, or population, it seems necessary to incorporate these newer statistical methods to get even remotely close to the real numbers.   We cannot (and do not) go out and model population numbers based on one survey and one survey only, and so must use past surveys and past models to help arrive at truthful estimates.

Even when incorporating prior knowledge, data, and models, I still wonder how we can ever know for sure whether what we model is close to the truth.  It’s often overwhelming to try to incorporate all the factors that address all the questions appropriately.  In my marine mammal acoustic research, the biological and logistical factors to consider seem endless:  first, are the animals even present and passing through the area where we are collecting data?  Are they vocalizing?  Are they vocalizing loudly enough to be picked up by the recorders? Is the individual or group I am hearing typical of the population?

As science and technology evolve, the ease of collecting data improves.  Yet, to get close to the truth in our models, our need to incorporate and consider the past and present cannot be ignored.  In many cases, like our ocean, I am not sure we’ll ever be able to get the absolute truth, but we can certainly build upon past models and continue to move forward to get better at our predictions.

Posted in Uncategorized | Leave a comment

Statistical Musings

With kudos to Tim and Lynn for tackling AIC, I am deferring Jarrett’s challenge to us to consider the philosophical aspects of AIC and p-values for now, but will share some recent musings that fall somewhere under the “philosophy of statistics” heading.

The only area of BIOL 607 in which I have found myself consistently ahead of the assignments has been the Nate Silver readings, and, in truth, it is not I but the Boston-area traffic that can claim credit for this state of affairs. To my delight, The Signal and the Noise was available from Audible – meaning that I could actually do homework *and* keep my sanity through an hour-and-a-half or more commute each way to and from UMass Boston. Listening to audiobooks directs my brain’s main attention to the substance of the text while leaving the autopilot part of my brain to deal with the “do not run into the car in front of you and watch out for the bozo cutting in from the right” kind of work. It has been a most successful partnership between the higher-level thinking processes and the gut survival instincts, resulting in long rides that neither raise my blood pressure nor leave me wanting to abandon my car in the middle of the road in an attack of traffic-induced claustrophobia. So I finished The Signal and the Noise some weeks back, along with Greenberg’s Four Fish, Corson’s The Secret Life of Lobsters, and Safina’s Song for the Blue Ocean, all of which I strongly recommend.

Back to statistics – I found myself thinking that Silver takes a more benign view of both the Wall Street financial players and the climate change skeptics than I do. In his first chapter, on the financial meltdown, Silver’s analysis at times borders on a whitewash, given his focus on the hidden risks, poor assessment of probabilities, and statistical errors in the models used by financial firms and ratings agencies in the lead-up to the 2008 crash. So here is where the philosophy part comes in – to what degree does one attribute the Lehman meltdown, the AIG meltdown, etc. primarily to ignorance, in the form of poor modeling and poor risk assessment, and to what degree does one think that the majority of the very smart people involved knew the inherent risks quite well, knew that their models were bogus, but figured they could get out, or get away with it, or at least not be left without a chair when the music inevitably stopped playing? I feel Silver is at times so focused on the probabilities, statistical models, and process that he downplays the influence of deliberate, calculated human choices in determining outcomes. Not all bad choices derive from ignorance or poor models – many times less flattering aspects of human behavior are at work, and subsequent evidence has uncovered just how aware most of the traders and firms involved were of what they were doing.

Jumping ahead in The Signal and the Noise (spoiler alert!) to Chapter 12, on climate change modeling, I feel this same focus on statistical models and the intellectual challenges of statistics blinds Silver a bit in his treatment of climate change skeptics, most notably Scott Armstrong. Silver considers one of Armstrong’s books a seminal work within the statistics field, and that may be why he gives Armstrong’s arguments more respect and legitimacy than I feel is due. As Silver acknowledges, Armstrong, like many of the prominent climate skeptics, hails from outside any scientific field directly related to understanding the earth’s climate system and the likely impacts of changes within that system (Armstrong is an economist and proclaimed his ignorance of climate science almost proudly when appearing before a Senate hearing).

Apart from any role that personal/professional ambition or ego might play, Armstrong has a particular economic, social, and political view of the world that strongly influences his views on climate change and, more importantly, his views on those in society advocating laws, regulatory changes, and government action to combat climate change. But Armstrong’s jabs at anthropogenic climate change are not the product of scientific knowledge, and one has to do only a little digging to realize that Armstrong is strongly aligned with the hardcore climate change deniers. He is one of the signatories of the now-infamous Cato Institute (founded and funded by the Koch Brothers) full-page ad that appeared in prominent newspapers around the country in 2009 [read the ad at ]. Silver’s serious treatment of Armstrong and other skeptics gives them an unearned legitimacy. It makes me think of the segment on Last Week Tonight with John Oliver in which, with the help of Bill Nye, Oliver makes the point that the news media tries so hard to be “neutral,” to give both sides airtime, and to treat both sides of the matter equally that, ironically, it ends up presenting a distorted view to the public that makes it seem like the reality of climate change is still being seriously debated.

Ultimately, truth wins out. But in the short term, the “truthiness” of statistics and statistical modeling lies very much in the choices made by the human practitioners, and sometimes truth and the advancement of human knowledge take a back seat to stronger drivers.

Posted in Uncategorized | 2 Comments

AIC Insecurity (Plus a Black Hole Simulation)

Following the AIC readings, I can’t help but feel the same level of insecurity regarding just how much our models can tell us about the world around us. Even though AIC allows us to choose the best model from a set of candidates, this measure is still limited by our a priori understanding, and the old adage “garbage in, garbage out” applies once more. AIC can only tell us the best relative model; if we fail to include any good models, it will only tell us which of our bad models comes closest to explaining the data. This is sci-existentially terrifying. Even with this fancy new tool, we can only be so certain that the particular model and/or method we use to analyze our data is correct. I’m sure that my thoughts on the subject will change over the course of next week, although I suppose my insecurity over choosing the right models to feed into AIC will remain.
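To make the “best relative model” point concrete, here is a small sketch (the AIC values are invented purely for illustration) computing delta-AIC and Akaike weights. The weights always sum to 1 across whatever candidate set we supply – good models or garbage – which is exactly the insecurity above:

```python
import math

# Invented AIC scores for three hypothetical candidate models
aics = {"model_A": 120.3, "model_B": 122.1, "model_C": 135.0}

best = min(aics.values())
deltas = {m: a - best for m, a in aics.items()}          # delta-AIC vs. the best candidate
rel = {m: math.exp(-d / 2) for m, d in deltas.items()}   # relative likelihood of each model
total = sum(rel.values())
weights = {m: r / total for m, r in rel.items()}         # Akaike weights

# The weights sum to 1 by construction -- they rank the candidates
# against each other, never against the truth.
print({m: round(w, 3) for m, w in weights.items()})
```

Swap in three terrible models and the code happily anoints one of them the “best” all the same.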

On a completely unrelated note, I came across an interesting example of the usefulness of simulations for revealing new insights into scientific phenomena. I’ll leave a link to the article below, but in short, a group of astrophysicists and special-effects artists were able to generate a physically and mathematically correct simulated black hole that turned out to be the most accurate visualization of a black hole to date. In fact, some aspects of the visualization that were initially thought to be bugs in the program turned out to make sense in a physics context, and led to new insights into the appearance of black holes. Overall, it’s definitely an interesting article that I would recommend checking out.

Posted in Uncategorized | 1 Comment

To the Tracks, Anyone?

Horse racing has been around for a long time. Archeological records date equestrian sports back to Ancient Greece, Babylon, Syria, and Egypt, with chariot racing becoming a main event in the Greek Olympics by 648 BC. Horse racing has since become part of American culture, too. The Jockey Club was established in 1775, and to this day it still regulates breeding and racing. It is clear that people have been betting on horse races for a long time.


Well, when I used to live in Plainville, MA, there was a horse-racing track literally down the road from my apartment complex. They would primarily televise races from around the country, but would also hold the occasional live harness race (where the horses pull a driver in a sulky, or lightweight cart). My husband would always try to convince me to go, but I never showed interest (and his quest to this day has been unsuccessful). However, after reading “The Poker Bubble”, I might give it a try next spring. Not that I think I will win a ton of money (though being a graduate student, it couldn’t hurt – look at how many people quit school and/or their jobs because they’re SO good at gambling. Not that I want to quit or anything) – but more because I’d like to observe how other people make their betting decisions, and to see who’s “lucky” and who’s “skilled” at forecasting the outcome of the race.


Choosing the prize horse isn’t an untapped topic, either. A quick Amazon search for “how to win at horse racing” yielded 66 books (it also came up with a book on Anna Nicole Smith; weird).

Horse racing has long been used as an example of Bayes’ theorem in action. It requires bettors to explicitly state how likely they believe an event is to occur based on prior beliefs. What kinds of factors count as prior beliefs in this case? Probably things like: which horse has won in the past, out of how many total races, and for those wins, was it raining or not?

Most importantly, we can then revise our opinion and update our probabilistic assessment of each horse following the race, in preparation for the next. Using available information within context to reduce our prediction bias, and then testing our predictions, is the best way to get better [at winning money]. As Silver writes, “The more willing you are to test your ideas, the sooner you can begin to avoid these problems and learn from your mistakes.”
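As a toy sketch of that updating step (every number here is invented), one pass of Bayes’ theorem for a single horse on a rainy race day:

```python
# Prior belief: the horse has won 4 of its last 10 races
p_win = 0.40

# Invented conditionals: suppose this horse is a strong mudder, so rain
# accompanies its wins far more often than its losses
p_rain_given_win = 0.70
p_rain_given_lose = 0.30

# It is raining today: update the win probability via Bayes' theorem
p_rain = p_rain_given_win * p_win + p_rain_given_lose * (1 - p_win)
p_win_given_rain = p_rain_given_win * p_win / p_rain

print(round(p_win_given_rain, 3))  # 0.609 -- the rain raises our belief from 0.40
```

After the race, the posterior becomes the prior for the next outing, which is exactly the revise-and-reassess loop described above.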

Well then, he’s convinced me. Anyone up for some horse races in the spring?


Posted in Uncategorized | Leave a comment

“You’ll be lucky next time!” – Dad

I recently received a letter from my dad which included the 2-dollar Mega Millions ticket he’d bought here while visiting me, a dollar bill, and a note telling me to go cash in the 1 dollar he’d won and use the two dollars to get a new ticket. “Good luck! :)” ended the note.  As I’m sure you are all wondering: no, I did not win the 20 million jackpot last Tuesday night with the new 2-dollar ticket.  So much for that new apartment I was dreaming about.

With all the stats we’ve been introduced to, it’s always been about figuring out whether there’s a pattern, a signal, in all the noise.  Even in poker or blackjack, you can count cards to improve your chance of winning.  But what about the lottery, which is supposedly truly randomized, or slot machines, whose randomness is arguable?  What is the draw of these games when the odds are so stacked against the player?  According to the Mega Millions website, the chance of winning the jackpot is just 1 in 260 million, so roughly 1.2 people would win if everyone (including the underage) in the US bought a ticket.  Put in that perspective, it’s a ridiculously small chance.  However, maybe this is where they reel in players: the overall chance of winning any prize is 1 in 15.  That sounds almost decent, until you realize it’s only 6.67%, and most of the money prizes are quite disproportionate to the chance of winning them.  I’m truly fascinated by how people are willing to cast their money on chance and on the belief that “maybe this time, I’ll be lucky”.   Even some of the most dedicated players (my great-grandmother, who sits out on the porch and buys a ticket every morning) test their luck day after day to no avail.  But I suppose the idea of being that one lucky person is tempting enough.  For me, the thrill of my first lottery ticket at 18 and Tuesday night’s wait for the winning numbers to come out was enough; I guess it’s the hard road to millionaire status for me (sorry, Dad!).
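The back-of-the-envelope numbers above check out; here they are as a quick sketch (the US population figure is an approximation I’ve plugged in):

```python
jackpot_odds = 1 / 260_000_000    # roughly 1 in 260 million, per the post
us_population = 318_000_000       # approximate US population, mid-2010s

# Expected jackpot winners if every person bought exactly one ticket
expected_winners = us_population * jackpot_odds
print(round(expected_winners, 1))  # 1.2

# Chance of winning any prize at all
any_prize = 1 / 15
print(round(any_prize * 100, 2))   # 6.67 (percent)
```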

But really, I wasn’t lucky enough to even be that 1 in 15?!

Mega Millions



Posted in Uncategorized | 1 Comment

Is this you?

From the ever popular Research Whalberg – now that y’all are GLM-ers, this isn’t you, right?


Posted in Silly | Leave a comment

Influence of priors

As an explanation for the backtracking: I caught a cold just as we moved into generalized linear models, and as I read the notes and try to understand them, it’s all a bit Through the Looking-Glass. So, while I play catch-up on those notes and the reading for this week, I’d like to take a look back at using strong priors in Markov chain Monte Carlo fitting of linear models.

One of the things that struck me in class is that if you use a strong prior, the credible interval on the slope shrinks. However, this shrinking occurs whether the strong prior is a good fit or a bad one, with similar credible interval ranges even when the prior slope is off by a factor of 10. If you have the wolf data to play around with:

# MCMCglmm provides the Bayesian linear model fits
library(MCMCglmm)

wolves <- read.csv("./Class notes/wolves.csv")

# Strong prior on the slope (prior variance 1), vague on the intercept (1e10)
stronggoodprior <- list(B = list(mu = c(0, -11), V = diag(c(1e10, 1))))
strongbadprior <- list(B = list(mu = c(0, 350), V = diag(c(1e10, 1))))

wolf_mcmc_goodprior <- MCMCglmm(pups ~ inbreeding.coefficient, data = wolves, verbose = FALSE, prior = stronggoodprior)
wolf_mcmc_badprior <- MCMCglmm(pups ~ inbreeding.coefficient, data = wolves, verbose = FALSE, prior = strongbadprior)
wolf_mcmc <- MCMCglmm(pups ~ inbreeding.coefficient, data = wolves, verbose = FALSE)

# Width of the 95% credible interval on the slope (upper minus lower bound)
# No prior
summary(wolf_mcmc)$solutions[2, 3] - summary(wolf_mcmc)$solutions[2, 2]
# Good prior
summary(wolf_mcmc_goodprior)$solutions[2, 3] - summary(wolf_mcmc_goodprior)$solutions[2, 2]
# Bad prior
summary(wolf_mcmc_badprior)$solutions[2, 3] - summary(wolf_mcmc_badprior)$solutions[2, 2]

Now I’m sure that with the bad prior, you could easily see that the data doesn’t at all fit the line, but it seems at least a little concerning that the bad prior should shrink the credible interval so much.

The big question I have in terms of priors (and light internet digging has proven unhelpful) is how you set the standard deviation for something like a slope parameter. Let’s say I want to predict the number of mangoes I will harvest, given the number of mango trees I have. The idea that I think the slope is 5 mangoes/tree is clear enough, and the standard deviation can be estimated from the observed variability in mangoes/tree.

But what about when the parameter isn’t such an easily measured real-world quantity? A good ecological example is metabolic theory: Rate = B * Mass^(scaling factor). Many say that the scaling factor is 3/4, based on how energy dissipates in a branched network. Others say the factor is 2/3, based on surface-area-to-volume differences in energy dissipation (the real arguments are more fleshed out, but let’s go with that for the moment). If I attempted to use a prior when examining metabolic rate data, how would I set the standard deviation of this parameter? Also, given that 2/3 and 3/4 are not radically different from one another, isn’t using a prior basically playing the game with loaded dice?
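One heuristic I can imagine (this is my own sketch, not an established recipe): center the prior between the two competing theoretical values and pick the standard deviation so that each sits about one standard deviation from the mean. By symmetry the prior then treats 2/3 and 3/4 even-handedly, rather than loading the dice for either camp:

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a Normal(mu, sd) distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Competing theoretical values for the metabolic scaling exponent
b_network = 3 / 4   # branched-network argument
b_surface = 2 / 3   # surface-area/volume argument

# Center the prior between them; half the gap becomes the standard deviation
mu = (b_network + b_surface) / 2   # ~0.708
sd = (b_network - b_surface) / 2   # ~0.042

# Both candidates get identical (and substantial) prior density, so the
# data, not the prior, must settle the 3/4-vs-2/3 question
d_network = normal_pdf(b_network, mu, sd)
d_surface = normal_pdf(b_surface, mu, sd)
```

Whether a prior this tight is honest, or just a politer version of the loaded dice, is exactly the question I’m left with.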

Posted in Uncategorized | 1 Comment