Baseball and Statistics

I just finished chapter three of The Signal and the Noise, and since it hasn’t come up much on here yet, this felt like a good opportunity to dive in. The relationship between statistics and sports, let alone statistics and baseball, is by no means new. Statistics have been meticulously recorded in baseball for well over 100 years. As Silver mentions in chapter 3, the highly ordered format of mostly independent events has made baseball a haven for large data sets. However, the immense number of unforeseen variables, from day-to-day events like a player nursing a hangover to long-term injuries, makes it challenging to predict future player and team success. I find it interesting how much baseball and biology have in common when it comes to data.

In both biology and sports, the way we analyze data can alter the conclusions we draw. The interesting dynamic with statistics in baseball is the full-circle effect that has become more prevalent in recent years. As fantasy sports have grown in popularity, interest in baseball statistics has grown too, and it has changed the way people watch the game. Baseball is a billion-dollar industry with constant pressure to win, so many organizations have turned to statistics and advanced “sabermetrics” not only to predict the future success of prospects but also to shape in-game strategy.

Within the last couple of years, most MLB teams have adopted a huge change in defensive strategy based on a suggestion from a sabermetrics study. Almost all teams now use the defensive shift as a regular approach, a strategy that was historically used only on rare occasions. There were 2,357 total shifts in 2012 and 8,134 in 2013, according to the MLB. I am interested to see further data on what effect the strategy has had on hitting.
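For a sense of scale, here is a minimal R sketch that compares the two shift counts quoted above. It takes those figures at face value, and the variable names are just ones I made up for illustration.

```r
# Shift counts as quoted in this post (taken at face value, not re-verified).
shifts <- c("2012" = 2357, "2013" = 8134)

# Absolute increase from one season to the next.
increase <- shifts[["2013"]] - shifts[["2012"]]

# How many times larger the 2013 count is than the 2012 count.
ratio <- shifts[["2013"]] / shifts[["2012"]]

# Percent change relative to the 2012 count.
pct_change <- 100 * increase / shifts[["2012"]]

cat("Increase:", increase, "shifts\n")
cat("Ratio:", round(ratio, 2), "times the 2012 count\n")
cat("Percent change:", round(pct_change, 1), "%\n")
```

In other words, by these numbers teams shifted roughly three and a half times as often in 2013 as in 2012.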

As foolish as it sounds, I think there are striking similarities between the cycle of events in some biological processes and in baseball when it comes to stats. The statistics of baseball, especially for something like the defensive shift, almost sound like they could serve as a useful statistical model in science, for example in a biological setting like reversing climate change or preventing the detrimental effects of environmental contaminants on an amphibian river population.

Posted in Uncategorized | 1 Comment

Piracy in R

In honor of International Talk Like a Pirate Day, and your foray into the world of R graphics this week, here’s a little bit of what you can do when you put your mind to it.


Yes, that’s an animated map of pirate attacks made in R. With ggplot2 to boot!

See also this post on further analyses of this data.

UPDATE: There is now an aRrr package courtesy of Noam Ross

Posted in Silly | Leave a comment

There’s No “R” in “Funny” But There’s Some “Funny” in “R”


In the spirit of Friday I wanted to dedicate my first blog entry to a lighter side of R. In the middle of working on last week’s homework assignment, I decided I wanted to gain a new perspective on the human aspect of R, namely some assessment of the programmers that develop and use it. Naturally, rather than beginning with searches like “developers of R” or “famous users of R” or some other intellectually-minded search, I chose to pursue “funny R code.”

My search led me to a Stack Overflow questions page entitled “What is the best comment in source code you have ever encountered?” While the page had been “closed as not constructive” way back in 2011, the feedback to this question was nothing short of marvelous. The 518 answers stretched on for 18 pages, and while the quality and appropriateness of the comments varied somewhat, overall this forum was a goldmine of R-related comic relief. Many of the answers are quite relatable, even for someone as new to R as myself, including “I am not sure if we need this, but too scared to delete” and “I am not sure why this works, but it fixes the problem.” There were also a number of references to movies and TV shows as well as some truly incredible artwork, including pictures of pigs and dragons drawn with text/code. In fact, based on this particular sample, a relatively high number of programmers who work with R seem to be obsessed with dragons relative to the general population, although more analysis would be required to examine the validity of this possible trend. Several other types of comments included short essays, philosophical quotes, and poems that convey a multitude of emotions and somehow help R code seem more human.

If I were forced to pick a favorite comment, it would have to be the one written by a frustrated programmer who had been attempting to work with something called Adobe PSD format. While I have absolutely no idea what this format is, the author seems incredibly displeased with it, and channels his feelings into a highly amusing essay, which includes what is possibly the most imaginative simile I’ve ever heard, that he chose to save within his code and thereby share with the next programmer to examine it. It’s pretty long, so I won’t post it here, but it’s about three quarters of the way down the first page and is definitely worth a read (link below). All in all, if you’re looking for some quick stress relief and/or some R-related laughs, I would definitely recommend visiting the webpage listed below. (Since this page is essentially an Internet comments section, please be aware that some of the content is R-rated. Pun intended.) Finally, to send off the week, here’s an awful joke: What was the pirate’s favorite programming language? R.

Posted in Uncategorized | Leave a comment

R: Open Source Software at its finest.

When we were talking in class yesterday about how anyone can make packages, it made me realize how user-driven R really is. It’s very rare to find software that is used by Fortune 500 companies (companies like Pfizer and Bank of America use it) and is also free for anyone to use and learn. The good thing about R being open source is that anyone can modify it to fit their specific needs. If you need to adapt it for a specific fantasy football league, I’m sure there is a way to do it. If you need it to analyze speech patterns, I’m sure a chunk of code can make that happen.
What is also great about R is that different packages let you analyze data in different ways. There is a good chance that whatever you are analyzing or questioning, someone else has already written code for it. Because R is open source, you can use that person’s work to help your own research. This is rare in research, since a lot of methodology isn’t shared from peer to peer. Maybe it’s a generational shift, but it is a welcome change in the research community.
This was a really random post, which I guess most of the posts on this blog will be. I just really appreciate a user-based program that can affect people from all backgrounds. What do you guys think about it? Maybe I’m the kid in the schoolyard who stands in the corner and just looks at a rock during recess, but something like this seems to be a shift in the right direction for communal research.


Posted in Uncategorized | 1 Comment

R: Getting Past the Blinking Cursor

Hello everyone!

I want to start by talking about my struggles when beginning to work in R. My first exposure to R (and programming in general) was in a Phylogenetic Methods course. I should note that I consider myself barely computer literate. Never in a million years did I think that as a biologist, I would have to work through a command line/code/script/programming language/whatever the correct lingo is. When we first fired up R in class, I looked at that blank console with the blinking cursor and I thought “Now what??” There are an infinite number of things I could type into the blank space. Unfortunately, most of those entries will yield error messages. So how do I use R to produce something meaningful? I found that there were two key things that began to open the door for me. These may seem silly, but they are really not trivial.

  1. Know what a working directory is and how to use it.
    1. Believe it or not, I didn’t fully understand a working directory until about halfway through the semester. A working directory is essentially a file folder that you tell R you are going to work out of. When you try to load data, R will be looking for the file in your working directory. In addition, when you save code, plots, etc., they will be saved to your working directory. It can be helpful to create a file folder specifically for your work in R so that you do not constantly need to change your working directory or specify paths to different files.
    2. But how do you know what folder is your working directory? It’s simple, just use the function getwd() and R will return the path to your current working directory. This function is useful because if you don’t specify a working directory, R will set one for you as a default.
    3. How do you set your working directory? The function setwd() allows you to specify your working directory. In the parentheses, enter the path to your desired file folder, in quotes.
  2. Learn how to read help files.
    1. This seems simple enough, but the help files also appear to be written in R. It takes practice, but once you start to become familiar with R syntax, the help files will begin to make more sense. It took me an entire semester of working in R before I could read and understand help files. Now, I use them all the time!
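To make the two points above concrete, here is a minimal sketch of checking and setting a working directory and then looking something up in the help files. The folder name `my_r_work` and the file `example.csv` are made up for illustration, and the folder is created inside R’s temporary directory so nothing on your computer gets cluttered.

```r
# Ask R which folder it is currently working out of.
old_wd <- getwd()
print(old_wd)

# Create a scratch folder to stand in for a dedicated "R work" folder
# (hypothetical name; it lives in R's temporary directory).
project_dir <- file.path(tempdir(), "my_r_work")
dir.create(project_dir, showWarnings = FALSE)

# Tell R to work out of that folder. Note the path is a quoted string.
setwd(project_dir)

# Files are now saved to and loaded from that folder without a full path.
write.csv(data.frame(x = 1:3, y = c(2.5, 3.1, 4.8)),
          "example.csv", row.names = FALSE)
dat <- read.csv("example.csv")
print(dat)

# List what is in the working directory right now.
wd_files <- list.files()
print(wd_files)

# Go back to wherever R was working before.
setwd(old_wd)

# Help files: run these at the console to open the help page for a function.
# ?setwd
# help("read.csv")

# A quick way to see just a function's arguments without the full help page.
print(args(setwd))
```

On your own machine you would skip the temporary folder and pass setwd() the path to a real folder you created for your R work.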

A few other useful R basics to learn:

  • How to install and use packages (yay, we went over it in class!)
  • GOOGLE: If you are trying to do something in R and the help files are not doing it for you, Google it! There are a ton of resources out there and lots of people trying to troubleshoot their code. Even copying and pasting an error message into Google can be very helpful. Most likely, you are not the first person to get that error and you may find a page where the error message is explained.
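Both of these basics can be sketched in a few lines. This is only a rough example: the package name ggplot2 is just a stand-in (it comes up elsewhere on this blog), the install line is commented out because it needs an internet connection, and the deliberately broken log() call is there only to produce an error message to search for.

```r
# Package name used purely as an example.
pkg <- "ggplot2"

# requireNamespace() returns TRUE if the package is already installed.
have_pkg <- requireNamespace(pkg, quietly = TRUE)

if (!have_pkg) {
  # Installing is a one-time download from CRAN and needs an internet
  # connection, so it is left commented out in this sketch.
  # install.packages(pkg)
  message("Package '", pkg, "' is not installed yet; try install.packages(\"",
          pkg, "\").")
} else {
  # Loading has to be done once in every new R session.
  library(pkg, character.only = TRUE)
}

# Capture an error message as plain text so it can be pasted into a search.
err_text <- tryCatch(
  log("ten"),                             # deliberately wrong: log() needs a number
  error = function(e) conditionMessage(e) # keep just the message text
)
print(err_text)
```

Pasting the printed message, plus the name of the function that produced it, into a search engine is usually enough to find someone who hit the same problem.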

Now I would like to open it up to discussion. What specific challenges have you faced in learning a programming language? Do you have any tricks that make it easier? What do you think are the most important things for a beginner in R to know?

Happy coding!

Posted in Uncategorized | 2 Comments

R Statistics at the MISPWOSO

Why I feel like an idiot after struggling with R and then realizing that it’s actually quite logically simple:

Excerpt: The Hitchhiker’s Guide to the Galaxy series by Douglas Adams (Book: Mostly Harmless)

Now logic is a wonderful thing but it has, as the processes of evolution discovered, certain drawbacks.

Anything that thinks logically can be fooled by something else that thinks at least as logically as it does.  The easiest way to fool a completely logical robot is to feed it the same stimulus sequence over and over again so it gets locked in a loop.  This was best demonstrated by the famous Herring Sandwich experiments conducted millennia ago at the MISPWOSO (the MaxiMegalon Institute of Slowly and Painfully Working Out the Surprisingly Obvious).

A robot was programmed to believe that it liked herring sandwiches.  This was actually the most difficult part of the whole experiment. Once the robot had been programmed to believe that it liked herring sandwiches, a herring sandwich was placed in front of it. Whereupon the robot thought to itself, Ah! A herring sandwich! I like herring sandwiches.

It would then bend over and scoop up the herring sandwich in its herring sandwich scoop, and then straighten up again. Unfortunately for the robot, it was fashioned in such a way that the action of straightening up caused the herring sandwich to slip straight back off its herring sandwich scoop and fall on to the floor in front of the robot.  Whereupon the robot thought to itself, Ah A herring sandwich…, etc, and repeated the same action over and over again.  The only thing that prevented the herring sandwich from getting bored with the whole damn business and crawling off in search of other ways of passing the time was that the herring sandwich, being just a bit of dead fish between a couple of slices of bread, was marginally less alert to what was going on than was the robot.

The scientists at the Institute thus discovered the driving force behind all change, development and innovation in life, which was this: herring sandwiches. They published a paper to this effect, which was widely criticized as being extremely stupid.  They checked their figures and realized that what they had actually discovered was “boredom” or rather the practical function of boredom.  In a fever of excitement, they then went on to discover other emotions like “irritability,” “depression,” “reluctance,” “ickyness” and so on.  The next big breakthrough came when they stopped using herring sandwiches, whereupon a whole welter of new emotions became suddenly available to them for study, such as “relief,” “joy,” “friskiness,” “appetite,” “satisfaction,” and most important of all, the desire for “happiness.”

This was the biggest breakthrough of all.

Posted in Uncategorized | Leave a comment

Why learning R code makes me feel like a megapode

Excerpt: Last Chance To See by Douglas Adams (Chapter 2: Here Be Chickens)

In the afternoon, accompanied by Kiri and a guard, we went off to explore. We found no dragons, but as we thrashed recklessly through the undergrowth, we encountered instead a bird, and it was one that I felt very much at home with.

I have a well-deserved reputation for being something of a gadget freak, and am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand. Ten seconds, I tell myself, is ten seconds. Time is valuable and ten seconds’ worth of it is well worth the investment of a day’s happy activity working out a way to save it.

The bird we came across was called a megapode, and it has a very similar outlook on life.

It looks a little like a lean, sprightly chicken, though it has the advantage over chickens that it can fly, if a little heavily, and is therefore better able to escape from dragons, which can only fly in fairy tales, and in some of the nightmares with which I was plagued while trying to sleep on Komodo.

The important thing is that the megapode has worked out a wonderful labour-saving device for itself. The labour it wishes to save is the time-consuming activity of sitting on its nest all day incubating its eggs, when it would be out and about doing things.

I have to say at this point that we didn’t actually come across the bird itself, though we thought we glimpsed one scuttling through the undergrowth. We did, however, come across its labour-saving device, which is something that it’s hard to miss. It was a conical mound of thickly packed earth and rotting vegetation, about six feet high and six feet wide at its base. In fact, it was considerably higher than it appeared because the mound would have been built on a hollow in the ground, which itself would have been about three feet deep.

I’ve just spent a cheerful hour of my time writing a program on my computer that will tell me instantly what the volume of the mound was. It’s a very neat and sexy program with all sorts of pop-up menus and things, and the advantage of doing it the way I have is that on any future occasion on which I need to know the volume of a megapode nest, given its basic dimensions, my computer will give the answer in less than a second, which is a wonderful saving of time. The downside, I suppose, is that I cannot conceive of any future occasion that I am likely to need to know the volume of a megapode nest, but no matter: the volume of this mound is a little over nine cubic yards.

What the mound is is an automatic incubator. The heat generated by the chemical reactions of the rotting vegetation keeps the eggs that are buried deep inside it warm – and not merely warm. By judicious additions or subtractions of material from the mound, the megapode is able to keep it at the precise temperature that the eggs require in order to incubate it properly.

So all the megapode has to do to incubate its eggs is merely to dig three cubic yards of earth out of the ground, fill it with three cubic yards of rotting vegetation, collect a further six cubic yards of vegetation, build it into a mound, and then continually monitor the heat it is producing and run about adding bits or taking bits away.

And thus it saves itself all the bother of sitting on its eggs from time to time.

(And of calculating univariate stats with a piece of paper and a pencil!)

Posted in Reflections | Leave a comment