Making Noise (or hopefully trying not to!)

As a first-year graduate student, I am being asked to think about data in all sorts of new ways.  As I am sure some of my classmates can relate to, and the rest may remember the feeling, the jump from undergrad to grad student is really about making a mental switch from consuming data to creating and analyzing it.  I’d like to go back in Nate Silver’s book a bit, to one statistic he quoted at the beginning of The Signal and the Noise when he mentioned the sheer volume of new data being created every single day:  2.5 quintillion bytes.  We, as a society, are creating far more data than we can possibly analyze, which to be honest is a bit intimidating, especially since I am now being asked to create more data.  I do field work, design experiments, measure stuff, all the things that define my day, but really what I am doing is creating more data and trying to analyze it in some meaningful way to describe some natural phenomenon.  As I think about all this, one thing I am quickly learning is that I don’t want to create bad data, as that really is just contributing to the “noise”.

Back in chapter 5, Silver talks about overfitting.  This is something that I feel would be incredibly tempting to do, especially after spending years collecting data.  It seems to me like something that can come out of producing data that is either not useful or poorly collected: trying to find meaning in something that isn’t necessarily there, until the search for meaning becomes more important than the search for truth.  I am getting a little philosophical here, but in practice this is a lesson in making sure that I really think about design and data collection before creating lots of new data.

Whatever comes next in my work, school, or career, I have chosen a path that will always require data analysis (someday it may even involve R if I ever can really figure it out, but I digress), so I need to make sure that I think about what kind of data I am creating.  Maybe the data I make isn’t going to be analyzed by me, or maybe someone else will look at my data and do something better with it.  Either way, the real tragedy would be if the data I create turns out not to be useful because I failed to collect it well.  In this way, I hope that maybe a few signals can come out of what I do, in stark contrast to all the noise.


One Response to Making Noise (or hopefully trying not to!)

  1. seanmccanty says:

    I’m glad you mentioned focusing on generating real data rather than contributing to the noise. In my senior year, I took a capstone class for my major which involved analyzing 26 years of migratory bird banding data. It was more like a master class in how not to maintain a data set – the file was hideous, with dates stored as text in every conceivable format (November 8, Nov. 8, 11/8, 11-8, 11/08, etc.). It was a USGS project with only 3 real staff and a group of volunteers who had nothing better to do than to count, weigh, and tag birds at 5:30 in the morning 3 days a week. One of the volunteers had attempted to make sense of the data, but with no real stats background and only slight familiarity with Excel, he had generated around 100 sheets of pivot tables on some measure of when birds were migrating. Without knowing the best biological measure, he just ended up making an already exasperating data set worse.

    I think knowing what data you want, what questions you want to answer, and how you will answer them is crucial in avoiding mistakes like that – the data set was so convoluted it took almost an entire semester to clean it up enough to pull out one variable of interest.

    On the flip side, I did learn that USGS apparently paid people to collect data over 26 years with no apparent desire to see any form of analysis. So there’s always that as a fallback career...
