As I clicked around the web looking for data and predictions about the World Series currently underway, I found some interesting angles on analyzing data that I had not previously thought about.
An article on fivethirtyeight.com (link below) discusses how predicting baseball seasons is an “imperfect pursuit”. This was an interesting concept to me, because the whole reason new predictive models are rolled out is to produce predictions that are closer to what the truth will eventually be. The point the article makes is that there is a statistical limit to how good a model can be. At some point, at least in baseball, there is a certain amount of randomness or luck involved that simply cannot be predicted. The sabermetrician Tom Tango determined that in baseball, one third of the difference between two teams' records is the result of this random chance. Another way of looking at this is to say that the smallest possible root-mean-squared error, a measure of how accurate a prediction is, is 6.4 wins. This means that no matter how good, how perfect, a prediction model is, it will always have a built-in level of error of 6.4 wins. Modern models can sometimes get within this range when predicting a team's record, but that is random chance, not the result of a model that has beaten the system.
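To convince myself that a floor like this is plausible, I put together a rough sketch. It rests on an assumption of mine, not something taken from the article: every game in a 162-game season is treated as an independent coin flip, so the only thing separating a team's actual wins from its expected wins is luck. The function names (`luck_floor`, `simulated_rmse`) are my own. Under that simplification, the spread of wins is the binomial standard deviation, which comes out close to the 6.4 wins quoted above.

```python
import math
import random

GAMES = 162  # length of an MLB regular season


def luck_floor(games: int = GAMES, win_prob: float = 0.5) -> float:
    """Standard deviation of wins if every game is an independent coin flip.

    Assumption: games are independent with a fixed win probability.
    """
    return math.sqrt(games * win_prob * (1 - win_prob))


def simulated_rmse(seasons: int = 5000, games: int = GAMES,
                   win_prob: float = 0.5, seed: int = 0) -> float:
    """RMSE of a 'perfect' forecast that always predicts the expected wins.

    The forecast knows the true win probability, so every bit of
    remaining error comes from luck alone.
    """
    rng = random.Random(seed)
    expected_wins = games * win_prob
    squared_error = 0.0
    for _ in range(seasons):
        # Play one season, one coin flip per game.
        wins = sum(rng.random() < win_prob for _ in range(games))
        squared_error += (wins - expected_wins) ** 2
    return math.sqrt(squared_error / seasons)


if __name__ == "__main__":
    print(f"theoretical floor: {luck_floor():.2f} wins")
    print(f"simulated RMSE:    {simulated_rmse():.2f} wins")
```

Even a forecast that knows each team's true strength exactly still misses by roughly six wins per season in this toy setup, which is the sense in which no model can do better on average.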
I find this concept interesting because it is virtually never discussed. At the beginning of a new season, fans and sports bettors search for predictions about what the end result will be. Alongside them are the people trying to profit from this phenomenon and the people trying to beat the current systems by building new ones of their own. It might sound obvious given the amount of variability, but the models are not, and never will be, completely perfect. This idea can be extrapolated to anything with variables that we do not completely understand, from political elections to sporting events to earthquakes. While we continue to strive to predict the future, perfect prediction will never happen as long as there are factors that are not understood.