Wednesday, September 05, 2012

Lies, Damn Lies, and Orioles Statistics

I don't remember my father reciting too many proverbs when I was growing up, but I recall one of his favorites being the old saw that "There are lies, damn lies, and statistics." 

Statistics, in other words, are not lies, but they can be manipulated in order to create false impressions. Statistics come with the illusion of objectivity, but their meaning is subjective, because they must be placed in a context in order to be interpreted.

I bring this up because the Baltimore Orioles have stayed in the playoff hunt long enough to get my attention. Part of why I haven't paid much attention is that I have been reassured, seemingly every day since the beginning of the season, that their success to date is a statistical anomaly that can't possibly continue. 

One of my firm beliefs is that we tend to accept evidence as evidence more easily when it confirms what we already believe. The Baltimore Orioles have not been in first place this late in the season since 1997. Most of my students were three or four years old at that time. I did not have a Ph.D. and was living in Illinois. I had never heard of Barack Obama or Osama Bin Laden. I did not own a cell phone. Almost all of my computer work was saved on 3 1/2 inch "discs." Brady Anderson was coming off a 50-home-run season, and (yes, this was a long time ago...) nothing seemed particularly suspicious about that sudden increase of power. So, yes, of course, the Orioles would fade. They always do. (By "always" I mean, well, in the last 15 years...)

Read enough baseball articles about the Orioles and you will be assaulted with two statistics, over and over. First, the Orioles have a negative run differential, meaning that over the course of the season, their opponents have scored more runs than they have. That usually means you lose more games than you win, unless, perchance, you win a LOT of close games and lose a lot of blowouts. Oh, surprise, statistic number 2: the Orioles have been very successful in close games, on pace to set a record for the best winning percentage in one-run games. They also lost a couple of games early in the season by ten runs or more.
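To make that arithmetic concrete, here is a minimal sketch in Python. The scores are made up for illustration (they are not actual Orioles results), and the `summarize` helper is my own hypothetical name. It only shows that a winning record and a negative run differential can coexist when the wins are close and the losses are blowouts.

```python
# Hypothetical (runs_for, runs_against) scores for an invented ten-game
# stretch. These are NOT real Orioles results, just an illustration.
games = [
    (3, 2), (4, 3), (2, 1), (5, 4), (1, 0), (6, 5), (3, 2),  # seven one-run wins
    (2, 12), (1, 11), (0, 9),                                 # three blowout losses
]


def summarize(games):
    """Return the record, run differential, and one-run record for a list of games."""
    wins = sum(1 for runs_for, runs_against in games if runs_for > runs_against)
    losses = sum(1 for runs_for, runs_against in games if runs_for < runs_against)
    run_differential = sum(runs_for - runs_against for runs_for, runs_against in games)
    # A one-run game is any game decided by exactly one run.
    one_run_games = [g for g in games if abs(g[0] - g[1]) == 1]
    one_run_wins = sum(1 for runs_for, runs_against in one_run_games if runs_for > runs_against)
    return {
        "wins": wins,
        "losses": losses,
        "run_differential": run_differential,
        "one_run_games": len(one_run_games),
        "one_run_wins": one_run_wins,
    }


summary = summarize(games)
print(summary)
```

In this invented stretch the team goes 7-3 while being outscored by 22 runs, because every win is by a single run and every loss is lopsided. Whether that says more about the record or about the run differential is exactly the question.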

Now the interesting thing to me about how these statistics are used is how they reinforce contrary assumptions. Those who focus on the run differential keep saying, "The record moving forward will reflect what they ARE (which is masked by the anomaly of their past record)...a team that gets outscored." I'm reminded, not for the first time, of Bill Parcells's famous quip that "you are what your record says you are." Rather than believe that the record is indicative of what they are and the run differential an anomaly, pundits assert the opposite. Fine, but why? Isn't the true scientific method to try to use data to understand what is rather than to cherry-pick the statistics that support your theory and explain away the evidence that doesn't?

Conversely, the record in one-run games is considered to be an anomaly...they can't keep it up because history shows that they couldn't actually be what the statistics say they are...it isn't possible that they are that good at one-run games. Eventually they will revert to the mean...

There is clustering in statistics, but reverting to an average assumes that all other things are the same. A baseball season is a long time. Statistically it is very uncommon for a team, any team, to not have at least one five-game losing streak over the course of the season. The New York Yankees have not. Yet in one recent column where the columnist asserts that the Orioles must inevitably move back to average, the Yankees' statistical anomaly is given as evidence that they are really good and why they should hold on to win the division. There is not an assumption that they must, inevitably, return to the average because the columnist begins with an assertion that they are not average and uses the statistic as evidence.

As anyone who has bought stock knows, past performance is no guarantee of future performance. Some of the players who contributed to that run differential are no longer with the team. Why, if we are extrapolating statistics, are we extrapolating the statistics of pitchers who aren't playing? Why not extrapolate statistics since the All-Star break, or over a three-year period?

I once wrote a paper that forced me to research the term "chaos theory." I was surprised to find that the most common form of chaos theory didn't hold that things were random, just that some systems were so complex that they could not be predicted, and anything that cannot be predicted has an appearance of (is indistinguishable from) randomness. We don't like to think we can't predict, and we know that sports are not random, so statistics give us the illusion that we have found a meaningful pattern. Usually we have, but that pattern is part of a complex system that makes using it to predict the future a much more iffy proposition.

None of this is to say the Orioles are a lock to go to the playoffs. Plenty of teams not named "Mets" have blown bigger leads than 1.5 games with 27 to play. Just that if they do fail to make the playoffs, all the statisticians will crow and say, "I knew it," and if they do make the playoffs the same statisticians will not admit that stats are meaningless; they will simply look harder for other statistics to (seemingly) explain the fact that they won games they were not expected to win.

In a famous essay Stanley Fish opined that when people said of literature that a text "can't" mean something, that's just a lazy way of saying that nobody has constructed a convincing argument for that interpretation. Someone might claim that William Faulkner's "A Rose For Emily" can't sustain an interpretation that Faulkner believed he was a reincarnated Eskimo, but let a scholar find an authenticated letter demonstrating that Faulkner believed he was an Eskimo, and you better believe that scholars would all of a sudden find things in the text that appeared to be consistent with what they (now) knew to be true. Let the Orioles make the playoffs and see how many arguments will be made that they "can't" and how convincingly the statistics are used to support that belief. Sadly, we don't learn to look at statistics with a modicum of suspicion; we don't learn. Let the Orioles make the playoffs and we will comb over the same data and pull out those examples that now point to what we believe to be true...

Sunday, September 02, 2012

It's been a while...

I haven't posted anything in All Things Ken in over two years. Why?

The majority of my posts are/were film related, but I've created a separate online magazine, 1More Film Blog (http://1morefilmblog.com) to deal with film stuff.

Also, Facebook continues to proliferate, but as any Facebook fan knows, the wall there is hard(er) to tag and search, so it occurs to me to go ahead and use the Blogger tool for public posts that I want to remain searchable. Stuff I can archive and find more readily. So I'll be posting stuff a bit more here, maybe...