Vulgar Statistics Bonus: Behind the Scenes
Two days ago a commenter asked if I compiled the data for a specific entry manually. It’s a good question, and the issue is one that has bugged me ever since I started writing this column.
To put it simply, yes, I compile all the data for every article manually. I would guess that I have over a hundred spreadsheets organized on my computer. Sometimes it’s pretty easy, a simple matter of a few copy-paste commands. Sometimes I need to open up 50+ tabs’ worth of box scores, go through them one by one, transfer them to an Excel sheet, and then check my work to see if my eyes goofed. The most “automated” I get is that copy-pasting, or writing my own formulas and applying them to entire rows and columns. This has always bugged me because it means that a great deal of time goes into each entry. I love looking at statistics, so the time in itself doesn’t bother me so much, but the fact that it often limits what I can bring bothers me quite a bit.
I would love to have access to the raw data files that ESPN.com and other sites use to populate their pages of standings and stats (files which I believe come from the Elias Sports Bureau), for several reasons. For one, the raw data files are not only much more descriptive than what we see online (e.g. ESPN.com’s dates for games don’t include the year), but they’re also made to be sorted a bunch of different ways. One of the issues I often have is that just copy-pasting data creates a formatting nightmare. ESPN.com, NHL.com, Yahoo, and Hockeydb all have different issues that make things problematic.
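To make the missing-year problem concrete, here is a minimal Python sketch of one cleanup step for pasted dates. It is only an illustration under stated assumptions: the function name `add_season_year`, the "Oct 12" style of input, and the August rollover month are all hypothetical and do not reflect the actual format of any site mentioned above.

```python
from datetime import date, datetime


def add_season_year(month_day: str, season_start_year: int,
                    rollover_month: int = 8) -> date:
    """Attach a year to a pasted date such as 'Oct 12' that lacks one.

    Assumption: a season starts in the fall of season_start_year and ends
    the following spring, so months on or after rollover_month belong to
    the start year and earlier months belong to the next year.
    """
    # Parse against a placeholder leap year so that 'Feb 29' is accepted.
    # Note that %b relies on English month abbreviations in the default locale.
    parsed = datetime.strptime(f"{month_day.strip()} 2000", "%b %d %Y")
    if parsed.month >= rollover_month:
        year = season_start_year
    else:
        year = season_start_year + 1
    # Raises ValueError if the day does not exist in the resolved year.
    return date(year, parsed.month, parsed.day)
```

Calling `add_season_year("Jan 3", 2009)` would place the game in January 2010, while `add_season_year("Oct 12", 2009)` stays in 2009. Each site would likely need its own parsing format, which is part of why raw data files would be so much easier to work with.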
I’m not sure how one goes about obtaining those data files or how freely they’re released, but I’m going to try to do some digging to find out. If anyone knows anything about how I might do this, don’t hesitate to let me know by e-mailing me at email@example.com.