I’m pleased to announce that the 2018 Retrosheet game log files have been uploaded to the VBP database. These files support game-level analysis across a wide array of data elements, including the following:
…and much more
This data provides the input for the Game Explorer visualizations on this site, which will be updated shortly to include the 2018 season. If you haven’t seen them previously, the Game Explorers allow users to filter across many data attributes to retrieve specific results. Here’s a screenshot:
The next step is to create the 2018 version of the explorers, adding to the existing files covering the 1955-2017 seasons. I’ll post an update as soon as 2018 is available on the site. Thanks for reading!
One of my ongoing visualization projects has been the Batting Explorer, a semantic-based discovery concept built using the Simile Exhibit open source tools. My updates have not been very timely over the last few years, but I have now caught up through the 2015 season. Of course, 2016 stats will be available in the next 2-3 months, so it will soon be time to repeat the process. For the moment, I’ve just added the 2014 & 2015 seasons into one of the decade-based examples, so you can now search for batters across all seasons from 1901 through 2015.
The explorers work in much the same way as many travel sites you’ve visited on the web. Each page can be filtered using a wide array of facets (filters) that allow you to quickly narrow down results by team, season, batting category, and a bunch of other options. I’ll show this in a moment. Let’s start with a basic view of the 2010-15 explorer:
Each of the Batting Explorers has a consistent look & feel, with the underlying data as the only difference. All individual player-season combinations are laid out in a baseball card sort of format, although you can’t flip them over or get any bubble gum either 🙂 Nonetheless, each card contains a wealth of information, including the number of games played by position, laid out on a baseball diamond. In addition, hovering over a card loads a pop-up summary of the season for each individual batter, as seen here:
An additional benefit comes when you click on a selected card. Every batter card has a personalized link to the massive Baseball-Reference.com site. Here’s what you’ll see when clicking on the Juan Uribe link:
I mentioned earlier the ability to filter using a wide range of facets. Here’s a glimpse of the many categorical and numerical options present in each Batting Explorer:
As you can see, there are dozens of possible filters that can be used. If you want to see only batters with more than 40 home runs in a season, simply select the HR facet and check the conveniently provided ranges. Or how about viewing players from a single team? Simple, using the Team facet. Likewise for filtering by season, number of doubles, stolen bases, walks, strikeouts, and so much more. And of course these filters can be used together to quickly find matching results.
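To make the idea of combining facets concrete, here is a minimal sketch in Python. It is not how Exhibit works internally; it only illustrates that each facet acts as a test on one field and that a card is shown only when it passes every active facet. The `BatterSeason` class, the `apply_facets` helper, and the sample rows are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatterSeason:
    """One player-season card (hypothetical, simplified fields)."""
    player: str
    team: str
    season: int
    hr: int
    sb: int


# Made-up sample rows, not real statistics.
CARDS = [
    BatterSeason("Player A", "DET", 2012, 44, 4),
    BatterSeason("Player B", "DET", 2013, 25, 30),
    BatterSeason("Player C", "NYA", 2012, 41, 2),
    BatterSeason("Player D", "NYA", 2014, 12, 45),
]


def apply_facets(cards, **facets):
    """Keep only the cards that satisfy every facet.

    Each keyword names a field, and its value is a predicate on that field.
    """
    return [
        card
        for card in cards
        if all(test(getattr(card, field)) for field, test in facets.items())
    ]


# Team facet and HR facet used together: one team, more than 40 home runs.
matches = apply_facets(CARDS, team=lambda t: t == "DET", hr=lambda n: n > 40)
print([card.player for card in matches])
```

Adding another facet only narrows the result further, which is why stacking filters gets you to a short list so quickly.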
Finally, there are a multitude of sort capabilities, or you can choose to have nothing sorted. If you do wish to choose one or more sort attributes, here are your options:
Your sorts can be many layers deep – just keep adding variables!
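For the curious, layered sorting can be sketched in a few lines of Python. This is an illustration of the general technique, not the code behind the explorers: because Python's sort is stable, sorting by the least significant key first and the most significant key last yields a multi-level ordering. The `multi_sort` helper and the sample rows are hypothetical.

```python
from operator import itemgetter

# Made-up sample rows, not real statistics.
ROWS = [
    {"player": "Player A", "team": "DET", "hr": 44},
    {"player": "Player B", "team": "DET", "hr": 25},
    {"player": "Player C", "team": "NYA", "hr": 41},
    {"player": "Player D", "team": "NYA", "hr": 12},
    {"player": "Player E", "team": "DET", "hr": 30},
]


def multi_sort(rows, keys):
    """Sort rows by several keys.

    keys is a list of (field, descending) pairs, most significant first.
    The stable sort is applied from the last key back to the first.
    """
    result = list(rows)
    for field, descending in reversed(keys):
        result.sort(key=itemgetter(field), reverse=descending)
    return result


# Team ascending, then home runs descending within each team.
ordered = multi_sort(ROWS, [("team", False), ("hr", True)])
print([row["player"] for row in ordered])
```

Adding a third or fourth sort layer is just another pair in the list.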
This has been a very brief overview – to learn more, go to the Portfolio section and begin exploring! While you’re on the site, take some time to view the Game Summary exhibits, set up in much the same fashion using Exhibit. Or, if networks are your thing, check out a large collection of franchise player or team trade networks. Hope you enjoy the site, and thanks for reading.
To say that some of my website visuals were not quite up to date is a massive understatement. The latest version of the Game Summaries covered the 2009 season. The interactive pennant race charts ran through 2011, and the Batting Explorer exhibits ended with the 2009 season. Not exactly current in any of these cases, and other examples abound. So what to do about it?
For starters, the underlying data so generously made available from the Retrosheet folks needed to be updated. Portions of this had been done over the last few years, but a bit haphazardly, as I came to find out over the last few days. Some tables were current through 2011, others through 2012 or 2013. In short, they were consistently inconsistent, and certainly not suited to creating the latest versions of the aforementioned visuals.
One of the best aspects of growing older (at least from a data perspective) is accumulating more and more code that makes it a bit less painful to update or repair database tables. I have managed to create and save dozens of code snippets that help me create, insert, update, select, and otherwise manipulate the data into a proper format for consumption by visualization tools. In some cases, this code made the process surprisingly easy, while other cases required dusting off the cobwebs to understand what my code was doing or not doing. In the end, the process went remarkably swiftly, aided by the periodic Michigan microbrew. The resulting table updates allowed me to tackle the pennant race and game summary projects, and I created 23 new baseball graphics in a 72-hour window.
Et voilà, as the French might say, the Visual-Baseball site now has 18 new pennant race charts (3 years times 6 divisions) while the game summaries have five new entries covering the 2010 through 2014 seasons, and they all work as expected. The pennant race charts are built using D3 and NVD3 code atop JSON data, while the Game Summary exhibits are created using Simile Exhibit, a semantic browsing tool, which also sits on JSON data.
The pennant race charts look like this:
The charts are interactive in several ways – individual teams can be hidden from view, the chart is zoomable, and individual values can be displayed using mouseover capability. You can find the entire portfolio of more than 360 charts here.
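As a rough idea of how game results could be turned into JSON for a line chart of this kind, here is a small Python sketch. Several things in it are assumptions rather than a description of the site's files: the metric (wins minus losses per team), the `build_series` helper, and the sample results are all made up, and the one-series-per-team layout with `key` and `values` fields is the shape that, as I understand it, NVD3 line charts conventionally take.

```python
import json
from collections import defaultdict

# Made-up results as (game_day, winner, loser); not real games.
RESULTS = [
    (1, "DET", "CLE"),
    (2, "DET", "CLE"),
    (3, "CLE", "DET"),
]


def build_series(results):
    """Return one {"key", "values"} series per team.

    Each point records the team's running wins-minus-losses margin
    after the game played on that day.
    """
    margin = defaultdict(int)
    points = defaultdict(list)
    for day, winner, loser in sorted(results):
        margin[winner] += 1
        margin[loser] -= 1
        points[winner].append({"x": day, "y": margin[winner]})
        points[loser].append({"x": day, "y": margin[loser]})
    return [{"key": team, "values": points[team]} for team in sorted(points)]


series = build_series(RESULTS)
print(json.dumps(series, indent=2))
```

The printed JSON has one entry per team, which is what lets a chart hide or show individual teams.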
Game summary exhibits cover 60 seasons and afford users many filtering options to search for games based on specific criteria – teams, pitchers, runs scored, and so on. Results can be viewed in a tabbed fashion or via a timeline. Here’s an example image:
Have fun with both the pennant races and the game summaries, and make sure to check out a few of the other resources in the portfolio section. It feels as though the site is gradually becoming a unique resource for the visual interpretation of baseball data, whether it is in the form of conventional charts or more esoteric views such as the network graphs. Feel free to share any of the information on the site, and tell your friends and colleagues.