Batting Explorer Update

Almost done with the template for my series of Batting Explorers, where users can manipulate more than 30 filters to explore batter data by team, season, position, and much more. Results can also be sorted by more than 20 variables, making it a flexible tool for quickly locating information.

The latest addition to the template is the ability to click on a batter's card to view his full stats at the great Baseball-Reference site.

Once this one is complete, it will be quite simple to create versions for other decades (note – the original is the 1980s) going all the way back to 1900 – maybe even earlier.

 

 


Batting Explorer Interactive Visualizations

I'm currently at work on the first Batting Explorer, an interactive tool where viewers can filter season-level batting stats using more than 25 different filters. This project is being built using the great Exhibit tool, currently housed at simile-widgets.org. Within the Collections page on this site, you can find more than 50 game-level explorers built using Exhibit; it is a pretty remarkable tool to work with, and I see more future use for it on this site.

It appears that there will be a batting explorer for each decade, as too much information slows the response time, and thus defeats the beauty of using JavaScript rather than a database to feed the pages. Here's a look:

Batting Explorer

Obviously, once the batting series is complete, I'll feel the urge to do a pitching series along the same lines, followed by whatever seems compelling at some future date. 

As soon as the Batting Explorer is launched, I'll try to follow with a video tutorial to illustrate all the possibilities, but it is also easy to just jump in and start playing with all the filters to get a sense for the power that Exhibit provides.

 


Coming Soon – Eyeo

For those who don't know, eyeo is a visualization festival that takes place in Minneapolis this June. According to all reports, last year's inaugural festival was an amazing success, leading to this year's rendition selling out in about 6 hours!

eyeo

The festival lineup is off the charts for anyone in the visualization space, with notable visionaries from around the globe presenting their slice of viz – whether it comes in the form of traditional data visualizations, or interactive art, music, or some combination of each. The attendees list is also quite impressive, based on the names I've seen thus far.

 

I'm looking at this as an opportunity to learn, share, and make contacts in the visualization space. The event schedule seems geared to bringing people together both during and beyond the official events, and with folks coming from all over the globe, it figures to be a very exciting event.

Unfortunately, I have to wait until June, but that gives me plenty of time to plan and to learn more from some of the great talents in the field.

 



Information is Beautiful Movie Explorer

OK, so this is not a baseball post, but it does involve visualization, and it employs the great Exhibit toolkit from the MIT Simile project. The Information is Beautiful website, led by David McCandless, hosts monthly visualization challenges for data and visual geeks like myself. Trust me, there are some exceptional folks out there turning seemingly mundane information into incredible visuals.

Anyhow, the data provided for the current challenge is movie-based, with loads of information on movies released between 2007 and 2011 – genre, story, profitability, budget, audience score, and much more. Here's a screenshot of what I did with the info:

Movie Explorer

Entrants can use any part of the info, or all of it. Given that my entry is competing in the interactive division, I chose to show most, if not all, of the information by building a 'Movie Explorer' using Exhibit. Take a test drive here – personally, I find it a bit addictive and fun to play with, but maybe that's just me speaking as a proud parent. Caution – it seems to work best in recent versions of Chrome and Firefox; my version of IE has problems with the scripts.

 


d3 Drillable Bar Charts

A couple of weeks back I began tinkering with interactive bar charts using the d3 JavaScript charting toolkit. d3 stands for Data-Driven Documents, and is the latest from the great Mike Bostock, former developer of Protovis, another great charting tool.

Anyhow, one of the samples on the d3 site used a drillable bar chart, where users can start at an aggregate level and then dig into the details by clicking on a given bar – a simple concept that is beautiful when d3 is the platform.

This has led me to my first chart, with more to come. The initial example looks at home runs by team for the 1995-2010 seasons, and begins at the season level. Once a season is selected, we drill into homers at the team level, and then at the individual batter level. It's fast, intuitive, and a heck of a lot more esthetically pleasing than your typical report output from any one of dozens of high-powered reporting tools.
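For anyone curious about what happens to the data behind a drill-down like this, here is a minimal Python sketch of the aggregation step only (not the d3 chart itself, and not the code behind my chart). The rows, team codes, batter names, and the `drill` function are all hypothetical, made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (season, team, batter, home_runs).
# The values are illustrative only, not real statistics.
ROWS = [
    (2009, "NYA", "Batter A", 30),
    (2009, "NYA", "Batter B", 20),
    (2009, "BOS", "Batter C", 25),
    (2010, "NYA", "Batter A", 28),
    (2010, "TOR", "Batter D", 40),
]


def drill(rows, path=()):
    """Sum home runs at the level just below `path`.

    path=()             -> totals by season
    path=(season,)      -> totals by team within that season
    path=(season, team) -> totals by batter within that team and season
    """
    depth = len(path)
    if depth > 2:
        raise ValueError("path can hold at most a season and a team")
    totals = defaultdict(int)
    for row in rows:
        # Keep only rows that match every level already selected.
        if row[:depth] == path:
            totals[row[depth]] += row[3]
    return dict(totals)


if __name__ == "__main__":
    print(drill(ROWS))                  # season level
    print(drill(ROWS, (2009,)))         # teams in 2009
    print(drill(ROWS, (2009, "NYA")))   # batters on one team in 2009
```

Each click on a bar simply adds one more element to the path, so the same function serves all three levels of the chart.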

d3 Bar Chart

To see it in action, click here

 


Using Scatterplots to Analyze Team Success

I've previously posted on Orange, the wonderful open source project for statistical analysis and visualization. My latest foray involves using a variety of charts to examine success patterns by team, as measured by wins in a season. We'll talk first about scatterplots.

Scatterplots, for the uninitiated, are basically two-dimensional charts that allow users to see the relationship between two elements – say, hits on one axis and runs on the other. They can be a great tool for quickly spotting correlations between elements (e.g., are more hits consistently associated with more runs over the course of a season?). In Orange, we have the luxury of adding third and fourth elements to the picture, using the color of the markers as well as the size of each marker. Here's an example, using BB and HR on the axes, with wins for the size and runs for the color of the markers.

 
Note that the sizes of the circles are not proportional to the number of wins, but rather scale according to the range of the data. This is helpful in spotting patterns, but could be considered a distortion if one adheres closely to Edward Tufte's advice.
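To make the difference concrete, here is a small Python sketch comparing range-based scaling with strictly proportional scaling. I don't know exactly how Orange computes marker sizes, so the function names, the size limits of 4 and 20, and the linear mapping are all assumptions for illustration. The win totals are the 57-win Pirates and 92-win Giants.

```python
def range_scaled_sizes(values, min_size=4.0, max_size=20.0):
    """Map the smallest value to min_size and the largest to max_size.

    Assumed linear min-max scaling; the size limits are arbitrary.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so every marker gets the same mid size.
        return [(min_size + max_size) / 2 for _ in values]
    span = max_size - min_size
    return [min_size + (v - lo) / (hi - lo) * span for v in values]


def proportional_sizes(values, max_size=20.0):
    """Size each marker in direct proportion to its value."""
    hi = max(values)
    return [max_size * v / hi for v in values]


if __name__ == "__main__":
    wins = [57, 92]
    print(range_scaled_sizes(wins))   # smallest and largest marker sizes
    print(proportional_sizes(wins))   # sizes that keep the true ratio
```

With range-based scaling the 92-win marker is five times the size of the 57-win marker, even though the team won only about 1.6 times as many games. That exaggeration is what makes patterns easy to spot, and also what Tufte would object to.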

The scatterplot enables us to spot some easy correlations; notice that virtually all of the teams with high walk totals (x-axis) are also big winners – the Yankees, Rays, Braves, and Red Sox were all at 89 or more wins for the season. So apparently, with the exception of the Diamondbacks, walks were a good predictor of wins (we're not accounting for the pitching side of the equation here, which derailed the Diamondbacks).

 
On the y-axis, home runs also appear to have a positive relationship to wins, although perhaps a slightly weaker one than walks. And woe to any team with low positions on both axes – this is where the worst teams in baseball wound up in 2010, including the 57-win Pirates, the 61-win Mariners, and the 66-win Orioles. A lack of on-base ability, coupled with a lack of HR power, seems to be a strong predictor of failure, even without knowing anything about pitching for these teams.
 
Let's change the axes for another view; we'll focus on defense in this case, putting errors on the x-axis and double plays on the y-axis.
 
 
Notice any patterns? Our underachieving teams are heavily skewed to the lower right of the chart, committing lots of errors with few double plays. The pattern is perhaps less obvious than in our first example, but it is intuitive. We also see a pair of teams, the Tigers and Braves, who appear to have overcome some of their proneness to errors by turning a lot of double plays. At the other extreme, we see the Giants, with very few double plays but also the fewest errors in MLB, reaching 92 wins in spite of their low run scoring ability. Obviously, pitching helped, but it appears their defense provided relatively error-free support to their mound corps.
 
Finally, we'll take one more view – home runs (HR) versus home runs allowed (HRA). 
 
 
Once again, there appears to be a fairly clear distinction, with the lower half of the display littered with the weaker teams that tended to surrender more homers than they hit themselves. Teams with a large positive home run differential (upper left of the chart) tended to have high win totals, although there were a few teams (Phillies, Rangers, Rays) with low differentials who still fared well, due to their strong pitching corps.
 
These are just a few examples created with Orange; I'll get these and many others out to the Collections page on the site as time permits. In summary, scatterplots, regardless of the user tool, are a great way to quickly view patterns in the underlying data, and their relationship to a dependent variable such as wins.
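If you want a number to go with the eyeball test, a correlation coefficient is the usual companion to a scatterplot. Below is a minimal Python sketch of the Pearson correlation; it is not something Orange produced for the charts above, the `pearson_r` name is my own, and the sample figures are hypothetical placeholders rather than real team totals.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical walk totals and win totals for five teams.
    walks = [450, 500, 550, 600, 650]
    wins = [65, 72, 80, 88, 95]
    print(round(pearson_r(walks, wins), 3))
```

A value near 1 means the two stats rise together, a value near -1 means one falls as the other rises, and a value near 0 means the scatterplot would look like a shapeless cloud.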

 


A More Thorough Look at d3

In this post, I want to take a slightly deeper look at d3, the JavaScript-based charting library from Mike Bostock of Protovis fame. For those who like to view information that is intuitively, accurately, and esthetically well presented, d3 offers a marvelous range of possibilities.

I previously wrote about Calendar Views, a supremely intuitive means of viewing information that can be measured at a daily level; today I'll look at a host of other approaches that I find most compelling for their potential to work with the baseball data that is my focus. Each chart type will feature a screenshot from the d3 site, as I haven't applied any of these approaches to the baseball data just yet.

OK, time to look at a few more chart types, with my brief synopsis of each.

Streamgraphs are an esthetically pleasing way to show information that might typically be shared via an area chart. However, as with all the d3 charts, there is far more flexibility available to the chart creator, albeit with some added complexity. The results, IMHO, are worth the trouble, compared to the traditional lifeless output we see from Excel or, worse yet, PowerPoint charts. I have created thousands of Excel charts, and employed most of the available tricks and hacks, but there are still significant limitations to what one can do; as for PowerPoint, forget about it as a useful charting tool.

Here's an example from the d3 site:

Not quite what we're used to seeing from Excel, is it? While the eye candy aspect is pleasing, I refuse to use charts only on that basis. Lord knows there are plenty of charts out there that have instant eye-catching appeal, but they are the equivalent of Britney Spears versus the John Coltranes or Mozarts one can create with d3. All fluff, no substance, and destined to be banished to the scrap heap with all the other inaccurate, data-distorting charts that preceded them.

 

As for potential uses with baseball data, I envision some historical tracking of hits data (singles, doubles, triples, home runs) for starters, and will doubtlessly find other relevant examples to share in the future.

Another chart type I intend to use is the Scatterplot Matrix. While this chart type is certainly not exclusive to d3, the ability to thoroughly customize the output makes its use extra appealing. This chart type fits into the category known as "small multiples" coined by the legendary Edward Tufte, and provides the user with a considerable wealth of information in a limited space, while simultaneously making the information easy to grasp.

I haven't figured out how to use these with baseball stats just yet – certainly there could be a multitude of ways to do so. Perhaps looking at multiple offensive categories from a batter's career would be one application; the individual data points could represent specific seasons or age levels associated with each category.

In any event, the scatterplot matrix, as well as other types of small multiples, are exceptional at providing large volumes of information in a limited space, making it easy for the observer to detect patterns and relationships – and isn't that the goal of most data display?

The Sunburst is part of a category of displays available in d3 that share the common goal of showing relationships within a data set. In the case of the Sunburst, the data is presented in a more hierarchical visual form (versus a treemap, for example), while providing immediate insights into the interrelationship between entities within the data.

Once again, the d3 esthetics are most impressive, but the key is that the underlying data is not distorted or misrepresented by the beautiful display. We are quickly able to see which sub-elements flow up to a primary element, as well as which of the larger categories dominate the data, and thus the display.

I am certain that this display could apply to many different baseball stats; my job is to find where it makes the most sense from a viewer's perspective.

Lastly, for this post, is the bullet chart, created by visualization guru Stephen Few, and since added to a number of graphing toolkits and libraries. I have used these for several years within Excel, and was pleased to see their availability in d3, for they provide a compact, intuitive look at target-based data.

By target-based, I mean that we can set a target level – for example, a .900 OPS – and see how specific players measure up to that goal. Bullet charts provide the ability to show the target, actual performance, and projected performance, all framed by relative performance levels (e.g., poor, average, good). Here's the d3 example:

Many baseball applications exist for the bullet chart – we could create a pitcher dashboard showing how a single pitcher compares to his career and league averages across a variety of statistical categories – WHIP, ERA, SO/9, etc.

The same would apply for batters, where we could rate them versus their career numbers, league averages, and so on, using batting average, OPS, OBP, and any number of other stats.
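To show what sits behind a single bullet chart row, here is a rough Python sketch that compares an actual value to a target and places it in a performance band. The .900 OPS target comes from the example above; the band cutoffs, the player's .850 OPS, and the `bullet_summary` function are hypothetical choices of mine, not part of d3 or of Few's specification.

```python
def bullet_summary(actual, target, bands):
    """Describe one bullet chart row.

    bands is a list of (label, upper_bound) pairs sorted by upper_bound;
    a value belongs to the first band whose upper bound it falls below.
    """
    label = bands[-1][0]  # default to the top band
    for name, upper in bands:
        if actual < upper:
            label = name
            break
    return {
        "band": label,
        "meets_target": actual >= target,
        "gap": round(actual - target, 3),
    }


# Hypothetical OPS bands; the cutoffs are illustrative, not a standard.
OPS_BANDS = [("poor", 0.700), ("average", 0.800), ("good", float("inf"))]

if __name__ == "__main__":
    # A hypothetical batter with an .850 OPS measured against a .900 target.
    print(bullet_summary(0.850, 0.900, OPS_BANDS))
```

A dashboard would compute one such summary per stat, then draw the bands as the shaded background, the actual value as the bar, and the target as the marker line.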

Once again, we have the potential for a "small multiples" type of display, where all the information the viewer needs is provided in a single page or screen view.

Well, that's it for now, but I'm certain to blog on other chart types within d3, as there are many more. Give it a look for yourself, even if just to understand the possibilities – d3 examples

 


d3 – Another Great Javascript Charting Tool

d3 is a relatively new JavaScript charting tool from Mike Bostock, one of the creators of the outstanding Protovis project, which is no longer under active development. But d3 has picked up the mantle, and made a number of significant improvements over Protovis, in my estimation.

Foremost among the improvements for me is the introduction of several new chart libraries, one of which I want to focus on here.

The Calendar View provides an ingenious way to show datasets that would typically be displayed in a line chart, due to the sheer volume of data points. The d3 calendar view helps us display daily data by year, month, and day, while simultaneously displaying results in a weekly grid. In addition, the color coding provides a quick and intuitive way to see patterns in the data. Here's an example, from the d3 site:

http://mbostock.github.com/d3/ex/calendar.html

The d3 example is built using stock market data, but the same approach can be used for any daily results – in a baseball context, this could be runs scored by game, winning (or losing) margin in each game, or even the number of pitches thrown per game by the starting pitcher.
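For a sense of how daily results land on a calendar grid, here is a minimal Python sketch of the placement logic only. It assumes weeks run Sunday through Saturday, with one column per week of the year and one row per day of the week; the function names and the runs-per-game values are hypothetical, and none of this is taken from the d3 source.

```python
from collections import defaultdict
from datetime import date


def calendar_cell(day):
    """Return (week_column, weekday_row) for a date.

    Assumes Sunday-first weeks: row 0 is Sunday and row 6 is Saturday.
    Days before the first Sunday of the year fall in column 0.
    """
    week = int(day.strftime("%U"))      # week of year, Sunday as first day
    weekday = (day.weekday() + 1) % 7   # shift Monday=0 to Sunday=0
    return week, weekday


def build_grid(daily_values):
    """Group {date: value} into {year: {(week, weekday): value}}."""
    grid = defaultdict(dict)
    for day, value in daily_values.items():
        grid[day.year][calendar_cell(day)] = value
    return dict(grid)


# Hypothetical runs scored per game; the values are illustrative only.
RUNS_BY_GAME = {
    date(2010, 4, 4): 7,
    date(2010, 4, 6): 4,
    date(2010, 4, 7): 1,
}

if __name__ == "__main__":
    for cell, runs in sorted(build_grid(RUNS_BY_GAME)[2010].items()):
        print(cell, runs)
```

Once every game has a cell, the color coding is just a matter of mapping each value to a shade, which is where the patterns start to jump out.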

I'll look at some of the other d3 libraries soon, and begin incorporating them into the Collections page on the site.

 
