Baseball Grafika Book: Excel Dashboards

Baseball stats are ideally suited for display using a wide variety of charts, network graphs, and other visualization approaches. This is true whether we are using spreadsheet tools such as Microsoft Excel or OpenOffice Calc, data mining tools like Orange, RapidMiner or R, network analysis software such as Gephi and Cytoscape, or web-based visualization tools like D3 or Tableau Public. The sheer scope and variety of available baseball statistics can be brought to life using any one of these or countless other tools.

This is why I felt the need to create a book merging the rich statistical and historical data in the baseball archive with the advanced analytic and visual capabilities of the aforementioned tools. With a bit of good luck and perseverance, the book will be published in late April, coinciding with the early stages of the 2016 baseball season, under the title Baseball Grafika. This series of articles will share a few pieces from the book, which is still undergoing additions and revisions at this stage. I hope these will help provide some insight into how I view the possibilities for visualization, and perhaps generate your interest for how other datasets could benefit from a similar approach.

One of the chapters of the book deals with the creation of dashboards in Excel that allow us to distill large datasets into a single page summarizing information. Here’s an example of a single pennant race, and how it’s unique story can be told using an array of charts, tables, and graphics.

AL_1967_5

Now that we’ve seen an entire dashboard, we’ll look at the component pieces and how they were built. As a reminder, this is all created in Excel, which is often maligned as a visualization tool. Used well, Excel can produce highly effective visualizations, although deploying them to the web is not practical. In the book, I walk through how to create this dashboard using Excel, taking readers through all the steps needed to create formulas, charts, text summaries, and more.

Creating flexible, powerful data displays in Excel frequently involves the use of pivot tables and slicers (filters) that allow for data manipulation. Building charts on top of these tools permits maximum flexibility. Done effectively, this means we can create a template that can be used over and over, with only the source data changing according to our slicer selections. Here’s an example pivot table with slicer options:

team_pivot

The slicer selections allow us to choose the data elements from our base dataset that are to be displayed in a pivot table. From there, name ranges and formulas can be used to select the data programatically, and feed it into charts that are not dependent on any additional manual intervention. One chart, used over and over, makes it simple to display new data with a single click of a slicer button.

Name ranges can be used extensively to automate the dashboard to a high degree, using native Excel functionality. Here’s a screenshot showing a name being defined in Excel:

Excel_name_range

A virtually unlimited number of name ranges can be created, and then used as references in Excel cell formulas, making it easy to populate cells, tables, or charts with updated information.

Each of the following sections of the final dashboard are populated using one or more name ranges based on pivot table data in most cases. All that is required in the dashboard is a simple formula to grab the right data based on the slicer selection.

First, we create a basic text summary recapping each season, which is then pulled into the top section of the dashboard:

AL_1967_1

This is then followed by the pennant race section of the dashboard, including both the pennant race charts as well as a table of season-ending standings information. One pivot table and its references populate the chart, while a second pivot is used to provide the table data, with cell-level formulas performing calculations.

AL_1967_2

Our third section makes use of the wonderful Sparklines for Excel add-in. Our dashboard benefits from the use of horizon and variance charts, as well as box plots. In between, we’re able to add some additional Excel cell calculations to display metric values.

AL_1967_3

The final section of the dashboard takes advantage of some cell formulas to create dotplots displaying relative values within a category. This allows readers to see who was higher or lower in a specific measure, maximizing space along the way, which is often critical when building dashboards.

AL_1967_4

The book will provide much more, including tutorials on creating this type of dashboard, in addition to other visual displays of baseball information. Ultimately, the goal is to share some of my approaches and hope that they drive others to create their own unique approaches, all in the interest of advancing the discipline of baseball data visualization.

Future posts will examine other ways we can explore our baseball data. Text mining, statistical distributions, interactive charts, historical maps, and network graphs will be among our future topics. See you soon.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather

Tableau Public Baseball Pilot Complete

A few weeks back I posted about using Tableau Public to explore baseball stats, specifically with respect to building dashboards to display information. At the time I had created a very rough first pass at the dashboard, with lots of small multiple tables displaying info on a single page. I quickly realized it was a bit overwhelming, so I sought a better solution.

The new version is an improvement, albeit imperfect due to some Tableau limitations. Users can now select a single chart to display in a large, single window using radio buttons. Really easy, and there are a lot of additional filters available on each page to customize the chart to display the results you want. All of this comes with one catch, however. Even when a given chart is not being displayed, it still uses up a handful of pixels on the vertical screen. This forces some charts to display a bit lower on the screen than others, a minor annoyance I spent some time trying unsuccessfully to solve. That’s why you’ll find the offensive categories split into two tabs rather than one.

All of the offensive category charts are displayed using dot plots with dotted lines and small filled circles as the display devices. This proves to be an effective, low-ink manner that compares favorably to bar charts in this case. Dot plots allow for an axis range that doesn’t need to start at 0, which proper bar charts always should (it has to do with the visual perception of the bar size relative to other bars) do. There are many horrendous examples on the web, particularly those produced by major media outlets that distort data either accidentally or intentionally. So dot plots give us an edge for this sort of display.

The third tab displays distributions of various offensive statistics using scatter plots, which again benefit from their ability to use flexible axis ranges based on the displayed values. Once again, there are many filters that let you play to your heart’s content, using team, position, season, and so on to reduce your dataset and answer questions quickly.

Here’s the updated version, to be followed at some point by a full 1871-2014 dashboard:

Direct link is found here.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather

MLB Batter Dashboard in Tableau Public

Tableau has revolutionized visual analysis for many users by providing a tool that makes it easy to create exceptional visualizations without the need to write code. For some, Tableau Desktop has been a godsend to rescue them from the challenges of creating meaningful charts using Excel, Cognos, SAS, or any number of other tools. For others, Tableau Public has provided an opportunity to enter the world of visualization. In my case, I use one at work and the other for my side projects, one of which I’ll introduce here.

I’ve long worked with major league baseball data provided by either Retrosheet, Baseball-Databank, or Sean Lahman, and thought Tableau Public could help me to create some fascinating dashboards for users to navigate. Baseball visualization is a relatively untapped area, and one where I expect to spend more time in 2015. In the meantime, I have a prototype to share via the Tableau Public site, as seen here:

The full dashboard can be found here:

https://public.tableausoftware.com/views/MLBBatters1980sDashboard-WIP/RankingsDashboard?:showVizHome=no#1

This dashboard allows you to filter by team, batter type (left or right handed or switch hitter), league, season, or age (as of July 1st each season). In addition, filters can be set based on the number of games played at a given position. Multiple filters can be combined to provide a variety of results across 15 offensive categories. Have fun with it, and let me know what you think.

More will be coming – some of it can be seen currently on Tableau Public, with more to follow. I’m also planning to expand the data beyond the 1980s, so we can see patterns from more than 100 years of data. Stay tuned.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather