Baseball Grafika Book: Excel Dashboards

Baseball stats are ideally suited for display using a wide variety of charts, network graphs, and other visualization approaches. This is true whether we are using spreadsheet tools such as Microsoft Excel or OpenOffice Calc, data mining tools like Orange, RapidMiner or R, network analysis software such as Gephi and Cytoscape, or web-based visualization tools like D3 or Tableau Public. The sheer scope and variety of available baseball statistics can be brought to life using any one of these or countless other tools.

This is why I felt the need to create a book merging the rich statistical and historical data in the baseball archive with the advanced analytic and visual capabilities of the aforementioned tools. With a bit of good luck and perseverance, the book will be published in late April, coinciding with the early stages of the 2016 baseball season, under the title Baseball Grafika. This series of articles will share a few pieces from the book, which is still undergoing additions and revisions at this stage. I hope these will help provide some insight into how I view the possibilities for visualization, and perhaps generate your interest for how other datasets could benefit from a similar approach.

One of the chapters of the book deals with the creation of dashboards in Excel that allow us to distill large datasets into a single page summarizing information. Here’s an example of a single pennant race, and how it’s unique story can be told using an array of charts, tables, and graphics.

AL_1967_5

Now that we’ve seen an entire dashboard, we’ll look at the component pieces and how they were built. As a reminder, this is all created in Excel, which is often maligned as a visualization tool. Used well, Excel can produce highly effective visualizations, although deploying them to the web is not practical. In the book, I walk through how to create this dashboard using Excel, taking readers through all the steps needed to create formulas, charts, text summaries, and more.

Creating flexible, powerful data displays in Excel frequently involves the use of pivot tables and slicers (filters) that allow for data manipulation. Building charts on top of these tools permits maximum flexibility. Done effectively, this means we can create a template that can be used over and over, with only the source data changing according to our slicer selections. Here’s an example pivot table with slicer options:

team_pivot

The slicer selections allow us to choose the data elements from our base dataset that are to be displayed in a pivot table. From there, name ranges and formulas can be used to select the data programatically, and feed it into charts that are not dependent on any additional manual intervention. One chart, used over and over, makes it simple to display new data with a single click of a slicer button.

Name ranges can be used extensively to automate the dashboard to a high degree, using native Excel functionality. Here’s a screenshot showing a name being defined in Excel:

Excel_name_range

A virtually unlimited number of name ranges can be created, and then used as references in Excel cell formulas, making it easy to populate cells, tables, or charts with updated information.

Each of the following sections of the final dashboard are populated using one or more name ranges based on pivot table data in most cases. All that is required in the dashboard is a simple formula to grab the right data based on the slicer selection.

First, we create a basic text summary recapping each season, which is then pulled into the top section of the dashboard:

AL_1967_1

This is then followed by the pennant race section of the dashboard, including both the pennant race charts as well as a table of season-ending standings information. One pivot table and its references populate the chart, while a second pivot is used to provide the table data, with cell-level formulas performing calculations.

AL_1967_2

Our third section makes use of the wonderful Sparklines for Excel add-in. Our dashboard benefits from the use of horizon and variance charts, as well as box plots. In between, we’re able to add some additional Excel cell calculations to display metric values.

AL_1967_3

The final section of the dashboard takes advantage of some cell formulas to create dotplots displaying relative values within a category. This allows readers to see who was higher or lower in a specific measure, maximizing space along the way, which is often critical when building dashboards.

AL_1967_4

The book will provide much more, including tutorials on creating this type of dashboard, in addition to other visual displays of baseball information. Ultimately, the goal is to share some of my approaches and hope that they drive others to create their own unique approaches, all in the interest of advancing the discipline of baseball data visualization.

Future posts will examine other ways we can explore our baseball data. Text mining, statistical distributions, interactive charts, historical maps, and network graphs will be among our future topics. See you soon.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather

Anatomy of a Book

As I wait for the review process for my book to complete so I can queue up the Kindle version, I thought it would be a good time to share some of the philosophy behind the book, while taking a further look at the rationale for some of the chart selections. I’ll also start with why Microsoft Excel was my primary tool for creating the charts (wait – aren’t you an open source champion?) and how it helped make the book a reality.

For those of you new to the subject, my book is titled MLB Pennant Races, 1901-1968: A Visual Analysis of Baseball’s Pennant Races and endeavors to put a new, highly visual spin on an old topic. I see on a daily basis how much of an impact data visualization is having, and noticed that baseball visualization has not kept pace. So it became clear to me that a book (or books) was needed that could help close this gap, and turn a wealth of data into meaningful graphics. I knew I could do this, but what would be the best tool to actually create a book? Could it really be Excel? Absolutely.

For those of you who don’t use Excel on a regular basis (my day job calls for multiple hours a day in Excel), it really is a powerful tool for all kinds of analysis, and yes, even charting. Now here’s the rub – Excel’s default charting selections aren’t so good (albeit much improved in Excel 2013), and in fact can be absolutely grotesque on occasion, particularly with respect to improper scaling for bar charts. However, with a few well practiced tweaks combined with lessons learned from Excel gurus, I can do darn near anything with Excel charts. As a frequent user of Tableau, not to mention open source beauties like Protovis and D3, I still find Excel to provide a great combination of data management coupled with charting capabilities.

Now if I were going to create a single chart, or even a small set of charts, Excel might not be my first choice. However, when the need is for 136 identical dashboard pages composed of multiple charts, where only the data is changing, then Excel is tough to beat. The trick is to use pivot tables with the proper ‘slicers’, enabling a single data source to be used for many individual seasons. So 68 seasons for each of two leagues can all feed from a single data source and then populate existing chart templates. This way, I need just a handful of charts that can be used many times over to create 136 unique instances.

Here’s a look at one of the pivot tables with accompanying slicers that allow me to select by season, league, and division (as needed) to automatically update the values in the pivot table.

Similarly, I set up the primary pennant race chart to update using the same sort of slicers for season, league, etc. If I were truly an Excel genius, I’m sure I could have had a single set of slicers that would have updated everything, but it was still quite easy using the pair. This is how the slicers look for the main chart:

One of the reasons this all works so well in Excel is due to the formulas I used. In some cases, these were very simple, perhaps just dividing the contents of one cell by another. In other cases, the logic becomes more complex, involving sorting results based on the order of finish, or by team nickname rather than the city (think Dodgers, not Brooklyn or Los Angeles). Excel provides a range of formulas that let advanced users do virtually anything with the data. If everything is done right, these formulas are set up one time, and then work hundreds of times behind the scenes to get the right data into each chart, all incumbent on the slicer selections.

By now some of you regular Excel users will have noticed that many of the charts I used aren’t standard issue Excel charts. Absolutely true, but this leads me into a discussion of how to use Excel even when the chart type doesn’t exist. Take, for example, the dotplots pictured below.

How the heck did we create those in Excel, without having a standard chart type that even comes close to that look? Simple – we created in-cell charts by using the Excel REPT formula, combined with a couple other values to tweak the scaling for each category. This basically involves repeating a value (in our case, a space) a selected number of times based on the data value. We then choose a shape, a font size, and a font color, and use our data value (and some sort of factor value) to display each dot further to the right (higher values) or to the left (lower values). This is a great trick to learn in Excel, as it gives you another tool when bar charts are less appropriate, which is quite often the case. Visually, dotplots are often superior because we don’t need to fix the scale at zero; this allows us to ‘zoom’ into a narrow range based on the actual data values. In this case, they work far better than bar charts (trust me, I tried those first) and are visually cleaner as well (less ink).

There are some other chart types that are not native to Excel (horizon charts, box plots, and advanced sparklines) but which can be added to Excel, courtesy of the Sparklines for Excel tool, created by Fabrice Rimlinger. This is an essential add-in if you wish to create some great looking graphics when you have limited space to work with (for instance, on a dashboard). I have been an advocate of this project for more than two years, and look forward to more future use. These charts are also typically in-cell, which makes it very easy to re-size the charts to suit your application. Here’s a view of some horizon charts:

That’s it for now; perhaps I’ll dive into a bit more of the formula detail in a future post.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather

Sorry MicroStrategy: Back to Excel

A few recent posts have documented my explorations with the new desktop software from MicroStrategy, clearly designed to compete with the likes of Tableau and Excel. One of the chief advantages of the MS offering is the free price point (as in $0), versus the much pricier Excel and especially, Tableau. As usual, I had to perform my due diligence, as is the case with every new tool I get my hands on.

The MS Analytics offering does have a lot to recommend it by, as I noted in my previous posts. However, after a few weeks of intensive exploration, I have come to the conclusion that it isn’t a good fit for what I’m currently attempting to do, which is to create graphics for a pair of upcoming books. While it is perhaps easier to manage the data compared to Excel, the structure is a bit too rigid, and the chart options are also too structured and limiting for my current needs. It’s still useful, but not so much in the current context.

Which brings me back to Excel. For all its faults, Excel is still very powerful, and most importantly for me, very flexible. I can hack my way into almost any sort of chart, aided by the likes of Jon Peltier, Chandoo, and Fabrice Rimlinger. So Excel it is, at least for this project.

Now that I know what I’m doing, it’s high time to get back to work and deliver these books! Look for the 1901-68 pennant races in December, and the 1969-2013 to follow in early 2014.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather

Hitting the Homestretch

The last 10 days have been exceptionally productive in getting my first visual pennant race book (1901-1968) together, with the final content pages being completed earlier this evening. While that was extremely gratifying in itself, now comes the part where publishers typically do the work – the front and back matter. Title pages, table of contents, acknowledgments, preface, and introduction are all essential pieces in creating a finished product. And guess what? Since I’m the publisher for my own book this time around, I get to figure this all out on my own. Fun.

Fortunately, it isn’t rocket science, as every book follows a general framework that I can learn from and mimic to the best of my ability. Of course, there are other little details when you elect to create two versions of a book, one for print and one for Kindle and other e-readers. Creating bookmarks for each and every one of 175 pages is just one of the steps I’ll need to take for the e-version, but the goal is to make the book as easy to use and polished as possible, so this step is essential.

I’ve previously shared earlier versions of the season content. Here’s a glimpse of how the summary sections will appear:

The goal is to have both versions available by the 20th of this month, with the companion book (1969-2013 pennant races) likely to appear in March 2014. Just in time for the holidays, so maybe I can actually spend a couple weeks without creating charts, copying formulas, and building PDF files. Or I could get an early start on downloading the 2013 data I need for volume two. Just don’t tell my family what I’m up to.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather

Revised Pennant Race Template

Well, I thought I had the final template for my pennant race book a few weeks back, but began tinkering with it, and now have a new version with a bit more visual impact. Here’s a glimpse:

Still have lots of work to do on this book, with my first publishing priority being a book on Gephi and network visualization that I’m currently working on. Once that’s complete, I’ll refocus on the pennant race book and hope to publish it sometime this summer.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather

Updated Pennant Race Template

A few modifications later, here’s the updated (and I think, final) page template for the pennant races book.

Have some useful categories now in the heatmap, and rearranged the center section to be more intuitive. The fun part is that the template uses a dynamic dataset in Excel, so I can go in, make a couple selections, and all the content changes to a new season. Eager to begin assembling these so I can get the book published in the next month or two.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather

Pennant Race Template

Been a long time since my last post, since coming up with a basic page template for my upcoming book has been taking a lot of time and effort. After spending what seemed an eternity on draft versions, and scrapping more than a few that failed the sense check, I believe I’ve got something ready. For those of you not familiar with the concept of dashboards, I’ll defer to the great Stephen Few to explain them. Suffice to say, they provide a one page look at a series of metrics (typically within a business) and allow the user to quickly gauge the state of the business.

IMHO, baseball pennant races are far more intriguing than business metrics, but I do like the dashboard approach, especially for a book covering more than 100 seasons, with two leagues, followed by two, then three divisions. So a layout that can be replicated easily was essential to getting the book completed; create a handful of templates, feed it the selected season data, and we’re well on our way. Here’s a shot of a 90% complete template:

The trick is to keep the dashboard clean while still providing a wealth of data, and making it simple to interpret. A few of the chart types here will not be familiar to most viewers, but are excellent at communicating insights. For instance, the little red and blue charts are called horizon charts, and enable us to show teams that are losing more than half their games in red, while showing winning teams in blue, all while using a very limited amount of space. Ditto for the bullet charts, where we see team performance versus league averages, and box charts that show season-long patterns within a very small space.

FYI – Much of the work done so far is courtesy of the excellent Sparklines for Excel plugin created by Fabrice Rimlinger.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmailby feather
FacebooktwitterlinkedinrssFacebooktwitterlinkedinrssby feather