Pennant Race Charts Updated!

The last of my big three annual updates is now complete, as all 2016 & 2017 pennant race charts have been created, and now reside in the Visual-Baseball Project portfolio. These charts are created using NVD3, which is built on top of the powerful d3.js framework developed by Mike Bostock. These tools help make the charts highly interactive, allowing you to see where each team stands at any given point in the season, and also providing the ability to zoom in using a smaller sub-chart beneath the primary display.

The structure of the charts is based on every team’s relationship to a .500 winning percentage – a situation where a team wins exactly as many games as it loses. This structure allows for easy interpretation of the results, as we can see which teams hover near the .500 mark (i.e., consistent mediocrity), which rise well above this level, and which descend far below the break-even point. Allow me to illustrate these thoughts using the 2017 American League Central division, and my hometown Detroit Tigers, who suffered through their worst season since 2003.


As you can see, the darker orange line representing the Tigers takes a steep dive starting in early August, culminating in a final record 34 games below .500. Meanwhile, the rival Cleveland Indians (light blue line) present a near mirror image of the Tigers’ failure, with a sensational month of September that ultimately lands them 42 games over the .500 break-even level.
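For anyone curious about the arithmetic behind each line, here is a minimal Python sketch of the games-above-.500 idea. It is an illustration only, not the code that builds the charts: the function names and the ten-game sample are made up, and ties are ignored. The two final records used at the end (64-98 for Detroit, 102-60 for Cleveland) are the 2017 standings, which match the 34-below and 42-above figures quoted above.

```python
from itertools import accumulate

# Each win moves a team one game up, each loss one game down.
STEP = {"W": 1, "L": -1}


def games_over_500(results):
    """Return the running games-above-.500 total after each game.

    `results` is a string of 'W' and 'L' characters, one per game played.
    Any other character raises KeyError.
    """
    return list(accumulate(STEP[r] for r in results))


def final_margin(wins, losses):
    """Games above (positive) or below (negative) .500 for a final record."""
    return wins - losses


# Hypothetical ten-game stretch, not real data.
print(games_over_500("WWLWLLLWLL"))  # [1, 2, 1, 2, 1, 0, -1, 0, -1, -2]

print(final_margin(64, 98))   # -34 (2017 Tigers)
print(final_margin(102, 60))  # 42 (2017 Indians)
```

Plotting the running total against the game number (or date) gives one line per team, with the zero line standing in for .500.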

Similar charts have been created for the other divisions for both the 2016 & 2017 seasons. In fact, you can now view any season, league, and division dating back to the 1901 campaigns, a total of 380 pennant races to explore! Find all the pennant race charts here. Have fun exploring, and as always, thanks again for reading!


4 Years of Baseball Graphics Updates in 3 Days

To say that some of my website visuals were not quite up to date is a massive understatement. The latest version of the Game Summaries covered the 2009 season. The interactive pennant race charts ran through 2011, and the Batting Explorer exhibits ended with the 2009 season. Not exactly current in any of these cases, and other examples abound. So what to do about it?

For starters, the underlying data so generously made available by the Retrosheet folks needed to be updated. Portions of this had been done over the last few years, but a bit haphazardly, as I came to find out over the last few days. Some tables were current through 2011, others through 2012 or 2013. In short, they were consistently inconsistent, and certainly not suited to creating the latest versions of the aforementioned visuals.

One of the best aspects of growing older (at least from a data perspective) is accumulating more and more code that makes it a bit less painful to update or repair database tables. I have managed to create and save dozens of code snippets that help me create, insert, update, select, and otherwise manipulate the data into a proper format for consumption by visualization tools. In some cases, this code made the process surprisingly easy, while other cases required dusting off the cobwebs to understand what my code was doing or not doing. In the end, the process went remarkably swiftly, aided by the periodic Michigan microbrew. The resulting table updates allowed me to tackle the pennant race and game summary projects, and 23 new baseball graphics were created in a 72-hour window.

Et voilà, as the French might say, the Visual-Baseball site now has 18 new pennant race charts (3 years times 6 divisions) while the game summaries have five new entries covering the 2010 through 2014 seasons, and they all work as expected. The pennant race charts are built using D3 and NVD3 code atop .json data, while the Game Summary exhibits are created using Simile Exhibit, a semantic browsing tool, also sitting on .json data.
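To give a feel for what "atop .json data" can mean for an NVD3 line chart, here is a rough Python sketch that shapes per-team margins into the list of `key`/`values` series objects that NVD3 line charts conventionally consume. Treat it as an assumption about the general layout, not a copy of the site's actual files: the function name, team names, and numbers are hypothetical, and the real charts may use dates rather than game numbers on the x axis.

```python
import json


def to_nvd3_series(team_margins):
    """Convert {team: [margin after game 1, game 2, ...]} into a list of
    {"key": ..., "values": [{"x": ..., "y": ...}]} series objects."""
    return [
        {
            "key": team,
            "values": [
                {"x": game, "y": margin}
                for game, margin in enumerate(margins, start=1)
            ],
        }
        for team, margins in team_margins.items()
    ]


# Hypothetical margins for two made-up teams, not real data.
demo = {"Team A": [1, 2, 1], "Team B": [-1, -2, -1]}
print(json.dumps(to_nvd3_series(demo), indent=2))
```

Each series becomes one line on the chart, and the `key` is what shows up in the legend where teams can be toggled on and off.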

The pennant race charts look like this:


The charts are interactive in several ways – individual teams can be hidden from view, the chart is zoomable, and individual values can be displayed using mouseover capability. You can find the entire portfolio of more than 360 charts here.

Game summary exhibits cover 60 seasons and afford users many filtering options to search for games based on specific criteria – teams, pitchers, runs scored, and so on. Results can be viewed in a tabbed fashion or via a timeline. Here’s an example image:


The entire gallery is located here.

Have fun with both the pennant races and the game summaries, and make sure to check out a few of the other resources in the portfolio section. It feels as though the site is gradually becoming a unique resource for the visual interpretation of baseball data, whether it is in the form of conventional charts or more esoteric views such as the network graphs. Feel free to share any of the information on the site, and tell your friends and colleagues.


Pennant Race Book is Here!

After multiple iterations, periodic delays, and last-minute additions, my first pennant race book is finally available. MLB Pennant Races, 1901-1968: A Visual Analysis of Baseball’s Pennant Races is available through Amazon. I’m still working through the Kindle version formatting, and hope to have that available before the end of the year. I’m also working on getting a PDF version available through this site, and possibly a version for Nook and iPad as well.


Anatomy of a Book

As I wait for the review process for my book to complete so I can queue up the Kindle version, I thought it would be a good time to share some of the philosophy behind the book, while taking a further look at the rationale for some of the chart selections. I’ll start with why Microsoft Excel was my primary tool for creating the charts (wait – aren’t you an open source champion?) and how it helped make the book a reality.

For those of you new to the subject, my book is titled MLB Pennant Races, 1901-1968: A Visual Analysis of Baseball’s Pennant Races and endeavors to put a new, highly visual spin on an old topic. I see on a daily basis how much of an impact data visualization is having, and noticed that baseball visualization has not kept pace. So it became clear to me that a book (or books) was needed that could help close this gap, and turn a wealth of data into meaningful graphics. I knew I could do this, but what would be the best tool to actually create a book? Could it really be Excel? Absolutely.

For those of you who don’t use Excel on a regular basis (my day job calls for multiple hours a day in Excel), it really is a powerful tool for all kinds of analysis, and yes, even charting. Now here’s the rub – Excel’s default charting selections aren’t so good (albeit much improved in Excel 2013), and in fact can be absolutely grotesque on occasion, particularly with respect to improper scaling for bar charts. However, with a few well-practiced tweaks combined with lessons learned from Excel gurus, I can do darn near anything with Excel charts. As a frequent user of Tableau, not to mention open source beauties like Protovis and D3, I still find that Excel provides a great combination of data management and charting capabilities.

Now if I were going to create a single chart, or even a small set of charts, Excel might not be my first choice. However, when the need is for 136 identical dashboard pages composed of multiple charts, where only the data is changing, then Excel is tough to beat. The trick is to use pivot tables with the proper ‘slicers’, enabling a single data source to be used for many individual seasons. So 68 seasons for each of two leagues can all feed from a single data source and then populate existing chart templates. This way, I need just a handful of charts that can be used many times over to create 136 unique instances.
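The book itself was built with Excel pivot tables and slicers, but the underlying idea (one data source, filtered by season and league, feeding the same template over and over) can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the row layout, the function names, and the made-up teams and records.

```python
# Hypothetical rows: (season, league, team, wins, losses). Not real data.
ROWS = [
    (1901, "AL", "Team A", 80, 60),
    (1901, "AL", "Team B", 70, 70),
    (1901, "NL", "Team C", 90, 50),
    (1902, "AL", "Team A", 75, 65),
]


def slice_rows(rows, season, league):
    """Mimic a pair of slicers: keep only rows for one season and league."""
    return [r for r in rows if r[0] == season and r[1] == league]


def page_keys(first_season, last_season, leagues=("AL", "NL")):
    """Every (season, league) pair that needs its own dashboard page."""
    return [
        (season, league)
        for season in range(first_season, last_season + 1)
        for league in leagues
    ]


print(slice_rows(ROWS, 1901, "AL"))  # the two 1901 AL rows
print(len(page_keys(1901, 1968)))    # 68 seasons x 2 leagues = 136
```

Looping over `page_keys` and handing each slice to the same chart template is the programmatic cousin of clicking through the slicers one season at a time.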

Here’s a look at one of the pivot tables with accompanying slicers that allow me to select by season, league, and division (as needed) to automatically update the values in the pivot table.

Similarly, I set up the primary pennant race chart to update using the same sort of slicers for season, league, etc. If I were truly an Excel genius, I’m sure I could have had a single set of slicers that would have updated everything, but it was still quite easy using the pair. This is how the slicers look for the main chart:

One of the reasons this all works so well in Excel is the formulas I used. In some cases, these were very simple, perhaps just dividing the contents of one cell by another. In other cases, the logic becomes more complex, involving sorting results based on the order of finish, or by team nickname rather than the city (think Dodgers, not Brooklyn or Los Angeles). Excel provides a range of functions that let advanced users do virtually anything with the data. If everything is done right, these formulas are set up one time, and then work hundreds of times behind the scenes to get the right data into each chart, all dependent on the slicer selections.

By now some of you regular Excel users will have noticed that many of the charts I used aren’t standard issue Excel charts. Absolutely true, but this leads me into a discussion of how to use Excel even when the chart type doesn’t exist. Take, for example, the dotplots pictured below.

How the heck did we create those in Excel, without having a standard chart type that even comes close to that look? Simple – we created in-cell charts by using the Excel REPT function, combined with a couple of other values to tweak the scaling for each category. This basically involves repeating a value (in our case, a space) a selected number of times based on the data value. We then choose a shape, a font size, and a font color, and use our data value (and some sort of factor value) to display each dot further to the right (higher values) or to the left (lower values). This is a great trick to learn in Excel, as it gives you another tool when bar charts are less appropriate, which is quite often the case. Visually, dotplots are often superior because we don’t need to fix the scale at zero; this allows us to ‘zoom’ into a narrow range based on the actual data values. In this case, they work far better than bar charts (trust me, I tried those first) and are visually cleaner as well (less ink).
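In Excel, the pattern would look something like `=REPT(" ", ROUND((B2-$F$1)*$F$2, 0)) & "●"`, where the cell references for the data value, the lower bound, and the scaling factor are placeholders I picked for illustration, not the ones in the book's workbook. The same idea in a small Python sketch, again with hypothetical names and made-up values:

```python
def dot_cell(value, low, high, width=30, dot="●"):
    """Return a text 'cell' with one dot placed proportionally between low and high.

    Mirrors the idea of REPT(" ", n) & "●": n leading spaces, then the marker.
    Values outside the [low, high] range are clamped to the nearest edge.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = min(max(value, low), high)
    spaces = round((clamped - low) / (high - low) * width)
    return " " * spaces + dot


# Hypothetical winning percentages, not real data.
for name, pct in [("Team A", 0.620), ("Team B", 0.500), ("Team C", 0.410)]:
    print(f"{name:<8}|{dot_cell(pct, low=0.400, high=0.650)}")
```

Because `low` does not have to be zero, the axis can be zoomed into whatever narrow range the data actually occupies, which is exactly the advantage over bar charts described above.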

There are some other chart types that are not native to Excel (horizon charts, box plots, and advanced sparklines) but which can be added to Excel, courtesy of the Sparklines for Excel tool, created by Fabrice Rimlinger. This is an essential add-in if you wish to create some great looking graphics when you have limited space to work with (for instance, on a dashboard). I have been an advocate of this project for more than two years, and look forward to using it even more in the future. These charts are also typically in-cell, which makes it very easy to re-size the charts to suit your application. Here’s a view of some horizon charts:

That’s it for now; perhaps I’ll dive into a bit more of the formula detail in a future post.


Hitting the Homestretch

The last 10 days have been exceptionally productive in getting my first visual pennant race book (1901-1968) together, with the final content pages being completed earlier this evening. While that was extremely gratifying in itself, now comes the part where publishers typically do the work – the front and back matter. Title pages, table of contents, acknowledgments, preface, and introduction are all essential pieces in creating a finished product. And guess what? Since I’m the publisher for my own book this time around, I get to figure this all out on my own. Fun.

Fortunately, it isn’t rocket science, as every book follows a general framework that I can learn from and mimic to the best of my ability. Of course, there are other little details when you elect to create two versions of a book, one for print and one for Kindle and other e-readers. Creating bookmarks for each and every one of 175 pages is just one of the steps I’ll need to take for the e-version, but the goal is to make the book as easy to use and polished as possible, so this step is essential.

I’ve previously shared earlier versions of the season content. Here’s a glimpse of how the summary sections will appear:

The goal is to have both versions available by the 20th of this month, just in time for the holidays, with the companion book (1969-2013 pennant races) likely to appear in March 2014. So maybe I can actually spend a couple of weeks without creating charts, copying formulas, and building PDF files. Or I could get an early start on downloading the 2013 data I need for volume two. Just don’t tell my family what I’m up to.


Pennant Race Book Sample

For someone who’s never written a book, trying to get two books out in the same year has proven to be a unique challenge. I have to say I’m grateful that one of them has the structure and support of an established publisher, which provides me with a good bit of guidance as well as a timetable to work from. On the other hand, self-publishing the other volume allows me to stretch out a bit, make more frequent revisions, and eventually get to the book I envisioned 12 months ago.

So, on at least the fourth format revision (who’s counting?), I believe the main section of the pennant race book is now set, and only requires the updating of each season’s data to feed the template. Sometimes time away from a project leads to better solutions, as it did in this case, with some new formulas, improved graphics, and a greater degree of automation. I like the results, and hope that others will as well.

Here’s a quick look, and I’ve also provided an attached file (.pdf) if you wish to download a few seasons and get a feel for what I’m trying to accomplish. In the near future, I’ll have a legend page that will explain all of the charts you see on each page. Trust me, they do make sense if you know what it is you’re seeing!

Much more to come over the next 2-3 months as I try to get the book launched this summer/fall.


Revised Pennant Race Template

Well, I thought I had the final template for my pennant race book a few weeks back, but began tinkering with it, and now have a new version with a bit more visual impact. Here’s a glimpse:

Still have lots of work to do on this book, with my first publishing priority being a book on Gephi and network visualization that I’m currently working on. Once that’s complete, I’ll refocus on the pennant race book and hope to publish it sometime this summer.


Updated Pennant Race Template

A few modifications later, here’s the updated (and I think, final) page template for the pennant races book.

Have some useful categories now in the heatmap, and rearranged the center section to be more intuitive. The fun part is that the template uses a dynamic dataset in Excel, so I can go in, make a couple selections, and all the content changes to a new season. Eager to begin assembling these so I can get the book published in the next month or two.


Pennant Race Template

It’s been a long time since my last post, as coming up with a basic page template for my upcoming book has been taking a lot of time and effort. After spending what seemed an eternity on draft versions, and scrapping more than a few that failed the sense check, I believe I’ve got something ready. For those of you not familiar with the concept of dashboards, I’ll defer to the great Stephen Few to explain them. Suffice it to say, they provide a one-page look at a series of metrics (typically within a business) and allow the user to quickly gauge the state of the business.

IMHO, baseball pennant races are far more intriguing than business metrics, but I do like the dashboard approach, especially for a book covering more than 100 seasons, with two leagues, followed by two, then three divisions. So a layout that can be replicated easily was essential to getting the book completed; create a handful of templates, feed them the selected season data, and we’re well on our way. Here’s a shot of a 90% complete template:

The trick is to keep the dashboard clean while still providing a wealth of data, and making it simple to interpret. A few of the chart types here will not be familiar to most viewers, but are excellent at communicating insights. For instance, the little red and blue charts are called horizon charts, and enable us to show teams that are losing more than half their games in red, while showing winning teams in blue, all while using a very limited amount of space. Ditto for the bullet charts, where we see team performance versus league averages, and box charts that show season-long patterns within a very small space.
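For readers new to horizon charts, the space saving comes from folding a tall series into stacked color bands. The sketch below shows that banding logic for a single value in Python. It is my own simplified illustration of the general technique, not how the Sparklines for Excel plugin implements it, and the band size, band count, and the choice to color an exactly break-even value blue are all arbitrary assumptions.

```python
def horizon_bands(value, band_size, n_bands=3):
    """Split one value into a color and per-band fill fractions (0.0 to 1.0).

    Band i is fully filled once |value| reaches (i + 1) * band_size; in a
    rendered horizon chart the bands are drawn on top of each other in
    progressively darker shades, so the chart needs only one band of height.
    """
    if band_size <= 0:
        raise ValueError("band_size must be positive")
    color = "blue" if value >= 0 else "red"  # blue = winning, red = losing
    magnitude = abs(value)
    fills = [
        min(max(magnitude - i * band_size, 0) / band_size, 1.0)
        for i in range(n_bands)
    ]
    return color, fills


# Hypothetical games-above-.500 values, not real data.
print(horizon_bands(15, band_size=10))   # ('blue', [1.0, 0.5, 0.0])
print(horizon_bands(-25, band_size=10))  # ('red', [1.0, 1.0, 0.5])
```

A team 15 games over .500 fills the lightest blue band completely and the next band halfway, so a quick glance at color and darkness tells you both direction and magnitude.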

FYI – Much of the work done so far is courtesy of the excellent Sparklines for Excel plugin created by Fabrice Rimlinger.
