Retrosheet 2022 Data is Here!

One of the primary source data sets I use to create baseball visualizations is the amazingly detailed information captured by the Retrosheet project, a dedicated group of volunteers providing play-by-play and game level information for each MLB season. They have recently passed the 100-year milestone, with data from the 1921 & 1922 seasons now available. I have some catching up to do on the older seasons, but just downloaded the 2022 season for adding to my databases.

The data comes in two distinct sets – game logs being the much easier of the two to work with, due to the smaller data size. Each game played in a season is captured at a summary level (~ 2,400 records), with information pertaining to the score, players, umpires, attendance, and much more. This information is used to feed my game summary visualizations:

2007 Game Summary results

As you can see, these are bite-sized summaries of every game, showing some of the important summary data for a game. They can be filtered to find specific teams, pitchers, scores, and much more. These visualizations are currently available covering the 1955-2019 seasons; one of my immediate goals is to add the 2020, 2021, and 2022 seasons, before starting to work in reverse with pre-1955 campaigns.
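For readers who want to try this kind of filtering themselves, here is a minimal Python sketch (not the SQL I actually use). It assumes the game log files are comma-delimited with no header row, and that the date, visiting team, home team, and the two scores sit in fields 1, 4, 7, 10, and 11, which is my reading of the Retrosheet field list; check those positions against the official documentation. The two sample rows and their team codes are made up and trimmed to the first 11 fields, and the function names are purely illustrative.

```python
import csv
import io

# 0-based positions of a few game log fields (assumed from the Retrosheet field list).
DATE, VISITOR, HOME, VISITOR_SCORE, HOME_SCORE = 0, 3, 6, 9, 10


def read_games(handle):
    """Yield one small dict per game from a header-less, comma-delimited game log."""
    for row in csv.reader(handle):
        yield {
            "date": row[DATE],
            "visitor": row[VISITOR],
            "home": row[HOME],
            "visitor_score": int(row[VISITOR_SCORE]),
            "home_score": int(row[HOME_SCORE]),
        }


def games_for_team(games, team):
    """Keep only the games in which the given team played, home or away."""
    return [g for g in games if team in (g["visitor"], g["home"])]


# Made-up rows with placeholder team codes, trimmed to the first 11 fields.
SAMPLE = (
    '"20220407","0","Thu","AAA","NL",1,"BBB","NL",1,3,5\n'
    '"20220408","0","Fri","CCC","AL",1,"AAA","NL",2,2,1\n'
)

games = list(read_games(io.StringIO(SAMPLE)))
team_games = games_for_team(games, "BBB")
```

With a real file, the `io.StringIO(SAMPLE)` handle would be replaced by an opened game log file; the same filter idea extends to pitchers, scores, or any other field.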

Fortunately, I have lots of SQL code built up over the years to make the data update process fairly simple; the 2022 game logs have already been added, and now I’ll get to work on the play-by-play data. Stay tuned for updates, and thanks for reading!


Warsaw Data+ Presentation

I have the good fortune to be the keynote speaker at this year’s Data+ conference in Warsaw, Poland on November 26th, so the traditional American Thanksgiving meal will not be in store for 2015. This is a very exciting opportunity, and comes on the heels of having presented in Boston at the Data Visualization Summit in September 2015, so it’s been a busy last few months getting presentations squared away.

My topic at the conference is Data Driven Storytelling, where I’ll walk the audience through some of my approach and philosophy about using data visualization to deliver information and insights about specific topics. In addition to the talk, I’ve created a story on my site that chronicles the last 21 seasons of play in the Ekstraklasa, the top level of play in Polish football.

Thus far it has been an absolute joy working with the folks at IDG/Computerworld, who are responsible for running the event. Patrycja Kuriata, Program Director for the conference, has been incredibly responsive and helpful with any questions or details, and has made the entire process a pleasurable one.

I’m putting the wraps on my content as October comes to an end, and look forward to visiting Poland in a few weeks, and reporting back on the conference as well as on the few days of sightseeing in my plans.

Data Visualization, Aesthetics and Intuition

As I worked through a just-completed project chronicling the diverse musical career of Neil Young, some valuable (if unintended) insights were reinforced once more. I work on a regular basis with a variety of large datasets that require analysis, interpretation, and ultimately visualization and presentation. Often, these goals are not easily reconciled, which leads to unsatisfactory results on one or more of these fronts.

As much as we as analysts need to depict the data accurately and meaningfully, if we don’t do so with an attractive visual approach we risk not having our message get communicated at all. Merely presenting our data in a table may technically get the job done, but is also likely to bore the reader to tears while simultaneously failing to deliver the key messages. At the other extreme, we can pull out individual bits of the data and spend our time creating flashy infographics that may capture attention but fail to represent the data in its proper context. All flash, no substance. Neither approach is terribly effective.

At the same time, we may present all of the information using a reasonable visual approach that preserves the integrity of the data while still falling short of creating a fulfilling user experience. This is what I recently experienced with the Neil Young project, as I’ll detail below.

After spending a few days getting the data from the AllMusic site into Excel, and eventually as node and edge files into Gephi, it was finally time to create the network data visualization. I was determined to attempt one of the many force-based methods used in network graph analysis to create the graph. These methods are very popular and useful for creating graphs out of a variety of data networks, allowing viewers to see the larger patterns at work within the data.
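To give a feel for what those node and edge files contain, here is a rough Python sketch (I did this step in Excel, so this is an alternative, not my actual process). It assumes Gephi's spreadsheet import convention of an Id/Label node table and a Source/Target/Type edge table; the "Kind" column is an extra attribute I am adding for illustration, and the album, style, and mood names are placeholders rather than real AllMusic data.

```python
import csv
import io

# Placeholder input: each album maps to its styles and moods (not real AllMusic data).
ARTIST = "Neil Young"
albums = {
    "Album A": {"styles": ["Style 1", "Style 2"], "moods": ["Mood 1"]},
    "Album B": {"styles": ["Style 2"], "moods": ["Mood 1", "Mood 2"]},
}


def build_graph(artist, albums):
    """Return (nodes, edges): nodes maps name -> kind, edges is a list of pairs."""
    nodes = {artist: "artist"}
    edges = []
    for album, tags in albums.items():
        nodes[album] = "album"
        edges.append((artist, album))
        for kind in ("styles", "moods"):
            for tag in tags[kind]:
                # A style or mood shared by several albums becomes a single node.
                nodes.setdefault(tag, kind[:-1])
                edges.append((album, tag))
    return nodes, edges


def to_csv(header, rows):
    """Render rows as CSV text; write this to a file for import into Gephi."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes, edges = build_graph(ARTIST, albums)
nodes_csv = to_csv(["Id", "Label", "Kind"], [(n, n, k) for n, k in nodes.items()])
edges_csv = to_csv(["Source", "Target", "Type"], [(s, t, "Undirected") for s, t in edges])
```

The key design point is that styles and moods link only to albums, never directly to the artist, which is what gives this network its layered structure.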

After a few iterations, I wound up with a serviceable graph that covered most of the basics I spoke of earlier – all the data was exposed, element types were sized and color-coded for easier interpretation, and the project was navigable via the web. Here’s a look:


Not bad, but there was something nagging at me as I viewed it, tweaked it, played with the styling, and so on. Everything was technically fine, but something was missing. So back I went to Gephi to find the answer. The next day, it occurred to me – I was using the wrong approach for the type of data I was trying to depict. Where the force-directed approach is ideal for dense, social media type networks, this was a unique network that didn’t possess the same structure. Therefore, it was not as aesthetically appealing or as intuitive as it could be.

After iterating through a few approaches, I came across a winner that best exploits the structure of the underlying data while conveying a far more intuitive feel to the end user. Why not have Neil at the center of the graph, surrounded by all of his albums, ordered by release date? On top of this, I could then have the style and mood data form an outer ring, as they needed only to link to the albums in some fashion. Now we have something that conveys the same information as the first attempt, but in a much more pleasing layout relative to this dataset. See for yourself:


The new version addresses the issues of aesthetics and intuition where the first graph fell short. All moods and styles are now easily found; the same is true for all albums. Highlighting a single mood (or album) also provides an information-rich view for how the music changed over periods in Young’s career. This was nearly impossible to see in the initial layout.
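The ring idea described above is easy to express in code. This is only a sketch in Python of the geometry, not what Gephi does internally: the artist sits at the origin, albums are spaced evenly around an inner circle in release order, and styles and moods are spaced around an outer circle. The radii, the clockwise-from-the-top ordering, the alphabetical ordering of the outer ring, and all of the sample names and years are my own assumptions for illustration.

```python
import math


def ring_positions(labels, radius):
    """Space labels evenly on a circle, starting at 12 o'clock and moving clockwise."""
    positions = {}
    count = len(labels)
    for index, label in enumerate(labels):
        angle = math.pi / 2 - 2 * math.pi * index / count
        positions[label] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


def radial_layout(center, albums, tags, inner_radius=1.0, outer_radius=2.0):
    """Center node at the origin, albums on the inner ring, tags on the outer ring."""
    # albums is a list of (title, release_year); the inner ring follows release order.
    ordered = [title for title, _ in sorted(albums, key=lambda album: album[1])]
    layout = {center: (0.0, 0.0)}
    layout.update(ring_positions(ordered, inner_radius))
    layout.update(ring_positions(sorted(tags), outer_radius))
    return layout


# Placeholder albums and tags, deliberately listed out of release order.
albums = [("Album B", 1975), ("Album A", 1970), ("Album C", 1979), ("Album D", 1989)]
tags = ["Mood 1", "Mood 2", "Style 1"]
layout = radial_layout("Neil Young", albums, tags)
```

Because each node's position depends only on its rank within its ring, the earliest album always lands at the top and the reader can follow the career clockwise.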

So the message is this – visualizations don’t need to sacrifice aesthetics and intuition in order to be effective; rather, they should take advantage of these attributes to increase their appeal and impact. Don’t be afraid to experiment until you find the right formula, as it seldom presents itself the first time around, and trust your instincts.


Off to the 2014 Eyeo Festival

Once again, I am blessed to be heading back to the Eyeo Festival in beautiful Minneapolis, my third consecutive year attending. My excitement has been a bit more measured this time around, although the last few days have been filled with anticipation at hearing, seeing, and possibly meeting some of the best visualizers the world has to offer.
Eyeo Festival
There are a few repeaters from Eyeo 2013 and even Eyeo 2012 who I can’t wait to see again, including the estimable Mike Bostock of d3 and Protovis fame, Nicholas Felton (he of the annual Feltron Report), and Martin Wattenberg and Fernanda Viegas. This year’s lineup also sports a few newcomers who are doing fantastic work, including Santiago Ortiz, Burak Arikan, Cesar Hidalgo, and Eric Rodenbeck. For a full roster of speakers, click here: Eyeo speakers.
Plus, at what other venue could you possibly have a keynote speech delivered by an 85-year-old pioneer of algorithmic art? Eagerly anticipating my annual week of mind expansion, all done with nothing stronger than local Minnesota beer.

2014: What’s Next?

As we prepare to enter a new year, I’ve been doing some thinking about what I can create for 2014. I’ve already committed myself to a companion book to my recently completed pennant race book, with the new volume to cover the 1969 through 2013 seasons. So that’s a given, and should actually be a bit easier than the first volume, now that the basic template has been created. I want to create something that goes even further with the visualization and baseball marriage, and have come up with an idea to do just that.

MLB Birthplaces by Decade

I created a little visualization using Tableau Public that looks at the birthplace patterns by decade for Major League Baseball players. As you scroll through, you can see the various migrations, first from East to West, then to the South, and eventually to places like the Dominican Republic and Venezuela. Viewing these birthplaces really drives home the changes we have witnessed in Major League Baseball over recent decades.

Tableau Public, for those of you not familiar, allows users to upload data and create visualizations of various types, ranging from bar charts to maps. All content can be shared across the user base, leading to even more creative output.

You can find the viz here or in my Portfolio section under the Mapping menu. Enjoy!

Sorry MicroStrategy: Back to Excel

A few recent posts have documented my explorations with the new desktop software from MicroStrategy, clearly designed to compete with the likes of Tableau and Excel. One of the chief advantages of the MicroStrategy offering is the free price point (as in $0), versus the much pricier Excel and, especially, Tableau. As with every new tool I get my hands on, I had to perform my due diligence.

The MicroStrategy Analytics offering does have a lot to recommend it, as I noted in my previous posts. However, after a few weeks of intensive exploration, I have come to the conclusion that it isn’t a good fit for what I’m currently attempting to do, which is to create graphics for a pair of upcoming books. While it is perhaps easier to manage the data compared to Excel, the structure is a bit too rigid, and the chart options are also too structured and limiting for my current needs. It’s still useful, but not so much in the current context.

Which brings me back to Excel. For all its faults, Excel is still very powerful, and most importantly for me, very flexible. I can hack my way into almost any sort of chart, aided by the likes of Jon Peltier, Chandoo, and Fabrice Rimlinger. So Excel it is, at least for this project.

Now that I know what I’m doing, it’s high time to get back to work and deliver these books! Look for the 1901-68 pennant races in December, and the 1969-2013 to follow in early 2014.

MicroStrategy Analytics Update

A few days ago, I stumbled across the free analytics offerings from MicroStrategy, as detailed in my last blog post. As you may recall, I had begun dabbling with the desktop version (there is also the online Analytics Express, with some slight differences), and promised to report back with further insights into the strengths and weaknesses of the tool. So here I am, in front of a Friday night fire (contained within our fireplace) in chilly Detroit, after having spent much of the day working with Analytics Desktop, or AD, as I’ll refer to it for the remainder of the post.

On balance, I’ve been favorably impressed with AD, although there may be a selfish motivation to take advantage of what AD can do. Given that I am in the midst of preparing a couple of highly visual baseball books to come out later this year and in early 2014, I saw an opportunity to tap into AD to create some of the visuals for the book. So I was sincerely hoping that I would like it, and that it could help make it easier for me to create certain portions of the book that would be far more laborious using Excel or other tools.

So with that out of the way, let’s walk through an analysis of the strengths and weaknesses of AD, using examples whenever possible. Let’s start by getting the weaknesses out of the way, and then move on to the longer list of strengths AD brings to the table.

Weaknesses and shortcomings:

  1. The big one – AD uses Flash for all charts and dashboards within the app. Given that Flash is on the way out as a technology, this seems a curious choice. Perhaps the folks at MicroStrategy had a team of Flash developers sitting around, making it easier to launch the product quickly. Clearly, JavaScript has taken over from Flash in the data viz universe, for a multitude of reasons, so using Flash is not going to wow anyone who’s familiar with d3, Protovis, or a handful of other open source libraries.

  2. Next is the use of a Java server to run the app – based on the default port (8082), this feels like a Tomcat instance built into the application. This means that it takes some time to launch the app and have it load in your browser. To AD’s credit, it has run flawlessly on my machine, and was a breeze to install. Still, the combination of a Java server and Flash may feel a bit awkward to a desktop user familiar with offerings from Tableau or other data viz vendors. It certainly will for the Excel crowd.

  3. For someone familiar with d3, or for that matter Tableau or Excel, AD will feel a bit constrained in terms of options; for example, it is a bit trickier to get colors to do what you want (a right click will get you the nearly meaningless Flash option settings). As someone with extensive Excel and Tableau experience, this is the most challenging element for me. Do not expect the same sort of capability from AD, although there are some options for customizing your charts and tables. In some ways, this makes the app feel outdated – modern visualization tools provide a great deal of flexibility by comparison.

Those are my three major observations after a few days of use, with number three covering a wide range of options that are either not available or that have been hard coded into the app, thus restricting or limiting your ability to create the chart of your dreams. Now, on to the positive stuff:

  1. Connecting to my data was an absolute breeze, at least after jumping through the Windows ODBC hoops (32 bit vs 64 bit). Once I figured out the correct ODBC executable on my machine, I was off to the races, and had a database connection within seconds. I use MySQL, so of course I needed the correct driver installed locally, but no issues there. After these steps were complete, I was able to view all my database tables, and then write some simple SQL code in AD to bring back the data I needed. Once connected, here’s what I saw:

  2. The GUI is easy to navigate as well, showing the available dashboards, including samples to get you started, as shown here:

  3. Charts are very attractive, and can be easily re-sized within a dashboard; in fact, charts are easier to re-size here than in a comparable Tableau dashboard, where you need to set up frames in addition to the charts. Here are some example charts that may wind up in one of the books:

  4. A number of options are available for each chart, using a menu-driven approach (definitely not JavaScript here!)

  5. AD also has a ‘Page By’ option which is great for cases where you want to replicate the same charts or tables across multiple instances of a variable. In my case, this could be by team or by season. Set the page up once, set the page by variable, and you instantly get the same charts populated with data specific to the individual page. Pretty slick! Tableau has a similar feature, and you can use Excel pivot table functionality in the same way, although I find the AD approach to be both more powerful and simpler.

  6. Exporting to a PDF or image (.png) format is also very simple. From my perspective, the image export is excellent, as it captures the entire panel view of multiple charts as one image.

  7. Finally, it’s very easy to create multiple panels within a single page layout, as well as to add additional layout pages. These features make for a cleaner, easier to navigate feel versus the many tabs used in Tableau or Excel.

That’s my take for now – all in all, AD is a great addition to the data viz toolkit, even with the Flash-based limitations. It won’t get you to some of the same places you can go with Excel, Tableau, or especially d3, but it is at least their equal in data handling and dashboard layout possibilities. If you work with .csv, Excel, or database data, give it a try!


A New Analytics Tool from MicroStrategy

Being the analysis geek that I am, I’m always on the lookout for anything new in the data analysis and visualization space. New insights, techniques, people, datasets, tools, etc. always intrigue me and help keep things fresh. Any time I come across something new makes it a rewarding day, especially when it leads to something that can help me with my baseball analysis.

So today was one of those days: I stumbled across a new tool by way of a blog I had also just discovered. The tool comes from MicroStrategy, one of the mainline Business Intelligence (BI) players in the industry. It is called Analytics Desktop and, quite remarkably, it is free (as in $0.00!). So of course I had to evaluate the latest (released October 2013) addition to this interesting space.

My previous impression of MicroStrategy was tepid at best, given my familiarity with their large scale BI installations for major corporations, including one I had previously used in the corporate world. It was highly structured, felt inflexible, and churned out canned reports that took too long to run. In short, I saw it as a dinosaur app, even several years ago, competing with the likes of Cognos and other major BI players in the world of operational reporting.

The new tool is clearly designed to compete with Tableau and other nimble, visually-oriented BI vendors who understand that the business analysis space has evolved (far) beyond applications that are controlled by the IT folks. Many of the old apps wind up gathering dust or are used to generate dull reports that shed little insight into the needs of the business. Good analysts have been bypassing those tools for years by dropping ad hoc data into Excel or (more recently) Tableau, where they can at least create some decent charts and tables without having to wage battles with unfriendly BI servers.

Enough of the background – what’s the early verdict? In a word, impressive! I’m only a couple hours in, but have managed to complete the installation, configure the connection to my databases, and begin playing with data. The user interface is easy to navigate, the charts are clean and nicely styled, and the ability to work with filters, pages, sorting, and more appears to make this a very powerful app. While some of my favorite chart types are not here – horizon charts, bullet charts, and a couple others – my first impressions are hugely favorable.

I’m going to give a more complete review soon, after I’ve had time to work through some of the ideas I had been intending for Excel. In the meantime, here’s a quick look at a dashboard I created using team level data.

Pretty sweet, isn’t it? The ability to combine multiple chart and data elements into a dashboard is one of the strengths of Analytics Desktop, and one that I expect to tap into in the coming weeks. Much more to come on this one.


Pennant Race Book Sample

For someone who’s never written a book, trying to get two books out in the same year has proven to be a unique challenge. I have to say I’m grateful that one of them has the structure and support of an established publisher, which provides me with a good bit of guidance as well as a timetable to work from. On the other hand, self-publishing the other volume allows me to stretch out a bit, make more frequent revisions, and eventually get to the book I envisioned 12 months ago.

So, on at least the fourth format revision (who’s counting?), I believe the main section of the pennant race book is now set, and only requires the updating of each season’s data to feed the template. Sometimes time away from a project leads to better solutions, as it did in this case, with some new formulas, improved graphics, and a greater degree of automation. I like the results, and hope that others will as well.

Here’s a quick look, and I’ve also provided an attached file (.pdf) if you wish to download a few seasons and get a feel for what I’m trying to accomplish. In the near future, I’ll have a legend page that will explain all of the charts you see on each page. Trust me, they do make sense if you know what it is you’re seeing!

Much more to come over the next 2-3 months as I try to get the book launched this summer/fall.
