2021 Data is Here!

Happy day! Just finished uploading the 2021 baseball dataset from the Lahman baseball archive and Baseball-Databank, just in time for the 2022 season. Next step is inserting and updating the existing tables (with data back to 1901!) with the 2021 season stats. I can then move on to the fun side of the equation – updating existing visualizations and creating some new analyses and visuals. Stay tuned!
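
For readers wondering what an insert-and-update step like this can look like, here is a minimal Python sketch using the standard sqlite3 module. The table name, column names, and sample rows are hypothetical placeholders, not the actual schema or data behind this site, and SQLite simply stands in for whatever database is in use.

```python
import sqlite3


def upsert_batting(conn, rows):
    """Insert season batting rows, replacing any row that shares the same key."""
    # Hypothetical table: one row per player, season, and stint.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS batting (
            player_id TEXT NOT NULL,
            year_id   INTEGER NOT NULL,
            stint     INTEGER NOT NULL,
            hr        INTEGER,
            PRIMARY KEY (player_id, year_id, stint)
        )
        """
    )
    # INSERT OR REPLACE adds new rows and overwrites rows with a matching primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO batting (player_id, year_id, stint, hr) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")

# Existing data (made-up values for illustration only).
upsert_batting(conn, [("player01", 2020, 1, 10)])

# New load: one corrected 2020 row plus one new 2021 row.
upsert_batting(conn, [("player01", 2020, 1, 12), ("player01", 2021, 1, 30)])

season_rows = conn.execute(
    "SELECT year_id, hr FROM batting ORDER BY year_id"
).fetchall()
```

Running the load twice with the same rows leaves the table unchanged, which is handy when a season file has to be re-imported after a correction.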

Welcome to 2022!

I, for one, am looking forward to 2022 after a couple of interesting, often challenging years that dampened my desire to generate analytics and data visualizations. The less said the better – I’m simply excited to get back to updating some existing visuals and adding a host of new ones.

I’ll be doing a lot of work using the Exploratory toolkit, which keeps improving by the day. It is simply a great tool for handling large (or small) data sets from start to finish; I especially love its data wrangling capabilities.

On the data source side, Retrosheet and the Lahman database will continue to feed my analysis and visuals; none of what I create would be possible without these great resources. Retrosheet data (used for game-level and play-level detail) is already updated through the 2021 season; part of this year’s plan is to add older years (pre-1955) to my local database. The Lahman data (season-level) is typically available around February, and I’ll be downloading it to my databases at that time.

Stay tuned for updates throughout 2022 – they should be a lot more frequent than over the last two years. Happy New Year!

Visual-Baseball Project Site Updates Continue

Updates to the Portfolio section of the VBP site continue, in an effort to reverse some lost functionality in the wake of one or more WordPress updates. The plus side of this setback is that the updates allow us to introduce a more easily maintained infrastructure with improved usability. Users can now search and scroll through content links while also accessing pages through an enhanced menu system.

Here’s an example of the new Portfolio menu:

Selecting one of the menu items will take you directly to a relevant page, now composed of a brief intro as well as an example of the visualization type, as seen here:

The lower half of each page will now have a searchable list format, with both a link and a description of the associated content. Users can also adjust the pagination settings to show the desired number of links to view at one time:

These enhancements should make it far easier to navigate the site and view the desired content. Once these are complete, I can begin to deliver new content; the 2018 game and season files will soon be upon us, and much of the content is itching for an update. In the meantime, enjoy what’s already here!

Recapping 2017

Observers of this blog will note that posts were scarce in 2017 – in fact this is the only one, and it’s being completed in 2018! This is the result of a variety of causes, including external projects, busy schedules, and focus that was shifted in other, unrelated directions. Still, 2017 was not without its moments.

For starters, I managed to create three data visualization courses for Packt:

Learning Data Visualization

Data Visualization Techniques

Advanced Data Visualization

Retrosheet data for the 2016 and 2017 seasons has also been downloaded, and is in the update process as we speak, which will enable some new visualization work (and perhaps a new book title) in 2018. Soon, annual season data from the Baseball-Databank and Sean Lahman will be available as well.

I’m also in the process of launching a new site at jazzgraphs.com, where I’ll use network visualizations to uncover the complex web of relationships between jazz musicians, labels, and recordings. Posters and a book are in the plans for 2018, so stay tuned.

Wishing all a happy and prosperous 2018, and I promise more content to come this year!

Visual-Baseball Site Refresh

This one wasn’t in the plan, at least not so quickly, but I decided to go ahead and do a site update with a new theme. The new layout has a little more visual style, particularly on the landing page, and provides greater flexibility in some key areas. While the site has been updated, all content remains available, and seems to be easier to get to using the menu tab, archives, and recent posts.

The next step is to produce more content, which I’m eager to do…after I get caught up on my chapter submissions for the Gephi book. Some upcoming posts may deliver some insight into using Gephi, or into network graph analysis in general, but that’s not a certainty at this point. In the meantime, take a look at the site, and let me know your thoughts.

New Year – New Site

The holidays and New Year always bring thoughts of what we can do better for the future. Traditionally, this has been in the form of resolutions – to lose weight, to exercise more frequently, to drink less coffee (tea, beer, etc.). For me, the holidays are when I take the time to assess what I have planned (or would like to achieve) for the upcoming year, and how I can create more time to do some things that will help me grow personally and professionally. The challenge is to achieve this without detracting from family time or the day job, or even sleep time. In short, how do I get more time out of the existing hours in the day?

A Better Way to Share My Work

I’m always trying to find better ways to show data, better tools to display the results, and better ways to share any insights. Over the last few years it feels like far more progress has been made on the first two than on the third. There are now so many great techniques and tools available through the open source community to help accomplish many tasks. In some cases there are almost too many choices (not that I’m complaining – I like complexity!) for learning more about information display and actually creating some clever output. I’ve written many times about some of the valuable tools such as d3, nvd3, Orange, Protovis, Sparklines for Excel, R, and so on.

The challenge has always been in the best way to share the results of my explorations. Over the past few years, I’ve worked extensively with Omeka, an open source tool geared to museum-oriented displays using items, collections, and potentially, exhibits. Every visualization I create becomes an item, groups of similar items become collections (say, all pennant races), and then items with a similar theme from different collections can be combined to create an exhibit (or exhibition).
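
The item, collection, and exhibit relationship described above can be pictured with a small Python sketch. This is only an illustration of the idea: the class, function, titles, and theme labels are all hypothetical, and none of this is Omeka’s actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One visualization, filed under a single collection (hypothetical model)."""
    title: str
    collection: str
    themes: set = field(default_factory=set)


def build_exhibit(items, theme):
    """Gather every item that carries the given theme, whatever its collection."""
    return [item for item in items if theme in item.themes]


# Made-up items spread across two collections.
items = [
    Item("Pennant race chart A", "Pennant Races", {"close finishes"}),
    Item("Game summary B", "Game Summaries", {"close finishes"}),
    Item("Pennant race chart C", "Pennant Races", {"runaway seasons"}),
]

# An exhibit pulls together same-themed items from different collections.
exhibit = build_exhibit(items, "close finishes")
exhibit_collections = sorted({item.collection for item in exhibit})
```

The point of the sketch is that a collection groups items by type, while an exhibit cuts across collections by theme.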

This is all well and good from an archival standpoint, but it sometimes falls short of being the best option for users, so I am constantly seeking other options. Omeka is also designed to house existing content created using other apps, but doesn’t have the pure flexibility of (say) HTML5 for creating interactive charts and visualizations that really rock.

Which is all a rather long-winded introduction to my latest toy, which is a menu-driven site that I hope will fill the void, enabling me to deploy more and better interactive visuals built using d3 and other tools. For now, some of the existing work I’ve created is already on board, albeit with some long overdue face lifts. Check out the Batting Explorer, a semantic filtering tool built using Simile Exhibit that lets you explore a decade’s worth of batting information using a baseball-card style layout:

..and with filters…

How about game summaries, also built using Exhibit? These summaries give you a glimpse of every game played in a season, with filters for pitchers, hits, runs, homers, and much more:

..and with filters…

Some more recent work is also available, such as the collection of more than 350 interactive pennant race charts built on d3 and nvd3:

Obviously, there’s much more to do here, as I’m in the beta version at best. Please go ahead and play with it and let me know if you have any thoughts or comments. Just follow the link to Portfolio. Thanks!

Reworking the Analysis Lab

Less than two weeks after upgrading the SpagoBI software that powers the Visual-Baseball Analysis Lab, I’ve decided to move in a different direction. After spending many hours over the holiday weekend examining all the options, I’m electing to move forward using d3 and nvd3 as the primary tools for the future version of the lab.

Here is some rationale behind my decision:

  • SpagoBI, being a Java-based platform, has a huge footprint, which means that I need to maintain a Tomcat web server with plenty of memory and storage space. Given the limited usage of the lab over the last two years, this no longer seems like a good tradeoff.
  • It takes too long for the app to launch after logging in. I could spend more time addressing this, but again this is not time well spent. I would rather have an application (or applications) that is fast and easy for site visitors to use.
  • Java-based platforms, including SpagoBI, Jasperserver, and Pentaho, all lean toward production-oriented reporting. This makes sense, given their corporate audiences, but is no longer the best option for what I hope to accomplish with the VBP site. I need a less rigid model with greater growth potential.
  • d3 and other JavaScript alternatives provide far more flexibility to create impactful visualizations using an endless variety of chart types. The Java apps simply cannot compete on this front.
  • Most of my recent efforts have been created using d3 and nvd3, so it makes sense to leverage these tools even more, and to spend a higher percentage of my limited time using the most effective tools.

I will miss certain elements within Spago, and in the general BI model, such as OLAP cubes and parameterized reports. Perhaps these will reappear in some form in the future. On the flip side, I certainly won’t miss stack errors, re-booting Tomcat when the app crashes, and a few other annoyances that seem to be standard fare with Java. There are still some worthy Java apps, including Spago, but the time has come to move forward. More to come.

It's Upgrade Week!

Why stop with one major software upgrade when you can do three in the same week? Right on the heels of moving from SpagoBI 3.6 to 4.0 for my Analysis Lab, I just upgraded the Collections portion of the Visual-Baseball site with the latest version of Omeka. As luck would have it, my old, heavily customized theme no longer works with the new version. Not to worry! Thanks to the power of CSS, I was able to tweak a new theme in an hour or two to get approximately the same look and feel. In fact, the new look may be slightly improved. Check it out at VBP Collections.

Here’s a screenshot of the updated Collections site:

The third and final upgrade for the week is the latest version of Orange, one of my favorite stats and visualization tools. Version 2.7 has a completely new look and feel in addition to some new capabilities. Of course, with Orange residing on my desktop and not the web, this was the easiest and least risky of the three upgrades, but one that I’ve been looking forward to for several months.

I’ll soon report back on the new version of Orange with a walkthrough of the feature set and the all-new GUI. And just maybe I’ll go a few weeks without any further upgrades, but don’t bet your life savings on it!

SpagoBI 4.0 Redux – Working!

It’s working! All right, so it isn’t exactly ready for prime time yet, but my Analysis Lab reports and analysis views have made a successful (and remarkably easy) leap to the new version of SpagoBI. I still need to re-apply some cosmetic tweaks, but I’m quite pleased with the new look and feel the Spago folks have brought to the tool.

I especially like the sleek vertical menu down the left side of the screen, as well as the icon choices. The general impression is that there’s more room for the actual reports and analysis views, which is where the emphasis should be. The report parameters are also smartly placed on the right edge of the screen, and other icon-based functions take up very little real estate. So far, so good.

I’ll report more as I tweak the CSS settings, get the content optimized, and have it ready to roll. Next up is an exploration of the new SpagoBI Studio, which should help me to upgrade the content that winds up in the lab. I’m eager to explore the new features in v4.0, which I’ll post about if I get really fired up!
