-
Recently came across yet another innovative project that uses open source technology to drive knowledge sharing and productivity. This one is called SHIVA - from the University of Virginia. A brief description from their website:
SHANTI Interactive Visualization Analytics (SHIVA) is a web application that takes a new approach that makes it easy to use graphical and data-driven elements within websites. Elements such as data, charts, maps, images, timelines, and video are easily created in this freely available HTML5-based web-app.
Instead of building all these elements in-house, we drew inspiration from David Weinberger's Small Pieces Loosely Joined: A unified theory of the web, and have provided a simple and consistent interface to open source and open access tools on the web, such as Google's Visualization Toolkit and Maps, YouTube, Vimeo, and Kaltura videos, the SIMILE timeline from MIT, and images from ARTstor, Flickr, and Picasa.
Here's a chart done on their site and embedded here:
and a motion chart:
More to come, as I start to use this fun tool with baseball stats.
-
I'm continuing to learn more about the capabilities of Rickshaw (and d3, the underlying platform) and how effectively it can depict time series data. Last post I used an example directly from the Rickshaw site, with a promise to start getting real baseball numbers in future displays. So here we go, with a real basic time series showing wins per season for the Yankees and Red Sox. Note the big dips for both teams in 1981 and 1994 - the strike seasons where fewer games were played.
FYI - this is a work in progress, so you may notice some functionality improving as I tinker with this - like converting the dates to a simple year value versus the timestamp info currently displayed.
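For anyone curious about the date conversion mentioned above, here is a minimal sketch of the idea in Python rather than in the chart's own javascript. It assumes the series x values are Unix timestamps in seconds, one per season; the function names are mine and not part of Rickshaw.

```python
import calendar
from datetime import datetime, timezone


def season_to_epoch(year: int) -> int:
    """Return the Unix timestamp (seconds, UTC) for January 1 of a season."""
    return calendar.timegm((year, 1, 1, 0, 0, 0))


def epoch_to_season(ts: int) -> int:
    """Recover the plain season year from a Unix timestamp (UTC)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).year


# Round trip for one of the strike seasons mentioned above.
stamp = season_to_epoch(1981)
print(stamp, "->", epoch_to_season(stamp))
```

The same round trip could be applied to a hover label or axis tick so that viewers see a simple year instead of a full timestamp.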
-
This is a simple test using the Rickshaw charting library based on d3 (which I have written about previously). Hover over the data points to see associated values, or click the check boxes to toggle the 3 data series on and off. Pretty cool stuff - more evidence of the simple power of javascript and CSS for visualization purposes.
I look forward to doing a few creative charts populated with baseball data using this toolkit.
-
The folks at datavisualization.ch have come up with a slick page featuring some of their (and my) favorite analysis and visualization tools available today. Here's a peek at some of them - click on the image to go to the site and begin checking out any or all of the tools:
Many personal favorites are here, including d3, Protovis, and GeoCommons, along with a few that are new to me. Javascript is perhaps the dominant technology in these selections, but there are other options as well. Take a look, and start experimenting.
-
Browsing through the eyeo festival site to see exactly which sessions I want to sit in on, and wanted to feature a handful of those that are most relevant to my life and work. These choices are necessarily geared toward the visualization and data analysis spectrum, versus the more art/music/coding presentations that will be shared, although many of those appear to be fascinating as well.
Aaron Koblin has a session titled 'Data Arts', described as 'An overview of some recent projects and libraries created by members of the Google Data Arts Team.' Google is always up to something on the data front, so it will be interesting to see and hear what's new.
Fernanda Viegas and Martin Wattenberg of ManyEyes fame will discourse on 'Seeing Invisible Influences'. This sounds like a potentially fascinating topic as the eyeo site describes it: 'We’ll talk about how we use visualization to spark the joy of revelation–mapping the invisible forces that surround us, from social networks to the play of the wind. To sweeten the pot, we’ll show embarrassing outtakes from our design process.'
Amanda Cox of the New York Times will hold court on 'Complex, Big, Etc.', described as: 'So many of the words used to describe contemporary data visualization are so often very wrong. An examination of the claim “The future has an ancient heart,” through the lens of NYT graphics.' The NYT produces some of the best static graphics out there, so it will be interesting to hear her views.
Nicholas Felton will talk about data storytelling in his session 'A Man of Few Words'. Visual storytelling is a powerful tool when executed properly, so I look forward to learning some things here. His cursory description of the session: 'A survey of recent experiments with quantitative storytelling, the resulting projects and processes.'
'Near/Far' is the title of Jer Thorp's session, and sounds fascinating as well as relevant to all data analysts & visualizers. 'In this session, Jer will share a variety of new work that explores the concept and experience of location. He’ll show projects that engage with local, personal data, as well as visualizations of systems of astronomical size. He’ll discuss the importance of engaging with the character of data sets, and will share a variety of strategies and techniques for working with locational data. Along the way, he’ll share all kinds of tips and techniques, and probably tell a fair number of bad jokes.'
Manuel Lima heads up a panel discussion titled 'The Power of Networks'. Networks have assumed a critical role in analysis and visualization circles in recent years, largely due to the emergence of social networks like Facebook and Twitter. Here's the description: 'Network visualization has experienced a meteoric rise in the last decade, bringing together people from various fields and capturing the interest of individuals across the globe. As the practice continues to shed light on an incredible array of complex issues, it keeps drawing attention back onto itself. This talk will explore a critical paradigm shift in various areas of knowledge, as we stop relying on hierarchical tree structures and turn instead to networks in order to map the inherent complexities of our modern world. The talk will also showcase a variety of captivating examples of network visualization and introduce the network topology as a new cultural meme.'
The following morning, Moritz Stefaner will also address networks in 'OMG - It's All Connected'. The summary: 'Once again, Moritz will report from his practice as a Truth and Beauty Operator. This time, he will focus on the visualization of large networks – an important, but also difficult endeavor. We will learn how to avoid the notorious hairball visualizations, which promising new layout strategies have been developed, and how interaction can help to untangle intertwined interconnectedness in complex data sets.' Should be a great session.
Finally, Wes Grubbs will present on 'Generative Cognition and Memory', a session designed to explore the inter-relationship between humans and data. '“The pure and simple truth is rarely pure and never simple” – Oscar Wilde. Our comprehension and understanding of our surroundings and new information result from amazing processes within our brains. While Wes is far from a neural scientist, he will explain some of the inner workings of the human mind and how we can use this to visualize information, build user interfaces, explore and question everything in order to make sense of the perceived realities. While this path is anything but simple, Wes will provide creative-and technically-oriented minds with a machete of thought to hack through the complex jungle of story telling with data.' Great stuff.
While there are many other thought-provoking topics and presenters, these are the ones most likely to stand out for me, get my brain engaged, and lead to future creative bursts. Can't wait!
-
The next few weeks are likely to have a number of posts on the 2012 eyeo festival in Minneapolis. I've already managed a few posts, and with the festival another 6 weeks away, there are sure to be a handful of others, as I learn more about the skills and achievements of many of the presenters.
Today, though, I want to look at some of the venues where the festival will take place, as they are likely to play a pivotal role in the interactions that take place both within the structured events as well as in any informal or post-event gatherings.
First up is the Walker Art Center, Minneapolis' acclaimed modern art venue, where the daytime activities will be held.


The Walker should certainly present a stimulating atmosphere for the advanced technology and innovative ideas to be shared during the daytime sessions.
The Tuesday night setting is Aria at the Jeune Lune - an event space that formerly housed a celebrated theater company. This is a very industrial sort of space in a former warehouse within Minneapolis' Warehouse District near the Mississippi River. The exterior:
Aria will host the opening night keynote from Museum of Modern Art (MoMA) Senior Curator Paola Antonelli, the favorite curator of the visualization crowd due to the influential exhibitions she has put together at MoMA. This promises to be a grand kickoff to this year's festival, with a mixer and social time to follow the keynote. Given that the Warehouse District is home to some great food and beverage options, the evening could run into the wee hours.
The Varsity Theater in Dinkytown, near the University of Minnesota, plays host to the Wednesday evening keynotes and mixer. The Varsity has become renowned as a great spot for local artists as well as touring bands, and even as a wedding venue.
Thursday evening, the action moves to the Nicollet Island Pavilion, smack dab in the middle of the Mississippi River.
More to come on this great event, both leading up to, and of course, after attending.
-
Got a first look at the daily eyeo festival sessions, and have some hard choices coming up. There are so many talented folks presenting at this event - it's almost overwhelming.
For starters - Kevin Slavin or Shantell Martin? Ben Fry or Ayah Bdeir? Aaron Koblin or Manuel Lima? And that's just the first few hours of Day 1 of a 3 day roster of awesome talent. Not counting the opening night keynote from Paola Antonelli, Senior Curator in The Department of Architecture and Design at the Museum of Modern Art (MoMA) in NYC.
Whew!
For a visualization junkie, this festival is akin to a hoops fan seeing every game of March Madness, with great seats to boot. Can't wait! More to come as eyeo approaches.
-
Finally, after what seemed like months (it was really only a few weeks...) I have a new site up and running, if not fully baked. The most difficult part of the site creation was the selection of a CMS (Content Management System) that provided most, if not all, of the features I wanted and was no longer getting through Liferay, my site framework for the last few years.
So what motivated the change? Simply, that Liferay had become nearly impossible to blog with, and to update anything seemed to take forever, when it worked at all. I liked Liferay, but it simply was running too slow, and was quite honestly more than I need for the future. Bye Liferay - more good memories than bad, although the frustration level had really ramped up in the last year. I needed a nimble, flexible, lightweight site framework that still has more power than basic blogging software.
The tryouts began - Elgg, Hotaru, ModX, Elgg again, back to Hotaru, then ImpressPages, and Elgg yet again. I cast glances at Pligg, Dolphin, and a few others, and thought briefly about Drupal and Joomla, which I've used in the past. Something always seemed to be not quite right for what I needed, so I kept switching and testing, playing, and switching again.
Finally, I recalled Symphony, a lightweight yet highly flexible and extensible tool I had looked at a few months earlier. My previous reaction was "cool, but not sure how to get it started", so I gave it little further thought. Until last week, when I looked at it again, did some reading, and found an 'ensemble', the rough equivalent of themes for other CMS tools. Then it all started to click... Symphony made it exceptionally easy to create new blog entries - and new pages - and to embed other apps seamlessly in iFrames (one of the reasons I had chosen Liferay). And it did all these things quickly, plus it looked good, used Google web fonts, and allowed for almost limitless flexibility, most of which I have yet to tap into.
So Symphony it is, and will likely stay for a long while! Bear with me while I get everything up to speed, including adding new features, porting old blog entries, and figuring out the endless possibilities. Welcome to the new, faster, more flexible Visual-Baseball Project site.
-
Think I'm finally done tweaking the template for the series of Batting Explorers built using the Simile Exhibit framework. Since the last post, I've managed to modify a few things, including much cooler filtering capabilities, courtesy of a "pop-up" style filter panel. This provides the added benefit of greater display space for the individual batter "cards", which can now spread across the screen much more effectively.
A look at the filter panel overlaid on the results:

These are incredibly easy to create now that the design is finished, as evidenced by the fact that there are currently five different explorers available to play with, with more to come. Each new one takes between 5 and 10 minutes to run the data and tweak the data links and references. For now, there are explorers for the 1950s, 1960s, 1970s, 1980s, and 1990s.
Ultimately, another 6 or 7 will be online.
While a tutorial may be in the offing, why not just start playing with these? They're quite intuitive once you get going. Just go to the Collections page, and look for the Simile Exhibits collections, or any of the individual Batting Explorers.
-
Almost done with the template for my series of Batting Explorers, where users can manipulate more than 30 filters to discover and uncover batter data by team, season, position, and much more. Results can be sorted by more than 20 variables as well, making it a flexible tool for quickly locating information.
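To give a feel for what the filter-and-sort combination is doing, here is a rough Python sketch of the general idea. This is not how Exhibit works internally (Exhibit is javascript and configured declaratively); the field names and the batter records are made-up placeholders.

```python
from typing import Any

Batter = dict[str, Any]


def explore(batters: list[Batter],
            filters: dict[str, set],
            sort_by: str,
            descending: bool = True) -> list[Batter]:
    """Keep batters whose value for every filtered field is among the
    allowed values, then sort the survivors by one stat."""
    matches = [
        b for b in batters
        if all(b.get(field) in allowed for field, allowed in filters.items())
    ]
    return sorted(matches, key=lambda b: b[sort_by], reverse=descending)


# Placeholder records, not real player data.
batters = [
    {"name": "Batter A", "team": "Team X", "position": "1B", "hr": 31},
    {"name": "Batter B", "team": "Team X", "position": "SS", "hr": 12},
    {"name": "Batter C", "team": "Team Y", "position": "1B", "hr": 40},
]

for b in explore(batters, {"position": {"1B"}}, sort_by="hr"):
    print(b["name"], b["hr"])
```

Each additional filter simply narrows the set further, which is why stacking many filters stays intuitive for the user.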
The latest addition to the template is the ability to click on a batter's card to view his full stats at the great Baseball-Reference site.
Once this one is complete, it will be quite simple to create versions for other decades (note - the original is the 1980s) going all the way back to 1900 - maybe even earlier.

-
For those who don't know, eyeo is a visualization festival that takes place in Minneapolis this June. According to all reports, last year's inaugural festival was an amazing success, leading to this year's rendition selling out in about 6 hours!

The festival lineup is off the charts for anyone in the visualization space, with notable visionaries from around the globe presenting their slice of viz - whether it comes in the form of traditional data visualizations, or interactive art, music, or some combination of each. The attendees list is also quite impressive, based on the names I've seen thus far.
I'm looking at this as an opportunity to learn, share, and make contacts in the visualization space. The event schedule seems geared to bringing people together both during and beyond the official events, and with folks coming from all over the globe, it figures to be a very exciting event.
Unfortunately, I have to wait until June, but that gives me plenty of time to plan and to learn more from some of the great talents in the field.
-
I'm currently at work on the first Batting Explorer, an interactive tool where viewers can filter season level batting stats using more than 25 different filters. This project is being built using the great Exhibit tool, currently housed at simile-widgets.org. Within the Collections page on this site, you can find more than 50 game level explorers built using Exhibit; it is a pretty remarkable tool to work with, and I see more future use for it on this site.
It appears that there will be a batting explorer for each decade, as too much information slows the response time, and thus defeats the beauty of using javascript rather than a database to feed the pages. Here's a look:

Obviously, once the batting series is complete, I'll feel the urge to do a pitching series along the same lines, followed by whatever seems compelling at some future date.
As soon as the Batting Explorer is launched, I'll try to follow with a video tutorial to illustrate all the possibilities, but it is also easy to just jump in and start playing with all the filters to get a sense for the power that Exhibit provides.
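The decade-per-explorer split mentioned above is easy to picture as a data prep step. Here is a minimal Python sketch, assuming each season-level row carries a "season" field; the field name and the sample rows are hypothetical, and this is not the actual process used to build the explorers.

```python
from collections import defaultdict


def split_by_decade(rows: list[dict]) -> dict[str, list[dict]]:
    """Group season-level rows into buckets such as '1980s'."""
    decades: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        decade_start = row["season"] // 10 * 10  # 1987 -> 1980
        decades[f"{decade_start}s"].append(row)
    return dict(decades)


# Placeholder rows, not real stats.
rows = [
    {"batter": "Batter A", "season": 1979},
    {"batter": "Batter A", "season": 1980},
    {"batter": "Batter B", "season": 1989},
]

for decade, bucket in split_by_decade(rows).items():
    print(decade, len(bucket))
```

Each bucket could then be written out as its own data file, keeping the amount of data any single page has to load small.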
-
OK, so this is not a baseball post, but it does involve visualization, and it employs the great Exhibit toolkit from the MIT Simile project. The Information is Beautiful website, led by David McCandless, hosts monthly visualization challenges for data and visual geeks like myself. Trust me, there are some exceptional folks out there turning seemingly mundane information into incredible visuals.
Anyhow, the data provided for the current challenge is movie-based, with loads of information on movies released between 2007 and 2011 - genre, story, profitability, budget, audience score, and much more. Here's a screenshot of what I did with the info:

Entrants can use any part of the info, or all of it. Given that my entry is competing in the interactive division, I chose to show most, if not all of the information, by building a 'Movie Explorer' using Exhibit. Take a test drive here - personally, I find it a bit addictive and fun to play with, but maybe that's just me speaking as a proud parent. Caution - it seems to work best in recent versions of Chrome and Firefox - my version of IE has problems with the scripts.
-
A couple weeks back I began tinkering with interactive bar charts using the d3 javascript charting toolkit. d3 stands for Data-Driven Documents, and is the latest from the great Mike Bostock, former developer of Protovis, another great charting tool.
Anyhow, one of the samples on the d3 site used a drillable bar chart, where users can start at an aggregate level and then dig into the details by clicking on a given bar - a simple concept that is beautiful when d3 is the platform.
This has led me to my first chart, with more to come. The initial example looks at home runs by team for the 1995-2010 seasons, and begins at the season level. Once a season is selected, we drill into homers at the team level, and then at the individual batter level. It's fast, intuitive, and a heck of a lot more esthetically pleasing than your typical report output from any one of dozens of high-powered reporting tools.

To see it in action, click here
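For those wondering what sits behind a drillable chart like this, the key is nesting flat records into a hierarchy and summing at whichever level is on screen. Below is a rough Python sketch of that idea; it is not the d3 code behind the chart, and the rows are placeholders rather than real home run totals.

```python
from collections import defaultdict


def build_drilldown(rows):
    """Nest flat (season, team, batter, hr) rows as season -> team -> batter -> HR."""
    tree = defaultdict(lambda: defaultdict(dict))
    for season, team, batter, hr in rows:
        tree[season][team][batter] = hr
    return tree


def totals(level):
    """Return the HR total for each key of the level being displayed."""
    def total(node):
        # Leaves are plain integers; inner nodes are dicts of children.
        if isinstance(node, int):
            return node
        return sum(total(child) for child in node.values())
    return {key: total(child) for key, child in level.items()}


# Placeholder rows, not real data.
rows = [
    (2009, "Team X", "Batter A", 30),
    (2009, "Team X", "Batter B", 12),
    (2009, "Team Y", "Batter C", 25),
    (2010, "Team X", "Batter A", 28),
]

tree = build_drilldown(rows)
print(totals(tree))                  # bars at the season level
print(totals(tree[2009]))            # click 2009: bars per team
print(totals(tree[2009]["Team X"]))  # click a team: bars per batter
```

Each click in the chart corresponds to stepping one level down the tree and redrawing the bars from the new totals.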
-
I've previously posted on Orange, the wonderful open source project for statistical analysis and visualization. My latest foray involves using a variety of charts to examine success patterns by team, as measured by wins in a season. We'll talk first about scatterplots.
Scatterplots, for the uninitiated, are basically two dimensional charts that allow users to see the relationship between two elements - say, hits on one axis and runs on the other. They can be a great tool for quickly spotting correlations between elements (e.g., are more hits consistently associated with more runs over the course of a season?). In Orange, we have the luxury of adding 3rd and 4th elements to the picture, using the color of the markers, as well as the size of each marker. Here's an example, using BB and HR on the axes, and wins for the size and runs for the color of the markers.
Note that the sizes of the circles are not proportional to the number of wins, but rather scale according to the range of the data. This is helpful in spotting patterns, but could be considered a distortion if one adheres closely to Edward Tufte's advice.
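To make the sizing distinction concrete, here is a small Python sketch comparing the two approaches. It is only an illustration of the general idea, not Orange's actual sizing code; the radius limits are arbitrary, and 57 and 92 wins are treated as the ends of the data range purely for the example.

```python
import math


def proportional_radius(value: float, max_value: float,
                        max_radius: float = 20.0) -> float:
    """Radius for a circle whose AREA is proportional to the value."""
    return max_radius * math.sqrt(value / max_value)


def range_scaled_radius(value: float, lo: float, hi: float,
                        min_radius: float = 4.0,
                        max_radius: float = 20.0) -> float:
    """Radius interpolated linearly across the observed data range."""
    return min_radius + (value - lo) / (hi - lo) * (max_radius - min_radius)


for wins in (57, 92):
    print(wins,
          round(proportional_radius(wins, 92), 1),
          round(range_scaled_radius(wins, 57, 92), 1))
```

With range scaling, the smallest value in the data always gets the minimum marker size no matter how close it is to the largest, so differences look bigger than they really are. That exaggeration is what makes patterns pop, and also what a strict reading of Tufte would object to.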
The scatterplot enables us to spot some easy correlations; notice that virtually all of the teams with high walk totals (x-axis) are also big winners - the Yankees, Rays, Braves, and Red Sox were all at 89 or more wins for the season. So apparently, with the exception of the Diamondbacks, walks were a good predictor of wins (we're not accounting for the pitching side of the equation here, which derailed the Diamondbacks).
On the Y-axis, home runs also appear to have a positive relationship to wins, although perhaps slightly less than walks. And woe be to any team with low positions on both axes - this is where the worst teams in baseball wound up in 2010, including the 57 win Pirates, the 61 win Mariners, and the 66 win Orioles. No on-base ability, coupled with no HR power, seems to be a strong predictor of failure, even without knowing anything about pitching for these teams.
Let's change the axes for another view; we'll focus on defense in this case, putting Errors on the x-axis and double plays on the Y.
Notice any patterns? Our underachieving teams are heavily skewed to the lower right of the chart, committing lots of errors with few double plays. The pattern is perhaps less obvious than in our first example, but it is intuitive. We also see a pair of teams, the Tigers and Braves, who appear to have overcome some of their proneness to errors by turning a lot of double plays. At the other extreme, we see the Giants, with very few double plays but also the fewest errors in MLB, reaching 92 wins in spite of their low run scoring ability. Obviously, pitching helped, but it appears their defense provided relatively error-free support to their mound corps.
Finally, we'll take one more view - home runs (HR) versus home runs allowed (HRA).
Once again, there appears to be a fairly clear distinction, with the lower half of the display littered with the weaker teams that tended to surrender more homers than they hit themselves. Teams with a large positive home run differential (upper left of the chart) tended to have high win totals, although there were a few teams (Phillies, Rangers, Rays) with low differentials who still fared well, due to their strong pitching corps.
These are just a few examples created with Orange; I'll get these and many others out to the Collections page in the site as time permits. In summary, scatterplots, regardless of the user tool, are a great way to quickly view patterns in the underlying data, and the impact on a dependent variable such as wins.
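If you want a number to go with what the eye picks up in a scatterplot, a correlation coefficient is the usual companion. Here is a minimal Python sketch that computes a home run differential (HR minus HRA) and its Pearson correlation with wins. The team rows are invented placeholders, not the 2010 figures behind the charts above, and this is plain Python rather than anything Orange-specific.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (HR, HRA, wins) rows for illustration only.
teams = [(200, 150, 95), (180, 160, 88), (150, 150, 81),
         (130, 170, 70), (110, 180, 60)]

diffs = [hr - hra for hr, hra, _ in teams]
wins = [w for _, _, w in teams]
print(diffs, round(pearson(diffs, wins), 3))
```

A value near 1 or -1 means the points fall close to a straight line; a value near 0 means the scatterplot would look like a cloud. As with the charts, correlation here says nothing about the pitching side of the story.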
-
In this post, I want to take a little deeper look at d3, the javascript-based charting library from Mike Bostock of Protovis fame. For those who like to view information that is intuitively, accurately, and esthetically well presented, d3 offers a marvelous range of possibilities.
I previously spoke on Calendar Views, a supremely intuitive means of viewing information that can be measured at a daily level; today I'll look at a host of other approaches that I find most compelling for their potential to work with the baseball data that is my focus. Each chart type will feature a screenshot from the d3 site, as I haven't applied any of these approaches to the baseball data just yet.
OK, time to look at a few more chart types, with my brief synopsis of each.
Streamgraphs are an esthetically pleasing way to show information that might typically be shared via an area chart. However, as with all the d3 charts, there is far more flexibility available to the chart creator, albeit with some added complexity. The results, IMHO, are worth the trouble, compared to the traditional lifeless output we see from Excel or worse yet, Powerpoint charts. I have created thousands of Excel charts, and employed most of the available tricks and hacks, but there are still significant limitations to what one can do; for Powerpoint, forget about it as a useful charting tool.
Here's an example from the d3 site:

Not quite what we're used to seeing from Excel, is it? While the eye candy aspect is pleasing, I refuse to use charts only on that basis. Lord knows there are plenty of charts out there that have instant eye catching appeal, but they are the equivalent of Britney Spears versus the John Coltranes or Mozarts one can create with d3. All fluff, no substance, and destined to be banished to the scrap heap with all the other inaccurate, data distorting charts that preceded them.
As for potential uses with baseball data, I envision some historical tracking of hits data (singles, doubles, triples, home runs) for starters, and will doubtlessly find other relevant examples to share in the future.
Another chart type I intend to use is the Scatterplot Matrix. While this chart type is certainly not exclusive to d3, the ability to thoroughly customize the output makes its use extra appealing. This chart type fits into the category known as "small multiples" coined by the legendary Edward Tufte, and provides the user with a considerable wealth of information in a limited space, while simultaneously making the information easy to grasp.

I haven't figured out how to use these with baseball stats just yet - certainly there could be a multitude of ways to do so. Perhaps looking at multiple offensive categories from a batter's career would be one application; the individual data points could represent specific seasons or age levels associated with each category.
In any event, the scatterplot matrix, as well as other types of small multiples, are exceptional at providing large volumes of information in a limited space, making it easy for the observer to detect patterns and relationships - and isn't that the goal of most data display?
The Sunburst is part of a category of relationship-oriented displays available in d3, with the common goal of displaying relationships within a data set. In the case of the Sunburst, the data is presented in a more hierarchical visual form (versus a treemap, for example), while providing immediate insights into the interrelationship between entities within the data.

Once again, the d3 esthetics are most impressive, but the key is that the underlying data is not distorted or mis-represented by the beautiful display. We are quickly able to see which sub-elements flow up to a primary element, as well as which of the larger categories dominate the data, and thus the display.
I am certain that this display could apply to many different baseball stats; my job is to find where it makes the most sense from a viewer's perspective.
Lastly, for this post, is the bullet chart, created by visualization guru Stephen Few, and since added to a number of graphing toolkits and libraries. I have used these for several years within Excel, and was pleased to see their availability in d3, for they provide a compact, intuitive look at target-based data.
By target-based, I mean that we can set a target level - for example a .900 OPS, and see how specific players measure up to that goal. Bullet charts provide the ability to show the target, actual performance, and projected performance, all framed by relative performance levels (e.g., poor, average, good). Here's the d3 example:

Many baseball applications exist for the bullet chart - we could create a pitcher dashboard showing how a single pitcher compares to his career and league averages across a variety of statistical categories - WHIP, ERA, SO/9, etc.
The same would apply for batters, where we could rate them versus their career numbers, league averages, and so on, using batting average, OPS, OBP, and any number of other stats.
Once again, we have the potential for a "small multiples" type of display, where all the information the viewer needs is provided in a single page or screen view.
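The data behind a single bullet is simple enough to sketch. Here is a minimal Python version of the idea, using the .900 OPS target from above; the band cutoffs, the batter, and his OPS are made-up placeholders, and this is not the d3 bullet chart's own data format.

```python
from dataclasses import dataclass


@dataclass
class Bullet:
    label: str
    actual: float
    target: float
    # Ascending (upper_bound, name) pairs for the qualitative bands.
    bands: list[tuple[float, str]]

    def band(self) -> str:
        """Name of the first band whose upper bound exceeds the actual value."""
        for upper, name in self.bands:
            if self.actual < upper:
                return name
        return self.bands[-1][1]

    def meets_target(self) -> bool:
        return self.actual >= self.target


# Hypothetical cutoffs chosen only for illustration.
OPS_BANDS = [(0.700, "poor"), (0.800, "average"), (float("inf"), "good")]

batter = Bullet("Batter A", actual=0.850, target=0.900, bands=OPS_BANDS)
print(batter.label, batter.band(), batter.meets_target())
```

A dashboard is then just a list of these, one per stat or per player, drawn in a stack so the eye can scan down the targets.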
Well, that's it for now, but I'm certain to blog on other chart types within d3, as there are many more. Give it a look for yourself, even if just to understand the possibilities - d3 examples.
-
d3 is a relatively new javascript charting tool from Mike Bostock, one of the creators of the outstanding Protovis project, no longer under active development. But d3 has picked up the mantle, and made a number of significant improvements over Protovis, in my estimation.
Foremost among the improvements for me is the introduction of several new chart libraries, one of which I want to focus on here.
The Calendar View provides an ingenious way to show datasets that would typically be displayed in a line chart, due to the sheer volume of data points. The d3 calendar view helps us display daily data by year, month, and day, while simultaneously displaying results in a weekly grid. In addition, the color coding provides a quick and intuitive way to see patterns in the data. Here's an example, from the d3 site:
http://mbostock.github.com/d3/ex/calendar.html
The d3 example is built using stock market data, but the same approach can be used for any daily results - in a baseball context, this could be runs scored by game, winning (or losing) margin in each game, or even the number of pitches thrown per game by the starting pitcher.
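The heart of a calendar view is mapping each date to a cell in a weekly grid. Here is a minimal Python sketch of that mapping, assuming one column per week of the year and one row per weekday with Sunday at the top; that layout is my assumption for illustration, not a description of the d3 example's code.

```python
from datetime import date


def calendar_cell(day: date) -> tuple[int, int]:
    """Return (week column, weekday row) for a date within its year.

    Rows run Sunday=0 through Saturday=6; column 0 is the (possibly
    partial) week containing January 1.
    """
    jan1 = date(day.year, 1, 1)
    # Python's weekday() has Monday=0, so shift to make Sunday=0.
    row = (day.weekday() + 1) % 7
    jan1_row = (jan1.weekday() + 1) % 7
    day_of_year = day.timetuple().tm_yday - 1  # 0-based
    column = (day_of_year + jan1_row) // 7
    return column, row


print(calendar_cell(date(2011, 1, 1)))
print(calendar_cell(date(2011, 4, 1)))
```

Once every game date has a cell, the value for that day (runs scored, margin, pitch count) just picks the color of the cell.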
I'll look at some of the other d3 libraries soon, and begin incorporating them into the Collections page on the site.



