Data Visualization, Aesthetics and Intuition

As I worked through a just-completed project chronicling the diverse musical career of Neil Young, some valuable (if unintended) insights were reinforced once more. I work on a regular basis with a variety of large datasets that require analysis, interpretation, and ultimately visualization and presentation. Often, these goals are not easily reconciled, which leads to unsatisfactory results in one or more of these areas.

As much as we as analysts need to depict the data accurately and meaningfully, if we don’t do so with an attractive visual approach we risk not having our message get communicated at all. Merely presenting our data in a table may technically get the job done, but is also likely to bore the reader to tears while simultaneously failing to deliver the key messages. At the other extreme, we can pull out individual bits of the data and spend our time creating flashy infographics that may capture attention but fail to represent the data in its proper context. All flash, no substance. Neither approach is terribly effective.

At the same time, we may present all of the information using a reasonable visual approach that preserves the integrity of the data while still falling short of creating a fulfilling user experience. This is what I recently experienced with the Neil Young project, as I’ll detail below.

After spending a few days getting the data from the AllMusic site into Excel, and eventually as node and edge files into Gephi, it was finally time to create the network data visualization. I was determined to attempt one of the many force-based methods used in network graph analysis to create the graph. These methods are very popular and useful for creating graphs out of a variety of data networks, allowing viewers to see the larger patterns at work within the data.
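For anyone curious what "node and edge files" look like in practice, here is a minimal sketch in Python. The sample rows are illustrative rather than the actual AllMusic data, the `Type` column and the function names are my own assumptions, and the `Id`/`Label` and `Source`/`Target` headers follow the column names Gephi's spreadsheet import conventionally looks for.

```python
import csv
import io

# Illustrative sample rows only; the real project used data gathered from AllMusic.
albums = [
    {"album": "Harvest", "year": 1972, "styles": ["Country-Rock", "Folk-Rock"]},
    {"album": "Rust Never Sleeps", "year": 1979, "styles": ["Hard Rock", "Folk-Rock"]},
]


def build_tables(albums, artist="Neil Young"):
    """Return (node_rows, edge_rows) for a node file and an edge file."""
    nodes = {artist: "artist"}  # node name -> node type (hypothetical attribute)
    edges = []
    for row in albums:
        nodes[row["album"]] = "album"
        edges.append((artist, row["album"]))  # artist -> album
        for style in row["styles"]:
            nodes[style] = "style"
            edges.append((row["album"], style))  # album -> style
    node_rows = [{"Id": n, "Label": n, "Type": kind} for n, kind in nodes.items()]
    edge_rows = [{"Source": s, "Target": t} for s, t in edges]
    return node_rows, edge_rows


def to_csv(rows):
    """Serialize a list of dict rows to CSV text, header first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


node_rows, edge_rows = build_tables(albums)
print(to_csv(node_rows))
print(to_csv(edge_rows))
```

Each style appears once in the node table no matter how many albums share it, which is what lets the graph show albums linking to a common style.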

After a few iterations, I wound up with a serviceable graph that covered most of the basics I spoke of earlier – all the data was exposed, element types were sized and color-coded for easier interpretation, and the project was navigable via the web. Here’s a look:


Not bad, but there was something nagging at me as I viewed it, tweaked it, played with the styling, and so on. Everything was technically fine, but something was missing. So back I went to Gephi to find the answer. The next day, it occurred to me – I was using the wrong approach for the type of data I was trying to depict. Where the force-directed approach is ideal for dense, social media type networks, this was a unique network that didn’t possess the same structure. Therefore, it was not as aesthetically appealing or as intuitive as it could be.

After iterating through a few approaches, I came across a winner that best exploits the structure of the underlying data while conveying a far more intuitive feel to the end user. Why not have Neil at the center of the graph, surrounded by all of his albums, ordered by release date? On top of this, I could then have the style and mood data form an outer ring, as they needed only to link to the albums in some fashion. Now we have something that conveys the same information as the first attempt, but in a much more pleasing layout relative to this dataset. See for yourself:


The new version addresses the issues of aesthetics and intuition where the first graph fell short. All moods and styles are now easily found; the same is true for all albums. Highlighting a single mood (or album) also provides an information-rich view for how the music changed over periods in Young’s career. This was nearly impossible to see in the initial layout.
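The ring idea is simple enough to sketch by hand. The following is a rough Python illustration, not the layout Gephi actually computed: the function names, radii, and sample albums are my own assumptions. It puts the artist at the origin, albums on an inner ring ordered by release year, and styles or moods on an outer ring.

```python
import math


def ring_positions(names, radius, start_angle=math.pi / 2):
    """Place names evenly around a circle, clockwise from the top."""
    n = len(names)
    positions = {}
    for i, name in enumerate(names):
        angle = start_angle - 2 * math.pi * i / n
        positions[name] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


def radial_layout(center, albums, outer, inner_radius=1.0, outer_radius=2.0):
    """albums is a list of (title, year); outer is a list of style/mood names."""
    ordered = [title for title, year in sorted(albums, key=lambda a: a[1])]
    layout = {center: (0.0, 0.0)}
    layout.update(ring_positions(ordered, inner_radius))
    layout.update(ring_positions(sorted(outer), outer_radius))
    return layout


# Small illustrative sample, not the full discography.
albums = [
    ("Harvest Moon", 1992),
    ("After the Gold Rush", 1970),
    ("Rust Never Sleeps", 1979),
    ("Harvest", 1972),
]
layout = radial_layout("Neil Young", albums, ["Folk-Rock", "Hard Rock", "Country-Rock"])
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Because the albums are sorted before being placed, reading clockwise around the inner ring walks through the career in order, which is what makes the highlighted view of a single mood or style easy to interpret.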

So the message is this – visualizations don’t need to sacrifice aesthetics and intuition in order to be effective; rather, they should take advantage of these attributes to increase their appeal and impact. Don’t be afraid to experiment until you find the right formula, as it seldom presents itself the first time around, and trust your instincts.


Mastering Gephi Book Update

Lest anyone think of me as a full-time author, rest assured that the likes of J.K. Rowling are not trembling in fear. Even if I had the ability to conjure up creative plots, I type too damn slow to make it as a full-time literary lion. Fortunately, I don’t have to depend on my keyboarding skill (or lack thereof) as a full-time pursuit. Which brings me around to my topic – the current book I’m authoring on Gephi.

For those not exposed to networks and network analysis, Gephi is a French-based open source project that makes it possible for all sorts of users (including moi) to create interesting graphs from connected datasets. By connected I am referring to data where the individual nodes are connected in some way, shape, or form. This could be anything from movie actor databases to Facebook friend networks to baseball player connections. Anyone with a spreadsheet full of data and a bit of effort and persistence can use Gephi to create cool looking graphs that also tell a story of some sort.

My job in writing the book is to help people make sense of all the features and capabilities within Gephi, some of which are a bit complex to master. In the process, I get to learn more about the theory behind network analysis, and with it terms such as contagion, diffusion, clustering, and homophily. It’s really fascinating if you’re into understanding how people and institutions interact, contagion processes function, or how product adoption can be affected by the structure of a network. My higher math skills are not good enough to be at an academic level with this stuff, so I have to compensate with some logic and visual acuity.

Anyhow, here’s some of the stuff that Gephi can create:




I’m hoping that one of these images will serve as the book cover come publishing time, which should be sometime this fall. In the meantime, I have six more chapters to write (of 10 total), and will have the added joy of working through chapter edits where others catch the mistakes I’ve made.


Off to the 2014 Eyeo Festival

Once again, I am blessed to be heading back to the Eyeo Festival in beautiful Minneapolis, my third consecutive year attending. My excitement has been a bit more measured this time around, although the last few days have been filled with anticipation at hearing, seeing, and possibly meeting some of the best visualizers the world has to offer.
Eyeo Festival
There are a few repeaters from Eyeo 2013 and even Eyeo 2012 who I can’t wait to see again, including the estimable Mike Bostock of d3 and Protovis fame, Nicholas Felton (he of the annual Feltron Report), and Martin Wattenberg and Fernanda Viegas. This year’s lineup also sports a few newcomers who are doing fantastic work, including Santiago Ortiz, Burak Arikan, Cesar Hidalgo, and Eric Rodenbeck. For a full roster of speakers, click here: Eyeo speakers.
Plus, at what other venue could you possibly have a keynote speech delivered by an 85-year-old pioneer of algorithmic art? I’m eagerly anticipating my annual week of mind expansion, all done with nothing stronger than local Minnesota beer.


3 More MLB Network Graphs

Getting rolling now, using a templated approach to create a handful of franchise graphs, with many more to come. The first five cover the Tigers, Cubs, Red Sox, Dodgers, and Giants, showing all the connections between players from 1901-2013 within each franchise’s history. All credit is due to Gephi, the ARF layout, and the Chinese Whispers clustering algorithm. Data is courtesy of Sean Lahman’s baseball database. I’m merely the conductor who gets to bring these great tools together.
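To give a feel for the data preparation behind a franchise graph, here is a small Python sketch. It assumes that two players are "connected" when they appear on the same team in the same season, which is my reading of the setup rather than a documented rule, and the rows and function name are hypothetical stand-ins for what would come out of the Lahman database.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (player, team, season) rows standing in for real roster data.
appearances = [
    ("Player A", "DET", 1984),
    ("Player B", "DET", 1984),
    ("Player C", "DET", 1984),
    ("Player A", "DET", 1985),
    ("Player B", "DET", 1985),
    ("Player D", "DET", 1985),
]


def teammate_edges(appearances, team):
    """Count shared seasons for every pair of players on one franchise."""
    rosters = {}  # season -> set of players on the team that season
    for player, t, season in appearances:
        if t == team:
            rosters.setdefault(season, set()).add(player)
    weights = Counter()
    for roster in rosters.values():
        # Sorting keeps each undirected pair in one canonical order.
        for a, b in combinations(sorted(roster), 2):
            weights[(a, b)] += 1
    return weights


edges = teammate_edges(appearances, "DET")
for (a, b), shared in sorted(edges.items()):
    print(a, b, shared)
```

The resulting pair counts can serve as edge weights, so long-time teammates end up more strongly tied than players who overlapped for a single season.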

Here’s the roster if you want to go to a single graph, or you can go to the network graphs gallery on my website:

Check them out and let me know what you think.


Final Network Graph MLB Template

After what seemed an eternity (in reality, just 10 days), I’ve settled on a template and formula for depicting the player networks for MLB teams dating back to 1901. Throughout this process, the hometown Tigers have been my trial balloon to see if and how this idea would work. I’m happy to report that the idea not only works, but it makes for a beautiful (and highly addictive) interactive graph.

After several days of testing a variety of graph algorithms, I’ve landed back at the ARF method used for the Octavio Dotel graphs created earlier this year. There’s something about a circular layout that is visually appealing and informationally dense at the same time. Players are clustered by color, reflecting the primary peer group they belong to, although many will connect across two or more groups. The size of each player node reflects the number of seasons played with the team. Alan Trammell and Ty Cobb have large nodes, while Eddie Miller has a very small node, reflecting his single season in a Tigers uniform. Check it out for yourself: UPDATE: Node Sizes not behaving as planned – still tweaking

Tigers Network Graph

To play with the live version, click here.

It took a while to get a satisfying result, but after setting a few parameters in Gephi and tweaking some options I’m thrilled with the graph. Now I’m poised to do the same for all MLB franchises, using the same settings to allow each franchise’s patterns to come to the fore. I’m eager to create the entire series of graphs, and to start assessing the differences and how they relate to team success patterns.
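On the node sizing mentioned above (size reflecting seasons played with the team), one straightforward way to think about it is a linear scale between a minimum and maximum size. The Python sketch below is only an illustration of that idea: the function name, size range, and season counts are assumptions, and it is not a description of how Gephi ranks node sizes internally.

```python
def scale_sizes(seasons, min_size=5.0, max_size=50.0):
    """Map season counts linearly onto a node-size range."""
    lo, hi = min(seasons.values()), max(seasons.values())
    if hi == lo:
        # Everyone has the same tenure, so use one mid-range size.
        return {player: (min_size + max_size) / 2 for player in seasons}
    span = hi - lo
    return {
        player: min_size + (count - lo) / span * (max_size - min_size)
        for player, count in seasons.items()
    }


# Hypothetical season counts for three players on one franchise.
sizes = scale_sizes({"Veteran": 21, "Regular": 11, "One-Season": 1})
print(sizes)
```

A linear scale keeps a one-season player visible while still letting the franchise mainstays dominate; a square-root or log scale would be a reasonable alternative if the largest nodes crowd out the rest.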


Octavio Dotel Network Draft 2

A few days ago I shared a post about creating a network graph detailing the many travels of former Tigers pitcher Octavio Dotel in his Major League Baseball career – 13 franchises over a 15 season span. I now have a live graph on my website, complete with a search box, hover capabilities, and easy clicking on links to narrow the graph to a manageable number of nodes.

Gephi provides the base functionality for the network creation and the excellent Sigma-js plugin converts the original network to a highly interactive web-based one. All I have to do is get the data into Gephi, choose a suitable algorithm, and maybe tinker a bit with the style settings in Sigma, and presto! a slick graph is created. Here’s a static look, but to really appreciate the beauty of the interaction, navigate to the Dotel visualization. The live version lets you mouse scroll to resize the image so you can zoom in for greater detail, in addition to the other navigation functions already mentioned. I’m not done yet, as some more data elements are coming, but the basic look and feel should remain unchanged. Enjoy!


The Amazing Octavio Dotel – Draft 1

For whatever reason, I recently had an epiphany about creating a data viz that would track the career of Octavio Dotel, the veteran pitcher who has managed to pitch for nearly half the franchises in Major League Baseball. Between 1999 and 2013, Dotel pitched for no fewer than 13 teams, including multiple seasons where he pitched for two or even three teams in the same season. No shortage of potential angles for this one – number of teams, number of other pitchers he pitched with, how many of his former teammates are still active, etc.

After a few days of thinking about this, and making sure my data was up to date, I finally began creating a graphic using Gephi, the open source network graphing tool I recently authored a book on. Over the course of several posts, I’m going to share what could be considered (if I were an artist) as sketches leading to the final work.

So here comes the first ‘sketch’. Given the desire to create a graphic that is easy to view digitally (as opposed to a print version), I quickly determined that an algorithm that created some sort of circular graph would be best. After some experimentation, I chose the ARF (Attractive and Repulsive Forces) method using Gephi. Like many network algorithms, ARF draws similar nodes together while pushing unrelated nodes farther apart, resulting in a graph that is not only visually striking but also quite intuitive. I’ll talk more about that as the graph evolves toward a finished state.
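To make the attract-and-repel idea concrete, here is a bare-bones Python sketch of a single iteration of a generic force-based layout. To be clear, this is not the ARF algorithm itself or Gephi's code; the force formulas, constants, and function name are simplifying assumptions chosen only to show connected nodes pulling together while all nodes push each other apart.

```python
import math


def force_step(positions, edges, attraction=0.1, repulsion=0.5, step=0.1):
    """Return new positions after one round of attraction and repulsion."""
    forces = {node: [0.0, 0.0] for node in positions}
    nodes = list(positions)

    # Every pair of nodes repels, more strongly when close together.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = positions[a][0] - positions[b][0]
            dy = positions[a][1] - positions[b][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
            push = repulsion / dist ** 2
            forces[a][0] += push * dx / dist
            forces[a][1] += push * dy / dist
            forces[b][0] -= push * dx / dist
            forces[b][1] -= push * dy / dist

    # Connected nodes attract, more strongly when far apart.
    for a, b in edges:
        dx = positions[b][0] - positions[a][0]
        dy = positions[b][1] - positions[a][1]
        forces[a][0] += attraction * dx
        forces[a][1] += attraction * dy
        forces[b][0] -= attraction * dx
        forces[b][1] -= attraction * dy

    return {
        n: (positions[n][0] + step * forces[n][0],
            positions[n][1] + step * forces[n][1])
        for n in positions
    }


# Two linked nodes that start far apart drift toward each other.
linked = force_step({"A": (0.0, 0.0), "B": (10.0, 0.0)}, [("A", "B")])
# Two unlinked nodes that start close together drift apart.
unlinked = force_step({"A": (0.0, 0.0), "B": (1.0, 0.0)}, [])
print(linked, unlinked)
```

Running a step like this many times, usually with a shrinking step size, is what lets the larger structure of a network emerge on screen.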

With that said, here’s the initial take:

Octavio Dotel Draft 1

Now, for some explanation of what you’re seeing. In the center (the largest blue node) is Octavio Dotel; since the graphic is about his connections, it’s only fair he gets top billing. The next level of nodes is depicted by slightly smaller circles. These represent each of the teams Dotel has performed for, with a single node for each season. The teams are color coded to resemble the actual team colors. If you look at the top center of the graphic, you will notice five identically colored circles, covering the five different seasons Dotel toiled for the Houston Astros, from 2000 to 2004. Most of the teams will have just a single season, while a few others have two.

Beyond the team level, you will see a few hundred smaller nodes, each one representing a single pitcher Dotel crossed paths with over the course of his career. Some of these nodes will be positioned between teams, indicating that a pitcher was part of multiple teams with Dotel.

As I continue to work on this, I’ll add some notation, reference salient features in the graph, and do some additional color coding at the pitcher level. Stay tuned.


Gephi Book Now Available!

I’m pleased to announce that my first book has been published (thanks to all at Packt Publishing!) and is now available online.

Network Graph Analysis and Visualization with Gephi provides a gentle introduction to the world of network graph visualization using Gephi, a powerful open source tool. In this post, I’ll walk you through a few examples from the book to illustrate how you can begin creating your own network graphs with Gephi.

Before diving into any specific examples, I want to give you an idea of what the book covers, so here’s the Table of Contents:

  • Preface
  • Chapter 1: Installing Gephi
  • Chapter 2: Creating Simple Network Graphs
  • Chapter 3: Exploring Additional Layout Options
  • Chapter 4: Creating a Gephi Dataset
  • Chapter 5: Exploring Plugins
  • Chapter 6: Advanced Features
  • Chapter 7: Deploying Gephi Visualizations
  • Appendix: Network Visualization Resources

While this book makes no claim to covering everything you can do with Gephi (not even close!), it does provide the reader with a broad and accessible overview, while also addressing some of the basic concepts and terminology of network graph analysis.

Here are a few excerpts from a companion article for the book; you can also download a sample chapter from the book page at Packt.

“Gephi is a versatile and powerful tool that will help you create simple network visualizations quickly, while also providing the capabilities to build complex graphs based on large datasets. In this article, you will learn some of the fundamentals of Gephi and network visualization, which will rapidly empower you to create your own graphs…”

“Network graphs are essentially based on the construct of nodes and edges. Nodes represent points or entities within the data, while edges refer to the connections or lines between nodes. Individual nodes might be students in a school, or schools within an educational system, or perhaps agencies within a government structure…”

“Network graphs are drawn through positioning nodes and their respective connections relative to one another. In the case of a graph with 8 or 10 nodes, this is a rather simple exercise, and could probably be drawn rather accurately without the help of complex methodologies. However, in the typical case where we have hundreds of nodes with thousands of edges, the task becomes far more complex…”

“Gephi is an ideal tool for users new to network graph analysis and visualization, as it provides a rich set of tools to create and customize network graphs. The user interface makes it easy to understand basic concepts such as nodes and edges, as well as descriptive terminology like neighbors, degrees, repulsion, and attraction. New users can move as slowly or as rapidly as they wish, given Gephi’s gentle learning curve…”

So if you or anyone you know is interested, navigate to the book’s page, where you’ll find more information, including a sample chapter, as well as links to a number of book sellers. Thanks, and happy visualizing!


A New Post! And a New Book!

Realize I haven’t posted in over a month, but think I have a legit excuse, beyond the usual spring busy-ness with soccer, baseball, kendo, etc. I’ve been spending lots of time on a book project (not the baseball book) about Gephi and network visualization. The book should be out later this summer, giving me two books to be released in 2013.

Gephi is a terrific open source project focused on creating network graphs, the sort you often see with social network data. Basically, you have nodes that represent a person or other entity, coupled with lots of connections (edges) to other nodes. Sounds simple, but the possibilities are almost endless.

Network graphs have become one of the top visualization categories over the last 5-10 years, and Gephi enables users to create them without having to do a lot of coding or other customization. However, it does provide loads of options that allow for very complex and fascinating graphs that can be deployed as either static or interactive web projects.

Currently working on chapter 3 (of 7, plus an appendix), and can’t wait to see the final book. Stay tuned.


MLB Trades Network Map

Took a little break from the book to create a new infographic on trades between major league baseball teams over the last 100+ seasons, from 1901-2012. Network maps are often put to use analyzing and depicting connections within a social network like Twitter or Facebook, but can be used in many other instances, as this example shows.

The key concepts in network maps are nodes and edges; nodes are the connection points in a network (the teams in this case), while edges are the connections between nodes, showing a level of activity or connectedness (number of trades in this case). Have a look:

Network maps are often among the most elegant data visualizations, bordering on the artistic while still providing insight into the underlying data. In some cases, the maps are interactive, making them even more useful. At some point, I’ll offer some of those up, but in the meantime, take a look at the work of Jan-Willem Tulp and Jerome Cukier. These guys are creating some of the best work I’ve seen, highlighted by Tulp’s voting display from the 2012 Dutch elections, and Cukier’s interactive Paris Metro display. Fantastic work!

To view or download a PDF of the MLB trades graphic, click here.
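For readers who want to try something similar, the edge weights described above (the number of trades between each pair of teams) could be tallied along these lines. This is a minimal Python sketch with made-up trade records and a hypothetical function name; the real graphic drew on more than a century of transactions.

```python
from collections import Counter

# Made-up (team, team) trade records for illustration only.
trades = [
    ("DET", "BOS"),
    ("BOS", "DET"),
    ("DET", "CHC"),
    ("NYY", "BOS"),
    ("DET", "BOS"),
]


def trade_edges(trades):
    """Count trades per pair of teams, ignoring which side is listed first."""
    weights = Counter()
    for a, b in trades:
        if a == b:
            continue  # a team cannot trade with itself
        weights[tuple(sorted((a, b)))] += 1
    return weights


weights = trade_edges(trades)
for (a, b), count in weights.most_common():
    print(a, b, count)
```

Teams become the nodes and each counted pair becomes a weighted edge, so the heaviest lines in the map correspond to the most frequent trading partners.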
