Network Data Adventures with Gephi

Posting to the blog has become a luxury recently, what with a summer full of youth baseball, some organizational changes at work, summer home projects, and of course the upcoming Gephi book. I’ve learned that one has to be especially good at finding synergies between projects in order to get everything done. So it is with new work in Gephi: any new project will necessarily be created while developing material for the book.

Earlier this year, I created a host of network visuals, one per franchise, showing the relationships between all players who suited up for a given major league baseball team. This data made for some interesting visuals that were fun to explore. What the graphs didn’t do was provide visual cues about how the players could be grouped: by decade, position, birthplace, and so on. So the logical evolution was to extend this idea into an example of how to use partitioning and clustering to visually segment a network graph.

Recently I began playing with this idea by looking at a few of these examples, and have included some in one of the book chapters. I’ll use some slightly different cases here to avoid redundancy, but the principles are identical. I’ll walk through an example of how we can extract intelligence from a network graph in a few easy steps, using the Boston Red Sox from 1901 through 2013.

  1. Start with the base graph, having used a layout algorithm to arrange it in some fashion. I used the ARF approach for this example.
  2. Size the nodes in the graph using some criterion, such as the number of games played as a catcher. This will help users to quickly spot the dominant players at that position.
  3. Color the nodes using a categorical variable like decades. In this case, the color will reflect the first decade a player suited up for the Red Sox.
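For readers who think better in code, steps 2 and 3 can be sketched outside of Gephi as two simple mappings: a numeric attribute scaled into a node size, and a categorical attribute mapped to a color. This is only an illustration in plain Python, not Gephi's API. The `Player` class, the size range, the palette, and the sample rows are all assumptions made up for the sketch, and the game counts are placeholders rather than real statistics.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    first_year: int         # first season with the franchise
    games_at_catcher: int   # games played as a catcher for the franchise


# Hypothetical sample rows; these are NOT real statistics.
players = [
    Player("Player A", 1903, 0),
    Player("Player B", 1997, 1400),
    Player("Player C", 1971, 900),
    Player("Player D", 1999, 40),
]

MIN_SIZE, MAX_SIZE = 5.0, 50.0                      # assumed node size range
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def decade(year: int) -> int:
    """Return the decade a year falls in, e.g. 1997 -> 1990."""
    return year // 10 * 10


def node_size(games: int, max_games: int) -> float:
    """Scale a game count linearly into [MIN_SIZE, MAX_SIZE]."""
    if max_games == 0:
        return MIN_SIZE
    return MIN_SIZE + (MAX_SIZE - MIN_SIZE) * games / max_games


def style_nodes(players: list[Player]) -> dict[str, dict]:
    """Step 2: size by games at catcher. Step 3: color by first decade."""
    max_games = max((p.games_at_catcher for p in players), default=0)
    decades = sorted({decade(p.first_year) for p in players})
    color_of = {d: PALETTE[i % len(PALETTE)] for i, d in enumerate(decades)}
    return {
        p.name: {
            "size": node_size(p.games_at_catcher, max_games),
            "decade": decade(p.first_year),
            "color": color_of[decade(p.first_year)],
        }
        for p in players
    }


styles = style_nodes(players)
```

The busiest catcher gets the largest node, players who never caught stay at the minimum size, and everyone who debuted in the same decade shares a color. The palette wraps around if there are more decades than colors, so a real graph spanning 1901 through 2013 would need a longer palette.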

In sequence, here are the three graphs:

Kind of dull – nothing but a lot of identical nodes and their connections. Let’s apply sizing based on the number of games played as a catcher:


Now that pops a few things! We have some easy starting points to work from. How about coloring the nodes by decade to see if that adds to the story:


Hmmm. Maybe this gives us some additional insight as well. Certain decades are split amongst multiple catchers, while in other cases we have a single dominant player. Of course we would want to allow the user to identify each of these cases (for example, the large green node at the top left is Jason Varitek) through some labeling or interactivity.
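The "single dominant player versus split decade" reading can also be checked numerically. The sketch below is one possible way to do it, under some loud assumptions: the rows are hypothetical placeholders rather than real data, each player is assigned to a single decade (as in the coloring above), and the 60% threshold for calling someone dominant is an arbitrary choice of mine.

```python
from collections import defaultdict

# Hypothetical (name, first_decade, games_at_catcher) rows; not real statistics.
rows = [
    ("Player B", 1990, 1400),
    ("Player D", 1990, 40),
    ("Player C", 1970, 900),
    ("Player E", 1950, 400),
    ("Player F", 1950, 380),
]


def dominant_by_decade(rows, threshold=0.6):
    """Map each decade to its dominant catcher, or None if the decade is split.

    A player is 'dominant' when their share of the decade's catcher games
    meets the (assumed) threshold.
    """
    by_decade = defaultdict(list)
    for name, dec, games in rows:
        if games > 0:                      # ignore players who never caught
            by_decade[dec].append((name, games))

    result = {}
    for dec, entries in sorted(by_decade.items()):
        total = sum(games for _, games in entries)
        name, games = max(entries, key=lambda entry: entry[1])
        result[dec] = name if games / total >= threshold else None
    return result


summary = dominant_by_decade(rows)
```

With these placeholder rows, the 1970s and 1990s each have one dominant catcher, while the 1950s come back as `None` because the games are split almost evenly. The names that come back are natural candidates for labels on the graph.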

So you get the idea of how a couple of simple tweaks can change the way we view a graph. I’ll be using a similar approach in the book to help readers create powerful stories with their own data.



Ken Cherven is the Founder and Curator of the website. He loves to merge baseball data with all sorts of visualization methods (charts, network graphs, maps, etc.) to provide greater insight into underlying data patterns. Ken also authors books about baseball and visualization, and loves to listen to jazz while drinking some wine, craft beer, or bourbon.