Updating Player Networks – Part 2

In our previous post, we looked at how to acquire and load our baseball player data into Gephi. In this second installment, the focus will be on creating a player network graph in Gephi, and customizing many settings to deliver a network graph we can export to the web. Player networks are used to detail the connections between all players who are connected to one another in some fashion. In this instance, it is based on players having played for the same team in one or more common seasons. So let’s begin with the process of creating the graph using our raw data from the first installment.

Importing .csv data into Gephi is quite simple – we create individual node and edge files (as we showed in the previous post), and use the Gephi import functions to pull the data in. I always start with the node file, since it will typically have additional information not included in the edges file. After importing the node data, I then import the edge data, which gives us the information to form our initial graph. If we were to start with the edge file, Gephi will create our node data automatically, and we will not have the detail needed for our graph. This approach may work for simple graphs, but not for our current case.

Once both data files have been imported, we can begin thinking about what we want form our graph. Here are several questions we might pose:

  • How will we use color?
  • What sort of layout will be best?
  • Which measures should we calculate?
  • How should we depict node sizes?

In many cases, the answers to these questions come about through trial and error. We may have some ideas going into the process, but invariably, there will be modifications along the way. So be patient, and be willing to experiment as you create network graphs. The graph you will see in this post went through many of these modifications, which I won’t take the time to detail. Instead, this post will detail my final choices, along with some explanations for why these choices were made. So let’s take a walk through the various facets of the visualization.

Layout

While a network will retain the same underlying structure from a statistical point of view (degrees, centrality, eccentricity, etc.) regardless of our layout choices, it is still important to select a layout that will visually represent the underlying patterns in the network. Otherwise, we could just as well deliver a spreadsheet with all of the network statistics. So layout selection is critical, and often involves an iterative process.

For the baseball network graphs I built in 2014, I eventually settled on the ARF layout algorithm, which ran quickly and created an attractive circular network graph display using the player connection data. Alas, there is no ARF algorithm available for Gephi 0.9.2, so I required a different approach for the updates. Ultimately, this led to a 2-step approach using a pair of layout algorithms – OpenOrd followed by Force Atlas 2. OpenOrd is especially effective at creating a quick layout from large datasets, although with far less precision than some other force-directed approaches. Still, it is a great tool for creating a general understanding of the structure of a network very quickly. Force Atlas 2, is the near opposite of OpenOrd – a very precise approach that can be tweaked easily using the various settings in Gephi. It is ideal for putting the finishing touches on what OpenOrd started.

Here are the settings I eventually settled on for Force Atlas 2, after much trial and error:

Force_Atlas_2

Some of the more important things to note here are the Scaling and Gravity settings. I reduced the scaling to 0.5 so the network would display appropriately in a single window without the need for scrolling. The Gravity setting was increased to 2.5 to force nodes slightly toward the center of the display. The LinLog mode and Prevent Overlap options are also selected in order to make this particular graph more visually effective. For other graphs, I have used the Dissuade Hubs option, forcing large nodes to the perimeter of the graph; in this case, that was not an ideal choice.

Color

The use of color is also important within a network graph display. Color can be used to highlight nuances in the data that distinguish one or more nodes relative to another group of nodes. Often we use color to visually represent clusters within the graph, as grouped using the modularity classes statistic or some similar input. In the case of this series of graphs (ultimately one graph per team), I made a decision to use the official team colors to differentiate each graph. Thus my initial graph for the Boston Red Sox would be based on the two primary hex colors for the current team (these colors do change over time for many teams).

Here are the Red Sox primary colors:

c8102e_Color_Hex_-_2018-06-10_09.15.48 0c2340_Color_Hex_-_2018-06-10_09.16.33

After capturing current team colors in a spreadsheet for easy reference, I used the color-hex.com site to select complementary colors for the Red Sox graph. Using complementary colors allows me to differentiate clusters in the graph while remaining true to the original concept of employing team colors for each graph. So instead of a wide range of colors one would normally see in a Gephi output, I was able to input the complementary colors for each group. Thus, one team color could be used for the graph background, while the other color (and it’s complements) could be used for the graph structure (nodes & edges). We’ll share the effect later in this post.

Statistics

Graph statistics are critical to the full understanding of the structure of a network. While we can view a graph and begin to understanding the general structure of a network, the various statistics will aid and reinforce our initial visual comprehension. Gephi provides a nice range of statistical measures to choose from:

  • Eccentricity (the number of steps needed to traverse the network)
  • Centrality – betweenness, eigenvector, closeness, harmonic closeness (various measures of importance of an individual node)
  • Clustering coefficient (to discern cliques in the network)
  • Number of triangles (a friends of friends measure)
  • Modularity Class (clusters)
  • Degrees (the number of connections)

Sizing

Node sizing is another key element of effective graph design. In this case, there were a few options I could pursue for node sizing – the number of seasons played (I used this in the 2014 graphs), one of the various centrality measures we calculated, or the number of degrees (connections) an individual player possesses. After computing each of these statistics, I eventually decided to use the number of degrees as a representation of influence in the graph. Visually, I want to show how many other players a single individual is related to, and using node size is an effective means of doing so.

Summary

Our final graph in Gephi is shown below; the eventual web-based version will differ slightly and include additional functionality, but that’s for another post.

red_sox_20180610

Next Post

My third and final post in this series will address exporting this graph to the web using the sigma.js plugin, and making some additional customization to the web version. Thanks for reading, and see you soon!

 

 

 

 

 

 

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather

Updating Baseball Team Networks – Part 1

A few years back, I used Gephi and sigma.js to create a series of interactive baseball team networks, one for each current MLB franchise. These networks displayed all players through the 2013 season, going all the way back to 1901 for the original American and National League franchises. Now that we have data through the 2017 season, it’s time for an update, not only from a data perspective, but also stylistically. This post will walk through the process of creating one of these networks using Toad for MySQL, Gephi, and sigma.js to create web-based interactive network visualizations.

Here’s a typical network from the 2013 series; the full list of networks can be found here. We’ll use the existing networks as a baseline for the new networks, although a few modifications will be made.

baseball_team_network_2013
a 2013 baseball team network for the Boston Red Sox

Source Data & MySQL Queries

Let’s start our discussion with the source data. Season-level baseball data is available through the seanlahman.com website, in the form of .csv files or Microsoft Access database tables. I use the .csv format, as it can be easily added to existing MySQL databases on the visual-baseball.com server. MySQL also makes it simple to add derived fields through some simple coding. These fields can be utilized later for a variety of activities.

For the purpose of our network graphs, there are a handful of critical fields we want to use. These include the following:

  • playerID, a unique identifier for every player who ever donned a major league uniform
  • player name, which can be used to provide a meaningful reference based on the playerID field
  • yearID, which refers to the season (or seasons) a player suited up for a specific franchise
  • franchID, a unique identifier for each MLB franchise

We also need to do a little manipulation of the source data in our code to deliver our results in the proper form for use in Gephi. This means we need to create two input files – one for nodes, and a second for edges. The nodes will contain information about each player, the number of seasons played for the franchise and the first and last seasons, which may differ from the number of seasons, as players frequently leave a franchise only to return later in their career. Here’s our node code:

SELECT Id, Label, MAX(Size) as Size
FROM
(SELECT bp.playerID AS Id, CONCAT(bp.name, ” “, MIN(bp.yearID), “-“, MAX(bp.yearID)) AS Label, COUNT(bp.yearID) AS Size
FROM BattingPlus bp
WHERE bp.franchID = ‘BOS’ and bp.yearID >= 1901
GROUP BY bp.name

UNION ALL

SELECT pp.playerID AS Id, CONCAT(pp.name, ” “, MIN(pp.yearID), “-“, MAX(pp.yearID)) AS Label, COUNT(pp.yearID) AS Size
FROM BattingPlus pp
WHERE pp.franchID = ‘BOS’ and pp.yearID >= 1901
GROUP BY pp.name)  a
GROUP BY Id
ORDER BY Id;

Here’s the simple interpretation – since we are attempting to display all players for a given franchise, we are executing a UNION ALL statement to combine batters and pitchers into a single result file. We have used the playerID field to create the required Id value for Gephi, while also creating a Label field by combining the player’s name with their first and last years playing for this franchise. Finally, we have created a Size field based on the number of seasons played for the franchise. We can then choose to use this in Gephi to size each node, if we so choose.

We also need to create the edge file for Gephi. In this case, we want to understand how many seasons two players were on the same team. This code is a bit trickier, since we want to show only one connection between two players, since this will be an undirected graph. More on that distinction later. Here’s our edge code:

SELECT b.playerID AS Source, m.playerID  AS Target,  ‘Undirected’ as Type,  ‘ ‘ as Id, ‘ ‘ as Label, count(*) as weight
FROM
(SELECT a.playerID, CONCAT(m.nameFirst, ” “, m.nameLast) name, a.yearID, a.franchID

FROM Appearances a
INNER JOIN Master m
ON a.playerID = m.playerID

WHERE a.franchID = ‘BOS’ and a.yearID >= 1901) b

INNER JOIN Appearances a
ON b.yearID = a.yearID and b.franchID = a.franchID and b.playerID <> a.playerID and a.playerID > b.playerID
INNER JOIN Master m
ON a.playerID = m.playerID

GROUP BY b.playerID, a.playerID
ORDER BY b.playerID

Here we use the Master table to provide player name information, and we also gather the ID information to match the node values. The critical piece in this code is in our join criteria:

INNER JOIN Appearances a
ON b.yearID = a.yearID and b.franchID = a.franchID and b.playerID <> a.playerID and a.playerID > b.playerID

Here we are matching players based on the same season and the same franchise. We then specify that we do not want to connect any player to himself, and that we want only values where the playerID value from our main query is greater than the playerID value from the sub-query. This gives us a single connection between two players, which is what we need for an undirected graph. We then define a Source node (required by Gephi) and a Target node (also required), as well as specifying ‘Undirected’ as the graph type. We leave the ID and Label values empty, and then summarize the number of seasons played together as an edge weight. This value can be used in Gephi to show the strength of a connection between two nodes (e.g.- did they spend one season together, or 10 seasons together?).

After exporting each of these files to a .csv format, we have our source data for Gephi. In Part 2 our focus will shift to creating the network in Gephi.

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather

Pennant Race Charts Updated!

The last of my big three annual updates is now complete, as all 2016 & 2017 pennant race charts have been created, and now reside in the Visual-Baseball Project portfolio. These charts are created using NVD3, which is built on top of the powerful d3.js framework developed by Mike Bostock. These tools help make the charts highly interactive, allowing you to see where each team stands at any given point in the season, and also providing the ability to zoom in using a smaller sub-chart beneath the primary display.

The structure of the charts is based on every team’s relationship to a .500 winning percentage – a situation where a team wins exactly as many games as it loses. This structure allows for easy interpretation of the results, as we can see which teams hover near the .500 mark (i.e.- consistent mediocrity), others that rise well above this level, and also those teams that descend far below the breakeven point. Allow me to illustrate these thoughts using the 2017 American League Central division, and my hometown Detroit Tigers, who suffered through their worst season since 2003.

2017_AL_Central

As you can see, the darker orange line representing the Tigers takes a steep dive starting in early August, culminating in a final record 34 games below the .500 percentage. Meanwhile, the rival Cleveland Indians (light blue line) present a near mirror image of the Tigers failure, with a sensational month of September that ultimately lands then 42 games over the .500 break-even level.

Similar charts have been created for the other divisions for both the 2016 & 2017 seasons. In fact, you can now view any season, league, and divisional splits dating back to the 1901 campaigns, a total of 380 pennant races to explore! Find all the pennant race charts here. Have fun exploring, and as always, thanks again for reading!

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather

Baseball Game Summaries Updated!

Thanks to some unusually cold and rainy weather, I’ve been able to focus on updating both my source databases as well as some of the visualizations built from the data. That’s a roundabout way of saying that the baseball Game Summary exhibits have been updated for both the 2016 & 2017 seasons. They can be found in the portfolio section of the site by following this link.

As a refresher, the baseball game summaries give you a sort of visual box score for every game played in a season, featuring the line score for the game, winning and losing pitchers, attendance, and much more information pertaining to each specific game. The real power comes from the ability to filter results to find all games that match specific criteria.

game_summary_filter_1

As you can see, there are many available filter options, right down to who the home plate umpire is for every game.

Here’s a quick illustration of how the filters can be used. We’ll filter 2017 results where Clayton Kershaw was the starting pitcher at home, and gave up 4 home runs (a very rare event!). First, we select Kershaw as the Home Starter, and then we open the Visitor HR filter, and select 4 (there’s just one instance). We can then apply these filters to see at which game this unusual event took place.

Kershaw_4HR

Closing the filter window, we see the single game box score returned by our filters:

Kershaw_4HR_game

Ironically, we can see that the Dodgers not only won this game, with Kershaw as the winning pitcher, but that they too hit 4 home runs (Home HR in the box score). We can also see that the Mets struck out 13 times (Visitor SO) and the Dodgers 12 times (Home SO). Must have been a wild day at Dodger Stadium on June 19th for the 43,266 in attendance!

As you can see, a lot of information can be gleaned using just a couple of selections to filter the data. There are nearly endless possibilities for using the filters to return the information that most interests you. So have a look at the game summaries and any other items in the portfolio section. Enjoy, and thanks for reading!

 

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather

Batting Explorers Updated Through 2017 Season

I’m pleased to share that the Batting Explorer visualization for the 2010s decade has now been updated with 2016 & 2017 statistics. This ongoing project captures batting statistics at the season level for every major league batter, and visualizes them in a baseball card type of format, as seen below:

Batting Explorer
Batting Explorer

A number of filters are provided to make it easy to browse across a wide range of attributes, including all of the major batting categories:

Filters
Filters

One of my favorite aspects of the Batting Explorer is the ability to link to greater detail by clicking on a specific player card, which will transport you to the Baseball-Reference page for that player:

baseball_ref_1

This project uses the Exhibit project software originally developed years ago as part of the MIT Simile project, as well as a lot of HTML & CSS for styling purposes. Give it a try, and thanks for reading.

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather

2017 Baseball Data Updates Coming Soon!

Happy to report that the annual 2017 baseball data has been downloaded from seanlahman.com, meaning it’s time to start the annual updates. This data is a great resource for building many of my data visualizations that are featured in the portfolio section of this site. Likewise, the Retrosheet game event data has also been downloaded, meaning the onus is now on me to run the various upload and update processes for my databases.

Looking forward to putting these resources to work as I make updates to the following data visualizations (among others):

Stay tuned for updates on these and other projects, and please pay a visit to my JazzGraphs site, where the focus is on network graph analysis of jazz artists, labels, releases, and songs. Thanks for reading!

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather

Recapping 2017

Observers of this blog will note that posts were scarce in 2017 – in fact this is the only one, and it’s being completed in 2018! This is the result of a variety of causes, including external projects, busy schedules, and focus that was shifted in other, unrelated directions. Still, 2017 was not without its moments.

For starters, I managed to create three data visualization courses for Packt:

Learning Data Visualization

Data Visualization Techniques

Advanced Data Visualization

Retrosheet data for the 2016 and 2017 seasons has also been downloaded, and is in the update process as we speak, which will enable some new visualization work (and perhaps a new book title) in 2018. Soon, annual season data from the Baseball-Databank and Sean Lahman will be available as well.

I’m also in the process of launching a new site at jazzgraphs.com, where I’ll use network visualizations to uncover the complex web of relationships between jazz musicians, labels, and recordings. Posters and a book are in the plans for 2018, so stay tuned.

Wishing all a happy and prosperous 2018, and I promise more content to come this year!

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather

Player Ego Network Visualization

Ego networks are an interesting concept within the realm of network visualization using graph analysis, as they allow us to easily see direct connections within the network of a particular individual. Using Gephi, we can navigate large networks using this technique, which enables us to filter and view only those connections relevant to our current criteria. All remaining nodes and edges are simply filtered out from a visual perspective, giving a very clean look at individual networks. The ego network can be set to a depth of 1 if the goal is to show only direct connections, or to 2 or even 3 if our goal is to see the so-called “friends of friends” via indirect connections.

My latest venture uses a network of all MLB players between 1901 and 2015, which consists of a somewhat unwieldy mass of nearly 17,000 players with close to 1.2 million connections. Even when we cluster the results using Gephi’s modularity class option, it is still a difficult network to navigate, both from a visual perspective and a resource allocation viewpoint. Here’s a view of the network as a whole:

mlb_players_20161230

While the modularity class coloring helps identify groups of related players, there is an awful lot of small detail that is not easily discerned, and the graph is computationally expensive, often crashing my version of Gephi if I try to do too many things with the full graph. Fortunately, ego networks are a great way to filter the data for greater understanding of some of the details within the network.

Using the ego network option as a filter, I am able to view the individual network of any player in the graph with ease. Here’s a look at my settings for the Miguel Cabrera ego network, and the resulting network, which is now a very manageable 300 nodes and 11k edges:

mlb_ego_filter_20161230

With a little editing in Gephi, such as increasing the size and adjusting the color for the central node, I can easily create a series of ego networks that can later be exported to a JSON format for use with Sigma.js. These can then be turned into interactive web-based networks quite easily. Here, we change the existing node settings so that the Cabrera node stands out in the graph. First, we locate Cabrera’s record in the data worksheet, and then select the node edit menu option:

mlb_edit_node1_20161230

This then takes us to the node properties, where size and color can be edited:

mlb_edit_node2_20161230

If this step causes some overlap in the graph, we can easily run the Noverlap layout algorithm to optimize graph spacing. Here’s a view of the completed Cabrera network after using Sigma.js and tweaking a few of the config settings:

Cabrera_20161130

As of now, there are five of these ego networks available for viewing on the visual-baseball site. They can be found here. I promise more to come in 2017 as time permits. Update – 25 networks as of 1/15/2017.

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather

Updated Batting Explorer Baseball Visualizations

One of my ongoing visualization projects has been the Batting Explorer, a semantic-based discovery concept built using the Simile Exhibit open source tools. I’ve been updating this on a not very timely basis the last few years, but have now caught up through the 2015 season. Of course, 2016 stats will be available in the next 2-3 months, so it will be time to repeat the process once more. For the moment, I’ve just added the 2014 & 2015 seasons into one of the decade-based examples, so you can now search for all batters covering seasons from 1901 through 2015.

The explorers work in much the same way as many travel sites you’ve visited on the web. Each page can be filtered using a wide array of facets (filters) that allow you to quickly narrow down results by team, season, batting category, and a bunch of other options. I’ll show this in a moment. Let’s first start with a basic view of the 2010-15 explorer:

Batting Explorer
Batting Explorer

Each of the Batting Explorers has a consistent look & feel, with the underlying data as the only difference. All individual player-season combinations are laid out in a baseball card sort of format, although you can’t flip them over or get any bubble gum either 🙂 Nonetheless, each card contains a wealth of information, including the number of games played by position, laid out on a baseball diamond. In addition, hovering over a card loads a pop-up summary of the season for each individual batter, as seen here:

Mouseover for Info
Mouseover for Info

An additional benefit comes when you click on a selected card. Every batter card has a personalized link to the massive Baseball-Reference.com site. Here’s what you’ll see when clicking on the Juan Uribe link:

Juan Uribe at Baseball-Reference
Juan Uribe at Baseball-Reference

I mentioned earlier the ability to filter using a wide range of facets. Here’s a glimpse of the many categorical and numerical options present in each Batting Explorer:

Filters
Filters

As you can see, there are dozens of possible filters that can be used. If you want to see only batters with more than 40 home runs in a season, simply select the HR facet and check the conveniently provided ranges. Or how about viewing players from a single team? Simple, using the Team facet. Likewise for filtering by season, number of doubles, stolen bases, walks, strikeouts, and so much more. And of course these filters can be used together to quickly find matching results.

Finally, there are a multitude of sort capabilities, or you can choose to have nothing sorted. If you do wish to choose one or more sort attributes, here are your options:

Sort options
Sort options

Your sorts can be many layers deep – just keep adding variables!

This has been a very brief overview – to learn more, go to the Portfolio section and begin exploring! While you’re on the site, take some time to view the Game Summary exhibits, set up in much the same fashion using Exhibit. Or, if networks are your thing, check out a large collection of franchise player or team trade networks. Hope you enjoy the site, and thanks for reading.

Batting Explorers

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather

Major League Baseball Trade Networks, Part 2

Welcome to Part 2 in our miniseries on building baseball (MLB) trade networks with Gephi. In the first post, the focus was on procuring and preparing the data using MySQL. The goal was to create nodes and edges that could be easily imported to Gephi. Gephi does allow for some data manipulation post-import, but I’ve learned from experience to do the main parts of the job with either SQL code or within spreadsheet software like Excel or Calc.

With our data readied for import, we’ll now move on to the more fun parts of the process, where we get to visualize the data and see any underlying patterns. Gephi is an ideal tool for this, as it allows us to try out many different algorithms, especially in version 0.8.2. The newer 0.9 versions are faster, but have not fully caught up on the plugin side at this writing, so options are a bit more limited. One other caveat – I frequently run into Java issues when using Gephi, so save your work often and be prepared to shut down and restart Gephi periodically.

We’ll kick off this part of the process by importing the data, nodes first, followed by edges. The reason I prefer this order is that nodes will be automatically created if we start with the edge file import, and they won’t contain any extra fields you may have added using your database or spreadsheet processes.

Here’s the node import window, showing the appropriate file input:

Gephi node import window
Gephi node import window

Once the node import process is complete, we turn to the edges file, and follow a similar sequence of steps. Here’s our starting point:

Gephi edge import window
Gephi edge import window

After the data has been imported, it’s time to move to the Overview tab, where we’ll see a dense mess of nodes and edges, especially if we have a fair sized dataset. Something like this:

Impossibly dense hairball network
Impossibly dense hairball network

Gephi offers a variety of interesting algorithms, each more or less appropriate based on the underlying dataset. In our case, the dataset is of a moderate size, with more than 8,000 nodes and nearly 62,000 edges. This immediately rules out the use of simple layouts such as the circular algorithms, as it would prove immensely challenging to display, even when we take it to an interactive output. At the other end of the sophistication level lie the force-directed layouts, which apply a significant dose of science and math within their respective algorithms. In Gephi, the Force Atlas 2 is quite popular, but it tends to run very slowly unless coupled with enormous levels of RAM. So where to go with our choice for this data?

I elected to take a two step approach, using the extremely fast (if less precise) OpenOrd algorithm for the original data. This provides a nice view of the network within a few minutes, making it a good starting point for our next steps.

Trade network with OpenOrd layout
Trade network with OpenOrd layout

The goal of this exercise was to create team level graphs, which will each have a small subset of the entire dataset. One easy way to achieve this is to use the Ego Network filter to select a single team and its connections. Setting the ego network to a depth of 1 limits the display to only first degree connections; in this case players traded to or from our selected team.

Ego network with depth = 1
Ego network with depth = 1

Once this step has been taken, we can then refine the display by applying another algorithm; in this case I have chosen the Yifan Hu option, and adjusted the settings until they created an aesthetically pleasing graph. The Yifan Hu adds further precision within each of the team graphs, and provides them with a common look & feel inasmuch as their respective data allows.

We have now completed our basic graph creation in Gephi, and can output our results to a variety of output formats. Our choice here is to create a GEXF file, which we can then plug in to an existing template. We do have another step with respect to the GEXF data. In order to relate the graphs back to their respective teams, I chose to apply official team colors to elements in the graph. Specifically, each node should reflect the individual team; we want the edges to remain the same across all graphs so that users have a common understanding for the types of connections between players and teams. So to update the node colors, simply use a code editor that can perform batch updates. I typically work with Brackets for this task, but choose your tool of choice. Here’s a view of the GEXF output prior to applying color changes:

GEXF nodes with original colors
GEXF nodes with original colors

Now here’s the updated version that reflects the Tigers navy blue coloring:

GEXF nodes updated with team colors
GEXF nodes updated with team colors

Once this step has been completed, I can upload the files to the web server, where other minor changes can be made. These include any updates to the CSS styling, adjustments to the config.js file, and minor changes to the index.html file so the proper team information is displayed. The easiest way to do this is to create a common set of directories for the basic javascript and CSS files, leaving only the individual config, html, and gexf files in each team’s directory.

After a bit of massaging the config and CSS settings, the result is visually appealing as well as highly functional. Here’s a zoomed in look at the Toronto Blue Jays network on the web:

Blue Jays trade network close-up
Blue Jays trade network close-up

All of the networks can be found by clicking the link below, with new ones being added until all teams have been represented:

Trade Networks

If you want to see each network in its own window or tab, use the right-click options in Firefox or Chrome.

I hope this has been helpful, and please feel free to reach out via the LinkedIn or Facebook Gephi groups, or leave a comment on this site.

Thanks for reading!

Facebooktwittergoogle_plusredditpinterestlinkedinmailFacebooktwittergoogle_plusredditpinterestlinkedinmailby feather
Facebooktwittergoogle_pluslinkedinrssFacebooktwittergoogle_pluslinkedinrssby feather