Political Contributions Network

Hi – I just launched another network project courtesy of Gephi and Sigma.js, my two favorite tools of the moment. You can find it here, or in a full web version here. This one, like its immediate predecessor, is founded in politics, and more specifically in tracking political contributions – who gives to whom. The paths in this network detail thousands of political candidates, and the many PACs, corporations, foundations, and trade associations that help fund their campaign efforts. Of course these connections also create a sort of influence network that could never be achieved by individual voters, and help explain why so many decisions are made that run counter to the will of the people.

While this one doesn’t focus on dollar amounts, it nonetheless paints a compelling picture of how political influence is meted out. Fringe candidates, frequently outside the embedded American two-party system, are depicted near the perimeter of the graph, receiving little or no support from most major donors. Incumbent Democrats and Republicans, on the other hand, are situated at the center of the network, receiving contributions from dozens or even hundreds of PACs, unions, corporations, and trade associations.

Here are a few screenshots from the graph, which is fully interactive through the use of filters, scrolling, zooming, and panning, thanks to the wonders of JavaScript via Sigma.js. First up is a shot of the full network:


The multiple colors reflect the multitude of political parties (yes, beyond the dominant two-party monopoly) plus the hordes of contributors – corporations, unions, trade associations, and more.

One of the great features of interactive networks is the ability to dive into the details. For starters, let’s take a look at the Nancy Pelosi neighbor network, which should provide a nice glimpse into the donor network for an entrenched, influential Democratic candidate:


What we see is a well-connected network populated by dozens of contributors. Now let’s go to the other side of the aisle and take a look at the donor network of John Boehner, an influential Republican incumbent:


The Boehner network is even denser than the Pelosi network. We should note that many contributing organizations may be found in both the Pelosi and Boehner camps, although the overlap will be somewhat mitigated by the Democrat versus Republican differences. What they do have in common is a huge number of contributors determined to influence policy, often at the expense of the voting public.

Our final screenshot displays many of the PACs in the network – more than 2,600 in total. The attribute pane on the right of the display will show each and every one of these when you use the category filter to the left of the screen:


I hope you find some value in navigating and learning more about the scores of organizations involved in trying to influence policy through congressional gatekeepers. Bear in mind we haven’t even touched on the unelected portions of the government residing in the halls of the CIA, FBI, and Department of Defense. That will be the subject of a future network.


New US House Voting Patterns Network

Anyone who knows me well is aware of my general lack of enthusiasm for politics and politicians, so my latest network graph may come as a bit of a surprise. While I can’t express a lot of support for how my tax dollars are spent by the folks in DC, I can still make use of some of the data patterns they generate. Using data provided by govtrack.us (a non-government site), my latest Gephi project looks at the US House of Representatives votes over the last 4 months of 2014, specifically the ‘aye’ (yes) votes for each house vote.

The resulting graph lets us take a look at some general patterns, such as many cases where there is strong bi-partisan support for a bill. We can also see votes that failed, primarily in cases where the Democratic minority was unable to generate enough Republican support to pass a measure. Here are a few screenshots from the Gephi project; after that I’ll send you over to the interactive web version where you can search, zoom, pan, and otherwise interact with the data to your heart’s content.

First up is an overall view of the network, created using the Force Atlas 2 layout:

Here we can see the stereotypical view of Congress, with the blue Democrats on the left and red Republicans on the right. In the center are some very large nodes that depict near unanimous votes (nodes are sized by the number of ‘aye’ votes) with bi-partisan support. Darker gray nodes represent failed votes; note how many of these are at the far left, indicating support from only the Democrats in most cases. To the far right are bills that passed with primarily Republican support, as noted by their smaller size.
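For readers who want to see the shape of the data behind a graph like this, here is a minimal Python sketch. It assumes the votes have already been flattened into (representative, party, vote, position) records; the record layout and the sample names are hypothetical, not the govtrack.us format. Each ‘aye’ becomes one edge, and each vote node gets a count that could drive its size.

```python
from collections import Counter

# Hypothetical sample records: (representative, party, vote_id, position).
# The names and layout are made up for illustration.
records = [
    ("Rep A", "D", "vote-1", "aye"),
    ("Rep A", "D", "vote-2", "no"),
    ("Rep B", "R", "vote-1", "aye"),
    ("Rep B", "R", "vote-2", "aye"),
    ("Rep C", "R", "vote-2", "aye"),
]

def build_aye_network(records):
    """Return (edges, aye_counts) for a representative-to-vote network."""
    # Keep only 'aye' positions; each one becomes a single edge.
    edges = [(rep, vote) for rep, _party, vote, pos in records if pos == "aye"]
    # Count ayes per vote node; this count can be used to size the node.
    aye_counts = Counter(vote for _rep, vote in edges)
    return edges, aye_counts

edges, aye_counts = build_aye_network(records)
print(edges)
print(dict(aye_counts))
```

The edge list and the per-vote counts are the two pieces a network tool needs: one to draw connections, the other to scale the vote nodes.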

Our next view used node sizing to show only those representatives who cast 45 or fewer aye votes (of the more than 80 votes cast in this period). These voters are shown as oversized nodes relative to their colleagues. While missed votes may contribute to this classification, we also note the predominance of Democrats in this view. Given the Republican majority, it is hardly surprising that more Democrats would be likely to refrain from casting aye votes that are likely to reflect the Republican influence.


Next we take a look at those who cast at least 60 aye votes and are unsurprised to see that this one swings toward the Republican side of the graph. This view was achieved using some Gephi filters to hide individuals not meeting the selected criteria. Clearly, the most enthusiastic ‘aye’ voters in this period are primarily Republican.
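Both threshold views come down to a simple range filter on each representative’s aye total. The sketch below shows that idea in plain Python rather than in Gephi’s own filter tools; the totals and names are invented for illustration.

```python
# Hypothetical aye-vote totals per representative (made-up names and numbers).
aye_totals = {"Rep A": 38, "Rep B": 72, "Rep C": 45, "Rep D": 60, "Rep E": 52}

def filter_by_ayes(totals, minimum=None, maximum=None):
    """Keep representatives whose aye count falls inside the inclusive range."""
    kept = {}
    for rep, count in totals.items():
        if minimum is not None and count < minimum:
            continue
        if maximum is not None and count > maximum:
            continue
        kept[rep] = count
    return kept

low_aye = filter_by_ayes(aye_totals, maximum=45)   # 45 or fewer aye votes
high_aye = filter_by_ayes(aye_totals, minimum=60)  # at least 60 aye votes
print(low_aye)
print(high_aye)
```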


Our final view for now (we could do dozens more) focuses on national security – generally considered to be a bi-partisan subject where both parties want to appear patriotic, regardless of whether the legislation actually advances security. To focus quickly on this topic, I have used Gephi to recolor all security-related nodes to yellow. Notice how these votes are almost uniformly bi-partisan, with overwhelming support from both parties.


These are just a few examples of how Gephi can help dissect a reasonably complex network and provide quick visual insights. There are of course many other methods available in Gephi that would take this analysis much deeper.

Now that we’ve done a brief examination of this data, it’s time to move on to the interactive example on the web, where you can do your own clicking, searching, zooming, and panning to uncover patterns in the data. This functionality all comes courtesy of Sigma.js, an outstanding JavaScript library that Gephi can export to via a plugin. You can find the network here: http://visual-baseball.com/gephi/us_house/network/index.html#.

At some point, I may attempt to link back to the actual voting data at govtrack.us, but for now I hope you find this to be a useful (and fun) way to examine voting patterns. Enjoy!


Tableau Public Baseball Pilot Complete

A few weeks back I posted about using Tableau Public to explore baseball stats, specifically with respect to building dashboards to display information. At the time I had created a very rough first pass at the dashboard, with lots of small multiple tables displaying info on a single page. I quickly realized it was a bit overwhelming, so I sought a better solution.

The new version is an improvement, albeit imperfect due to some Tableau limitations. Users can now select a single chart to display in a large, single window using radio buttons. Really easy, and there are a lot of additional filters available on each page to customize the chart to display the results you want. All of this comes with one catch, however. Even when a given chart is not being displayed, it still uses up a handful of pixels on the vertical screen. This forces some charts to display a bit lower on the screen than others, a minor annoyance I spent some time trying unsuccessfully to solve. That’s why you’ll find the offensive categories split into two tabs rather than one.

All of the offensive category charts are displayed using dot plots with dotted lines and small filled circles as the display devices. This proves to be an effective, low-ink approach that compares favorably to bar charts in this case. Dot plots allow for an axis range that doesn’t need to start at 0, whereas proper bar charts always should start there, because readers judge each bar by its length relative to the other bars. There are many horrendous examples on the web, particularly those produced by major media outlets, that distort data either accidentally or intentionally. So dot plots give us an edge for this sort of display.
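A quick bit of arithmetic shows why the zero baseline matters for bars. The Python sketch below compares the drawn lengths of two bars when the axis starts at zero versus when it is truncated; the batting averages are made up for illustration.

```python
def apparent_bar_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar lengths for values a and b when the axis starts at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)

# Made-up batting averages, for illustration only.
hitter_a, hitter_b = 0.300, 0.250

true_ratio = apparent_bar_ratio(hitter_a, hitter_b)             # axis starts at 0
truncated_ratio = apparent_bar_ratio(hitter_a, hitter_b, 0.200)  # axis starts at .200

print(f"{true_ratio:.2f}")
print(f"{truncated_ratio:.2f}")
```

With the axis starting at .200, the bar for the .300 hitter is drawn about twice as long as the bar for the .250 hitter, even though the value is only 20 percent larger. A dot plot encodes position rather than length, so the same truncated axis does not mislead in this way.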

The third tab displays distributions of various offensive statistics using scatter plots, which again benefit from their ability to use flexible axis ranges based on the displayed values. Once again, there are many filters that let you play to your heart’s content, using team, position, season, and so on to reduce your dataset and answer questions quickly.

Here’s the updated version, to be followed at some point by a full 1871-2014 dashboard:

Direct link is found here.


4 Years of Baseball Graphics Updates in 3 Days

To say that some of my website visuals were not quite up to date is a massive understatement. The latest version of the Game Summaries covered the 2009 season. The interactive pennant race charts ran through 2011, and the Batting Explorer exhibits end with the 2009 season. Not exactly current in any of these cases, and other examples abound. So what to do about it?

For starters, the underlying data so generously made available from the Retrosheet folks needed to be updated. Portions of this had been done over the last few years, but a bit haphazardly, as I came to find out over the last few days. Some tables were current through 2011, others through 2012 or 2013. In short, they were consistently inconsistent, and certainly not suited to creating the latest versions of the aforementioned visuals.

One of the best aspects of growing older (at least from a data perspective) is accumulating more and more code that makes it a bit less painful to update or repair database tables. I have managed to create and save dozens of code snippets that help me create, insert, update, select, and otherwise manipulate the data into a proper format for consumption by visualization tools. In some cases, this code made the process surprisingly easy, while other cases required dusting off the cobwebs to understand what my code was doing or not doing. In the end, the process worked remarkably swiftly, aided by the periodic Michigan microbrew. The resulting table updates allowed me to tackle the pennant race and game summary projects, producing 23 new baseball graphics in a 72-hour window.

Et voila, as the French might say, the Visual-Baseball site now has 18 new pennant race charts (3 years times 6 divisions) while the game summaries have five new entries covering the 2010 through 2014 seasons, and they all work as expected. The pennant race charts are built using D3 and NVD3 code atop .json data, while the Game Summary exhibits are created using Simile Exhibit, a semantic browsing tool, also sitting on .json data.

The pennant race charts look like this:


The charts are interactive in several ways – individual teams can be hidden from view, the chart is zoomable, and individual values can be displayed using mouseover capability. You can find the entire portfolio of more than 360 charts here.

Game summary exhibits cover 60 seasons and afford users many filtering options to search for games based on specific criteria – teams, pitchers, runs scored, and so on. Results can be viewed in a tabbed fashion or via a timeline. Here’s an example image:


The entire gallery is located here.

Have fun with both the pennant races and the game summaries, and make sure to check out a few of the other resources in the portfolio section. It feels as though the site is gradually becoming a unique resource for the visual interpretation of baseball data, whether it is in the form of conventional charts or more esoteric views such as the network graphs. Feel free to share any of the information on the site, and tell your friends and colleagues.


Mapping Projected Growth Rates with D3

The mapping bug has bitten me recently, as I continue to explore a variety of resources, including CartoDB, Mapbox, Leaflet, and now D3. While D3 is not a dedicated mapping platform like the others, it is perhaps the most flexible of all, due to the wealth of map projections provided by Mike Bostock, Jason Davies, and the rest of the D3 community. In addition, it provides nearly unlimited potential through the use of colors, labels, tooltips, and so on, all customizable using CSS.

My latest effort is a rather simple foray into a time-based map using population projections from the UN, found here. This map owes a debt of gratitude to a similar creation found on github from Rich Donohue. I was able to use his example as a starting point and then simply tweak a handful of settings, provide a different data source, and manipulate color schemes and the map projection. This is a basic choropleth map, where every country has a fill color based on the projected rate of population growth for every five-year period through 2100. I removed the data for the first half of each decade, as it didn’t add to the story. The result is a map that shows 10-year intervals for every country from 2020 through 2100.
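The data preparation behind a choropleth like this has two small steps: thin the series down to one value per decade, then bin each value into a fill color. Here is a minimal Python sketch of both. It is not the D3 code behind the map; the growth figures, the year keys, the class breaks, and the colors are all assumptions made up for illustration.

```python
from bisect import bisect_right

# Hypothetical growth rates (percent) keyed by a label year.
# These numbers are invented, not UN figures.
growth_by_year = {2020: 1.1, 2025: 0.9, 2030: 0.7, 2035: 0.4, 2040: -0.2}

# Assumed class breaks and fill colors; the real map's scheme may differ.
BREAKS = [0.0, 0.5, 1.0]
COLORS = ["#2166ac", "#d1e5f0", "#fddbc7", "#b2182b"]

def decade_years_only(series):
    """Drop mid-decade entries so only years divisible by 10 remain."""
    return {year: rate for year, rate in series.items() if year % 10 == 0}

def fill_color(rate):
    """Map a growth rate to a fill color using the class breaks."""
    return COLORS[bisect_right(BREAKS, rate)]

decades = decade_years_only(growth_by_year)
fills = {year: fill_color(rate) for year, rate in decades.items()}
print(fills)
```

Using fixed class breaks across all decades keeps the colors comparable from one time step to the next, which is what lets a viewer see a country shift color as its projected growth slows.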

Here’s a glimpse of the map, or click here to go to the full version:

This is a relatively simple example that barely scratches the surface of what D3 can do, but it reinforces my love affair with open source tools and communities. More to come.


Midterm Elections Won’t Change Runaway Budget

Many voters and observers have been talking about the huge changes that could be brought about by the 2014 US midterm elections. While this may be true on a few high profile social issues, the election results will do nothing to stop the runaway freight train known as the federal government budget. Specifically, this refers to the many agencies that fall within the Executive branch of the government. Neither party has shown any inclination to slow this growth down, and that will likely remain unchanged regardless of who is in power.

To show how little impact the electorate has on this unelected portion of the government (often referred to as the ‘shadow government’), these giant bureaucracies continue to rapidly expand even as more and more Americans are struggling to survive. I have provided the gory details using Tableau Public, based on the federal government’s own staggering budget numbers from 1962 through 2015. Find the story here: Unchecked agency growth


Data Visualization, Aesthetics and Intuition

As I worked through a just completed project chronicling the diverse musical career of Neil Young, some valuable (if unintended) insights were reinforced once more. I work on a regular basis with a variety of large datasets that require analysis, interpretation, and ultimately visualization and presentation. Often, these goals are not easily reconciled, which leads to unsatisfactory results across one or more of these factors.

As much as we as analysts need to depict the data accurately and meaningfully, if we don’t do so with an attractive visual approach we risk not having our message get communicated at all. Merely presenting our data in a table may technically get the job done, but is also likely to bore the reader to tears while simultaneously failing to deliver the key messages. At the other extreme, we can pull out individual bits of the data and spend our time creating flashy infographics that may capture attention but fail to represent the data in its proper context. All flash, no substance. Neither approach is terribly effective.

At the same time, we may present all of the information using a reasonable visual approach that preserves the integrity of the data while still falling short of creating a fulfilling user experience. This is what I recently experienced with the Neil Young project, as I’ll detail below.

After spending a few days getting the data from the AllMusic site into Excel, and eventually as node and edge files into Gephi, it was finally time to create the network data visualization. I was determined to attempt one of the many force-based methods used in network graph analysis to create the graph. These methods are very popular and useful for creating graphs out of a variety of data networks, allowing viewers to see the larger patterns at work within the data.
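For anyone curious what node and edge files look like, here is a small Python sketch that turns a list of pairs into two CSV tables. The Id/Label and Source/Target column headers are the ones Gephi’s spreadsheet import looks for; the album and style names are hypothetical stand-ins, not the actual AllMusic data, and the function name is my own.

```python
import csv
import io

# Hypothetical album-to-style pairs; not the actual AllMusic data.
album_styles = [
    ("Album One", "Folk-Rock"),
    ("Album One", "Country-Rock"),
    ("Album Two", "Folk-Rock"),
]

def to_gephi_csv(pairs):
    """Return (nodes_csv, edges_csv) text with Id/Label and Source/Target headers."""
    # Every distinct name on either side of a pair becomes one node.
    node_ids = sorted({name for pair in pairs for name in pair})
    nodes_buf, edges_buf = io.StringIO(), io.StringIO()

    node_writer = csv.writer(nodes_buf, lineterminator="\n")
    node_writer.writerow(["Id", "Label"])
    node_writer.writerows((name, name) for name in node_ids)

    edge_writer = csv.writer(edges_buf, lineterminator="\n")
    edge_writer.writerow(["Source", "Target"])
    edge_writer.writerows(pairs)

    return nodes_buf.getvalue(), edges_buf.getvalue()

nodes_csv, edges_csv = to_gephi_csv(album_styles)
print(nodes_csv)
print(edges_csv)
```

The sketch writes to in-memory buffers to stay self-contained; saving the two strings to files gives a pair of tables that can be imported one after the other.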

After a few iterations, I wound up with a serviceable graph that covered most of the basics I spoke of earlier – all the data was exposed, element types were sized and color-coded for easier interpretation, and the project was navigable via the web. Here’s a look:


Not bad, but there was something nagging at me as I viewed it, tweaked it, played with the styling, and so on. Everything was technically fine, but something was missing. So back I went to Gephi to find the answer. The next day, it occurred to me – I was using the wrong approach for the type of data I was trying to depict. Where the force-directed approach is ideal for dense, social media type networks, this was a unique network that didn’t possess the same structure. Therefore, it was not as aesthetically appealing or as intuitive as it could be.

After iterating through a few approaches, I came across a winner that best exploits the structure of the underlying data while conveying a far more intuitive feel to the end user. Why not have Neil at the center of the graph, surrounded by all of his albums, ordered by release date? On top of this, I could then have the style and mood data form an outer ring, as they needed only to link to the albums in some fashion. Now we have something that conveys the same information as the first attempt, but in a much more pleasing layout relative to this dataset. See for yourself:
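The geometry of this layout is easy to sketch: one node at the origin, albums spaced evenly on an inner circle in release order, and styles and moods spaced evenly on a larger circle. The Python below is a rough illustration of that idea under assumed radii, not the Gephi layout actually used for the graph; the album names, years, and tags are hypothetical.

```python
import math

def ring_positions(names, radius, start_angle=90.0):
    """Place names evenly on a circle, clockwise from start_angle (degrees)."""
    if not names:
        return {}
    step = 360.0 / len(names)
    positions = {}
    for i, name in enumerate(names):
        angle = math.radians(start_angle - i * step)
        positions[name] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

# Hypothetical albums with release years, and hypothetical style/mood tags.
albums = [("Album B", 1975), ("Album A", 1969), ("Album C", 1990)]
tags = ["Folk-Rock", "Reflective", "Raw", "Country-Rock"]

layout = {"Neil Young": (0.0, 0.0)}                    # artist at the center
ordered = [name for name, _year in sorted(albums, key=lambda a: a[1])]
layout.update(ring_positions(ordered, radius=100.0))   # inner ring: albums by year
layout.update(ring_positions(tags, radius=200.0))      # outer ring: styles and moods

for name, (x, y) in layout.items():
    print(f"{name}: ({x:.1f}, {y:.1f})")
```

Sorting the albums before placing them is what makes the ring read as a timeline: walking clockwise from the top follows the release order.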


The new version addresses the issues of aesthetics and intuition where the first graph fell short. All moods and styles are now easily found; the same is true for all albums. Highlighting a single mood (or album) also provides an information-rich view for how the music changed over periods in Young’s career. This was nearly impossible to see in the initial layout.

So the message is this – visualizations don’t need to sacrifice aesthetics and intuition in order to be effective; rather, they should take advantage of these attributes to increase their appeal and impact. Don’t be afraid to experiment until you find the right formula, as it seldom presents itself the first time around, and trust your instincts.


Network Data Adventures with Gephi

Posting to the blog has become a luxury recently, what with a summer full of youth baseball, some organizational changes at work, summer home projects, and of course the upcoming Gephi book. I’ve learned that one has to be especially good at finding synergies between projects in order to get everything done. So it is with creating any new work in Gephi while writing the book. Any new projects will necessarily be created while in the process of developing material for the book.

Earlier this year, I created a host of network visuals, one per franchise, showing the relationships between all players who suited up for a given major league baseball team. This data made for some interesting visuals that were fun to explore. What the graphs didn’t do was to provide visual cues about how the players could have been grouped – by decade, position, birthplace, and so on. So the logical evolution was to take this idea and extend it as an example for how to use partitioning and clustering to visually segment a network graph.

Recently I began playing with this idea by looking at a few of these examples, and have included some in one of the book chapters. I’ll use some slightly different cases here to avoid redundancy, but the principles are identical. I’ll walk through an example for how we can extract intelligence from a network graph in a few easy steps, using the Boston Red Sox from 1901 through 2013.

  1. Start with the base graph, having used a layout algorithm to arrange it in some fashion. I used the ARF approach for this example.
  2. Size the nodes in the graph using some criterion, such as the number of games played as a catcher. This will help users to quickly spot the dominant players at that position.
  3. Color the nodes using a categorical variable like decades. In this case, the color will reflect the first decade a player suited up for the Red Sox.
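Steps 2 and 3 above can be sketched in a few lines of Python. This is only an illustration of the sizing and coloring logic, not what Gephi does internally; the player records, the size range, and the palette are all assumptions.

```python
# Hypothetical player records: (name, first Red Sox season, games caught).
players = [
    ("Player A", 1903, 0),
    ("Player B", 1948, 410),
    ("Player C", 1997, 1200),
]

# Assumed palette; any categorical color scheme would do.
DECADE_COLORS = {1900: "#1b9e77", 1940: "#d95f02", 1990: "#7570b3"}

def node_size(games, max_games, min_size=5.0, max_size=50.0):
    """Scale games linearly into the range [min_size, max_size]."""
    if max_games == 0:
        return min_size
    return min_size + (max_size - min_size) * games / max_games

def decade_of(year):
    """Return the decade a year belongs to, e.g. 1948 -> 1940."""
    return year // 10 * 10

max_games = max(games for _name, _year, games in players)
styled = {
    name: {
        "size": node_size(games, max_games),
        "color": DECADE_COLORS.get(decade_of(year), "#999999"),
    }
    for name, year, games in players
}
print(styled)
```

Size carries the continuous measure and color carries the category, so the two encodings do not compete with each other when both are applied to the same graph.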

In sequence, here are the three graphs:

Kind of dull – nothing but a lot of identical nodes and their connections. Let’s apply sizing based on the number of games played as a catcher:


Now that pops a few things! We have some easy starting points to work from. How about coloring the nodes by decade to see if that adds to the story:


Hmmm. Maybe this gives us some additional insight as well. Certain decades are split amongst multiple catchers, while in other cases we have a single dominant player. Of course we would want to allow the user to identify each of these cases (for example, the large green node at the top left is Jason Varitek) through some labeling or interactivity.

So you get the idea of how a couple of simple tweaks can change the way we view a graph. I’ll be using a similar approach in the book to help readers create powerful stories with their own data.


Mastering Gephi Book Update

Lest anyone think of me as a full-time author, rest assured that the likes of J.K. Rowling are not trembling in fear. Even if I had the ability to conjure up creative plots, I type too damn slow to make it as a full-time literary lion. Fortunately, I don’t have to depend on my keyboarding skill (or lack thereof) as a full-time pursuit. Which brings me around to my topic – the current book I’m authoring on Gephi.

For those not exposed to networks and network analysis, Gephi is a French-based open source project that makes it possible for all sorts of users (including moi) to create interesting graphs from connected datasets. By connected I am referring to data where the individual nodes are connected in some way, shape, or form. This could be anything from movie actor databases to Facebook friend networks to baseball player connections. Anyone with a spreadsheet full of data and a bit of effort and persistence can use Gephi to create cool looking graphs that also tell a story of some sort.

My job in writing the book is to help people make sense of all the features and capabilities within Gephi, some of which are a bit complex to master. In the process, I get to learn more about the theory behind network analysis, and with it terms such as contagion, diffusion, clustering, and homophily. It’s really fascinating if you’re into understanding how people and institutions interact, contagion processes function, or how product adoption can be affected by the structure of a network. My higher math skills are not good enough to be at an academic level with this stuff, so I have to compensate with some logic and visual acuity.

Anyhow, here’s some of the stuff that Gephi can create:




I’m hoping that one of these images will serve as the book cover come publishing time, which should be sometime this fall. In the meantime, I have six more chapters to write (of 10 total), and will have the added joy of working through chapter edits where others catch the mistakes I’ve made.


Day 3 Eyeo Festival Recap

Well, it’s come and gone again, like a fleeting annual romance, a la Same Time Next Year, the Alan Alda & Ellen Burstyn feature about a couple that meets for a single weekend once a year. This is how I feel about the Eyeo Festival, where once a year for the last three years I have this short flirtation with this incredible event, meeting new people, reacquainting with others, and opening my mind to new ideas and people. So it was on the final day of this year’s event, with more amazing speakers sharing the work they create.

Morning began with Lauren McCarthy, recognizable from prior festivals, but presenting for the first time at Eyeo. She shared a creatively diverse group of projects she has worked on dating back a number of years, some merely playful, others representing bold forays into uncharted territory, many revolving around human relationships and interaction. One of the projects even netted her a bit of controversy, including some discussion on Fox News, hilariously shared with the Eyeo audience. Here are some of her gems: Crowdpilot, Us+, and Inneract.

Next up was Eric Rodenbeck, head of the great Stamen map and visualization design studio. Rodenbeck shared a number of Stamen projects, and also provided a great deal of insight on how to keep a small design studio going for better than a dozen years. Stamen does a lot of impressive work, and shares much of it in the public domain, including their Map Stack application. Lots of great advice provided for those out on their own, but also much relevance to tinkerers like myself working in large organizations.

One of the lunchtime sessions featured Ben Jones of Tableau, sharing some of the capabilities of the new Tableau 8.2, to be released very soon. I’m already very familiar with Tableau through daily work use, so much of the material was not new, but some of the upcoming 8.2 features are exciting indeed. After the talk, I had the opportunity to speak with both Ben and the Tableau mapping lead. Maps in 8.2 were designed in conjunction with the aforementioned Stamen, so they are likely to be a significant improvement over the already respectable maps in prior Tableau versions.

After lunch, my first stop was to hear the irreverent Jessica Hagy, who has done more to make the index card useful than anyone in recent memory. Her simple line drawings, typically either x-y axes or Venn diagrams, merge often unlikely pairs of ideas, items, or things, and hilariously plot them. Hagy has a number of other interesting projects, so take a moment to check them out.

Roman Verostko won honors as the oldest speaker (and attendee) at this year’s festival, and has been creating art projects since before most attendees were even born (before many of their parents were born, even!). At 85 years young, Verostko was the creator of a code-based machine to control brush strokes, in a process known as algorithmic painting. He also spent many years as a Benedictine monk, before leaving the monastery to become a full-time artist and professor. Verostko was one of the three pioneers featured at Eyeo, along with Frieder Nake and Lillian Schwartz.

The festival ended for me with the great Santiago Ortiz, creator of some of the most amazing interactive projects to be found on the web in recent years. Ortiz, who is based in Argentina, took the audience through a talk primarily focused on a 6-month period characterized as the most stressful, challenging, and uncertain period of his life, after parting ways with a previous employer and setting out on his own. After a number of physical hardships related to stress and fatigue, he eventually triumphed and began creating some incredible work that has to be seen by all visualizers. Ortiz’s work is heavily influenced by his background in and love for mathematics, which results in some incredibly elegant work. This was a very inspiring end to my third Eyeo trip.
