One of the primary source data sets I use to create baseball visualizations is the amazingly detailed information captured by the Retrosheet project, a dedicated group of volunteers providing play-by-play and game level information for each MLB season. They have recently passed the 100-year milestone, with data from the 1921 & 1922 seasons now available. I have some catching up to do on the older seasons, but just downloaded the 2022 season for adding to my databases.
The data comes in two distinct sets – game logs being the much easier of the two to work with, due to the smaller data size. Each game played in a season is captured at a summary level (~ 2,400 records), with information pertaining to the score, players, umpires, attendance, and much more. This information is used to feed my game summary visualizations:
As you can see, these are bite-sized summaries of every game, showing some of the important summary data for a game. They can be filtered to find specific teams, pitchers, scores, and much more. These visualizations are currently available covering the 1955-2019 seasons; one of my immediate goals is to add the 2020, 2021, and 2022 seasons, before starting to work in reverse with pre-1955 campaigns.
Fortunately, I have lots of SQL code built up over the years to make the data update process fairly simple; the 2022 game logs have already been added, and now I’ll get to work on the play-by-play data. Stay tuned for updates, and thanks for reading!
This is the first in a series of posts where I take a look at notoriously one-sided baseball trades, using the baseball trade networks published on this site earlier in 2022. I won’t necessarily rank these deals in any sort of order; rather I will pick out a few from the network trade graphs and provide some analysis and context for some of the most notorious transactions.
If you haven’t seen the trade networks previously, here’s a link.
The networks were built using data from Retrosheet and Neil Paine, loaded into Gephi, a network analysis and visualization tool, and ultimately pushed to the web where I could finish styling the graphs. Graph nodes (the circles in the networks) are sized based on the total future WAR (Wins Above Replacement) accrued by the teams involved in the trade. All values must occur at the major league level (MLB), so players involved in the deal who don’t reach the MLB level with their new team will have a zero value. Only the cumulative WAR value while playing for the new team is included; we are not calculating WAR once a player leaves one of the teams involved in the transaction.
Finding a bad trade by scanning the networks is more an art than a science; the key is to look for large nodes (indicating a lot of future WAR value), and then dissecting the trade to see how much value each team received. The other alternative is if we already know the player(s) we are looking for; in these cases we can perform a simple search to find the trade. Here’s a classic example that Red Sox fans would love to forget – trading future Hall of Famer Jeff Bagwell for journeyman reliever Larry Anderson. Let’s go to the Red Sox trade network and search for Jeff Bagwell.
Typing in Jeff Bagwell locates him quickly within the trade network. Note that even if a player is involved in multiple trades to or from the same team (rare but possible) the search will locate each transaction. Here’s the Bagwell transaction, showing his player node and future WAR value connected to the transaction node; every player involved in that transaction will be connected to the trade node, as long as there is some future WAR value. If a player in the trade did not play in the majors for the receiving team, they will not be reflected in the graph. Here’s a view of Jeff Bagwell relative to the trade:
We can also click on the transaction node to see the value provided to each team by all of the players involved in the trade, again assuming they spent time with the team and were not limited to the minor leagues. Clicking on that node will display the respective WAR values in the sidebar on the left of the screen:
Here’s where we get to the details of the trade, and specifically the direct benefits accrued to each team. The Red Sox received 1.1 future WAR from Larry Anderson; to put this in perspective, we might expect this sort of value for an average player for a single season. The Astros, on the other hand received an incredible 93.8 WAR from Jeff Bagwell, or close to 6 WAR per season for 16 years! That is a Hall of Fame level performance, and it eventually led to his selection to Cooperstown in 2017. Here’s a profile that mentions the one-sided trade.
While we have the Red Sox network open, let’s see if there are any other disastrous transactions (other than the cash sale of Babe Ruth to the Yankees, technically not a trade). After scanning the network, we find this one from 1928:
This one is clearly not a Bagwell-level disaster, but was still quite negative for the Red Sox, with a WAR differential of 30 points. The primary villain here is Buddy Myer, a solid infielder who hit .300 or better seven times for the Senators. Not a major star, but the owner of a very nice career, including leading the American League in batting average in 1935.
Let’s try to find one more before closing this piece, this time favoring the Red Sox. We zero in on this deal:
The Red Sox netted nearly 45 future WAR value while surrendering just 0.1; most of the benefit was generated by slugging future Hall of Famer Jimmie Foxx, but they also received a nice three season contribution from pitcher Johnny Marcum. Note how we also removed nodes not involved in the trade by clicking on the edges icon on the bottom left of the display area; this makes it easier to focus on the details.
Feel free to try your hand at finding more of these one-sided deals in the Red Sox or any other trade networks. I’ll be back with some other teams before long. Thanks for reading!
The final 10 MLB WAR Trade Networks have now been published, bringing the total number of graphs to 31 – 30 teams and one overall network with all teams and transactions. For more information on the trade networks, click here. Here are the remaining networks:
I’ve added 11 new MLB WAR Trade Networks to the site, bringing us to 21 in all. One more round of updates should get us to the full complement of graphs. For more information on using the graphs, see here.
The first 10 WAR (Wins Above Replacement) Trade Networks are now available for exploring! This initial group includes nine team networks and one overall graph with all teams included. Here’s a list of the 10 graphs:
Each of these and any upcoming WAR trade networks can be found on this page.
Let’s walk through how the graphs work, using the Detroit Tigers network as an example. We’ll begin with an anatomy of the graph display:
As the image shows, the primary focus will be the main graph area in the center of the window. This is where all nodes (transactions, teams, and players) will reside, connected by edges based on common relationships. Transaction nodes will vary in size based on the total value of a trade with the largest nodes indicating a trade that created significant future WAR for one or both teams. Team and player nodes are set to constant sizes so that the initial visual focus will be on the transaction nodes. The size differences become more noticeable when we zoom in to the network. More on that shortly.
Edges are also sized based on WAR value; this is where we see the value provided to a team and by specific players. Edge sizes (weights) will be more easily seen when we zoom in to the network.
On the left are some graph controls to assist in navigating the graph. We can zoom in using the slider control or the plus/minus buttons adjacent to the slider. Zooming can also be done with a mouse scroll if you prefer that option. The fisheye lens can be toggled on or off and can be used to highlight certain areas of the graph by hovering over a selected region. Finally, the edges button will enable showing or hiding edges and connected nodes. This is useful when you wish to reduce surrounding nodes and focus on specific transactions. We can also pan the graph by dragging it using a mouse – this is helpful in centering a network or viewing specific regions of the graph.
At the upper left of the window is a color legend for each node type, and hidden on the left (not shown in our image) is an information pane that will show specifics about the network. More on that in a bit.
Now let’s examine the information window – this is what makes the network truly powerful. When the network is first displayed or the browser window is refreshed the information pane displays information about the graph (open it by clicking on the arrows icon at the top left):
You can see the simple overview of the graph, the source data, and what it aims to accomplish. Here’s an enlarged version for easier reading:
If we zoom in and select a specific transaction the pane displays the relevant details for that selection:
Now we have the details for the transaction – the season, teams, and players involved. Here’s the enlarged view:
You can do this for any transaction in a graph, or you could choose to select a team or player to see how they fit into the network. The possibilities are nearly endless and it’s a fun way to understand the relationships between teams, players, and trades.
We’ll do more exploring of the networks in upcoming posts; I’ll also be adding more teams until we have a complete set of trade networks. In the meantime, feel free to explore the graphs to learn more about the best (and worst) trades your favorite team has made over the last 120 years. Enjoy, and thanks for reading!
A few years back (2016 o be specific) I created network graphs displaying the history of trades made for each MLB franchise, using transactions data from the wonderful Retrosheet project. These graphs presented more than a few challenges in how to present the data but I wound up with what I consider to be a very interesting set of results, which you can find here. I also created some posts on the process at that time, found here and here.
Here’s a snapshot within a graph:
Six seasons have elapsed since I created those graphs, so I thought it was beyond time to update them, but this time with a twist. Last fall I came across a great dataset that captures an array of advanced sabermetric statistics which I hope to use on a regular basis. These statistics can be used to assess a player’s true value relative to his peers each season. What if I could incorporate those into the trade network updates to show the post-trade value of each player to their new team? Ideally, this will help to show the value of each trade and which team wound up getting the better part of the deal.
Of course this would involve adding a degree of complexity to the MySQL code for pulling the data and shaping it for use in creating network graphs. However, the end result could be very revealing and worthwhile. Today I’m at the start of the process, tinkering with SQL code to extract the data in a proper format. Here’s an example:
SELECT h.player_name, p.playerID, tr.season, tr.TransactionID, tr.TeamFrom, tr.TeamTo, ROUND(SUM(h.WAR162),1) as WAR
FROM historical_WAR_and_more h
INNER JOIN People p
ON h.key_bbref = p.bbrefID
INNER JOIN trades2021 tr
ON p.retroID = tr.Player
WHERE tr.season >= 1901 and h.year_ID > tr.season and h.team_ID = tr.TeamTo AND tr.Type = ‘T’
GROUP BY h.player_name, p.playerID, tr.season, tr.TransactionID, tr.TeamFrom, tr.TeamTo
In this case, I’m looking at the cumulative WAR (Wins Above Replacement) for each traded player with their new team. This could be a single season total or the sum of many years in some cases. Here are some results:
We now have post-trade results (starting if the season following the trade) as measured by WAR for each traded player. We see one fairly substantial figure – the second Aaron Harang trade which netted 16.9 WAR points for his new team, the Cincinnati Reds (CIN in the results). Given that a single season WAR above 3 or 4 is considered substantial, it’s clear that his new team probably benefited from a few of those high-value seasons. What we can’t see yet is what they gave away in their half of the trade.
Fortunately, we can access this using the TransactionID field, which provides all the information for each party within the trade. But we’ll save that for another day as I figure out the next progression of the code. As always, thanks for reading!
I for one am looking forward to 2022 after a couple of interesting, often challenging years affected my desire to generate interesting analytics and data visualizations. The less said the better – simply excited to get back to updating some existing visuals and adding a host of new ones.
I’ll be doing a lot of work using the Exploratory toolkit which keeps improving by the day. It is simply a great tool for handling large (or small) data sets from start to finish; I especially love it’s data wrangling capabilities.
On the data source side, Retrosheet and the Lahman database will continue to feed my analysis and visuals; none of what I create would be possible without these great resources. Retrosheet data (used for game level and play level detail) is already updated through the 2021 season; part of this year’s plan is to add older years (pre-1955) to my local database. The Lahman data (season level) is typically available around February and I’ll be downloading it to my databases at that time.
Stay tuned for updates throughout 2022 – they should be a lot more frequent than the last two years. Happy New Year!
The 2018 game summaries have been updated, using data from the Retrosheet project. This is the latest update in a series that goes all the way back to 1954. As a user, you have the ability to filter on a wide array of fields, as seen below:
The summaries provide basic data about every individual game played in a selected season – the line score, winning and losing pitchers, home runs, and much more. Here’s an example:
To have a go at the 2018 summaries, or any other season, go to the Game Summaries page in the portfolio section of this site.
I’m pleased to announce that the 2018 Retrosheet game log files have been uploaded to the VBP database. This data can be used to create analysis at the game level, with a wide array of data elements, including the following:
…and much more
This data provides the input for the Game Explorer visualizations on this site, which will be updated shortly to include the 2018 season. If you haven’t seen them previously, the Game Explorers allow users to filter across many data attributes to retrieve specific results. Here’s a screenshot:
The next step is to create the 2018 version of the explorers, adding to the existing files covering the 1955-2017 seasons. I’ll keep you posted as soon as 2018 is available on the site. Thanks for reading!