My first week of 2026 has been spent largely on updating game and event data from the massive Retrosheet data sets. Even a small subset of the event data elements yields a considerable amount of information to analyze. Here’s what’s new (for my databases) this week:
- 2023-2025 season event data
- 1950-1953 season event data
- 1910-1949 season event data
What do we find in this data? For my subset, these are the bits of data I can use:
- game id (a unique combination based on the date and the home team)
- visiting team
- inning (in which an event occurred)
- batting team
- the number of outs, balls, and strikes at the time of an event
- the score at the time of the event
- batter & pitcher information (left-handed, right-handed, etc.)
- event type (single, double, home run, etc.)
Plus a wealth of additional information to be mined, analyzed, and visualized.
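To make the list above a little more concrete, here is a minimal Python sketch of what one row of such a subset might look like. The `Event` class, its field names, and both helper functions are hypothetical and not necessarily the schema behind this site. The game id layout (a three-letter home team code, the date as YYYYMMDD, then a one-digit game number) is an assumption about the Retrosheet convention, and the team codes and plays in the usage example are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One play from the subset listed above (field names are illustrative)."""
    game_id: str        # assumed layout: home team code + YYYYMMDD + game number
    visiting_team: str
    inning: int
    batting_team: str
    outs: int
    balls: int
    strikes: int
    visitor_score: int
    home_score: int
    batter_hand: str    # "L" or "R"
    pitcher_hand: str   # "L" or "R"
    event_type: str     # e.g. "single", "double", "home run"


def split_game_id(game_id: str) -> tuple[str, date, int]:
    """Split an id into (home team, game date, game number), assuming the layout above."""
    home_team = game_id[:3]
    game_date = date(int(game_id[3:7]), int(game_id[7:9]), int(game_id[9:11]))
    game_number = int(game_id[11])
    return home_team, game_date, game_number


def count_event_types(events: list[Event]) -> Counter:
    """Tally how many times each event type occurs."""
    return Counter(e.event_type for e in events)


# Made-up sample rows with placeholder team codes, purely for illustration.
sample = [
    Event("HOM195004180", "VIS", 1, "VIS", 0, 1, 2, 0, 0, "L", "R", "single"),
    Event("HOM195004180", "VIS", 1, "VIS", 0, 0, 0, 0, 0, "R", "R", "home run"),
    Event("HOM195004180", "VIS", 1, "HOM", 1, 3, 2, 2, 0, "R", "L", "single"),
]

print(split_game_id(sample[0].game_id))
print(count_event_types(sample).most_common())
```

Keeping each event as a small immutable record like this makes it straightforward to group by inning, count, or matchup later on.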
While Retrosheet is missing events for a small percentage of games between 1910 and 1970, the data is otherwise remarkably comprehensive. Now that I have it stored locally, you should start seeing some interesting analyses on this site for 2026. That’s it for now, and thanks for reading!