Data, Data, and More Data

I spent most of my first week of 2026 updating game and event data from the massive Retrosheet data sets. Even a small subset of the event data elements yields a considerable amount of information to analyze. Here’s what’s new (for my databases) this week:

  • 2023-2025 season event data
  • 1950-1953 season event data
  • 1910-1949 season event data

What do we find in this data? For my subset, these are the bits of data I can use:

  • game id (a unique combination based on the date and the home team)
  • visiting team
  • inning (in which an event occurred)
  • batting team
  • the number of outs, balls, and strikes at the time of an event
  • the score at the time of the event
  • batter & pitcher information (left-handed, right-handed, etc.)
  • event type (single, double, home run, etc.)

Plus a wealth of additional information to be mined, analyzed, and visualized.
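
For the curious, here is a rough sketch of how one event from that subset could be modeled in Python. The field names are my own placeholders for illustration, not Retrosheet's column names, and the game id layout (three-character home team code, then YYYYMMDD, then a one-digit game number) is an assumption to check against the Retrosheet documentation. The sample id at the bottom is made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One event from the subset; all field names are hypothetical."""
    game_id: str        # built from the date and the home team
    visiting_team: str
    inning: int         # inning in which the event occurred
    batting_team: str
    outs: int           # count and outs at the time of the event
    balls: int
    strikes: int
    visitor_score: int  # score at the time of the event
    home_score: int
    batter_hand: str    # e.g. "L" or "R"
    pitcher_hand: str
    event_type: str     # e.g. "single", "double", "home run"


def parse_game_id(game_id: str) -> tuple[str, date, int]:
    """Split a game id into (home team, game date, game number).

    Assumes the layout TTTYYYYMMDDN: team code, date, game number.
    """
    if len(game_id) != 12:
        raise ValueError(f"unexpected game id length: {game_id!r}")
    home_team = game_id[:3]
    game_date = date(int(game_id[3:7]), int(game_id[7:9]), int(game_id[9:11]))
    game_number = int(game_id[11])
    return home_team, game_date, game_number


# Made-up id for illustration only.
home_team, game_date, game_number = parse_game_id("BOS202304010")
```

Keeping the id as a plain string and parsing it on demand means the stored data stays exactly as it arrived, while the date and home team are still easy to pull out for grouping.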

While Retrosheet is missing events for a small percentage of games between 1910 and 1970, the data is otherwise remarkably comprehensive. Now that I have it stored locally, you should start seeing some interesting analyses on this site in 2026. That’s it for now, and thanks for reading!