Applications of Discrete Markov Chains to Baseball Analysis

  • Leif Eliasson, MacEwan University


Two fundamental notions justify the use of Markov chains in the analysis of baseball game outcomes. The first is simply a consequence of the rules of baseball itself: for any batter, exactly one of three possible outcomes will have occurred by the time their turn at bat is finished. They will have scored a run, they will be on base, or they will be “out”. The evolving state of a game can therefore be characterized entirely in terms of batters in sequence moving from their turn at bat into the appropriate variation of one of these three foundational “states”. In particular, from the beginning of a half-inning to its conclusion, every possible configuration of bases occupied, number of outs, and number of runs scored can be arranged sequentially and, as I will demonstrate, the discrete Markov chain is perhaps the most natural framework in which to do this.
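As a rough illustration of the state space described above, the following Python sketch enumerates the base-and-out configurations of a half-inning. It assumes, as a simplification not stated in the abstract, that runs scored are tallied separately rather than stored in the state, which keeps the state space finite. The names `BASE_CONFIGS`, `TRANSIENT_STATES`, and `END_STATE` are hypothetical and chosen only for this example.

```python
from itertools import product

# Each base is either empty (0) or occupied (1): (first, second, third).
BASE_CONFIGS = list(product((0, 1), repeat=3))  # 8 configurations
OUT_COUNTS = (0, 1, 2)                          # outs while the half-inning is live

# One transient state per (outs, bases) pair.
# Assumption: runs scored are tracked outside the state.
TRANSIENT_STATES = [(outs, bases) for outs in OUT_COUNTS for bases in BASE_CONFIGS]

# Hypothetical label for the absorbing state that ends the half-inning.
END_STATE = "3 outs"
ALL_STATES = TRANSIENT_STATES + [END_STATE]

print(len(TRANSIENT_STATES), len(ALL_STATES))  # prints: 24 25
```

Under these assumptions the chain has 24 transient states and a single absorbing state, reached when the third out is recorded.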

The second fundamental notion is the justification for why a matrix of transition probabilities, representing the transition from one “state of the field” (or “arrangement of players on bases”) to the next, should in fact define a Markov chain. The key is this: the states in the proposed matrix are exactly the states of the field of play at the time a given batter steps up to the plate, and the state after that batter has taken their turn depends solely upon the performance characteristics of that batter. This is a crucial point: the next state does in fact also depend upon the performance of the defenders in the field, but statistically their effects, and any others, may be aggregated into some “average” performance of the batter. It follows that, given the current state, the past states of the game have no bearing on which state will be reached next. The Markov property is satisfied precisely because the state of the game after a batter has taken their turn rests entirely on the shoulders of that batter, without regard to what the batters before have done.

Taken together, these two notions suggest a way to construct a Markov chain that models each state of play through which the game passes, from the start of a half-inning to its conclusion.
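One possible construction can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the model from the presentation: the batter probabilities in `PROBS` are invented for illustration, every batter is treated as identical, runners hold on an out, each runner advances exactly as many bases as the batter on a hit, and a walk moves only forced runners. All function and variable names are hypothetical.

```python
# States 0..23 are indexed as outs * 8 + bases, where bases is a bitmask
# (bit 0 = first, bit 1 = second, bit 2 = third). State 24 is absorbing.
END = 24
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}

# Hypothetical "average batter" probabilities (assumed values; they sum to 1).
PROBS = {"out": 0.68, "walk": 0.08, "single": 0.15,
         "double": 0.05, "triple": 0.01, "home_run": 0.03}

def advance(bases, event):
    """Return (new_bases, runs) for a non-out event under the simplified rules."""
    if event == "walk":
        if not bases & 1:
            return bases | 1, 0   # first base open: batter takes it
        if not bases & 2:
            return bases | 2, 0   # runner on first is forced to second
        if not bases & 4:
            return bases | 4, 0   # runners on first and second are forced
        return 7, 1               # bases loaded: runner on third scores
    n = HIT_BASES[event]
    # Put the batter at bit 0 and runners at bits 1..3, then shift by n bases.
    field = ((bases << 1) | 1) << n
    runs = bin(field >> 4).count("1")   # anyone past third base has scored
    return (field >> 1) & 0b111, runs

def build_chain(probs):
    """Build the 25x25 transition matrix and the expected runs per state."""
    P = [[0.0] * 25 for _ in range(25)]
    reward = [0.0] * 25
    for outs in range(3):
        for bases in range(8):
            s = outs * 8 + bases
            # An out leaves the runners in place and adds one out.
            nxt = END if outs == 2 else (outs + 1) * 8 + bases
            P[s][nxt] += probs["out"]
            for event in ("walk", "single", "double", "triple", "home_run"):
                new_bases, runs = advance(bases, event)
                P[s][outs * 8 + new_bases] += probs[event]
                reward[s] += probs[event] * runs
    P[END][END] = 1.0
    return P, reward

def expected_runs(P, reward, start=0, max_batters=200):
    """Propagate the state distribution forward, accumulating expected runs."""
    dist = [0.0] * 25
    dist[start] = 1.0
    total = 0.0
    for _ in range(max_batters):
        total += sum(d * r for d, r in zip(dist, reward))
        dist = [sum(dist[i] * P[i][j] for i in range(25)) for j in range(25)]
    return total

P, reward = build_chain(PROBS)
print(round(expected_runs(P, reward), 3))
```

Because each row of `P` depends only on the current state and the batter's outcome probabilities, the matrix encodes exactly the memoryless behaviour argued for above. Replacing `PROBS` with per-batter values would give one transition matrix per batter in the lineup.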
