MLB Rankings Using the Bradley-Terry Model

Today, I take my first shot at ranking Major League Baseball (MLB) teams. I see my efforts at prediction and ranking as an ongoing process in which my models improve, the data I incorporate become more meaningful, and my predictions ultimately become more accurate. For the first attempt, let’s rank MLB teams using the Bradley-Terry (BT) model.

Before we discuss the rankings, we need some data. Let’s scrape ESPN’s MLB Standings Grid for the win-loss matchup records between any two MLB teams for the current season. Perhaps to simplify the tables and to reduce the sparsity resulting from interleague play, ESPN provides the matchup records only within a single league: American or National. Alongside the matchups, the data include each team’s overall record versus the other league, but we will ignore this for now. The implication is that we can rank teams only within the same league.

Scraping ESPN with a Python Script

In the following Python script, the BeautifulSoup library is used to scrape ESPN’s site for a given year. The script identifies each team in the American League table, its opponents, and its record against each opponent. The results are written to a CSV file to analyze in R. The code covers the American League only, but it is straightforward to modify it to gather the National League data. Below, I use only the data for 2013 and ignore the previous seasons; I will incorporate those data in a future post.

Here’s the Python code. Feel free to fork it.
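As a small illustration of the last step only, here is a sketch of how "W-L" cells from a standings grid might be flattened into CSV rows. It is not the script referred to above: it skips the HTTP request and the BeautifulSoup parsing, uses only the standard library, and the `grid` contents, team abbreviations, and column names are made-up assumptions.

```python
import csv
import io

# Hypothetical standings grid: grid[team][opponent] is a "W-L" string,
# as it might look once the HTML table has been parsed.
# The records below are invented for illustration.
grid = {
    "BAL": {"BOS": "5-4", "NYY": "6-3"},
    "BOS": {"BAL": "4-5", "NYY": "7-5"},
    "NYY": {"BAL": "3-6", "BOS": "5-7"},
}


def parse_record(cell):
    """Split a 'W-L' cell such as '5-4' into integer wins and losses."""
    wins, losses = cell.strip().split("-")
    return int(wins), int(losses)


def grid_to_rows(grid):
    """Flatten the grid into one (team, opponent, wins, losses) row per pairing."""
    rows = []
    for team in sorted(grid):
        for opponent in sorted(grid[team]):
            wins, losses = parse_record(grid[team][opponent])
            rows.append((team, opponent, wins, losses))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["team", "opponent", "wins", "losses"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = grid_to_rows(grid)
csv_text = rows_to_csv(rows)
```

A long format like this, with one row per ordered pair of teams, is convenient because each row already carries the wins and losses needed for a pairwise model.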

The BT model is a simple approach to modeling pairwise competitions that do not result in ties, such as sporting events. It is well suited to the ESPN data above, where we know only the win-loss records between any two teams. (If you are curious, ties can be handled with modifications.)

Suppose that teams $i$ and $j$ play each other, and we wish to know the probability $p_{ij}$ that team $i$ will beat team $j$. Then, with the BT model we define

$$\operatorname{logit}(p_{ij}) = \log\frac{p_{ij}}{1 - p_{ij}} = \lambda_i - \lambda_j,$$

where $\lambda_i$ and $\lambda_j$ denote the abilities of teams $i$ and $j$, respectively. Besides calculating the probability of one team beating another, the team abilities provide a natural mechanism for ranking teams. That is, if $\lambda_i > \lambda_j$, we say that team $i$ is ranked above team $j$, providing an ordering on the teams within a league.
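To make the role of the abilities concrete, here is a minimal Python sketch. It assumes the logit parameterization, in which $\operatorname{logit}(p_{ij}) = \lambda_i - \lambda_j$, so that $p_{ij}$ is the inverse logit of the ability difference. The team labels and ability values are hypothetical, not fitted estimates.

```python
import math


def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def win_probability(ability_i, ability_j):
    """Probability that team i beats team j under the BT model (logit scale)."""
    return inv_logit(ability_i - ability_j)


def rank_teams(abilities):
    """Order team names from highest to lowest ability."""
    return sorted(abilities, key=abilities.get, reverse=True)


# Hypothetical abilities; team "A" plays the role of the reference team.
abilities = {"A": 0.0, "B": 0.4, "C": -0.3}

ranking = rank_teams(abilities)
p_b_beats_c = win_probability(abilities["B"], abilities["C"])
```

Only differences in ability matter here, so adding the same constant to every ability leaves all of the probabilities and the ranking unchanged.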

Perhaps naively, we assume that all games are independent. This assumption makes it straightforward to write the likelihood, which is essentially the product of Bernoulli likelihoods representing each team matchup. To estimate the team abilities, we use the BradleyTerry2 R package. The package vignette provides an excellent overview of the Bradley-Terry model as well as various approaches to incorporating covariates (e.g., home-field advantage) and random effects, some of which I will consider in the future. One thing to note is that the ability of the first team appearing in the results data frame is used as a reference and is set to 0.
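For intuition about the likelihood, here is a dependency-free Python sketch under the independence assumption. The win counts are invented, and plain gradient ascent is swapped in for whatever fitting routine BradleyTerry2 uses internally. As in the package output, the first team is treated as the reference with ability 0.

```python
import math

# Hypothetical data: wins[(i, j)] is the number of times team i beat team j.
wins = {
    ("A", "B"): 6, ("B", "A"): 3,
    ("A", "C"): 7, ("C", "A"): 2,
    ("B", "C"): 5, ("C", "B"): 4,
}
teams = ["A", "B", "C"]  # the first team is the reference (ability fixed at 0)


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_likelihood(abilities, wins):
    """Sum of Bernoulli log-likelihoods over all games (constants dropped)."""
    total = 0.0
    for (i, j), w in wins.items():
        total += w * math.log(inv_logit(abilities[i] - abilities[j]))
    return total


def fit_abilities(wins, teams, step=0.05, iterations=2000):
    """Maximize the log-likelihood by gradient ascent, holding teams[0] at 0."""
    abilities = {t: 0.0 for t in teams}
    reference = teams[0]
    for _ in range(iterations):
        gradient = {t: 0.0 for t in teams}
        for (i, j), w in wins.items():
            p = inv_logit(abilities[i] - abilities[j])
            # The derivative of w * log(p_ij) is w * (1 - p_ij) with respect
            # to ability i and the negative of that with respect to ability j.
            gradient[i] += w * (1.0 - p)
            gradient[j] -= w * (1.0 - p)
        for t in teams:
            if t != reference:
                abilities[t] += step * gradient[t]
    return abilities


fitted = fit_abilities(wins, teams)
```

Fixing one ability at 0 is what makes the estimates identifiable, since the likelihood depends only on ability differences.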

I have placed all of the R code used for the analysis below within bradley-terry.r in this GitHub repository. Note that I use the ProjectTemplate package to organize the analysis and to minimize boilerplate code.

After scraping the matchup records from ESPN, the following R code prettifies the data and then fits the BT model to both data sets.

Next, we create a heatmap of the winning probabilities for each matchup by first creating a grid of the probabilities. Given that the inverse logit of 0 is 0.5, the probability that a team beats itself is estimated as 0.5. To avoid this confusing situation, we set these probabilities to 0. The point is that these events can never happen unless you play for Houston or have A-Rod on your team.
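The grid construction can be sketched in a few lines of Python. This is an illustration of the idea rather than the R code used for the analysis, and the abilities below are hypothetical.

```python
import math

# Hypothetical abilities on the logit scale.
abilities = {"A": 0.0, "B": 0.4, "C": -0.3}


def probability_grid(abilities):
    """Return grid[i][j], the probability that team i beats team j.

    The diagonal is set to 0 because a team never plays itself.
    """
    teams = sorted(abilities)
    grid = {}
    for i in teams:
        grid[i] = {}
        for j in teams:
            if i == j:
                grid[i][j] = 0.0
            else:
                difference = abilities[i] - abilities[j]
                grid[i][j] = 1.0 / (1.0 + math.exp(-difference))
    return grid


grid = probability_grid(abilities)
```

Off the diagonal, `grid[i][j]` and `grid[j][i]` sum to 1, so each half of the heatmap mirrors the other.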

Now that the rankings and matchup probabilities have been computed, let’s take a look at the results for each league.

American League Results

The BT model provides a natural way of ranking teams based on the team-ability estimates. Let’s first look at the estimates.

(Please excuse the crude tabular output. I’m not a fan of how Octopress renders tables. Suggestions?)

The plot and the table give two representations of the same information. In both cases we can see that the team abilities are standardized so that Baltimore has an ability of 0. We also see that Tampa Bay is considered the top AL team with Boston being a close second. Notice though that the standard errors here are large enough that we might question the rankings by team ability. For now, we will ignore the standard errors, but this uncertainty should be taken into account for predicting future games.

The Astros stand out as the worst team in the AL. Although the graph seems to indicate that Houston is far worse than any other AL team, the ability is not straightforward to interpret. Rather, using the inverse logit function, we can compare any two teams more directly by calculating the probability that one team will beat another.

A quick way to compare any two teams is with a heatmap. Notice how Houston’s probability of beating any other AL team is less than 50%. The best team, Tampa Bay, has more than a 50% chance of beating any other AL team.

While the heatmap is useful for comparing any two teams at a glance, bar graphs provide a more precise representation of who will win. Here are the probabilities that the best and worst teams in the AL will beat any other AL team. A horizontal red threshold is drawn at 50%.

An important thing to notice here is that Tampa Bay is not unbeatable and that, according to the BT model, the Astros have a shot at winning against any other AL team.

I have also found that a useful gauge is to look at the probability that an average team will beat any other team. For instance, Cleveland is ranked in the middle according to the BT model. Notice that half of the teams have a greater than 50% chance of beating the Indians, while the Indians have a greater than 50% chance of beating the remaining teams. The Indians have a very good chance of beating the Astros.

National League Results

Here, we repeat the same analysis for the National League.

For the National League, Arizona is the reference team, with an ability of 0. The Braves are ranked as the top team, and the Marlins are the worst team. At first glance, the differences in ability between consecutively ranked National League teams are less extreme than those in the American League. However, it is unwise to interpret the abilities in this way. As with the American League, we largely ignore the standard errors, although it is interesting to note that the top and bottom NL team abilities have more separation between them when the standard error is taken into account.

As before, let’s look at the matchup probabilities.

From the heatmap we can see that the Braves have at least a 72% chance of beating the Marlins, according to the BT model. All other winning probabilities are less than 72%, giving teams like the Marlins, Cubs, and Mets a shot at winning.

Again, we plot the probabilities for the best and the worst teams along with an average team.

I find it very interesting that the probability Atlanta beats any other NL team is usually around 2/3. This makes sense in a lot of ways. For instance, if Atlanta has a three-game series with the Giants, odds are good that Atlanta will win 2 of the 3 games. Moreover, as we can see in the table above, there is less than a 5% chance that the Giants will sweep Atlanta.
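The series arithmetic can be checked with a short Python sketch. It assumes the three games are independent and that Atlanta wins each with probability 2/3, a rounded stand-in rather than the fitted estimate.

```python
from math import comb


def prob_wins_at_least(k, n, p):
    """Probability of winning at least k of n independent games,
    each won with probability p (binomial model)."""
    return sum(comb(n, w) * p**w * (1 - p) ** (n - w) for w in range(k, n + 1))


p_atlanta = 2 / 3  # assumed per-game probability that Atlanta beats the Giants

# Atlanta wins at least 2 of the 3 games.
atlanta_takes_series = prob_wins_at_least(2, 3, p_atlanta)

# The Giants sweep, meaning Atlanta loses all 3 games.
giants_sweep = (1 - p_atlanta) ** 3
```

Under these assumptions, Atlanta takes at least two of three games with probability 20/27, or about 74%, and the Giants sweep with probability 1/27, or about 3.7%.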

The BT model indicates that the Miami Marlins are the worst team in the National League. Despite their poor performance this season, except for the Braves and the Cardinals, the Marlins have a legitimate chance to beat other NL teams. This is especially the case against the other bottom NL teams, such as the Cubs and the Mets.

What’s Next?

This post ranked the teams within the American and National Leagues separately for the current season, but similar data are also available on ESPN going back to 2002. With this in mind, obvious extensions are:

• Rank the leagues together after scraping the interleague play matchups.

• Examine how ranks change over time.

• Include previous matchup records as prior information for later seasons.

• Predict future games. Standard errors should not be ignored here.