Dr. Alexander Fisher
Duke University
Given two individuals \(i\) and \(j\) drawn from some population, the Bradley-Terry model gives the probability that the pairwise comparison \(i > j\) holds as
\[ Pr(i > j) = \frac{p_i}{p_i + p_j}, \]
where \(p_i\) is a positive real-valued score assigned to individual \(i\). The comparison \(i > j\) can be read as “\(i\) is preferred to \(j\)”, “\(i\) ranks higher than \(j\)”, or “\(i\) beats \(j\)”, depending on the application.
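To make this concrete, here is a minimal R sketch; the function name `bt_prob` and the example scores are illustrative assumptions, not part of the original notes.

```r
# A minimal sketch of the basic Bradley-Terry win probability.
# The scores passed in are hypothetical; any positive values work.
bt_prob <- function(p_i, p_j) {
  p_i / (p_i + p_j)
}

bt_prob(2, 1)  # i has twice j's score, so Pr(i > j) = 2/3
```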
One popular application of the Bradley-Terry model is ranking sports teams. We are interested in modeling the outcomes of previous matchups.
All 1,230 regular season NBA games (82 games per team, 30 teams: 82 * 30 / 2 = 1,230) from the 2015-2016 season.
Rows: 1,230
Columns: 3
$ Home <dbl> 1, 5, 10, 22, 2, 3, 9, 16, 28, 11, 15, 17, 21, 24, 25, 26, 14, 12…
$ Away <dbl> 9, 6, 19, 30, 23, 5, 29, 4, 12, 8, 6, 20, 27, 7, 19, 13, 18, 15, …
$ Y <dbl> 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1,…
Code book:
Home
: unique id for home team

Away
: unique id for away team

Y
: whether the home team won (1) or lost (0)

The most basic Bradley-Terry model does not account for ties, which is fine for our basketball example. However, it would be nice to model and assess the contribution of home-court advantage.
\[ \text{Pr(i beats j at home)} = \frac{\theta p_i}{\theta p_i + p_j} \]
\[ \text{Pr(i loses to j at home)} = \frac{p_j}{\theta p_i + p_j}. \]
Here, \(\theta \in \mathbb{R}^+\) corresponds to home-court advantage.
What do \(\theta = 0\) and \(\theta = 1\) correspond to?
What assumptions have we made about home-court advantage in this model?
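As a small illustration of how \(\theta\) enters the model, here is an R sketch; the function name and scores are hypothetical.

```r
# Sketch of the home-court model: theta scales only the home team's score,
# and the same theta applies to every matchup.
bt_home_prob <- function(theta, p_home, p_away) {
  theta * p_home / (theta * p_home + p_away)
}

bt_home_prob(theta = 1,   p_home = 2, p_away = 1)  # no home effect: 2/3
bt_home_prob(theta = 1.5, p_home = 2, p_away = 1)  # home boost:     3/4
```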
Let \(a_{ij}\) be the number of times team \(i\) beats team \(j\) at home and let \(b_{ij}\) be the number of times team \(i\) loses to team \(j\) at home. Assuming all the games are independent, we can write the log-likelihood of the Bradley-Terry model with home-court advantage,
\[ \log L(\mathbf{p}, \theta) = \sum_{i} \sum_{j} \left[ a_{ij} \log \left( \frac{\theta p_i}{\theta p_i + p_j} \right) + b_{ij} \log \left(\frac{p_j}{\theta p_i + p_j} \right) \right] \]
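To evaluate this likelihood we first need the counts \(a_{ij}\) and \(b_{ij}\). Below is one way they might be tabulated in R, assuming the games live in a data frame named `nba` with the Home, Away, and Y columns from the code book; the object name `nba` is an assumption.

```r
# Sketch: tabulate a_ij (home wins of i over j) and b_ij (home losses
# of i to j) from a data frame `nba` with columns Home, Away, Y.
n_teams <- max(c(nba$Home, nba$Away))
A <- matrix(0, n_teams, n_teams)  # A[i, j] = a_ij
B <- matrix(0, n_teams, n_teams)  # B[i, j] = b_ij

for (g in seq_len(nrow(nba))) {
  i <- nba$Home[g]
  j <- nba$Away[g]
  if (nba$Y[g] == 1) A[i, j] <- A[i, j] + 1 else B[i, j] <- B[i, j] + 1
}
```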
Let \(w_i\) be the total number of wins by team \(i\) and let \(h = \sum_{i} \sum_{j} a_{ij}\) be the total number of home-court wins across all teams.
We can reformulate the log-likelihood,
\[ \log L(\mathbf{p}, \theta) = h \log \theta + \sum_{i} w_i \log p_i - \sum_{i} \sum_{j} (a_{ij} + b_{ij}) \log (\theta p_i + p_j) \]
Hint for algebra: \(\sum_i w_i \log p_i = \sum_i \sum_j \left[a_{ij} \log p_i + b_{ij} \log p_j \right]\). To see this derived in extra detail, see the end of these notes.
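The reformulated log-likelihood translates directly into R. Here is a sketch, reusing the hypothetical `A` and `B` matrices from the earlier block:

```r
# Sketch: reformulated log-likelihood of the home-court Bradley-Terry model.
log_lik <- function(p, theta, A, B) {
  h <- sum(A)                        # total home-court wins, all teams
  w <- rowSums(A) + colSums(B)       # w_i: total wins by team i
  denom <- outer(theta * p, p, "+")  # (i, j) entry: theta * p_i + p_j
  h * log(theta) + sum(w * log(p)) - sum((A + B) * log(denom))
}
```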
The \(-\log\) terms still make direct maximization awkward, so we minorize them. Since \(-\log x\) is convex, it lies above its tangent line at any point: \(-\log x \geq -\log x_n - \frac{x - x_n}{x_n}\). Applying this at \(x = \theta p_i + p_j\) with anchor \(x_n = \theta_n p_{ni} + p_{nj}\) gives

\[ - \log (\theta p_i + p_j) \geq - \log (\theta_n p_{ni} + p_{nj}) - \frac{(\theta p_i + p_j) - (\theta_n p_{ni} + p_{nj})}{\theta_n p_{ni} + p_{nj}} \]
Replacing each \(-\log(\theta p_i + p_j)\) term in the log-likelihood with this lower bound can only make the expression smaller. In other words, the log-likelihood dominates \(g\), where
\[ g(\mathbf{p}, \theta | \mathbf{p}_n, \theta_n) = h \log \theta + \sum_i w_i \log p_i - \sum_i \sum_j \frac{(a_{ij} + b_{ij}) (\theta p_i + p_j)}{\theta_n p_{ni} + p_{nj}} \]
where we’ve dropped the irrelevant constant.
Now we can optimize the surrogate \(g\), instead of the objective function, by setting \(\nabla g = 0\). However, the mixed term \(\theta p_i\) couples \(\theta\) and \(\mathbf{p}\), so a closed-form joint solution is unavailable. Instead, we perform cyclic block ascent. That is, we update \(\theta\) holding \(\mathbf{p}\) fixed, and then we update each \(p_i\) holding all other parameters fixed.
To proceed, update
\[ \theta_{n + 1} = \frac{h}{\sum_i \sum_j \frac{p_{ni}(a_{ij} + b_{ij})}{\theta_n p_{ni} + p_{nj}}} \]
and then update each
\[ p_{n+1, i} = \frac{w_i}{ \sum_j \frac{\theta_{n+1}(a_{ij} + b_{ij})}{\theta_{n+1} p_{ni} + p_{nj}} + \sum_j \frac{(a_{ji} + b_{ji})}{\theta_{n+1} p_{nj} + p_{ni}} }. \]
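A vectorized R sketch of one full cycle of these updates, again in terms of the hypothetical `A` and `B` count matrices; this is one possible implementation, not the definitive one:

```r
# Sketch: one cycle of the MM block-ascent updates.
mm_update <- function(p, theta, A, B) {
  h <- sum(A)
  w <- rowSums(A) + colSums(B)
  N <- A + B                         # N_ij: games with i at home vs j
  denom <- outer(theta * p, p, "+")  # theta_n * p_ni + p_nj

  # update theta holding p fixed
  theta_new <- h / sum(p * rowSums(N / denom))

  # update each p_i holding theta (now theta_new) and the other p fixed
  denom_new <- outer(theta_new * p, p, "+")
  home_term <- theta_new * rowSums(N / denom_new)  # games i hosts
  away_term <- colSums(N / denom_new)              # games i visits
  list(p = w / (home_term + away_term), theta = theta_new)
}
```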
Crucially, cyclic block ascent preserves the ascent property: when \(g\) minorizes the objective function \(f\), the iterates satisfy \(f(x_{n+1}) \geq f(x_n)\).
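One can watch the ascent property hold numerically by iterating the sketches above and checking that the log-likelihood never decreases:

```r
# Sketch: iterate to convergence, verifying monotone ascent along the way.
p <- rep(1, n_teams)  # flat starting scores
theta <- 1
ll_old <- log_lik(p, theta, A, B)
for (iter in 1:500) {
  fit <- mm_update(p, theta, A, B)
  p <- fit$p
  theta <- fit$theta
  ll <- log_lik(p, theta, A, B)
  stopifnot(ll >= ll_old - 1e-10)  # ascent property, up to rounding
  if (ll - ll_old < 1e-8) break    # crude convergence check
  ll_old <- ll
}
```

Because the model's probabilities are unchanged when every \(p_i\) is rescaled by a common constant, implementations often normalize \(\mathbf{p}\) (e.g., to sum to the number of teams) after each cycle to pin down the scale.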
To be completed in the next lab… but we’ll get started in class:
Implement the MM algorithm as described on the previous slide for the NBA data. Your implementation should be adaptable to any other data set of identical construction (i.e., don't hard-code values).
What are the ten highest-ranked teams (in order) from the 2015-2016 season according to the Bradley-Terry model?
Is there a home-court advantage? What are the odds of winning at home vs. away?
The content of this lecture is based on chapter 1 of Dr. Ken Lange's MM Optimization Algorithms.
Lange, Kenneth. MM Optimization Algorithms. Society for Industrial and Applied Mathematics, 2016.
In extra detail, the hint follows by rearranging the double sum:

\[ \sum_i \sum_j \left[ a_{ij} \log p_i + b_{ij} \log p_j \right] = \sum_i \log p_i \sum_j a_{ij} + \sum_j \log p_j \sum_i b_{ij} \]
Notice \(\sum_j a_{ij}\) is the number of home wins of team \(i\); call this term \(\alpha_i\). Similarly, \(\sum_i b_{ij}\) is the number of away wins of team \(j\); call this term \(\beta_j\).
So we have,
\[ \sum_i \alpha_i \log p_i + \sum_j \beta_j \log p_j \]
Since \(i\) and \(j\) both run from \(1\) to \(n\), where \(n\) is the number of teams, we can write both sums over the same index:
\[ \sum_i (\alpha_i + \beta_i) \log p_i \]
where \(\alpha_i\) is the number of wins of team \(i\) at home and \(\beta_i\) is the number of wins of team \(i\) away. So \(\alpha_i + \beta_i = w_i\).
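As a quick numerical sanity check of this identity, using the hypothetical `nba`, `A`, and `B` objects from the earlier sketches:

```r
# Sketch: verify alpha_i + beta_i = w_i on the data.
alpha <- rowSums(A)  # alpha_i: home wins of team i
beta  <- colSums(B)  # beta_i: away wins of team i
w <- tabulate(nba$Home[nba$Y == 1], n_teams) +  # wins while hosting
     tabulate(nba$Away[nba$Y == 0], n_teams)    # wins while visiting
stopifnot(all(alpha + beta == w))
```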