ELO Ratings in Football — Measuring Team Strength for Predictions
Every prediction model needs a way to measure how strong a team is right now — not last season, not historically, but today. ELO ratings provide a simple, elegant solution. Pi-ratings take it further by separating attack from defense and home from away. Here's how both work and why ExPrysm uses them together.
What Are ELO Ratings?
The ELO rating system was invented by Arpad Elo in the 1960s to rank chess players. The core idea is beautifully simple: every team starts with a base rating (typically 1500), and after each match, the winner gains points while the loser loses points. The amount transferred depends on how surprising the result was.
If a strong team beats a weak team, few points change hands — the result was expected. If the weak team wins, many points transfer — the upset carries more information. Over time, ratings converge to reflect true team strength.
ELO was adapted for football by several researchers and organizations, including FIFA (whose men's world ranking has used an Elo-based formula since 2018) and FiveThirtyEight. It works well because football has clear win/draw/loss outcomes and teams play frequently enough for ratings to stay current.
How ELO Works
The ELO update rule has three components:
Expected Score
Before a match, the expected score for the home team is calculated from the rating difference:
$$E_{\text{home}} = \frac{1}{1 + 10^{(R_{\text{away}} - R_{\text{home}} - \text{HFA})/400}}$$
Where R is the current rating and HFA is the home field advantage adjustment (typically 50–100 points).
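The expected-score formula translates almost directly into code. The following is a minimal sketch; the function name and the default HFA of 65 (borrowed from the worked example later in this article) are illustrative choices, not part of any particular library.

```python
def expected_home_score(r_home: float, r_away: float, hfa: float = 65.0) -> float:
    """Expected score for the home side, a value between 0 and 1."""
    # The home team is treated as if it were `hfa` points stronger than its rating.
    return 1.0 / (1.0 + 10.0 ** ((r_away - r_home - hfa) / 400.0))


# Equal ratings on neutral ground (hfa=0) give an even match.
print(expected_home_score(1500, 1500, hfa=0.0))
# A 150-point favourite at home.
print(round(expected_home_score(1650, 1500), 3))
```

Note that the expected score blends win and draw chances (a draw counts as half a win), so it should not be read as a pure win probability.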
K-Factor
The K-factor controls how much ratings change after each match. A higher K means ratings react faster to recent results (more volatile), while a lower K means ratings are more stable but slower to adapt. Typical values range from 20 to 40 for football.
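To see the effect of K, the sketch below replays the same five straight wins with K = 20 and K = 40, using the update rule described in the next subsection. It is a simplification under stated assumptions: the opponent is held at a fixed rating of 1500, there is no home advantage, and the helper name `elo_path` is made up for this illustration.

```python
def elo_path(results, k, start=1500.0, opponent=1500.0):
    """Rating after each result (1 win, 0.5 draw, 0 loss) against a fixed-strength opponent."""
    rating, path = start, []
    for score in results:
        expected = 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))
        rating += k * (score - expected)
        path.append(round(rating, 1))
    return path


five_wins = [1, 1, 1, 1, 1]
low_k = elo_path(five_wins, k=20)
high_k = elo_path(five_wins, k=40)
print("K=20:", low_k)
print("K=40:", high_k)
```

The higher K climbs faster after the winning run, and it would fall just as quickly after a bad spell. The gain per win also shrinks as the rating rises, because each further win is less surprising.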
Update Rule
After the match, ratings are updated based on the difference between the actual result and the expected score:
$$R_{\text{new}} = R_{\text{old}} + K \times (S_{\text{actual}} - E_{\text{expected}})$$
Where $S_{\text{actual}}$ = 1 for a win, 0.5 for a draw, 0 for a loss.
Worked example: Team A (rating 1650) plays at home against Team B (rating 1500). With K = 30 and HFA = 65:
$$E_A = \frac{1}{1 + 10^{(1500 - 1650 - 65)/400}} \approx 0.78$$
If Team A wins: $R_A = 1650 + 30 \times (1 - 0.78) = 1656.6$ (+6.6)
If Team B wins: $R_A = 1650 + 30 \times (0 - 0.78) = 1626.6$ (−23.4)
The upset transfers far more rating points than the expected result.
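The worked example can be reproduced with a few lines of Python. This is a sketch, not a reference implementation: the function name is illustrative, and it assumes the update is zero-sum, so whatever the home team gains the away team loses.

```python
def elo_update(r_home, r_away, home_result, k=30.0, hfa=65.0):
    """Return (new_home, new_away); home_result is 1, 0.5 or 0 from the home side's view."""
    e_home = 1.0 / (1.0 + 10.0 ** ((r_away - r_home - hfa) / 400.0))
    delta = k * (home_result - e_home)
    # Assumed zero-sum transfer: the away side moves by the opposite amount.
    return r_home + delta, r_away - delta


home_win = elo_update(1650, 1500, 1)
away_win = elo_update(1650, 1500, 0)
print([round(r, 1) for r in home_win])
print([round(r, 1) for r in away_win])
```

Because the code keeps the unrounded expected score (about 0.775) instead of rounding it to 0.78 first, Team A ends on 1656.7 after a win and 1626.7 after a loss, a tenth of a point away from the hand calculation.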
ELO in Football Context
Why It Works
ELO captures two things simultaneously: underlying team quality and recent form. A team on a winning streak will see its rating climb, reflecting both genuine improvement and momentum. This makes ELO a compact, information-rich feature for prediction models.
Limitations
Standard ELO has a fundamental limitation for football: it produces a single number per team. It therefore cannot distinguish a team that is strong in attack but weak in defense from one with the opposite profile, and it cannot show that a team performs differently at home than away. A team rated 1600 could be a 3-2 team or a 1-0 team, and ELO treats them identically.
Pi-Ratings: The Next Evolution
In 2013, Anthony Constantinou and Norman Fenton published a paper introducing Pi-ratings, a rating system designed specifically for football that addresses ELO's key limitations. Instead of one number per team, Pi-ratings maintain four:
- Home attack
- Home defense
- Away attack
- Away defense
The Pi-rating system uses three key parameters from the original paper:
| Parameter | Value | Purpose |
|---|---|---|
| b | 10 | Base multiplier for rating updates |
| c | 3 | Controls sensitivity to goal difference |
| lr | 0.1 | Learning rate — how fast ratings adapt |
After each match, all four ratings for both teams are updated based on the goals scored and conceded. The home attack rating increases when the team scores at home; the away defense rating of the opponent decreases. This creates a rich, multi-dimensional picture of team strength.
Pi-ratings are updated incrementally after every match day. ExPrysm runs daily updates to ensure ratings reflect the most recent results before generating predictions.
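This article does not give the update formulas, so the following is only a hypothetical sketch of what a four-rating update could look like, not ExPrysm's implementation and not the equations from the paper. It uses c and lr from the table; b is left out because the article does not say how it enters. The baseline `AVG_GOALS`, the `CROSS_VENUE` share, the logarithmic damping, and all names are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

C = 3.0            # goal-difference sensitivity (table above)
LR = 0.1           # learning rate (table above)
AVG_GOALS = 1.35   # hypothetical league-average goals per side
CROSS_VENUE = 0.5  # hypothetical share of each step applied to the other venue


@dataclass
class PiRating:
    home_attack: float = 0.0
    home_defense: float = 0.0
    away_attack: float = 0.0
    away_defense: float = 0.0


def damped_step(surprise: float) -> float:
    """Signed step that grows sub-linearly with the size of the surprise."""
    return LR * math.copysign(C * math.log10(1.0 + abs(surprise)), surprise)


def update(home: PiRating, away: PiRating, home_goals: int, away_goals: int) -> None:
    """Update all four ratings of both teams in place after one match."""
    # Surprise = goals scored minus goals expected from attack vs. defense.
    step_h = damped_step(home_goals - (AVG_GOALS + home.home_attack - away.away_defense))
    step_a = damped_step(away_goals - (AVG_GOALS + away.away_attack - home.home_defense))

    # Home side scoring above expectation: its attack rises, the visitors' defense falls.
    home.home_attack += step_h
    away.away_defense -= step_h
    # Away side scoring above expectation: its attack rises, the hosts' defense falls.
    away.away_attack += step_a
    home.home_defense -= step_a

    # Carry a fraction of each change over to the other venue's rating.
    home.away_attack += CROSS_VENUE * step_h
    away.home_defense -= CROSS_VENUE * step_h
    away.home_attack += CROSS_VENUE * step_a
    home.away_defense -= CROSS_VENUE * step_a


hosts, visitors = PiRating(), PiRating()
update(hosts, visitors, home_goals=3, away_goals=0)
print(hosts)
print(visitors)
```

After a 3-0 home win between two unrated teams, the hosts' attack and defense ratings both rise while the visitors' fall, with the venue that was actually played moving the most.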
How ExPrysm Uses Team Ratings
ExPrysm doesn't use ELO or Pi-ratings as standalone predictors. Instead, they serve as features within the CatBoost gradient boosting models:
- Match Result Model: Uses both ELO ratings and Pi-ratings among its 69 features. The CatBoost classifier learns how rating differences interact with other features (form, head-to-head, league position) to predict match outcomes.
- Goals Model: The Poisson regression models (53 features) use Pi-ratings to help predict expected goals. The attack/defense separation is particularly valuable here — a team's home attack rating directly informs how many goals they're likely to score.
- Feature Importance: Pi-ratings account for approximately 24.5% of total feature importance in the match result model, making them the single most influential feature group. This suggests that measuring team strength well is central to the model's accuracy.
The key design decision in ExPrysm is that the models use no odds-based features. Team ratings provide the "market-independent" strength signal that allows the model to generate its own probability estimates without being anchored to bookmaker odds.
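As a rough illustration of ratings acting as model inputs, the sketch below turns two teams' ratings into a feature row for one fixture. ExPrysm's actual feature names and definitions are not given in this article, so every key and the function name here are invented for the example; a real pipeline would add the remaining non-rating features before passing the row to a model.

```python
def rating_features(home_elo, away_elo, home_pi, away_pi):
    """Build rating-based features for one fixture (hypothetical feature names)."""
    return {
        "elo_home": home_elo,
        "elo_away": away_elo,
        "elo_diff": home_elo - away_elo,
        # Match each side's venue-specific attack against the opposing defense.
        "pi_home_attack_edge": home_pi["home_attack"] - away_pi["away_defense"],
        "pi_away_attack_edge": away_pi["away_attack"] - home_pi["home_defense"],
    }


# Made-up ratings for two teams.
home_pi = {"home_attack": 0.8, "home_defense": 0.3, "away_attack": 0.5, "away_defense": 0.1}
away_pi = {"home_attack": 0.2, "home_defense": 0.0, "away_attack": -0.1, "away_defense": -0.4}

row = rating_features(1650, 1500, home_pi, away_pi)
print(row)
```

Nothing in the row is derived from bookmaker odds, which mirrors the market-independent design described above.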
ELO vs Pi-Ratings Comparison
| Aspect | ELO | Pi-Ratings |
|---|---|---|
| Values per team | 1 | 4 |
| Attack/Defense split | No | Yes |
| Home/Away split | No (fixed HFA) | Yes (separate ratings) |
| Goal difference used | Optional | Built-in |
| Complexity | Simple | Moderate |
| Interpretability | Very high | High |
| Information density | Low | High |
| Academic basis | Elo (1960s) | Constantinou & Fenton (2013) |
Both systems have value. ELO provides a simple, interpretable baseline — you can immediately understand that a team rated 1700 is stronger than one rated 1500. Pi-ratings provide richer information that machine learning models can exploit, particularly the attack/defense and home/away separations.
Practical Impact on Predictions
How do rating differences translate to win probabilities? Here's an approximate mapping from ELO differences:
| ELO Difference | Stronger Team Win % | Draw % | Weaker Team Win % |
|---|---|---|---|
| 0 (equal) | ~36% | ~28% | ~36% |
| +100 | ~45% | ~27% | ~28% |
| +200 | ~55% | ~24% | ~21% |
| +300 | ~64% | ~21% | ~15% |
| +400 | ~72% | ~17% | ~11% |
These are rough estimates — ExPrysm's CatBoost model produces more nuanced probabilities by considering all 69 features together, not just the rating difference. But this table illustrates why ratings are so valuable: they compress a team's entire match history into a single, predictive signal.
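For rating gaps that fall between the rows of the table, a simple lookup with linear interpolation gives a quick estimate. This sketch only interpolates the approximate figures above; it is not a probability model, and clamping gaps beyond 400 points to the last row is an assumption made here for convenience.

```python
from bisect import bisect_right

# (ELO difference, stronger team win %, draw %, weaker team win %) from the table above.
TABLE = [
    (0, 36, 28, 36),
    (100, 45, 27, 28),
    (200, 55, 24, 21),
    (300, 64, 21, 15),
    (400, 72, 17, 11),
]


def approx_outcome_pct(elo_diff: float):
    """(stronger win %, draw %, weaker win %) by linear interpolation between rows."""
    d = min(abs(elo_diff), TABLE[-1][0])  # assumption: clamp beyond the last row
    diffs = [row[0] for row in TABLE]
    i = min(bisect_right(diffs, d), len(TABLE) - 1)
    lo, hi = TABLE[i - 1], TABLE[i]
    t = (d - lo[0]) / (hi[0] - lo[0])
    return tuple(round(a + t * (b - a), 1) for a, b in zip(lo[1:], hi[1:]))


print(approx_outcome_pct(150))  # halfway between the +100 and +200 rows
```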
With Pi-ratings, the model gets even more granular. A team with a high home attack rating but low away defense rating will produce very different predictions depending on whether they're playing at home or away — something a single ELO number cannot capture.
Conclusion
ELO ratings provide a proven, interpretable measure of team strength that has worked across sports for decades. Pi-ratings extend this concept with the attack/defense and home/away dimensions that football demands. ExPrysm uses both as features in its CatBoost models, where Pi-ratings alone account for ~24.5% of feature importance in the match result model, which points to team strength measurement as the most influential ingredient in its predictions.
Want to understand the full platform? Read "What Is ExPrysm?" for a complete overview of how all the models work together.