Introduction to Statistical Football Prediction

Predicting football matches statistically means estimating the probability of each possible outcome — not picking a winner. The foundation of most goal-based models is a simple observation: the number of goals a team scores in a match follows a Poisson distribution reasonably well.

This insight, first documented by Moroney (1956) and later formalized by Maher (1982), allows us to build a full probability matrix for any match if we can estimate each team's expected goals (λ). From that matrix, every market — 1X2, BTTS, Over/Under, correct score, Asian handicap — can be derived mathematically.

The Poisson Distribution and Football

The Poisson distribution models the probability of a given number of events occurring in a fixed interval, when events happen independently at a constant average rate. For football, the "event" is a goal and the "interval" is one match.

The probability of exactly k goals given an expected rate λ is:

Poisson Formula

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

Where λ is the expected number of goals, e ≈ 2.71828, and k! is the factorial of k.

Why does this work for football? Goals are relatively rare events (typically 1–3 per team per match), and they occur somewhat independently of each other within a match. The average rate varies by team strength and context from one match to the next, but within a single match it can be treated as roughly constant. These properties align well with Poisson assumptions.

Example: λ = 1.5 goals

P(0 goals) = 22.3%

P(1 goal) = 33.5%

P(2 goals) = 25.1%

P(3 goals) = 12.6%

P(4+ goals) = 6.6%
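The figures above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name poisson_pmf is our own choice for illustration, not part of any particular package.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 1.5
probs = [poisson_pmf(k, lam) for k in range(4)]  # 0, 1, 2 and 3 goals
p_four_plus = 1.0 - sum(probs)                   # everything not covered above

for k, p in enumerate(probs):
    print(f"P({k} goals) = {p:.1%}")
print(f"P(4+ goals) = {p_four_plus:.1%}")
```

Computing the 4+ bucket as the complement of the first four values avoids summing an infinite tail.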

Independent Poisson Model

The simplest approach assumes home and away goals are independent. If we estimate λ_home and λ_away separately, the probability of any specific scoreline (i, j) is simply:

Joint Probability

$$P(\text{Home} = i, \text{Away} = j) = P_{\text{home}}(i) \times P_{\text{away}}(j)$$

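This generates a full scoreline probability matrix. For example, with λ_home = 1.6 and λ_away = 1.1 (showing only 0 to 3 goals per side):

Home goals | Away 0 | Away 1 | Away 2 | Away 3
Home 0 | 6.7% | 7.4% | 4.1% | 1.5%
Home 1 | 10.8% | 11.8% | 6.5% | 2.4%
Home 2 | 8.6% | 9.5% | 5.2% | 1.9%
Home 3 | 4.6% | 5.0% | 2.8% | 1.0%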

From this matrix, you can sum cells to get any market probability. Home win = sum of all cells where i > j. Draw = sum of diagonal. Away win = sum where j > i.
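As a rough illustration of the summing step, the sketch below builds the independent scoreline matrix and adds up the three 1X2 regions. The function names and the cut-off of 10 goals per side are assumptions made for this example; any cut-off high enough that the ignored tail is negligible would do.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given expected goals lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def scoreline_matrix(lam_home, lam_away, max_goals=10):
    """matrix[i][j] = P(home scores i, away scores j), assuming independence."""
    home = [poisson_pmf(i, lam_home) for i in range(max_goals + 1)]
    away = [poisson_pmf(j, lam_away) for j in range(max_goals + 1)]
    return [[ph * pa for pa in away] for ph in home]


def match_result_probs(matrix):
    """Return (home win, draw, away win) by summing the matching cells."""
    home_win = draw = away_win = 0.0
    for i, row in enumerate(matrix):
        for j, p in enumerate(row):
            if i > j:
                home_win += p
            elif i == j:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


matrix = scoreline_matrix(1.6, 1.1)
home_win, draw, away_win = match_result_probs(matrix)
print(f"Home {home_win:.1%}  Draw {draw:.1%}  Away {away_win:.1%}")
```

Because the matrix is truncated, the three probabilities sum to very slightly less than 1. Dividing each by their total is a common way to tidy this up.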

Limitations of Independence

The independent model has a known flaw: it underestimates the probability of low-scoring draws (especially 0-0 and 1-1). In real football, these scorelines occur more frequently than the independent model predicts. This is where Dixon and Coles stepped in.

The Dixon-Coles Correction

In their landmark 1997 paper, Mark Dixon and Stuart Coles introduced a correction factor ρ (rho) that adjusts the joint probability for low-scoring outcomes. The key insight: home and away goals are not fully independent — tactical and psychological factors create a correlation, particularly in tight, low-scoring matches.

The correction applies to four specific scorelines:

Scoreline | Correction Factor
0-0 | 1 − λ_h × λ_a × ρ
1-0 | 1 + λ_a × ρ
0-1 | 1 + λ_h × ρ
1-1 | 1 − ρ

When ρ is negative (which it typically is, around −0.03 to −0.10), the 0-0 and 1-1 probabilities increase while 1-0 and 0-1 decrease. This better matches observed frequencies in real match data.

The Dixon-Coles correction is small in magnitude but meaningful over thousands of predictions. It primarily affects correct score and Under 0.5/1.5 markets where low-scoring outcomes dominate.
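The sketch below applies the correction to an independent Poisson matrix. It follows the sign convention of the 1997 paper, under which a negative ρ raises 0-0 and 1-1 and lowers 1-0 and 0-1, as described above. The function names, the cut-off of 10 goals, and the value ρ = −0.05 are illustrative assumptions, not ExPrysm's settings.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given expected goals lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def tau(i, j, lam_home, lam_away, rho):
    """Dixon-Coles adjustment factor; equals 1 outside the four low-scoring cells."""
    if i == 0 and j == 0:
        return 1.0 - lam_home * lam_away * rho
    if i == 1 and j == 0:
        return 1.0 + lam_away * rho
    if i == 0 and j == 1:
        return 1.0 + lam_home * rho
    if i == 1 and j == 1:
        return 1.0 - rho
    return 1.0


def dixon_coles_matrix(lam_home, lam_away, rho, max_goals=10):
    """Independent Poisson scoreline matrix with the low-score correction applied."""
    return [
        [
            tau(i, j, lam_home, lam_away, rho)
            * poisson_pmf(i, lam_home)
            * poisson_pmf(j, lam_away)
            for j in range(max_goals + 1)
        ]
        for i in range(max_goals + 1)
    ]


independent = dixon_coles_matrix(1.6, 1.1, rho=0.0)   # rho = 0 means no correction
corrected = dixon_coles_matrix(1.6, 1.1, rho=-0.05)   # illustrative rho
print(f"0-0: {independent[0][0]:.2%} -> {corrected[0][0]:.2%}")
print(f"1-1: {independent[1][1]:.2%} -> {corrected[1][1]:.2%}")
```

The four adjustments cancel exactly, so the corrected matrix sums to the same total as the independent one and needs no renormalisation.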

How ExPrysm Uses Poisson

ExPrysm doesn't use the classical Poisson approach of estimating attack and defense parameters from historical averages. Instead, it predicts each team's expected goals for a specific match with gradient-boosted models, in three steps:

1. CatBoost Poisson Regression: Two separate CatBoost models (home_goals.cbm and away_goals.cbm) are trained with Poisson loss to predict λ_home and λ_away directly. Each model uses 53 features including Pi-ratings, form metrics, and head-to-head statistics.
2. Poisson Distribution Generation: The predicted λ values are fed into the Poisson probability mass function to generate a full scoreline probability matrix (typically 0–7 goals for each team).
3. Market Derivation: The scoreline matrix is aggregated to produce probabilities for every market: BTTS, Over/Under, correct score, Asian handicap lines, and more.

The advantage of this approach over classical Dixon-Coles is that CatBoost can capture non-linear relationships between features and expected goals. It doesn't assume a fixed attack/defense parameter per team — instead, it learns how 53 different contextual features interact to produce the expected goal rate for each specific match.
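To make the first step more concrete, here is a hedged sketch of how a Poisson-loss CatBoost regressor could be trained, loaded, and queried for λ. It is not ExPrysm's actual code: the function names, the hyperparameters, and the shape of the feature vector are placeholders. Only CatBoostRegressor, its loss_function="Poisson" option, fit, load_model, and predict are real CatBoost APIs.

```python
import math


def train_goal_model(X, y):
    """Fit a goals model with Poisson loss (hyperparameters are placeholders)."""
    # Imported lazily so the rest of this sketch runs without catboost installed.
    from catboost import CatBoostRegressor

    model = CatBoostRegressor(loss_function="Poisson", iterations=500, verbose=False)
    model.fit(X, y)  # X: one feature row per match, y: goals scored
    return model


def load_goal_model(path):
    """Load a saved model file such as home_goals.cbm."""
    from catboost import CatBoostRegressor

    model = CatBoostRegressor()
    model.load_model(path)
    return model


def predict_lambda(model, features):
    """Expected goals for a single match from a Poisson-loss model."""
    # With Poisson loss the raw model output is log(lambda), so exponentiate it.
    raw = model.predict([features], prediction_type="RawFormulaVal")
    return math.exp(float(raw[0]))
```

Asking for the raw value and exponentiating it explicitly keeps the result independent of whichever default prediction type the installed CatBoost version applies to Poisson models.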

Production Ensemble

For the final match result (1X2) prediction, ExPrysm uses a production ensemble that combines two approaches:

Ensemble Formula

$$P(\text{outcome}) = 0.70 \times P_{\text{CatBoost MS}} + 0.30 \times P_{\text{Poisson}}$$

The CatBoost match result classifier (69 features, class_weights=[1.0, 1.3, 1.0]) provides the primary signal, while the Poisson-derived probabilities add a complementary perspective from the goals model.
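The blend itself is a weighted average applied to each of the three outcomes. The sketch below shows the arithmetic. The input probabilities are made-up numbers for illustration, and the final renormalisation step is our own assumption rather than something the ensemble is documented to do.

```python
def blend_1x2(p_classifier, p_poisson, w_classifier=0.70):
    """Weighted average of two (home, draw, away) probability triples."""
    w_poisson = 1.0 - w_classifier
    blended = [
        w_classifier * a + w_poisson * b
        for a, b in zip(p_classifier, p_poisson)
    ]
    total = sum(blended)
    # Assumed safeguard: rescale in case either input does not sum exactly to 1.
    return [p / total for p in blended]


p_classifier = [0.50, 0.27, 0.23]  # hypothetical match result classifier output
p_poisson = [0.48, 0.25, 0.27]     # hypothetical Poisson-derived 1X2 probabilities

home, draw, away = blend_1x2(p_classifier, p_poisson)
print(f"Home {home:.3f}  Draw {draw:.3f}  Away {away:.3f}")
```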

From Poisson to Markets

Once you have the scoreline probability matrix, deriving market probabilities is straightforward arithmetic:

BTTS (Both Teams to Score)

Sum all cells where both home goals ≥ 1 and away goals ≥ 1. Equivalently: P(BTTS) = 1 − P(home=0) − P(away=0) + P(0-0).

Over/Under Goals

For Over 2.5: sum all cells where home + away ≥ 3. For Under 2.5: sum all cells where home + away ≤ 2. The same logic applies to any line (1.5, 3.5, etc.).

Correct Score

Each cell in the matrix directly gives the probability of that exact scoreline. The most probable scoreline is the cell with the highest value.

Asian Handicap

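Apply the handicap to the home score in each cell and classify the cell as a win, push, or loss for the bet. Summing the cells in each class gives the win, push, and loss probabilities. For example, Home −1.5 wins in every cell where (home − away) > 1.5, that is, when the home side wins by two or more goals. Half-goal lines such as this one can never push.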

This is why the Poisson goals model is so valuable — a single pair of λ values generates probabilities for every goals-related market simultaneously. Learn more about BTTS in our BTTS Explained guide.
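The market rules above translate almost directly into code. The following sketch derives BTTS, Over/Under, and a home Asian handicap from an independent Poisson matrix. The function names and the cut-off of 10 goals are assumptions for illustration, and the handicap helper covers only whole and half-goal lines, not quarter lines, which split the stake across two lines.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given expected goals lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def scoreline_matrix(lam_home, lam_away, max_goals=10):
    """matrix[i][j] = P(home scores i, away scores j), assuming independence."""
    return [
        [poisson_pmf(i, lam_home) * poisson_pmf(j, lam_away)
         for j in range(max_goals + 1)]
        for i in range(max_goals + 1)
    ]


def cells(matrix):
    """Yield (home goals, away goals, probability) for every cell."""
    for i, row in enumerate(matrix):
        for j, p in enumerate(row):
            yield i, j, p


def btts(matrix):
    """Both teams score at least once."""
    return sum(p for i, j, p in cells(matrix) if i >= 1 and j >= 1)


def over(matrix, line):
    """Total goals above the line, e.g. line=2.5."""
    return sum(p for i, j, p in cells(matrix) if i + j > line)


def asian_handicap_home(matrix, handicap):
    """Return (win, push, loss) for a home handicap such as -1.5 or -1.0."""
    win = push = loss = 0.0
    for i, j, p in cells(matrix):
        margin = i - j + handicap  # home margin after applying the handicap
        if margin > 0:
            win += p
        elif margin == 0:
            push += p
        else:
            loss += p
    return win, push, loss


matrix = scoreline_matrix(1.6, 1.1)
print(f"BTTS {btts(matrix):.1%}, Over 2.5 {over(matrix, 2.5):.1%}")
print("Home -1.5 (win, push, loss):", asian_handicap_home(matrix, -1.5))
```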

Limitations and Improvements

No model is perfect. The Poisson approach has known limitations that ExPrysm addresses through its feature engineering:

  • Time-varying attack/defense: Team strength changes throughout a season. ExPrysm handles this through Pi-ratings (updated daily) and rolling form features rather than static season averages.
  • Home advantage decay: Home advantage has been declining across European football since 2010, and dropped further during COVID-era empty stadiums. ExPrysm's models learn the current home advantage from recent data rather than assuming a fixed value.
  • Cup vs league dynamics: Cup matches have different tactical profiles (more cautious, more extra time scenarios). ExPrysm's features include competition type to capture these differences.
  • Independence assumption: While the Dixon-Coles ρ parameter helps, goals within a match are never truly independent. A team that goes 1-0 up may play more defensively. CatBoost's non-linear modeling partially captures these dynamics through contextual features.
  • Overdispersion: For some markets (cards, corners), the counts don't follow Poisson well because the variance exceeds the mean. ExPrysm uses Negative Binomial regression for these markets instead.
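To illustrate the overdispersion point, the sketch below compares a Poisson and a Negative Binomial distribution with the same mean. The mean of 10 and the dispersion value of 20 are arbitrary illustrative numbers, not fitted parameters, and the mean/dispersion parameterisation is one common choice among several.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability, computed in log space to stay stable for large k."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def neg_binomial_pmf(k, mean, dispersion):
    """Negative Binomial probability of k events.

    Parameterised so that variance = mean + mean**2 / dispersion;
    a smaller dispersion value means more overdispersion.
    """
    r = dispersion
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )
    return math.exp(log_p)


def mean_and_variance(pmf, max_k=150):
    """Numerically compute the mean and variance of a count distribution."""
    probs = [pmf(k) for k in range(max_k + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance


pois_mean, pois_var = mean_and_variance(lambda k: poisson_pmf(k, 10.0))
nb_mean, nb_var = mean_and_variance(lambda k: neg_binomial_pmf(k, 10.0, 20.0))
print(f"Poisson: mean {pois_mean:.2f}, variance {pois_var:.2f}")
print(f"NegBin:  mean {nb_mean:.2f}, variance {nb_var:.2f}")
```

Both distributions share the same mean, but the Negative Binomial spreads more probability into the tails, which is the behaviour the bullet above describes for cards and corners.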

Conclusion

The Poisson distribution remains the most elegant and practical foundation for football goal modeling. The Dixon-Coles correction refines it for low-scoring outcomes. ExPrysm builds on this foundation by replacing simple parameter estimation with CatBoost Poisson regression — using 53 features to predict expected goals with greater accuracy than classical methods.

The result is a system that generates calibrated probabilities across every goals-related market from a single pair of predicted λ values, combined with a direct match result classifier in a 70/30 ensemble for the final 1X2 prediction.

See how these models perform in practice on the Performance page, with results from 7,800+ matches across 100+ leagues.