FootballForecasts: France UEFA European Championship 2016

Algorithm overview

Team-level and proxied player-level data feed a consensus model. The 'Wisdom of the Crowd' concept is applied through an array of diverse machine learning predictors that learn patterns from a large dataset: over 5,000 international matches (~10,000 game samples) train the 18-input model. One sub-model is modelled on the way biological neurons process information in the human brain. The other models in the consensus are well established in the literature, such as Support Vector Regression, which has previously been used in stock market prediction, speech recognition, and even cancer tissue classification.
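As a minimal sketch of the consensus idea, the snippet below averages the outputs of several independent predictors. The averaging rule, the function name `consensus_predict`, the toy sub-models and the feature values are all assumptions made for illustration; the text does not say how FootballForecasts actually combines its sub-models.

```python
from statistics import mean
from typing import Callable, Sequence

# A predictor maps a feature vector to the expected goals for one team.
Predictor = Callable[[Sequence[float]], float]


def consensus_predict(predictors: Sequence[Predictor],
                      features: Sequence[float]) -> float:
    """Average the outputs of several predictors (assumed combination rule)."""
    if not predictors:
        raise ValueError("at least one predictor is required")
    return mean(p(features) for p in predictors)


# Toy stand-ins for trained sub-models (hypothetical, not the real ones).
def toy_linear(x: Sequence[float]) -> float:
    return 0.5 + 0.8 * x[0]


def toy_defensive(x: Sequence[float]) -> float:
    return 1.6 - 0.4 * x[1]


def toy_constant(x: Sequence[float]) -> float:
    return 1.3


# Hypothetical inputs: [relative strength, opponent defensive capability].
features = [1.0, 0.5]
expected_goals = consensus_predict(
    [toy_linear, toy_defensive, toy_constant], features
)
print(f"consensus expected goals: {expected_goals:.3f}")
```

Averaging is only one way to form a consensus; a weighted mean or a median would fit the same interface.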

The 18 inputs attempt to capture various relationships between the two teams, such as relative strength, attacking capability and defensive capability, along with variables that capture systematic variation in results under different conditions of play. Momentum-like inputs are also provided to help identify changes in a team's general performance (e.g. goals per match increasing after signing a new player).
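One way a momentum-like input could be built is to compare a team's recent goals per match with its longer-run average. The sketch below is an assumption for illustration only: the function name `momentum`, the three-match window and the sample data are hypothetical, and the actual 18 inputs are not specified here.

```python
from typing import Sequence


def momentum(goals: Sequence[int], recent: int = 3) -> float:
    """Recent goals-per-match average minus the average over all matches.

    A positive value suggests the team is scoring more than it usually does.
    """
    if not goals:
        return 0.0
    window = goals[-recent:]
    return sum(window) / len(window) - sum(goals) / len(goals)


# Hypothetical goals scored in a team's last seven matches, oldest first.
goals_per_match = [0, 1, 1, 0, 2, 3, 2]
print(f"momentum: {momentum(goals_per_match):+.3f}")
```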

The learning algorithms are 'online': as more games are played, the model continues to learn from incoming data. The static predictions are a snapshot of all group stage predictions made before the tournament started, whereas the dynamic predictions (and the knockout predictions) update as game days complete, and so make use of data from more recent game days. This proved useful in the case of Wales, who outperformed the model's initial expectations; the dynamic predictions later adjusted for this.
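To illustrate what 'online' means here, the sketch below updates a simple linear model one sample at a time with a stochastic gradient step, so each new result nudges the weights without retraining from scratch. The class `OnlineLinearRegressor`, the learning rate and the synthetic data are assumptions for illustration, not the models used by FootballForecasts.

```python
from typing import List, Sequence


class OnlineLinearRegressor:
    """Linear model trained one sample at a time by gradient descent."""

    def __init__(self, n_inputs: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> float:
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x: Sequence[float], goals: float) -> None:
        """Take one squared-error gradient step on a single new sample."""
        error = self.predict(x) - goals
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Synthetic stream: goals = 1 + 0.5 * strength (a made-up relationship).
model = OnlineLinearRegressor(n_inputs=1)
error_before = abs(model.predict([2.0]) - 2.0)
for _ in range(200):
    for strength in (0.0, 1.0, 2.0, 3.0):
        model.learn_one([strength], 1.0 + 0.5 * strength)
error_after = abs(model.predict([2.0]) - 2.0)
print(f"error before: {error_before:.3f}, after: {error_after:.6f}")
```

In a setting like the one described, `learn_one` would be called with each completed game before the next game day's predictions are produced.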

How did we do?

After the tournament ended, FootballForecasts was compared with two other cutting-edge predictors: one commercial (Betegy) and one non-commercial (Goldman Sachs).

In the tables below, we compare Goldman Sachs' predictions with our static predictions, as their group stage predictions were also generated once, at the beginning of the tournament. Only the group stage is considered for this comparison: Goldman Sachs' model wasn't updated after every round during the knockout stage, so it doesn't have predictions for all games that took place. The dynamic models have access to more data (they're updated after each game day), and so are expected to be more powerful (see results below). These are compared against Betegy's predictions, which were also generated dynamically with the latest available data.

Comparison metrics

  • RMSE (Root Mean Squared Error): A measure of the average error between the predicted and actual number of goals scored. Lower is better.

  • Draw or correct outcome (e.g. draw or home win): A prediction counts as successful if the game is drawn or the predicted outcome occurs. This is a commonly used hedging strategy. FootballForecasts' unrounded predictions perform better on this metric, because the side with the larger decimal value is always taken as the predicted winner, so a draw is never predicted. The metric is also reported for rounded predictions (where draws can be predicted), for completeness and for comparison with the other predictors, which also produce whole-number scores.

  • Percentage of scores correct: The percentage of games in which the exact score is predicted. Scores correct (one side) is the percentage of individual team scores predicted correctly, with each team in each game considered separately.
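The metrics above can be sketched in a few lines. This is an illustrative reading of the definitions, not the evaluation code actually used: the function names and the three sample games are hypothetical, and computing RMSE over individual team scores (two per game) is an assumption.

```python
from math import sqrt
from typing import List, Tuple

Score = Tuple[float, float]  # (home goals, away goals)


def outcome(score: Score) -> int:
    """Return 1 for a home win, -1 for an away win, 0 for a draw."""
    home, away = score
    return (home > away) - (home < away)


def rmse(predicted: List[Score], actual: List[Score]) -> float:
    """Root mean squared error over every team score (assumed definition)."""
    errors = [(p - a) ** 2
              for ps, as_ in zip(predicted, actual)
              for p, a in zip(ps, as_)]
    return sqrt(sum(errors) / len(errors))


def draw_or_correct(predicted: List[Score], actual: List[Score]) -> float:
    """Fraction of games that were drawn or ended in the predicted outcome."""
    hits = sum(1 for p, a in zip(predicted, actual)
               if outcome(a) == 0 or outcome(p) == outcome(a))
    return hits / len(actual)


def scores_correct(predicted: List[Score], actual: List[Score]) -> float:
    """Fraction of games whose exact score was predicted."""
    hits = sum(1 for p, a in zip(predicted, actual) if tuple(p) == tuple(a))
    return hits / len(actual)


def scores_correct_one_side(predicted: List[Score],
                            actual: List[Score]) -> float:
    """Fraction of team scores predicted exactly, each side counted alone."""
    pairs = [(p, a)
             for ps, as_ in zip(predicted, actual)
             for p, a in zip(ps, as_)]
    return sum(1 for p, a in pairs if p == a) / len(pairs)


# Hypothetical data: three games, with unrounded and rounded predictions.
actual = [(2, 1), (1, 1), (0, 1)]
unrounded = [(1.6, 0.9), (1.4, 1.2), (1.3, 0.8)]
rounded = [(round(h), round(a)) for h, a in unrounded]

print(f"RMSE (unrounded): {rmse(unrounded, actual):.3f}")
print(f"Draw or correct (unrounded): {draw_or_correct(unrounded, actual):.1%}")
print(f"Scores correct (rounded): {scores_correct(rounded, actual):.1%}")
print(f"One side (rounded): {scores_correct_one_side(rounded, actual):.1%}")
```

The exact-score metrics are only meaningful for rounded predictions, which matches the N/A entries for unrounded predictions in the results tables.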

Group stage results

Only games in the group stage (36 games).

| Predictor | RMSE | Draw OR correct outcome | % scores correct | % scores correct (one side) |
|---|---|---|---|---|
| FF (Static unrounded) | 0.973 | 72.2% | N/A | N/A |
| FF (Static rounded) | 0.979 | 52.8% | 11.1% | 33.3% |
| Goldman Sachs | 0.993 | 47.2% | 13.9% | 34.7% |

| Predictor | RMSE | Draw OR correct outcome | % scores correct | % scores correct (one side) |
|---|---|---|---|---|
| FF (Dynamic unrounded) | 0.987 | 75.0% | N/A | N/A |
| FF (Dynamic rounded) | 0.979 | 55.6% | 13.9% | 37.5% |
| Betegy | 1.219 | 66.7% | 11.1% | 31.9% |

Group + Knockout stage results

All games in the tournament (51 games).

The percentage of bet recommendations correct considers only games in which Betegy recommends 'draw or home win', 'draw or away win' or 'draw', and discards all 'no bet', 'under_N' and 'over_N' goals recommendations. FootballForecasts doesn't supply 'no bet' or over/under goals recommendations, so only the games in which Betegy used the same recommendation scheme as us can be compared directly. This narrows the comparison from 51 games to 44.
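The filtering step described above can be sketched as follows. The data layout, the function names and the sample games (including the label 'over_2.5') are hypothetical; only the three comparable recommendation types come from the text.

```python
from typing import List, Tuple

# (recommendation, home goals, away goals) for one game.
Game = Tuple[str, int, int]

# Recommendation types that both predictors share, as listed in the text.
COMPARABLE = {"draw or home win", "draw or away win", "draw"}


def recommendation_correct(recommendation: str,
                           home_goals: int, away_goals: int) -> bool:
    """Check a comparable recommendation against the final score."""
    if home_goals == away_goals:
        return True  # every comparable recommendation covers a draw
    if recommendation == "draw or home win":
        return home_goals > away_goals
    if recommendation == "draw or away win":
        return away_goals > home_goals
    return False  # plain 'draw' recommended, but the game was not drawn


def percent_correct(games: List[Game]) -> Tuple[float, int]:
    """Return (% correct, games kept) over comparable recommendations only."""
    kept = [g for g in games if g[0] in COMPARABLE]
    if not kept:
        raise ValueError("no comparable recommendations")
    hits = sum(recommendation_correct(*g) for g in kept)
    return 100.0 * hits / len(kept), len(kept)


# Hypothetical sample: two of the five games are discarded.
sample_games = [
    ("draw or home win", 2, 0),
    ("no bet", 1, 1),
    ("over_2.5", 3, 1),
    ("draw", 0, 1),
    ("draw or away win", 1, 1),
]
pct, n_kept = percent_correct(sample_games)
print(f"{pct:.1f}% correct over {n_kept} comparable games")
```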

| Predictor | RMSE | Draw OR correct outcome | % scores correct | % scores correct (one side) | Bet recommendations % correct (subset of 44 games) |
|---|---|---|---|---|---|
| FF (Dynamic unrounded) | 1.048 | 72.5% | N/A | N/A | N/A |
| FF (Dynamic rounded) | 1.043 | 51.0% | 15.7% | 38.2% | 65.9% |
| Betegy | 1.306 | 64.7% | 7.8% | 31.4% | 63.6% |

Conclusions

It is encouraging that FootballForecasts' predictor has the lowest RMSE on both the group stage subset and the group + knockout set, as RMSE captures overall regression performance and is a widely used and trusted metric. Even so, football is a relatively stochastic game, so patterns in the data cannot be learned without a substantial error term, which is visible in RMSE values of roughly one goal.

A success of the predictor is its accuracy on 'draw or correct outcome' results. The decimal output of the regression algorithms carries more information about the expected goals for each side, so no draws are predicted. Rounding reduced this accuracy, leaving the rounded predictions less competitive with Betegy, though still ahead of the Goldman Sachs predictor. It would be interesting to explore on a larger dataset why the RMSE values for FootballForecasts and Goldman Sachs were relatively close while the 'draw or correct outcome' results were quite different (for the static predictor). A significant result is the extra information gained by producing a decimal score rather than a whole number.

The dynamic predictions were more accurate than the static predictions in terms of the percentage of scores correct (13.9% against 11.1% of group stage scores for the rounded predictions), which is consistent with the intuition that recent data is more valuable than older data when predicting football scores. A substantial success of the predictor is the margin by which it improves on Betegy's 'percentage of scores correct' over the group + knockout set (15.7% against 7.8%).