Predictions Vol.1 (“Prediction of the FIFA World Cup 2018”)

The following rough excerpt of “Prediction of the FIFA World Cup 2018 – A random forest approach with an emphasis on estimated team ability parameters” (PDF; 06/08/2018) by Andreas Groll, Christophe Ley, Gunther Schauberger and Hans Van Eetvelde is our own selection.

p.1: …four previous FIFA World Cups 2002 – 2014: Poisson regression models, random forests and ranking methods. …
p.2: … By aggregating the winning odds from several online bookmakers and transforming those into winning probabilities, inverse tournament simulation can be used to compute team-specific abilities… With the team-specific abilities all single matches are simulated via paired comparisons and, hence, the complete tournament course is obtained. Using this approach, Zeileis, Leitner, and Hornik (2018) forecast Brazil to win the FIFA World Cup 2018 with a probability of 16.6%, followed by Germany (15.8%) and Spain (12.5%).
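The first step of this approach is turning bookmaker odds into winning probabilities. The sketch below is a simplified illustration, not the procedure used by Zeileis, Leitner, and Hornik (2018). It inverts each bookmaker's decimal odds, removes the bookmaker margin (overround) by proportional rescaling, and averages the results across bookmakers. The rescaling and averaging choices are assumptions, and the teams and odds are hypothetical. The inverse tournament simulation that follows this step is not shown.

```python
from statistics import mean

def implied_probabilities(decimal_odds: dict[str, float]) -> dict[str, float]:
    """Convert one bookmaker's decimal odds into probabilities that sum to 1.

    The raw values 1/odds sum to more than 1 (the bookmaker's margin); here the
    margin is removed by simple proportional rescaling (an assumption).
    """
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {team: p / total for team, p in raw.items()}

def aggregate(bookmakers: list[dict[str, float]]) -> dict[str, float]:
    """Average the normalized probabilities over several bookmakers.

    Assumes every bookmaker quotes the same set of teams.
    """
    per_book = [implied_probabilities(book) for book in bookmakers]
    return {team: mean(p[team] for p in per_book) for team in per_book[0]}

# Hypothetical odds from two bookmakers for a four-team market.
books = [
    {"A": 2.0, "B": 3.0, "C": 5.0, "D": 8.0},
    {"A": 2.2, "B": 2.8, "C": 5.5, "D": 7.0},
]
probs = aggregate(books)
```

Each normalized distribution sums to 1, so their average does too. No extra renormalization is needed.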
…(Audran, Bolliger, Kolb, Mariscal, and Pilloud, 2018): they obtain Germany as top favorite with a winning probability of 24.0%, followed by Brazil (19.8%) and Spain (16.1%). They use a statistical model based on four factors that are supposed to indicate how well a team will be doing during the tournament: the Elo rating, the teams’ performances in the qualifications preceding the World Cup, the teams’ success in previous World Cup tournaments and a home advantage. …
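The excerpt does not say how Audran et al. feed the Elo rating into their model. The rating itself comes with the standard expected-score formula, shown here on a 400-point scale as an illustration only. The ratings in the usage line are hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of team A against team B under the standard Elo formula.

    A 400-point rating advantage corresponds to odds of 10:1 in A's favour.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Hypothetical ratings: a 400-point gap gives A an expected score of 10/11.
print(elo_expected_score(1900, 1500))
```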
p.5:
Economic Factors [GDP per capita, Population],
Sportive factors [ODDSET probability, FIFA rank],
Home advantage [Host, Continent, Confederation],
Factors describing the team’s structure [(Second) maximum number of teammates, Average age, Number of Champions League (Europa League) players, Number of players abroad/Legionnaires],
Factors describing the team’s coach
p.8: 3.1 Random forests
…an aggregation of a (large) number of classification or regression trees (CARTs). …to find partitions such that the respective response values are very homogeneous within a partition but very heterogeneous between partitions. CARTs can be used both for metric response (regression trees) and for nominal/ordinal responses (classification trees). The most frequent visualization tool for CARTs is the so-called dendrogram…
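A single regression tree is grown by repeatedly searching for the split that makes the response most homogeneous within each partition. The sketch below shows one such search on one metric covariate, using the within-node sum of squared deviations as the impurity measure. That criterion is an assumption, and the forests in the paper may split differently. A random forest would repeat this on bootstrap samples, consider only a random subset of covariates at each split, and aggregate the resulting trees. The data are hypothetical.

```python
def sse(values: list[float]) -> float:
    """Sum of squared deviations from the mean (heterogeneity within a node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (threshold, impurity) of the split x <= threshold that makes the
    response y most homogeneous within the two resulting partitions."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right node empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = sse(left) + sse(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical covariate values and goals scored.
x = [1, 2, 3, 10, 11, 12]
y = [0, 1, 0, 3, 4, 3]
threshold, impurity = best_split(x, y)
```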
p.11: 3.2 Regression
…the scores of the competing teams are treated as (conditionally) independent variables following a Poisson distribution (conditioned on certain covariates)…
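If the two scores are independent Poisson variables, their rates determine the probabilities of a win, a draw and a loss. In the regression approach each rate would depend on the covariates (typically through a log link). In the sketch below the rates are simply given, with hypothetical values, and the infinite sums are truncated at a maximum number of goals.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def match_outcome_probs(lam1: float, lam2: float, max_goals: int = 15) -> tuple[float, float, float]:
    """Return P(team 1 wins), P(draw), P(team 2 wins) for independent Poisson scores.

    Scores above max_goals are ignored (a truncation approximation).
    """
    win = draw = loss = 0.0
    for g1 in range(max_goals + 1):
        p1 = poisson_pmf(g1, lam1)
        for g2 in range(max_goals + 1):
            p = p1 * poisson_pmf(g2, lam2)
            if g1 > g2:
                win += p
            elif g1 == g2:
                draw += p
            else:
                loss += p
    return win, draw, loss

# Hypothetical expected goals for the two teams.
win, draw, loss = match_outcome_probs(1.8, 0.9)
```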
p.13: 3.3 Ranking methods
…how Poisson models can be used to produce rankings that reflect a team’s current ability… The main idea is to assign a strength parameter to every team and to estimate those parameters over a period of M matches via weighted maximum likelihood, where the weights are of two types: time depreciation and match importance…
… The match importance weights are directly inherited from the official FIFA ranking and can take the values 1 for a friendly game, 2.5 for a confederation or World Cup qualifier, 3 for a confederation tournament…, and 4 for World Cup matches. …
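One way to combine the two weight types is to multiply them, with exponential time depreciation so that a match's weight halves every "half period" (3 years in the paper's final model). The sketch below makes exactly those assumptions. The helper name and the category keys are hypothetical, and the excerpt does not give the exact depreciation formula.

```python
from datetime import date

# Match-importance weights as listed in the excerpt (keys are hypothetical names).
IMPORTANCE = {
    "friendly": 1.0,
    "qualifier": 2.5,                 # confederation or World Cup qualifier
    "confederation_tournament": 3.0,
    "world_cup": 4.0,
}

HALF_PERIOD_DAYS = 3 * 365  # assumed: a half period of 3 years, counted in days

def match_weight(match_date: date, match_type: str, today: date) -> float:
    """Time-depreciation weight times match-importance weight (hypothetical helper).

    Assumes exponential decay: a match played one half period before `today`
    counts half as much as an otherwise identical match played today.
    """
    age_days = (today - match_date).days
    time_weight = 0.5 ** (age_days / HALF_PERIOD_DAYS)
    return time_weight * IMPORTANCE[match_type]
```

In a weighted likelihood, each match's likelihood contribution would then be raised to the power of its weight. This is again an assumption about the exact form.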
p.15: 3.4 Combining methods
1. Form a training data set containing three out of four World Cups.
…
5. Compare predicted and real outcomes for all prediction methods.
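Steps 2 to 4 are not part of the excerpt. Training on "three out of four World Cups" suggests that each tournament serves once as the test set, and that reading is the assumption behind the sketch below. It only produces the train/test splits. It does not fit or compare any model.

```python
WORLD_CUPS = [2002, 2006, 2010, 2014]

def leave_one_cup_out(cups: list[int]):
    """Yield (training cups, test cup): train on three tournaments, predict the fourth."""
    for test_cup in cups:
        train = [c for c in cups if c != test_cup]
        yield train, test_cup

splits = list(leave_one_cup_out(WORLD_CUPS))
```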
p.16: …three different performance measures to compare the predictive power of the methods:
…the multinomial likelihood, the classification rate, the rank probability score (RPS)…
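The excerpt names the three measures without defining them. The sketch below uses common definitions for ordered three-way outcomes (win, draw, loss):

- the average predicted probability of the observed outcome,
- the share of matches where the most probable outcome occurred,
- the mean rank probability score, which compares cumulative predicted and observed distributions (lower is better).

Treat these as assumptions rather than the paper's exact formulas. The predictions are hypothetical.

```python
def rps(probs: list[float], outcome: int) -> float:
    """Rank probability score for one match with ordered outcomes."""
    r = len(probs)
    cum_pred = cum_obs = total = 0.0
    for i in range(r - 1):
        cum_pred += probs[i]
        cum_obs += 1.0 if i == outcome else 0.0
        total += (cum_pred - cum_obs) ** 2
    return total / (r - 1)

def evaluate(predictions: list[list[float]], outcomes: list[int]) -> tuple[float, float, float]:
    """Return (mean likelihood, classification rate, mean RPS) over all matches."""
    n = len(outcomes)
    pairs = list(zip(predictions, outcomes))
    likelihood = sum(p[o] for p, o in pairs) / n
    class_rate = sum(max(range(len(p)), key=p.__getitem__) == o for p, o in pairs) / n
    mean_rps = sum(rps(p, o) for p, o in pairs) / n
    return likelihood, class_rate, mean_rps

# Hypothetical predictions (win, draw, loss) and observed outcome indices.
scores = evaluate([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]], [0, 1])
```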

p.20: 4 Prediction of the FIFA World Cup 2018
…combination of a random forest with adequate team ability estimates from a ranking method… The abilities were estimated by the bivariate Poisson model with a half period of 3 years. All matches of the 228 national teams played between 2010-06-13 and 2018-06-06 are used for the estimation, which results in a total of more than 7000 matches. All further predictor variables are taken as the latest values shortly before the World Cup…
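A bivariate Poisson model allows the two scores to be correlated. A standard construction sets X = X1 + X3 and Y = X2 + X3, where X1, X2 and X3 are independent Poisson variables with rates lam1, lam2 and lam3. The shared component X3 induces a covariance of lam3 between the scores. The sketch below computes the joint probability by convolution. The excerpt does not say how the paper links these rates to team strengths, and all rate values are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam); lam = 0 gives mass 1 at k = 0."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def bivariate_poisson_pmf(x: int, y: int, lam1: float, lam2: float, lam3: float) -> float:
    """P(X = x, Y = y) for X = X1 + X3, Y = X2 + X3 with independent Poisson parts.

    lam3 is the covariance between the two scores; lam3 = 0 recovers
    independent Poisson scores.
    """
    return sum(
        poisson_pmf(x - k, lam1) * poisson_pmf(y - k, lam2) * poisson_pmf(k, lam3)
        for k in range(min(x, y) + 1)
    )

# Hypothetical rates: probability of a 2:1 result.
p_2_1 = bivariate_poisson_pmf(2, 1, 1.5, 1.0, 0.2)
```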
4.1 Probabilities for FIFA World Cup 2018 Winner
…according to our random forest model, Spain is the favored team with a predicted winning probability of 17.8%, followed by Germany, Brazil, France and Belgium. … While ODDSET favors Germany and Brazil, the random forest model predicts a slight advantage for Spain. …
p.21: Table 8: Estimated probabilities (in %) for reaching the different stages in the FIFA World Cup 2018 for all 32 teams based on 100,000 simulation runs of the FIFA World Cup together with winning probabilities based on the ODDSET odds.
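Probabilities like those in Table 8 come from simulating the tournament many times and counting outcomes. The sketch below applies the same Monte Carlo idea to a hypothetical four-team knockout bracket; it does not reproduce the 32-team World Cup.

Several pieces are assumptions for illustration only:

- the team abilities,
- the exponential link from ability difference to Poisson scoring rates,
- settling drawn matches by a coin flip.

```python
import math
import random
from collections import Counter

# Hypothetical abilities and scoring link, for illustration only.
ABILITY = {"A": 0.5, "B": 0.3, "C": 0.0, "D": -0.2}
BASE_RATE = 1.2  # assumed expected goals between two equally strong teams

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def play(a: str, b: str, rng: random.Random) -> str:
    """Simulate one knockout match and return the team that advances."""
    goals_a = sample_poisson(BASE_RATE * math.exp(ABILITY[a] - ABILITY[b]), rng)
    goals_b = sample_poisson(BASE_RATE * math.exp(ABILITY[b] - ABILITY[a]), rng)
    if goals_a != goals_b:
        return a if goals_a > goals_b else b
    return rng.choice((a, b))  # assumed: a draw is settled by a coin flip

def simulate(bracket: list[str], runs: int = 100_000, seed: int = 2018) -> dict[str, float]:
    """Estimate each team's title probability; the bracket size must be a power of 2."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(runs):
        teams = list(bracket)
        while len(teams) > 1:  # pair neighbours: (0 v 1), (2 v 3), ...
            teams = [play(teams[i], teams[i + 1], rng) for i in range(0, len(teams), 2)]
        titles[teams[0]] += 1
    return {team: titles[team] / runs for team in bracket}

probs = simulate(["A", "B", "C", "D"], runs=20_000)
```

The default of 100,000 runs mirrors the number quoted in the caption. The usage line uses fewer runs for speed.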
p.22: Figure 4: Winning probabilities conditional on reaching the single stages of the tournament for the five favored teams.
p.23: 4.2 Most probable tournament course
… While in Group B and Group G the model forecasts Spain followed by Portugal as well as Belgium followed by England with rather high probabilities of 38.5% and 38.1%, respectively, other groups such as Group A, Group F and Group H seem to be more volatile. …
According to the most probable tournament course, the German team rather than the Spanish would win the World Cup. However, it again becomes obvious that, with Switzerland (in that case) as its round-of-sixteen opponent, the German team faces a much stronger opponent than Spain does at that stage. Even though Germany would still be the favorite in this match, it would advance to the quarter-finals with a probability of only 61%. In the most probable course of the knockout stage, Germany, despite tough matches at every stage, would reach the final and defend the title…
p.24: Table 9: Most probable final group standings together with the corresponding probabilities for the FIFA World Cup 2018 based on 100,000 simulation runs.
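A "most probable final group standing" can be estimated by counting how often each ordering appears across the simulation runs. The same counting applied to only the top two places gives statements like "Spain followed by Portugal". The excerpt does not say which of the two the paper tabulates. The helper and the simulated orders below are hypothetical.

```python
from collections import Counter

def most_probable_standing(simulated_orders, top=None):
    """Most frequent group order (optionally only the first `top` places)
    and its relative frequency across simulation runs."""
    counts = Counter(tuple(order)[:top] for order in simulated_orders)
    order, n = counts.most_common(1)[0]
    return order, n / len(simulated_orders)

# Hypothetical final orders from ten simulation runs of one group.
simulated = (
    [("A", "B", "C", "D")] * 4
    + [("A", "B", "D", "C")] * 3
    + [("B", "A", "C", "D")] * 3
)
full_order, p_full = most_probable_standing(simulated)
top_two, p_top_two = most_probable_standing(simulated, top=2)
```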
5 Concluding remarks
…random forests, Poisson regression models and ranking methods. The former two approaches incorporate covariate information of the opposing teams, while the latter method provides team ability parameters which serve as adequate estimates of the current team strengths. …by incorporating the team ability parameters from the ranking methods as an additional covariate into the random forest, the predictive power is substantially increased, leading to the best model, one capable of beating the bookmakers. …
p.25: Figure 5: Most probable course of the knockout stage together with corresponding probabilities for the FIFA World Cup 2018 based on 100,000 simulation runs.
p.26: …Spain being slightly favored over Germany overall is mainly due to Germany’s comparatively high chance of dropping out in the round of sixteen. In fact, conditional on reaching the quarter-finals, Germany overtakes Spain…
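Winning the title requires reaching the quarter-finals, so P(win) = P(reach QF) · P(win | reach QF). A team can therefore lead overall while trailing once both teams are in the quarter-finals. The numbers below are hypothetical, for two teams X and Y, and are not taken from the paper.

```python
def title_probability(p_reach_qf: float, p_win_given_qf: float) -> float:
    """P(win) = P(reach QF) * P(win | reach QF); every title winner reaches the QF."""
    return p_reach_qf * p_win_given_qf

# Hypothetical (P(reach QF), P(win | reach QF)) for two teams.
teams = {
    "X": (0.75, 0.22),  # more likely to survive the round of sixteen
    "Y": (0.60, 0.26),  # stronger once in the quarter-finals
}
overall = {team: title_probability(*p) for team, p in teams.items()}
# X leads overall (0.165 vs 0.156) although Y is ahead conditional on the QF.
```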