Based on 40,000 matches, I calculated the odds for each match in this year's World Cup

The quadrennial World Cup kicks off this week. If you are a casual, fair-weather football fan like me, here are the basics first:

The World Cup is being held in Russia
A total of 32 teams are divided into 8 groups, and the top 2 teams in each group advance to the knockout round
The competition lasts a month
A total of 64 games will be played in various Russian cities
There is no Chinese team because it did not qualify
No Italy and no Netherlands, because they did not qualify either
There is no Barcelona, Real Madrid, Manchester United, Bayern…
Regulation time is two halves of 45 minutes each
In the knockout stage, a match still tied after 30 minutes of extra time goes to a penalty shootout
Messi is from Argentina, Cristiano Ronaldo is from Portugal, Neymar is from Brazil; none of them is Spanish
Ronaldo, Ronaldinho, Kaká and Beckham are not taking part
The World Cup is a football tournament: no Harden, Curry or Durant

At every World Cup, predicting the winner is a regular fixture: pundits, celebrities, octopuses, cats and dogs all have a go. This time, let me make a prediction too. But what if I don’t know much about football? No problem, I can write a program! (It’s all guesswork anyway.)

The data source

The data comes from Kaggle and covers 38,929 international matches from 1872 to 2018. We’ll use it as the basis for our prediction.

International football results from 1872 to 2021

How to get the data is explained at the end of the article.

A quick word about Kaggle: it is a data science competition platform, and I strongly recommend it to anyone interested in data analysis or machine learning.

Build a model

With all this historical game data, how do you predict? I established the following rules:

Data that is too old says little about the current teams, so set a starting year
Find the matches between the two sides from the starting year to now, and calculate the probability of victory = (wins + draws / 2) / total games
In the group stage, a team wins if its probability of victory exceeds a certain threshold (say, 0.7); otherwise the match is a draw
In the knockout stage, the team with the higher probability wins
If the two teams have not played each other since the starting year, extend the window back by N more years (this mostly happens with teams that play few matches)
If they still have not played each other, calculate each team’s probability of victory from its record against all the other teams in the tournament. The team with the higher probability wins. In a group match, however, the difference in probability must exceed a certain threshold (say, 0.1); otherwise it is a draw
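The rules above can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the article: the tuple layout for a match, the function names `win_probability` and `predict`, and the default of 4 extra years are all assumptions. It also assumes that the window is widened only once, that the fallback uses the original starting year, that a team with no record at all counts as 0, and that an exact 0.5 tie in a knockout match goes to the first team.

```python
# Hypothetical record layout: (year, team_a, team_b, goals_a, goals_b)

def win_probability(matches, team, opponents, since):
    """(wins + draws / 2) / total games for `team` against `opponents`
    from `since` onward. Returns None if there are no such games."""
    wins = draws = total = 0
    for year, a, b, goals_a, goals_b in matches:
        if year < since:
            continue
        if a == team and b in opponents:
            own, other = goals_a, goals_b
        elif b == team and a in opponents:
            own, other = goals_b, goals_a
        else:
            continue
        total += 1
        if own > other:
            wins += 1
        elif own == other:
            draws += 1
    return (wins + draws / 2) / total if total else None


def predict(matches, team_a, team_b, field, since, extra_years=4,
            knockout=False, win_threshold=0.7, gap_threshold=0.1):
    """Return the predicted winner, or None for a group-stage draw."""
    # Head-to-head record, widening the window once if it is empty.
    p = win_probability(matches, team_a, {team_b}, since)
    if p is None:
        p = win_probability(matches, team_a, {team_b}, since - extra_years)
    if p is not None:
        if knockout:
            return team_a if p >= 0.5 else team_b  # assumption: 0.5 goes to team_a
        if p > win_threshold:
            return team_a
        if p < 1 - win_threshold:
            return team_b
        return None
    # Fallback: each team's record against the rest of the tournament field.
    p_a = win_probability(matches, team_a, field - {team_a}, since) or 0.0
    p_b = win_probability(matches, team_b, field - {team_b}, since) or 0.0
    if not knockout and abs(p_a - p_b) <= gap_threshold:
        return None
    return team_a if p_a >= p_b else team_b


# Made-up matches between four made-up teams, only to exercise the rules.
matches = [
    (2014, "A", "B", 2, 0), (2016, "B", "A", 1, 1), (2017, "A", "B", 3, 1),
    (2015, "A", "C", 1, 0), (2016, "C", "B", 0, 0),
    (2017, "D", "A", 1, 0), (2015, "B", "D", 0, 2),
]
field = {"A", "B", "C", "D"}
print(predict(matches, "A", "B", field, since=2012))  # A (head-to-head, 2.5 / 3)
print(predict(matches, "B", "C", field, since=2012))  # None (group-stage draw)
print(predict(matches, "C", "D", field, since=2012))  # D (fallback rule)
```

Returning `None` for a draw keeps the group-stage and knockout cases in one function; a real simulation would still need to turn these results into group points and a bracket.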

Schedule simulation

Based on the rules above, I loaded the data and simulated all 64 matches among the 32 teams in this World Cup with a Python program.

That gives a “predicted” result for every match.

Predicted results

So what does the code actually produce?

Different starting years and thresholds give different results, so I tried 11 starting years (2006 to 2016) with 4 values of N each, for a total of 44 simulated tournaments. The number of times each team won the title was:

Brazil 23 times

Spain 12 times

Germany 6 times

England 3 times
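A tally like the one above can be produced by looping over every combination of starting year and N and counting the champions. The sketch below is only an illustration under stated assumptions: `tally_champions` and `placeholder_simulate` are hypothetical names, the four N values are made up (the article does not list them), and the placeholder returns dummy labels rather than real predictions.

```python
from collections import Counter
from itertools import product


def tally_champions(simulate, start_years, extra_years_options):
    """Call simulate(start_year, extra_years) for every combination
    and count how often each champion comes out."""
    return Counter(
        simulate(start_year, extra_years)
        for start_year, extra_years in product(start_years, extra_years_options)
    )


def placeholder_simulate(start_year, extra_years):
    # Stand-in for a full 64-match simulation; the labels are dummies, not predictions.
    return "Team A" if (start_year + extra_years) % 2 == 0 else "Team B"


start_years = range(2006, 2017)      # the 11 starting years, 2006 to 2016
extra_years_options = (2, 4, 6, 8)   # hypothetical N values
counts = tally_champions(placeholder_simulate, start_years, extra_years_options)
print(sum(counts.values()))          # 44 runs in total
print(counts.most_common())
```

Swapping `placeholder_simulate` for a real tournament simulation would leave the tallying code unchanged.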

Brazil, it seems, is still the clear favorite to win. No wonder betting websites are offering them the lowest odds.

But Brazil aside, England performed exceptionally well in my results. This is mainly due to their good record against Brazil in recent years: 1 win, 2 draws and 0 losses. Argentina, by contrast, is probably out of the running.

In addition, Senegal and Iran should be watched as they have a good record against other teams in recent years and could be dark horses:

Since 2012,

Senegal won 4, drew 3 and lost 1

Iran won 5, drew 6 and lost 3

Historical record query tool

Of course, my model is very crude. But the ball is round, and predicting the outcome of a game with historical data is just a bit of fun. If you have your own rules you want to implement, you can modify them based on my code. Access to code and data is explained at the end of the article.

In addition, I exported some of the data to make an online query tool, so that you can directly query the history of any two teams.

Click to enter: Online search tool for historical records

You can choose different starting years. I also built a simple “odds” calculation, for reference only.

Home team’s combined odds = total games / (home team’s wins + visiting team’s losses)
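As a worked example, the formula above can be written as a small Python function. The name `combined_odds` is hypothetical, and the sketch assumes that “total games” means the games of both teams combined over the selected years, which the article does not spell out. The numbers in the example are invented, not taken from the dataset.

```python
def combined_odds(total_games, home_wins, away_losses):
    """total games / (home team's wins + visiting team's losses).
    Returns None when the denominator is zero."""
    favourable = home_wins + away_losses
    if favourable == 0:
        return None
    return total_games / favourable


# Invented example: each side played 20 games (40 in total);
# the home team won 12 of its games and the visitors lost 8 of theirs.
print(combined_odds(40, 12, 8))  # 2.0
```

A lower value means more of the recorded games went the home team’s way, which matches the usual reading of odds: the favorite gets the lower number.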

This odds model draws on a broader set of historical records, and strong teams mostly play other strong teams while weak teams mostly play other weak teams. As a result, the gaps between my odds are smaller than those on the market, although on the whole they still reflect which side is more likely to win. If my result for a match differs sharply from what others are predicting, it could be an upset.

The forecast results are for reference only. Any similarity is purely coincidental.

Finally, it occurred to me: what is our national football team’s record against these 32 teams? What would happen if, in some parallel universe, it were lucky enough to take part? So…

Since 2014: 2 wins, 5 draws, 8 losses

Since 2002: 8 wins, 19 draws, 35 losses

Panama, which has never played at a World Cup before, seems to be the only team we could match.

Okay, forget it. Let’s just enjoy the World Cup!

The data and code used in this article can be obtained from Crossin’s programming classroom: reply with the keyword “The World Cup”


Welcome to search and follow: Crossin programming classroom
