Context
-------------
Most publicly available football (soccer) statistics are limited to aggregated data such as goals, shots, fouls, and cards. When assessing performance or building predictive models, this simple aggregation, stripped of context, can be misleading. For example, a team that produced 10 shots on target from long range has a lower chance of scoring than a club that produced the same number of shots from inside the box. However, metrics derived from a simple count of shots will assess the two teams as equal.
A football game generates many more events than these, and it is both important and interesting to take into account the context in which those events occurred. This dataset should keep sports analytics enthusiasts awake for long hours, as the number of questions that can be asked is huge.
Content
-------
This dataset is the result of a very tiresome effort of web scraping and integrating different data sources. The central element is the text commentary. All the events were derived by reverse engineering the text commentary using regex. With this approach, I was able to derive 11 types of events, as well as the main player and secondary player involved in those events, and many other statistics. If I have missed some useful information, you are welcome to extract it and share your findings. The dataset provides a granular view of 9,074 games, totaling 941,009 events, from the five biggest European football (soccer) leagues: England, Spain, Germany, Italy, and France, from the 2011/2012 season to the 2016/2017 season, as of 25.01.2017.
For some games played during these seasons I could not collect detailed data. Overall, over 90% of the games played during these seasons have event data.
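The regex-based extraction described above can be sketched roughly as follows. This is only an illustration: the commentary line, the pattern, and the field names (`event`, `player`, `team`, `player2`) are assumptions made for the example, not the patterns actually used to build the dataset, and the wording on the scraped sites may differ.

```python
import re

# Hypothetical commentary line; real commentary wording may differ.
COMMENTARY = (
    "Attempt missed. Lionel Messi (Barcelona) left footed shot from outside "
    "the box is too high. Assisted by Andres Iniesta."
)

# Illustrative pattern: event label, main player, team in parentheses,
# and an optional secondary (assisting) player.
PATTERN = re.compile(
    r"^(?P<event>[A-Za-z ]+)\.\s+"
    r"(?P<player>[^()]+?)\s+\((?P<team>[^)]+)\)"
    r"(?:.*?Assisted by (?P<player2>[^.]+)\.)?"
)


def parse_event(text):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Fields from the optional group are None when no assist is mentioned.
    return {
        key: (value.strip() if value else None)
        for key, value in match.groupdict().items()
    }


event = parse_event(COMMENTARY)
print(event)
```

In practice each event type would likely need its own pattern, since a single expression like this one will not match lines that lack the "Event. Player (Team)" shape.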
The dataset is organized in 3 files:
- **events.csv** contains event data about each game. Text commentary was scraped from bbc.com, espn.com, and onefootball.com.
- **ginf.csv** contains metadata and market odds about each game. Odds were collected from oddsportal.com.
- **dictionary.txt** contains a dictionary with the textual description of each categorical variable coded with integers.
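As a minimal sketch of how the event file and the game metadata file might be combined, the example below joins events to games on a shared game identifier and counts events per league. The column names (`game_id`, `event_type`, `league`) and the sample values are assumptions for illustration only; check the headers of the real files before adapting it.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-ins for events.csv and ginf.csv.
# Column names and values here are assumptions, not the real schema.
EVENTS_CSV = """game_id,event_type
g1,1
g1,3
g2,1
"""
GINF_CSV = """game_id,league
g1,E0
g2,SP1
"""


def read_rows(handle):
    """Read a CSV file-like object into a list of dicts keyed by header."""
    return list(csv.DictReader(handle))


def events_per_league(event_rows, game_rows, key="game_id"):
    """Attach each event to its game's league and count events per league."""
    league_by_game = {row[key]: row["league"] for row in game_rows}
    counts = Counter()
    for row in event_rows:
        league = league_by_game.get(row[key])
        if league is not None:  # skip events whose game has no metadata
            counts[league] += 1
    return counts


events = read_rows(io.StringIO(EVENTS_CSV))
games = read_rows(io.StringIO(GINF_CSV))
league_counts = events_per_league(events, games)
print(league_counts)
```

With the real files, the `io.StringIO(...)` objects would be replaced by file handles such as `open("events.csv", newline="", encoding="utf-8")`.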
Past Research
-------------
I have used this data to:
- create predictive models for football games in order to bet on football outcomes
- make visualizations about upcoming games
- build expected goals models and compare players
Inspiration
-----------
There are tons of interesting questions a sports enthusiast can answer with this dataset. For example:
- What is the value of a shot? In other words, what is the probability of a shot being a goal given its location, shooter, league, assist method, game state, number of players on the pitch, and time? Models that answer this are known as expected goals (xG) models.
- When are teams more likely to score?
- Which teams are the best or sloppiest at holding the lead?
- Which teams or players make the best use of set pieces?
- In which leagues is the referee more likely to give a card?
- How do players compare when they shoot with their weak foot versus their strong foot? Or which players are ambidextrous?
- Identify different styles of play (shooting from long range vs shooting from the box, crossing the ball vs passing the ball, use of headers)
- Which teams have a bias for attacking on a particular flank?
And many, many more...
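To make the first question concrete, here is a deliberately naive expected goals sketch: it estimates the scoring probability of a shot purely from the historical goal rate at its location, then sums those probabilities over a team's shots. The shot records and location labels are invented for the example, and a real xG model would condition on many more of the variables listed above.

```python
from collections import defaultdict

# Hypothetical shot records: (location label, 1 if the shot was a goal else 0).
SHOTS = [
    ("inside_box", 1), ("inside_box", 0), ("inside_box", 0), ("inside_box", 1),
    ("long_range", 0), ("long_range", 0), ("long_range", 0), ("long_range", 1),
    ("long_range", 0),
]


def goal_rate_by_location(shots):
    """Empirical probability of scoring for each location label."""
    totals = defaultdict(lambda: [0, 0])  # location -> [goals, shots]
    for location, is_goal in shots:
        totals[location][0] += is_goal
        totals[location][1] += 1
    return {loc: goals / n for loc, (goals, n) in totals.items()}


def expected_goals(shot_locations, rates):
    """Sum the per-shot scoring probabilities for a list of shot locations."""
    return sum(rates[loc] for loc in shot_locations)


rates = goal_rate_by_location(SHOTS)
team_a = expected_goals(["long_range"] * 10, rates)  # 10 long-range shots
team_b = expected_goals(["inside_box"] * 10, rates)  # 10 shots inside the box
print(rates, team_a, team_b)
```

Both teams take 10 shots, yet the team shooting from inside the box ends up with the higher expected goals total, which is the distinction a plain shot count cannot capture.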