Context
Nowadays, in most sports, either tracking or event data is available for sports data scientists to analyse leagues, teams, games or players. For example, in soccer, event-based data is available for all major leagues from professional data providers like [Opta](https://www.optasports.com/), [Statsbomb](https://statsbomb.com/) or [Wyscout](https://wyscout.com/). Tennis is different. Even though camera-based tracking with Hawk-Eye is possible, this data is not publicly available, and only the largest courts are equipped with the system.
When I think about the latest breakthroughs in machine learning in image classification, detection, NLP ([deepl.com](https://www.deepl.com/translator)) and audio recognition ([Siri](https://www.apple.com/siri/), [Alexa](https://en.wikipedia.org/wiki/Amazon_Alexa)), it is evident that all of these areas have a huge amount of *easily accessible* data.
Personally, I expect that there would be far more research in tennis if a large amount of match data were freely available.
Statistics for all matches played on the ATP Tour are available from different sources. For example, Jeff Sackmann's [github repository](https://github.com/JeffSackmann) is a great place to start. He also runs a [match charting project](http://www.tennisabstract.com/blog/2015/09/23/the-match-charting-project-quick-start-guide/) in which point-by-point data is collected.
But when I think about tennis, it is about the movement of the players, their tactics, etc. It is the ball movement, the actual rallies and shots I want to be able to see and analyse.
Event data makes it possible to capture positional, temporal and stroke information.
As a proof of concept, and a tribute to Novak Djokovic and Rafael Nadal, two of the greatest tennis players of all time, I manually annotated each rally and stroke of their [Australian Open final 2019](https://www.atptour.com/en/scores/2019/580/MS001/match-stats?isLive=False). Fortunately for me, it lasted only three sets.
Content
The data consists of all points played in the match. It is built hierarchically from **events**, to **rallies**, to actual **points**.
- **Points**: a list of all points played in the final with information about the server, receiver, point type, number of strokes, time of rally, new score of the game.
- **Rallies**: a list of all rallies with server, returner, etc.
- **Events**: Each time a player hit the ball, the stroke type, position of the player, and position of the opponent were recorded.
- **Serves**: for each successful serve (i.e. one that was not a fault), the position of the serve in the service box was recorded (whenever possible).
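To picture the hierarchy described above, here is a minimal sketch using Python dataclasses. It is only an illustration: the class and field names (`Event`, `Rally`, `Point`, `hitter`, `stroke`, and so on) are assumptions and do not necessarily match the column names in the dataset files. The idea that a point can hold more than one rally (for example after a first-serve fault) is also an assumption, and the coordinates in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    """One stroke: who hit the ball, how, and where both players stood."""
    hitter: str
    stroke: str                        # hypothetical labels, e.g. "serve", "forehand"
    hitter_xy: Tuple[float, float]     # court coordinates of the hitter
    opponent_xy: Tuple[float, float]   # court coordinates of the opponent


@dataclass
class Rally:
    """A sequence of strokes that starts with a serve."""
    server: str
    returner: str
    events: List[Event] = field(default_factory=list)

    @property
    def n_strokes(self) -> int:
        return len(self.events)


@dataclass
class Point:
    """A scored point; assumed here to contain one or more rallies."""
    server: str
    receiver: str
    winner: str
    rallies: List[Rally] = field(default_factory=list)

    @property
    def n_strokes(self) -> int:
        return sum(r.n_strokes for r in self.rallies)


# Made-up example values, not taken from the dataset.
rally = Rally(server="Djokovic", returner="Nadal", events=[
    Event("Djokovic", "serve", (6.0, -0.5), (3.0, 25.0)),
    Event("Nadal", "backhand", (3.5, 24.5), (5.5, 0.5)),
])
point = Point(server="Djokovic", receiver="Nadal", winner="Djokovic",
              rallies=[rally])
print(point.n_strokes)
```

In practice the flat tables in the dataset are likely easier to work with directly; the nested classes are just a way to keep the events, rallies and points relationship in mind.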
I have already done the hard part of data cleaning, and the dataset is hopefully easy to understand and ready to use.
Positions
The x, y positions are given with respect to the court coordinate system shown in Figure 1. They were calculated from the pixel coordinates through a [direct linear transformation][1] estimated at the beginning of the match. (As the camera angle changed a bit during the match, some of the positions are off.)
![The court coordinate system. The horizontal axis refers to x and the vertical axis to the y-direction.][2]
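As a rough illustration of how pixel coordinates can be mapped to court coordinates, the sketch below fits a planar homography from four point correspondences. It is not the author's code. It uses a simplified variant of the direct linear transformation that fixes the last matrix entry to 1 and solves the resulting 8x8 linear system exactly, rather than the usual least-squares solution over many points. The pixel corners are hypothetical, and placing the origin at one corner of the doubles court (10.97 m x 23.77 m) is an assumption that may differ from Figure 1.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(c) for c in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: check the point correspondences")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(pixel_pts, court_pts):
    """Estimate a 3x3 homography (last entry fixed to 1) from four point pairs."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(pixel_pts, court_pts):
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v])
        rhs.append(y)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def pixel_to_court(h, u, v):
    """Apply the homography to one pixel position."""
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    x = (h[0][0] * u + h[0][1] * v + h[0][2]) / w
    y = (h[1][0] * u + h[1][1] * v + h[1][2]) / w
    return x, y


# Hypothetical pixel positions of the four court corners in a broadcast frame.
pixel_corners = [(300, 650), (980, 650), (820, 200), (460, 200)]
# Assumed court coordinates in metres (doubles court, origin at one corner).
court_corners = [(0.0, 0.0), (10.97, 0.0), (10.97, 23.77), (0.0, 23.77)]

H = fit_homography(pixel_corners, court_corners)
print(pixel_to_court(H, 640, 650))  # middle of the near baseline in the image
```

Because the transformation is only valid for the camera pose at calibration time, any later camera movement shifts the mapped positions, which matches the caveat above.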
Inspiration
Look into the data, see what you can find. Is there information about the game in positional, temporal and stroke information that can tell you more about the players and the match than simple match sheet statistics like the number of break points or first serves in?
You can use the dataset however you want, but here are some things you could start with.
- It is a great way to practice pandas: generate general statistics like points played, serve percentages, games won, break points etc. and compare them with the statistics from other websites.
- You can visualize the spatial positioning of the players on the court. For example, is there a difference between the return positions of Nadal and Djokovic?
- You can calculate movement statistics like distance covered.
- You can calculate the percentage of forehands and backhands, or of shot types like slice and topspin, for each player.
- You can find out where the players serve to. (Do not forget that Nadal is a lefty.)
To get you started, I have created a sample kernel. Find it [here](https://www.kaggle.com/robseidl/australian-open-mens-final-2019-data-exploration).
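Two of the ideas above, distance covered and stroke percentages, can be sketched in a few lines of plain Python. The records and field names (`rally`, `hitter`, `stroke`, `x`, `y`) below are made up for illustration and are not the dataset's actual columns; the sketch also assumes the events are listed in chronological order. Distance is approximated by summing straight-line segments between a player's consecutive hitting positions within a rally, so it underestimates the true distance run.

```python
from collections import Counter, defaultdict
from math import hypot

# Made-up event records, purely for illustration.
events = [
    {"rally": 1, "hitter": "Djokovic", "stroke": "forehand", "x": 0.0, "y": 0.0},
    {"rally": 1, "hitter": "Nadal", "stroke": "backhand", "x": 5.0, "y": 24.0},
    {"rally": 1, "hitter": "Djokovic", "stroke": "backhand", "x": 3.0, "y": 4.0},
    {"rally": 1, "hitter": "Nadal", "stroke": "forehand", "x": 8.0, "y": 28.0},
    {"rally": 2, "hitter": "Djokovic", "stroke": "forehand", "x": 1.0, "y": 1.0},
    {"rally": 2, "hitter": "Nadal", "stroke": "forehand", "x": 6.0, "y": 25.0},
]


def stroke_shares(events):
    """Fraction of each stroke type per player."""
    counts = defaultdict(Counter)
    for e in events:
        counts[e["hitter"]][e["stroke"]] += 1
    return {
        player: {s: n / sum(c.values()) for s, n in c.items()}
        for player, c in counts.items()
    }


def distance_covered(events):
    """Sum of distances between a player's consecutive hits within each rally."""
    last_pos = {}
    total = defaultdict(float)
    for e in events:
        key = (e["rally"], e["hitter"])
        if key in last_pos:
            px, py = last_pos[key]
            total[e["hitter"]] += hypot(e["x"] - px, e["y"] - py)
        last_pos[key] = (e["x"], e["y"])
    return dict(total)


print(stroke_shares(events))
print(distance_covered(events))
```

Since each event also records the opponent's position, a finer estimate could interleave those positions as well; with pandas the same logic would typically be a `groupby` over rally and player.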
[1]: https://en.wikipedia.org/wiki/Direct_linear_transformation
[2]: https://www.dropbox.com/s/gakg677f0uvhmb2/Screenshot%202019-03-02%2021.44.11.png?raw=1