README.md
**Context:**
------------
The FIFA World Cup (often simply called the World Cup), the most prestigious association football tournament and the most widely viewed and followed sporting event in the world, was one of the top trending topics on Twitter while it was ongoing.
This dataset contains a random collection of 530k tweets, from the Round of 16 until the World Cup Final, which took place on 15 July 2018 & was won by France.
A preliminary analysis from the data (till the Round of 16) is available at:
[https://medium.com/@ritu_rg/nlp-text-visualization-twitter-sentiment-analysis-in-r-5ac22c778448][1]

**Content:**
------------

**Data Collection:**
The dataset was created using Tweepy, by streaming tweets from football fans worldwide before, during & after the matches.
Tweepy is a Python library for accessing the Twitter API that provides an easy-to-use interface for streaming real-time data from Twitter. More information related to this library can be found at: http://tweepy.readthedocs.io/en/v3.5.0/
**Data Pre-processing:**
The dataset includes English-language tweets containing any reference to FIFA or the World Cup. The collected tweets have been pre-processed to facilitate analysis, while trying to ensure that no information from the original tweets is lost.
- The original tweet has been stored in the column "Orig_Tweet".
- As part of pre-processing, using the "BeautifulSoup" & "regex" libraries in Python, the tweets have been stripped of elements that interfere with natural language processing, such as website links, hashtags, user mentions, special characters, RT markers, tabs, and leading/trailing/multiple spaces, among others.
- Contractions such as n't, 'll, 're & 've have been expanded to their full English forms. Duplicate tweets have been removed from the dataset.
- The original hashtags & user mentions extracted during the above step have also been stored in separate columns.
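The cleaning steps above can be sketched with the standard-library `re` module alone (the original work used BeautifulSoup for HTML stripping; the patterns and the contraction map below are illustrative assumptions, not the author's exact code):

```python
import re

# Illustrative contraction map; the README does not list its exact rules.
CONTRACTIONS = {"n't": " not", "'ll": " will", "'re": " are", "'ve": " have"}

def clean_tweet(text: str) -> str:
    """Apply the README's cleaning steps: drop links, RT markers,
    hashtags, mentions, special characters & extra whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # website links
    text = re.sub(r"\bRT\b", "", text)                 # retweet markers
    text = re.sub(r"[@#]\w+", "", text)                # mentions & hashtags
    for suffix, expansion in CONTRACTIONS.items():
        text = text.replace(suffix, expansion)         # expand contractions
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)         # special characters
    return re.sub(r"\s+", " ", text).strip()           # collapse whitespace

print(clean_tweet("RT @fan: They'll win, don't worry! #WorldCup https://t.co/x"))
# → They will win do not worry
```

Note that contraction expansion runs before special-character removal, since the apostrophes are needed to match the contractions.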
**Data Storage:**
The collected tweets have been consolidated into a single dataset & shared as a comma-separated values (CSV) file, "FIFA.csv".
Each tweet is uniquely identified by its ID & characterized by the following attributes, where available:
- "Lang" - Language of the tweet
- "Date" - When it was tweeted
- "Source" - The device/medium from which it was tweeted
- "len" - The length of the tweet
- "Orig_Tweet" - The tweet in its original form
- "Tweet" - The updated tweet after pre-processing
- "Likes" - The number of likes the tweet had received at the time of extraction
- "RTs" - The number of times the tweet was shared
- "Hashtags" - The hashtags found in the original tweet
- "UserMentionNames" & "UserMentionID" - Extracted from the original tweet
It also includes the following attributes about the user who posted the tweet:
- "Name" & "Place" of the user
- "Followers" - The number of followers the user account has
- "Friends" - The number of friends the user account has
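The schema above can be read with the standard-library `csv` module; the sketch below uses an invented one-row sample (the values are hypothetical, only the column names come from this README), where in practice you would open "FIFA.csv" instead of the in-memory buffer:

```python
import csv
import io

# Hypothetical sample row mirroring the README's column schema;
# replace this buffer with open("FIFA.csv") to read the real file.
SAMPLE = io.StringIO(
    "ID,Lang,Date,Source,len,Orig_Tweet,Tweet,Likes,RTs,Hashtags,"
    "UserMentionNames,UserMentionID,Name,Place,Followers,Friends\n"
    "1,en,2018-07-15,Android,40,Allez les Bleus! #WorldCup,Allez les Bleus,"
    "12,3,WorldCup,,,alice,Paris,150,90\n"
)

reader = csv.DictReader(SAMPLE)   # maps each row to its column names
rows = list(reader)
print(rows[0]["Tweet"], rows[0]["Likes"])
```

Note that `csv.DictReader` returns every field as a string, so numeric columns such as "Likes", "RTs", "Followers" & "Friends" need an explicit `int(...)` conversion before analysis.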
**Acknowledgements:**
-----------------

The following resources helped me with using the Tweepy API:
[http://tweepy.readthedocs.io/en/v3.5.0/auth_tutorial.html][2]
[https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets][3]
[https://www.safaribooksonline.com/library/view/mining-the-social/9781449368180/ch01.html][4]
**Inspiration:**
------------

This project gave me a fascinating look into the conversations & sentiments of people from all over the world who were following this prestigious football tournament, while also giving me the opportunity to explore streaming, natural language processing & visualization techniques in both R & Python.
[1]: https://medium.com/@ritu_rg/nlp-text-visualization-twitter-sentiment-analysis-in-r-5ac22c778448
[2]: http://tweepy.readthedocs.io/en/v3.5.0/auth_tutorial.html
[3]: https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
[4]: https://www.safaribooksonline.com/library/view/mining-the-social/9781449368180/ch01.html