FIFA World Cup 2018 Tweets

Dataset size: 175.13M
Online Communities, Football Classification

    README.md

    **Context:**
    ------------
    The FIFA World Cup (often simply called the World Cup), the most prestigious association football tournament and the most widely viewed and followed sporting event in the world, was frequently one of the top trending topics on Twitter while it was under way.

    This dataset contains a random collection of 530k tweets, starting from the Round of 16 and running until the World Cup Final, which took place on 15 July 2018 and was won by France.
    A preliminary analysis of the data (up to the Round of 16) is available at:
    [https://medium.com/@ritu_rg/nlp-text-visualization-twitter-sentiment-analysis-in-r-5ac22c778448][1]

    **Content:**
    ------------
    **Data Collection:**
    The dataset was created using Tweepy, by streaming tweets from football fans worldwide before, during, and after the matches.
    Tweepy is a Python library for accessing the Twitter API that provides an easy-to-use interface for streaming real-time data from Twitter. More information is available at: http://tweepy.readthedocs.io/en/v3.5.0/

    **Data Pre-processing:**
    The dataset includes English-language tweets containing any reference to FIFA or the World Cup. The collected tweets have been pre-processed to facilitate analysis, while trying to ensure that no information from the original tweets is lost.
    - The original tweet has been stored in the column "Orig_Tweet".
    - As part of pre-processing, using the "BeautifulSoup" & "regex" libraries in Python, the tweets have been cleaned of elements that get in the way of natural language processing, such as website names, hashtags, user mentions, special characters, RTs, tabs, and leading/trailing/multiple spaces, among others.
    - Contractions such as n't, 'll, 're, and 've have been expanded to their full English forms.
    - Duplicate tweets have been removed from the dataset.
    - The original Hashtags & User Mentions extracted during the above steps have also been stored in separate columns.
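The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library (`re` and `html`) rather than the BeautifulSoup-based pipeline the author describes. The function names `clean_tweet` and `deduplicate`, the exact regular expressions, the contraction table, the choice to drop hashtags and mentions entirely, and the choice to deduplicate on the lower-cased cleaned text are all assumptions, not the author's actual code.

```python
import html
import re

# Assumed patterns; the README does not give the exact expressions used.
URL = re.compile(r"https?://\S+|www\.\S+")
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
RETWEET = re.compile(r"^RT\s+", re.IGNORECASE)

# Suffixes named in the README; the expansions themselves are assumptions.
# Irregular forms come first so the generic n't rule does not mangle them.
CONTRACTIONS = [
    (re.compile(r"\bcan't\b", re.IGNORECASE), "cannot"),
    (re.compile(r"\bwon't\b", re.IGNORECASE), "will not"),
    (re.compile(r"n't\b", re.IGNORECASE), " not"),
    (re.compile(r"'ll\b", re.IGNORECASE), " will"),
    (re.compile(r"'re\b", re.IGNORECASE), " are"),
    (re.compile(r"'ve\b", re.IGNORECASE), " have"),
]


def clean_tweet(raw):
    """Return (cleaned_text, hashtags, mentions) for one raw tweet."""
    text = html.unescape(raw)              # decode entities such as &amp;
    hashtags = HASHTAG.findall(text)       # keep these for separate columns
    mentions = MENTION.findall(text)
    text = URL.sub(" ", text)
    text = RETWEET.sub(" ", text)
    text = HASHTAG.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = text.replace("\u2019", "'")     # normalise curly apostrophes
    for pattern, replacement in CONTRACTIONS:
        text = pattern.sub(replacement, text)
    # Simplification: keeps ASCII letters and digits only.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text, hashtags, mentions


def deduplicate(raw_tweets):
    """Clean each tweet and keep the first one seen per cleaned text."""
    seen = set()
    rows = []
    for raw in raw_tweets:
        cleaned, hashtags, mentions = clean_tweet(raw)
        key = cleaned.lower()
        if key in seen:
            continue
        seen.add(key)
        rows.append({
            "Orig_Tweet": raw,
            "Tweet": cleaned,
            "Hashtags": hashtags,
            "UserMentionNames": mentions,
        })
    return rows
```

Extracting hashtags and mentions before stripping them is what lets the original values survive in their own columns while the "Tweet" text stays clean.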

    **Data Storage:**
    The collected tweets have been consolidated into a single dataset & shared as a Comma Separated Values file "FIFA.csv".
    Each tweet is uniquely identifiable by its ID, & characterized by the following attributes, per availability:
    - "Lang" - Language of the tweet
    - "Date" - When it was tweeted
    - "Source" - The device/medium where it was tweeted from
    - "len" - The length of the tweet
    - "Orig_Tweet" - The tweet in its original form
    - "Tweet" - The updated tweet after pre-processing
    - "Likes" - The number of likes received by the tweet (till the time the extraction was done)
    - "RTs" - The number of times the tweet was shared
    - "Hashtags" - The Hashtags found in the original tweet
    - "UserMentionNames" & "UserMentionID" - Extracted from the original tweet

    It also includes the following attributes about the person that the tweet is from:
    - "Name" & "Place" of the user
    - "Followers" - The number of followers that the user account has
    - "Friends" - The number of friends the user account has
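As a rough illustration of working with these columns, the sketch below counts the most frequent hashtags using only the standard library. The helper name `top_hashtags` is hypothetical, and the assumption that multiple hashtags in the "Hashtags" column are comma-separated is not confirmed by the README; adjust `sep` to match the actual file. The in-memory sample rows are made up for demonstration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def top_hashtags(csv_file, n=3, sep=","):
    """Count hashtags case-insensitively across all rows of an open CSV file."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Missing or empty "Hashtags" cells are skipped ("per availability").
        for tag in (row.get("Hashtags") or "").split(sep):
            tag = tag.strip().lower()
            if tag:
                counts[tag] += 1
    return counts.most_common(n)


# Made-up rows using a subset of the column names listed above.
sample = io.StringIO(
    'ID,Lang,Tweet,Likes,RTs,Hashtags\n'
    '1,en,france win the final,10,4,"WorldCup,FRA"\n'
    '2,en,what a match,3,1,worldcup\n'
    '3,en,no tags here,0,0,\n'
)
print(top_hashtags(sample, n=2))
```

For the real file, something like `with open("FIFA.csv", newline="", encoding="utf-8") as f: top_hashtags(f)` should work, assuming the file is UTF-8 encoded.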

    **Acknowledgements:**
    -----------------
    The following resources have helped me through using the Tweepy API:
    [http://tweepy.readthedocs.io/en/v3.5.0/auth_tutorial.html][2]
    [https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets][3]
    [https://www.safaribooksonline.com/library/view/mining-the-social/9781449368180/ch01.html][4]

    **Inspiration:**
    ------------
    This project gave me a fascinating look into the conversations & sentiments of people from all over the world who were following this prestigious football tournament, while also giving me the opportunity to explore some streaming, natural language processing & visualization techniques in both R & Python.

    [1]: https://medium.com/@ritu_rg/nlp-text-visualization-twitter-sentiment-analysis-in-r-5ac22c778448
    [2]: http://tweepy.readthedocs.io/en/v3.5.0/auth_tutorial.html
    [3]: https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
    [4]: https://www.safaribooksonline.com/library/view/mining-the-social/9781449368180/ch01.html