New York Times Comments: comments on articles published in The New York Times, over 2 million comments

1.55G
NLP,Computer Science,Programming,News Classification

    README.md

    The New York Times has a wide audience and plays a prominent role in shaping people's opinions and outlook on current affairs, and in setting the tone of public discourse, especially in the USA. The comment sections of its articles are very active and give a glimpse of readers' views on the matters the articles cover.

    Content

    The data covers comments made on articles published in The New York Times in January-May 2017 and January-April 2018. Each month's data comes as two CSV files: one for the articles that received comments and one for the comments themselves. The comment files contain over 2 million comments in total with 34 features, and the article files contain 16 features for more than 9,000 articles.
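The per-month layout described above can be read with the standard library alone. The sketch below is a minimal illustration, not the dataset author's loader: the file-name pattern `Articles<month>.csv` / `Comments<month>.csv` and the shared `articleID` column are assumptions and should be checked against the actual download.

```python
import csv
from collections import defaultdict
from pathlib import Path


def load_rows(path):
    """Read one CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def comments_by_article(data_dir, month):
    """Pair each article of one month with the comments made on it.

    Assumes files named Articles<month>.csv and Comments<month>.csv and a
    shared articleID column; adjust both to match the actual files.
    """
    data_dir = Path(data_dir)
    articles = load_rows(data_dir / f"Articles{month}.csv")
    comments = load_rows(data_dir / f"Comments{month}.csv")

    # Group comment rows under the ID of the article they belong to.
    grouped = defaultdict(list)
    for comment in comments:
        grouped[comment["articleID"]].append(comment)

    # Map articleID -> (article row, list of its comment rows).
    return {a["articleID"]: (a, grouped.get(a["articleID"], [])) for a in articles}
```

Calling `comments_by_article("data", "Jan2017")` would then return one entry per article for that month. With roughly 2 million comments across all files, loading one month at a time keeps memory use manageable.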

    Inspiration

    The dataset is rich in information: it contains the comments' texts, which are largely very well written, along with contextual information such as the section/topic of the article, and features indicating how well a comment was received by readers, such as editorsSelection and recommendations. The data can therefore be used to understand and analyze the public mood.
    The accompanying exploratory kernel can be used to review the features of the dataset, and the NB-Logistic model kernel for predicting NYT's picks can serve as a starter for building models on a range of ideas, some of which are:

    1. Predicting the number of upvotes a comment will receive, using the feature recommendations as the target variable. With a large enough training set, the model can estimate how a hypothetical comment on a certain topic would be received by the community of NYT readers, which makes it a tool for gauging public opinion. The design of such a model would be very similar to those used to rank reviews by the number of upvotes they are expected to receive.

    2. Predicting whether a comment will be an editor's pick, using the feature editorsSelection as the target variable. This gives a clue as to what NYT considers worth promoting.

    3. Guessing the topic of the article from a comment (using sectionName and/or newDesk as the target variable).

    4. Predicting how likely a comment is to get replies (using the replyCount feature as the target variable).

    5. Predicting how likely an article is to initiate discussion and attract comments and upvotes, along with sentiment analysis of the comments' text.

    6. Making the same predictions as above for topics (indicated by the features sectionName and/or newDesk).

    7. Analyzing the behavior of the top commenters, such as which topics they are most likely to comment on, and the sentiment of their comments.
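As a starting point for the ideas listed above, the target variables can be derived directly from each comment row. This is a hedged sketch rather than the method used in the kernels: it assumes rows are dicts as produced by `csv.DictReader`, that `sectionName` and `newDesk` are present on the comment rows, and that a `userID` column identifies the commenter (a hypothetical column name; use whichever user field the files actually provide).

```python
from collections import Counter


def to_int(value, default=0):
    """Parse a CSV cell as an integer, tolerating blanks and floats like '3.0'."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return default


def to_bool(value):
    """Interpret common CSV encodings of a boolean flag."""
    return str(value).strip().lower() in {"true", "1", "1.0"}


def build_targets(comment):
    """Derive one target per modelling idea from a single comment row."""
    return {
        "upvotes": to_int(comment.get("recommendations")),              # idea 1
        "editors_pick": to_bool(comment.get("editorsSelection")),       # idea 2
        "topic": comment.get("sectionName") or comment.get("newDesk"),  # idea 3
        "has_replies": to_int(comment.get("replyCount")) > 0,           # idea 4
    }


def top_commenters(comments, n=3, user_column="userID"):
    """Count comments per user (idea 7); user_column is an assumed name."""
    counts = Counter(c[user_column] for c in comments if c.get(user_column))
    return counts.most_common(n)
```

The resulting targets can be paired with any text representation of the comment body, such as the bag-of-words features a Naive Bayes or logistic regression model would use.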

    Data collection

    The Python package written to supplement this dataset can be used to retrieve comments from a customized search of NYT articles on a specific topic, for example the Iraq war or ObamaCare, within a given time frame. The accompanying tutorial gives detailed information on using the package, with examples.

