Data Structure: 1.55G
README.md
The New York Times has a wide audience and plays a prominent role in shaping people's opinions and outlook on current affairs, and in setting the tone of public discourse, especially in the USA. The comment section of its articles is very active and gives a glimpse of readers' take on the matters the articles cover.
Content
The data contains information about the comments made on the articles published in the New York Times in Jan-May 2017 and Jan-April 2018. The data for each month is given in two CSV files: one for the articles on which comments were made and one for the comments themselves. The comment CSV files contain over 2 million comments in total with 34 features, and the article CSV files contain 16 features for more than 9,000 articles.
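As a rough illustration of how one month's pair of files fits together, the sketch below reads an articles table and a comments table and groups the comments under their article. It is only a sketch under stated assumptions: the joining column `articleID`, the `commentID` column, and the sample rows are hypothetical, since the description above does not name the key column or the files. The other column names come from the feature names mentioned in this README.

```python
import csv
import io
from collections import defaultdict


def read_rows(handle):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def group_comments_by_article(articles, comments, key="articleID"):
    """Attach each comment row to its article row via a shared key column.

    `key` is an assumption: the README does not name the joining column.
    """
    by_article = defaultdict(list)
    for comment in comments:
        by_article[comment[key]].append(comment)
    return {
        article[key]: (article, by_article.get(article[key], []))
        for article in articles
    }


# Tiny in-memory stand-ins for one month's pair of files (made-up rows).
articles_csv = io.StringIO(
    "articleID,sectionName,newDesk\n"
    "a1,Politics,National\n"
    "a2,Europe,Foreign\n"
)
comments_csv = io.StringIO(
    "commentID,articleID,recommendations,editorsSelection,replyCount\n"
    "c1,a1,12,False,0\n"
    "c2,a1,40,True,3\n"
    "c3,a2,5,False,1\n"
)

joined = group_comments_by_article(read_rows(articles_csv), read_rows(comments_csv))
for article_id, (article, article_comments) in joined.items():
    # Number of comments and total upvotes per article.
    total = sum(int(c["recommendations"]) for c in article_comments)
    print(article_id, article["sectionName"], len(article_comments), total)
```

With the real files, the `io.StringIO` objects would be replaced by `open(path, newline="", encoding="utf-8")` handles for that month's articles and comments files.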
Inspiration
The dataset is rich in information: it contains the comments' texts, which are largely very well written, along with contextual information such as the section/topic of the article, as well as features indicating how well the comment was received by readers, such as editorsSelection and recommendations. This data can serve the purpose of understanding and analyzing the public mood.
The exploratory kernel here can be used for a review of the features of the dataset and the NB-Logistic model kernel for predicting NYT's pick can be used as a starter for building models on a range of ideas, some of which are:
- Predicting the number of upvotes a comment will receive, using the feature recommendations as the target variable. With a large enough training set, we can estimate how a hypothetical comment on a certain topic will be received by the community of NYT readers, and this can be considered a tool to gauge public opinion. The design of this model will be very similar to the ones used to rank reviews by guessing how many upvotes the reviews will receive.
- Predicting whether a comment will be an editor's pick, using the feature editorsSelection as the target variable. It gives a clue to what NYT considers worth promoting.
- Guessing the topic of the article based on a comment (using sectionName and/or newDesk as the target variable).
- Predicting how likely it is for a comment to get replies (using the replyCount feature as the target variable).
- Predicting how likely it is for an article to initiate discussion and get comments and upvotes, as well as sentiment analysis of the comments' text.
- Predicting the same as above for topics (indicated by the features sectionName and/or newDesk).
- Analyzing the behavior of the top commenters, such as which topics they most likely comment on, along with sentiment analysis of their comments.
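To make the editor's-pick idea concrete, here is a minimal sketch of a text classifier that uses editorsSelection as the target. It is plain multinomial naive Bayes with add-one smoothing written from scratch. It is not the NB-Logistic kernel mentioned above, and the four training comments are made up for illustration. The function names (`tokenize`, `train_naive_bayes`, `predict_pick`) are hypothetical helpers, not part of any package.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(texts, labels, alpha=1.0):
    """Fit a two-class multinomial naive Bayes model with add-alpha smoothing.

    Labels are booleans (True = editor's pick); both classes must be present.
    """
    counts = {True: Counter(), False: Counter()}
    docs = Counter(labels)
    for text, label in zip(texts, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    totals = {c: sum(counts[c].values()) + alpha * len(vocab) for c in counts}
    # Per-token log-count ratio: positive values favour the "picked" class.
    ratio = {
        w: math.log((counts[True][w] + alpha) / totals[True])
        - math.log((counts[False][w] + alpha) / totals[False])
        for w in vocab
    }
    prior = math.log(docs[True] / docs[False])
    return ratio, prior


def predict_pick(model, text):
    """Return True when the summed log-odds favour an editor's pick."""
    ratio, prior = model
    # Tokens never seen in training contribute nothing to the score.
    score = prior + sum(ratio.get(w, 0.0) for w in tokenize(text))
    return score > 0


# Made-up comments; real training would use the comment text column
# with editorsSelection as the label.
texts = [
    "a thoughtful and detailed analysis of the policy",
    "detailed historical context and careful analysis",
    "this is terrible",
    "terrible article, total nonsense",
]
labels = [True, True, False, False]

model = train_naive_bayes(texts, labels)
print(predict_pick(model, "a careful and thoughtful analysis"))
print(predict_pick(model, "total nonsense"))
```

The same structure carries over to the other classification ideas in the list, for example by swapping the boolean label for sectionName or newDesk and extending the model to more than two classes.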
Data collection
The Python package here, written to supplement this dataset, can be used to retrieve comments from a customized search of NYT articles concerning a specific topic (for example, the Iraq war or ObamaCare) in a given time frame. The tutorial here gives detailed information about the use of the package with the help of examples.