
News Category Dataset

Size: 80.03M


Tags: News, NLP, Classification, Deep Learning, Linguistics Classification


README.md

# Context

This dataset contains around 200k news headlines from 2012 to 2018, obtained from [HuffPost](https://www.huffingtonpost.com/); the category counts below sum to 200,853. A model trained on this dataset could be used to identify tags for untracked news articles or to identify the type of language used in different news articles.

# Content

Each news headline has a corresponding category. The categories and their article counts are:

* POLITICS: 32739
* WELLNESS: 17827
* ENTERTAINMENT: 16058
* TRAVEL: 9887
* STYLE & BEAUTY: 9649
* PARENTING: 8677
* HEALTHY LIVING: 6694
* QUEER VOICES: 6314
* FOOD & DRINK: 6226
* BUSINESS: 5937
* COMEDY: 5175
* SPORTS: 4884
* BLACK VOICES: 4528
* HOME & LIVING: 4195
* PARENTS: 3955
* THE WORLDPOST: 3664
* WEDDINGS: 3651
* WOMEN: 3490
* IMPACT: 3459
* DIVORCE: 3426
* CRIME: 3405
* MEDIA: 2815
* WEIRD NEWS: 2670
* GREEN: 2622
* WORLDPOST: 2579
* RELIGION: 2556
* STYLE: 2254
* SCIENCE: 2178
* WORLD NEWS: 2177
* TASTE: 2096
* TECH: 2082
* MONEY: 1707
* ARTS: 1509
* FIFTY: 1401
* GOOD NEWS: 1398
* ARTS & CULTURE: 1339
* ENVIRONMENT: 1323
* COLLEGE: 1144
* LATINO VOICES: 1129
* CULTURE & ARTS: 1030
* EDUCATION: 1004

# Acknowledgements

This dataset was collected from [HuffPost](https://www.huffingtonpost.com/).

# Inspiration

* Can you categorize news articles based on their headlines and short descriptions?
* Do news articles from different categories have different writing styles?
* A classifier trained on this dataset could be used on free text to identify the type of language being used.

# Citation

If you're using this dataset for research purposes, please use the following BibTeX entry for citation:

    @dataset{dataset,
      author = {Misra, Rishabh},
      year = {2018},
      month = {06},
      pages = {},
      title = {News Category Dataset},
      doi = {10.13140/RG.2.2.20331.18729}
    }

Please link to [rishabhmisra.github.io/publications](https://rishabhmisra.github.io/publications/) in your report. Thanks!
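The article counts listed under Content already say a lot about how hard the first Inspiration question is, because the 41 categories are far from balanced. The Python sketch below illustrates that. It is not part of the original dataset release, and the helper names `summarize` and `inverse_frequency_weights` are hypothetical. It totals the listed counts, computes the accuracy of always predicting the largest category (a majority-class baseline), and derives inverse-frequency class weights. The weighting formula total / (number of categories × count) is an assumption. It is one common choice, the same "balanced" scheme scikit-learn uses, and the dataset author does not prescribe it.

```python
"""Summary statistics for the category counts in the README's Content section.

Illustrative sketch only; helper names are hypothetical.
"""

# Article counts per category, copied from the Content section.
CATEGORY_COUNTS = {
    "POLITICS": 32739,
    "WELLNESS": 17827,
    "ENTERTAINMENT": 16058,
    "TRAVEL": 9887,
    "STYLE & BEAUTY": 9649,
    "PARENTING": 8677,
    "HEALTHY LIVING": 6694,
    "QUEER VOICES": 6314,
    "FOOD & DRINK": 6226,
    "BUSINESS": 5937,
    "COMEDY": 5175,
    "SPORTS": 4884,
    "BLACK VOICES": 4528,
    "HOME & LIVING": 4195,
    "PARENTS": 3955,
    "THE WORLDPOST": 3664,
    "WEDDINGS": 3651,
    "WOMEN": 3490,
    "IMPACT": 3459,
    "DIVORCE": 3426,
    "CRIME": 3405,
    "MEDIA": 2815,
    "WEIRD NEWS": 2670,
    "GREEN": 2622,
    "WORLDPOST": 2579,
    "RELIGION": 2556,
    "STYLE": 2254,
    "SCIENCE": 2178,
    "WORLD NEWS": 2177,
    "TASTE": 2096,
    "TECH": 2082,
    "MONEY": 1707,
    "ARTS": 1509,
    "FIFTY": 1401,
    "GOOD NEWS": 1398,
    "ARTS & CULTURE": 1339,
    "ENVIRONMENT": 1323,
    "COLLEGE": 1144,
    "LATINO VOICES": 1129,
    "CULTURE & ARTS": 1030,
    "EDUCATION": 1004,
}


def summarize(counts: dict[str, int]) -> dict:
    """Return dataset size, majority-class baseline and imbalance ratio."""
    total = sum(counts.values())
    top_label, top_count = max(counts.items(), key=lambda kv: kv[1])
    low_label, low_count = min(counts.items(), key=lambda kv: kv[1])
    return {
        "num_categories": len(counts),
        "total": total,
        "majority_label": top_label,
        # Accuracy of a classifier that always predicts the largest category.
        "majority_baseline": top_count / total,
        "minority_label": low_label,
        # Size of the largest category divided by size of the smallest.
        "imbalance_ratio": top_count / low_count,
    }


def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each category by total / (num_categories * count).

    Rare categories get weights above 1 and common ones below 1; the
    weighted counts sum back to the total number of articles.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


if __name__ == "__main__":
    stats = summarize(CATEGORY_COUNTS)
    print(stats["num_categories"], stats["total"])  # 41 200853
    print(stats["majority_label"], f"{stats['majority_baseline']:.3f}")  # POLITICS 0.163
    print(stats["minority_label"], f"{stats['imbalance_ratio']:.1f}")  # EDUCATION 32.6
    weights = inverse_frequency_weights(CATEGORY_COUNTS)
    print(f"{weights['POLITICS']:.3f} {weights['EDUCATION']:.3f}")
```

Read this way, a headline classifier has to clearly beat roughly 16% accuracy before it is learning anything beyond "guess POLITICS". Passing weights like these to a weighted loss is one possible way to keep small categories such as EDUCATION from being drowned out. It is a design choice to evaluate, not a requirement of the dataset.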
# Other datasets

Please also check out the following datasets collected by me:

* [News Headlines Dataset For Sarcasm Detection](https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection)
* [Clothing Fit Dataset for Size Recommendation](https://www.kaggle.com/rmisra/clothing-fit-dataset-for-size-recommendation)
* [IMDB Spoiler Dataset](https://www.kaggle.com/rmisra/imdb-spoiler-dataset)


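Data usage statement:

I. Data source and display:

1. This data was collected from the internet or supplied by a service provider; this platform only lets users view and browse the dataset.
2. This platform only displays basic information about datasets, including but not limited to file types such as images, text, video and audio.
3. Basic information about the dataset comes from the original data source or the data provider. If the dataset description differs, the original data source or the provider's original address prevails.

II. Ownership:

1. The copyright of every dataset on this site belongs to the original data publisher or data provider.

III. Redistribution:

1. If you redistribute data from this site, please keep the original data address and the relevant copyright notice.

IV. Infringement:

1. If any data on this site is displayed in a way that infringes rights, please contact the site promptly and we will arrange for the data to be taken down.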
