# Context
This dataset contains around 200k news headlines from 2012 to 2018, obtained from [HuffPost](https://www.huffingtonpost.com/). A model trained on this dataset could be used to identify tags for untracked news articles or to identify the type of language used in different news articles.
# Content
Each news headline has a corresponding category. Categories and corresponding article counts are as follows:
* POLITICS: 32739
* WELLNESS: 17827
* ENTERTAINMENT: 16058
* TRAVEL: 9887
* STYLE & BEAUTY: 9649
* PARENTING: 8677
* HEALTHY LIVING: 6694
* QUEER VOICES: 6314
* FOOD & DRINK: 6226
* BUSINESS: 5937
* COMEDY: 5175
* SPORTS: 4884
* BLACK VOICES: 4528
* HOME & LIVING: 4195
* PARENTS: 3955
* THE WORLDPOST: 3664
* WEDDINGS: 3651
* WOMEN: 3490
* IMPACT: 3459
* DIVORCE: 3426
* CRIME: 3405
* MEDIA: 2815
* WEIRD NEWS: 2670
* GREEN: 2622
* WORLDPOST: 2579
* RELIGION: 2556
* STYLE: 2254
* SCIENCE: 2178
* WORLD NEWS: 2177
* TASTE: 2096
* TECH: 2082
* MONEY: 1707
* ARTS: 1509
* FIFTY: 1401
* GOOD NEWS: 1398
* ARTS & CULTURE: 1339
* ENVIRONMENT: 1323
* COLLEGE: 1144
* LATINO VOICES: 1129
* CULTURE & ARTS: 1030
* EDUCATION: 1004
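The counts above are heavily skewed toward a few categories, which matters when choosing evaluation metrics or sampling strategies. The following is a minimal sketch that derives the total article count, the per-category share, and the largest-to-smallest class ratio from the listed numbers. The names `CATEGORY_COUNTS` and `class_shares` are illustrative and not part of the dataset.

```python
# Category counts copied from the list above.
CATEGORY_COUNTS = {
    "POLITICS": 32739, "WELLNESS": 17827, "ENTERTAINMENT": 16058,
    "TRAVEL": 9887, "STYLE & BEAUTY": 9649, "PARENTING": 8677,
    "HEALTHY LIVING": 6694, "QUEER VOICES": 6314, "FOOD & DRINK": 6226,
    "BUSINESS": 5937, "COMEDY": 5175, "SPORTS": 4884,
    "BLACK VOICES": 4528, "HOME & LIVING": 4195, "PARENTS": 3955,
    "THE WORLDPOST": 3664, "WEDDINGS": 3651, "WOMEN": 3490,
    "IMPACT": 3459, "DIVORCE": 3426, "CRIME": 3405,
    "MEDIA": 2815, "WEIRD NEWS": 2670, "GREEN": 2622,
    "WORLDPOST": 2579, "RELIGION": 2556, "STYLE": 2254,
    "SCIENCE": 2178, "WORLD NEWS": 2177, "TASTE": 2096,
    "TECH": 2082, "MONEY": 1707, "ARTS": 1509,
    "FIFTY": 1401, "GOOD NEWS": 1398, "ARTS & CULTURE": 1339,
    "ENVIRONMENT": 1323, "COLLEGE": 1144, "LATINO VOICES": 1129,
    "CULTURE & ARTS": 1030, "EDUCATION": 1004,
}


def class_shares(counts):
    """Return (category, fraction of all articles) pairs, largest first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, n / total) for name, n in ranked]


total = sum(CATEGORY_COUNTS.values())
shares = class_shares(CATEGORY_COUNTS)
# Ratio between the most and least frequent category.
imbalance = max(CATEGORY_COUNTS.values()) / min(CATEGORY_COUNTS.values())

print(f"{len(CATEGORY_COUNTS)} categories, {total} articles")
print(f"largest/smallest class ratio: {imbalance:.1f}")
for name, share in shares[:5]:
    print(f"{name:<15} {share:.1%}")
```

Several labels look closely related (for example `WORLDPOST` and `THE WORLDPOST`, or `ARTS & CULTURE` and `CULTURE & ARTS`). Whether to merge them before training is a modelling decision the README does not address.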
# Acknowledgements
This dataset was collected from [HuffPost](https://www.huffingtonpost.com/).
# Inspiration
* Can you categorize news articles based on their headlines and short descriptions?
* Do news articles from different categories have different writing styles?
* A classifier trained on this dataset could be used on free text to identify the type of language being used.
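As a starting point for the first question, each record can be turned into a (text, label) pair by joining the headline and the short description. The sketch below assumes the data is stored as one JSON object per line with the keys `headline`, `short_description`, and `category`. The README does not state the file layout or key names, so treat them as assumptions and adjust to the actual file. The helper name `load_examples` and the sample records are hypothetical.

```python
import json


def load_examples(lines,
                  text_fields=("headline", "short_description"),
                  label_field="category"):
    """Turn JSON-lines records into (text, label) pairs.

    The field names are assumptions; the README does not list them.
    """
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Join the available text fields, tolerating missing or null values.
        parts = [(record.get(field) or "").strip() for field in text_fields]
        text = " ".join(part for part in parts if part)
        label = record.get(label_field)
        if text and label:
            examples.append((text, label))
    return examples


# Made-up records, used only to illustrate the assumed layout.
sample = [
    '{"category": "TECH", "headline": "Hypothetical headline about phones", '
    '"short_description": "Made-up description."}',
    '{"category": "TRAVEL", "headline": "Hypothetical headline about beaches", '
    '"short_description": ""}',
]

examples = load_examples(sample)
for text, label in examples:
    print(label, "|", text)
```

With a real file, the same function can be called on an open file handle, for example `load_examples(open(path, encoding="utf-8"))`, where `path` points to wherever the dataset was downloaded.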
# Citation
If you're using this dataset for research purposes, please use the following BibTeX entry for citation:

    @dataset{dataset,
      author = {Misra, Rishabh},
      year = {2018},
      month = {06},
      pages = {},
      title = {News Category Dataset},
      doi = {10.13140/RG.2.2.20331.18729}
    }

Please link to [rishabhmisra.github.io/publications](https://rishabhmisra.github.io/publications/) in your report.
Thanks!
# Other datasets
Please also check out the following datasets collected by me:
* [News Headlines Dataset For Sarcasm Detection](https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection)
* [Clothing Fit Dataset for Size Recommendation](https://www.kaggle.com/rmisra/clothing-fit-dataset-for-size-recommendation)
* [IMDB Spoiler Dataset](https://www.kaggle.com/rmisra/imdb-spoiler-dataset)