News Category Dataset

80.03M
News, NLP, Classification, Deep Learning, Linguistics Classification


    README.md

# Context

This dataset contains around 200k news headlines from 2012 to 2018 obtained from [HuffPost](https://www.huffingtonpost.com/). A model trained on this dataset could be used to identify tags for untracked news articles or to identify the type of language used in different news articles.

# Content

Each news headline has a corresponding category. Categories and corresponding article counts are as follows:

* POLITICS: 32739
* WELLNESS: 17827
* ENTERTAINMENT: 16058
* TRAVEL: 9887
* STYLE & BEAUTY: 9649
* PARENTING: 8677
* HEALTHY LIVING: 6694
* QUEER VOICES: 6314
* FOOD & DRINK: 6226
* BUSINESS: 5937
* COMEDY: 5175
* SPORTS: 4884
* BLACK VOICES: 4528
* HOME & LIVING: 4195
* PARENTS: 3955
* THE WORLDPOST: 3664
* WEDDINGS: 3651
* WOMEN: 3490
* IMPACT: 3459
* DIVORCE: 3426
* CRIME: 3405
* MEDIA: 2815
* WEIRD NEWS: 2670
* GREEN: 2622
* WORLDPOST: 2579
* RELIGION: 2556
* STYLE: 2254
* SCIENCE: 2178
* WORLD NEWS: 2177
* TASTE: 2096
* TECH: 2082
* MONEY: 1707
* ARTS: 1509
* FIFTY: 1401
* GOOD NEWS: 1398
* ARTS & CULTURE: 1339
* ENVIRONMENT: 1323
* COLLEGE: 1144
* LATINO VOICES: 1129
* CULTURE & ARTS: 1030
* EDUCATION: 1004

# Acknowledgements

This dataset was collected from [HuffPost](https://www.huffingtonpost.com/).

# Inspiration

* Can you categorize news articles based on their headlines and short descriptions?
* Do news articles from different categories have different writing styles?
* A classifier trained on this dataset could be used on free text to identify the type of language being used.

# Citation

If you're using this dataset for research purposes, please use the following BibTeX for citation:

    @dataset{dataset,
      author = {Misra, Rishabh},
      year = {2018},
      month = {06},
      pages = {},
      title = {News Category Dataset},
      doi = {10.13140/RG.2.2.20331.18729}
    }

Please link to [rishabhmisra.github.io/publications](https://rishabhmisra.github.io/publications/) in your report. Thanks!

# Other datasets

Please also check out the following datasets collected by me:

* [News Headlines Dataset For Sarcasm Detection](https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection)
* [Clothing Fit Dataset for Size Recommendation](https://www.kaggle.com/rmisra/clothing-fit-dataset-for-size-recommendation)
* [IMDB Spoiler Dataset](https://www.kaggle.com/rmisra/imdb-spoiler-dataset)
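
The per-category counts listed under Content can be recomputed with a short script. The following is a minimal sketch, assuming the data is distributed as a JSON Lines file (one JSON object per line) and that each record stores its label under a `category` key. Neither the file layout nor the field names are stated in this README, so treat them as assumptions. The sample records are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def count_categories(lines: Iterable[str]) -> Counter:
    """Count records per category in JSON Lines input.

    Assumes every non-empty line is a JSON object with a "category" key.
    That field name is an assumption, not something the README confirms.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record["category"]] += 1
    return counts


# Hypothetical sample records, invented for this example only.
sample = [
    '{"category": "POLITICS", "headline": "Example headline one"}',
    '{"category": "TRAVEL", "headline": "Example headline two"}',
    '{"category": "POLITICS", "headline": "Example headline three"}',
]

counts = count_categories(sample)
total = sum(counts.values())

# Print categories from most to least frequent, with their share of the total.
for category, n in counts.most_common():
    print(f"{category}: {n} ({n / total:.1%})")
```

Because `count_categories` accepts any iterable of strings, an open file handle for the downloaded data can be passed in directly instead of the sample list. Printing each category's share makes the class imbalance visible, which matters when training a classifier on the headlines.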