Plain-text Wikipedia: each file contains a collection of Wikipedia articles. Wikipedia dumps contain a tremendous amount of markup. WikiMedia Text is a hybrid of markdown and HTML, making it very d... NLP, Computer Science, Text Data, Text Mining Classification
23.71G 269
Wine review data from wine tasters; use text classification to identify the reviewer of each review. Thinking of Natural Language Processing as a beginner!! The dataset is about wine comments or reviews that has... NLP, Business, News, Text Data, Multiclass Classification, Alcohol Classification
50.35M 593
Arabic news articles from Aljazeera.net. Business, Education, News, NLP, Text Data, Psychology, Text Mining Classification
111.89M 539
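Multimodal hate speech: 150,000 tweets with text and images, for hate speech detection. Existing hate speech datasets contain only text data. We created a new manually annotated multimodal hate speech dataset made up of 150,000 tweets, each tweet... NLP, Online Communities, Image Data, Multiclass Classification, Social Networks Classification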
6.55G 630
Poetry dataset (NLP). NLP, Text Data, LSTM, RNN, Transformers Classification
20.87M 595
Emotions in text: text data of sentences that each express a primary emotion. I was looking for a well-labeled dataset to perform a multiclass classification. I wanted to do something more than just... NLP, Earth and Nature, Text Data, Multiclass Classification Classification
2.15M 287
Language Detection Dataset. Text data with language labels. It can be used for language detection.... NLP, Classification, Computer Science, Multiclass Classification, Languages Classification
31.7M 584
Topic modeling for research articles. Business, Earth and Nature, Education, NLP, Psychology Classification
21.96M 251
Tanglish sentiment analysis tweets; four labels describe the sentiment of each tweet. So it all started when I was looking for abusive Tamil tweets in the Roman script to use for a project and instead of fi... NLP, Deep Learning, Online Communities, People Classification
0.85M 272
Goodreads book dataset with 10M user ratings. Arts and Entertainment, Social Science, NLP, Literature, Recommender Systems Classification
1128.5M 547
All English stop words (700+). Computer Science, Education, NLP, Feature Engineering, Python Classification
0.01M 221
All NeurIPS (NIPS). Computer Science, Sports, NLP, Deep Learning, Artificial Intelligence, Neural Networks Classification
310.53M 207
Dutch news. Internet, News, NLP, Text Data, Exploratory Data Analysis, Text Mining Classification
351.62M 227
Virgool dataset: a collection of Persian articles gathered from virgool.io. This could be a nice tool for Persian writers or bloggers to automatically pick the suggested hashtag or even subject fo... NLP, Education, Software, Literature Classification
58.89M 307
Arabic Hadith, nine books. NLP, Multiclass Classification, Clustering Classification
94.48M 269
COVID-19 Indonesian tweets: Indonesian tweets related to "COVID-19" and "government". Content: This dataset contains Indonesian tweets of users who have applied the following keywords: Corona and Pemerintah o... NLP, Deep Learning, Coronavirus, Social Networks, Email and Messaging, Government Classification
31.14M 250
Email text classification. If you are working, then you are bound to face the problem of reading all the emails that are cluttered in your inbox. S... NLP, Business, Classification, Arts and Entertainment, News, Text Data Classification
18.22M 281
Stop words in 28 languages, for text preprocessing in natural language processing. Stopwords are the words in any language which do not add much meaning to a sentence. They can safely be ignored withou... NLP, Computer Science, Text Data, Languages Classification
0.09M 572
Jigsaw competition dataset, containing ... translated into English. These datasets refer to [jigsaw competition]( Classification
664.76M 240
Consumer complaints, financial products: this dataset includes consumer complaints about financial products and... This data is a collection of complaints about consumer financial products and services that we sent to companies for res... NLP, Beginner, Text Data, Banking, Text Mining, Lending Classification
243.79M 302