AI Community

Public Datasets

Interviews NLP,Exploratory Data Analysis,Data Cleaning,Feature Engineering,Employment Classification
4.37M 275
Jigsaw competition dataset, with text translated into English: These datasets refer to the [jigsaw competition](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification) T... NLP Classification
664.76M 233
Stopwords in 28 languages, for text preprocessing in natural language processing: Stopwords are the words in any language which do not add much meaning to a sentence. They can safely be ignored withou... NLP,Computer Science,Text Data,Languages Classification
0.09M 553
Septuagint Earth and Nature,Religion and Belief Systems,NLP,Text Data,Languages Classification
7.39M 222
Russian idioms Education,NLP,Russia Classification
0.06M 213
Presidential debate video comments Politics,NLP,Exploratory Data Analysis Classification
6.52M 419
covid19 Spanish es py tweets, early to end of April 2020 Earth and Nature,Health,Social Networks,Coronavirus,NLP,Text Data Classification
805.29M 531
Email text classification: If you are working, then you are bound to face the problem of reading all the emails that are cluttered in your inbox. S... NLP,Business,Classification,Arts and Entertainment,News,Text Data Classification
18.22M 279
Persian NLP,Text Data,Text Mining Classification
0M 228
Seven names Religion and Belief Systems,NLP Classification
0.15M 221
Vietnamese health news Health,News,NLP Classification
16.89M 230
COVID-19 Indonesian tweets, Indonesian tweets related to "COVID-19" and "government": Content: This dataset contains Indonesian Tweets of users who have applied the following keywords: Corona and Pemerintah o... NLP,Deep Learning,Coronavirus,Social Networks,Email and Messaging,Government Classification
31.14M 245
YouTube dataset containing 43,471 channels, 325,292 videos, and 1,264,035 comments: Context: A portion of data grabbed from YouTube. Content: Dataset contains YouTube channels, videos, and comments. Acknowledgements: D... NLP,Online Communities,Social Networks Classification
629.07M 481
SpongeBob SquarePants transcripts Arts and Entertainment,NLP Classification
4.85M 233
Named entity recognition dataset: Label annotation mistakes by human annotators bring up two challenges for NER: mistakes in the test set can interfere... NLP Classification
5.64M 248
Robert Frost collection Arts and Entertainment,Education,NLP,Literature,Text Data,Transformers Classification
0.22M 538
BERT English uncased bigrams, bigram frequencies of the BERT English uncased training data: Is BERT the right model to fine-tune on your data? Or do you need to pretrain from scratch? Know your model's trainin... NLP,Music Classification
1.99G 244
Nine books of Arabic Hadith NLP,Multiclass Classification,Clustering Classification
94.48M 264
Relational strategies in customer service, a travel-related customer service dataset from four sources: Relational Strategies in Customer Service (RSiCS) Dataset. Human-computer data from three live customer service Intelligen... NLP,Business,Text Data Classification
57.78M 303
Virgool dataset, a set of Persian articles collected from virgool.io: This could be a nice tool for Persian writers or bloggers to automatically pick the suggested hashtag or even subject fo... NLP,Education,Software,Literature Classification
58.89M 305