AI Community

Public Datasets

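SMILES OCR dataset: more than 900,000 single-product reactions in SMILES format. SMILES (Simplified Molecular Input Line Entry System) is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions. The dataset contains more than... NLP, Chemistry Classification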
175M 866
Reddit comment score prediction: predicting comment scores with NLP. The idea behind this dataset is to try to predict whether a particular comment would be highly up-voted or down-voted gi... NLP, Computer Science, Social Science Classification
1.8G 319
Wikipedia sentences: 7.8 million sentences collected from an English Wikipedia dump. The Wikipedia dump is a giant XML file and contains loads of not-so-useful content. I needed some English text for some... NLP, Text Mining Classification
891.28M 373
Dialogue lines from The Simpsons. Arts and Entertainment, NLP, Text Data, Text Mining, Comics and Animation Classification
8.94M 574
Annotated GMB corpus: an annotated subset of the GMB text corpus. Named Entity Recognition for annotated corpus using GMB (Groningen Meaning Bank) corpus for entity classification with en... NLP, Exploratory Data Analysis, Classification, Random Forest Classification
1.52M 354
ConceptNet Numberbatch vectors: word vectors from ConceptNet. These are the word vectors released by the ConceptNet project. At its core, ConceptNet is a set of triples:... NLP Classification
899.91M 351
Fake news classification. News, NLP Classification
142.92M 638
AllenNLP package. Computer Science, NLP Classification
715.44M 363
Hate speech data from Womad, a South Korean extremist website. NLP, Classification Classification
0.16M 332
Kaggle jobs. Computer Science, Education, NLP, Recommender Systems, Search Engines Classification
0.27M 313
Arabic ULMFiT model: an Arabic language model based on the Ar Wikipedia corpus. Arabic is a major world language, yet it is underrepresented on the Internet and there is a lack of resources for Arabic... NLP, Transfer Learning, Languages Classification
160.13M 677
Indian political news 2018. Politics, NLP, Text Data, Linguistics, India Classification
57.35M 285
SComedy. Earth and Nature, NLP, Text Data, Text Mining Classification
2.99M 447
NLTK Reuters news files: all files in the NLTK Reuters corpus. This dataset contains the ID, categories, and raw text from each file in NLTK's Reuters corpus. Content: Each file (row... NLP, Computer Science, News, Text Data, Text Mining Classification
3.3M 721
Reddit self-post classification task: more than 1,000 hand-picked categories. Welcome to the Reddit Self-Post Classification Task (RSPCT)! The aim of this dataset was to create an interesting, large... NLP, Classification, Computer Science, Multiclass Classification Classification
839.37M 340
Medical transcriptions: medical transcription data obtained from mtsamples. Medical data is extremely hard to find due to HIPAA privacy regulations. This dataset offers a solution by providing med... NLP, Health, Medicine Classification
16.22M 300
FakeNewsNet: data collection for fake news research (fake news, disinformation, data mining). This is a repository for an ongoing data collection project for fake news research at ASU. We describe and compare FakeN... NLP, News, Social Science, Social Networks Classification
72.61M 1098
Reddit vectors dataset for training sense2vec models. The sense2vec word embeddings model works better than word2vec, since it utilises contextual information from words. This re... NLP, Computer Science, Text Data, spaCy Classification
635.76M 453
Strongbad emails. Business, NLP, Text Data Classification
0.11M 306
Science popular comment removal. Business, NLP, Text Data, Binary Classification, Bigquery Classification
74.17M 280