Public Datasets

Text datasets for NLP. This is a bundle of three text data sets to be used for NLP research. Dialog system technology challenge 7 (DSTC7) UbuntuA... NLP, Earth and Nature, Education Classification
6.49G 536
,法 NLP,Text Data,Languages Classification
18.33M 268
SMS Spam Ham Prediction. Business, Earth and Nature, Internet, Economics, NLP Classification
0.48M 213
Warframe Steam user review data. The data is crawled from STEAM, up until April 22nd, 2019... NLP, Video Games Classification
20.22M 249
Stanford GloVe 200d dataset, converted to word2vec format. Is the Stanford GloVe 200d dataset converted to word2vec format... NLP, Computer Science Classification
661.31M 510
Definite Pronoun Resolution Dataset. Context: Here's the csv dataset for Definite Pronoun Resolution Dataset contributed by Rahman and Ng. (2012) http://ww... NLP Classification
143K 256
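SMILES OCR dataset, containing over 900,000 single-product reactions in SMILES format. SMILES (Simplified Molecular Input Line Entry System) is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions. This dataset contains over... NLP, Chemistry Classification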
175M 560
Reddit comment score prediction, using NLP to predict comment scores. The idea behind this dataset is to try to predict whether a particular comment would be highly up-voted or down-voted gi... NLP, Computer Science, Social Science Classification
1.8G 232
Wikipedia Word2Vec, an Apache Spark word2vec model trained on 200K Wikipedia pages. I used Apache Spark to extract more than 6 million phrases from 200,000 English Wikipedia pages. Here is the process of... NLP, Business, Earth and Nature, Text Mining Classification
132.74M 261
Wikipedia sentences, 7.8 million sentences collected from an English Wikipedia dump. The Wikipedia dump is a giant XML file and contains loads of not-so-useful content. I needed some English text for some... NLP, Text Mining Classification
891.28M 294
Dialogue lines from The Simpsons. Arts and Entertainment, NLP, Text Data, Text Mining, Comics and Animation Classification
8.94M 365
Annotated GMB corpus, an annotated subset of the GMB text corpus. Named Entity Recognition for annotated corpus using GMB (Groningen Meaning Bank) corpus for entity classification with en... NLP, Exploratory Data Analysis, Classification, Random Forest Classification
1.52M 258
ConceptNet Numberbatch vectors, word vectors from ConceptNet. These are the word vectors released by the ConceptNet project. ConceptNet is essentially a triple:... NLP Classification
899.91M 260
Fake news classification. News, NLP Classification
142.92M 438
AllenNLP package. Computer Science, NLP Classification
715.44M 278
Hate speech data from Womad, a Korean extremist website. NLP, Classification Classification
0.16M 235
Kaggle jobs. Computer Science, Education, NLP, Recommender Systems, Search Engines Classification
0.27M 233
Vegetables (Google Word2Vec News)... NLP, News Classification
3.73M 455
Arabic ULMFiT model, an Arabic model based on the Ar Wikipedia corpus. Arabic is a major world language yet it is underrepresented on the Internet and there is a lack of resources for Arabic... NLP, Transfer Learning, Languages Classification
160.13M 472
Indian political news 2018. Politics, NLP, Text Data, Linguistics, India Classification
57.35M 207