Global Vectors for Word Representation (GloVe) dataset based on Reddit comments
GloVe Reddit Comments: Global Vectors for Word Representation based on Reddit comments... NLP Classification
19.1G
403
Leigh
300-dimensional pretrained word vectors released by Facebook: 2 million word vectors trained on Common Crawl
300-dimensional pretrained FastText English word vectors released by Facebook. The first line of the file contains the nu... NLP, Arts and Entertainment Classification
650M
422
Manish Maharjan
Text mining and analysis on Cancer UK: natural language processing on Cancer UK
Text mining and analysis on Cancer UK. Natural language processing on Cancer UK... NLP, Biology, Text Data, Health Conditions Classification
4.33M
307
Moamen Ibrahim
fastText: a library for learning word embeddings and text classification
fastText is a library for learning word embeddings and text classification, created by Facebook's AI Research (FAI... NLP, Computer Science Classification
6.6G
737
Jia Yang
Text datasets for NLP
This is a bundle of three text data sets to be used for NLP research. Dialog System Technology Challenge 7 (DSTC7) UbuntuA... NLP, Earth and Nature, Education Classification
6.49G
736
Florian Peters
SMS Spam Ham Prediction
Business, Earth and Nature, Internet, Economics, NLP Classification
0.48M
281
Lampu
Warframe Steam user review data
The data is crawled from Steam, up until April 22nd, 2019... NLP, Video Games Classification
20.22M
326
Jiaxu Zhang
Stanford GloVe 200d dataset, converted to word2vec format
This is the Stanford GloVe 200d dataset converted to word2vec format... NLP, Computer Science Classification
661.31M
688
the kwisatz haderach
Definite Pronoun Resolution Dataset
Context: Here's the CSV dataset for the Definite Pronoun Resolution Dataset contributed by Rahman and Ng (2012) http://ww... NLP Classification
143K
345
Ariba Siddiqui
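SMILES OCR dataset, containing over 900,000 single-product reactions in SMILES format
SMILES (Simplified Molecular Input Line Entry System) is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions. This dataset contains over... NLP, Chemistry Classification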
175M
830
Elahi
Reddit comment score prediction: predicting comment scores with NLP
The idea behind this dataset is to try to predict whether a particular comment would be highly up-voted or down-voted gi... NLP, Computer Science, Social Science Classification
1.8G
312
Evan Hallmark
Wikipedia sentences: 7.8 million sentences collected from an English Wikipedia dump
The Wikipedia dump is a giant XML file and contains loads of not-so-useful content. I needed some English text for some... NLP, Text Mining Classification
891.28M
369
Mike Ortman
Dialogue lines from The Simpsons
Arts and Entertainment, NLP, Text Data, Text Mining, Comics and Animation Classification
8.94M
557
Pierre Megret
Annotated GMB corpus: an annotated subset of the GMB text corpus
Named Entity Recognition for annotated corpus using GMB (Groningen Meaning Bank) corpus for entity classification with en... NLP, Exploratory Data Analysis, Classification, Random Forest Classification
1.52M
347
Shoumik
ConceptNet Numberbatch vectors: word vectors from ConceptNet
These are the word vectors released by the ConceptNet project. ConceptNet is essentially a triple:... NLP Classification
899.91M
345
Nohman