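Predicting Reddit Community Engagement dataset, with GDELT post classification and Sirocco text analysis (opinion and entity extraction)
This dataset contains three months (June to August 2017) of Reddit news posts, along with the results of GDELT post classification and Sirocco text analysis (opinion and entity extraction). It is used... NLP, Computer Science, Online Communities Classification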
174.09M
348
Sergei Sokolenko
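Word2vec trained on Wikipedia data to capture unigrams and bigrams
This is a word embedding model built from Wikipedia plus comments from various sources. Unlike creating bigrams with a phrase-based approach (which does not consider the context of neighboring phrases/bigrams), this... NLP, Computer Science, Software, Programming, Neural Networks Classification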
8.62G
299
aintnosunshine
300-dimensional pretrained FastText vectors released by Facebook: 2 million word vectors trained on Common Crawl
300-dimensional pretrained FastText English word vectors released by Facebook. The first line of the file contains the nu... NLP, Arts and Entertainment Classification
650M
360
Manish Maharjan
Wikipedia Word2Vec: Apache Spark word2vec trained on 200K Wikipedia pages
I used Apache Spark to extract more than 6 million phrases from 200,000 English Wikipedia pages. Here is the process of... NLP, Business, Earth and Nature, Text Mining Classification
132.74M
321
Maziyar
Reddit vectors dataset for training the sense2vec model
The sense2vec word embedding model works better than word2vec, since it utilises contextual information from words. This re... NLP, Computer Science, Text Data, spaCy Classification
635.76M
352
Poonam Ligade