Static copy of a recommendation engine notebook, a dataset for code meta-analysis
This is a static copy of [FabienDaniel's](https://www.kaggle.com/fabiendaniel) popular recommendation [Film Recommen...Software Classification
789K
322
Sohier Dane
3.58G
1119
Sriram Reddy Kalluri
Describable Textures Dataset (DTD), available to the computer vision community for research purposes
Our ability to vividly describe the content of images is a clear demonstration of the power of the human visual system. Not...Others Classification
1.17G
265
JMExpert
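Multimodal hate speech: 150,000 tweets with text and images, for hate speech detection
Existing hate speech datasets contain only textual data. We created a new manually annotated multimodal hate speech dataset formed by 150,000 tweets, each tweet...NLP,Online Communities,Image Data,Multiclass Classification,Social Networks Classification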
6.55G
534
Victor Callejas Fuentes
Text data with language labels. It can be used for language detection.
Language Detection Dataset Text data with language labels. It can be used for language detection....NLP,Classification,Computer Science,Multiclass Classification,Languages Classification
31.7M
507
Ishant
Subreddit data from wallstreetbets and others, for sentiment analysis in backtesting quantitative trading algorithms
All of the submissions to each of the r/wallstreetbets, r/investing, r/options, and r/SecurityAnalysis subreddits since...NLP,Online Communities,Investing Classification
1.49G
243
Sheridan Green
ELI5 scorer training data prototype, 816,000 examples, for building a scoring model
ELI5 means "Explain like I am 5". It's originally a long and free form Question-Answering scraping from reddit eli5 s...NLP,Earth and Nature,Arts and Entertainment,Education,Social Science,Sports,Regression,Transformers Classification
672.61M
250
Neuron Engineer
News headlines dataset for sarcasm detection: a high-quality dataset for sarcasm and fake news detection tasks
Past studies in Sarcasm Detection mostly make use of Twitter datasets collected using hashtag based supervision but such...NLP,Deep Learning,Classification,Earth and Nature,Computer Science,Programming Classification
11.13M
256
Rishabh Misra
OSCAR Nepali corpus: a Nepali text corpus for training unsupervised language models for NLP
The files are from [OSCAR Corpus](https://oscar-corpus.com/). Please visit their site for more information. The dataset i...NLP,Computer Science,Movies and TV Shows,Text Data,Languages Classification
3.1G
302
Prabesh Dhakal
English multi-speaker corpus for voice cloning: the CSTR VCTK Corpus
This CSTR VCTK Corpus includes speech data uttered by 109 native speakers of English with various accents. Each speaker...NLP,Audio Data Classification
15.22G
381
Michael Fekadu
Language generation dataset: 200 million samples, a processed Amazon Review dataset for language generation
Amazon Customer Reviews Dataset is a dataset of user-generated product reviews on the shopping website Amazon. It contai...NLP,Business,Deep Learning,Classification,Artificial Intelligence Classification
20.51G
277
Abhishek Chatterjee