8.49G
487
kambarakun
Annotated Corpus for Named Entity Recognition: a corpus annotated with BIO and POS tags
Annotated (BIO) Corpus for Named Entity Recognition. This corpus is made up of texts from news sites and built specifically... Business, Arts and Entertainment, Literature, Languages Classification
2.21M
519
Alexander Kovalev
44.46M
465
Sameer Dev
Best Books of 2018 data from Goodreads
Best Books of 2018 data from Goodreads... NLP, Image Data, Tabular Data, Literature Classification
81.57M
730
Naren
268.7M
558
Yakin
ATIS dataset (cleaned): a cleaned and balanced split of the ATIS dataset
ATIS Dataset. The ATIS dataset is a standard benchmark dataset widely used as an intent classification and slot filling ta... NLP, Classification, Earth and Nature, Computer Science, Health Classification
1.02M
442
kpe
Dmoztools classification data, covering arts, business, computers, games, health, science, shopping, society, and more
Dataset: This dataset was created by Patanjali Chintalapati. Released under Other (specified in description). Contents: It c... NLP, Text Mining, Websites Classification
279.6M
422
Patanjali Chintalapati
116 works of fiction and other texts by Machado de Assis
This repository contains 116 works of fiction and other texts by Machado de Assis in pdf and txt formats in the c... NLP, Business, Literature, Art, Brazil Classification
40.38M
668
Luiz Amaral
Named entity recognition (NER): extracting entities of interest from clinical narratives (e.g., disease names, drug names)
Problem Statement: Clinical studies often require detailed patients' information documented in clinical narratives. Named... NLP, Health, Health Conditions, Model Comparison, Statistical Analysis, Artificial Intelligence Classification
249.01M
364
Ramashankar Nayak
Image dataset of letters written in sans-serif Google fonts
Dataset: This dataset was created by Jihye Yeom. Released under Other (specified in description). Contents: It contains the... NLP, Image Data Classification
768M
467
CheaperThanTires
CoNLL003: annotated dataset for the named entity recognition (NER) problem
This is an annotated dataset for the Named Entity Recognition (NER) problem. Content: This dataset is divided into train.txt, te... NLP, Arts and Entertainment, Computer Science, Text Data, Games, Text Mining Classification
4.63M
498
AlaaKhaled
Toxic Embeddings: Universal Sentence Encoder encodings of text from the Jigsaw Toxic Comment challenge
There's no need for everyone to encode the same text with the Universal Sentence Embedding. This data set contains the... NLP, Deep Learning, Earth and Nature Classification
610.81M
622
Liling Tan
English Word Frequency: the ⅓ million most common English words on the web
This dataset contains the counts of the 333,333 most commonly used single words on the English language web, as derived... Languages Classification
4.73M
600
Rachael Tatman
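Stanford Natural Language Inference (SNLI) corpus in JSONL format
This is version 1.0 of the Stanford Natural Language Inference (SNLI) corpus. If you use this corpus, please cite this paper: http://nlp.Stanford.edu/pubs/snli... Languages Classification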
483.45M
454
John S. Hudzina
2.46M
410
NLTK Data
Mac Morpho: Brazilian Portuguese news text with part-of-speech tags
The canonical metadata on NLTK: package id=mac_morpho name=MAC-MORPHO: Brazilian Portuguese news text with part-of-speech tags webp... Earth and Nature Classification
10.43M
729
NLTK Data