DBpedia NIF Dataset

Education,NLP,Deep Learning,Text Data,Text Mining,Research Classification


# DBpedia NIF Dataset

DBpedia NIF is a large-scale, multilingual knowledge extraction corpus. The aim of the dataset is two-fold: to dramatically broaden and deepen the amount of structured information in DBpedia, and to provide a large-scale, multilingual language resource for the development of various NLP and IR tasks. The dataset provides the content of all articles for 128 Wikipedia languages.

## Overview

The DBpedia community has put a significant amount of effort into developing technical infrastructure and methods for the efficient extraction of structured information from Wikipedia. These efforts have primarily focused on harvesting, refining and publishing semi-structured information found in Wikipedia articles, such as information from infoboxes, categorization information, images, wikilinks and citations. Nevertheless, a vast amount of valuable information is still contained in the unstructured Wikipedia article texts. DBpedia NIF aims to fill this gap by extracting valuable information from those texts.

The dataset captures the content as it is found in Wikipedia: it preserves the structure (sections and paragraphs) and the annotations provided by the Wikipedia editors.
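To illustrate how text and editor annotations can be expressed together as RDF, here is a minimal sketch using only the Python standard library. It assumes the data is available as N-Triples and uses properties from the NIF 2.0 Core vocabulary (`nif:isString`, `nif:referenceContext`, `nif:beginIndex`, `nif:endIndex`) plus `itsrdf:taIdentRef` for link targets. The sample triples, the `example.org` URIs, and the helper names `parse_triples` and `extract_links` are invented for illustration. The exact URI scheme and property set of the published dumps should be checked against the dataset itself.

```python
import re
from collections import defaultdict

# Namespaces of the NIF 2.0 Core and ITS RDF vocabularies.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD_INT = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"

# Hypothetical URIs and text, invented for this sketch (not taken from the dataset).
CONTEXT = "http://example.org/Berlin?nif=context"
LINK = "http://example.org/Berlin?nif=word&char=25,32"

SAMPLE = f"""\
<{CONTEXT}> <{NIF}isString> "Berlin is the capital of Germany." .
<{LINK}> <{NIF}referenceContext> <{CONTEXT}> .
<{LINK}> <{NIF}beginIndex> "25"^^<{XSD_INT}> .
<{LINK}> <{NIF}endIndex> "32"^^<{XSD_INT}> .
<{LINK}> <{NIF}anchorOf> "Germany" .
<{LINK}> <{ITSRDF}taIdentRef> <http://dbpedia.org/resource/Germany> .
"""

# Simplified N-Triples pattern: IRI subject, IRI predicate, IRI or literal object.
# Blank nodes are not handled and escape sequences in literals are not decoded.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:\^\^<[^>]*>|@[A-Za-z-]+)?)\s*\.\s*$'
)


def parse_triples(text):
    """Yield (subject, predicate, object) for each line matching TRIPLE."""
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            subj, pred, iri, literal = match.groups()
            yield subj, pred, (iri if iri is not None else literal)


def extract_links(text):
    """Return (surface form, linked resource) pairs for annotated spans."""
    props = defaultdict(dict)
    for subj, pred, obj in parse_triples(text):
        props[subj][pred] = obj

    links = []
    for subj, po in props.items():
        if ITSRDF + "taIdentRef" not in po:
            continue  # not a link annotation
        # Look up the full text of the context the span refers to.
        context_text = props[po[NIF + "referenceContext"]][NIF + "isString"]
        begin = int(po[NIF + "beginIndex"])
        end = int(po[NIF + "endIndex"])
        links.append((context_text[begin:end], po[ITSRDF + "taIdentRef"]))
    return links


print(extract_links(SAMPLE))
```

The surface form is recovered by slicing the context string with the begin and end offsets rather than by reading `nif:anchorOf`, which shows how offset-based annotations tie a span back to the article text. For real dumps, a full RDF parser would be a more robust choice than this regular expression.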
## Key Features and Facts

* content in 128 Wikipedia languages
* over 9 billion RDF triples, which is almost 40% of DBpedia
* selected partitions published as Linked Data
* exploited within the TextExt - DBpedia Open Extraction challenge
* available for large-scale training of NLP and IR methods

## TextExt - DBpedia Open Extraction challenge

The DBpedia Open Text Extraction Challenge differs significantly from other challenges in language technology and other areas in that it is not a one-time call, but a continuously growing and expanding challenge, with the focus on sustainably advancing the state of the art and transcending boundaries in a systematic way. The DBpedia Association and the people behind this challenge are committed to providing the necessary infrastructure and driving the challenge for an indefinite time, as well as potentially extending the challenge beyond Wikipedia.

We provide data from the DBpedia NIF datasets in 9 different languages. Your task is to run your NLP tool on the data and extract valuable information such as facts, relations, events, terminology, ontologies as RDF triples, or useful NLP annotations such as POS tags, dependencies or co-reference.

## Project Team

* Dr. Milan Dojchinovski (Principal Contact / Maintainer)
* Dr.-Ing. Sebastian Hellmann