README.md
Context
A broad-coverage corpus such as the [Human Language Project envisioned by Abney and Bird (2010)](http://www.anthology.aclweb.org/P/P10/P10-1010.pdf) would be a powerful resource for the study of endangered languages.
SeedLing was created as a seed corpus for the Human Language Project to cover a broad range of languages (Emerson et al. 2014).
TAUS (Translation Automation User Society) also sees the [importance of the Human Language Project in the context of keeping up with the demand for capacity and speed for translation](https://www.taus.net/think-tank/articles/translate-articles/the-call-for-the-human-language-project). TAUS' definition of the Human Language Project can be found at https://www.taus.net/knowledgebase/index.php/Human_Language_Project
A detailed explanation of how to use the corpus can be found at https://github.com/alvations/SeedLing
Content
The SeedLing corpus on this repository includes the data from:
- **ODIN**: Online Database of Interlinear Text
- **Omniglot**: Useful foreign phrases from www.omniglot.com
- **UDHR**: Universal Declaration of Human Rights
Acknowledgements
**Citation**:
Guy Emerson, Liling Tan, Susanne Fertmann, Alexis Palmer and Michaela Regneri. 2014. SeedLing: Building and using a seed corpus for the Human Language Project. In Proceedings of The use of Computational methods in the study of Endangered Languages (ComputEL) Workshop. Baltimore, USA.
@InProceedings{seedling2014,
author = {Guy Emerson and Liling Tan and Susanne Fertmann and Alexis Palmer and Michaela Regneri},
title = {SeedLing: Building and using a seed corpus for the Human Language Project},
booktitle = {Proceedings of The use of Computational methods in the study of Endangered Languages (ComputEL) Workshop},
month = {June},
year = {2014},
address = {Baltimore, USA},
publisher = {Association for Computational Linguistics},
pages = {},
url = {}
}
**References**:
- Steven Abney and Steven Bird. 2010. The Human Language Project: Building a universal corpus of the world’s languages. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 88–97.
- Simon Ager. Omniglot - writing systems and languages of the world. Retrieved from www.omniglot.com.
- William D. Lewis and Fei Xia. 2010. Developing ODIN: A multilingual repository of annotated language data for hundreds of the world’s languages. Literary and Linguistic Computing, 25(3):303–319.
- UN General Assembly. Universal Declaration of Human Rights, 10 December 1948, 217 A (III). Available at: http://www.refworld.org/docid/3ae6b3712c.html [accessed 26 April 2014].
Inspiration
This corpus was created over the span of a semester at Saarland University by a linguist, a mathematician, a data geek and two amazing mentors from the [COLI department](http://www.coli.uni-saarland.de/). It wouldn't have been possible without the cross-disciplinary synergy and the common goal we had.
- Expand/Explore the Human Language Project.
- Go into the field and record/document endangered languages. Make the data machine-readable.
- Grow the Seedling!