
SeedLing

Size: 5.75M


Tags: Education, Biology, Linguistics, Languages, Culture and Humanities, Classification



README.md

Context

A broad-coverage corpus such as the [Human Language Project envisioned by Abney and Bird (2010)](http://www.anthology.aclweb.org/P/P10/P10-1010.pdf) would be a powerful resource for the study of endangered languages. SeedLing was created as a seed corpus for the Human Language Project, covering a broad range of languages (Emerson et al. 2014).

TAUS (Translation Automation User Society) also sees the [importance of the Human Language Project in the context of keeping up with the demand for capacity and speed for translation](https://www.taus.net/think-tank/articles/translate-articles/the-call-for-the-human-language-project). TAUS' definition of the Human Language Project can be found at https://www.taus.net/knowledgebase/index.php/Human_Language_Project.

A detailed explanation of how to use the corpus can be found at https://github.com/alvations/SeedLing.

Content

The SeedLing corpus in this repository includes data from:

- **ODIN**: Online Database of Interlinear Text
- **Omniglot**: Useful foreign phrases from www.omniglot.com
- **UDHR**: Universal Declaration of Human Rights

Acknowledgements

**Citation**: Guy Emerson, Liling Tan, Susanne Fertmann, Alexis Palmer and Michaela Regneri. 2014. SeedLing: Building and using a seed corpus for the Human Language Project. In Proceedings of The use of Computational methods in the study of Endangered Languages (ComputEL) Workshop. Baltimore, USA.

    @InProceedings{seedling2014,
      author    = {Guy Emerson and Liling Tan and Susanne Fertmann and Alexis Palmer and Michaela Regneri},
      title     = {SeedLing: Building and using a seed corpus for the Human Language Project},
      booktitle = {Proceedings of The use of Computational methods in the study of Endangered Languages (ComputEL) Workshop},
      month     = {June},
      year      = {2014},
      address   = {Baltimore, USA},
      publisher = {Association for Computational Linguistics},
      pages     = {},
      url       = {}
    }

**References**:

- Steven Abney and Steven Bird. 2010. The Human Language Project: Building a universal corpus of the world's languages. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 88–97.
- Simon Ager. Omniglot - writing systems and languages of the world. Retrieved from www.omniglot.com.
- William D. Lewis and Fei Xia. 2010. Developing ODIN: A multilingual repository of annotated language data for hundreds of the world's languages. Literary and Linguistic Computing, 25(3):303–319.
- UN General Assembly. Universal Declaration of Human Rights, 10 December 1948, 217 A (III). Available at: http://www.refworld.org/docid/3ae6b3712c.html [accessed 26 April 2014].

Inspiration

This corpus was created over the span of a semester at Saarland University by a linguist, a mathematician, a data geek and two amazing mentors from the [COLI department](http://www.coli.uni-saarland.de/). It wouldn't have been possible without the cross-disciplinary synergy and the common goal we had.

- Expand/Explore the Human Language Project.
- Go into the field and record/document endangered languages. Make them computationally readable.
- Grow the Seedling!


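Data usage statement:

I. Data source and display:

1. This data was collected from the internet or supplied by service providers; this platform displays the dataset for users to browse.
2. This platform only displays basic information about datasets, including but not limited to image, text, video, and audio files.
3. Basic dataset information comes from the original data source or the information supplied by the data provider. If the dataset description differs, the original data source or the provider's original address prevails.

II. Ownership:

1. Copyright in all datasets on this site belongs to the original data publishers or data providers.

III. Redistribution:

1. If you redistribute data from this site, please retain the original data source and the relevant copyright notices.

IV. Infringement:

1. If any data on this site is displayed in a way that infringes rights, please contact the site promptly and the data will be taken offline.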
