SeedLing

Size: 5.75M
Categories: Education, Biology, Linguistics, Languages, Culture and Humanities


    README.md

    Context

    A broad-coverage corpus such as the [Human Language Project envisioned by Abney and Bird (2010)](http://www.anthology.aclweb.org/P/P10/P10-1010.pdf) would be a powerful resource for the study of endangered languages. SeedLing was created as a seed corpus for the Human Language Project to cover a broad range of languages (Emerson et al. 2014).

    TAUS (Translation Automation User Society) also sees the [importance of the Human Language Project in the context of keeping up with the demand for capacity and speed for translation](https://www.taus.net/think-tank/articles/translate-articles/the-call-for-the-human-language-project). TAUS' definition of the Human Language Project can be found at https://www.taus.net/knowledgebase/index.php/Human_Language_Project

    A detailed explanation of how to use the corpus can be found at https://github.com/alvations/SeedLing

    Content

    The SeedLing corpus in this repository includes data from:

    - **ODIN**: Online Database of Interlinear Text
    - **Omniglot**: Useful foreign phrases from www.omniglot.com
    - **UDHR**: Universal Declaration of Human Rights

    Acknowledgements

    **Citation**: Guy Emerson, Liling Tan, Susanne Fertmann, Alexis Palmer and Michaela Regneri. 2014. SeedLing: Building and using a seed corpus for the Human Language Project. In Proceedings of The use of Computational methods in the study of Endangered Languages (ComputEL) Workshop. Baltimore, USA.

    @InProceedings{seedling2014,
      author    = {Guy Emerson and Liling Tan and Susanne Fertmann and Alexis Palmer and Michaela Regneri},
      title     = {SeedLing: Building and using a seed corpus for the Human Language Project},
      booktitle = {Proceedings of The use of Computational methods in the study of Endangered Languages (ComputEL) Workshop},
      month     = {June},
      year      = {2014},
      address   = {Baltimore, USA},
      publisher = {Association for Computational Linguistics},
      pages     = {},
      url       = {}
    }

    **References**:

    - Steven Abney and Steven Bird. 2010. The Human Language Project: Building a universal corpus of the world’s languages. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 88–97.
    - Simon Ager. Omniglot - writing systems and languages of the world. Retrieved from www.omniglot.com.
    - William D. Lewis and Fei Xia. 2010. Developing ODIN: A multilingual repository of annotated language data for hundreds of the world’s languages. Literary and Linguistic Computing, 25(3):303–319.
    - UN General Assembly. Universal Declaration of Human Rights, 10 December 1948, 217 A (III). Available at: http://www.refworld.org/docid/3ae6b3712c.html [accessed 26 April 2014]

    Inspiration

    This corpus was created in the span of a semester at Saarland University by a linguist, a mathematician, a data geek and two amazing mentors from the [COLI department](http://www.coli.uni-saarland.de/). It wouldn't have been possible without the cross-disciplinary synergy and the common goal we had.

    - Expand/Explore the Human Language Project.
    - Go to the field and record/document speakers' languages. Make them computationally readable.
    - Grow the SeedLing!