Classification: Education, Biology, Linguistics, Languages, Culture and Humanities

Data structure: 5.75M

    * The analysis above was generated automatically by the system; the actual data takes precedence.

Context

A broad-coverage corpus such as the Human Language Project envisioned by Abney and Bird (2010) would be a powerful resource for the study of endangered languages. SeedLing was created as a seed corpus for the Human Language Project to cover a broad range of languages (Emerson et al. 2014).

TAUS (Translation Automation User Society) also sees the importance of the Human Language Project in the context of keeping up with the demand for capacity and speed for translation. TAUS' definition of the Human Language Project and a detailed explanation of how to use the corpus are available online.

Content

The SeedLing corpus on this repository includes the data from:

- **ODIN**: Online Database of Interlinear Text
- **Omniglot**: Useful foreign phrases
- **UDHR**: Universal Declaration of Human Rights

Acknowledgements

**Citation**: Guy Emerson, Liling Tan, Susanne Fertmann, Alexis Palmer and Michaela Regneri. 2014. SeedLing: Building and using a seed corpus for the Human Language Project. In Proceedings of The use of Computational methods in the study of Endangered Languages (ComputEL) Workshop. Baltimore, USA.

    @InProceedings{seedling2014,
      author    = {Guy Emerson and Liling Tan and Susanne Fertmann and Alexis Palmer and Michaela Regneri},
      title     = {SeedLing: Building and using a seed corpus for the Human Language Project},
      booktitle = {Proceedings of The use of Computational methods in the study of Endangered Languages (ComputEL) Workshop},
      month     = {June},
      year      = {2014},
      address   = {Baltimore, USA},
      publisher = {Association for Computational Linguistics},
      pages     = {},
      url       = {}
    }

**References**:

Steven Abney and Steven Bird. 2010. The Human Language Project: Building a universal corpus of the world’s languages. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 88–97.

Simon Ager. Omniglot - writing systems and languages of the world.

William D Lewis and Fei Xia. 2010. Developing ODIN: A multilingual repository of annotated language data for hundreds of the world’s languages. Literary and Linguistic Computing, 25(3):303–319.

UN General Assembly, Universal Declaration of Human Rights, 10 December 1948, 217 A (III) [accessed 26 April 2014].

Inspiration

This corpus was created in the span of a semester at Saarland University by a linguist, a mathematician, a data geek and two amazing mentors from the COLI department. It wouldn't have been possible without the cross-disciplinary synergy and the common goal we had.

- Expand/Explore the Human Language Project.
- Go to the field and record/document endangered languages. Make them computationally readable.
- Grow the Seedling!


