go-nuts archive

Size: 272.4M
Earth and Nature, Education, Programming, Food, Linguistics Classification


    README.md

### Context

I wanted to create a dataset to practice NLP using Go. go-nuts is the official Go mailing list, and its messages are an ideal candidate.

### Content

This dataset was created by crawling Google Groups using [ggmbox](https://github.com/vmarkovtsev/ggmbox). Raw emails can be fetched by ID, so all the topics were discovered and listed first, and then the individual emails were fetched. This dataset has **31432 topics** with **200785 messages**.

- `golang-nuts.tar.xz` contains the fetched emails as of 2018/12/09.
- `golang-nuts.json.gz` contains the metadata of each discussion topic.
- `threads.csv.gz` contains the plain text messages per topic, in logical order and with some filtering performed; for example, citations were removed.

If you want to improve it or write a custom information extractor, refer to [parse.go](https://github.com/vmarkovtsev/ggmbox/blob/master/parse.go).

Python users: emails can be loaded with [`email.message_from_file()`](https://docs.python.org/3/library/email.parser.html#email.message_from_file), but some additional work may be required to decode the base64-encoded parts of some files; see [MIME](https://en.wikipedia.org/wiki/MIME).

### Acknowledgements

The rights to the email contents belong to their respective authors. Idiomatic Go reviews of the ggmbox code were done (and are not finished yet!) by [Francesc Campoy](https://twitter.com/francesc). Crawling speed is thanks to [Scrapy](https://scrapy.org/). Hardware and mental support were provided by [source{d}](https://sourced.tech). The idea was inspired by [GopherCon Russia](https://www.gophercon-russia.ru/).

See also: [chromium-dev archive](https://www.kaggle.com/vmarkovtsev/chromiumdev-archive).
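The note for Python users about base64-encoded parts can be illustrated with a minimal sketch that uses only the standard library. The helper names `extract_plain_text` and `load_plain_text` are hypothetical and not part of ggmbox. The layout of files inside `golang-nuts.tar.xz` is not described here, so the path argument is an assumption, as is the UTF-8 fallback for parts that declare no charset or an unknown one.

```python
import email
from email.message import Message


def extract_plain_text(msg: Message) -> str:
    """Join all text/plain parts of a message, undoing MIME transfer encodings."""
    chunks = []
    for part in msg.walk():
        # Multipart containers and attachments are skipped.
        if part.get_content_type() != "text/plain":
            continue
        # decode=True reverses base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8 as well.
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def load_plain_text(path: str) -> str:
    """Parse one raw email file (hypothetical path) and return its plain text."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return extract_plain_text(email.message_from_file(fh))
```

Calling `load_plain_text` on one file extracted from the archive should return the readable body whether or not the part was base64-encoded. HTML-only messages yield an empty string in this sketch and would need separate handling.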