Public dataset
Data structure: 272.4M (this figure was extracted automatically by the hosting platform; the actual data takes precedence)
README.md
Context
I wanted to create a dataset to practice NLP using Go. go-nuts is the official Go mailing list, and its messages are an ideal candidate.
Content
This dataset was created by crawling Google Groups using [ggmbox](https://github.com/vmarkovtsev/ggmbox). Raw emails can be fetched by ID, so all the topics were discovered and listed first, and then the individual emails were fetched. This dataset has **31432 topics** with **200785 messages**.
`golang-nuts.tar.xz` contains the fetched emails as of 2018/12/09, and `golang-nuts.json.gz` contains the metadata of each discussion topic. There is also `threads.csv.gz`, which holds the plain text messages per topic, in logical order and with some filtering performed; for example, citations were removed. If you want to improve it or write a custom information extractor, refer to [parse.go](https://github.com/vmarkovtsev/ggmbox/blob/master/parse.go). Python users: emails can be loaded with [`email.message_from_file()`](https://docs.python.org/3/library/email.parser.html#email.message_from_file), but some additional work may be required to decode the base64-encoded parts of some files; see [MIME](https://en.wikipedia.org/wiki/MIME).
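For Python users, the following is a minimal sketch of that additional decoding work, not the author's own tooling. It assumes each file extracted from `golang-nuts.tar.xz` holds one raw RFC 822 message; the helper names `load_email` and `extract_text` and the inline `SAMPLE` message are hypothetical and exist only for illustration.

```python
import base64
import email


def load_email(path):
    """Parse one raw email file (assumed: one message per file)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return email.message_from_file(f)


def extract_text(msg):
    """Return all text/plain parts of a message, decoded to str."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue  # skips multipart containers, HTML, attachments
        # decode=True undoes base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # unknown charset label in the header: fall back to UTF-8
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


# Hypothetical message with a base64-encoded body, built here for the demo.
SAMPLE = (
    "From: gopher@example.com\n"
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode("Hello, gophers!".encode("utf-8")).decode("ascii")
    + "\n"
)

if __name__ == "__main__":
    print(extract_text(email.message_from_string(SAMPLE)))
```

Passing `decode=True` to `get_payload()` lets the standard library reverse the transfer encoding, so only the charset still has to be handled by hand. Quoted replies are not stripped here; `threads.csv.gz` already has citations removed.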
Acknowledgements
The rights to the email contents belong to their respective authors. Idiomatic Go reviews of the ggmbox code were done (and are not finished yet!) by [Francesc Campoy](https://twitter.com/francesc). The crawling speed is thanks to [Scrapy](https://scrapy.org/). Hardware and mental support were provided by [source{d}](https://sourced.tech). The idea was inspired by [GopherCon Russia](https://www.gophercon-russia.ru/).
See also: [chromium-dev archive](https://www.kaggle.com/vmarkovtsev/chromiumdev-archive).