The [IndieWeb](https://indieweb.org/) is a people-focused alternative to the "corporate" web. Participants use their own personal websites to post, reply, share, organize events, RSVP, and interact socially online in ways that have otherwise been limited to centralized silos like Facebook and Twitter.
The [Indie Map](http://www.indiemap.org/) dataset is a social network of the 2300 most active IndieWeb sites, including every connection between sites and the number of links in each direction, broken down by type. It includes:
* 5.8M web pages, including raw HTML, parsed microformats2, and extracted links with metadata.
* 631M links and 706K "friend" relationships between sites.
* 380GB of HTML and HTTP requests in WARC format.
The zip file here contains a JSON file for each site, which includes metadata, a list of other sites linked to and from, and the number of links of each type.
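
As a starting point for exploring the per-site files, here is a minimal Python sketch that reads them straight out of the zip archive and counts which top-level keys appear. It assumes only that each site file is a `.json` member of the archive and parses to a JSON object; the function names and the archive path are hypothetical, and no particular field names are assumed.

```python
import json
import zipfile
from collections import Counter


def load_site_files(zip_path):
    """Yield (member_name, parsed_json) for every .json member of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip directories and any non-JSON members.
            if not name.endswith(".json"):
                continue
            with archive.open(name) as handle:
                yield name, json.load(handle)


def summarize_keys(zip_path):
    """Count how often each top-level key appears across the site files."""
    counts = Counter()
    for _name, site in load_site_files(zip_path):
        # Only JSON objects have keys; ignore anything else.
        if isinstance(site, dict):
            counts.update(site.keys())
    return counts
```

Calling `summarize_keys("path/to/archive.zip").most_common()` (with the real path substituted) should show which fields are present in every site file and which are optional, which is a reasonable first check before relying on any one field.
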
The complete dataset of 5.8M HTML pages is available in a [publicly accessible Google BigQuery dataset](https://bigquery.cloud.google.com/dataset/indie-map:indiemap). The raw pages can also be downloaded as WARC files, which are [hosted on Google Cloud Storage](https://console.cloud.google.com/storage/browser/indie-map/).
More details are available in the [full documentation](http://www.indiemap.org/docs.html).
Indie Map is free, [open source](https://github.com/snarfed/indie-map), and placed into the public domain via [CC0](https://creativecommons.org/share-your-work/public-domain/cc0/). Crawled content remains the property of each site's owner and author, and remains subject to their existing copyrights.
