# Context
I came across the [American Presidency Project](http://www.presidency.ucsb.edu) and was inspired by the incredible amount of data its founders have accumulated. I also found a few projects that used this data, such as [The Wordy Words of Hillary Clinton and Donald Trump](https://medium.com/@omgimanerd/the-wordy-words-of-hillary-clinton-and-donald-trump-59d0bed33b74#.iuls1pky6) and [IBM Watson Compares Trump's Inauguration Speech to Obama's](https://www.linkedin.com/pulse/ibm-watson-compares-trumps-inauguration-speech-obamas-jeremy-waite).
The site itself, however, only serves each of its 120,000+ documents individually as HTML through a PHP page. My goal was to extract the almost 60,000 documents released by the offices of the presidents of the United States, starting with George Washington, and make them available to anyone interested in exploring this data set for their own studies and experiments.
# Content
The data is normalized using two key properties of a document: `President` and `Document Category`. Document categories include `Oral` and `Written`, among others.
Each document has a variety of properties:
- `category` - A more detailed categorical assignment than the document category, such as `Address` or `Memo`.
- `subcategory` - A further label within the `category`, such as `Inaugural`.
- `document_date` - Format: `1861-03-04 00:00:00`
- `title` - Title of the released document.
- `pid` - This value, stored as an integer, can be used to access the original document at the following URL: `http://www.presidency.ucsb.edu/ws/index.php?pid={}`, where `{}` is replaced with the value in this field.
- `content` - This is the full text of the released document.
A Markdown version of this JSON structure can be found on [GitHub](https://github.com/jayrav13/presidency#american-presidency-project).
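As a rough illustration of how the `pid` and `document_date` fields described above might be used, here is a minimal Python sketch. It assumes a single document has already been loaded as a dictionary with the fields listed; the helper names `source_url` and `parse_date` and the sample record are hypothetical, and the record's values are placeholders rather than real entries from the data set.

```python
from datetime import datetime

# URL template quoted in the field list above.
URL_TEMPLATE = "http://www.presidency.ucsb.edu/ws/index.php?pid={}"


def source_url(document):
    """Build the link to the original page from a document's `pid`."""
    return URL_TEMPLATE.format(document["pid"])


def parse_date(document):
    """Parse `document_date`, which uses the `1861-03-04 00:00:00` layout."""
    return datetime.strptime(document["document_date"], "%Y-%m-%d %H:%M:%S")


# Hypothetical record: the keys follow the field list, the values are
# placeholders and do not correspond to a real document.
example = {
    "category": "Address",
    "subcategory": "Inaugural",
    "document_date": "1861-03-04 00:00:00",
    "title": "Placeholder title",
    "pid": 12345,
    "content": "Placeholder text.",
}

print(source_url(example))
print(parse_date(example).date())
```

How the documents are nested under `President` and `Document Category` is not shown here; check the structure on GitHub before looping over the full file.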
# Acknowledgements
A HUGE thank you to the [American Presidency Project](http://www.presidency.ucsb.edu) for the data and inspiration.
