This database contains a subset of the [Memetracker](http://snap.stanford.edu/data/memetracker9.html) dataset collected by [SNAP](http://snap.stanford.edu/index.html).
The full Memetracker dataset groups its observations by month. Because of its size, however, this version covers only half a month: the first 15 days of Memetracker observations from November 2008.
## About
Memetracker tracks the quotes and phrases that appear most frequently over time across the entire online news spectrum. This makes it possible to see how different stories compete for news and blog coverage each day, and how certain stories persist while others fade quickly.
Overall, Memetracker tracks more than 17 million distinct phrases; about 54% of all phrase/quote mentions appear on blogs and 46% in news media.
## Acknowledgments
This dataset was collected by the Stanford Network Analysis Project (SNAP). Detailed information about the data and its analysis is available on the [Memetracker page](http://snap.stanford.edu/data/memetracker9.html) of the SNAP website.
An analysis of this dataset was published in:
J. Leskovec, L. Backstrom, J. Kleinberg. [Meme-tracking and the Dynamics of the News Cycle](http://cs.stanford.edu/people/jure/pubs/quotes-kdd09.pdf). ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2009.
## The Data
The SQLite database contains three tables:

`articles`: 4,542,920 records, with the following fields:
- **article_id**: a unique id for the article (int)
- **url**: the URL of the article (text)
- **date**: the date and time of the article (text), in the strptime format `%Y-%m-%d %H:%M:%S`
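
Because `date` is stored as text, it has to be parsed before any time arithmetic. The following is a minimal sketch using only Python's standard `sqlite3` and `datetime` modules. It does not open the real database (whose file name is not given here); instead it builds a small in-memory stand-in for the `articles` table, and the rows, URLs, and the helper name `articles_per_day` are all hypothetical.

```python
import sqlite3
from datetime import datetime

# Format of the `date` column, as documented for the `articles` table.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def articles_per_day(conn):
    """Parse every `date` value and count articles per calendar day."""
    counts = {}
    for (raw,) in conn.execute("SELECT date FROM articles"):
        day = datetime.strptime(raw, DATE_FORMAT).date()
        counts[day] = counts.get(day, 0) + 1
    return counts


# Hypothetical in-memory stand-in for the real `articles` table;
# the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (article_id INTEGER, url TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        (1, "http://example.com/a", "2008-11-01 08:15:00"),
        (2, "http://example.com/b", "2008-11-01 17:40:12"),
        (3, "http://example.com/c", "2008-11-02 09:05:30"),
    ],
)

for day, n in sorted(articles_per_day(conn).items()):
    print(day, n)
```

To run this against the real data, connect to the downloaded database file instead and skip the setup statements. Because the format is fixed-width and zero-padded, the raw strings also sort chronologically, so a date range can be filtered directly in SQL, e.g. `WHERE date >= '2008-11-01' AND date < '2008-11-08'`.
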

`quotes`: 7,956,125 records, with the following fields:
- **article_id**: unique id for the article that this quote was found in (int)
- **phrase**: the high-frequency phrase found in the article (text)
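
Each `quotes` row ties a phrase to an article, so joining it with `articles` yields per-day mention counts, the kind of signal behind the day-by-day competition between stories described above. This is a sketch, assuming that `quotes.article_id` matches `articles.article_id` (the field descriptions imply this but do not state it). The helper `top_phrases_per_day`, its parameter `k`, and the sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict


def top_phrases_per_day(conn, k=3):
    """Return {day: [(phrase, mentions), ...]} with the k most-mentioned phrases per day."""
    rows = conn.execute(
        """
        SELECT substr(a.date, 1, 10) AS day, q.phrase, COUNT(*) AS mentions
        FROM quotes AS q
        JOIN articles AS a ON a.article_id = q.article_id  -- assumed join key
        GROUP BY day, q.phrase
        ORDER BY day, mentions DESC, q.phrase
        """
    )
    result = defaultdict(list)
    for day, phrase, mentions in rows:
        # Rows arrive sorted by day, then by descending count, so keep the first k.
        if len(result[day]) < k:
            result[day].append((phrase, mentions))
    return dict(result)


# Hypothetical sample data standing in for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE articles (article_id INTEGER, url TEXT, date TEXT);
    CREATE TABLE quotes (article_id INTEGER, phrase TEXT);
    INSERT INTO articles VALUES
        (1, 'http://example.com/a', '2008-11-01 08:15:00'),
        (2, 'http://example.com/b', '2008-11-01 17:40:12'),
        (3, 'http://example.com/c', '2008-11-02 09:05:30');
    INSERT INTO quotes VALUES
        (1, 'phrase x'), (2, 'phrase x'), (2, 'phrase y'), (3, 'phrase y');
    """
)

for day, top in top_phrases_per_day(conn).items():
    print(day, top)
```

With roughly 8 million quote rows, the join cost matters. If the real database has no index on `articles.article_id`, creating one first (for example `CREATE INDEX idx_articles_id ON articles(article_id)`, where the index name is arbitrary) will likely speed it up.
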

`links`: 16,727,125 records, with the following fields:
- **article_id**: unique id for the article that this link was found in (int)
- **link_out**: the URL of the link out (text)
- **link_out_id**: unique id of the target article (int) if the target is a known article; otherwise NULL
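
Because `link_out_id` is NULL whenever the target is not a known article, the `links` table splits naturally into resolvable links and links that only have a raw URL in `link_out`. The following is a minimal sketch of both views, assuming that a non-NULL `link_out_id` refers to `articles.article_id` (implied by the field description but not stated outright). The helpers `link_counts` and `resolved_links`, and the sample rows, are hypothetical.

```python
import sqlite3


def link_counts(conn):
    """Return (links with a known target article, links without one)."""
    return conn.execute(
        """
        SELECT COUNT(link_out_id),              -- COUNT(column) skips NULLs
               COUNT(*) - COUNT(link_out_id)
        FROM links
        """
    ).fetchone()


def resolved_links(conn):
    """Return (source_url, target_url) pairs for links whose target is known."""
    # The inner join on link_out_id drops rows where it is NULL.
    return conn.execute(
        """
        SELECT src.url, dst.url
        FROM links AS l
        JOIN articles AS src ON src.article_id = l.article_id
        JOIN articles AS dst ON dst.article_id = l.link_out_id  -- assumed join key
        ORDER BY src.article_id, dst.article_id
        """
    ).fetchall()


# Hypothetical sample data standing in for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE articles (article_id INTEGER, url TEXT, date TEXT);
    CREATE TABLE links (article_id INTEGER, link_out TEXT, link_out_id INTEGER);
    INSERT INTO articles VALUES
        (1, 'http://example.com/a', '2008-11-01 08:15:00'),
        (2, 'http://example.com/b', '2008-11-01 17:40:12'),
        (3, 'http://example.com/c', '2008-11-02 09:05:30');
    INSERT INTO links VALUES
        (1, 'http://example.com/b', 2),
        (1, 'http://elsewhere.example/x', NULL),
        (3, 'http://example.com/a', 1);
    """
)

print(link_counts(conn))
print(resolved_links(conn))
```

Links with a NULL `link_out_id` are not lost: their `link_out` URL is still available, so they can be analyzed separately, for instance by grouping on the target host.
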
