# Context
This dataset contains collision data for car accidents in Canada from 1999 to 2014, as provided by [Transport Canada][1]. It includes features such as time of day, whether there were fatalities, and driver gender. The codes for the different categories can be found in 'drivingLegend.pdf'. The original CSV file is no longer available, but the data can be downloaded in portions by selecting features through this [portal][2].
Every feature is categorical; some have 2 categories, while others have 30 or more. The data is **not** completely clean (you can thank Stats Canada), so some preprocessing is required. For instance, the same category may appear as both '01' and '1', and some values are stored as integers while others are stored as strings. Unknown values are marked with the codes listed in 'drivingLegend.pdf'. Unfortunately, features such as location and impaired driving are not part of this dataset, but there are plenty of others to work with.
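
As a rough illustration of the kind of cleanup this implies, the sketch below maps mixed integer and string codes onto one zero-padded string form. The helper name `normalize_code`, the default width of 2, and the sample values (including 'U' standing in for an unknown marker) are assumptions made for the example; check 'drivingLegend.pdf' for the actual codes and widths of each feature.

```python
def normalize_code(value, width=2):
    """Return a category code as a zero-padded string, e.g. 1 -> '01'.

    `width` is an assumed code length; adjust it per feature.
    """
    text = str(value).strip()
    # Pad only purely numeric codes; non-numeric markers are left unchanged.
    if text.isdigit():
        return text.zfill(width)
    return text


# Hypothetical raw values showing the duplicates described above.
raw_codes = [1, "1", "01", " 12", "U"]
cleaned = [normalize_code(v) for v in raw_codes]
print(cleaned)
```

Applying a function like this to each column before counting categories should collapse pairs such as '01' and '1' into a single category.
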
This data is provided by [Transport Canada][3] and [Statistics Canada][4] under the [Statistics Canada Open License Agreement][5].
Questions of particular interest:
- What are the main contributing factors to accident fatalities?
- Can a machine learning classifier be used to predict fatalities?
**Note:** If attempting to predict fatalities, be aware that the classes are highly imbalanced: non-fatal collisions far outnumber fatal ones.
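
One common way to compensate for this kind of imbalance is to weight each class inversely to its frequency when training a classifier. The sketch below computes such weights with the heuristic `n_samples / (n_classes * class_count)`. The function name `balanced_class_weights` and the 98/2 label split are made up for illustration and do not reflect the real proportions in this dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Made-up labels: 0 = non-fatal, 1 = fatal.
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)
```

The rarer class gets the larger weight, so misclassifying a fatality costs more during training. Accuracy alone is also misleading on data like this; precision and recall on the fatal class are likely to be more informative.
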
[1]: http://www.tc.gc.ca/eng/menu.htm
[2]: http://wwwapps2.tc.gc.ca/Saf-Sec-Sur/7/NCDB-BNDC/p.aspx?l=en&c=100-1-0
[3]: http://www.tc.gc.ca/eng/menu.htm
[4]: http://www.statcan.gc.ca/eng/start
[5]: http://www.statcan.gc.ca/eng/reference/licence
