# Context
This is *real* real-time bidding data, used to predict whether an advertiser should bid for a marketing slot, e.g. a banner on a webpage. Explanatory variables include the browser, the operating system, the time of day the user is online, the marketplace on which the user's identifiers were previously traded, etc. The column **'convert'** is 1 when the user clicked on the ad and 0 otherwise.
# Content
Unfortunately, the data had to be anonymized, so there is little room for feature engineering. I applied PCA and kept enough components to explain 0.99 (99%) of the variance. Still, I think it's really interesting data for testing general-purpose algorithms on imbalanced data. ;)
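
The exact PCA procedure isn't included here, but "keeping 0.99" most likely means retaining the smallest number of principal components whose cumulative explained-variance ratio reaches 99% (scikit-learn's `PCA(n_components=0.99)` selects components this way). A minimal pure-Python sketch of that selection rule, assuming you already have the eigenvalues of the covariance matrix; `components_for_variance` is a hypothetical helper, not part of the dataset:

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold=0.99):
    """Smallest k such that the k largest eigenvalues explain >= threshold of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return k
    return len(ordered)  # fallback if rounding keeps the ratio just below threshold


if __name__ == "__main__":
    # Hypothetical eigenvalues: the top 3 components explain 99% of the variance.
    print(components_for_variance([6.0, 3.0, 0.9, 0.1], threshold=0.95))  # prints: 3
```
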
# Inspiration
Since the data is heavily imbalanced, it doesn't make sense to optimize for accuracy (a model that never predicts a conversion already scores very high accuracy). Instead, aim for a good AUC, F1 score, MCC or recall, estimated by cross-validation.
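
To see why accuracy is misleading here, a sketch of these metrics in plain Python. The helper names are hypothetical; in practice you would probably use `sklearn.metrics` (`roc_auc_score`, `f1_score`, `matthews_corrcoef`, `recall_score`). It assumes 0/1 labels with 1 = convert, and that `scores` are model probabilities or decision values:

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for 0/1 labels, with 1 = convert."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(y_true, y_pred):
    # F1 = 2 * precision * recall / (precision + recall) = 2tp / (2tp + fp + fn)
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    # Matthews correlation coefficient; this sketch returns 0 when the denominator is 0.
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    # AUC = P(score of a random positive > score of a random negative); ties count 1/2.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [0] * 95 + [1] * 5   # toy labels: 5% conversions
    always_zero = [0] * 100  # trivial model that never predicts a conversion
    print(accuracy(y, always_zero), recall(y, always_zero),
          f1_score(y, always_zero), mcc(y, always_zero))  # prints: 0.95 0.0 0.0 0.0
```

The trivial model gets 95% accuracy but zero recall, F1 and MCC. The pairwise AUC above is O(n²), fine for illustration but too slow for the full dataset.
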
It's interesting to compare different models (logistic regression, decision trees, SVMs, ...) on these metrics and to see how your train:test split affects the results.
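
One way to control the effect of the split is to stratify it, so that every fold keeps roughly the same share of conversions as the full data (scikit-learn's `StratifiedKFold` follows the same idea). A minimal pure-Python sketch, assuming 0/1 labels; `stratified_kfold_indices` is a hypothetical helper:

```python
import random


def stratified_kfold_indices(y, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep about the class ratio of y."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    offset = 0
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        # Deal each class round-robin so every fold gets its share of that class;
        # the running offset keeps the total fold sizes balanced across classes.
        for j, i in enumerate(idx):
            folds[(offset + j) % n_splits].append(i)
        offset += len(idx)
    for k in range(n_splits):
        test_set = set(folds[k])
        train = [i for i in range(len(y)) if i not in test_set]
        yield train, sorted(test_set)


if __name__ == "__main__":
    labels = [0] * 95 + [1] * 5  # toy labels: 5% conversions
    for train, test in stratified_kfold_indices(labels):
        print(len(train), len(test), sum(labels[i] for i in test))
```

With a plain random split on heavily imbalanced data, some folds can end up with very few or no positives, which makes recall, F1 and AUC unstable or undefined on those folds.
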
It might be a good strategy to follow these
[Tactics to combat imbalanced classes](http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/).
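
Resampling the data is one of the commonly recommended tactics. A minimal sketch of random undersampling of the majority class, assuming 0/1 labels with 1 as the minority class; `random_undersample` and its `ratio` parameter (negatives kept per positive) are hypothetical:

```python
import random


def random_undersample(X, y, ratio=1.0, seed=0):
    """Keep every positive row and a random subset of negatives (ratio negatives per positive)."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    keep = sorted(pos + rng.sample(neg, n_keep))  # preserve original row order
    return [X[i] for i in keep], [y[i] for i in keep]
```

Apply it only to the training part of each split; the test fold should keep the original class ratio, otherwise the metrics will look better than they really are.
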
