'Identifying Suspicious URLs: An Application of Large-Scale Online Learning' (ICML-09)
Justin Ma, Lawrence K. Saul, Stefan Savage, Geoffrey M. Voelker
Please visit [http://sysnet.ucsd.edu/projects/url/] for more information.
Data Set Information:
Uncompressing the archive url_svmlight.tar.gz will yield a directory url_svmlight/ containing the following files:
* FeatureTypes --- A text file listing the indices of the features that are real-valued.
* DayX.svm (where X is an integer from 0 to 120) --- The data for day X in SVM-light format. A label of +1 corresponds to a malicious URL and -1 corresponds to a benign URL.
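The SVM-light layout mentioned above stores one example per line as a label followed by sparse index:value pairs. The following is a minimal sketch of reading such a file in plain Python, not a reference loader for this data set. The function names (parse_svmlight_line, parse_svmlight, read_day) and the sample lines are hypothetical and introduced only for illustration; the sketch also assumes the files contain no "qid:" tokens.

```python
from typing import Dict, Iterable, List, Tuple

# One example: (label, {feature_index: value}); label is +1 or -1.
Example = Tuple[int, Dict[int, float]]


def parse_svmlight_line(line: str) -> Example:
    """Parse one SVM-light line of the form '<label> <index>:<value> ...'."""
    body = line.split("#", 1)[0].strip()  # drop an optional trailing comment
    if not body:
        raise ValueError("empty line")
    tokens = body.split()
    label = int(float(tokens[0]))  # accepts '+1', '-1', '1.0'
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def parse_svmlight(lines: Iterable[str]) -> List[Example]:
    """Parse every non-blank, non-comment line."""
    return [
        parse_svmlight_line(line)
        for line in lines
        if line.split("#", 1)[0].strip()
    ]


def read_day(directory: str, day: int) -> List[Example]:
    """Read <directory>/DayX.svm for a single day X (0 to 120)."""
    if not 0 <= day <= 120:
        raise ValueError("day must be between 0 and 120")
    with open(f"{directory}/Day{day}.svm", encoding="utf-8") as handle:
        return parse_svmlight(handle)


# Made-up lines for illustration only; they are not taken from the data set.
sample = ["+1 4:0.5 6:1 11:0.25", "-1 2:1 6:0.125"]
examples = parse_svmlight(sample)
malicious = sum(1 for label, _ in examples if label == 1)
```

After unpacking the archive, a call such as read_day("url_svmlight", 0) would return the examples for day 0 as sparse dictionaries. For larger experiments, a sparse-matrix loader such as scikit-learn's load_svmlight_file may be a more practical choice than this dictionary-based sketch.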
Attribute Information:
Attributes are anonymized, but correspond to lexical and host-based features gathered for each URL.
Relevant Papers:
Citation Request:
If you use this data set in published work, please cite the ICML-09 paper in which it was first introduced and described:
Justin Ma, Lawrence K. Saul, Stefan Savage, and Geoffrey M. Voelker,
Identifying Suspicious URLs: An Application of Large-Scale Online Learning,
Proceedings of the International Conference on Machine Learning (ICML), pages 681-688, Montreal, Quebec, June 2009.
