URL Reputation

Donated on 10/14/2009

Anonymized 120-day subset of the ICML-09 URL data containing 2.4 million examples and 3.2 million features.

Dataset Characteristics

Multivariate, Time-Series

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Integer, Real

# Instances

2396130

# Features

3231961

Dataset Information

Additional Information

Uncompressing the archive url_svmlight.tar.gz yields a directory url_svmlight/ containing the following files:

* FeatureTypes --- A text file listing the feature indices that correspond to real-valued features.
* DayX.svm (where X is an integer from 0 to 120) --- The data for day X in SVM-light format. A label of +1 corresponds to a malicious URL and -1 corresponds to a benign URL.
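As a rough illustration of the SVM-light layout described above, the following Python sketch parses one day file using only the standard library. It assumes the common SVM-light line shape `<label> <index>:<value> ...` with an optional trailing `#` comment. The function names (`parse_svmlight_line`, `iter_day`, `is_malicious`) and the example path are hypothetical and not part of the dataset.

```python
from typing import Dict, Iterator, Tuple


def parse_svmlight_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one line of the assumed form '<label> <index>:<value> ...'."""
    # Drop an optional trailing comment introduced by '#'.
    body = line.split("#", 1)[0].strip()
    if not body:
        raise ValueError("empty line")
    tokens = body.split()
    # Going through float() accepts '+1', '-1', and '1.0' alike.
    label = int(float(tokens[0]))
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def iter_day(path: str) -> Iterator[Tuple[int, Dict[int, float]]]:
    """Yield (label, sparse_features) for each non-blank line of a day file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_svmlight_line(line)


def is_malicious(label: int) -> bool:
    """Per the dataset description, +1 is malicious and -1 is benign."""
    return label == 1


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever the archive was extracted.
    malicious = sum(is_malicious(y) for y, _ in iter_day("url_svmlight/Day0.svm"))
    print("malicious URLs on day 0:", malicious)
```

Features are kept as a sparse dictionary because each example touches only a small fraction of the roughly 3.2 million feature indices. For training models, a library loader such as scikit-learn's `load_svmlight_file` is likely a better fit, since it returns a SciPy sparse matrix directly.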

Has Missing Values?

Variable Information

Attributes are anonymized, but correspond to lexical and host-based features gathered for each URL.

Dataset Files

File	Size
url_svmlight.tar.gz	233.7 MB
url.names	2 KB

Creators

Justin Ma

Lawrence Saul

Stefan Savage

Geoffrey Voelker

DOI

10.24432/C5H89Q

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.