URL Reputation

Donated on 10/14/2009

Anonymized 120-day subset of the ICML-09 URL data containing 2.4 million examples and 3.2 million features.

Dataset Characteristics

Multivariate, Time-Series

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Integer, Real

# Instances

2396130

# Features

3231961

Dataset Information

Additional Information

Uncompressing the archive url_svmlight.tar.gz will yield a directory url_svmlight/ containing the following files:

* FeatureTypes --- A text file list of feature indices that correspond to real-valued features.
* DayX.svm (where X is an integer from 0 to 120) --- The data for day X in SVM-light format. A label of +1 corresponds to a malicious URL and -1 corresponds to a benign URL.
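
To make the file format concrete, the following is a minimal sketch of reading one SVM-light record, assuming the usual layout of a label followed by space-separated index:value pairs. The function names and the sample record are hypothetical and are not taken from the dataset or its documentation.

```python
from typing import Dict, Tuple


def parse_svmlight_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one SVM-light record of the form '<label> <index>:<value> ...'."""
    # Drop an optional trailing comment introduced by '#'.
    body = line.split("#", 1)[0].strip()
    if not body:
        raise ValueError("empty record")
    tokens = body.split()
    # The label is written as +1 or -1; go through float to accept '+1' or '1.0'.
    label = int(float(tokens[0]))
    # Only features that are present appear in the line; store them sparsely.
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def is_malicious(label: int) -> bool:
    # Per the dataset description: +1 is a malicious URL, -1 is a benign URL.
    return label == 1


# Hypothetical record for illustration only, not copied from a DayX.svm file.
sample = "+1 4:0.0788382 5:0.124138 11:1 16:1"
label, features = parse_svmlight_line(sample)
```

For whole files, an existing loader is likely the better choice. For example, scikit-learn provides sklearn.datasets.load_svmlight_file, which returns a sparse matrix and a label array. Passing the same n_features value for every day keeps the per-day matrices the same width.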

Has Missing Values?

No

Variable Information

Attributes are anonymized, but correspond to lexical and host-based features gathered for each URL.

Dataset Files

File	Size
url_svmlight.tar.gz	233.7 MB
url.names	2 KB

Creators

Justin Ma

Lawrence Saul

Stefan Savage

Geoffrey Voelker
