URL Reputation
Donated on 10/14/2009
Anonymized 120-day subset of the ICML-09 URL data containing 2.4 million examples and 3.2 million features.
Dataset Characteristics
Multivariate, Time-Series
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Integer, Real
# Instances
2396130
# Features
3231961
Dataset Information
Additional Information
Uncompressing the archive url_svmlight.tar.gz yields a directory url_svmlight/ containing the following files:
* FeatureTypes --- A text file listing the feature indices that correspond to real-valued features.
* DayX.svm (where X is an integer from 0 to 120) --- The data for day X in SVM-light format. A label of +1 corresponds to a malicious URL and -1 corresponds to a benign URL.
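To make the file layout concrete, here is a minimal sketch of reading SVM-light lines with the standard library only. It assumes the usual SVM-light layout of `<label> <index>:<value> ...` per line. The helper names `parse_svmlight_line` and `parse_svmlight` are hypothetical, and the sample lines are made up for illustration, not taken from the dataset.

```python
from typing import Dict, Iterable, List, Tuple

Example = Tuple[int, Dict[int, float]]


def parse_svmlight_line(line: str) -> Example:
    """Parse one SVM-light line into (label, {feature_index: value})."""
    # Drop an optional trailing '# comment' and surrounding whitespace.
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    # Labels are written as '+1' (malicious) or '-1' (benign).
    label = int(float(tokens[0]))
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def parse_svmlight(lines: Iterable[str]) -> List[Example]:
    """Parse every non-empty line; works on a list or an open file handle."""
    return [
        parse_svmlight_line(line)
        for line in lines
        if line.split("#", 1)[0].strip()
    ]


# Made-up sample lines (NOT real rows from DayX.svm).
sample = [
    "+1 4:0.0788 5:0.1241 11:1 17:1",
    "-1 4:0.0012 6:0.5 11:1",
]

examples = parse_svmlight(sample)
malicious = sum(1 for label, _ in examples if label == 1)
print(f"{len(examples)} examples, {malicious} malicious")
```

For a real file, the same function can be given an open handle, for example `with open("url_svmlight/Day0.svm") as f: examples = parse_svmlight(f)`. With roughly 3.2 million features, a sparse representation is the practical choice; scikit-learn's `load_svmlight_file` is one existing option that returns a sparse matrix and a label array.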
Has Missing Values?
No
Variable Information
Attributes are anonymized, but correspond to lexical and host-based features gathered for each URL.
Dataset Files
File | Size |
---|---|
url_svmlight.tar.gz | 233.7 MB |
url.names | 2 KB |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
url_reputation = fetch_ucirepo(id=187)

# data (as pandas dataframes)
X = url_reputation.data.features
y = url_reputation.data.targets

# metadata
print(url_reputation.metadata)

# variable information
print(url_reputation.variables)
Ma, J., Saul, L., Savage, S., & Voelker, G. (2009). URL Reputation [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5H89Q.
Creators
Justin Ma
Lawrence Saul
Stefan Savage
Geoffrey Voelker
DOI
10.24432/C5H89Q
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.