Phishing Websites
Donated on 3/25/2015
This dataset was collected mainly from the PhishTank archive, the MillerSmiles archive, and Google’s search operators.
Dataset Characteristics
Tabular
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Integer
# Instances
11055
# Features
30
Dataset Information
Additional Information
One of the challenges faced by our research was the unavailability of reliable training datasets; in fact, every researcher in this field faces the same challenge. Although plenty of articles about predicting phishing websites have been published in recent years, no reliable training dataset has been made publicly available. This is perhaps because there is no agreement in the literature on the definitive features that characterize phishing webpages, which makes it difficult to build a dataset that covers all possible features. In this dataset, we shed light on the important features that have proved to be sound and effective in predicting phishing websites. In addition, we propose some new features.
Has Missing Values?
No
Introductory Paper
By R. Mohammad, F. Thabtah, L. McCluskey. 2012
Published in the International Conference for Internet Technology and Secured Transactions
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
having_ip_address | Feature | Integer | | | no |
url_length | Feature | Integer | | | no |
shortining_service | Feature | Integer | | | no |
having_at_symbol | Feature | Integer | | | no |
double_slash_redirecting | Feature | Integer | | | no |
prefix_suffix | Feature | Integer | | | no |
having_sub_domain | Feature | Integer | | | no |
sslfinal_state | Feature | Integer | | | no |
domain_registration_length | Feature | Integer | | | no |
favicon | Feature | Integer | | | no |
Only the first 10 of the 31 variables are listed above.
Additional Variable Information
For further information about the features, see the features file in the data folder.
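To illustrate how some of the URL-based variables in the table might be derived from a raw URL, here is a minimal sketch of five of them. The thresholds, the coding (1 = legitimate, 0 = suspicious, -1 = phishing) and the function names are assumptions based on our reading of the features document; check them against Phishing Websites Features.docx before relying on them. The IP check only recognizes literal IPv4/IPv6 hosts, not hexadecimal or otherwise encoded addresses.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed coding (check against the features document):
# 1 = legitimate, 0 = suspicious, -1 = phishing.
LEGIT, SUSPICIOUS, PHISHING = 1, 0, -1


def _host(url: str) -> str:
    """Host part of the URL, lowercased, without credentials or port."""
    return urlsplit(url).hostname or ""


def having_ip_address(url: str) -> int:
    """Phishing if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(_host(url))
    except ValueError:
        return LEGIT
    return PHISHING


def url_length(url: str) -> int:
    """Assumed thresholds: < 54 legitimate, 54..75 suspicious, > 75 phishing."""
    n = len(url)
    if n < 54:
        return LEGIT
    if n <= 75:
        return SUSPICIOUS
    return PHISHING


def having_at_symbol(url: str) -> int:
    """Phishing if the URL contains an '@' character."""
    return PHISHING if "@" in url else LEGIT


def double_slash_redirecting(url: str) -> int:
    """Phishing if the last '//' occurs after the scheme separator.

    In 'https://' the '//' starts at 0-based index 6 (1-based position 7),
    so any later occurrence is treated as a redirect.
    """
    return PHISHING if url.rfind("//") > 6 else LEGIT


def prefix_suffix(url: str) -> int:
    """Phishing if the host name contains a '-'."""
    return PHISHING if "-" in _host(url) else LEGIT


if __name__ == "__main__":
    url = "http://secure-login.example.com//http://203.0.113.5/"
    checks = (having_ip_address, url_length, having_at_symbol,
              double_slash_redirecting, prefix_suffix)
    print({f.__name__: f(url) for f in checks})
```

Variables such as sslfinal_state or domain_registration_length depend on information outside the URL string (certificates, registration records), so they are not covered by this kind of string-only sketch.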
Dataset Files
File | Size |
---|---|
Training Dataset.arff | 782.1 KB |
.old.arff | 166.9 KB |
Phishing Websites Features.docx | 38.1 KB |
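The training data ships as an ARFF file. For general ARFF parsing, `scipy.io.arff.loadarff` is the usual choice; the sketch below is a minimal standard-library alternative, assuming the file is dense (no sparse `{...}` rows), has unquoted attribute names, contains no `?` missing values, and stores every value as an integer. The function name `read_dense_arff` is hypothetical.

```python
from typing import List, Tuple


def read_dense_arff(text: str) -> Tuple[List[str], List[List[int]]]:
    """Parse a dense ARFF document whose data values are all integers.

    Returns (attribute_names, rows). Assumes unquoted attribute names,
    no sparse '{...}' data rows and no '?' missing values.
    """
    names: List[str] = []
    rows: List[List[int]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        if not in_data:
            keyword = line.split()[0].lower()  # ARFF keywords are case-insensitive
            if keyword == "@attribute":
                # "@attribute <name> {-1,0,1}": the name is the second token
                names.append(line.split()[1])
            elif keyword == "@data":
                in_data = True
            continue  # @relation and any other header lines are ignored
        row = [int(v) for v in line.split(",")]
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(row)}")
        rows.append(row)
    return names, rows
```

Usage, assuming a UTF-8 copy of the file in the working directory: `with open("Training Dataset.arff", encoding="utf-8") as f: names, rows = read_dense_arff(f.read())`. By ARFF convention the class attribute is usually the last column, but confirm this against the header of your copy.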
```shell
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
phishing_websites = fetch_ucirepo(id=327)

# data (as pandas dataframes)
X = phishing_websites.data.features
y = phishing_websites.data.targets

# metadata
print(phishing_websites.metadata)

# variable information
print(phishing_websites.variables)
```
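Since `fetch_ucirepo` returns pandas DataFrames, a quick sanity check before training can confirm the integer coding and produce a 0/1 label for binary classifiers. This is a minimal sketch under stated assumptions: features take values in {-1, 0, 1}, the targets frame has a single column coded -1/1, and -1 means phishing (verify this on your copy). `check_and_encode` is a hypothetical helper, not part of `ucimlrepo`.

```python
import pandas as pd


def check_and_encode(X: pd.DataFrame, y: pd.DataFrame) -> pd.Series:
    """Validate the assumed coding and return a 0/1 target (1 = phishing).

    Assumes every feature takes values in {-1, 0, 1} and the single target
    column takes values in {-1, 1}, with -1 meaning phishing.
    """
    if X.isna().any().any() or y.isna().any().any():
        raise ValueError("missing values found")
    allowed = {-1, 0, 1}
    bad = [c for c in X.columns if not set(X[c].unique()) <= allowed]
    if bad:
        raise ValueError(f"unexpected values in columns: {bad}")
    target = y.iloc[:, 0]
    if not set(target.unique()) <= {-1, 1}:
        raise ValueError("target is not coded as -1/1")
    # Positive class = phishing (assumed -1 in the original coding).
    return (target == -1).astype(int)
```

With the `X` and `y` fetched above, `labels = check_and_encode(X, y)` yields a Series suitable for most scikit-learn style classifiers; the features themselves can be used as-is, since they are already numeric.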
Mohammad, R. & McCluskey, L. (2012). Phishing Websites [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C51W2X.
Creators
Rami Mohammad
Lee McCluskey
DOI
10.24432/C51W2X
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.