Phishing Websites

Donated on 3/25/2015

This dataset was collected mainly from the PhishTank archive, the MillerSmiles archive, and Google's search operators.

Dataset Characteristics

Tabular

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Integer

# Instances

11055

# Features

Dataset Information

Additional Information

One of the challenges faced by our research was the unavailability of reliable training datasets; in fact, this challenge faces any researcher in the field. Although many articles about predicting phishing websites have been published in recent years, no reliable training dataset has been published publicly, perhaps because there is no agreement in the literature on the definitive features that characterize phishing webpages, which makes it difficult to shape a dataset that covers all possible features. In this dataset, we shed light on the important features that have proved to be sound and effective in predicting phishing websites. In addition, we propose some new features.

Has Missing Values?

Introductory Paper

An assessment of features related to phishing websites using an automated technique

By R. Mohammad, F. Thabtah, L. McCluskey. 2012

Published in International Conference for Internet Technology and Secured Transactions

Variables Table

Variable Name	Role	Type	Missing Values
having_ip_address	Feature	Integer	no
url_length	Feature	Integer	no
shortining_service	Feature	Integer	no
having_at_symbol	Feature	Integer	no
double_slash_redirecting	Feature	Integer	no
prefix_suffix	Feature	Integer	no
having_sub_domain	Feature	Integer	no
sslfinal_state	Feature	Integer	no
domain_registration_length	Feature	Integer	no
favicon	Feature	Integer	no

Showing the first 10 of 31 variables.

Additional Variable Information

For further information about the features, see the features file in the data folder.

Dataset Files

File	Size
Training Dataset.arff	782.1 KB
.old.arff	166.9 KB
Phishing Websites Features.docx	38.1 KB
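
The main data file is distributed in ARFF format. As a minimal sketch of how such a file might be read, the snippet below parses a simple dense ARFF stream using only the Python standard library. It assumes one `@attribute` line per column, unquoted attribute names without spaces, integer values, and `?` as the missing-value marker. The in-memory sample, its value sets, and the `result` column name are illustrative assumptions and are not taken from the dataset. A dedicated ARFF library would likely be more robust for real use.

```python
import io


def read_arff(stream):
    """Parse a simple dense ARFF stream into (attribute_names, rows).

    Sketch only: assumes unquoted attribute names without spaces,
    comma-separated integer values, and '?' for missing values.
    """
    names, rows = [], []
    in_data = False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": the second token is the name
                names.append(line.split(None, 2)[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        rows.append([None if v.strip() == "?" else int(v) for v in line.split(",")])
    return names, rows


# Illustrative sample (hypothetical values and target column name).
SAMPLE = """\
% tiny stand-in for the real file
@relation phishing_sample
@attribute having_ip_address {-1,1}
@attribute url_length {1,0,-1}
@attribute result {-1,1}
@data
-1,1,-1
1,0,1
"""

names, rows = read_arff(io.StringIO(SAMPLE))
has_missing = any(v is None for row in rows for v in row)
print(names, len(rows), has_missing)
```

To try this on the downloaded file, replace the `io.StringIO(SAMPLE)` argument with a file object, for example `open("Training Dataset.arff")`, adjusting the path to wherever the file was saved.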

Keywords

phishing

Creators

Rami Mohammad

Lee McCluskey

DOI

10.24432/C51W2X

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.