Website Phishing
Donated on 11/1/2016
Dataset Characteristics
Multivariate
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Integer
# Instances
1353
# Features
9
Dataset Information
Additional Information
The phishing problem is considered a vital issue in the e-commerce industry, especially e-banking, given the number of online transactions involving payments. We identified different features related to legitimate and phishy websites and collected 1353 different websites from different sources. Phishing websites were collected from the Phishtank data archive (www.phishtank.com), a free community site where users can submit, verify, track and share phishing data. The legitimate websites were collected from Yahoo and starting point directories using a web script developed in PHP. The PHP script was plugged into a browser, and we collected 548 legitimate websites out of the 1353 websites. There are 702 phishing URLs and 103 suspicious URLs. A website is considered SUSPICIOUS when it can be either phishy or legitimate, meaning that it holds some legitimate and some phishy features.
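The class counts quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the description (548, 702, 103 and 1353); the names `class_counts` and `class_shares` are illustrative choices, not part of the dataset.

```python
# Class counts as stated in the dataset description.
class_counts = {"legitimate": 548, "phishing": 702, "suspicious": 103}

# The three classes should account for every collected website.
total = sum(class_counts.values())

# Share of each class as a percentage, rounded to one decimal place.
class_shares = {
    label: round(100 * count / total, 1)
    for label, count in class_counts.items()
}

print(total)
print(class_shares)
```

The counts add up to the stated 1353 instances. Phishing is the largest class at roughly 52%, so a classifier that always predicts "phishing" would already reach about that accuracy, which is a useful floor when judging a model.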
Has Missing Values?
No
Introductory Paper
By Neda Abdelhamid, A. Ayesh, F. Thabtah. 2014
Published in Expert Systems with Applications
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
SFH | Feature | Integer | | | no |
popUpWindow | Feature | Integer | | | no |
SSLfinal_State | Feature | Integer | | | no |
Request_URL | Feature | Integer | | | no |
URL_of_Anchor | Feature | Integer | | | no |
web_traffic | Feature | Integer | | | no |
URL_Length | Feature | Integer | | | no |
age_of_domain | Feature | Integer | | | no |
having_IP_Address | Feature | Integer | | | no |
Result | Target | Integer | | | no |
Additional Variable Information
URL Anchor, Request URL, SFH, URL Length, Having '@', Prefix/Suffix, IP, Sub Domain, Web traffic, Domain age, Class
The collected features hold the categorical values "Legitimate", "Suspicious" and "Phishy"; these values have been replaced with the numerical values 1, 0 and -1 respectively. Details of each feature are given in the introductory paper.
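The numeric encoding described above can be reversed when readable labels are more convenient. The following is a minimal sketch, assuming the values arrive as plain integers; `CODE_TO_LABEL`, `decode` and the sample row are hypothetical names and data introduced for illustration, not part of the dataset or of `ucimlrepo`.

```python
# Encoding stated in the dataset description:
# Legitimate = 1, Suspicious = 0, Phishy = -1.
CODE_TO_LABEL = {1: "Legitimate", 0: "Suspicious", -1: "Phishy"}


def decode(values):
    """Map a sequence of numeric codes back to their categorical labels.

    Raises KeyError for any value outside {1, 0, -1}.
    """
    return [CODE_TO_LABEL[v] for v in values]


# Hypothetical values for illustration only (not taken from the data).
sample_row = [1, -1, 0]
print(decode(sample_row))
```

Failing loudly on an unexpected code is a deliberate choice here: it surfaces any column whose values do not follow the stated three-way encoding instead of silently mislabeling it.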
Dataset Files
File | Size |
---|---|
PhishingData.arff | 32.8 KB |
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    website_phishing = fetch_ucirepo(id=379)

    # data (as pandas dataframes)
    X = website_phishing.data.features
    y = website_phishing.data.targets

    # metadata
    print(website_phishing.metadata)

    # variable information
    print(website_phishing.variables)
Abdelhamid, N. (2014). Website Phishing [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5B301.
Creators
Neda Abdelhamid
DOI
10.24432/C5B301
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.