PhiUSIIL Phishing URL (Website)
Donated on 3/3/2024
The PhiUSIIL Phishing URL Dataset is a substantial dataset comprising 134,850 legitimate and 100,945 phishing URLs. Most of the URLs we analyzed while constructing the dataset were recent at the time of collection. Features are extracted from the URL itself and from the source code of the webpage. Some features, such as CharContinuationRate, URLTitleMatchScore, URLCharProb, and TLDLegitimateProb, are derived from existing features.
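As a quick sanity check on the figures above, the short sketch below adds the two class counts and reports each class's share of the whole. The counts come from the description; the variable names are only illustrative.

```python
# Class counts as stated in the dataset description.
LEGITIMATE = 134_850
PHISHING = 100_945

total = LEGITIMATE + PHISHING          # should equal the "# Instances" figure
legit_share = LEGITIMATE / total
phishing_share = PHISHING / total

print(f"total URLs:       {total}")
print(f"legitimate share: {legit_share:.1%}")
print(f"phishing share:   {phishing_share:.1%}")
```

The total comes to 235,795, which matches the instance count listed for the dataset. The classes are moderately imbalanced, so accuracy alone may be a weak evaluation metric.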
Dataset Characteristics
Tabular
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Real, Categorical, Integer
# Instances
235795
# Features
54
Dataset Information
What do the instances in this dataset represent?
URLs and their corresponding webpages
Has Missing Values?
No
Introductory Paper
By Arvind Prasad and Shalini Chandra. 2024
Published in Computers & Security
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
FILENAME | Other | Categorical | | | no |
URL | Feature | Categorical | | | no |
URLLength | Feature | Integer | | | no |
Domain | Feature | Categorical | | | no |
DomainLength | Feature | Integer | | | no |
IsDomainIP | Feature | Integer | | | no |
TLD | Feature | Categorical | | | no |
URLSimilarityIndex | Feature | Integer | | | no |
CharContinuationRate | Feature | Integer | | | no |
TLDLegitimateProb | Feature | Continuous | | | no |
Showing the first 10 of 56 variables.
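The table gives no descriptions for the variables, so the following is only a rough sketch of how URL-level columns such as URLLength, Domain, DomainLength, IsDomainIP, and TLD might be computed from a raw URL string. The definitions here are assumptions made for illustration (for example, treating the TLD as the last dot-separated label of the host), and the function name `basic_url_features` is hypothetical. The dataset authors' exact definitions may differ, so the values need not match the published columns.

```python
import ipaddress
from urllib.parse import urlsplit


def basic_url_features(url: str) -> dict:
    """Compute a few lexical URL features (illustrative definitions only)."""
    host = urlsplit(url).hostname or ""

    # Assumption: IsDomainIP is 1 when the host parses as an IPv4/IPv6 address.
    try:
        ipaddress.ip_address(host)
        is_ip = 1
    except ValueError:
        is_ip = 0

    # Assumption: the TLD is the last dot-separated label of a non-IP host.
    tld = "" if is_ip or "." not in host else host.rsplit(".", 1)[1]

    return {
        "URLLength": len(url),        # length of the full URL string
        "Domain": host,
        "DomainLength": len(host),
        "IsDomainIP": is_ip,
        "TLD": tld,
    }


features = basic_url_features("https://www.example.com/login")
ip_features = basic_url_features("http://192.168.0.1/a")
print(features)
print(ip_features)
```

Only the standard library is used here, so the sketch can be compared against a few rows of the dataset to see where its assumed definitions diverge from the published values.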
Additional Variable Information
Column "FILENAME" can be ignored.
Class Labels
Label 1 corresponds to a legitimate URL; label 0 corresponds to a phishing URL.
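The two notes above (ignore FILENAME; 1 means legitimate and 0 means phishing) can be applied while reading the CSV, as in the minimal sketch below. It assumes the target column is named `label`, which this page does not state, and it runs on a tiny made-up sample rather than the real file. The helper `read_rows` is hypothetical.

```python
import csv
import io
from collections import Counter

# Mapping taken from the class-label note: 1 = legitimate, 0 = phishing.
LABEL_NAMES = {"1": "legitimate", "0": "phishing"}


def read_rows(handle, label_column="label"):
    """Yield (features, class_name) pairs, dropping the FILENAME column.

    Assumption: the target column is called "label"; pass another name
    via label_column if the file uses a different header.
    """
    for row in csv.DictReader(handle):
        row.pop("FILENAME", None)                     # column can be ignored
        class_name = LABEL_NAMES[row.pop(label_column)]
        yield row, class_name


# Made-up two-row sample standing in for PhiUSIIL_Phishing_URL_Dataset.csv.
sample = io.StringIO(
    "FILENAME,URL,URLLength,label\n"
    "a.txt,https://www.example.com,23,1\n"
    "b.txt,http://203.0.113.7/login,24,0\n"
)

rows = list(read_rows(sample))
counts = Counter(class_name for _, class_name in rows)
print(counts)
```

To run this against the real data, `sample` would be replaced by a file handle opened on the downloaded CSV, for example with `open(path, newline="")`. Because 1 marks the legitimate class, metrics that treat 1 as the "positive" class will report on legitimate URLs rather than phishing ones unless the labels are inverted.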
Dataset Files
File | Size |
---|---|
PhiUSIIL_Phishing_URL_Dataset.csv | 54.2 MB |
Install the ucimlrepo package:

    pip install ucimlrepo

Import the dataset in Python:

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    phiusiil_phishing_url_website = fetch_ucirepo(id=967)

    # data (as pandas dataframes)
    X = phiusiil_phishing_url_website.data.features
    y = phiusiil_phishing_url_website.data.targets

    # metadata
    print(phiusiil_phishing_url_website.metadata)

    # variable information
    print(phiusiil_phishing_url_website.variables)

Prasad, A. & Chandra, S. (2024). PhiUSIIL Phishing URL (Website) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.1016/j.cose.2023.103545.
Creators
Arvind Prasad
arvindbitm@gmail.com
Babasaheb Bhimrao Ambedkar University
Shalini Chandra
Babasaheb Bhimrao Ambedkar University
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.