YouTube Spam Collection
Donated on 3/25/2017
A public set of comments collected for spam research. It comprises five datasets containing 1,956 real messages extracted from five videos that were among the 10 most viewed during the collection period.
Dataset Characteristics
Text
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
-
# Instances
1956
# Features
3
Dataset Information
Additional Information
The table below lists the datasets, the YouTube video ID, the number of samples in each class, and the total number of samples per dataset.

Dataset | YouTube ID | # Spam | # Ham | Total |
---|---|---|---|---|
Psy | 9bZkp7q19f0 | 175 | 175 | 350 |
KatyPerry | CevxZvSJLk8 | 175 | 175 | 350 |
LMFAO | KQ6zr6kCPj8 | 236 | 202 | 438 |
Eminem | uelHwf8o7_U | 245 | 203 | 448 |
Shakira | pRpeEdMmmQ0 | 174 | 196 | 370 |

Note: the chronological order of the comments was kept.
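As a quick consistency check on the table above, the per-dataset counts can be summed and compared with the stated 1,956 instances. This is a minimal sketch: the counts are copied from the table, while the names `COUNTS` and `summarize` are illustrative and not part of the dataset or any library.

```python
# Per-dataset class counts copied from the table above: (spam, ham).
COUNTS = {
    "Psy": (175, 175),
    "KatyPerry": (175, 175),
    "LMFAO": (236, 202),
    "Eminem": (245, 203),
    "Shakira": (174, 196),
}


def summarize(counts):
    """Return total spam, total ham, and overall size of the collection."""
    spam = sum(s for s, _ in counts.values())
    ham = sum(h for _, h in counts.values())
    return {"spam": spam, "ham": ham, "total": spam + ham}


summary = summarize(COUNTS)
spam_share = summary["spam"] / summary["total"]

# Print one line per dataset, then the totals for the whole collection.
for name, (spam, ham) in COUNTS.items():
    print(f"{name:<10} spam={spam:>3} ham={ham:>3} total={spam + ham:>3}")
print(f"all        spam={summary['spam']} ham={summary['ham']} total={summary['total']}")
print(f"spam share: {spam_share:.1%}")
```

The totals come to 1,005 spam and 951 ham comments, 1,956 overall, which agrees with the instance count given for the collection.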
Has Missing Values?
No
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
VIDEO | ID | Categorical | | | no |
COMMENT_ID | ID | Categorical | | | no |
AUTHOR | Feature | Categorical | | | no |
DATE | Feature | Categorical | | | no |
CONTENT | Feature | Categorical | | | no |
CLASS | Target | Binary | | | no |
Additional Variable Information
The collection consists of one CSV file per dataset, where each line has the following attributes:
COMMENT_ID,AUTHOR,DATE,CONTENT,TAG
One example is shown below:
z12oglnpoq3gjh4om04cfdlbgp2uepyytpw0k,Francisco Nora,2013-11-28T19:52:35,please like :D https://premium.easypromosapp.com/voteme/19924/616375350,1
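The line layout described above can be read with Python's standard `csv` module. The sketch below parses the example line into named fields, using the attribute names from the description (the last field is called TAG there, while the Variables Table lists it as CLASS). It assumes the input has no header row and that DATE is an ISO 8601 timestamp like the one in the example. The helper name `parse_rows` is hypothetical, and the tag is left as an integer because the description does not say which value marks spam.

```python
import csv
import io
from datetime import datetime

# Attribute order as given in the description above.
FIELDS = ["COMMENT_ID", "AUTHOR", "DATE", "CONTENT", "TAG"]

# The example line from the description, used as inline test data.
SAMPLE = (
    "z12oglnpoq3gjh4om04cfdlbgp2uepyytpw0k,Francisco Nora,"
    "2013-11-28T19:52:35,"
    "please like :D https://premium.easypromosapp.com/voteme/19924/616375350,1\n"
)


def parse_rows(text):
    """Parse headerless CSV text into a list of dicts with typed DATE and TAG."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    rows = []
    for row in reader:
        # Defensive assumption: tolerate an empty DATE field.
        row["DATE"] = datetime.fromisoformat(row["DATE"]) if row["DATE"] else None
        row["TAG"] = int(row["TAG"])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
print(rows[0]["AUTHOR"], rows[0]["DATE"].date(), rows[0]["TAG"])
```

Using `csv` rather than splitting on commas matters here because comment text can itself contain commas, in which case the field is normally quoted in the file.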
Dataset Files
File | Size |
---|---|
Youtube04-Eminem.csv | 81 KB |
Youtube05-Shakira.csv | 71 KB |
Youtube03-LMFAO.csv | 62.9 KB |
Youtube02-KatyPerry.csv | 62.8 KB |
Youtube01-Psy.csv | 56.1 KB |
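Because the comments are split across one file per video, it can be convenient to load them into a single list and record which video each row came from. The following is a minimal sketch under two assumptions: that each file starts with a header row naming its columns, and that the files follow the `YoutubeNN-Name.csv` naming seen in the table above. The helpers `video_name` and `load_collection` are hypothetical names, and `utf-8-sig` is chosen only so that a byte-order mark, if present, is skipped.

```python
import csv
import re
from pathlib import Path

# Matches names such as "Youtube04-Eminem.csv" and captures the video label.
FILENAME_PATTERN = re.compile(r"^Youtube(\d+)-(.+)\.csv$")


def video_name(filename):
    """Extract the video label (e.g. "Eminem") from a dataset file name."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.group(2)


def load_collection(directory):
    """Read every YoutubeNN-Name.csv in a directory into one list of dicts."""
    rows = []
    for path in sorted(Path(directory).glob("Youtube*.csv")):
        video = video_name(path.name)
        with path.open(newline="", encoding="utf-8-sig") as handle:
            # Assumes the first line of each file is a header row.
            for row in csv.DictReader(handle):
                row["VIDEO"] = video  # tag each comment with its source video
                rows.append(row)
    return rows
```

Calling `load_collection` on the directory that holds the downloaded files returns the rows in file-name order, so the comments of each video stay in the order they appear in its file.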
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
youtube_spam_collection = fetch_ucirepo(id=380)

# data (as pandas dataframes)
X = youtube_spam_collection.data.features
y = youtube_spam_collection.data.targets

# metadata
print(youtube_spam_collection.metadata)

# variable information
print(youtube_spam_collection.variables)
Alberto, T. & Lochter, J. (2015). YouTube Spam Collection [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C58885.
Creators
T.C. Alberto
J.V. Lochter
DOI
10.24432/C58885
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.