YouTube Spam Collection

Donated on 3/25/2017

It is a public set of comments collected for spam research. It comprises five datasets totaling 1,956 real messages extracted from five videos that were among the 10 most viewed during the collection period.

Dataset Characteristics

Text

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

-

# Instances

1956

# Features

3

Dataset Information

Additional Information

The table below lists the datasets, the YouTube video ID, the number of samples in each class, and the total number of samples per dataset.

Dataset      YouTube ID     # Spam   # Ham   Total
Psy          9bZkp7q19f0       175     175     350
KatyPerry    CevxZvSJLk8       175     175     350
LMFAO        KQ6zr6kCPj8       236     202     438
Eminem       uelHwf8o7_U       245     203     448
Shakira      pRpeEdMmmQ0       174     196     370

Note: the chronological order of the comments was kept.
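As a quick consistency check, the per-dataset counts can be summed and compared against the 1,956 instances reported for the whole collection. The following is a minimal sketch: the counts are copied by hand from the table, and the names `COUNTS` and `summarize` are illustrative, not part of the dataset.

```python
# Class counts copied from the table: (spam, ham) per dataset.
COUNTS = {
    "Psy": (175, 175),
    "KatyPerry": (175, 175),
    "LMFAO": (236, 202),
    "Eminem": (245, 203),
    "Shakira": (174, 196),
}


def summarize(counts):
    """Return per-dataset totals and spam fractions, plus overall class totals."""
    rows = {}
    for name, (spam, ham) in counts.items():
        total = spam + ham
        rows[name] = {"total": total, "spam_fraction": spam / total}
    total_spam = sum(spam for spam, _ in counts.values())
    total_ham = sum(ham for _, ham in counts.values())
    return rows, total_spam, total_ham


rows, total_spam, total_ham = summarize(COUNTS)
for name, row in rows.items():
    print(f"{name:<10} total={row['total']:>4}  spam={row['spam_fraction']:.1%}")
print(f"All        total={total_spam + total_ham}  spam={total_spam}  ham={total_ham}")
```

The classes are close to balanced in every dataset, so plain accuracy is a reasonable first metric, although that choice is a judgment call rather than something the collection prescribes.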

Has Missing Values?

No

Variables Table

Variable Name   Role      Type          Description   Units   Missing Values
VIDEO           ID        Categorical   -             -       no
COMMENT_ID      ID        Categorical   -             -       no
AUTHOR          Feature   Categorical   -             -       no
DATE            Feature   Categorical   -             -       no
CONTENT         Feature   Categorical   -             -       no
CLASS           Target    Binary        -             -       no


Additional Variable Information

The collection is composed of one CSV file per dataset, where each line has the following attributes:

COMMENT_ID,AUTHOR,DATE,CONTENT,TAG

We offer one example below:

z12oglnpoq3gjh4om04cfdlbgp2uepyytpw0k,Francisco Nora,2013-11-28T19:52:35,please like :D https://premium.easypromosapp.com/voteme/19924/616375350,1
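The line format above can be read with Python's standard `csv` module. The sketch below makes several assumptions that the description does not confirm: that each file starts with a header row naming the five attributes, that a TAG value of 1 marks spam (the example line looks like spam and carries a 1), and that TAG plays the role of the CLASS variable in the variables table. The function name `read_comments` and the in-memory `SAMPLE` are illustrative only.

```python
import csv
import io
from datetime import datetime

# In-memory stand-in for one CSV file. The header row is an assumption;
# the data row is the example line quoted in the description.
SAMPLE = (
    "COMMENT_ID,AUTHOR,DATE,CONTENT,TAG\n"
    "z12oglnpoq3gjh4om04cfdlbgp2uepyytpw0k,Francisco Nora,2013-11-28T19:52:35,"
    "please like :D https://premium.easypromosapp.com/voteme/19924/616375350,1\n"
)


def read_comments(stream):
    """Yield one dict per comment from a text stream in the format above."""
    for row in csv.DictReader(stream):
        yield {
            "comment_id": row["COMMENT_ID"],
            "author": row["AUTHOR"],
            # Parse ISO 8601 timestamps; tolerate an empty field defensively.
            "date": datetime.fromisoformat(row["DATE"]) if row["DATE"] else None,
            "content": row["CONTENT"],
            # Assumption: TAG == "1" means spam, "0" means ham.
            "is_spam": row["TAG"] == "1",
        }


comments = list(read_comments(io.StringIO(SAMPLE)))
print(comments[0]["author"], comments[0]["date"], comments[0]["is_spam"])
```

To read a downloaded file instead of the sample, pass an open file object, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. Using `csv` rather than splitting on commas matters because comment text may itself contain commas inside quoted fields.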


Creators

T.C. Alberto

J.V. Lochter
