DBWorld e-mails

Donated on 11/5/2011

It contains 64 e-mails that I manually collected from the DBWorld mailing list. They are classified into two classes: 'announces of conferences' and 'everything else'.

Dataset Characteristics

Text

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

-

# Instances

64

# Features

4702

Dataset Information

Additional Information

I collected 64 e-mails from the DBWorld newsletter and used them to train different classification algorithms to distinguish between 'announces of conferences' and 'everything else'. Each e-mail was encoded as a binary bag-of-words vector, after removing stopwords in a pre-processing step.
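The binary bag-of-words encoding with stopword removal can be sketched in Python as follows. This is a minimal illustration, not the pipeline used to build the data set: the tokenizer (lowercase alphanumeric runs), the tiny `STOPWORDS` set, and the helper names `tokenize`, `build_vocabulary` and `binary_bag_of_words` are all assumptions made for the example. Each e-mail becomes a 0/1 vector with one position per vocabulary word, set to 1 if the word appears at least once, regardless of how often.

```python
import re

# Hypothetical, deliberately tiny stopword list; the list actually used
# for the data set is not documented on this page.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "for", "to", "is", "are", "we"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Collect every non-stopword token across all documents, sorted for stable indices."""
    vocab = set()
    for doc in documents:
        vocab.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return sorted(vocab)


def binary_bag_of_words(documents, vocabulary):
    """Return one 0/1 vector per document: 1 if the word occurs at least once."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for token in tokenize(doc):
            if token in index:  # stopwords and unseen words are simply ignored
                row[index[token]] = 1
        vectors.append(row)
    return vectors


if __name__ == "__main__":
    emails = [
        "Call for papers: the conference on databases",
        "We are hiring a postdoc in databases",
    ]
    vocab = build_vocabulary(emails)
    print(vocab)  # ['call', 'conference', 'databases', 'hiring', 'papers', 'postdoc']
    print(binary_bag_of_words(emails, vocab))  # [[1, 1, 1, 0, 1, 0], [0, 0, 1, 1, 0, 1]]
```

Sorting the vocabulary keeps attribute indices reproducible between runs, which matters when the vectors are written to a fixed-column format such as ARFF.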

Has Missing Values?

No

Variable Information

Each attribute corresponds to a specific word or stem in the vocabulary of the entire data set (bag-of-words representation).
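Because every attribute is a 0/1 indicator for one vocabulary word, a Bernoulli naive Bayes classifier is a natural fit for these vectors. The sketch below is one such model, offered only as an illustration: this page does not say which algorithms were trained, so it is not presented as one of them. The function names `train_bernoulli_nb` and `predict`, the Laplace smoothing parameter `alpha`, and the toy labels are hypothetical.

```python
import math


def train_bernoulli_nb(vectors, labels, alpha=1.0):
    """Estimate class log-priors and per-word presence probabilities (Laplace-smoothed)."""
    n_features = len(vectors[0])
    model = {}
    for cls in sorted(set(labels)):  # sorted for deterministic tie-breaking
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        prior = len(rows) / len(vectors)
        # P(word j present | class); smoothing keeps every estimate strictly in (0, 1).
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (math.log(prior), probs)
    return model


def predict(model, vector):
    """Return the class with the highest log-posterior for a 0/1 word vector."""
    best_cls, best_score = None, -math.inf
    for cls, (log_prior, probs) in model.items():
        score = log_prior
        for x, p in zip(vector, probs):
            # Absent words contribute too; this is what makes the model Bernoulli.
            score += math.log(p) if x else math.log(1.0 - p)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


if __name__ == "__main__":
    # Toy vectors over a hypothetical 6-word vocabulary.
    X = [[1, 1, 1, 0, 1, 0], [0, 0, 1, 1, 0, 1]]
    y = ["conf", "other"]
    model = train_bernoulli_nb(X, y)
    print(predict(model, [1, 0, 0, 0, 1, 0]))  # conf
```

Working in log space avoids numeric underflow when multiplying thousands of per-word probabilities, which is relevant with a vocabulary of 4702 attributes.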

Dataset Files

File                                Size
WEKA/dbworld_bodies.arff            704 KB
WEKA/dbworld_bodies_stemmed.arff    552.9 KB
MATLAB/dbworld_bodies.mat           40.2 KB
WEKA/dbworld_subjects.arff          36.3 KB
WEKA/dbworld_subjects_stemmed.arff  34.1 KB
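The ARFF files can be opened directly in WEKA. For a quick look without WEKA, a minimal standard-library reader is sketched below. It assumes attribute names without spaces or quotes, values without embedded commas, and, for sparse rows written as `{index value, ...}`, that omitted entries are "0". The function name `read_arff` and the toy attribute names are hypothetical; this page does not state whether the DBWorld files use dense or sparse rows, or how their attributes are named.

```python
def read_arff(lines):
    """Parse a simple ARFF file into (attribute_names, rows of string values).

    Assumptions: attribute names contain no spaces or quotes, values contain
    no commas, and entries omitted from sparse rows default to "0".
    """
    names, rows, in_data = [], [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                names.append(line.split()[1])  # "@attribute <name> <type>"
            elif lower.startswith("@data"):
                in_data = True
            continue  # @relation and other header lines are ignored
        if line.startswith("{"):
            # Sparse row: "{index value, index value, ...}"
            row = ["0"] * len(names)
            for pair in line.strip("{}").split(","):
                if pair.strip():
                    idx, value = pair.strip().split(None, 1)
                    row[int(idx)] = value.strip()
            rows.append(row)
        else:
            # Dense row: one comma-separated value per attribute
            rows.append([v.strip() for v in line.split(",")])
    return names, rows


if __name__ == "__main__":
    sample = """% toy file in the assumed layout
@relation toy
@attribute call {0,1}
@attribute hiring {0,1}
@attribute label {0,1}
@data
1,0,1
{1 1}
"""
    names, rows = read_arff(sample.splitlines())
    print(names)  # ['call', 'hiring', 'label']
    print(rows)   # [['1', '0', '1'], ['0', '1', '0']]
```

For real work, an established ARFF parser is preferable; this sketch only illustrates the file structure.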

Only 5 of the 9 files are listed above.


Creators

Michele Filannino
