DBWorld e-mails
Donated on 11/5/2011
This dataset contains 64 e-mails that I manually collected from the DBWorld mailing list. Each e-mail is classified as either 'announces of conferences' or 'everything else'.
Dataset Characteristics
Text
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
-
# Instances
64
# Features
4702
Dataset Information
Additional Information
I collected 64 e-mails from the DBWorld newsletter and used them to train different algorithms to distinguish 'announces of conferences' from 'everything else'. I used a binary bag-of-words representation, preceded by a stopword-removal pre-processing step.
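The representation described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the stopword list, the tokenization rule, and the two sample e-mails are all assumptions made up for the example.

```python
import re

# Hypothetical stopword list for illustration; the list actually used is not given.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "is", "are", "we"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_vocabulary(documents):
    """Sorted list of every distinct non-stopword token across all documents."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def binary_bag_of_words(documents):
    """Return (vocabulary, matrix); matrix[i][j] is 1 if word j occurs in document i."""
    vocabulary = build_vocabulary(documents)
    rows = []
    for doc in documents:
        present = set(tokenize(doc))
        rows.append([1 if word in present else 0 for word in vocabulary])
    return vocabulary, rows


# Made-up sample texts, not taken from the dataset.
emails = [
    "Call for papers: the workshop on data management",
    "Postdoc position in data mining",
]
vocabulary, matrix = binary_bag_of_words(emails)
```

Because the representation is binary, a word that appears several times in an e-mail still yields a single 1 in its column.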
Has Missing Values?
No
Variable Information
Each attribute corresponds to a specific word or stem in the data set's vocabulary (I used a bag-of-words representation).
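Since each attribute stands for one vocabulary entry, a binary row can be mapped back to the words it marks as present. The sketch below assumes the attribute names are available as a list in column order; the names and the row shown are invented for illustration.

```python
def words_present(feature_names, row):
    """Return the vocabulary entries whose column is set to 1 in this row."""
    return [name for name, value in zip(feature_names, row) if value == 1]


# Hypothetical attribute names and one hypothetical instance.
feature_names = ["conference", "deadline", "position", "submission"]
row = [1, 1, 0, 1]
present = words_present(feature_names, row)
```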
Dataset Files
File | Size |
---|---|
WEKA/dbworld_bodies.arff | 704 KB |
WEKA/dbworld_bodies_stemmed.arff | 552.9 KB |
MATLAB/dbworld_bodies.mat | 40.2 KB |
WEKA/dbworld_subjects.arff | 36.3 KB |
WEKA/dbworld_subjects_stemmed.arff | 34.1 KB |
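The `.arff` files listed above are in WEKA's ARFF text format. The following is a minimal sketch of a reader that uses only the standard library. It handles a small subset of ARFF (comment lines, `@attribute` declarations, and dense or sparse `@data` rows) and assumes attribute names contain no whitespace. The sample content, including the attribute names and the `CLASS` column, is hypothetical and does not reflect the real files.

```python
def parse_arff(text):
    """Parse a minimal subset of ARFF; returns (attribute_names, rows of strings)."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>"; assumes the name has no whitespace
                names.append(line.split(None, 2)[1].strip("'\""))
            elif lower.startswith("@data"):
                in_data = True
            continue
        if line.startswith("{"):
            # Sparse row "{index value, ...}"; unlisted entries default to "0"
            row = ["0"] * len(names)
            body = line[1:-1].strip()
            if body:
                for pair in body.split(","):
                    index, value = pair.split(None, 1)
                    row[int(index)] = value.strip()
        else:
            # Dense row: one comma-separated value per attribute
            row = [cell.strip() for cell in line.split(",")]
        rows.append(row)
    return names, rows


# Toy content for illustration only; not the real data.
sample = """% toy file
@relation toy
@attribute conference numeric
@attribute deadline numeric
@attribute CLASS {0,1}
@data
1,1,1
{1 1}
"""
names, rows = parse_arff(sample)
```

With a file extracted locally, the same function could be called as `parse_arff(open("WEKA/dbworld_bodies.arff").read())`, assuming that path exists and the file stays within the subset handled here.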
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    dbworld_e_mails = fetch_ucirepo(id=219)

    # data (as pandas dataframes)
    X = dbworld_e_mails.data.features
    y = dbworld_e_mails.data.targets

    # metadata
    print(dbworld_e_mails.metadata)

    # variable information
    print(dbworld_e_mails.variables)
Filannino, M. (2011). DBWorld e-mails [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5589M.
Creators
Michele Filannino
DOI
10.24432/C5589M
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.