DBWorld e-mails
Donated on 11/5/2011
This dataset contains 64 e-mails that I manually collected from the DBWorld mailing list. Each e-mail is classified as either 'announcements of conferences' or 'everything else'.
Dataset Characteristics
Text
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
-
# Instances
64
# Features
4702
Dataset Information
Additional Information
I collected 64 e-mails from the DBWorld newsletter and used them to train different algorithms to classify each e-mail as either 'announcements of conferences' or 'everything else'. I used a binary bag-of-words representation, with stopword removal applied beforehand as a pre-processing step.
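The representation described above can be illustrated with a minimal sketch. It is not the author's actual pipeline: the stopword list, the tokenization rule, the helper names (`tokenize`, `build_vocabulary`, `binary_bag_of_words`), and the two sample e-mails are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list; the list actually used
# to build the dataset is not stated.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_vocabulary(documents):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def binary_bag_of_words(text, vocabulary):
    """Return one 0/1 entry per vocabulary word: 1 if the word occurs in the text."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


# Made-up sample e-mails, not taken from the dataset.
emails = [
    "Call for papers: the conference on databases",
    "Postdoc position in data management",
]

vocabulary = build_vocabulary(emails)
vectors = [binary_bag_of_words(email, vocabulary) for email in emails]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each vector has one entry per vocabulary word, and an entry records only whether the word is present, not how often it occurs. This is why a collection of 64 short e-mails can still produce several thousand features.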
Has Missing Values?
No
Variable Information
Each attribute corresponds to a specific word or stem in the data set's vocabulary (I used a bag-of-words representation).
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
dbworld_e_mails = fetch_ucirepo(id=219)

# data (as pandas dataframes)
X = dbworld_e_mails.data.features
y = dbworld_e_mails.data.targets

# metadata
print(dbworld_e_mails.metadata)

# variable information
print(dbworld_e_mails.variables)
Filannino, Michele. (2011). DBWorld e-mails. UCI Machine Learning Repository. https://doi.org/10.24432/C5589M.
@misc{misc_dbworld_e-mails_219,
  author       = {Filannino, Michele},
  title        = {{DBWorld e-mails}},
  year         = {2011},
  howpublished = {UCI Machine Learning Repository},
  note         = {{DOI}: https://doi.org/10.24432/C5589M}
}
Creators
Michele Filannino
DOI
10.24432/C5589M
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.