Center for Machine Learning and Intelligent Systems

DBWorld e-mails Data Set

Abstract: The data set contains 64 e-mails that I collected manually from the DBWorld mailing list. Each e-mail is classified as either 'announces of conferences' or 'everything else'.

Data Set Characteristics: Text
Number of Instances: 64
Area: Computer
Attribute Characteristics: N/A
Number of Attributes: 4702
Date Donated: 2011-11-06
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 63661


Source:

Michele Filannino, PhD
University of Manchester
Centre for Doctoral Training
Email: filannim_AT_cs.man.ac.uk


Data Set Information:

I collected 64 e-mails from the DBWorld newsletter and used them to train different algorithms to distinguish 'announces of conferences' from 'everything else'. Each e-mail is encoded as a binary bag-of-words vector, built after a stopword-removal pre-processing step.
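A binary bag-of-words encoding with stopword removal can be sketched as follows. This is only an illustration of the general technique, not the code used to build the data set: the stopword list, the tokenization rule, the helper names, and the two sample texts are all assumptions, and no stemming is applied.

```python
import re

# Hypothetical stopword list, for illustration only; the list actually used
# for the data set is not given in this description.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_vocabulary(documents):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def binary_bag_of_words(text, vocabulary):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


# Made-up sample texts, not taken from the data set.
emails = [
    "Call for papers: the conference on databases",
    "Job opening for a database researcher",
]
vocabulary = build_vocabulary(emails)
vectors = [binary_bag_of_words(e, vocabulary) for e in emails]
```

Each vector records only whether a word occurs, not how often, which is what distinguishes a binary bag-of-words from a count-based one.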


Attribute Information:

Each attribute corresponds to one specific word or stem in the vocabulary of the entire data set (I used a bag-of-words representation).
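Because every attribute stands for one vocabulary entry, a row of the data set can be read back as the set of words it contains. The sketch below shows that lookup; the function name, the four-word vocabulary, and the sample row are hypothetical and are not taken from the data files.

```python
def words_present(row, vocabulary):
    """Return the vocabulary entries whose attribute is set to 1 in the row."""
    if len(row) != len(vocabulary):
        raise ValueError("row and vocabulary must have the same length")
    return [word for word, flag in zip(vocabulary, row) if flag == 1]


# Hypothetical vocabulary and row, for illustration only.
vocabulary = ["call", "conference", "deadline", "job"]
row = [1, 1, 0, 0]
present = words_present(row, vocabulary)
print(present)
```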


Relevant Papers:

Michele Filannino, 'DBWorld e-mail classification using a very small corpus', Project of Machine Learning course, University of Manchester, 2011.



Citation Request:

Thanks to ACM-SIGMOD for its useful service! :)

