Center for Machine Learning and Intelligent Systems

DBWorld e-mails Data Set

Abstract: This data set contains 64 e-mails that I manually collected from the DBWorld mailing list. They are classified into two categories: 'announces of conferences' and 'everything else'.

Data Set Characteristics: Text
Number of Instances: 64
Area: Computer
Attribute Characteristics: N/A
Number of Attributes: 4702
Date Donated: 2011-11-06
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 26477


Source:

Michele Filannino, PhD
University of Manchester
Centre for Doctoral Training
Email: filannim_AT_cs.man.ac.uk


Data Set Information:

I collected 64 e-mails from the DBWorld newsletter and used them to train different algorithms to distinguish between 'announces of conferences' and 'everything else'. Each e-mail is encoded with a binary bag-of-words representation, built after a stopword-removal pre-processing step.
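
The encoding described above can be illustrated with a minimal sketch. It is not the author's actual pipeline: the stopword list, the tokenizer, the function names, and the two example texts are all assumptions made up for illustration, and the description does not say which stopword list or stemmer was used.

```python
import re

# Hypothetical, tiny stopword list for illustration only; the list used
# to build the data set is not specified in this description.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Return the sorted list of distinct non-stopword tokens."""
    vocab = set()
    for doc in documents:
        vocab.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return sorted(vocab)

def binary_bag_of_words(doc, vocabulary):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in doc."""
    present = set(tokenize(doc))
    return [1 if word in present else 0 for word in vocabulary]

# Made-up example texts, not taken from the data set.
emails = [
    "Call for papers: workshop on data mining",
    "Job opening for a database researcher",
]
vocabulary = build_vocabulary(emails)
vectors = [binary_bag_of_words(e, vocabulary) for e in emails]
```

Each position in a vector corresponds to one vocabulary word, and the value records only whether the word occurs, not how often. In the data set itself, the same idea yields one row per e-mail and one column per word or stem.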


Attribute Information:

Each attribute corresponds to one specific word or stem in the vocabulary of the entire data set (I used a bag-of-words representation).


Relevant Papers:

Michele Filannino, 'DBWorld e-mail classification using a very small corpus', Project of Machine Learning course, University of Manchester, 2011.



Citation Request:

Thanks to ACM-SIGMOD for its useful service! :)

