Center for Machine Learning and Intelligent Systems

Farm Ads Data Set

Abstract: This data was collected from text ads found on twelve websites that deal with various farm animal related topics. The binary labels are based on whether or not the content owner approves of the ad.

Data Set Characteristics: Text
Number of Instances: 4143
Area: Business
Attribute Characteristics: N/A
Number of Attributes: 54877
Date Donated: 2011-10-18
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 94800


Source:

Chris Mesterharm and Michael J. Pazzani
Rutgers, The State University of New Jersey
mesterha '@' cs.rutgers.edu


Data Set Information:

This data was collected from text ads found on twelve websites that deal with various farm animal related topics. Information from the ad creative and the ad landing page is included. The binary labels are based on whether or not the content owner approves of the ad.

For each ad, we include the words on the ad creative and the words from the landing page. Each word from the creative is given the prefix 'ad-'. Words inside title and header HTML markup on the landing page are marked in a similar way. Stemming and stop-word removal have already been applied. Each ad is on a single line. The first word on the line is the label of the instance: 1 for accepted ads and -1 for rejected ads.
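
The line layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' code: the function name `parse_farm_ad_line` and the sample line are made up for illustration, and only the 'ad-' prefix is handled, because the exact markers used for title and header words are not stated here.

```python
from typing import List, Tuple


def parse_farm_ad_line(line: str) -> Tuple[int, List[str], List[str]]:
    """Split one line into (label, creative_words, landing_page_words).

    Assumes whitespace-separated tokens, with the label (1 or -1) first.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = int(tokens[0])
    if label not in (1, -1):
        raise ValueError(f"unexpected label: {tokens[0]!r}")
    words = tokens[1:]
    # Words from the ad creative carry the 'ad-' prefix; strip it here.
    creative = [w[len("ad-"):] for w in words if w.startswith("ad-")]
    # Everything else is treated as landing-page text (markers left as-is).
    landing = [w for w in words if not w.startswith("ad-")]
    return label, creative, landing


# Hypothetical sample line, not taken from the data set.
sample = "-1 ad-hors ad-feed hors feed suppli"
label, creative, landing = parse_farm_ad_line(sample)
```

Keeping the creative and landing-page words in separate lists makes it easy to build features from either source on its own.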

We have also included a straightforward bag-of-words representation of the data in the SVMlight sparse vector format. Each line starts with the label, followed by every nonzero attribute encoded as index:value. This is the representation used in the relevant paper cited below.
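
As an illustration of the sparse format, the sketch below parses one line into a label and a dictionary of nonzero attributes. The function name `parse_svmlight_line` and the sample line are hypothetical, and the sketch assumes plain "label index:value ..." lines without comments or other optional SVMlight fields.

```python
from typing import Dict, Tuple


def parse_svmlight_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse 'label index:value index:value ...' into (label, features)."""
    parts = line.split()
    if not parts:
        raise ValueError("empty line")
    label = int(parts[0])
    features: Dict[int, float] = {}
    for item in parts[1:]:
        # Each nonzero attribute is written as index:value.
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Hypothetical sample line, not taken from the data set.
sample_vec = "1 3:1 17:1 42:1"
vec_label, vec_features = parse_svmlight_line(sample_vec)
```

For whole files, an existing loader such as scikit-learn's `load_svmlight_file` is likely a better choice than hand-written parsing, assuming the file follows the standard format.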


Attribute Information:

Text words in file farm-ads. SVMlight format sparse vectors in file farm-ads-vect.


Relevant Papers:

Active Learning using On-line Algorithms. Chris Mesterharm, Michael J. Pazzani. In KDD 2011.



Citation Request:

Please refer to the Machine Learning Repository's citation policy.

