Farm Ads
Donated on 10/17/2011
This data was collected from text ads found on twelve websites that deal with various farm-animal-related topics. The binary labels indicate whether or not the content owner approves of the ad.
Property | Value |
---|---|
Dataset Characteristics | Text |
Subject Area | Business |
Associated Tasks | Classification |
Feature Type | - |
# Instances | 4143 |
# Features | 54877 |
Dataset Information
Additional Information
This data was collected from text ads found on twelve websites that deal with various farm-animal-related topics. Information from both the ad creative and the ad landing page is included. The binary labels indicate whether or not the content owner approves of the ad.

The text file contains, for each ad, the words on the ad creative and the words from the landing page. Each word from the creative is given the prefix 'ad-'. Title and header HTML markup is noted in a similar way in the text of the landing page. Stemming and stop word removal have already been applied. Each ad is on a single line. The first word on the line is the label of the instance: 1 for accepted ads and -1 for rejected ads.

A straightforward bag-of-words representation of the data is also included, in the SVMlight sparse vector format. On each line, the first value is the label, followed by every nonzero attribute, each encoded as index:value. This is the representation used for the relevant paper cited below.
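The text format described above can be read with a few lines of Python. The following is a minimal sketch, not the dataset authors' code: the function name `parse_ad_line` and the sample line are invented for illustration, and it assumes that tokens are separated by whitespace and that creative words carry exactly the 'ad-' prefix. Landing-page tokens, including any title or header markers, are returned unchanged because their exact prefixes are not specified here.

```python
from typing import List, Tuple


def parse_ad_line(line: str) -> Tuple[int, List[str], List[str]]:
    """Split one ad line into (label, creative words, landing-page words).

    Assumes whitespace-separated tokens with the label (1 or -1) first.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = int(tokens[0])
    if label not in (1, -1):
        raise ValueError(f"unexpected label: {label}")
    # Words from the ad creative carry the 'ad-' prefix; strip it here.
    creative = [t[3:] for t in tokens[1:] if t.startswith("ad-")]
    # All remaining tokens come from the landing page and are kept as-is.
    landing = [t for t in tokens[1:] if not t.startswith("ad-")]
    return label, creative, landing


# Hypothetical sample line, made up for illustration (not taken from the dataset).
sample = "1 ad-hors ad-feed hors nutrit suppli"
label, creative, landing = parse_ad_line(sample)
print(label, creative, landing)
```

Keeping creative and landing-page words in separate lists makes it easy to weight or featurize the two sources differently, which the 'ad-' prefix appears designed to allow.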
Has Missing Values?
No
Variable Information
Text words in file farm-ads. SVMlight format sparse vectors in file farm-ads-vect.
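As an illustration of the sparse vector layout (a label followed by index:value pairs), the sketch below parses a single SVMlight-style line into a label and a dictionary of nonzero attributes. The helper name `parse_svmlight_line` and the sample line are hypothetical, and the code assumes integer labels and integer feature indices.

```python
from typing import Dict, Tuple


def parse_svmlight_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse 'label index:value index:value ...' into (label, {index: value})."""
    # SVMlight permits a trailing '#' comment; drop it if present.
    body = line.split("#", 1)[0]
    fields = body.split()
    if not fields:
        raise ValueError("empty line")
    label = int(fields[0])
    features: Dict[int, float] = {}
    for field in fields[1:]:
        index, value = field.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Hypothetical sample line, made up for illustration (not taken from the dataset).
sample = "-1 3:1 17:2 54000:1"
label, features = parse_svmlight_line(sample)
print(label, features)
```

For whole-file loading into a sparse matrix, scikit-learn's `sklearn.datasets.load_svmlight_file` reads this format directly; the hand-written parser above is only meant to make the line structure explicit.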
Dataset Files
File | Size |
---|---|
farm-ads | 13MB |
farm-ads-vect | 5.6MB |
```
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
farm_ads = fetch_ucirepo(id=218)

# data (as pandas dataframes)
X = farm_ads.data.features
y = farm_ads.data.targets

# metadata
print(farm_ads.metadata)

# variable information
print(farm_ads.variables)
```
Mesterharm, C. & Pazzani, M. (2011). Farm Ads [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5ZC8D.
Creators
Chris Mesterharm
Michael Pazzani
DOI
10.24432/C5ZC8D
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.