Internet Advertisements
Donated on 6/30/1998
This dataset represents a set of possible advertisements on Internet pages.
Dataset Characteristics: Multivariate
Subject Area: Computer Science
Associated Tasks: Classification
Feature Type: Categorical, Integer, Real
Instances: 3279
Features: 1558
Dataset Information
Additional Information
This dataset represents a set of possible advertisements on Internet pages. The features encode the geometry of the image (if available) as well as phrases occurring in the URL, the image's URL and alt text, the anchor text, and words occurring near the anchor text. The task is to predict whether an image is an advertisement ("ad") or not ("nonad").
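Because the target is a single binary label ("ad" vs "nonad"), any classifier is worth comparing against the trivial majority-class baseline, which always predicts the most frequent label. A minimal sketch in plain Python; `majority_baseline` is a hypothetical helper, and the toy labels are made up for illustration, not counts from the dataset:

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("labels must be non-empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Hypothetical toy labels, not real class counts from this dataset.
toy = ["nonad", "nonad", "ad", "nonad"]
print(majority_baseline(toy))  # ('nonad', 0.75)
```

A model that does not clearly beat this accuracy has learned little about the "ad" class, so per-class metrics such as recall on "ad" are usually reported alongside overall accuracy.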
Has Missing Values? Yes
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
origurl*hevern+psychref | Feature | Categorical | | | no |
origurl*and | Feature | Categorical | | | no |
origurl*pterry+htm | Feature | Categorical | | | no |
origurl*bishop | Feature | Categorical | | | no |
origurl*ora.com | Feature | Categorical | | | no |
origurl*www.nyx.net | Feature | Categorical | | | no |
origurl*www.yahoo.co.uk | Feature | Categorical | | | no |
origurl*www.truluck.com | Feature | Categorical | | | no |
url*media | Feature | Categorical | | | no |
url*peace+images | Feature | Categorical | | | no |
The table lists the first 10 of 1559 variables.
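The binary variable names appear to follow a `field*phrase` pattern, where the prefix names the text source (for example `origurl` or `url`) and `+` joins the words of a multi-word phrase. This reading is an assumption inferred from the names shown above, not a documented rule; `ad.names` is the authoritative description. Under that assumption, a sketch of a hypothetical helper that splits names and counts features per source field:

```python
from collections import Counter

def parse_feature_name(name):
    """Split a name such as 'url*peace+images' into (field, tokens).

    Assumes '*' separates the text field from the phrase and '+' joins
    the words of a multi-word phrase (an inference, not a spec).
    """
    field, sep, phrase = name.partition("*")
    if not sep:
        # No '*' present: treat the whole name as the field, with no tokens.
        return name, []
    return field, phrase.split("+")

# The ten variable names shown in the table above.
names = [
    "origurl*hevern+psychref", "origurl*and", "origurl*pterry+htm",
    "origurl*bishop", "origurl*ora.com", "origurl*www.nyx.net",
    "origurl*www.yahoo.co.uk", "origurl*www.truluck.com",
    "url*media", "url*peace+images",
]
print(parse_feature_name("origurl*hevern+psychref"))  # ('origurl', ['hevern', 'psychref'])
print(Counter(parse_feature_name(n)[0] for n in names))  # Counter({'origurl': 8, 'url': 2})
```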
Additional Variable Information
Three features are continuous and the others are binary; this is the "STANDARD encoding" mentioned in [Kushmerick, 99]. One or more of the three continuous features are missing in 28% of the instances; missing values should be interpreted as "unknown".
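Since missing values mean "unknown" rather than zero, one common option (a modelling choice assumed here, not prescribed by the dataset) is to impute each continuous column with the median of its observed values and add a 0/1 missingness indicator, so a model can still use the fact that the value was unknown. A minimal sketch; `impute_with_indicator` is hypothetical and the column values are made up:

```python
from statistics import median

def impute_with_indicator(column):
    """Replace None (unknown) with the median of observed values.

    Returns (imputed_values, missing_flags), where each flag is 1 if the
    original value was unknown and 0 otherwise. Falls back to 0.0 if no
    value in the column was observed.
    """
    observed = [v for v in column if v is not None]
    fill = median(observed) if observed else 0.0
    imputed = [fill if v is None else v for v in column]
    flags = [1 if v is None else 0 for v in column]
    return imputed, flags

# Hypothetical continuous column with two unknown entries.
col = [125.0, None, 60.0, None, 90.0]
print(impute_with_indicator(col))  # ([125.0, 90.0, 60.0, 90.0, 90.0], [0, 1, 0, 1, 0])
```

In a real pipeline the median should be computed on the training split only and reused for the test split, to avoid leaking test information.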
Dataset Files
File | Size |
---|---|
ad.data | 9.8 MB |
ad.names | 34.7 KB |
ad.DOCUMENTATION | 2.1 KB |
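To work with `ad.data` directly instead of through `ucimlrepo`, a standard-library reader is enough. The sketch below assumes the raw file is comma-separated with no header row, the class label in the last column written as `ad.` or `nonad.`, and `?` (possibly padded with spaces) marking an unknown value; these format details are assumptions to check against `ad.names`. `load_ad_data` is a hypothetical helper, and the two-row sample is made up and far narrower than the real file:

```python
import csv
import io

def load_ad_data(fileobj):
    """Parse rows of ad.data into (features, labels).

    Assumed format: comma-separated, no header, label in the last column,
    and '?' (possibly space-padded) for unknown values, mapped to None.
    """
    features, labels = [], []
    for row in csv.reader(fileobj):
        if not row:
            continue  # skip blank lines
        *values, label = (cell.strip() for cell in row)
        features.append([None if v == "?" else float(v) for v in values])
        # Normalise labels such as 'ad.' or 'nonad.' to 'ad' / 'nonad'.
        labels.append(label.rstrip("."))
    return features, labels

# Made-up 5-column sample, not real rows from ad.data.
sample = io.StringIO("125,125,1.0,1,ad.\n   ?,   ?,   ?,0,nonad.\n")
X, y = load_ad_data(sample)
print(X)  # [[125.0, 125.0, 1.0, 1.0], [None, None, None, 0.0]]
print(y)  # ['ad', 'nonad']
```

With the real file this would be called as `with open("ad.data", newline="") as f: X, y = load_ad_data(f)`.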
Papers Citing this Dataset
By Rudrajit Das, Aditya Golatkar, Suyash Awate, 2018. Published in arXiv.
```shell
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
internet_advertisements = fetch_ucirepo(id=51)

# data (as pandas dataframes)
X = internet_advertisements.data.features
y = internet_advertisements.data.targets

# metadata
print(internet_advertisements.metadata)

# variable information
print(internet_advertisements.variables)
```
Kushmerick, N. (1999). Internet Advertisements [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5V011.
Creators
Nicholas Kushmerick
DOI: 10.24432/C5V011
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the dataset for any purpose, provided that appropriate credit is given.