Internet Advertisements

Donated on 6/30/1998

This dataset represents a set of possible advertisements on Internet pages.

Dataset Characteristics

Multivariate

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Categorical, Integer, Real

# Instances

3279

# Features

1558

Dataset Information

Additional Information

This dataset represents a set of possible advertisements on Internet pages. The features encode the geometry of the image (if available) as well as phrases occurring in the URL, the image's URL and alt text, the anchor text, and words occurring near the anchor text. The task is to predict whether an image is an advertisement ("ad") or not ("nonad").

Has Missing Values?

Yes

Variables Table

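| Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
| --- | --- | --- | --- | --- | --- | --- |
| origurl*hevern+psychref | Feature | Categorical | | | | no |
| origurl*and | Feature | Categorical | | | | no |
| origurl*pterry+htm | Feature | Categorical | | | | no |
| origurl*bishop | Feature | Categorical | | | | no |
| origurl*ora.com | Feature | Categorical | | | | no |
| origurl*www.nyx.net | Feature | Categorical | | | | no |
| origurl*www.yahoo.co.uk | Feature | Categorical | | | | no |
| origurl*www.truluck.com | Feature | Categorical | | | | no |
| url*media | Feature | Categorical | | | | no |
| url*peace+images | Feature | Categorical | | | | no |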

Showing the first 10 of 1559 variables.
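The variable names listed above appear to follow a `source*term+term` pattern, where the part before `*` names the text source (for example `origurl` or `url`) and the part after it is a phrase whose words are joined by `+`. This reading is inferred from the names alone and is an assumption, not a documented specification. Under that assumption, a minimal sketch of splitting a name into its parts could look like this (`split_variable_name` is a hypothetical helper):

```python
from typing import List, Tuple


def split_variable_name(name: str) -> Tuple[str, List[str]]:
    """Split a name such as 'origurl*hevern+psychref' into (source, terms).

    Assumes the 'source*term+term' convention inferred from the variable
    names; names without '*' are returned unchanged with no terms.
    """
    source, sep, phrase = name.partition("*")
    if not sep:
        # No '*' separator: treat the whole string as a plain variable name.
        return name, []
    return source, phrase.split("+")


names = ["origurl*hevern+psychref", "origurl*www.yahoo.co.uk", "url*peace+images"]
parsed = [split_variable_name(n) for n in names]
for source, terms in parsed:
    print(source, terms)
```

Grouping variables by their source prefix in this way may help when inspecting which text field a binary feature was derived from.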

Additional Variable Information

Three features are continuous and the rest are binary; this is the "STANDARD encoding" mentioned in [Kushmerick, 99]. One or more of the three continuous features are missing in 28% of the instances; missing values should be interpreted as "unknown".
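As a rough illustration of handling the "unknown" values, the sketch below parses comma-separated rows and keeps missing entries as `None` rather than substituting a number. Several details are assumptions not stated on this page: that the three continuous features come first in each row, that the class label comes last, and that missing values are written as `?` (possibly padded with spaces). The sample rows are made up for the example and are not taken from the dataset.

```python
import csv
import io
from typing import List, Optional, Tuple

N_CONTINUOUS = 3  # from the description: three continuous features
MISSING = "?"     # assumption: placeholder used for unknown values

Row = Tuple[List[Optional[float]], List[Optional[int]], str]


def parse_row(fields: List[str]) -> Row:
    """Split one record into (continuous, binary, label), keeping unknowns as None."""
    cells = [f.strip() for f in fields]
    continuous = [None if c == MISSING else float(c) for c in cells[:N_CONTINUOUS]]
    # Defensive: also tolerate the placeholder in binary columns.
    binary = [None if c == MISSING else int(c) for c in cells[N_CONTINUOUS:-1]]
    label = cells[-1]
    return continuous, binary, label


def fraction_with_missing(rows: List[Row]) -> float:
    """Share of rows where at least one continuous feature is unknown."""
    if not rows:
        return 0.0
    n_missing = sum(1 for cont, _, _ in rows if any(v is None for v in cont))
    return n_missing / len(rows)


# Made-up sample in the assumed layout (3 continuous, 3 binary, label).
SAMPLE = """125,125,1.0,1,0,0,ad
   ?,   ?,   ?,0,1,0,nonad
60,468,7.8,0,0,1,nonad
  90,   ?,   ?,0,0,0,ad
"""

rows = [parse_row(r) for r in csv.reader(io.StringIO(SAMPLE)) if r]
share = fraction_with_missing(rows)
print(f"rows with a missing continuous value: {share:.0%}")
```

Keeping unknowns as `None` leaves the choice of imputation (or of a model that tolerates missing inputs) to a later step, which matches the instruction to interpret missing values as "unknown".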

Papers Citing this Dataset

Sparse Kernel PCA for Outlier Detection

By Rudrajit Das, Aditya Golatkar, Suyash Awate. 2018

Published in ArXiv.

Creators

Nicholas Kushmerick
