Syskill and Webert Web Page Ratings

Donated on 10/19/1998

This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages cover four separate subjects: Bands (recording artists), Goats, Sheep, and BioMedical.

Dataset Characteristics

Multivariate, Text

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Categorical

# Instances

332

# Features

5

Dataset Information

Additional Information

The HTML source of each web page is given. Users looked at each web page and indicated a rating on a 3-point scale (hot, medium, cold), rating 50-100 pages per domain. This is a small number of examples, but it is realistic because we want to learn user profiles from as few examples as possible so that users have an incentive to rate pages.

Has Missing Values?

No

Variable Information

Each subject is in a separate directory. Within each directory, there is a file named "index" that contains information on the other files. Each entry is a line of the form:

file-name | rating | url | date-rated | title

where file-name is the name of a file (usually an integer) and rating is hot, medium, or cold. There are so few medium ratings that they are usually merged with cold in experiments. The other fields aren't used in learning, but they are collected by the interface for other purposes. They are the URL of the HTML source, the date rated, and the title of the web page.
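As a rough illustration of the entry format described above, the following Python sketch splits one index line into its five fields and merges medium into cold. The names IndexEntry, parse_index_line, and binary_label are hypothetical, and the sample line (including its URL and date format) is invented for the example; the sketch also assumes that "|" does not occur inside the first four fields.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """One line of an "index" file (hypothetical container, not part of the dataset)."""
    file_name: str
    rating: str
    url: str
    date_rated: str
    title: str


def parse_index_line(line: str) -> IndexEntry:
    # Split on at most four separators so a "|" inside the title is preserved.
    parts = [part.strip() for part in line.split("|", 4)]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return IndexEntry(*parts)


def binary_label(rating: str) -> str:
    # Merge "medium" with "cold", as is usually done in experiments.
    return "hot" if rating.strip().lower() == "hot" else "cold"


# Invented sample line; real entries may use a different date format.
sample = "12 | medium | http://example.org/page.html | 1996-01-01 | Example page"
entry = parse_index_line(sample)
label = binary_label(entry.rating)
```

Only file_name and the merged label would feed a classifier; the remaining fields are kept for reference.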

Creators

Michael Pazzani
