Syskill and Webert Web Page Ratings

Donated on 10/19/1998

This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages are on four separate subjects: Bands (recording artists), Goats, Sheep, and BioMedical.

Dataset Characteristics

Multivariate, Text

Subject Area


Associated Tasks


Attribute Type


# Instances


# Attributes



Additional Information

The HTML source of a web page is given. Users looked at each web page and indicated a rating on a 3-point scale (hot, medium, cold), rating 50-100 pages per domain. This is a small number of examples per domain. However, it is realistic because we want to learn user profiles from as few examples as possible so that users have an incentive to rate pages.

Attribute Information

Additional Information

Each subject is in a separate directory. Within each directory, there is a file named "index". The index contains information on the other files. Each entry is a line of the form:

file-name | rating | url | date-rated | title

where file-name is the name of a file (usually an integer) and rating is hot, medium, or cold. There are so few medium ratings that they are usually merged with cold in experiments. The other fields aren't used in learning, but they are collected by the interface for other purposes. They are the URL of the HTML source, the date rated, and the title of the web page.
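
As an illustration of the index format described above, the following is a minimal Python sketch of how one entry might be parsed and how medium could be merged with cold. The names IndexEntry, parse_index_line, and binary_label are hypothetical and not part of the dataset, and the sample line is made up for the example. The sketch assumes fields are separated by "|" and that only the title, as the last field, could itself contain that character.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """One line of an "index" file (hypothetical container, not from the dataset)."""
    file_name: str
    rating: str
    url: str
    date_rated: str
    title: str


def parse_index_line(line: str) -> IndexEntry:
    # Split into at most 5 fields so a "|" inside the title is preserved.
    parts = [part.strip() for part in line.split("|", 4)]
    if len(parts) != 5:
        raise ValueError(f"expected 5 '|'-separated fields, got {len(parts)}")
    return IndexEntry(*parts)


def binary_label(rating: str) -> str:
    # Merge medium with cold, as is usually done in experiments.
    return "hot" if rating.strip().lower() == "hot" else "cold"


# Made-up sample line, for illustration only.
sample = "12 | medium | http://example.com/page.html | 01/01/1996 | Example Title"
entry = parse_index_line(sample)
label = binary_label(entry.rating)
```

Here entry.file_name would name the file holding the page's HTML source within the same subject directory, and label is the two-class target obtained after merging.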



Michael Pazzani

