Syskill and Webert Web Page Ratings
Donated on 10/19/1998
This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages cover four separate subjects: Bands (recording artists), Goats, Sheep, and BioMedical.
Dataset Characteristics
Multivariate, Text
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Categorical
# Instances
332
# Features
5
Dataset Information
Additional Information
The HTML source of each web page is given. Each page was viewed and rated on a three-point scale (hot, medium, cold), with 50-100 pages rated per domain. This is a small number of examples, but it is realistic because we want to learn user profiles from as few examples as possible, so that users have an incentive to rate pages.
Has Missing Values?
No
Variable Information
Each subject is in a separate directory. Within each directory, there is a file named "index", which contains information on the other files. Each entry is a line of the form:
file-name | rating | url | date-rated | title
where file-name is the name of a file (usually an integer) and rating is hot, medium, or cold. There are so few medium ratings that they are usually merged with cold in experiments. The other fields are not used in learning, but they are collected by the interface for other purposes: the URL of the HTML source, the date rated, and the title of the web page.
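The index format described above can be read with a few lines of Python. The following is a minimal sketch, not the dataset author's code: it assumes each entry sits on a single line with exactly the five pipe-separated fields listed, and the names `IndexEntry`, `parse_index_line`, `parse_index`, and `binary_label` are hypothetical. The sample lines, including their URLs and date strings, are invented for illustration and are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class IndexEntry:
    """One line of an "index" file (field names follow the description above)."""
    file_name: str
    rating: str
    url: str
    date_rated: str
    title: str


def parse_index_line(line: str) -> IndexEntry:
    # Split on the first four pipes only, so a "|" inside the title survives.
    parts = [part.strip() for part in line.split("|", 4)]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return IndexEntry(*parts)


def parse_index(lines: Iterable[str]) -> List[IndexEntry]:
    # Skip blank lines; every other line is assumed to be one entry.
    return [parse_index_line(line) for line in lines if line.strip()]


def binary_label(rating: str) -> str:
    # Merge "medium" into "cold", as experiments usually do per the description.
    return "hot" if rating.strip().lower() == "hot" else "cold"


# Invented sample lines (hypothetical values, not real dataset rows).
SAMPLE_LINES = [
    "1 | hot | http://example.com/a | 1996-01-01 | Example page A",
    "2 | medium | http://example.com/b | 1996-01-02 | Example page B",
    "",
    "3 | cold | http://example.com/c | 1996-01-03 | Example | page C",
]

entries = parse_index(SAMPLE_LINES)
labels = [binary_label(entry.rating) for entry in entries]
```

For a real directory, the same functions could be applied to the lines of its "index" file, for example `parse_index(open(path, encoding="latin-1"))`, where both the path and the encoding are assumptions to check against the extracted archive.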
Dataset Files
File | Size |
---|---|
SyskillWebert.tar.gz | 476KB |
SyskillWebert.data.html | 4.1KB |
SyskillWebert.task.html | 3.3KB |
SyskillWebert.html | 1.1KB |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
syskill_and_webert_web_page_ratings = fetch_ucirepo(id=140)

# data (as pandas dataframes)
X = syskill_and_webert_web_page_ratings.data.features
y = syskill_and_webert_web_page_ratings.data.targets

# metadata
print(syskill_and_webert_web_page_ratings.metadata)

# variable information
print(syskill_and_webert_web_page_ratings.variables)
Pazzani, M. (1997). Syskill and Webert Web Page Ratings [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5Z88C.
Creators
Michael Pazzani
DOI
10.24432/C5Z88C
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.