MSNBC.com Anonymous Web Data
This data describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category (see description) and are recorded in time order.
Dataset Characteristics
Sequential
Subject Area
Computer Science
Associated Tasks
-
Feature Type
Categorical
# Instances
989818
# Features
-
Dataset Information
Additional Information
The data comes from Internet Information Server (IIS) logs for msnbc.com and news-related portions of msn.com for the entire day of September 28, 1999 (Pacific Standard Time). Each sequence in the dataset corresponds to the page views of one user during that twenty-four-hour period. Each event in a sequence corresponds to a user's request for a page. Requests are not recorded at the finest level of detail, that is, at the level of URL; rather, they are recorded at the level of page category (as determined by a site administrator). The categories are "frontpage", "news", "tech", "local", "opinion", "on-air", "misc", "weather", "health", "living", "business", "sports", "summary", "bbs" (bulletin board service), "travel", "msn-news", and "msn-sports". Any page requests served via a caching mechanism were not recorded in the server logs and, hence, are not present in the data.

Other Relevant Information:
* Number of users: 989818
* Average number of visits per user: 5.7
* Number of URLs per category: 10 to 5000
Has Missing Values?
No
Variable Information
Each category is associated, in order, with an integer starting with 1. For example, "frontpage" is associated with 1, "news" with 2, and "tech" with 3. Each row below the "% Sequences:" marker in the data file describes the hits, in time order, of a single user. For example, the first user hits "frontpage" twice, and the second user hits "news" once.
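The encoding described above can be illustrated with a minimal Python sketch. It assumes that the integer codes follow the order in which the categories are listed in the dataset description (verify this against the header of the data file before relying on it), and that each row after the "% Sequences:" marker holds whitespace-separated integers. The names `CATEGORIES`, `parse_sequences`, and `decode` are hypothetical helpers, not part of the dataset or of any library. The first two sample rows come from the description; the third is invented for illustration.

```python
from typing import Iterable, List

# Category names in the order given in the dataset description.
# ASSUMPTION: code 1 maps to the first name, code 2 to the second, and so on.
CATEGORIES = [
    "frontpage", "news", "tech", "local", "opinion", "on-air", "misc",
    "weather", "health", "living", "business", "sports", "summary",
    "bbs", "travel", "msn-news", "msn-sports",
]


def parse_sequences(lines: Iterable[str]) -> List[List[int]]:
    """Return one list of integer category codes per user.

    Everything up to and including the "% Sequences:" marker is skipped;
    each non-blank line after it is read as one user's hits in time order.
    """
    sequences: List[List[int]] = []
    in_body = False
    for line in lines:
        stripped = line.strip()
        if not in_body:
            if stripped.startswith("% Sequences:"):
                in_body = True
            continue
        if stripped:
            sequences.append([int(token) for token in stripped.split()])
    return sequences


def decode(sequence: List[int]) -> List[str]:
    """Map 1-based integer codes back to category names."""
    return [CATEGORIES[code - 1] for code in sequence]


# Rows 1 and 2 match the example in the description; row 3 is made up.
sample = ["% Sequences:", "", "1 1", "2", "3 2 2"]
parsed = parse_sequences(sample)
for user_hits in parsed:
    print(decode(user_hits))
```

To read the real file, the same function could be passed the lines of the decompressed `msnbc990928.seq.gz`, for example via `gzip.open(path, "rt")` from the standard library.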
Dataset Files
File | Size |
---|---|
msnbc990928.seq.gz | 2.2 MB |
msnbc.data.html | 2.9 KB |
description.txt | 2.5 KB |
msnbc.html | 849 Bytes |
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    msnbc_com_anonymous_web_data = fetch_ucirepo(id=133)

    # data (as pandas dataframes)
    X = msnbc_com_anonymous_web_data.data.features
    y = msnbc_com_anonymous_web_data.data.targets

    # metadata
    print(msnbc_com_anonymous_web_data.metadata)

    # variable information
    print(msnbc_com_anonymous_web_data.variables)
Heckerman, D. (1999). MSNBC.com Anonymous Web Data [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5390X.
Creators
David Heckerman
DOI
10.24432/C5390X
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.