Anonymous Microsoft Web Data Data Set
Download: Data Folder, Data Set Description
Abstract: Log of anonymous users of www.microsoft.com; predict areas of the web site a user visited based on data on other areas the user visited.
Data Set Characteristics: N/A
Number of Instances: 37711
Area: Computer
Attribute Characteristics: Categorical
Number of Attributes: 294
Date Donated: 1998-11-01
Associated Tasks: Recommender-Systems
Missing Values? N/A
Number of Web Hits: 170374
Source:
Creators:
Jack S. Breese, David Heckerman, Carl M. Kadie
Microsoft Research, Redmond WA, 98052-6399, USA
breese '@' microsoft.com, heckerma '@' microsoft.com, carlk '@' microsoft.com
Donors:
Breese, Heckerman, & Kadie
Data Set Information:
We created the data by sampling and processing the www.microsoft.com logs. The data records the use of www.microsoft.com by 38000 anonymous, randomly selected users. For each user, the data lists all the areas of the web site (Vroots) that the user visited in a one-week timeframe.
Users are identified only by a sequential number, for example, User #14988, User #14989, etc. The file contains no personally identifiable information. The 294 Vroots are identified by their title (e.g. "NetShow for PowerPoint") and URL (e.g. "/stream"). The data comes from one week in February, 1998.
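As an illustration of the per-user structure described above, the following is a minimal parsing sketch. It assumes a comma-separated record layout with attribute (A), case (C), and vote (V) lines; that layout is not stated on this page and should be checked against the Data Set Description before use. The function name `parse_msweb`, the Vroot ID 1277, and the sample records are hypothetical and serve only as an example.

```python
import csv
import io


def parse_msweb(lines):
    """Parse Vroot definitions and per-user visits.

    Assumed layout (an assumption, not confirmed by this page):
      A,<vroot_id>,<ignored>,"<title>","<url>"   defines a Vroot
      C,"<user_id>",<user_id>                    starts a new user
      V,<vroot_id>,<ignored>                     a visit by the current user
    """
    vroots = {}  # vroot_id -> (title, url)
    visits = {}  # user_id -> set of visited vroot_ids
    current_user = None
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        kind = row[0]
        if kind == "A":
            vroots[int(row[1])] = (row[3], row[4])
        elif kind == "C":
            current_user = int(row[2])
            visits[current_user] = set()
        elif kind == "V" and current_user is not None:
            visits[current_user].add(int(row[1]))
    return vroots, visits


# Made-up sample records; the title and URL are the examples quoted above.
sample = io.StringIO(
    'A,1277,1,"NetShow for PowerPoint","/stream"\n'
    'C,"14988",14988\n'
    'V,1277,1\n'
    'C,"14989",14989\n'
)
vroots, visits = parse_msweb(sample)
```

Using `csv.reader` rather than splitting on commas keeps quoted titles that contain commas intact.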
Attribute Information:
Each attribute is an area ("vroot") of the www.microsoft.com web site.
The datasets record which Vroots each user visited in a one-week timeframe in February 1998.
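Since each attribute is a Vroot that a user either visited or did not, one way to work with the data is as a binary user-by-Vroot table. The sketch below builds such a table and ranks unvisited Vroots with a simple co-visit count. This is a naive baseline for illustration only, not the method of the papers listed below; the function names and the toy user and Vroot IDs are hypothetical.

```python
from collections import Counter


def to_binary_rows(visits, vroot_ids):
    """Return sorted Vroot columns and one 0/1 row per user."""
    columns = sorted(vroot_ids)
    rows = {
        user: [1 if v in seen else 0 for v in columns]
        for user, seen in visits.items()
    }
    return columns, rows


def recommend(visits, user, top_n=3):
    """Rank Vroots the user has not visited by overlap-weighted co-visits."""
    seen = visits[user]
    scores = Counter()
    for other, other_seen in visits.items():
        if other == user:
            continue
        overlap = len(seen & other_seen)
        if overlap == 0:
            continue  # users with nothing in common contribute nothing
        for v in other_seen - seen:
            scores[v] += overlap
    # Highest score first; ties broken by Vroot ID for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [v for v, _ in ranked[:top_n]]


# Toy data with made-up user and Vroot IDs.
toy_visits = {
    1: {1000, 1001},
    2: {1000, 1001, 1002},
    3: {1001, 1003},
    4: {1004},
}
columns, rows = to_binary_rows(toy_visits, {1000, 1001, 1002, 1003, 1004})
suggestions = recommend(toy_visits, user=1)
```

On the toy data, user 1 shares two Vroots with user 2 and one with user 3, so Vroot 1002 ranks ahead of 1003.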
Relevant Papers:
J. Breese, D. Heckerman, C. Kadie. _Empirical Analysis of Predictive Algorithms for Collaborative Filtering_. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, July 1998.
[Web Link]
Also expanded as Microsoft Research Technical Report MSR-TR-98-12. The papers are available online at: [Web Link]
Papers That Cite This Data Set:
W. Nick Street and Yoo-Hyon Kim. A streaming ensemble algorithm (SEA) for large-scale classification. KDD. 2001.
Dmitry Pavlov and Jianchang Mao and Byron Dom. Scaling-Up Support Vector Machines Using Boosting Algorithm. ICPR. 2000.
Dmitry Pavlov and Darya Chudova and Padhraic Smyth. Towards scalable support vector machines using squashing. KDD. 2000.
Kristin P. Bennett and Erin J. Bredensteiner. Geometry in Learning. Department of Mathematical Sciences Rensselaer Polytechnic Institute.
Citation Request:
Please refer to the Machine Learning Repository's citation policy.