Center for Machine Learning and Intelligent Systems

Mturk User-Perceived Clusters over Images Data Set

Abstract: This dataset was collected by Shan-Hung Wu and DataLab members at NTHU, Taiwan. It contains 325 user-perceived clusters created by 100 users, together with their corresponding descriptions.

Data Set Characteristics: Multivariate, Text
Number of Instances: 180
Area: Computer
Attribute Characteristics: Integer
Number of Attributes: 500
Date Donated: 2016-11-02
Associated Tasks: Clustering
Missing Values? N/A
Number of Web Hits: 11274


Source:

Shan-Hung Wu
Associate Professor, CS, National Tsing Hua University (NTHU)
Email: shwu [AT] cs.nthu.edu.tw


Data Set Information:

This dataset was collected by Shan-Hung Wu and DataLab members at National Tsing Hua University, Taiwan. It consists of 180 images randomly sampled from the NUS-WIDE image database. Each image has 500 features, a bag of words built from SIFT descriptors. Through a series of experiments on the Amazon Mechanical Turk platform, 100 users produced 325 user-perceived clusters together with their corresponding descriptions.

Dataset spec 1:

- #Image: 180
- #Cluster: 325 (may be created by different users)
- #User: 100
- |Vocabulary of supervision|: 108


- cluster_data.csv : 325 clusters x 180 images
- 'cluster_data.csv' is an indicator matrix. M_(i,j) = 1 if image_j belongs to cluster_i. Note: Clusters may be created by different users.


- cluster_userIndex.csv : 325 clusters x 1 userIndex(0-99)
- 'cluster_userIndex.csv' is a vector where V_i = k if cluster_i was created by user_k.

- data_feature.csv : 180 images x 500 features
- Each row is a 500-dimensional feature vector, a bag of words built from SIFT descriptors. All 180 images are sampled from the NUS-WIDE dataset.
- Reference: [Web Link]

- supervision_cluster_matrix.csv : 108 bag of words x 183 clusters
- We parse the raw supervision and merge similar words into 108 dimensions. Each row is a description of the corresponding cluster.

- perception_words.csv : 108 perception words
- Vocabulary of perception words.
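
The indicator matrix and the user-index vector described above can be combined to recover each user's own clustering. The following is a minimal sketch, not the authors' code. The toy arrays M and V are made-up stand-ins for the real files, and the helper names clusters_by_user and co_membership are hypothetical. For the real data, the arrays could presumably be loaded with something like np.loadtxt("cluster_data.csv", delimiter=","), assuming the files are comma-separated and have no header row.

```python
import numpy as np

def clusters_by_user(M, V):
    """Map each user index to a list of image-index arrays, one per cluster."""
    groups = {}
    for cluster_row, user in zip(M, V):
        # Indices j where M[i, j] == 1, i.e. the images in this cluster.
        groups.setdefault(int(user), []).append(np.flatnonzero(cluster_row))
    return groups

def co_membership(M):
    """Entry (a, b) counts the clusters containing both image a and image b."""
    M = np.asarray(M)
    return M.T @ M

# Toy stand-in (3 clusters x 4 images, 2 users); NOT values from the dataset.
M = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]])
V = np.array([0, 0, 1])

groups = clusters_by_user(M, V)
C = co_membership(M)
```

Because clusters come from different users, the rows of M can overlap. The co-membership counts pool all users, while groups keeps each user's clusters separate.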






Dataset spec 2 (Raw data):

- cluster_list.csv:
- Fields: ['UserId'], ['ImageId Cluster'], ['Description']
- ['UserId']: The user who created the cluster.
- ['ImageId Cluster']: The ids of the images in the cluster, separated by ';'.
- ['Description']: A sentence or some keywords written by the user to describe the images in the cluster.
- 325 records (clusters) in total.
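
The raw records can be turned back into an indicator matrix by splitting the image-id field on ';'. The sketch below illustrates this under several assumptions: the file is comma-separated, it has a header row with exactly the three field names listed above, and image ids are integers. The SAMPLE text is invented for illustration and the helper names are hypothetical.

```python
import csv
import io

def parse_cluster_records(text):
    """Parse CSV text with UserId / ImageId Cluster / Description columns."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # Image ids are separated by ';' within a single field.
        ids = [int(tok) for tok in row["ImageId Cluster"].split(";") if tok.strip()]
        records.append({"user": row["UserId"],
                        "images": ids,
                        "description": row["Description"]})
    return records

def to_indicator(records):
    """Build a clusters x images 0/1 matrix; columns follow sorted image ids."""
    image_ids = sorted({img for r in records for img in r["images"]})
    column = {img: j for j, img in enumerate(image_ids)}
    matrix = [[0] * len(image_ids) for _ in records]
    for i, r in enumerate(records):
        for img in r["images"]:
            matrix[i][column[img]] = 1
    return image_ids, matrix

# Invented sample rows, NOT taken from cluster_list.csv.
SAMPLE = """UserId,ImageId Cluster,Description
u1,3;7;9,animals outdoors
u1,12,city at night
u2,7;12,blue tones
"""

records = parse_cluster_records(SAMPLE)
image_ids, matrix = to_indicator(records)
```

Columns are indexed by sorted image id rather than by the id itself, since the source does not say whether ids start at 0 or 1.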


Attribute Information:

See the dataset specs above.


Relevant Papers:

Learning User Perceived Clusters with Feature-Level Supervision
Ting-Yu Cheng; Kuan-Hua Lin; Xinyang Gong, Baidu Inc.; Kang-Jun Liu; Shan-Hung Wu*, National Tsing Hua University



Citation Request:

Please include this citation if you use this dataset.

Learning User Perceived Clusters with Feature-Level Supervision
Ting-Yu Cheng; Kuan-Hua Lin; Xinyang Gong, Baidu Inc.; Kang-Jun Liu; Shan-Hung Wu*, National Tsing Hua University

