
Mturk User-Perceived Clusters over Images Data Set

Abstract: This dataset was collected by Shan-Hung Wu and DataLab members at NTHU, Taiwan. It contains 325 user-perceived clusters from 100 users, together with their corresponding descriptions.

Data Set Characteristics: Multivariate, Text
Number of Instances: 180
Area: Computer
Attribute Characteristics: Integer
Number of Attributes: 500
Date Donated: 2016-11-02
Associated Tasks: Clustering
Missing Values? N/A
Number of Web Hits: 3817


Source:

Shan-Hung Wu
Associate Professor, CS, National Tsing Hua University (NTHU)
Email: shwu [AT] cs.nthu.edu.tw


Data Set Information:

This dataset was collected by Shan-Hung Wu and DataLab members at National Tsing Hua University, Taiwan. It consists of 180 images randomly sampled from the NUS-WIDE image database. Each image is represented by 500 features, a bag of words built from SIFT descriptors. A series of experiments on the Amazon Mechanical Turk platform yielded 325 user-perceived clusters from 100 users, together with their corresponding descriptions.

Dataset spec 1:

- #Image: 180
- #Cluster: 325 (may be created by different users)
- #User: 100
- |Vocabulary of supervision|: 108


- cluster_data.csv : 325 clusters x 180 images
- 'cluster_data.csv' is an indicator matrix. M_(i,j) = 1 if image_j belongs to cluster_i. Note: Clusters may be created by different users.


- cluster_userIndex.csv : 325 clusters x 1 userIndex(0-99)
- 'cluster_userIndex.csv' is a vector where V_i = k if cluster_i was created by user_k.

- data_feature.csv : 180 images x 500 features
- Each row is a 500-dimensional feature vector, a bag of words built from SIFT descriptors. All 180 images are sampled from the NUS-WIDE dataset.
- Reference: [Web Link]

- supervision_cluster_matrix.csv : 108 bag of words x 183 clusters
- We parse the raw supervision text and merge similar words into 108 dimensions. Each row is a description of the corresponding cluster.

- perception_words.csv : 108 perception words
- Vocabulary of perception words.
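
As a minimal sketch of how the indicator matrix and the user index vector described above fit together, the following Python lists the images in each cluster and groups clusters by the user who created them. It assumes both files are plain comma-separated integers without a header row, which this description does not confirm. The function names and the tiny inline data are illustrative only and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def read_int_rows(text):
    """Parse CSV text into a list of integer rows (assumes no header row)."""
    return [
        [int(float(cell)) for cell in row]
        for row in csv.reader(io.StringIO(text))
        if row
    ]


def images_per_cluster(indicator):
    """For each cluster i, list the image indices j with M[i][j] == 1."""
    return [[j for j, flag in enumerate(row) if flag == 1] for row in indicator]


def clusters_by_user(user_index):
    """Map each user index k to the cluster indices i with V[i] == k."""
    groups = defaultdict(list)
    for i, k in enumerate(user_index):
        groups[k].append(i)
    return dict(groups)


# Made-up stand-ins for cluster_data.csv and cluster_userIndex.csv:
# 3 clusters over 4 images, created by 2 users.
cluster_data_text = "1,0,1,0\n0,1,0,1\n1,1,0,0\n"
cluster_user_text = "0\n0\n1\n"

M = read_int_rows(cluster_data_text)
V = [row[0] for row in read_int_rows(cluster_user_text)]

members = images_per_cluster(M)
by_user = clusters_by_user(V)
```

With the real files, the same functions could be applied to the file contents, for example `read_int_rows(open("cluster_data.csv").read())`, provided the no-header assumption holds.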






Dataset spec 2 (Raw data):

- cluster_list.csv:
  - FileName: ['UserId'], ['ImageId Cluster'], ['Description']
  - ['UserId']: The user who created the cluster.
  - ['ImageId Cluster']: The ids of the images in the cluster, separated by ';'.
  - ['Description']: A sentence or some keywords written by the user to describe the images in the cluster.
  - 325 records (clusters) in total.
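
The raw records can be turned into an indicator matrix like the one in spec 1. The sketch below is one way to do it, under several assumptions not stated above: that the file has a header row with exactly the three column names listed, that image ids are 0-based integers, and that user ids can be kept as strings. The function name `parse_cluster_list` and the sample records are hypothetical.

```python
import csv
import io


def parse_cluster_list(text, n_images):
    """Turn raw cluster records into (user_ids, indicator_matrix).

    Assumes a header row with the columns 'UserId', 'ImageId Cluster' and
    'Description', and image ids that are 0-based integers below n_images.
    """
    user_ids, indicator = [], []
    for record in csv.DictReader(io.StringIO(text)):
        # Split the ';'-separated id list, skipping empty tokens
        # such as the one left by a trailing ';'.
        tokens = record["ImageId Cluster"].split(";")
        ids = [int(tok) for tok in tokens if tok.strip()]
        row = [0] * n_images
        for image_id in ids:
            row[image_id] = 1
        user_ids.append(record["UserId"])
        indicator.append(row)
    return user_ids, indicator


# Made-up records for illustration only; not taken from the dataset.
raw = (
    "UserId,ImageId Cluster,Description\n"
    "u1,0;2,animals outdoors\n"
    "u2,1;3;,city at night\n"
)
users, matrix = parse_cluster_list(raw, n_images=4)
```

If the real ids turn out to be 1-based, subtracting 1 from each `image_id` before indexing would be the only change needed.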


Attribute Information:

As described above.


Relevant Papers:

Learning User Perceived Clusters with Feature-Level Supervision
Ting-Yu Cheng; Kuan-Hua Lin; Xinyang Gong, Baidu Inc.; Kang-Jun Liu; Shan-Hung Wu*, National Tsing Hua University



Citation Request:

Please include this citation if you use this dataset.

Learning User Perceived Clusters with Feature-Level Supervision
Ting-Yu Cheng; Kuan-Hua Lin; Xinyang Gong, Baidu Inc.; Kang-Jun Liu; Shan-Hung Wu*, National Tsing Hua University

