Mturk User-Perceived Clusters over Images

Donated on 11/1/2016

This dataset was collected by Shan-Hung Wu and DataLab members at NTHU, Taiwan. It contains 325 user-perceived clusters from 100 users, along with their corresponding descriptions.

Dataset Characteristics

Multivariate, Text

Subject Area

Computer Science

Associated Tasks

Clustering

Feature Type

Integer

# Instances

180

# Features

-

Dataset Information

Additional Information

This dataset was collected by Shan-Hung Wu and DataLab members at National Tsing Hua University, Taiwan. They randomly sampled 180 images from the NUS-WIDE image database. Each image has 500 features consisting of a bag of words based on SIFT descriptions. A series of experiments on the Amazon Mechanical Turk platform yielded 325 user-perceived clusters from 100 users, along with their corresponding descriptions.

Dataset spec 1:
- #Image: 180
- #Cluster: 325 (may be created by different users)
- #User: 100
- |Vocabulary of supervision|: 108
- cluster_data.csv : 325 clusters x 180 images
  - 'cluster_data.csv' is an indicator matrix. M_(i,j) = 1 if image_j belongs to cluster_i. Note: Clusters may be created by different users.
- cluster_userIndex.csv : 325 clusters x 1 userIndex (0-99)
  - 'cluster_userIndex.csv' is a vector where V_i = k if cluster_i is grouped by user_k.
- data_feature.csv : 180 images x 500 features
  - Each row is a 500-feature vector consisting of a bag of words based on SIFT descriptions. All 180 images are sampled from the NUS-WIDE dataset.
  - Reference: http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
- supervision_cluster_matrix.csv : 108 bag of words x 183 clusters
  - We parse the raw supervisions and merge similar words into 108 dimensions. Each row is a description of the corresponding cluster.
- perception_words.csv : 108 perception words
  - Vocabulary of perception words.

Dataset spec 2 (Raw data):
- cluster_list.csv:
  - Field names: ['UserId'], ['ImageId Cluster'], ['Description']
  - ['UserId']: Specifies the user who created the cluster.
  - ['ImageId Cluster']: Image ids in the cluster, separated by ';'.
  - ['Description']: A sentence or some keywords written by the user to describe the images in the cluster.
  - 325 records (clusters) in total.
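The relationship between the raw records (spec 2) and the indicator matrix and user-index vector (spec 1) can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code. It assumes that user ids and image ids are 0-based integers and that each raw record has exactly three comma-separated fields with no header row; none of this is confirmed by the description above. The helper names (parse_cluster_rows, indicator_matrix, user_index, clusters_by_user) and the toy records are hypothetical.

```python
import csv
import io


def parse_cluster_rows(rows):
    """Turn raw (UserId, 'id;id;...', Description) rows into cluster records.

    Assumption: ids are integers; image ids within a cluster are separated by ';'.
    """
    clusters = []
    for user_id, image_ids, description in rows:
        ids = [int(tok) for tok in image_ids.split(";") if tok.strip()]
        clusters.append(
            {"user": int(user_id), "images": ids, "description": description}
        )
    return clusters


def indicator_matrix(clusters, num_images):
    """Build M with M[i][j] = 1 if image j belongs to cluster i, else 0."""
    matrix = []
    for cluster in clusters:
        row = [0] * num_images
        for j in cluster["images"]:
            row[j] = 1
        matrix.append(row)
    return matrix


def user_index(clusters):
    """Build V with V[i] = k if cluster i was created by user k."""
    return [cluster["user"] for cluster in clusters]


def clusters_by_user(matrix, user_idx):
    """Group the rows of the indicator matrix by the user who created them."""
    grouped = {}
    for row, user in zip(matrix, user_idx):
        members = [j for j, flag in enumerate(row) if flag == 1]
        grouped.setdefault(user, []).append(members)
    return grouped


# Toy stand-in for cluster_list.csv: 3 clusters over 4 images, made by 2 users.
raw = io.StringIO("0,0;2,animals\n0,1;3,buildings\n1,0;1;2,outdoor\n")
clusters = parse_cluster_rows(list(csv.reader(raw)))

M = indicator_matrix(clusters, num_images=4)
V = user_index(clusters)
groups = clusters_by_user(M, V)
```

For the real data, num_images would be 180 and the matrix would have 325 rows. Because one image can appear in clusters from several users, column sums of M can exceed 1.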

Has Missing Values?

No

Variables Table

Variable Name | Role | Type | Description | Units | Missing Values
(500 variables in total; the listing gives only the Missing Values column, which is "no" for each of the first 10 variables.)

Additional Variable Information

Same as above.

Dataset Files

File                                    Size
DataSet_Spec_1/data_feature.csv         176.1 KB
DataSet_Spec_1/cluster_data.csv         114.3 KB
DataSet_Spec_1/supervision_cluster.csv  68.6 KB
DataSet_Spec_2/cluster_list.csv         41.5 KB
DataSet_Spec_1/perception vocab.csv     1016 Bytes

(5 of 6 files listed)
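After downloading, one way to check the numeric files against the dimensions stated in the dataset description is sketched below. It assumes the files are comma-delimited, purely numeric, and have no header row, which the page does not confirm. The helpers read_numeric_csv, shape, and check_shape are hypothetical names. The supervision and vocabulary files are left out because the description and the file listing name them differently.

```python
import csv
import io

# Dimensions as stated in the dataset description (rows, columns).
EXPECTED_SHAPES = {
    "cluster_data.csv": (325, 180),
    "cluster_userIndex.csv": (325, 1),
    "data_feature.csv": (180, 500),
}


def read_numeric_csv(fileobj):
    """Read a headerless, comma-delimited numeric CSV into a list of rows."""
    return [[float(cell) for cell in row] for row in csv.reader(fileobj) if row]


def shape(matrix):
    """Return (rows, columns), raising if the rows have unequal lengths."""
    if not matrix:
        return (0, 0)
    widths = {len(row) for row in matrix}
    if len(widths) != 1:
        raise ValueError("rows have different lengths")
    return (len(matrix), widths.pop())


def check_shape(name, matrix):
    """True if the matrix has the dimensions the description gives for `name`."""
    return shape(matrix) == EXPECTED_SHAPES[name]


# Toy in-memory file standing in for a real download.
toy = read_numeric_csv(io.StringIO("1,0,1\n0,1,0\n"))
toy_shape = shape(toy)
```

With the real files, a call would look like `check_shape("data_feature.csv", read_numeric_csv(open(path, newline="")))`, where `path` points at the downloaded file.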

Creators

Shan-Hung Wu
