Mturk User-Perceived Clusters over Images

Donated on 11/1/2016

This dataset was collected by Shan-Hung Wu and DataLab members at NTHU, Taiwan. It contains 325 user-perceived clusters from 100 users, along with their corresponding descriptions.

Dataset Characteristics

Multivariate, Text

Subject Area

Computer Science

Associated Tasks

Clustering

Feature Type

Integer

# Instances

180

# Features

Dataset Information

Additional Information

This dataset was collected by Shan-Hung Wu and DataLab members at National Tsing Hua University, Taiwan. It consists of 180 images randomly sampled from the NUS-WIDE image database. Each image has 500 features forming a bag-of-words representation based on SIFT descriptors. Through a series of experiments on the Amazon Mechanical Turk platform, 100 users produced 325 user-perceived clusters with corresponding descriptions.

Dataset spec 1:
- #Image: 180
- #Cluster: 325 (may be created by different users)
- #User: 100
- |Vocabulary of supervision|: 108
- cluster_data.csv: 325 clusters x 180 images
  - 'cluster_data.csv' is an indicator matrix. M_(i,j) = 1 if image_j belongs to cluster_i. Note: clusters may be created by different users.
- cluster_userIndex.csv: 325 clusters x 1 userIndex (0-99)
  - 'cluster_userIndex.csv' is a vector where V_i = k if cluster_i was grouped by user_k.
- data_feature.csv: 180 images x 500 features
  - Each row is a 500-feature vector forming a bag-of-words representation based on SIFT descriptors. All 180 images are sampled from the NUS-WIDE dataset.
  - Reference: http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
- supervision_cluster_matrix.csv: 108 bag of words x 183 clusters
  - We parse the raw supervisions and merge similar words into 108 dimensions. Each row is a description of the corresponding cluster.
- perception_words.csv: 108 perception words
  - Vocabulary of perception words.

Dataset spec 2 (raw data):
- cluster_list.csv:
  - Columns: ['UserId'], ['ImageId Cluster'], ['Description']
  - ['UserId']: Specifies the user who created the cluster.
  - ['ImageId Cluster']: Image ids in the cluster, separated by ';'.
  - ['Description']: A sentence or some keywords, written by the user, describing the images in the cluster.
  - 325 records (clusters) in total.
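The relationship between the raw records (spec 2) and the indicator matrix and user vector (spec 1) can be illustrated with a minimal sketch. It is not the authors' preprocessing code. The function names `parse_cluster_list` and `clusters_by_user` are hypothetical, and the sketch assumes that each raw record holds exactly three comma-separated fields, that there is no header row, and that image ids are 0-based integers. None of these details is confirmed by the description above, so check them against the actual file before use. The sample records are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def parse_cluster_list(text, n_images):
    """Turn raw cluster records into (indicator rows, user ids, descriptions).

    Assumed record layout (not confirmed by the dataset page):
    user id, ';'-separated image ids, description; no header; 0-based ids.
    """
    indicator, users, descriptions = [], [], []
    for user_id, image_ids, description in csv.reader(io.StringIO(text)):
        row = [0] * n_images
        for token in image_ids.split(";"):
            token = token.strip()
            if token:  # tolerate a trailing ';'
                row[int(token)] = 1  # M[i][j] = 1 if image j is in cluster i
        indicator.append(row)
        users.append(int(user_id))  # V[i] = k if cluster i was made by user k
        descriptions.append(description.strip())
    return indicator, users, descriptions


def clusters_by_user(indicator, users):
    """Group clusters by their creator: user id -> list of image-id lists."""
    grouped = defaultdict(list)
    for row, user in zip(indicator, users):
        grouped[user].append([j for j, flag in enumerate(row) if flag == 1])
    return dict(grouped)


# Made-up sample, not taken from the dataset: 3 clusters over 5 images.
sample = '0,0;1;2,animals\n0,3;4,"sky, clouds"\n1,1;3,outdoor\n'
M, V, D = parse_cluster_list(sample, n_images=5)
print(M)
print(clusters_by_user(M, V))
```

For the real files, `n_images` would be 180, and the resulting matrix would have 325 rows if the raw records match the spec 1 counts.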

Has Missing Values?

Variables Table

Variable Name	Role	Type	Description	Units	Missing Values
					no
					no
					no
					no
					no
					no
					no
					no
					no
					no

Additional Variable Information

See the Additional Information section above.

Dataset Files

File	Size
DataSet_Spec_1/data_feature.csv	176.1 KB
DataSet_Spec_1/cluster_data.csv	114.3 KB
DataSet_Spec_1/supervision_cluster.csv	68.6 KB
DataSet_Spec_2/cluster_list.csv	41.5 KB
DataSet_Spec_1/perception vocab.csv	1016 Bytes

Creators

Shan-Hung Wu

DOI

10.24432/C50K7D

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.