Mturk User-Perceived Clusters over Images
Donated on 11/1/2016
This dataset was collected by Shan-Hung Wu and DataLab members at NTHU, Taiwan. It contains 325 user-perceived clusters from 100 users, along with their corresponding descriptions.
Dataset Characteristics
Multivariate, Text
Subject Area
Computer Science
Associated Tasks
Clustering
Feature Type
Integer
# Instances
180
# Features
-
Dataset Information
Additional Information
This dataset was collected by Shan-Hung Wu and DataLab members at National Tsing Hua University, Taiwan. It consists of 180 images randomly sampled from the NUS-WIDE image database. Each image has 500 features forming a bag of words based on SIFT descriptors. A series of experiments on the Amazon Mechanical Turk platform produced 325 user-perceived clusters from 100 users, together with their corresponding descriptions.
Dataset spec 1:
- #Image: 180
- #Cluster: 325 (may be created by different users)
- #User: 100
- |Vocabulary of supervision|: 108
- cluster_data.csv : 325 clusters x 180 images
  - 'cluster_data.csv' is an indicator matrix. M_(i,j) = 1 if image_j belongs to cluster_i. Note: Clusters may be created by different users.
- cluster_userIndex.csv : 325 clusters x 1 userIndex (0-99)
  - 'cluster_userIndex.csv' is a vector where V_i = k if cluster_i was grouped by user_k.
- data_feature.csv : 180 images x 500 features
  - Each row is a 500-feature vector forming a bag of words based on SIFT descriptors. All 180 images are sampled from the NUS-WIDE dataset.
  - Reference: http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
- supervision_cluster_matrix.csv : 108 bag of words x 183 clusters
  - We parse the raw supervisions and merge similar words into 108 dimensions. Each row is a description of the corresponding cluster.
- perception_words.csv : 108 perception words
  - Vocabulary of perception words.
Dataset spec 2 (Raw data):
- cluster_list.csv:
  - FileName: ['UserId'], ['ImageId Cluster'], ['Description']
  - ['UserId']: Specifies the user who created the cluster.
  - ['ImageId Cluster']: Image ids in the cluster, separated by ';'.
  - ['Description']: A sentence or some keywords written by the user to describe the images in the cluster.
  - 325 records (clusters) in total.
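The indicator matrix and the user-index vector described above can be combined in a few lines. The following is a minimal sketch, not the authors' code: it uses a tiny made-up matrix in place of 'cluster_data.csv' and 'cluster_userIndex.csv', and the helper names `clusters_per_user` and `co_membership` are hypothetical. It groups cluster rows by the user who created them and counts how many clusters contain each pair of images.

```python
from collections import defaultdict


def clusters_per_user(user_index):
    """Map each user id to the list of cluster row indices that user created."""
    groups = defaultdict(list)
    for cluster_id, user_id in enumerate(user_index):
        groups[user_id].append(cluster_id)
    return dict(groups)


def co_membership(indicator):
    """Count, for every image pair (a, b), how many clusters contain both."""
    n_images = len(indicator[0])
    counts = [[0] * n_images for _ in range(n_images)]
    for row in indicator:
        # Column indices of the images that belong to this cluster.
        members = [j for j, flag in enumerate(row) if flag == 1]
        for a in members:
            for b in members:
                counts[a][b] += 1
    return counts


# Toy stand-in for cluster_data.csv: 3 clusters x 4 images (NOT real data).
M = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
]
# Toy stand-in for cluster_userIndex.csv: cluster i was created by user V[i].
V = [0, 0, 1]

by_user = clusters_per_user(V)
pairs = co_membership(M)
print(by_user)
print(pairs[0][1])
```

The diagonal entry `pairs[j][j]` is simply the number of clusters that contain image j. With the real files, the same functions would apply to a 325 x 180 matrix and a length-325 vector, assuming the CSVs are loaded into nested lists of integers first.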
Has Missing Values?
No
Additional Variable Information
See Additional Information above.
Dataset Files
File | Size |
---|---|
DataSet_Spec_1/data_feature.csv | 176.1 KB |
DataSet_Spec_1/cluster_data.csv | 114.3 KB |
DataSet_Spec_1/supervision_cluster.csv | 68.6 KB |
DataSet_Spec_2/cluster_list.csv | 41.5 KB |
DataSet_Spec_1/perception vocab.csv | 1016 Bytes |
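
The raw file DataSet_Spec_2/cluster_list.csv can, in principle, be turned back into the indicator-matrix form of spec 1. The sketch below illustrates the idea on an in-memory sample rather than the real file. It assumes, without confirmation from the source, that the file is comma-delimited, has a header row with the column names 'UserId', 'ImageId Cluster' and 'Description', and uses 0-based integer image ids. The function name `parse_cluster_list` is hypothetical.

```python
import csv
import io


def parse_cluster_list(text, n_images):
    """Turn raw cluster records into (user_ids, indicator_rows).

    Assumes a comma-delimited file with a header row and 0-based integer
    image ids separated by ';' (assumptions, not confirmed by the source).
    """
    reader = csv.DictReader(io.StringIO(text))
    user_ids, rows = [], []
    for record in reader:
        tokens = record["ImageId Cluster"].split(";")
        image_ids = [int(tok) for tok in tokens if tok.strip()]
        row = [0] * n_images
        for image_id in image_ids:
            row[image_id] = 1  # image belongs to this cluster
        user_ids.append(int(record["UserId"]))
        rows.append(row)
    return user_ids, rows


# Made-up sample in the assumed layout (NOT real data).
sample = (
    "UserId,ImageId Cluster,Description\n"
    "0,0;2,animals outdoors\n"
    '1,1;2;3,"water, sky"\n'
)

users, indicator = parse_cluster_list(sample, n_images=4)
print(users)
print(indicator)
```

For the real file, `n_images` would be 180, and the delimiter, header names and id base should be checked against the actual contents before relying on this.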
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    mturk_user_perceived_clusters_over_images = fetch_ucirepo(id=416)

    # data (as pandas dataframes)
    X = mturk_user_perceived_clusters_over_images.data.features
    y = mturk_user_perceived_clusters_over_images.data.targets

    # metadata
    print(mturk_user_perceived_clusters_over_images.metadata)

    # variable information
    print(mturk_user_perceived_clusters_over_images.variables)

Wu, S. (2016). Mturk User-Perceived Clusters over Images [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C50K7D.
Creators
Shan-Hung Wu
DOI
10.24432/C50K7D
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.