ImageNet
Linked on 11/26/2021
A well-known large-scale image classification dataset with between 1,000 and 20,000 class labels, depending on the subset used, and millions of images.
Dataset Characteristics: Image
Subject Area: Computer Science
Associated Tasks: Classification
Feature Type: -
# Instances: 14,000,000
# Features: -
Dataset Information
For what purpose was the dataset created?
The ImageNet dataset was created to support research in large-scale image classification. Note that several specific subsets were subsequently created to support challenge competitions, such as the widely used ImageNet Large Scale Visual Recognition Challenge (ILSVRC) datasets.
Who funded the creation of the dataset?
The dataset was originally developed by researchers at Princeton University. The original paper on ImageNet (Deng et al., CVPR 2009) credits the National Science Foundation, Google, Intel, Microsoft, and Yahoo! with providing funding support.
What do the instances in this dataset represent?
Color images of varying sizes obtained via internet search and crowd-sourcing. For machine learning experiments, the images are typically cropped to 256 x 256 pixels (or a similar size). Images are manually annotated with object labels and, for a subset of the images, with bounding boxes for the objects.
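As an illustration of the cropping step mentioned above, here is a minimal Python sketch that computes the geometry for bringing an image of arbitrary size to 256 x 256. It assumes one common convention (resize so the shorter side equals the target, then take a center crop); the dataset description does not prescribe a specific procedure, and the function name `resize_then_center_crop_box` is hypothetical.

```python
def resize_then_center_crop_box(width, height, target=256):
    """Compute the resized dimensions and center-crop box for one image.

    Assumption: the shorter side is scaled to `target` pixels and a
    target x target square is then cut from the center. Other pipelines
    (random crops, direct squashing) are equally plausible.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height, and target must be positive")

    # Scale factor that maps the shorter side onto the target size.
    scale = target / min(width, height)

    # Guard against rounding pushing a side below the target.
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))

    # Box is (left, top, right, bottom) in resized-image coordinates.
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return new_w, new_h, (left, top, left + target, top + target)


# Example: a 500 x 375 landscape image.
new_w, new_h, box = resize_then_center_crop_box(500, 375)
print(new_w, new_h, box)
```

The returned box follows the (left, top, right, bottom) convention, so it could be handed to an image library's crop routine after resizing.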
Are there recommended data splits?
Yes. For the ILSVRC version of the data (https://image-net.org/download.php), the standard partition used in machine learning evaluations contains 1,281,167 training images, 50,000 validation images, and 100,000 test images.
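To put the split sizes above in proportion, the following short Python sketch computes the total image count and the share of each partition. Only the three counts come from the description; the names `ILSVRC_SPLITS` and `split_fractions` are hypothetical and chosen for illustration.

```python
# Image counts for the standard ILSVRC partition, as stated above.
ILSVRC_SPLITS = {
    "train": 1_281_167,
    "validation": 50_000,
    "test": 100_000,
}


def split_fractions(splits):
    """Return each split's share of the total number of images."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total_images = sum(ILSVRC_SPLITS.values())
fractions = split_fractions(ILSVRC_SPLITS)

print(f"total: {total_images:,} images")
for name, frac in fractions.items():
    print(f"{name}: {ILSVRC_SPLITS[name]:,} images ({frac:.1%})")
```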
Does the dataset contain data that might be considered sensitive in any way?
Out of the 1,000 class labels in the ILSVRC dataset, 3 involve people. As ImageNet became more widely used, researchers became aware of issues related to fairness, representation, and offensive vocabulary in the images and annotations for these 3 categories. The ImageNet team at Princeton and Stanford is working on modifying the original ImageNet dataset to address these issues. For additional information, see https://image-net.org/update-mar-11-2021.php and https://image-net.org/update-sep-17-2019.php.
Was there any data preprocessing performed?
See the Deng et al. (CVPR 2009) paper for details.
Has Missing Values?
No
Introductory Paper
By Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. 2009
Published in CVPR
pip install ucimlrepo

from ucimlrepo import fetch_ucirepo

# fetch dataset
imagenet = fetch_ucirepo(id=693)

# data (as pandas dataframes)
X = imagenet.data.features
y = imagenet.data.targets

# metadata
print(imagenet.metadata)

# variable information
print(imagenet.variables)
ImageNet [Dataset]. (2009). UCI Machine Learning Repository. https://doi.org/10.24432/C5C33G.
Citations/Acknowledgements
If you use this dataset, please follow the acknowledgment policy on the original dataset website.