Newspaper and magazine images segmentation dataset
Donated on 7/14/2014
This dataset is well suited to segmentation tasks. It contains 101 scanned pages from various Russian-language newspapers and magazines, each with a pixel-level ground-truth mask.
Dataset Characteristics
-
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
-
# Instances
101
# Features
-
Dataset Information
Additional Information
This dataset was collected for training and validating machine learning algorithms that classify regions of a document as text, picture, or background. It contains 101 scanned images of various Russian-language newspapers and magazines. Most images are A4 pages scanned at 300 dpi, about 2400x3500 pixels. Ground-truth pixel-level masks were created manually for all images. Each mask has the same name as its original image with the postfix _m. There are three classes: text area, picture area, and background. Mask pixels with color 255, 0, 0 (RGB, red) correspond to picture areas, pixels with color 0, 0, 255 (RGB, blue) correspond to text areas, and all other pixels correspond to background. The dataset includes images with backgrounds of different colors.
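The naming rule above (mask name = image name plus the postfix _m) can be sketched as a small pairing helper. This is only an illustration under stated assumptions: the function name `pair_images_with_masks` and the example file names and extensions are hypothetical, since the description does not say which file formats the archive uses or how it is laid out on disk.

```python
from pathlib import Path

MASK_POSTFIX = "_m"  # postfix given in the dataset description


def pair_images_with_masks(paths):
    """Map each image path to its mask path by matching file stems.

    Assumption: a mask shares its image's stem plus "_m"; extensions may differ.
    Images without a matching mask are left out of the result.
    """
    by_stem = {Path(p).stem: Path(p) for p in paths}
    pairs = {}
    for stem, path in by_stem.items():
        if stem.endswith(MASK_POSTFIX):
            continue  # treat this file as a mask, not as an image
        mask = by_stem.get(stem + MASK_POSTFIX)
        if mask is not None:
            pairs[path] = mask
    return pairs


# Hypothetical file names, used only to show the matching rule.
example = ["scans/page1.jpg", "scans/page1_m.png", "scans/page2.jpg"]
pairs = pair_images_with_masks(example)
```

Matching on the stem rather than the full file name keeps the helper independent of the image and mask extensions, which the description leaves unspecified.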
Has Missing Values?
No
Variable Information
There are three classes: text area, picture area, and background. Mask pixels with color 255, 0, 0 (RGB, red) correspond to picture areas, pixels with color 0, 0, 255 (RGB, blue) correspond to text areas, and all other pixels correspond to background.
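The color coding above can be turned into a per-pixel class map. The following is a minimal sketch assuming the mask is already loaded as an H x W x 3 RGB array and that the colors are stored exactly (no lossy compression artifacts). The integer label values and the function name `mask_to_labels` are hypothetical choices, not part of the dataset.

```python
import numpy as np

# Hypothetical integer ids for the three classes named in the description.
BACKGROUND, TEXT, PICTURE = 0, 1, 2


def mask_to_labels(mask_rgb):
    """Convert an RGB ground-truth mask into a 2-D array of class ids.

    Red (255, 0, 0) -> picture, blue (0, 0, 255) -> text,
    every other color -> background.
    """
    mask_rgb = np.asarray(mask_rgb)
    r, g, b = mask_rgb[..., 0], mask_rgb[..., 1], mask_rgb[..., 2]
    labels = np.full(mask_rgb.shape[:2], BACKGROUND, dtype=np.uint8)
    labels[(r == 255) & (g == 0) & (b == 0)] = PICTURE
    labels[(r == 0) & (g == 0) & (b == 255)] = TEXT
    return labels


# Tiny synthetic 2x2 mask: red, blue, white, black.
demo = np.array(
    [[[255, 0, 0], [0, 0, 255]],
     [[255, 255, 255], [0, 0, 0]]],
    dtype=np.uint8,
)
demo_labels = mask_to_labels(demo)
```

If the masks turn out to be stored in a lossy format, the exact equality checks would likely need to be replaced with a tolerance around the two reference colors.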
Dataset Files
File | Size |
---|---|
dataset_segmentation.rar | 391.2 MB |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
newspaper_and_magazine_images_segmentation_dataset = fetch_ucirepo(id=306)

# data (as pandas dataframes)
X = newspaper_and_magazine_images_segmentation_dataset.data.features
y = newspaper_and_magazine_images_segmentation_dataset.data.targets

# metadata
print(newspaper_and_magazine_images_segmentation_dataset.metadata)

# variable information
print(newspaper_and_magazine_images_segmentation_dataset.variables)
Vilkin, A. & Safonov, I. (2012). Newspaper and magazine images segmentation dataset [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5N60V.
Creators
Aleksey Vilkin
Ilia Safonov
DOI
10.24432/C5N60V
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.