Newspaper and magazine images segmentation dataset

Donated on 7/14/2014

The dataset is well suited for segmentation tasks. It contains 101 scanned pages from various Russian-language newspapers and magazines, with ground truth pixel-based masks.

Dataset Characteristics

-

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

-

# Instances

101

# Features

-

Dataset Information

Additional Information

This dataset was collected for training and validating a machine learning algorithm that classifies document regions as text, picture, or background areas. It contains 101 scanned images of various Russian-language newspapers and magazines. Most of the images have a resolution of 300 dpi and A4 size, about 2400x3500 pixels. Ground truth pixel-based masks were created manually for all images. Each mask has the same name as its original image with the suffix _m appended. There are three classes: text area, picture area, and background. Mask pixels with RGB color (255, 0, 0), red, correspond to picture areas; pixels with RGB color (0, 0, 255), blue, correspond to text areas; all other pixels correspond to background. The dataset includes images with backgrounds of different colors.
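The _m naming convention described above can be used to pair each scan with its mask. The following is a minimal sketch, not part of the dataset: the function name `pair_images_with_masks` is hypothetical, and it assumes the archive has already been extracted to a local directory, that file stems are unique across that directory, and that no original image name itself ends in _m. File extensions are not stated in the description, so the sketch matches on the stem only.

```python
from pathlib import Path

MASK_SUFFIX = "_m"  # suffix named in the dataset description


def pair_images_with_masks(root):
    """Return a sorted list of (image_path, mask_path) pairs found under root.

    Assumption: a mask is any file whose stem ends with "_m", and it belongs
    to the file whose stem is the same name without that suffix.
    """
    files = [p for p in Path(root).rglob("*") if p.is_file()]

    # Index mask files by the stem of the image they belong to.
    masks = {
        p.stem[: -len(MASK_SUFFIX)]: p
        for p in files
        if p.stem.endswith(MASK_SUFFIX)
    }

    pairs = []
    for p in files:
        if p.stem.endswith(MASK_SUFFIX):
            continue  # skip the masks themselves
        mask = masks.get(p.stem)
        if mask is not None:
            pairs.append((p, mask))
    return sorted(pairs)
```

Images without a matching mask are silently skipped here; comparing the number of returned pairs with the expected 101 is a quick sanity check after extraction.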

Has Missing Values?

No

Variable Information

There are three classes: text area, picture area, and background. Mask pixels with RGB color (255, 0, 0), red, correspond to picture areas; pixels with RGB color (0, 0, 255), blue, correspond to text areas; all other pixels correspond to background.
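The color coding above maps directly to a per-pixel label array. The following is a minimal sketch assuming NumPy and a mask already loaded as an H x W x 3 (or H x W x 4) array of 8-bit RGB values. The function name `mask_to_labels` and the integer label ids are arbitrary choices for illustration, not part of the dataset. It matches colors exactly, so it assumes the masks are stored without lossy compression.

```python
import numpy as np

# Label ids are an arbitrary choice for this sketch, not defined by the dataset.
BACKGROUND, TEXT, PICTURE = 0, 1, 2

PICTURE_RGB = np.array([255, 0, 0])  # red, per the dataset description
TEXT_RGB = np.array([0, 0, 255])     # blue, per the dataset description


def mask_to_labels(mask_rgb):
    """Convert an H x W x 3 RGB mask into an H x W array of class ids."""
    mask_rgb = np.asarray(mask_rgb)
    if mask_rgb.ndim != 3 or mask_rgb.shape[2] < 3:
        raise ValueError("expected an H x W x 3 (or H x W x 4) array")

    rgb = mask_rgb[..., :3]  # ignore an alpha channel if one is present

    # Every pixel starts as background; only exact red/blue matches change it.
    labels = np.full(rgb.shape[:2], BACKGROUND, dtype=np.uint8)
    labels[np.all(rgb == PICTURE_RGB, axis=-1)] = PICTURE
    labels[np.all(rgb == TEXT_RGB, axis=-1)] = TEXT
    return labels
```

With Pillow installed, a mask file could be loaded for this function via `np.asarray(Image.open(path).convert("RGB"))`. Calling `np.bincount(labels.ravel(), minlength=3)` on the result then gives the pixel count per class.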

Dataset Files

File    Size
dataset_segmentation.rar    391.2 MB

Creators

Aleksey Vilkin

Ilia Safonov

License
