Page Blocks Classification

Donated on 6/30/1995

The problem consists of classifying all the blocks of the page layout of a document that have been detected by a segmentation process.

Dataset Characteristics

Multivariate

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Integer, Real

# Instances

5473

# Features

10

Dataset Information

Additional Information

The 5473 examples come from 54 distinct documents. Each observation concerns one block. All attributes are numeric. Data are in a format readable by C4.5.
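Since every attribute is numeric and the data are stored in a C4.5-readable format, one record can be read with a few lines of Python. The following is a minimal sketch, not the repository's own loader. It assumes each line holds the ten attributes followed by a class label as an eleventh field, and that fields are separated by commas or whitespace; both are assumptions about the file layout. The function name `parse_record` and the sample values are illustrative only.

```python
import re
from typing import Dict, Tuple

# Attribute names as spelled in the dataset description (including "lenght").
FEATURE_NAMES = [
    "height", "lenght", "area", "eccen", "p_black",
    "p_and", "mean_tr", "blackpix", "blackand", "wb_trans",
]


def parse_record(line: str) -> Tuple[Dict[str, float], str]:
    """Split one data line into a feature dict and a class label.

    Assumption: ten numeric fields followed by one class label,
    separated by commas and/or whitespace.
    """
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) != len(FEATURE_NAMES) + 1:
        raise ValueError(
            f"expected {len(FEATURE_NAMES) + 1} fields, got {len(fields)}"
        )
    features = {
        name: float(value) for name, value in zip(FEATURE_NAMES, fields)
    }
    label = fields[-1]  # assumed to be the class label
    return features, label


# Illustrative line with made-up but internally consistent values.
features, label = parse_record("5 7 35 1.400 .400 .657 2.33 14 23 6 1")
```

All values are parsed as floats for simplicity; the integer-valued attributes can be cast back with `int()` if needed.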

Has Missing Values?

No

Variables Table

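Variable Name | Type | Missing Values
height | integer | no
lenght | integer | no
area | integer | no
eccen | continuous | no
p_black | continuous | no
p_and | continuous | no
mean_tr | continuous | no
blackpix | integer | no
blackand | integer | no
wb_trans | integer | no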

Additional Variable Information

height: integer. | Height of the block.
lenght: integer. | Length of the block.
area: integer. | Area of the block (height * lenght).
eccen: continuous. | Eccentricity of the block (lenght / height).
p_black: continuous. | Percentage of black pixels within the block (blackpix / area).
p_and: continuous. | Percentage of black pixels after the application of the Run Length Smoothing Algorithm (RLSA) (blackand / area).
mean_tr: continuous. | Mean number of white-black transitions (blackpix / wb_trans).
blackpix: integer. | Total number of black pixels in the original bitmap of the block.
blackand: integer. | Total number of black pixels in the bitmap of the block after the RLSA.
wb_trans: integer. | Number of white-black transitions in the original bitmap of the block.
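Five of the ten attributes (area, eccen, p_black, p_and, mean_tr) are defined above as functions of the other five. The sketch below recomputes them from the formulas given in the descriptions, which can help check a loaded record for consistency. The function name `derived_attributes` and the sample values are illustrative, not part of the dataset, and the code assumes that `height`, `lenght`, and `wb_trans` are nonzero.

```python
from typing import Dict


def derived_attributes(
    height: int, lenght: int, blackpix: int, blackand: int, wb_trans: int
) -> Dict[str, float]:
    """Recompute the derived attributes from the five raw measurements.

    Assumes height, lenght and wb_trans are nonzero; otherwise the
    divisions below raise ZeroDivisionError.
    """
    area = height * lenght
    return {
        "area": area,                     # height * lenght
        "eccen": lenght / height,         # lenght / height
        "p_black": blackpix / area,       # blackpix / area
        "p_and": blackand / area,         # blackand / area
        "mean_tr": blackpix / wb_trans,   # blackpix / wb_trans
    }


# Illustrative block: 5 pixels high, 7 pixels long.
d = derived_attributes(height=5, lenght=7, blackpix=14, blackand=23, wb_trans=6)
print(d["area"])               # 35
print(round(d["eccen"], 3))    # 1.4
print(round(d["p_black"], 3))  # 0.4
```

Stored values may be rounded, so a consistency check against a data file should compare with a small tolerance rather than exact equality.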


Creators

Donato Malerba
