# Page Blocks Classification

## Donated on 6/30/1995

The problem consists of classifying all the blocks of a document's page layout that have been detected by a segmentation process.

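- Dataset characteristics: Multivariate
- Subject area: Computer Science
- Associated tasks: Classification
- Feature type: Integer, Real
- Number of instances: 5473
- Number of features: 10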

# Dataset Information

The 5473 examples come from 54 distinct documents. Each observation describes one block. All attributes are numeric, and the data are in a format readable by C4.5.
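A minimal parsing sketch follows. It assumes (this page does not confirm it) that each data line holds the 10 attributes in the order of the variables table, followed by a single-token class label such as an integer code, separated by whitespace or commas. The names `Block`, `parse_line`, and `parse_lines` are hypothetical, and the sample line uses made-up values rather than a row from the dataset.

```python
from typing import Dict, List, NamedTuple, Union

# Attribute order as listed in the variables table. The assumption that each
# data line holds these 10 values followed by a single-token class label
# (for example an integer code) is not confirmed by the dataset page.
ATTRIBUTES = ("height", "length", "area", "eccen", "p_black", "p_and",
              "mean_tr", "blackpix", "blackand", "wb_trans")
INTEGER_ATTRIBUTES = {"height", "length", "area", "blackpix", "blackand", "wb_trans"}


class Block(NamedTuple):
    features: Dict[str, Union[int, float]]
    label: str


def parse_line(line: str) -> Block:
    """Parse one observation (one page block) from a single data line.

    Commas and whitespace are both treated as separators, because the exact
    delimiter is an assumption in this sketch.
    """
    fields = line.replace(",", " ").split()
    if len(fields) != len(ATTRIBUTES) + 1:
        raise ValueError(
            f"expected {len(ATTRIBUTES) + 1} fields, got {len(fields)}")
    features: Dict[str, Union[int, float]] = {}
    for name, raw in zip(ATTRIBUTES, fields):
        # Integer-typed attributes per the variables table; the rest are continuous.
        features[name] = int(raw) if name in INTEGER_ATTRIBUTES else float(raw)
    return Block(features, fields[-1])


def parse_lines(lines) -> List[Block]:
    """Parse every non-blank line into a Block."""
    return [parse_line(line) for line in lines if line.strip()]


# Made-up values for illustration, not a row from the dataset.
block = parse_line("4 10 40 2.5 0.5 0.75 2.5 20 30 8 1")
print(block.features["area"], block.label)  # 40 1
```

Accepting both delimiters keeps the sketch usable whichever separator the actual file uses; a label containing spaces would break the whitespace split, which is why a single-token label is assumed.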

Has missing values? No

# Variables Table

| Variable Name | Role | Type | Description | Units | Missing Values |
|---|---|---|---|---|---|
| height | Feature | Integer | Height of the block | | no |
| length | Feature | Integer | Length of the block | | no |
| area | Feature | Integer | Area of the block (height * length) | | no |
| eccen | Feature | Continuous | Eccentricity of the block (length / height) | | no |
| p_black | Feature | Continuous | Percentage of black pixels within the block (blackpix / area) | | no |
| p_and | Feature | Continuous | Percentage of black pixels after the application of the Run Length Smoothing Algorithm (RLSA) (blackand / area) | | no |
| mean_tr | Feature | Continuous | Mean number of white-black transitions (blackpix / wb_trans) | | no |
| blackpix | Feature | Integer | Total number of black pixels in the original bitmap of the block | | no |
| blackand | Feature | Integer | Total number of black pixels in the bitmap of the block after the RLSA | | no |
| wb_trans | Feature | Integer | Number of white-black transitions in the original bitmap of the block | | no |
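Five of the attributes are derived from the raw measurements by the formulas in the table. The sketch below recomputes them and checks a row for consistency. The helper names are hypothetical, the code assumes `height`, `area`, and `wb_trans` are positive, and the `abs_tol` tolerance is an assumption about how the stored continuous values were rounded. The example row uses made-up values.

```python
import math


def derived_features(height: int, length: int, blackpix: int,
                     blackand: int, wb_trans: int) -> dict:
    """Recompute the derived attributes using the formulas in the variables table.

    Assumes height, area, and wb_trans are all greater than zero.
    """
    area = height * length
    return {
        "area": area,                    # height * length
        "eccen": length / height,        # eccentricity
        "p_black": blackpix / area,      # black pixels in the original bitmap / area
        "p_and": blackand / area,        # black pixels after RLSA / area
        "mean_tr": blackpix / wb_trans,  # formula as given for mean_tr
    }


def is_consistent(row: dict, abs_tol: float = 0.01) -> bool:
    """Check that a row's derived attributes match its raw measurements.

    Integer area must match exactly; continuous attributes are compared
    within abs_tol, an assumed allowance for rounding in the stored values.
    """
    expected = derived_features(row["height"], row["length"], row["blackpix"],
                                row["blackand"], row["wb_trans"])
    if row["area"] != expected["area"]:
        return False
    return all(math.isclose(row[key], expected[key], abs_tol=abs_tol)
               for key in ("eccen", "p_black", "p_and", "mean_tr"))


# Made-up row for illustration, not taken from the dataset.
row = {"height": 4, "length": 10, "area": 40, "eccen": 2.5, "p_black": 0.5,
       "p_and": 0.75, "mean_tr": 2.5, "blackpix": 20, "blackand": 30,
       "wb_trans": 8}
print(is_consistent(row))  # True
```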


- height: integer. Height of the block.
- length: integer. Length of the block.
- area: integer. Area of the block (height * length).
- eccen: continuous. Eccentricity of the block (length / height).
- p_black: continuous. Percentage of black pixels within the block (blackpix / area).
- p_and: continuous. Percentage of black pixels after the application of the Run Length Smoothing Algorithm (RLSA) (blackand / area).
- mean_tr: continuous. Mean number of white-black transitions (blackpix / wb_trans).
- blackpix: integer. Total number of black pixels in the original bitmap of the block.
- blackand: integer. Total number of black pixels in the bitmap of the block after the RLSA.
- wb_trans: integer. Number of white-black transitions in the original bitmap of the block.

Class labels: text, horiz. line, graphic, vert. line, picture
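A small sketch for working with the five class labels follows. It assumes, without confirmation from this page, that the data file encodes classes as integer codes 1 to 5 in the order the labels are listed; verify this against the dataset's own documentation. The names `CODE_TO_LABEL`, `LABEL_TO_CODE`, and `class_distribution` are hypothetical.

```python
from collections import Counter

# The five class labels in the order listed above. Mapping them to integer
# codes 1-5 in this order is an assumption, not confirmed by the dataset page.
CLASS_LABELS = ("text", "horiz. line", "graphic", "vert. line", "picture")
CODE_TO_LABEL = {code: label for code, label in enumerate(CLASS_LABELS, start=1)}
LABEL_TO_CODE = {label: code for code, label in CODE_TO_LABEL.items()}


def class_distribution(codes) -> dict:
    """Count how many blocks fall into each class, keyed by label name."""
    counts = Counter(CODE_TO_LABEL[int(code)] for code in codes)
    # Report every label, including those with zero occurrences.
    return {label: counts.get(label, 0) for label in CLASS_LABELS}


print(class_distribution(["1", "1", "5"]))
# {'text': 2, 'horiz. line': 0, 'graphic': 0, 'vert. line': 0, 'picture': 1}
```

Reporting zero counts explicitly makes it easy to spot classes that are absent from a subset, for example after a train/test split.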


Creator: Donato Malerba