# Page Blocks Classification

## Donated on 6/30/1995

The task is to classify the blocks of a document's page layout that have been detected by a segmentation process.

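Dataset Characteristics: Multivariate

Subject Area: Computer Science

Associated Tasks: Classification

Feature Type: Integer, Real

Number of Instances: 5473

Number of Features: 10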

# Dataset Information

The 5473 examples come from 54 distinct documents. Each observation concerns one block. All attributes are numeric. The data are in a format readable by C4.5.
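As a rough illustration of reading one observation, the sketch below parses a single record into named features. It rests on assumptions that are not stated above: that each record is one line of comma- or whitespace-separated values, that the ten features appear in the order of the variables table, and that a class code follows as the last field. `parse_record` and the sample line are hypothetical and made up for the example.

```python
import re

# Assumed column order (not confirmed by the description above): the ten
# features in the order of the variables table, followed by a class code.
FEATURE_NAMES = [
    "height", "length", "area", "eccen", "p_black",
    "p_and", "mean_tr", "blackpix", "blackand", "wb_trans",
]
INTEGER_FEATURES = {"height", "length", "area", "blackpix", "blackand", "wb_trans"}


def parse_record(line):
    """Parse one block record into (features, class_code).

    Accepts comma- or whitespace-separated fields, since the exact
    delimiter is an assumption here.
    """
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    expected = len(FEATURE_NAMES) + 1
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    features = {}
    for name, raw in zip(FEATURE_NAMES, fields):
        features[name] = int(raw) if name in INTEGER_FEATURES else float(raw)
    # The class code is kept as a string; its encoding is not specified above.
    return features, fields[-1]


# Made-up sample line, for illustration only (not taken from the dataset).
sample = "5 7 35 1.400 0.600 0.800 3.00 21 28 7 1"
features, class_code = parse_record(sample)
```

Keeping the class code as an uninterpreted string avoids guessing how the five class labels are encoded in the file.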

Has Missing Values?

No

# Variables Table

| Variable Name | Role | Type | Description | Units | Missing Values |
| --- | --- | --- | --- | --- | --- |
| height | Feature | Integer | Height of the block | | no |
| length | Feature | Integer | Length of the block | | no |
| area | Feature | Integer | Area of the block (`height * length`) | | no |
| eccen | Feature | Continuous | Eccentricity of the block (`length / height`) | | no |
| p_black | Feature | Continuous | Percentage of black pixels within the block (`blackpix / area`) | | no |
| p_and | Feature | Continuous | Percentage of black pixels after the application of the Run Length Smoothing Algorithm (RLSA) (`blackand / area`) | | no |
| mean_tr | Feature | Continuous | Mean number of white-black transitions (`blackpix / wb_trans`) | | no |
| blackpix | Feature | Integer | Total number of black pixels in the original bitmap of the block | | no |
| blackand | Feature | Integer | Total number of black pixels in the bitmap of the block after the RLSA | | no |
| wb_trans | Feature | Integer | Number of white-black transitions in the original bitmap of the block | | no |
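Five of the ten features are defined by formulas over the other five (`height`, `length`, `blackpix`, `blackand`, `wb_trans`). The sketch below recomputes them from those raw measurements using the formulas given in the descriptions. The function name `derive_features` and the input values are hypothetical, and the code assumes nonzero `height`, `length`, and `wb_trans`.

```python
def derive_features(height, length, blackpix, blackand, wb_trans):
    """Recompute the derived features from the raw block measurements.

    Formulas follow the variable descriptions; inputs are assumed to be
    positive where they appear as a divisor.
    """
    area = height * length
    return {
        "area": area,                      # height * length
        "eccen": length / height,          # length / height
        "p_black": blackpix / area,        # blackpix / area
        "p_and": blackand / area,          # blackand / area
        "mean_tr": blackpix / wb_trans,    # blackpix / wb_trans
    }


# Made-up measurements for one block, for illustration only.
derived = derive_features(height=5, length=7, blackpix=21, blackand=28, wb_trans=7)
```

Because the derived columns are deterministic functions of the raw ones, a check like this can be used to spot records whose stored values disagree with the formulas beyond rounding.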

Class Labels

text, horiz. line, graphic, vert. line, picture

Donato Malerba