Page Blocks Classification

Donated on 6/30/1995

The problem consists of classifying all the blocks of a document's page layout that have been detected by a segmentation process.

Dataset Characteristics

Multivariate

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Integer, Real

# Instances

5473

# Features

10

Dataset Information

Additional Information

The 5473 examples come from 54 distinct documents. Each observation concerns one block. All attributes are numeric. Data are in a format readable by C4.5.
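As a minimal sketch of reading the data, the snippet below parses one row per block. It assumes the file has already been decompressed (for example with the Unix uncompress tool), that fields are whitespace-separated, that the ten features appear in the order of the variables table, and that the class label is the eleventh field. None of these layout details are stated on this page, so check them against page-blocks.names. The names parse_row and load_page_blocks are hypothetical helpers, and the sample row is illustrative only.

```python
# Feature order is an assumption taken from the variables table on this page.
FEATURES = [
    "height", "length", "area", "eccen", "p_black",
    "p_and", "mean_tr", "blackpix", "blackand", "wb_trans",
]
INT_FEATURES = {"height", "length", "area", "blackpix", "blackand", "wb_trans"}


def parse_row(line):
    """Parse one whitespace-separated row into a dict (assumed layout)."""
    fields = line.split()
    if len(fields) != len(FEATURES) + 1:
        raise ValueError(f"expected {len(FEATURES) + 1} fields, got {len(fields)}")
    row = {}
    for name, raw in zip(FEATURES, fields):
        row[name] = int(raw) if name in INT_FEATURES else float(raw)
    # The class label is kept as an opaque string; its coding is not given here.
    row["label"] = fields[-1]
    return row


def load_page_blocks(path):
    """Read every non-empty line of an already decompressed data file."""
    with open(path) as fh:
        return [parse_row(line) for line in fh if line.strip()]


# Illustrative row, not quoted from the dataset.
sample = parse_row("5 7 35 1.400 .400 .657 2.33 14 23 6 1")
```

If the assumptions hold, load_page_blocks applied to the decompressed data file should return 5473 rows.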

Has Missing Values?

No

Variables Table

Variable Name | Role | Type | Description | Units | Missing Values
height | Feature | Integer | Height of the block | | no
length | Feature | Integer | Length of the block | | no
area | Feature | Integer | Area of the block (height * length) | | no
eccen | Feature | Continuous | Eccentricity of the block (length / height) | | no
p_black | Feature | Continuous | Percentage of black pixels within the block (blackpix / area) | | no
p_and | Feature | Continuous | Percentage of black pixels after the application of the Run Length Smoothing Algorithm (RLSA) (blackand / area) | | no
mean_tr | Feature | Continuous | Mean number of white-black transitions (blackpix / wb_trans) | | no
blackpix | Feature | Integer | Total number of black pixels in the original bitmap of the block | | no
blackand | Feature | Integer | Total number of black pixels in the bitmap of the block after the RLSA | | no
wb_trans | Feature | Integer | Number of white-black transitions in the original bitmap of the block | | no

Additional Variable Information

height: integer. Height of the block.
length: integer. Length of the block.
area: integer. Area of the block (height * length).
eccen: continuous. Eccentricity of the block (length / height).
p_black: continuous. Percentage of black pixels within the block (blackpix / area).
p_and: continuous. Percentage of black pixels after the application of the Run Length Smoothing Algorithm (RLSA) (blackand / area).
mean_tr: continuous. Mean number of white-black transitions (blackpix / wb_trans).
blackpix: integer. Total number of black pixels in the original bitmap of the block.
blackand: integer. Total number of black pixels in the bitmap of the block after the RLSA.
wb_trans: integer. Number of white-black transitions in the original bitmap of the block.
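Five of the ten attributes are ratios or products of the other five, as the formulas in parentheses show. The sketch below recomputes them from the primitive measurements. The function name derive_features and the input values are hypothetical, chosen only for illustration. It assumes height, length and wb_trans are strictly positive and does not guard against division by zero.

```python
def derive_features(height, length, blackpix, blackand, wb_trans):
    """Recompute the derived attributes from the primitive measurements.

    Assumes height, length and wb_trans are positive (no zero-division guard).
    """
    area = height * length
    return {
        "area": area,                      # height * length
        "eccen": length / height,          # length / height
        "p_black": blackpix / area,        # blackpix / area
        "p_and": blackand / area,          # blackand / area
        "mean_tr": blackpix / wb_trans,    # blackpix / wb_trans
    }


# Hypothetical block: 5 pixels high, 7 pixels long.
derived = derive_features(height=5, length=7, blackpix=14, blackand=23, wb_trans=6)
```

Because the derived columns are deterministic functions of the primitives, a check of this kind can flag rows where the stored values disagree with the formulas beyond rounding.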

Class Labels

text, horiz. line, graphic, vert. line, picture

Dataset Files

File | Size
page-blocks.data.Z | 102.1 KB
page-blocks.names | 3.8 KB
Index | 128 Bytes

Creators

Donato Malerba
