Document Understanding
Donated on 10/31/1994
Five concepts, expressed as predicates, to be learned
Dataset Characteristics
-
Subject Area
Other
Associated Tasks
-
Feature Type
-
# Instances
-
# Features
-
Dataset Information
Additional Information
In the experiments, 30 single-page documents were considered. They are copies of letters sent by Olivetti. Six trials were performed, each by randomly selecting 20 documents for the training set and 10 for the test set. Each document is identified by a letter (A to Z) or a pair of letters (AA, AB, AC, AD).

Trial | Training documents |
---|---|
1 | A B C D E F G H I J K L M N O P Q R S T |
2 | C D E F G H I M P R S V X Y W Z AA AB AC AD |
3 | C D E F G H I J K P R S T U V Y W AA AB AC |
4 | A B C D E F G J L M N O P Q T V X Z AB AD |
5 | A B E F G I J K M N O P Q R T V X Z AA AD |
6 | A B C D E F G I J M Q S T X Y Z AA AB AC AD |
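The trial splits described above list only the training documents. As a minimal sketch, assuming each trial's test set is simply the 10 documents left out of its training set (which follows from the 20/10 split over 30 documents), the held-out documents can be derived as shown here. The names `ALL_DOCS`, `TRAINING`, and `held_out_documents` are hypothetical and are not part of the dataset or of any library.

```python
from string import ascii_uppercase

# All 30 document identifiers: A..Z plus AA, AB, AC, AD.
ALL_DOCS = list(ascii_uppercase) + ["AA", "AB", "AC", "AD"]

# Training documents per trial, transcribed from the dataset description.
TRAINING = {
    1: "A B C D E F G H I J K L M N O P Q R S T",
    2: "C D E F G H I M P R S V X Y W Z AA AB AC AD",
    3: "C D E F G H I J K P R S T U V Y W AA AB AC",
    4: "A B C D E F G J L M N O P Q T V X Z AB AD",
    5: "A B E F G I J K M N O P Q R T V X Z AA AD",
    6: "A B C D E F G I J M Q S T X Y Z AA AB AC AD",
}


def held_out_documents(trial: int) -> list[str]:
    """Return the documents not used for training in the given trial.

    Assumption: the test set is the complement of the training set.
    """
    train = set(TRAINING[trial].split())
    return [doc for doc in ALL_DOCS if doc not in train]


if __name__ == "__main__":
    for trial in sorted(TRAINING):
        print(trial, " ".join(held_out_documents(trial)))
```

Under that assumption, every trial yields exactly 10 held-out documents, which is a quick check that the training lists were transcribed correctly.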
Has Missing Values?
No
Dataset Files
File | Size |
---|---|
test2.data | 28 KB |
test6.data | 26.1 KB |
FOIL.data | 25.4 KB |
test3.data | 24.2 KB |
test5.data | 24.2 KB |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
document_understanding = fetch_ucirepo(id=36)

# data (as pandas dataframes)
X = document_understanding.data.features
y = document_understanding.data.targets

# metadata
print(document_understanding.metadata)

# variable information
print(document_understanding.variables)
Malerba, D. (1993). Document Understanding [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5260K.
Creators
Donato Malerba
DOI
10.24432/C5260K
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.