
USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat
Donated on 10/12/2013
Data used for USPTO Algorithm Competition. Contains drawing pages from US patents with manually labeled figure and part labels.
Property | Value |
---|---|
Dataset Characteristics | Domain-Theory |
Subject Area | Other |
Associated Tasks | Classification |
Feature Type | Integer |
# Instances | 306 |
# Features | 5 |
Dataset Information
Additional Information
USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Patent Labeling
Has Missing Values?
No
Variable Information
Dataset Information:
-- This folder contains 4 groups of USPTO patent images, including ground truth information.
-- The 4 groups are 'train1', 'train2', 'test', and 'evaluation'.
-- 'train1', 'test', and 'evaluation' contain the data used in the original 'USPTO Algorithm Challenge' for training, testing, and final evaluation, respectively.
-- 'train2' contains additional data that was used in the 'USPTO Algorithm Followup Challenge'. Note that 'train2' includes some cover page images of patent documents, which are not included in the other groups.
-- In each group, there are two folders containing the original images and the corresponding ground truth information.
-- The original images are in 'jpeg' format.
-- There are two types of ground truth: figure label ground truth and part label ground truth.
-- The ground truth files are text files with the '.ans' extension.
-- The structure of the ground truth files is described below:
-- The first line is a single number indicating how many instances exist in the corresponding image.
-- Each following line holds the polygon coordinates and label content of one figure label or part label, in the form 'N x1 y1 x2 y2 … xN yN x1 y1 content'.
-- In each of those lines, the first number N indicates how many polygon vertices are recorded for the current instance.
-- The numbers that follow are the x, y coordinates of those vertices.
-- The final word in each line is the content of the figure label or part label. (Note that for figure labels, the words 'Figure', 'Fig', etc. are omitted.)
-- Numbers and words are separated by white space.
-- For group 'train2', only part label ground truth is available.
-- We also release the source code of the top 5 winning solutions. See the additional archive file.
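The '.ans' layout described above can be read with a few lines of Python. The following is a minimal sketch, not code from the released archive: the names `LabelInstance` and `parse_ans` and the sample text are hypothetical, coordinates are parsed as floats as a precaution, and any tokens between the N vertices and the final word (such as the repeated 'x1 y1' shown in the format string) are assumed to be a closing vertex and are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabelInstance:
    """One figure label or part label (hypothetical container type)."""
    vertices: List[Tuple[float, float]]
    content: str

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Axis-aligned box (min_x, min_y, max_x, max_y) around the polygon."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return min(xs), min(ys), max(xs), max(ys)


def parse_ans(text: str) -> List[LabelInstance]:
    """Parse the text of one '.ans' file into a list of label instances."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    expected = int(lines[0].split()[0])  # first line: number of instances
    instances = []
    for line in lines[1:]:
        tokens = line.split()
        n = int(tokens[0])  # number of polygon vertices on this line
        if len(tokens) < 2 * n + 2:
            raise ValueError(f"line too short for {n} vertices: {line!r}")
        coords = [float(t) for t in tokens[1:1 + 2 * n]]
        vertices = list(zip(coords[0::2], coords[1::2]))
        # Assumption: the last token is the label content; tokens between
        # the N vertices and the content (a repeated first vertex) are skipped.
        instances.append(LabelInstance(vertices, tokens[-1]))
    if len(instances) != expected:
        raise ValueError(f"expected {expected} instances, found {len(instances)}")
    return instances


# Made-up sample in the documented form, not taken from the dataset.
sample = (
    "2\n"
    "4 10 20 50 20 50 40 10 40 10 20 1A\n"
    "4 100 200 130 200 130 220 100 220 100 200 12\n"
)
for inst in parse_ans(sample):
    print(inst.content, inst.bounding_box())
```

To use it on real data, read a ground truth file as text and pass the contents to `parse_ans`. If the actual files do not repeat the first vertex, the same slicing still works, because only the first N coordinate pairs and the last token are read.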
Dataset Files
File | Size |
---|---|
Data.zip | 135.5 MB |
SourceCode.zip | 766.7 KB |
README.txt | 2.7 KB |
pip install ucimlrepo
    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    uspto_algorithm_challenge_run_by_nasa_harvard_tournament_lab_and_topcoder_problem_pat = fetch_ucirepo(id=268)

    # data (as pandas dataframes)
    X = uspto_algorithm_challenge_run_by_nasa_harvard_tournament_lab_and_topcoder_problem_pat.data.features
    y = uspto_algorithm_challenge_run_by_nasa_harvard_tournament_lab_and_topcoder_problem_pat.data.targets

    # metadata
    print(uspto_algorithm_challenge_run_by_nasa_harvard_tournament_lab_and_topcoder_problem_pat.metadata)

    # variable information
    print(uspto_algorithm_challenge_run_by_nasa_harvard_tournament_lab_and_topcoder_problem_pat.variables)
Riedl, C., Zanibbi, R., Hearst, M., Zhu, S., Minetti, M., & Crusan, J. (2013). USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5BP5S.
Creators
Christoph Riedl
Richard Zanibbi
Marti Hearst
Siyu Zhu
Michael Minetti
Jason Crusan
DOI
10.24432/C5BP5S
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.