Covertype

Donated on 7/31/1998

Classification of pixels into 7 forest cover types based on attributes such as elevation, aspect, slope, hillshade, soil-type, and more.

Dataset Characteristics

Multivariate

Subject Area

Biology

Associated Tasks

Classification

Feature Type

Categorical, Integer

# Instances

581012

# Features

54

Dataset Information

Additional Information

The task is to predict forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (a 30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from the US Geological Survey (USGS) and the USFS. The data is in raw form (not scaled) and contains binary (0 or 1) columns for the qualitative independent variables (wilderness areas and soil types).

The study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so existing forest cover types are more a result of ecological processes than of forest management practices.

Some background information on these four wilderness areas: Neota (area 2) probably has the highest mean elevation of the four. Rawah (area 1) and Comanche Peak (area 3) would have a lower mean elevation, while Cache la Poudre (area 4) would have the lowest. As for primary tree species, Neota would have spruce/fir (type 1), while Rawah and Comanche Peak would probably have lodgepole pine (type 2) as their primary species, followed by spruce/fir and aspen (type 5). Cache la Poudre would tend to have Ponderosa pine (type 3), Douglas-fir (type 6), and cottonwood/willow (type 4).

The Rawah and Comanche Peak areas would tend to be more typical of the overall dataset than either Neota or Cache la Poudre, due to their assortment of tree species and range of predictive variable values (elevation, etc.). Cache la Poudre would probably be the most distinctive of the four, due to its relatively low elevation range and species composition.
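The wilderness area of each cell is stored as four binary columns, so comparing areas (for example, checking the elevation ordering suggested above) first requires turning those flags back into an area number. The following is a minimal sketch of that step. It assumes each row has exactly one wilderness flag set, the function names are hypothetical, and the sample elevations are made up for illustration rather than taken from the dataset.

```python
from collections import defaultdict

# Area numbering as given in the description above.
WILDERNESS_AREAS = {
    1: "Rawah",
    2: "Neota",
    3: "Comanche Peak",
    4: "Cache la Poudre",
}


def decode_one_hot(flags):
    """Return the 1-based position of the single 1 in a list of 0/1 flags."""
    if any(f not in (0, 1) for f in flags) or sum(flags) != 1:
        raise ValueError("expected exactly one flag set to 1")
    return flags.index(1) + 1


def mean_elevation_by_area(rows):
    """rows: iterable of (elevation, [four wilderness flags]) pairs."""
    totals = defaultdict(lambda: [0, 0])  # area name -> [sum, count]
    for elevation, flags in rows:
        name = WILDERNESS_AREAS[decode_one_hot(flags)]
        totals[name][0] += elevation
        totals[name][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


# Made-up values, for illustration only (not real observations).
sample = [
    (3000, [1, 0, 0, 0]),
    (3200, [1, 0, 0, 0]),
    (3400, [0, 1, 0, 0]),
    (2200, [0, 0, 0, 1]),
]
means = mean_elevation_by_area(sample)
```

The same decoding would apply to the 40 soil-type columns, since `decode_one_hot` does not depend on the number of flags.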

Has Missing Values?

No

Variables Table

Variable Name	Role	Type	Missing Values
Elevation	Feature	Integer	no
Aspect	Feature	Integer	no
Slope	Feature	Integer	no
Horizontal_Distance_To_Hydrology	Feature	Integer	no
Vertical_Distance_To_Hydrology	Feature	Integer	no
Horizontal_Distance_To_Roadways	Feature	Integer	no
Hillshade_9am	Feature	Integer	no
Hillshade_Noon	Feature	Integer	no
Hillshade_3pm	Feature	Integer	no
Horizontal_Distance_To_Fire_Points	Feature	Integer	no

Showing the first 10 of 55 variables.

Additional Variable Information

Each entry gives the attribute name, data type, measurement unit, and a brief description. Forest cover type is the classification target. The order of this listing corresponds to the order of the values in each row of the database.

Name	Data Type	Measurement	Description
Elevation	quantitative	meters	Elevation in meters
Aspect	quantitative	azimuth	Aspect in degrees azimuth
Slope	quantitative	degrees	Slope in degrees
Horizontal_Distance_To_Hydrology	quantitative	meters	Horizontal distance to nearest surface water feature
Vertical_Distance_To_Hydrology	quantitative	meters	Vertical distance to nearest surface water feature
Horizontal_Distance_To_Roadways	quantitative	meters	Horizontal distance to nearest roadway
Hillshade_9am	quantitative	0 to 255 index	Hillshade index at 9am, summer solstice
Hillshade_Noon	quantitative	0 to 255 index	Hillshade index at noon, summer solstice
Hillshade_3pm	quantitative	0 to 255 index	Hillshade index at 3pm, summer solstice
Horizontal_Distance_To_Fire_Points	quantitative	meters	Horizontal distance to nearest wildfire ignition point
Wilderness_Area (4 binary columns)	qualitative	0 (absence) or 1 (presence)	Wilderness area designation
Soil_Type (40 binary columns)	qualitative	0 (absence) or 1 (presence)	Soil type designation
Cover_Type (7 types)	integer	1 to 7	Forest cover type designation
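Because the listing above fixes the column order (10 quantitative attributes, 4 wilderness flags, 40 soil flags, then the cover type), a row of 55 integers can be paired with names mechanically. The sketch below shows one way to do that. The numeric suffixes on the binary columns (`Wilderness_Area1`, `Soil_Type1`, and so on) and the helper names are assumptions made for illustration, and the example row is invented.

```python
# The ten quantitative attributes, in the order of the listing.
QUANTITATIVE = [
    "Elevation",
    "Aspect",
    "Slope",
    "Horizontal_Distance_To_Hydrology",
    "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am",
    "Hillshade_Noon",
    "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]


def build_column_names():
    """Return 55 column names: 54 features followed by the target."""
    # Assumed naming: the 1-based suffixes are illustrative only.
    wilderness = [f"Wilderness_Area{i}" for i in range(1, 5)]
    soil = [f"Soil_Type{i}" for i in range(1, 41)]
    return QUANTITATIVE + wilderness + soil + ["Cover_Type"]


def row_to_record(values):
    """Pair one row of 55 integers with its column names."""
    names = build_column_names()
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return dict(zip(names, values))


# Invented row, for illustration only (not a real observation).
example = [2500, 45, 10, 100, 5, 300, 220, 230, 140, 1200]
example += [1, 0, 0, 0]               # wilderness flags: area 1 set
example += [0] * 9 + [1] + [0] * 30   # soil flags: 10th of 40 set
example += [2]                        # cover type
record = row_to_record(example)
```

Checking the row width before zipping catches truncated or misaligned rows early, which matters when most columns are indistinguishable 0/1 flags.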

Class Labels

Spruce/Fir, Lodgepole Pine, Ponderosa Pine, Cottonwood/Willow, Aspen, Douglas-fir, Krummholz
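Since `Cover_Type` is stored as an integer from 1 to 7, a lookup table is handy when reporting results. The sketch below assumes the labels above are listed in code order. Codes 1 through 6 match the type numbers given in the Additional Information text, while code 7 for Krummholz is inferred from its position in the list. The function name and the sample labels are hypothetical.

```python
from collections import Counter

# Codes 1-6 follow the type numbers in the dataset description;
# code 7 (Krummholz) is inferred from the order of the label list.
COVER_TYPES = {
    1: "Spruce/Fir",
    2: "Lodgepole Pine",
    3: "Ponderosa Pine",
    4: "Cottonwood/Willow",
    5: "Aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}


def cover_type_name(code):
    """Translate an integer cover type code into its label."""
    try:
        return COVER_TYPES[code]
    except KeyError:
        raise ValueError(f"cover type must be 1 to 7, got {code!r}") from None


# Invented labels, for illustration only.
labels = [2, 2, 1, 7, 2, 1]
counts = Counter(cover_type_name(code) for code in labels)
```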

Dataset Files

File	Size
covtype.data.gz	10.7 MB
covtype.info	14.3 KB
old_covtype.info	4.7 KB
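The data file is gzip-compressed, so it can be streamed without unpacking it to disk first. The sketch below assumes the decompressed file is comma-separated integers with no header row and 55 values per line. That layout is an assumption here and should be checked against covtype.info. The function name `iter_rows` is hypothetical, and the demonstration uses an in-memory stand-in rather than the real file.

```python
import csv
import gzip
import io


def iter_rows(source, expected_width=55):
    """Yield each line of a gzip-compressed CSV as a list of ints.

    source may be a file path or a binary file object.
    """
    with gzip.open(source, mode="rt", newline="") as text:
        for line_no, row in enumerate(csv.reader(text), start=1):
            if len(row) != expected_width:
                raise ValueError(
                    f"line {line_no}: expected {expected_width} values, "
                    f"got {len(row)}"
                )
            yield [int(value) for value in row]


# In-memory stand-in for the real file: two invented rows of 55 integers.
fake_row = ",".join(["0"] * 54 + ["1"])
payload = gzip.compress((fake_row + "\n" + fake_row + "\n").encode("ascii"))
rows = list(iter_rows(io.BytesIO(payload)))
```

For the downloaded file, the same generator would be called with the path, for example `iter_rows("covtype.data.gz")`, which keeps memory use flat across all 581012 rows.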

Papers Citing this Dataset

A Quantum Annealing-Based Approach to Extreme Clustering

By Tim Jaschek, Marko Bucyk, Jaspreet Oberoi. 2019

Published in ArXiv.

Communication-Censored Distributed Stochastic Gradient Descent

By Weiyu Li, Tianyi Chen, Liping Li, Qing Ling. 2019

Published in ArXiv.

Ultra-Scalable Spectral Clustering and Ensemble Clustering

By Dong Huang, Chang-Dong Wang, Jian-Sheng Wu, Jian-Huang Lai, Chee-Keong Kwoh. 2019

Published in ArXiv.

Adaptive scale-invariant online algorithms for learning linear models

By Michal Kempka, Wojciech Kotlowski, Manfred Warmuth. 2019

Published in ArXiv.

A Practical Framework for Solving Center-Based Clustering with Outliers

By Hu Ding, Haikuo Yu. 2019

Published in ArXiv.

47 citations

57222 views

Keywords

forest landcover pixel soil ecology image processing

Creators

Jock Blackard

DOI

10.24432/C50K5N

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.
