Center for Machine Learning and Intelligent Systems

Crowdsourced Mapping Data Set

Abstract: Crowdsourced data from OpenStreetMap is used to automate the classification of satellite images into different land cover classes (impervious, farm, forest, grass, orchard, water).

Data Set Characteristics: Multivariate
Number of Instances: 10546
Area: Physical
Attribute Characteristics: N/A
Number of Attributes: 29
Date Donated: 2016-05-25
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 36445


Source:

Brian Johnson
johnson '@' iges.or.jp
Institute for Global Environmental Strategies, Japan


Data Set Information:

This dataset was derived from geospatial data from two sources: 1) Landsat time-series satellite imagery from the years 2014-2015, and 2) crowdsourced georeferenced polygons with land cover labels obtained from OpenStreetMap. The crowdsourced polygons cover only a small part of the image area, and are used to extract training data from the image for classifying the rest of the image. The main challenge with the dataset is that both the imagery and the crowdsourced data contain noise (due to cloud cover in the images and inaccurate labeling/digitizing of polygons).

Files in zip folder:
- The 'training.csv' file contains the training data for classification. Do not use this file to evaluate classification accuracy because it contains noise (many class labeling errors).
- The 'testing.csv' file contains testing data to evaluate the classification accuracy. This file does not contain any class labeling errors.
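
The split described above (fit on the noisy 'training.csv', score only on the clean 'testing.csv') can be sketched as follows. This is a minimal illustration, not the method used in the associated paper: the nearest-centroid classifier is a simple stand-in chosen to keep the example dependency-free, the function names are hypothetical, and the code assumes each file has a header row with a 'class' column and only numeric values in the remaining columns.

```python
import csv
from collections import defaultdict


def load_rows(lines):
    """Read CSV lines with a 'class' column; return (labels, feature_rows).

    Assumption: every column other than 'class' holds a numeric value.
    """
    labels, features = [], []
    for row in csv.DictReader(lines):
        labels.append(row["class"].strip())
        features.append([float(v) for k, v in row.items() if k != "class"])
    return labels, features


def fit_centroids(labels, features):
    """Compute the mean feature vector of each class (stand-in classifier)."""
    sums, counts = {}, defaultdict(int)
    for label, vec in zip(labels, features):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in tot] for lab, tot in sums.items()}


def predict(centroids, vec):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, vec))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def accuracy(centroids, labels, features):
    """Fraction of rows whose predicted class matches the given label."""
    hits = sum(predict(centroids, v) == y for y, v in zip(labels, features))
    return hits / len(labels)


def evaluate(train_path="training.csv", test_path="testing.csv"):
    """Fit on the noisy training file, report accuracy on the clean test file."""
    with open(train_path, newline="") as f:
        train_labels, train_features = load_rows(f)
    with open(test_path, newline="") as f:
        test_labels, test_features = load_rows(f)
    centroids = fit_centroids(train_labels, train_features)
    return accuracy(centroids, test_labels, test_features)
```

Calling evaluate() from the folder that contains the two unzipped files returns the overall accuracy on 'testing.csv'. Accuracy measured on 'training.csv' would be distorted by its labeling errors, which is why the sketch never scores against it.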


Attribute Information:

class: The land cover class (impervious, farm, forest, grass, orchard, water) [note: this is the target variable to classify].
max_ndvi: The maximum NDVI (normalized difference vegetation index) value derived from the time-series of satellite images.
20150720_N - 20140101_N: NDVI values extracted from satellite images acquired between January 2014 and July 2015, in reverse chronological order (dates given in the format yyyymmdd).
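
Because the NDVI columns are named by acquisition date and listed newest first, it can be convenient to decode the names and reorder them oldest first before treating each row as a time series. The helper below is a minimal sketch; the function names are hypothetical, and it assumes every date column follows the 'yyyymmdd_N' pattern described above.

```python
from datetime import datetime


def parse_ndvi_column(name):
    """Convert a column name such as '20150720_N' to a datetime.date.

    Raises ValueError for names that do not match 'yyyymmdd_N'
    (for example 'class' or 'max_ndvi').
    """
    stamp, _, suffix = name.partition("_")
    if suffix != "N":
        raise ValueError(f"not an NDVI date column: {name!r}")
    return datetime.strptime(stamp, "%Y%m%d").date()


def chronological(columns):
    """Return the NDVI date columns sorted oldest first."""
    return sorted(columns, key=parse_ndvi_column)
```

For example, chronological(["20150720_N", "20140101_N"]) places "20140101_N" first, reversing the file's newest-first layout.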


Relevant Papers:

Johnson, B. A., & Iizuka, K. (2016). Integrating OpenStreetMap crowdsourced data and Landsat time-series imagery for rapid land use/land cover (LULC) mapping: Case study of the Laguna de Bay area of the Philippines. Applied Geography, 67, 140-149.



Citation Request:

Please cite: Johnson, B. A., & Iizuka, K. (2016). Integrating OpenStreetMap crowdsourced data and Landsat time-series imagery for rapid land use/land cover (LULC) mapping: Case study of the Laguna de Bay area of the Philippines. Applied Geography, 67, 140-149.

