Forest Fires

Donated on 2/28/2008

This is a difficult regression task whose aim is to predict the burned area of forest fires in the northeast region of Portugal using meteorological and other data (see details at:

Subject Area

Climate and Environment

Associated Tasks

Regression

Dataset Information

Additional Information

In [Cortez and Morais, 2007], the output 'area' was first transformed with a ln(x+1) function. Several data mining methods were then applied. After the models were fitted, their outputs were post-processed with the inverse of the ln(x+1) transform. Four different input setups were used. The experiments were conducted using 30 runs of 10-fold cross-validation. Two regression metrics were measured: mean absolute deviation (MAD) and root mean squared error (RMSE). A Gaussian support vector machine (SVM) fed with only the 4 direct weather conditions (temp, RH, wind and rain) obtained the best MAD value: 12.71 +- 0.01 (mean and 95% confidence interval under a Student's t-distribution). The best RMSE was attained by the naive mean predictor. An analysis of the regression error characteristic (REC) curve shows that the SVM model predicts more examples within a lower admitted error. In effect, the SVM model is better at predicting small fires, which are the majority.
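The transform and the two metrics described above can be illustrated with a small sketch. It is not the paper's SVM pipeline: it only compares two constant predictors on invented burned-area values, one fitted on the ln(x+1) scale and mapped back, the other being the plain mean in hectares. The function names and the sample values are assumptions made for illustration.

```python
import math

def to_log_scale(area_ha):
    """Forward transform y = ln(x + 1)."""
    return math.log1p(area_ha)

def from_log_scale(y):
    """Inverse transform x = exp(y) - 1."""
    return math.expm1(y)

def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Invented burned areas in hectares (illustrative only, not dataset rows).
# Most values are small and one is large, mimicking the skew towards 0.
areas = [0.0, 0.0, 0.5, 2.0, 10.0, 200.0]

# Constant predictor fitted on the log scale, then mapped back to hectares.
log_mean = sum(to_log_scale(a) for a in areas) / len(areas)
log_mean_pred = [from_log_scale(log_mean)] * len(areas)

# Constant predictor fitted directly in hectares (a naive mean predictor).
raw_mean_pred = [sum(areas) / len(areas)] * len(areas)

print("log-scale mean: MAD=%.2f RMSE=%.2f"
      % (mad(areas, log_mean_pred), rmse(areas, log_mean_pred)))
print("raw mean:       MAD=%.2f RMSE=%.2f"
      % (mad(areas, raw_mean_pred), rmse(areas, raw_mean_pred)))
```

On skewed data like this, the log-scale predictor sits close to the many small fires and so tends to score better on MAD, while the raw mean, which minimizes squared error among constant predictions, scores better on RMSE. This is consistent with the pattern reported above, although the numbers here say nothing about the actual dataset.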

Introductory Paper

A data mining approach to predict forest fires using meteorological data

By P. Cortez, Aníbal de Jesus Raimundo Morais. 2007

Published in New Trends in Artificial Intelligence, Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence

Variables Table

Variable Name | Role | Type | Description | Units | Missing Values
X | Feature | Integer | x-axis spatial coordinate within the Montesinho park map: 1 to 9 | | no
Y | Feature | Integer | y-axis spatial coordinate within the Montesinho park map: 2 to 9 | | no
month | Feature | Categorical | month of the year: 'jan' to 'dec' | | no
day | Feature | Categorical | day of the week: 'mon' to 'sun' | | no
FFMC | Feature | Continuous | FFMC index from the FWI system: 18.7 to 96.20 | | no
DMC | Feature | Integer | DMC index from the FWI system: 1.1 to 291.3 | | no
DC | Feature | Continuous | DC index from the FWI system: 7.9 to 860.6 | | no
ISI | Feature | Continuous | ISI index from the FWI system: 0.0 to 56.10 | | no
temp | Feature | Continuous | temperature: 2.2 to 33.30 | Celsius degrees | no
RH | Feature | Integer | relative humidity: 15.0 to 100 | % | no

The table shows 10 of the 13 variables; wind, rain and area are described under Additional Variable Information.

Additional Variable Information

For more information, read [Cortez and Morais, 2007].

1. X - x-axis spatial coordinate within the Montesinho park map: 1 to 9
2. Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9
3. month - month of the year: 'jan' to 'dec'
4. day - day of the week: 'mon' to 'sun'
5. FFMC - FFMC index from the FWI system: 18.7 to 96.20
6. DMC - DMC index from the FWI system: 1.1 to 291.3
7. DC - DC index from the FWI system: 7.9 to 860.6
8. ISI - ISI index from the FWI system: 0.0 to 56.10
9. temp - temperature in Celsius degrees: 2.2 to 33.30
10. RH - relative humidity in %: 15.0 to 100
11. wind - wind speed in km/h: 0.40 to 9.40
12. rain - outside rain in mm/m2: 0.0 to 6.4
13. area - the burned area of the forest (in ha): 0.00 to 1090.84 (this output variable is very skewed towards 0.0, thus it may make sense to model with the logarithm transform).
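As a rough illustration of how records with the 13 variables above might be read, here is a minimal sketch. It assumes a comma-separated file whose header uses the variable names exactly as listed; that layout, the `parse_row` helper, the added `log_area` field and the two sample rows are all assumptions made for this example and are not taken from the dataset.

```python
import csv
import io
import math

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

# Invented sample rows in the column order listed above (not real data).
SAMPLE = """X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
3,4,mar,fri,85.0,30.0,100.0,5.0,10.0,50,4.0,0.0,0.0
8,6,aug,sun,92.0,140.0,600.0,10.0,22.0,40,3.0,0.0,6.4
"""

def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    record = {
        "X": int(row["X"]),
        "Y": int(row["Y"]),
        # Encode the categorical fields as 1-based ordinals.
        "month": MONTHS.index(row["month"]) + 1,
        "day": DAYS.index(row["day"]) + 1,
    }
    # The remaining columns are numeric and parsed as floats.
    for name in ("FFMC", "DMC", "DC", "ISI", "temp", "RH",
                 "wind", "rain", "area"):
        record[name] = float(row[name])
    # ln(x + 1) transform suggested for the skewed 'area' output.
    record["log_area"] = math.log1p(record["area"])
    return record

records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for rec in records:
    print(rec["month"], rec["day"], rec["area"], round(rec["log_area"], 4))
```

Ordinal encoding of month and day is only one option; a model that should not assume an ordering between categories would more likely use one-hot encoding instead.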



Paulo Cortez

Aníbal Morais

