Center for Machine Learning and Intelligent Systems
About  Citation Policy  Donate a Data Set  Contact

Repository Web            Google
View ALL Data Sets

Ozone Level Detection Data Set
Download: Data Folder, Data Set Description

Abstract: Two ground ozone level data sets are included in this collection. One is the eight hour peak set (, the other is the one hour peak set ( Those data were collected from 1998 to 2004 at the Houston, Galveston and Brazoria area.

Data Set Characteristics:  

Multivariate, Sequential, Time-Series

Number of Instances:




Attribute Characteristics:


Number of Attributes:


Date Donated


Associated Tasks:


Missing Values?


Number of Web Hits:



Kun Zhang, zhang.kun05 '@', Department of Computer Science, Xavier University of Lousiana
Wei Fan, '@', IBM T.J.Watson Research
XiaoJing Yuan, xyuan '@', Engineering Technology Department, College of Technology, University of Houston

Data Set Information:

For a list of attributes, please refer to those two .names files. They use the following naming convention:

All the attribute start with T means the temperature measured at different time throughout the day; and those starts with WS indicate the wind speed at various time.

WSR_PK: continuous. peek wind speed -- resultant (meaning average of wind vector)

WSR_AV: continuous. average wind speed

T_PK: continuous. Peak T
T_AV: continuous. Average T
T85: continuous. T at 850 hpa level (or about 1500 m height)
RH85: continuous. Relative Humidity at 850 hpa
U85: continuous. (U wind - east-west direction wind at 850 hpa)
V85: continuous. V wind - N-S direction wind at 850
HT85: continuous. Geopotential height at 850 hpa, it is about the same as height at low altitude
T70: continuous. T at 700 hpa level (roughly 3100 m height)

RH70: continuous.
U70: continuous.
V70: continuous.
HT70: continuous.

T50: continuous. T at 500 hpa level (roughly at 5500 m height)

RH50: continuous.
U50: continuous.
V50: continuous.
HT50: continuous.

KI: continuous. K-Index [Web Link]
TT: continuous. T-Totals [Web Link]
SLP: continuous. Sea level pressure
SLP_: continuous. SLP change from previous day

Precp: continuous. -- precipitation

Attribute Information:

The following are specifications for several most important attributes that are highly valued by Texas Commission on Environmental Quality (TCEQ). More details can be found in the two relevant papers.

O 3 - Local ozone peak prediction
Upwind - Upwind ozone background level
EmFactor - Precursor emissions related factor
Tmax - Maximum temperature in degrees F
Tb - Base temperature where net ozone production begins (50 F)
SRd - Solar radiation total for the day
WSa - Wind speed near sunrise (using 09-12 UTC forecast mode)
WSp - Wind speed mid-day (using 15-21 UTC forecast mode)

Please refer to those two .names files.

Relevant Papers:

Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond, Knowledge and Information Systems, Vol. 14, No. 3, 2008.
Discusses details about the dataset, its use as well as various experiments (both cross-validation and streaming) using many state-of-the-art methods.
A shorter version of the paper (does not contain some detailed experiments as the journal paper above) is in:
Forecasting Skewed Biased Stochastic Ozone Days: Analyses and Solutions. ICDM 2006: 753-764

Citation Request:

Please refer to the Machine Learning Repository's citation policy

Supported By:

 In Collaboration With:

About  ||  Citation Policy  ||  Donation Policy  ||  Contact  ||  CML