Insurance Company Benchmark (COIL 2000)
Donated on 7/2/2000
This data set, used in the CoIL 2000 Challenge, contains information on customers of an insurance company. The data consists of 86 variables and includes product usage data and socio-demographic data.
Dataset Characteristics
Multivariate
Subject Area
Social Science
Associated Tasks
Regression, Description
Feature Type
Categorical, Integer
# Instances
9000
# Features
-
Dataset Information
Additional Information
Information about customers consists of 86 variables and includes product usage data and socio-demographic data derived from zip area codes. The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real-world business problem. The training set contains over 5000 descriptions of customers, including whether or not they have a caravan insurance policy. A test set contains 4000 customers for whom only the organisers know whether they have a caravan insurance policy.

The data dictionary (http://kdd.ics.uci.edu/databases/tic/dictionary.txt) describes the variables used and their values. Note: all variables starting with M are zip code variables. They give information on the distribution of that variable (e.g. rented houses) in the zip code area of the customer.

All datasets are in tab-delimited format, with one instance per line. The meaning of the attributes and attribute values is given below.

TICDATA2000.txt: Dataset to train and validate prediction models and build a description (5822 customer records). Each record consists of 86 attributes, containing socio-demographic data (attributes 1-43) and product ownership (attributes 44-86). The socio-demographic data is derived from zip codes: all customers living in areas with the same zip code have the same socio-demographic attributes. Attribute 86, "CARAVAN:Number of mobile home policies", is the target variable.

TICEVAL2000.txt: Dataset for predictions (4000 customer records). It has the same format as TICDATA2000.txt, except that the target is missing. Participants are supposed to return only the list of predicted targets.

TICTGTS2000.txt: Targets for the evaluation set.
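As a minimal sketch of reading TICDATA2000.txt with pandas, the function below assumes the file has no header row and exactly 86 tab-separated integer columns. It labels the columns with their 1-based attribute numbers so that the slices match the attribute ranges in the description. The name `load_ticdata` is hypothetical; this is not an official loader.

```python
from typing import IO, Tuple, Union

import pandas as pd


def load_ticdata(
    source: Union[str, IO[str]],
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.Series]:
    """Split a TICDATA2000-style file into socio-demographic, product and target parts.

    Hypothetical helper; assumes no header row and 86 tab-delimited columns.
    """
    df = pd.read_csv(source, sep="\t", header=None)
    if df.shape[1] != 86:
        raise ValueError(f"expected 86 attributes, got {df.shape[1]}")
    # Label columns 1..86 so they match the attribute numbers in the description.
    df.columns = range(1, 87)
    socio = df.loc[:, 1:43]      # attributes 1-43: socio-demographic, derived from zip codes
    products = df.loc[:, 44:85]  # attributes 44-85: product ownership (86 is split off as the target)
    target = df[86]              # attribute 86: CARAVAN, number of mobile home policies
    return socio, products, target
```

With the downloaded file, `socio, products, target = load_ticdata("ticdata2000.txt")` should yield 5822 rows in each part. TICEVAL2000.txt lacks the target column, so it would fail the 86-column check as written.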
Has Missing Values?
No
Introductory Paper
By P. van der Putten, M. van Someren. 2000
Published in Sentient Machine Research, Amsterdam
Variables Table
[Variables table (86 variables, none with missing values) not recoverable from the page; see the data dictionary at http://kdd.ics.uci.edu/databases/tic/dictionary.txt for variable names and values.]
Dataset Files
File | Size |
---|---|
ticdata2000.txt | 988.3 KB |
ticeval2000.txt | 671.2 KB |
tic.tar.gz | 225.9 KB |
tictgts2000.txt | 11.7 KB |
TicDataDescr.txt | 7.1 KB |
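TICEVAL2000.txt and TICTGTS2000.txt can be combined to check a list of predicted targets. The sketch below assumes TICTGTS2000.txt holds one integer target per line, in the same order as the records in TICEVAL2000.txt. The names `load_eval_targets` and `score_predictions` are hypothetical, and the two scores it reports (plain accuracy and the number of actual policy holders predicted as such) are illustrative choices, not the official CoIL 2000 evaluation criterion.

```python
from typing import Dict, IO, Sequence, Union

import pandas as pd


def load_eval_targets(source: Union[str, IO[str]]) -> pd.Series:
    """Read TICTGTS2000-style targets: assumed one integer per line, no header."""
    return pd.read_csv(source, header=None).iloc[:, 0]


def score_predictions(predicted: Sequence[int], targets: pd.Series) -> Dict[str, float]:
    """Compare predicted targets with the true ones, matching records by position."""
    if len(predicted) != len(targets):
        raise ValueError("need exactly one prediction per evaluation record")
    pred = pd.Series(list(predicted), index=targets.index)
    accuracy = float((pred == targets).mean())
    # Customers who actually hold a caravan policy and were predicted to hold one.
    hits = int(((pred > 0) & (targets > 0)).sum())
    return {"accuracy": accuracy, "policy_holders_found": hits}
```

For the real files, `score_predictions(my_predictions, load_eval_targets("tictgts2000.txt"))` would expect 4000 predictions, one per evaluation record.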
```shell
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
insurance_company_benchmark_coil_2000 = fetch_ucirepo(id=125)

# data (as pandas dataframes)
X = insurance_company_benchmark_coil_2000.data.features
y = insurance_company_benchmark_coil_2000.data.targets

# metadata
print(insurance_company_benchmark_coil_2000.metadata)

# variable information
print(insurance_company_benchmark_coil_2000.variables)
```
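The description notes that all variables whose names start with M are zip code variables. Assuming the feature columns returned by `fetch_ucirepo(id=125)` carry the variable names from the data dictionary (for example `MOSTYPE`), a small hypothetical helper can separate those zip-code-derived columns from the remaining ones:

```python
from typing import Tuple

import pandas as pd


def split_zipcode_columns(X: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Return (zip-code variables, other variables), split on the "M" name prefix."""
    # Boolean mask over the column names; assumes dictionary-style names.
    is_zipcode = X.columns.astype(str).str.startswith("M")
    return X.loc[:, is_zipcode], X.loc[:, ~is_zipcode]
```

With `X` from the `ucimlrepo` snippet, `zip_X, other_X = split_zipcode_columns(X)` keeps the original column order within each part.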
Putten, P. (2000). Insurance Company Benchmark (COIL 2000) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5630S.
Creators
Peter Putten
DOI
10.24432/C5630S
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.