Default of Credit Card Clients
Donated on 1/25/2016
This research examined customers' default payments in Taiwan and compared the predictive accuracy of the probability of default among six data mining methods.
Dataset Characteristics
Multivariate
Subject Area
Business
Associated Tasks
Classification
Feature Type
Integer, Real
# Instances
30000
# Features
23
Dataset Information
Additional Information
This research examined customers' default payments in Taiwan and compared the predictive accuracy of the probability of default among six data mining methods. From a risk management perspective, the predictive accuracy of the estimated probability of default is more valuable than a binary classification of clients as credible or not credible. Because the real probability of default is unknown, the study introduced the novel Sorting Smoothing Method to estimate it. With the real probability of default as the response variable (Y) and the predicted probability of default as the independent variable (X), a simple linear regression (Y = A + BX) shows that the forecasting model produced by the artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero and its regression coefficient (B) is close to one. The study therefore concludes that, among the six data mining techniques, the artificial neural network is the only one that can accurately estimate the real probability of default.
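The evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the Sorting Smoothing Method sorts clients by predicted probability and averages the binary default outcome over a window of `n` neighbours on each side, and it runs on synthetic data from a perfectly calibrated hypothetical model. The function names, the window size `n = 50`, and the truncated windows at the edges are all assumptions made for this example.

```python
import random


def sorting_smoothing(predicted, outcomes, n=50):
    """Estimate a 'real' default probability for each client.

    Assumed reading of the method: sort clients by predicted probability,
    then average the 0/1 outcomes of up to n neighbours on each side.
    Windows are truncated at the edges (an assumption of this sketch).
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    sorted_outcomes = [outcomes[i] for i in order]
    estimates = [0.0] * len(predicted)
    for rank, idx in enumerate(order):
        lo = max(0, rank - n)
        hi = min(len(sorted_outcomes), rank + n + 1)
        window = sorted_outcomes[lo:hi]
        estimates[idx] = sum(window) / len(window)
    return estimates


def simple_linear_regression(x, y):
    """Fit y = A + B*x by least squares; return (A, B, R^2)."""
    count = len(x)
    mean_x = sum(x) / count
    mean_y = sum(y) / count
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# Synthetic data: a hypothetical model whose predictions are well calibrated.
rng = random.Random(0)
predicted = [rng.random() for _ in range(5000)]
outcomes = [1 if rng.random() < p else 0 for p in predicted]

real = sorting_smoothing(predicted, outcomes, n=50)
a, b, r2 = simple_linear_regression(predicted, real)
print(f"A = {a:.3f}, B = {b:.3f}, R^2 = {r2:.3f}")
```

For a well-calibrated model, A should land near zero, B near one, and R² near one, which is the pattern the study reports for the artificial neural network.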
Has Missing Values?
No
Introductory Paper
By I. Yeh, Che-hui Lien. 2009
Published in Expert Systems with Applications
Variables Table
Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
---|---|---|---|---|---|---|
ID | ID | Integer | | | | no |
X1 | Feature | Integer | | LIMIT_BAL | | no |
X2 | Feature | Integer | Sex | SEX | | no |
X3 | Feature | Integer | Education Level | EDUCATION | | no |
X4 | Feature | Integer | Marital Status | MARRIAGE | | no |
X5 | Feature | Integer | Age | AGE | | no |
X6 | Feature | Integer | | PAY_0 | | no |
X7 | Feature | Integer | | PAY_2 | | no |
X8 | Feature | Integer | | PAY_3 | | no |
X9 | Feature | Integer | | PAY_4 | | no |
Rows 1 to 10 of 25 shown.
Additional Variable Information
This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:
X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
X2: Gender (1 = male; 2 = female).
X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
X4: Marital status (1 = married; 2 = single; 3 = others).
X5: Age (year).
X6-X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; ...; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; ...; X17 = amount of bill statement in April, 2005.
X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; ...; X23 = amount paid in April, 2005.
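The coding scheme above can be turned into a small lookup helper. The following is a sketch only: the names `decode_record` and `describe_repayment_status` and the dict-based record format are hypothetical and chosen for this example, and any code not listed in the description above is reported as undocumented instead of being guessed.

```python
# Code-to-label mappings copied from the variable description.
GENDER = {1: "male", 2: "female"}
EDUCATION = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARRIAGE = {1: "married", 2: "single", 3: "others"}

UNDOCUMENTED = "undocumented code"


def describe_repayment_status(code):
    """Translate a repayment status code (X6 to X11) into text."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        return f"payment delay for {code} month(s)"
    if code == 9:
        return "payment delay for nine months and above"
    # Codes outside the documented scale are flagged, not interpreted.
    return UNDOCUMENTED


def decode_record(record):
    """Decode the categorical fields of one client record.

    `record` is a hypothetical dict keyed by variable name (X2, X3, X4, X6).
    """
    return {
        "gender": GENDER.get(record["X2"], UNDOCUMENTED),
        "education": EDUCATION.get(record["X3"], UNDOCUMENTED),
        "marital_status": MARRIAGE.get(record["X4"], UNDOCUMENTED),
        "september_status": describe_repayment_status(record["X6"]),
    }


example = {"X2": 2, "X3": 1, "X4": 2, "X6": 2}
print(decode_record(example))
```

Flagging unlisted codes explicitly keeps the helper faithful to the documented scale and makes any unexpected values in the data easy to spot.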
Dataset Files
File | Size |
---|---|
default of credit card clients.xls | 5.3 MB |
Papers Citing this Dataset
By Sheikh Islam, William Eberle, Sheikh Ghafoor. 2018
Published in ArXiv.
By Clement Fung, Jamie Koerner, Stewart Grant, Ivan Beschastnikh. 2018
Published in ArXiv.
Install the ucimlrepo package:

    pip install ucimlrepo

Import the dataset in Python:

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    default_of_credit_card_clients = fetch_ucirepo(id=350)

    # data (as pandas dataframes)
    X = default_of_credit_card_clients.data.features
    y = default_of_credit_card_clients.data.targets

    # metadata
    print(default_of_credit_card_clients.metadata)

    # variable information
    print(default_of_credit_card_clients.variables)

Yeh, I. (2009). Default of Credit Card Clients [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C55S3H.
Creators
I-Cheng Yeh
DOI
10.24432/C55S3H
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.