Audit Data
Donated on 7/13/2018
Exhaustive one-year, non-confidential data on firms, covering 2015 to 2016, was collected from the Auditor Office of India to build a predictor for classifying suspicious firms.
Dataset Characteristics
Multivariate
Subject Area
Business
Associated Tasks
Classification
Feature Type
Real
# Instances
777
# Features
-
Dataset Information
Additional Information
The goal of the research is to help auditors by building a classification model that can predict fraudulent firms on the basis of present and historical risk factors. The sectors and the number of firms in each are: Irrigation (114), Public Health (77), Buildings and Roads (82), Forest (70), Corporate (47), Animal Husbandry (95), Communication (1), Electrical (4), Land (5), Science and Technology (3), Tourism (1), Fisheries (41), Industries (37), Agriculture (200).
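As a quick consistency check, the sector counts listed above can be tallied and compared with the 777 instances reported for the dataset. The sketch below transcribes the counts from the description; the variable names are illustrative only.

```python
# Sector counts transcribed from the dataset description.
sector_counts = {
    "Irrigation": 114,
    "Public Health": 77,
    "Buildings and Roads": 82,
    "Forest": 70,
    "Corporate": 47,
    "Animal Husbandry": 95,
    "Communication": 1,
    "Electrical": 4,
    "Land": 5,
    "Science and Technology": 3,
    "Tourism": 1,
    "Fisheries": 41,
    "Industries": 37,
    "Agriculture": 200,
}

# Sum of all sectors should match the reported number of instances.
total_firms = sum(sector_counts.values())

# Sector contributing the most firms.
largest_sector = max(sector_counts, key=sector_counts.get)

print(f"Total firms: {total_firms}")
print(f"Largest sector: {largest_sector} ({sector_counts[largest_sector]} firms)")
```

The counts are heavily skewed: several sectors contribute fewer than five firms, which may matter if a model is evaluated per sector.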
Has Missing Values?
Yes
Variables Table
The variables table lists 18 variables. For the first 10 rows, only the Missing Values column is populated, and each entry is "no"; the variable names, roles, types, descriptions, and units are blank.
Additional Variable Information
Many risk factors are examined from various sources, such as past records of the audit office, audit paras, environmental conditions reports, firm reputation summaries, ongoing issues reports, profit-value records, loss-value records, and follow-up reports. After in-depth interviews with the auditors, the important risk factors are evaluated and their probability of existence is calculated from present and past records.
Dataset Files
File | Size |
---|---|
audit_data/audit_risk.csv | 79.3 KB |
audit_data/trial.csv | 39 KB |
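Because the dataset is flagged as containing missing values, it can help to count empty cells per column before modelling. The following is a minimal sketch using only the Python standard library and a small in-memory sample. The column names (`feature_a`, `feature_b`, `label`) and the rows are hypothetical placeholders, not taken from the actual files, and `count_missing` is an illustrative helper rather than part of any library.

```python
import csv
import io


def count_missing(csv_file):
    """Return {column: number of empty cells} for an open CSV file."""
    reader = csv.DictReader(csv_file)
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # DictReader yields None for cells absent from a short row.
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


# Hypothetical sample, not real rows from the dataset.
sample = io.StringIO(
    "feature_a,feature_b,label\n"
    "3.89,0.5,1\n"
    "2.37,,0\n"
    ",1.2,0\n"
)

missing_counts = count_missing(sample)
print(missing_counts)
```

To apply the same check to the real data, one would presumably pass a handle opened on `audit_data/audit_risk.csv` (with `newline=""`) in place of the in-memory sample.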
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
audit_data = fetch_ucirepo(id=475)

# data (as pandas dataframes)
X = audit_data.data.features
y = audit_data.data.targets

# metadata
print(audit_data.metadata)

# variable information
print(audit_data.variables)
Hooda, N. (2018). Audit Data [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5930Q.
Creators
Nishtha Hooda
DOI
10.24432/C5930Q
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.