CNAE-9
Donated on 8/2/2012
This is a data set containing 1080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories.
Dataset Characteristics
Multivariate, Text
Subject Area
Business
Associated Tasks
Classification
Feature Type
Integer
# Instances
1080
# Features
856
Dataset Information
Additional Information
This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories cataloged in a table called the National Classification of Economic Activities (Classificação Nacional de Atividades Econômicas, CNAE). The original texts were pre-processed to obtain the current data set: first, only letters were kept, and then prepositions were removed from the texts. Next, the words were transformed into their canonical form. Finally, each document was represented as a vector in which the weight of each word is its frequency in the document. The data set is highly sparse: 99.22% of the matrix entries are zeros.
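The pre-processing steps described above can be sketched in plain Python as follows. This is an illustration, not the authors' pipeline: the preposition list, the `CANONICAL` lookup table (standing in for the unspecified canonical-form step), the lowercasing, and the helper names `preprocess` and `term_frequency_vector` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately small preposition list; the stop list the
# authors actually used is not given on this page.
PREPOSITIONS = {"a", "ante", "após", "até", "com", "contra", "de", "desde",
                "em", "entre", "para", "perante", "por", "sem", "sob",
                "sobre", "trás"}

# Hypothetical lookup table standing in for the unspecified step that
# maps each word to its canonical form (e.g. a lemmatizer or stemmer).
CANONICAL = {"produtos": "produto", "alimentícios": "alimentício",
             "horas": "hora"}


def preprocess(text):
    """Keep letters only, drop prepositions, map words to canonical form."""
    # Lowercasing is an assumption. [^\W\d_] matches Unicode letters, so
    # accented Portuguese words survive while digits and punctuation go.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [CANONICAL.get(w, w) for w in words if w not in PREPOSITIONS]


def term_frequency_vector(tokens, vocabulary):
    """Weight each vocabulary word by its frequency in the document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


tokens = preprocess("Comércio varejista de produtos alimentícios, 24 horas")
print(tokens)
print(term_frequency_vector(tokens, ["comércio", "hora", "produto", "serviço"]))
```

Here the digits, the comma and the preposition "de" are dropped, and words missing from the vocabulary (such as "varejista") simply do not appear in the vector. Vocabulary words that never occur in a document get a weight of 0, which is why the real matrix is so sparse.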
Has Missing Values?
No
Additional Variable Information
The data set has 857 attributes: 1 holding the class of the instance and 856 holding word frequencies.
1. category: integer, range 1-9
2-857. word frequency: integer
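Given this attribute layout, the raw CNAE-9.data file can be split into class labels and a word-frequency matrix. The sketch below uses NumPy and assumes (this page does not say so) that the file is comma-separated, has no header row, and stores the category in the first column. `load_cnae9` is a hypothetical helper name.

```python
import numpy as np

N_ATTRIBUTES = 857  # 1 class attribute + 856 word-frequency attributes


def load_cnae9(source):
    """Split CNAE-9 rows into a word-frequency matrix X and class labels y.

    Assumption: comma-separated values, no header row, category first.
    `source` may be a path, an open file, or a list of text lines.
    """
    data = np.loadtxt(source, delimiter=",", dtype=np.int64, ndmin=2)
    if data.shape[1] != N_ATTRIBUTES:
        raise ValueError(f"expected {N_ATTRIBUTES} columns, got {data.shape[1]}")
    y = data[:, 0]   # attribute 1: category
    X = data[:, 1:]  # attributes 2-857: word frequencies
    if y.min() < 1 or y.max() > 9:
        raise ValueError("category outside the documented range 1-9")
    return X, y
```

Usage: `X, y = load_cnae9("CNAE-9.data")`. The column-count and range checks fail loudly if the file turns out not to match the documented layout, rather than silently mixing the class column into the features.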
Dataset Files
File | Size |
---|---|
CNAE-9.data | 1.8 MB |
CNAE-9.names | 2.2 KB |
Install the `ucimlrepo` package and import the data set in Python:

```shell
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
cnae_9 = fetch_ucirepo(id=233)

# data (as pandas dataframes)
X = cnae_9.data.features
y = cnae_9.data.targets

# metadata
print(cnae_9.metadata)

# variable information
print(cnae_9.variables)
```
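Because the feature matrix is reported to be 99.22% zeros, it can be worth checking that figure and keeping the features in a sparse format. A minimal sketch, assuming NumPy and SciPy are installed; `sparsity_report` is a hypothetical helper, not part of `ucimlrepo`:

```python
import numpy as np
from scipy import sparse


def sparsity_report(features):
    """Return the fraction of zero entries and a CSR copy of the matrix."""
    dense = np.asarray(features)  # works for a DataFrame or nested lists
    zero_fraction = 1.0 - np.count_nonzero(dense) / dense.size
    return zero_fraction, sparse.csr_matrix(dense)
```

With `X` from the snippet above, `zero_fraction, X_csr = sparsity_report(X)` should give a value close to the documented 0.9922 if the page's figure holds. The CSR matrix stores only the non-zero word counts, which many scikit-learn estimators accept directly.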
Ciarelli, P. & Oliveira, E. (2009). CNAE-9 [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C51G7P.
Creators
Patrick Ciarelli
Elias Oliveira
DOI
10.24432/C51G7P
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the dataset for any purpose, provided that appropriate credit is given.