CNAE-9 Data Set
Abstract: This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a
subset of 9 categories.
Data Set Characteristics: Multivariate, Text
Number of Instances: 1080
Area: Business
Attribute Characteristics: Integer
Number of Attributes: 857
Date Donated: 2012-08-03
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 77718
Source:
Patrick Marques Ciarelli, pciarelli '@' lcad.inf.ufes.br, Department of Electrical Engineering, Federal University of Espirito Santo
Elias Oliveira, elias '@' lcad.inf.ufes.br, Department of Information Science, Federal University of Espirito Santo
Data Set Information:
This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a
subset of 9 categories from a table called the National Classification of Economic Activities (Classificação Nacional de
Atividades Econômicas - CNAE). The original texts were pre-processed to obtain the current data set: first, only letters
were kept and prepositions were removed from the texts. Next, the words were transformed to their canonical form. Finally,
each document was represented as a vector in which the weight of each word is its frequency in the document. The data set is
highly sparse (99.22% of the matrix is filled with zeros).
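The final representation step and the sparsity figure can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy documents, the English tokens, and the function names `term_frequency_matrix` and `sparsity` are all assumptions made for illustration, and the earlier steps (keeping only letters, removing prepositions, reducing words to canonical form) are assumed to have been applied already.

```python
from collections import Counter


def term_frequency_matrix(documents):
    """Build a sorted vocabulary and one word-count vector per document."""
    vocabulary = sorted({word for doc in documents for word in doc})
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for word, count in Counter(doc).items():
            row[index[word]] = count  # weight = frequency in this document
        matrix.append(row)
    return vocabulary, matrix


def sparsity(matrix):
    """Return the fraction of cells in the matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Hypothetical documents, already tokenized and pre-processed.
toy_documents = [
    ["retail", "sale", "clothing", "sale"],
    ["manufacture", "furniture"],
    ["retail", "furniture", "repair"],
]

vocabulary, matrix = term_frequency_matrix(toy_documents)
print(vocabulary)
print(matrix)
print(f"sparsity: {sparsity(matrix):.2%}")
```

For the full data set, the 1080 x 856 matrix has 924,480 cells, so a sparsity of 99.22% corresponds to roughly 7,200 non-zero entries.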
Attribute Information:
The data set has 857 attributes, 1 with the class of the instance and 856 with word frequencies:
1. category: range 1 - 9 (integer)
2 - 857. word frequency: (integer)
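The attribute layout above can be read with a short parsing sketch. It assumes each instance is stored as one line of comma-separated integers with the category first; that file format is an assumption, not something stated on this page, and the function name `parse_instance` is hypothetical.

```python
def parse_instance(line, n_features=856):
    """Split one line into (category, word_frequencies).

    Assumes comma-separated integers: the category (1 to 9) first,
    followed by n_features word-frequency values.
    """
    values = [int(field) for field in line.strip().split(",")]
    if len(values) != n_features + 1:
        raise ValueError(
            f"expected {n_features + 1} values, got {len(values)}"
        )
    category, frequencies = values[0], values[1:]
    if not 1 <= category <= 9:
        raise ValueError(f"category out of range 1-9: {category}")
    if any(value < 0 for value in frequencies):
        raise ValueError("word frequencies must be non-negative")
    return category, frequencies


# Toy line with 4 features instead of 856, for illustration only.
category, frequencies = parse_instance("3,0,2,0,1\n", n_features=4)
print(category, frequencies)
```

With the default `n_features=856`, the same function checks that a line carries all 857 attributes before splitting it.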
Relevant Papers:
Patrick Marques Ciarelli, Elias Oliveira, 'Agglomeration and Elimination of Terms for Dimensionality Reduction',
Ninth International Conference on Intelligent Systems Design and Applications, pp.547-552, 2009
Patrick Marques Ciarelli, Elias Oliveira, Evandro O. T. Salles, 'An Evolving System Based on Probabilistic Neural Network',
Brazilian Symposium on Artificial Neural Network, 2010