Center for Machine Learning and Intelligent Systems



CNAE-9 Data Set

Abstract: This is a data set containing 1080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories.

Data Set Characteristics: Multivariate, Text
Number of Instances: 1080
Area: Business
Attribute Characteristics: Integer
Number of Attributes: 857
Date Donated: 2012-08-03
Associated Tasks: Classification
Missing Values?: N/A



Source:

Patrick Marques Ciarelli, pciarelli '@' lcad.inf.ufes.br, Department of Electrical Engineering, Federal University of Espirito Santo
Elias Oliveira, elias '@' lcad.inf.ufes.br, Department of Information Science, Federal University of Espirito Santo


Data Set Information:

This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a
subset of 9 categories cataloged in a table called National Classification of Economic Activities (Classificação Nacional de
Atividades Econômicas - CNAE). The original texts were pre-processed to obtain the current data set: first, only letters were
kept, and then prepositions were removed from the texts. Next, the words were transformed to their canonical form. Finally,
each document was represented as a vector in which the weight of each word is its frequency in the document. The resulting
data set is highly sparse: 99.22% of the matrix entries are zeros.
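The pre-processing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the donors' code: the preposition list is a small HYPOTHETICAL sample (the actual stop list is not published here), and canonical() is a PLACEHOLDER that only lowercases, because the description does not name the stemmer or lemmatizer used. The helper names (tokenize, build_vocabulary, tf_vector, sparsity) are illustrative.

```python
import re
from collections import Counter

# HYPOTHETICAL preposition list: the description does not publish the
# stop list the donors used, so this is only a small illustrative sample.
PREPOSITIONS = {
    "a", "ao", "aos", "com", "da", "das", "de", "do", "dos", "em",
    "entre", "na", "nas", "no", "nos", "para", "por", "sem", "sob", "sobre",
}


def canonical(word):
    # ASSUMPTION: the source does not say how the "canonical form" was
    # obtained; this placeholder only lowercases the word.
    return word.lower()


def tokenize(text):
    # Step 1: keep only letters; digits, punctuation and underscores become spaces.
    letters_only = re.sub(r"[\W\d_]+", " ", text)
    words = letters_only.lower().split()
    # Step 2: remove prepositions.
    words = [w for w in words if w not in PREPOSITIONS]
    # Step 3: map each remaining word to its canonical form.
    return [canonical(w) for w in words]


def build_vocabulary(docs):
    # One attribute (column) per distinct canonical word, in sorted order.
    return sorted({w for doc in docs for w in tokenize(doc)})


def tf_vector(doc, vocab):
    # Weight of each word = its frequency in the document (0 if absent).
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def sparsity(matrix):
    # Fraction of matrix cells that are zero.
    cells = [v for row in matrix for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)


# Toy usage with two made-up descriptions (not taken from the data set).
docs = ["Comércio de peças para veículos", "Fabricação de peças, peças de metal"]
vocab = build_vocabulary(docs)
matrix = [tf_vector(d, vocab) for d in docs]
```

For the toy input, vocab is ["comércio", "fabricação", "metal", "peças", "veículos"], matrix is [[1, 0, 0, 1, 1], [0, 1, 1, 2, 0]], and sparsity(matrix) is 0.4; the description reports 99.22% for the real matrix.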


Attribute Information:

The data set has 857 attributes: 1 attribute holds the class of the instance and the other 856 hold word frequencies:
1. category: integer in the range 1-9
2-857. word frequency: integer
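A minimal sketch of reading instances laid out as described above (category first, then 856 word frequencies). It ASSUMES one comma-separated instance per line with no header row; the delimiter and file layout are not stated in this description, and parse_row and load_rows are hypothetical helper names.

```python
import csv

N_WORDS = 856  # attributes 2-857 are word frequencies


def parse_row(fields):
    # Attribute 1 is the category (1-9); attributes 2-857 are word counts.
    values = [int(v) for v in fields]
    if len(values) != 1 + N_WORDS:
        raise ValueError(f"expected {1 + N_WORDS} attributes, got {len(values)}")
    category, freqs = values[0], values[1:]
    if not 1 <= category <= 9:
        raise ValueError(f"category out of range: {category}")
    if any(f < 0 for f in freqs):
        raise ValueError("word frequencies must be non-negative")
    return category, freqs


def load_rows(lines):
    # ASSUMPTION: comma-separated values, one instance per line, no header.
    # `lines` can be an open text file or any iterable of strings.
    labels, rows = [], []
    for fields in csv.reader(lines):
        if not fields:  # skip blank lines
            continue
        y, x = parse_row(fields)
        labels.append(y)
        rows.append(x)
    return labels, rows
```

Usage, with a hypothetical file name: `with open("cnae9.csv", newline="") as fh: labels, X = load_rows(fh)`. Validating the attribute count and category range at load time catches a wrong delimiter or a shifted column early.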


Relevant Papers:

Patrick Marques Ciarelli, Elias Oliveira, 'Agglomeration and Elimination of Terms for Dimensionality Reduction',
Ninth International Conference on Intelligent Systems Design and Applications, pp. 547-552, 2009

Patrick Marques Ciarelli, Elias Oliveira, Evandro O. T. Salles, 'An Evolving System Based on Probabilistic Neural Network',
Brazilian Symposium on Artificial Neural Network, 2010




