Center for Machine Learning and Intelligent Systems

CNAE-9 Data Set
Download: Data Folder, Data Set Description

Abstract: This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories.

Data Set Characteristics: Multivariate, Text
Number of Instances: 1080
Area: Business
Attribute Characteristics: Integer
Number of Attributes: 857
Date Donated: 2012-08-03
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 71360


Source:

Patrick Marques Ciarelli, pciarelli '@' lcad.inf.ufes.br, Department of Electrical Engineering, Federal University of Espirito Santo
Elias Oliveira, elias '@' lcad.inf.ufes.br, Department of Information Science, Federal University of Espirito Santo


Data Set Information:

This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a
subset of 9 categories cataloged in a table called the National Classification of Economic Activities (Classificação Nacional de
Atividades Econômicas - CNAE). The original texts were pre-processed to obtain the current data set: first, only letters were
kept and prepositions were removed from the texts. Next, the words were transformed to their canonical form. Finally,
each document was represented as a vector in which the weight of each word is its frequency in the document. The data set is
highly sparse (99.22% of the matrix is filled with zeros).
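
The pre-processing steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the preposition list is a tiny hypothetical sample, the two documents are invented for illustration, and the canonical-form (lemmatization) step is skipped because the source does not say which tool was used. It only shows how letter-only tokens become a word-frequency matrix and how the share of zeros can be measured.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of Portuguese prepositions (illustrative only).
PREPOSITIONS = {"de", "em", "para", "com", "por", "a"}

def tokenize(text):
    """Keep only runs of letters, lowercase them, and drop prepositions."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in PREPOSITIONS]

def term_frequency_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({w for tokens in token_lists for w in tokens})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for word, count in Counter(tokens).items():
            row[index[word]] = count
        matrix.append(row)
    return vocab, matrix

def sparsity(matrix):
    """Fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(value == 0 for row in matrix for value in row)
    return zeros / total if total else 0.0

# Invented example documents, not taken from the data set.
docs = [
    "Comércio de roupas, roupas infantis",
    "Transporte de cargas em 2 caminhões",
]
vocab, matrix = term_frequency_matrix(docs)
print(vocab)
print(matrix)
print(f"sparsity: {sparsity(matrix):.2%}")
```

With only two short documents the matrix is far denser than the real data; the 99.22% figure arises because each of the 1080 documents uses only a small fraction of the 856 vocabulary words.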


Attribute Information:

The data set has 857 attributes: 1 attribute holding the class of the instance and 856 holding word frequencies:
1. category: range 1 - 9 (integer)
2 - 857. word frequency: (integer)
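
As a sketch of how this layout might be consumed, the function below splits each row into a class label (first column) and a list of word frequencies (remaining columns). It assumes the data file is comma-separated with one instance per line, which the description above does not state; the function name `parse_cnae9` and the small in-memory sample are hypothetical.

```python
import csv
import io

# From the attribute description: 1 class column + 856 word-frequency columns.
N_ATTRIBUTES = 857

def parse_cnae9(lines, n_attributes=N_ATTRIBUTES):
    """Split rows into class labels and word-frequency vectors.

    Assumes comma-separated integers with the class in the first column.
    """
    labels, features = [], []
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != n_attributes:
            raise ValueError(
                f"line {line_no}: expected {n_attributes} values, got {len(row)}"
            )
        values = [int(v) for v in row]
        if not 1 <= values[0] <= 9:
            raise ValueError(f"line {line_no}: class {values[0]} outside 1-9")
        labels.append(values[0])
        features.append(values[1:])
    return labels, features

# Tiny invented sample with 4 columns instead of 857, for illustration only.
sample = io.StringIO("1,0,2,0\n9,1,0,0\n")
labels, features = parse_cnae9(sample, n_attributes=4)
print(labels)
print(features)
```

For the real file, an open file handle can be passed in place of the sample and `n_attributes` left at its default, provided the delimiter assumption holds.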


Relevant Papers:

Patrick Marques Ciarelli, Elias Oliveira, 'Agglomeration and Elimination of Terms for Dimensionality Reduction',
Ninth International Conference on Intelligent Systems Design and Applications, pp.547-552, 2009

Patrick Marques Ciarelli, Elias Oliveira, Evandro O. T. Salles, 'An Evolving System Based on Probabilistic Neural Network',
Brazilian Symposium on Artificial Neural Network, 2010




