Spambase Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Don R. Hush and Clint Scovel and Ingo Steinwart. Stability of Unstable Learning Algorithms. Modeling, Algorithms and Informatics Group, CCS-3, Los Alamos National Laboratory. 2003.
are synthetically generated according to Fukunaga's so-called I-4I and I-Λ distributions (Fukunaga, 1990), and the third is the Spambase data set from the UCI repository (Blake & Merz, 1998). For the synthetic data sets we set d = 8 and generate samples from R^d × {-1, 1} according to the I-4I and I-Λ distributions. For both distributions the
Yongmei Wang and Ian H. Witten. Modeling for Optimal Probability Prediction. ICML. 2002.
runs of ten-fold cross-validation. According to both the negative log-likelihood and the classification rate, the estimator New provides either the best or nearly the best results for six of the datasets. For the other two (Spambase and WDBC), its results are intermediate and comparable with other estimators. Along with this, it also reduces the model dimensionality, which the MLE can never do.
C. Titus Brown and Harry W. Bullen and Sean P. Kelly and Robert K. Xiao and Steven G. Satterfield and John G. Hagedorn and Judith E. Devaney. Visualization and Data Mining in an 3D Immersive Environment: Summer Project 2003.
was to determine what attributes were most meaningful in determining what the block contained. The most important attributes turned out to be the size and shape of the block. Figure 4.9: Page block data set in museum environment. 4.10 Spambase. The Spambase data set was analysed by Sean Kelly. This dataset contains roughly 4000 instances of 58 attributes each, representing e-mail messages. One
Christos Dimitrakakis and Samy Bengio. Online Policy Adaptation for Ensemble Classifiers. IDIAP.
2.72% 3.10% 2.80% 2.69% 8.33% 6.48% 7.75% 7.41% 56.1% 61.9% 68.1% 48.3% Table 1: Classification error on the UCI breast, forest, heart, ionosphere, letter, optdigits, pendigits, spambase and vowel datasets using 32 experts. 7 times out of 9 respectively. For each dataset we have also calculated the cumulative margin distribution resulting from equation (1). For the RL mixture there was a constant