Horse Colic Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Julie Greensmith. New Frontiers For An Artificial Immune System. Digital Media Systems Laboratory HP Laboratories Bristol. 2003.
with a 1 and the absence of that word is marked with a 0. This gives rise to the creation of the feature vectors for use within the classifier. For example, our most informative words from the tiny dataset above were horse and monitor. So, the feature vector for the document D containing the words "I rode a horse today" would look like "1 0", thus denoting the presence of the word horse within the
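The presence/absence encoding described in this excerpt can be sketched as follows. This is a minimal illustration, not code from the cited report. The function name `binary_feature_vector`, the lowercasing, and the whitespace tokenization are all assumptions made for the example.

```python
def binary_feature_vector(document, vocabulary):
    """Mark each vocabulary word with 1 if it occurs in the document, else 0.

    Tokenization by lowercasing and splitting on whitespace is an assumption;
    the excerpt does not say how the documents were tokenized.
    """
    words = set(document.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


# The two "most informative words" named in the excerpt.
vocabulary = ["horse", "monitor"]
vector = binary_feature_vector("I rode a horse today", vocabulary)
print(" ".join(str(bit) for bit in vector))  # prints "1 0", matching the excerpt
```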
Richard Nock and Marc Sebban and David Bernard. A SIMPLE LOCALLY ADAPTIVE NEAREST NEIGHBOR RULE WITH APPLICATION TO POLLUTION FORECASTING. International Journal of Pattern Recognition and Artificial Intelligence Vol. 2003.
section. The justification for the better choice of the additional neighbors in the k-sNN algorithm is now visually evident from Fig. 2, when looking at the µ(k) curves. For the Horse Colic dataset, the curve of the sNN rule is clearly located over the two other curves. For particular points of 45 ≤ µ(k) ≤ 60, the accuracies of the NN and tNN rules are similar, but they are beaten by the sNN
Marc Sebban and Richard Nock and Stéphane Lallich. Stopping Criterion for Boosting-Based Data Reduction Techniques: from Binary to Multiclass Problem. Journal of Machine Learning Research, 3. 2002.
a weighted decision rule provides better results than the unweighted rule. Among them, 7 datasets (Balance, Echocardiogram, German, Horse Colic, Led, Pima and Vehicle) see important improvements, ranging from 1% to 5%. In contrast, only one dataset sees significant accuracy decrease (Car,
Mukund Deshpande and George Karypis. Using conjunction of attribute values for classification. CIKM. 2002.
We performed our experiments using a 10 way cross validation scheme and computed average accuracy across different runs. We ran our experiments using a support threshold of 1.0% for all the datasets, except hepati, horse where we used a support threshold of 2.0% and for lymph and zoo we used the support threshold of 5.0%. This was done to ensure that the composite features generated are
Huan Liu and Hiroshi Motoda and Lei Yu. Feature Selection with Selective Sampling. ICML. 2002.
2 and 3 in Table 2) by simply treating them as continuous. The results are reported in Table 5. ReliefS works as well as or better than ReliefF except for 3 cases (some particular bucket sizes for data sets PrimaryTumor, Zoo, Colic). The detailed re...
[Figure: axes labeled "Precision" and "Percentage by bucket size from 7 to 1"; series ReliefS and ReliefF.]
Kai Ming Ting and Ian H. Witten. Issues in Stacked Generalization. J. Artif. Intell. Res. (JAIR), 10. 1999.
                          C4.5  NB    IB1
1       0.36  0.20  0.42  0.63  0.30  0.04
2       0.39  0.19  0.41  0.65  0.28  0.07
C4.5 for α1; NB for α2; IB1 for α3.
Table 5: (a) Weights generated by MLR (model ~ M 0 ) for the Horse and Credit datasets.
        Splice            Abalone           Waveform
Class   C4.5  NB    IB1   C4.5  NB    IB1   C4.5  NB    IB1
1       0.23  0.43  0.36  0.25  0.25  0.39  0.16  0.59  0.34
2       0.15  0.72  0.12  0.27  0.20  0.25  0.14  0.72  0.07
3       0.08  0.52  0.40  0.30  0.18  0.39  0.04
Mark A. Hall. Correlation-based Feature Selection for Machine Learning. Doctor of Philosophy thesis, Department of Computer Science, The University of Waikato, Hamilton, New Zealand. 1999.
examples described by 35 nominal features. Features measure properties of leaves and various plant abnormalities. There are 19 classes (diseases). Horse colic (hc) There are 368 instances in this dataset, provided by Mary McLeish and Matt Cecile from the University of Guelph. There are 27 attributes, of which 7 are continuous. Features include whether a horse is young or old, whether it had surgery,
Eibe Frank and Ian H. Witten. Generating Accurate Rule Sets Without Global Optimization. ICML. 1998.
has classes 1 and 3 combined and classes 4 to 7 deleted, and the horse colic dataset has attributes 3, 25, 26, 27, 28 deleted with attribute 24 being used as the class. We also deleted all identifier attributes from the datasets. 4 We used Revision 8 of C4.5. Table 2: Experimental
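The attribute selection described in this excerpt can be sketched as below. This is a hedged illustration, not the authors' code: it assumes the attribute numbers are 1-based and that each record has 28 columns (the highest attribute number named in the excerpt). The helper name `split_record` and the placeholder values are hypothetical.

```python
# 1-based attribute numbers taken from the excerpt; 1-based indexing is an assumption.
DROPPED_ATTRIBUTES = {3, 25, 26, 27, 28}
CLASS_ATTRIBUTE = 24


def split_record(record):
    """Return (features, label) for one record given as a list of column values."""
    label = record[CLASS_ATTRIBUTE - 1]
    features = [
        value
        for number, value in enumerate(record, start=1)
        if number not in DROPPED_ATTRIBUTES and number != CLASS_ATTRIBUTE
    ]
    return features, label


# Placeholder record with 28 columns named a1..a28 (not real data).
record = [f"a{i}" for i in range(1, 29)]
features, label = split_record(record)
print(len(features), label)
```

Under these assumptions, 22 attributes remain as features once the five deleted attributes and the class attribute are set aside.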
Gabor Melli. A Lazy Model-Based Approach to On-Line Classification. University of British Columbia. 1989.
were: echocardiogram, hayes-roth, heart, horse colic and iris datasets. These datasets (marked in Table 7.1 with a * symbol beside their name) contain a sampling of attribute types and domains. For this initial study however the datasets needed to be small enough (#
H. Altay Güvenir and Aynur Akkus. WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS. Department of Computer Engineering and Information Science Bilkent University.
row of each k value presents the accuracy of the WkNNFP algorithm with equal feature weights, while the second row shows the accuracy obtained by WkNNFP using
Table 1: Comparison on some real-world datasets.
Data Set:          cleveland  glass  horse  hungarian  iris  liver  sonar  wine
No. of Instances   303        214    368    294        150   345    208    178
No. of Features    13         9      22     13         4     6      60     13
No. of Classes     2          6      2      2          3     2      2      3
No. of Missing
Kai Ming Ting and Ian H. Witten. Stacked Generalization: when does it work. Department of Computer Science University of Waikato.
2 -- 0.00 0.93 0.01 0.00 0.00 0.07
Table 8: Ave. error rates of BestCV, Majority Vote and MLR (model ~ M 0 ), along with the standard error (#SE) between BestCV and the worst level-0 generalizers.
Dataset      #SE    BestCV  Majority  MLR
Horse        0.5    17.1    15.0      15.2
Splice       2.5    4.5     4.0       3.8
Abalone      3.3    40.1    39.0      37.9
Led24        8.7    32.8    31.8      32.1
Credit       8.9    17.4    16.1      16.2
Nettalk(s)   10.8   12.7    12.2      11.5
Coding       12.7   25.0
Alexander K. Seewald. Towards Understanding Stacking: Studies of a General Ensemble Learning Scheme. Dissertation submitted for the academic degree of Doctor of Technical Sciences.
breast-w
Compressed glyph visualization for dataset colic
Compressed glyph visualization for dataset credit-a
Compressed glyph visualization for dataset credit-g
Compressed glyph visualization for dataset diabetes
Compressed glyph visualization for
James J. Liu and James Tin-Yau Kwok. An Extended Genetic Rule Induction Algorithm. Department of Computer Science Wuhan University.
has classes 1 and 3 combined and classes 4 to 7 deleted, and the horse colic dataset has attributes 3, 25, 26, 27, 28 deleted and with attribute 24 being used as the class label. We also deleted all identifier attributes from the datasets. Table 1: Datasets used in the experiments.