Labor Relations Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Rudy Setiono. Feedforward Neural Network Construction Using Cross Validation. Neural Computation, 13. 2001.
and the test set. The average number of hidden units ranged from 2.46 for the labor data set to 19.44 for the soybean data set. Most of the networks for the latter data set contain the maximum 20 hidden units. It might be possible to improve the overall predictive accuracy of these networks
Endre Boros and Peter Hammer and Toshihide Ibaraki and Alexander Kogan and Eddy Mayoraz and Ilya B. Muchnik. An Implementation of Logical Analysis of Data. IEEE Trans. Knowl. Data Eng., 12. 2000.
several equivalent tests involving different items, as a way of verifying the validity of that patient's answers. 9.3 Labor Productivity in China. Using the official China Statistical Yearbooks, a dataset was compiled in [8, 7] for the analysis of labor productivity in the developing Chinese economy. The dataset provides 290 annual observations for the 29 Chinese provinces for the period 1985 --
Gary M. Weiss and Haym Hirsh. A Quantitative Study of Small Disjuncts: Experiments and Results. Department of Computer Science Rutgers University. 2000.
noise. What is much more apparent, however, is that many concepts with low EC values are extremely tolerant of noise, whereas none of the concepts with high EC values are. For example, two of the low-EC datasets, blackjack and labor, are so tolerant of noise that when 50% random class noise is added to the training set (i.e., the class value is replaced with a randomly selected valid value 50% of the
Lorne Mason and Jonathan Baxter and Peter L. Bartlett and Marcus Frean. Boosting Algorithms as Gradient Descent. NIPS. 1999.
the minimum of AdaBoost's test error and the minimum of the normalized sigmoid cost very nearly coincide. In the labor data set AdaBoost's test error converges and overfitting does not occur. For this data set both the normalized sigmoid cost and the exponential cost converge. In the vote1 data set AdaBoost initially
Richard Maclin. Boosting Classifiers Regionally. AAAI/IAAI. 1998.
as the number of hidden units in the component network. The results of these experiments are similar to those obtained using the nearest neighbor methods, and produce significant gains for two other data sets (labor and ionosphere). Together, these experiments indicate that the overall RegionBoost approach can produce significant gains for many (though not all) data sets. One question which might be
Huan Liu and Rudy Setiono. A Probabilistic Approach to Feature Selection - A Filter Solution. ICML. 1996.
was divided by Quinlan [Quinlan, 1993] into 490 training instances and 200 test instances. 6. Labor: The dataset contains instances for acceptable and unacceptable contracts. It is a small dataset with 16 features, a training set of 40 instances, and a testing set of 17 instances. 7. Mushroom: The dataset has a
Oya Ekin and Peter L. Hammer and Alexander Kogan and Pawel Winter. Distance-Based Classification Methods. RUTCOR Research Report, Rutgers Center for Operations Research, Rutgers University. 1996.
that was produced by Strathclyde University. In this version each case is described by 24 continuous attributes. There are no missing values. [RRR 3-96, p. 7] 4.7 Labor Negotiations. This data set includes all collective agreements reached in the business and personal services sector for locals with at least 500 members (teachers, nurses, university staff, police, etc.) in Canada in 1987 and
George H. John and Ron Kohavi and Karl Pfleger. Irrelevant Features and the Subset Selection Problem. ICML. 1994.
was divided by Quinlan into 490 training instances and 200 test instances. Labor: The dataset contains instances for acceptable and unacceptable contracts. It is a small dataset with 16 features, a training set of 40 instances, and a test set of 17 instances. Our results show that the main
Huan Liu and Rudy Setiono. Feature Selection and Classification - A Probabilistic Wrapper Approach. To appear in Proceedings of IEA-AIE96. Department of Information Systems and Computer Science, National University of Singapore.
selection is conducted, which includes CorrAL [JKP94], Monks13 [TBB+91], and Parity5+5. The other type is real-world data including Credit, Vote, and Labor [Qui93, MA94]. The choice of these datasets can simplify the comparison of this work with some published work. These datasets were used in [JKP94] in which comparisons with different methods were described. For the artificial datasets, no
John G. Cleary and Leonard E. Trigg. Experiences with OB1, An Optimal Bayes Decision Tree Learner. Department of Computer Science University of Waikato.
however, naive Bayes performs very well, and on some datasets (such as heart-c and labor) it performs considerably better than the OB1 results shown (presumably because its attribute independence assumption isn't violated). The next section investigates
Alexander K. Seewald. Meta-Learning for Stacked Classification. Austrian Research Institute for Artificial Intelligence.
using only seven folds shows the exact same result. When removing the base-classifier dependent features, IBk is still the best classifier with an additional error on labor, the smallest dataset. In this case MLR, which is also a global learner, is equally good. So we may tentatively conclude that for this meta-dataset, there seems to be no single feature which can predict the significant
Karthik Ramakrishnan. University of Minnesota.
classifier is shown as a straight line across the x-axis for comparison purposes. Figure 17: Bagging, Boosting, and Distance-Weighted test set error rates for the labor data set as the number of classifiers in the ensemble increases. The test set error rate for a single decision tree classifier is shown as a straight line across the x-axis for comparison purposes.
Ron Kohavi and George H. John. Automatic Parameter Selection by Minimizing Estimated Error. Computer Science Dept. Stanford University.
69.84±1.77 72.44±1.73 78.18±0.94 .973 X
vote 435 95.64±0.52 95.41±0.47 97.71±0.68 .172
vote1 435 88.02±1.77 87.58±1.52 92.40±1.20 .342
labor negotiation dataset, note that the entire dataset is very small with only 57 instances. Because of the small size, the state evaluations in C4.5-AP had high variance, and the search did not find good parameter values.
Alexander K. Seewald. Towards Understanding Stacking: Studies of a General Ensemble Learning Scheme. Dissertation submitted for the academic degree of Doctor of Technical Sciences.
vote. When removing the base classifier dependent features, IBk is still the best classifier with an additional error on labor, the smallest dataset. In this case MLR, another high-bias and global learner, is equally good. [Footnote 12: All base learners plus 1R and DecisionStump.] So we may tentatively conclude that for this meta dataset, there seems to
YongSeog Kim and W. Nick Street and Filippo Menczer. Optimal Ensemble Construction via Meta-Evolutionary Ensembles. Business Information Systems, Utah State University.
It is interesting that MEE shows worse performance on two data sets, labor and segment, compared to both ordinary ensemble methods. Note also that MEE shows comparable performance compared to GEFS with a win-loss-tie score (4-5-6). However, we note that such
Ida G. Sprinkhuizen-Kuyper and Elena Smirnova and I. Nalbantis. Reliability yields Information Gain. IKAT, Universiteit Maastricht.
(the case of polynomial kernel) of which the information gain is 0.72 (we even obtained perfect information!) and the labor dataset (the case of polynomial kernel) of which the information gain is 0.42. 8. Conclusion. For practical application of machine learning algorithms, knowledge about the reliability of individual instances
Chris Drummond and Robert C. Holte. C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling. Institute for Information Technology, National Research Council Canada.
as they produced cost curves that captured all the qualitative features we observed in a larger set of experiments (including other UCI data sets: vote, hepatitis, labor, letter-k and glass2). For these data sets, under-sampling combined with C4.5 is a useful baseline to evaluate other algorithms. Over-sampling, on the other hand, is not to