Paddy Dataset
Donated on 7/14/2025
Agriculture occupies a third of Earth's surface and is vital for food production. Rice, grown from paddy seeds, feeds nearly half the global population. To meet rising food demands, this study aims to enhance rice production using Machine Learning (ML) to predict factors affecting paddy growth. A Hybrid ML Model with Combined Wrapper Feature Selection (HMLCWFS) was developed to address challenges like overfitting and computational costs. Five Feature Selection (FS) methods—Backward Elimination, Stepwise Forward Selection, Feature Importance, Exhaustive FS, and Gradient Boosting—were applied. Selected features were merged using Poincaré’s formula to form a refined dataset. ML models such as Decision Tree, Random Forest, SVM, KNN, and Naive Bayes were trained and tested. The model not only forecasts yield but also recommends paddy varieties based on farmers' preferences. Results show that combined FS techniques effectively identify key factors for improving paddy productivity.
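The merging step mentioned in the abstract can be illustrated with a small sketch. It assumes that "merged using Poincaré's formula" means taking the union of the feature subsets chosen by each selector, with Poincaré's formula (inclusion-exclusion) giving the size of that union; the introductory paper may do this differently. The column names come from the Variables Table, but the three selections and the helper `union_size_inclusion_exclusion` are hypothetical and exist only for illustration.

```python
from itertools import combinations


def union_size_inclusion_exclusion(subsets):
    """Return |A1 ∪ ... ∪ An| using Poincaré's (inclusion-exclusion) formula."""
    total = 0
    for k in range(1, len(subsets) + 1):
        # Intersections of an odd number of sets are added, even ones subtracted.
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(subsets, k):
            total += sign * len(set.intersection(*group))
    return total


# Hypothetical output of three feature selectors (not results from the paper).
selections = {
    "backward_elimination": {"Hectares", "Variety", "Soil Types", "DAP_20days"},
    "forward_selection": {"Hectares", "Seedrate(in Kg)", "DAP_20days"},
    "feature_importance": {"Variety", "Nursery", "DAP_20days"},
}

subsets = list(selections.values())

# The merged feature set is the plain union of all selections.
merged_features = sorted(set.union(*subsets))
print(merged_features)

# The formula counts the same union without building it explicitly.
print(union_size_inclusion_exclusion(subsets))
```

The two printed results should agree: the length of `merged_features` equals the count produced by the formula, which is a quick sanity check that no feature was counted twice when the subsets overlap.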
Dataset Characteristics
Tabular
Subject Area
Computer Science
Associated Tasks
Classification, Regression, Clustering
Feature Type
Categorical
# Instances
2790
# Features
45
Dataset Information
Has Missing Values?
No
Introductory Paper
By Muthukumaran S, John Peter K, Dilipkumar E, Savithri S, Senbagam K. 2023
Published in International Journal of Electronics and Communication Engineering
Variables Table
| Variable Name | Role | Type | Description | Units | Missing Values |
|---|---|---|---|---|---|
| Hectares | Feature | Integer | | | no |
| Agriblock | Feature | Categorical | | | no |
| Variety | Feature | Categorical | | | no |
| Soil Types | Feature | Categorical | | | no |
| Seedrate(in Kg) | Feature | Integer | | | no |
| LP_Mainfield(in Tonnes) | Feature | Continuous | | | no |
| Nursery | Feature | Categorical | | | no |
| Nursery area (Cents) | Feature | Integer | | | no |
| LP_nurseryarea(in Tonnes) | Feature | Integer | | | no |
| DAP_20days | Feature | Integer | | | no |

Showing the first 10 of 45 variables.
Additional Variable Information
LP_nurseryarea(in Tonnes): manure used for land preparation. DAP_20days: DAP applied during the first 20 days.
Class Labels
Agriblock, Variety of Paddy, Soil Types, Type of Nursery, LP_nurseryarea(in Tonnes), DAP_20days
Dataset Files
| File | Size |
|---|---|
| paddydataset.csv | 515.5 KB |
pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    paddy_dataset = fetch_ucirepo(id=1186)

    # data (as pandas dataframes)
    X = paddy_dataset.data.features
    y = paddy_dataset.data.targets

    # metadata
    print(paddy_dataset.metadata)

    # variable information
    print(paddy_dataset.variables)
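Several of the variables are categorical, so models such as SVM or KNN would typically need them converted to numbers first. The sketch below shows one common option, one-hot encoding with `pandas.get_dummies`. To stay runnable offline it uses a small hand-made frame in place of the real `X`: the column names are taken from the Variables Table, while every cell value (`block_a`, `variety_1`, and so on) is a made-up placeholder and not data from the repository.

```python
import pandas as pd

# Made-up stand-in for the features frame; values are placeholders only.
X = pd.DataFrame({
    "Hectares": [6, 6, 4],
    "Agriblock": ["block_a", "block_b", "block_a"],
    "Variety": ["variety_1", "variety_2", "variety_1"],
    "Soil Types": ["soil_x", "soil_y", "soil_x"],
    "Seedrate(in Kg)": [150, 150, 100],
})

# Columns listed as Categorical in the Variables Table.
categorical_cols = ["Agriblock", "Variety", "Soil Types"]

# Each categorical column becomes one indicator column per distinct value;
# numeric columns pass through unchanged.
X_encoded = pd.get_dummies(X, columns=categorical_cols)

print(X_encoded.columns.tolist())
print(X_encoded.shape)
```

With the real dataset, the same call could be applied to the `X` returned by `fetch_ucirepo`, listing whichever columns the Variables Table marks as Categorical.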
Subramaniyan, M. (2023). Paddy Dataset [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C55W3J.
Creators
Muthukumaran Subramaniyan
muthumphil11@gmail.com
DOI
10.24432/C55W3J
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.