An open-source remote sensing dataset and pipeline for agricultural land use classification, featuring 95,186 datapoints with satellite and climatology data.
CropHarvest is an open-source remote sensing dataset and processing pipeline designed for agricultural applications. It aggregates diverse agricultural land use datasets and pairs them with satellite imagery and climatology data to create a standardized resource for training and benchmarking crop classification models. The project addresses the fragmentation and inaccessibility of agricultural remote sensing data by providing a unified, ready-to-use dataset.
CropHarvest is aimed at researchers, data scientists, and agricultural technologists working on crop-type classification, land use mapping, and remote sensing applications. It is particularly useful for those developing machine learning models for agricultural monitoring.
Developers choose CropHarvest because it offers a large, curated dataset with integrated satellite and climatology data, along with a complete processing pipeline. Its open-source nature and benchmark support enable reproducible research and faster model development than assembling disparate data sources manually.
Open source remote sensing dataset with benchmarks
With 95,186 datapoints including 33,205 multiclass labels, it provides a substantial foundation for training crop classification models, as stated in the README.
Combines Sentinel-2, Sentinel-1, SRTM elevation, and ERA5 climatology data, offering a comprehensive remote sensing package for diverse analysis.
Includes code to merge datasets, export satellite data, and generate (X,y) tuples, accelerating research with benchmark support and a Dataset object.
Allows data filtering by bounding boxes using the Task object, enabling region-specific model training as demonstrated in the FAQ.
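The region filtering mentioned above can be illustrated without the library itself. The following is a minimal sketch under stated assumptions: `BoundingBox`, `LabelledPoint`, and `filter_by_bbox` are hypothetical names invented for this example and are not CropHarvest's API, and the coordinates are made up. The project's FAQ remains the reference for how the actual Task object is used.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical latitude/longitude box (not the CropHarvest class)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple inclusive check; boxes crossing the antimeridian are not handled.
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


@dataclass(frozen=True)
class LabelledPoint:
    """Hypothetical labelled datapoint with a binary crop flag."""
    lat: float
    lon: float
    is_crop: bool


def filter_by_bbox(points: List[LabelledPoint], bbox: BoundingBox) -> List[LabelledPoint]:
    """Keep only the points that fall inside the bounding box."""
    return [p for p in points if bbox.contains(p.lat, p.lon)]


# Made-up coordinates, for illustration only.
points = [
    LabelledPoint(lat=-1.0, lon=36.8, is_crop=True),
    LabelledPoint(lat=48.9, lon=2.3, is_crop=False),
    LabelledPoint(lat=0.5, lon=35.0, is_crop=True),
]
region = BoundingBox(min_lat=-5.0, max_lat=5.0, min_lon=33.0, max_lon=42.0)

selected = filter_by_bbox(points, region)
print(len(selected), "of", len(points), "points fall inside the region")
```

Restricting the training set to one region in this way trades dataset size for geographic relevance, which is the design choice a region-specific task makes.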
On Windows, installation requires a conda environment and specific versions of packages such as fiona and rasterio, making setup more cumbersome and error-prone than on Linux or macOS.
The data is a static snapshot downloaded from Zenodo rather than a live feed, so it may not suit applications that need the most recent satellite imagery or frequent updates.
Only about 35% of datapoints (33,205 of 95,186) have multiclass labels, which may limit fine-grained classification tasks unless additional data or class balancing is applied.
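The multiclass share quoted above follows directly from the two counts given in this entry. As a quick check, using only those figures (the variable names are illustrative):

```python
# Counts as stated in this entry.
total_points = 95_186
multiclass_points = 33_205

# Datapoints that carry no multiclass label.
points_without_multiclass = total_points - multiclass_points

# Fraction of the dataset usable for fine-grained (multiclass) training.
multiclass_share = multiclass_points / total_points

print(f"Multiclass share: {multiclass_share:.1%}")
print(f"Points without multiclass labels: {points_without_multiclass:,}")
```

The share comes out just under 35%, so roughly two thirds of the dataset cannot contribute to a multiclass training objective.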