A high-performance, functional tabular data processing library for Clojure, similar to Python's pandas or R's data.table.
A benchmark for evaluating protein language models through five biologically relevant semi-supervised learning tasks.
A Delphi and Lazarus (FPC) library for bidirectional conversion between JSON and DataSets, including validation and structure management.
A decentralized data management system built on Git and git-annex for versioning and distributing large datasets.
A toolkit and dataset for autonomous driving research, including trajectory prediction, 3D LiDAR detection, scene parsing, and video inpainting.
A large-scale multi-domain dataset of over 20k annotated task-oriented dialogues for training and evaluating virtual assistants.
A large-scale StarCraft: Brood War replay dataset for AI research, containing 65,646 games with frame and action data.
A benchmark dataset for long-range (up to 250 m) dense depth estimation in autonomous driving, featuring 360° LiDAR ground truth.
A curated list of open-access resources and tools for Natural Language Processing (NLP) focused on the German language.
A reading comprehension dataset with Wikipedia summaries, full stories, and question-answer pairs for narrative understanding.
A curated collection of robotics and computer vision datasets for research and development.
A foundation model for multi-species genome understanding, achieving state-of-the-art performance on 28 genomic tasks.
A curated collection of data sets and tools for empirical software engineering and mining software repositories research.
A fast Apache Spark testing helper library with beautifully formatted error messages for Scala applications.
A .NET framework for extracting and exporting text and data from a wide variety of document formats.
A neural network for real-time 6D object pose tracking in video using RGB-D data, trained only on synthetic images.
A curated list of deep learning research papers and implementations for high dynamic range image and video synthesis.
A dataset of millions of news articles labeled by credibility type for training fake news detection algorithms.
A benchmark dataset and toolkit for RF-based drone detection and identification using raw IQ data and deep learning models.
A Clojure dataset manipulation library providing a dplyr-like API on top of tech.ml.dataset.
A collection of large-scale datasets for source code analysis and machine learning on code, including GitHub repositories, identifiers, and commit data.
A ROS-based dataset and tools for autonomous vehicle development with seasonal multi-sensor data from Ford vehicles.
A centralized Python framework for agricultural machine learning, providing access to public datasets, benchmarks, pretrained models, and synthetic data generation.
A Python devkit for loading, exploring, and manipulating the PandaSet, a large-scale autonomous driving dataset with LiDAR, camera, and annotations.
Fortran application interfaces for accessing netCDF scientific data files, providing self-describing, network-transparent data storage.
A large-scale image dataset for self-supervised pretraining that contains no images of people, designed to reduce privacy concerns.
Tools for compiling and using the Maluuba NewsQA dataset, a machine reading comprehension dataset based on CNN articles.
FLAME dataset and deep learning models for fire detection in aerial imagery using UAVs, supporting classification and segmentation tasks.
A long-term autonomous driving dataset from Europe with multi-sensor data (GPS-RTK, LiDAR, cameras, IMU) for localization and mapping research.
An open-source remote sensing dataset and pipeline for agricultural land use classification, featuring 95,186 data points with satellite and climatology data.
A set of Vue.js 3 components for displaying datasets with built-in filtering, pagination, and sorting.
A large-scale driving behavior dataset with LiDAR point clouds, dashboard videos, and sensor data for autonomous driving research.
A high-performance Java library for generating realistic business data with internationalization support.
A curated collection of LiDAR place recognition methods, datasets, and algorithms for robotics and autonomous systems.
A benchmark dataset and meta self-learning method for multi-source domain adaptation in scene text recognition.
A YOLO-based object detection system specifically trained to identify DJI drones in images and video.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.