A real-time, no-code ORM that provides APIs and documentation automatically, allowing frontend clients to customize JSON responses.
A comprehensive JVM-based deep learning ecosystem for building, training, and deploying models with support for model import and distributed training.
A high-performance distributed POSIX file system for cloud-native environments, storing data in object storage and metadata in databases.
A fast distributed SQL query engine for big data analytics, enabling interactive queries across diverse data sources.
An open-source machine learning server for developers and data scientists, supporting event collection, algorithm deployment, and REST API queries.
A distributed caching platform that bridges computation frameworks and storage systems for large-scale analytics and ML workloads.
Azkaban is a batch workflow job scheduler created at LinkedIn to manage Hadoop jobs.
A Scala API for Cascading that simplifies writing Hadoop MapReduce jobs with Scala integration.
A distributed, multi-tenant gateway providing serverless SQL on data warehouses and lakehouses.
Native integration library for using Elasticsearch with Hadoop, Spark, and Hive for real-time search and analytics on big data.
A graph database framework for storing and querying large-scale graphs with rich properties and in-database aggregation.
A federated big data orchestration service that simplifies job execution across distributed clusters by abstracting infrastructure complexity.
A comprehensive benchmark suite for evaluating speed, throughput, and resource utilization of big data frameworks like Hadoop, Spark, and streaming engines.
A native Go client library and command-line tool for HDFS that connects directly to the namenode via protocol buffers.
A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources.
LinkedIn's previous-generation Kafka-to-HDFS pipeline for batch data ingestion.
A pure Python HDFS client and Hadoop minicluster wrapper for interacting with Hadoop Distributed File System.
A collection of R packages for interacting with Hadoop ecosystems, enabling big data analysis from R.
A Docker image for Apache Spark on YARN, built on Hadoop and CentOS for easy deployment.
An R package providing a lightweight frontend to use Apache Spark for distributed data processing from R.
An open-source security analytics platform that integrates big data technologies for centralized security monitoring, threat detection, and investigation.
A collection of GIS tools for spatial analysis of big data using Hadoop, integrating with ArcGIS Geoprocessing.
A Scala framework for distributed supervised learning of decision tree ensemble models, inspired by Google's PLANET.
A framework enabling spatial data analysis within Hadoop ecosystems using Hive and SparkSQL.
A framework for building scalable machine learning models in Hadoop using the Scalding DSL.
Open-source platform for network security analytics using flow and packet analysis to detect unknown threats at cloud scale.
A JVM agent profiler that sends memory, CPU tracing, and CPU load metrics to StatsD, InfluxDB, and other backends.
An open-source big data security analytics tool that analyzes network packet capture (pcap) files using Apache Pig.
Define, run, and deploy big data applications on AWS, OpenStack, and local machines using Docker.
Interactive visualization tool for monitoring Hadoop HDFS cluster usage and file storage efficiency.
A Python wrapper for Cascading that enables building and controlling Hadoop data processing workflows entirely in Python.
A Hadoop library for reading and processing packet capture (PCAP) files in MapReduce jobs and Hive queries.
A Go-based toolkit for fast ETL and feature extraction on Hadoop, optimized for rapid development and execution.
A lightweight tool for searching Hadoop jobs, visualizing performance, and viewing cluster utilization.
A Hadoop log aggregator and dashboard for visualizing cluster utilization across users.
A practical guide to exploratory data analytics using Hadoop with Pig and Ruby for terabyte-scale data processing.