Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Hadoop

48 projects

Showing 36 of 48 projects

APIJSON
Language: Java

A real-time, no-code ORM that provides APIs and documentation automatically, allowing frontend clients to customize JSON responses.

#crud #no-code #orm
Stars: 18.4k
Forks: 2.3k
Last commit: 1 day ago

Deeplearning4j
Language: Java

A comprehensive JVM-based deep learning ecosystem for building, training, and deploying models with support for model import and distributed training.

#distributed-training #intellij #spark-integration
Stars: 14.2k
Forks: 3.8k
Last commit: 3 days ago

JuiceFS
Language: Go

A high-performance distributed POSIX file system for cloud-native environments, storing data in object storage and metadata in databases.

#filesystem #data-storage #high-performance
Stars: 13.7k
Forks: 1.2k
Last commit: 1 day ago

Trino
Language: Java

A fast distributed SQL query engine for big data analytics, enabling interactive queries across diverse data sources.

#database #distributed-systems #query-engine
Stars: 12.9k
Forks: 3.6k
Last commit: 1 day ago

PredictionIO
Language: Scala

An open source machine learning server for developers and data scientists, supporting event collection, algorithm deployment, and REST API queries.

#event-collection #spark #hbase
Stars: 12.5k
Forks: 1.9k
Last commit: 5 years ago

Alluxio
Language: Java

A distributed caching platform that bridges computation frameworks and storage systems for large-scale analytics and ML workloads.

#data-orchestration #spark #memory-speed
Stars: 7.2k
Forks: 2.9k
Last commit: 1 year ago

Azkaban
Language: Java

Azkaban is a batch workflow job scheduler created at LinkedIn to manage Hadoop jobs.

#hacktoberfest #gradle #batch-processing
Stars: 4.5k
Forks: 1.6k
Last commit: 1 year ago

Scalding
Language: Scala

A Scala API for Cascading that simplifies writing Hadoop MapReduce jobs.

#cascading #mapreduce #functional-programming
Stars: 3.5k
Forks: 698
Last commit: 3 years ago

Apache Kyuubi
Language: Scala

A distributed, multi-tenant gateway providing serverless SQL on data warehouses and lakehouses.

#hiveserver2-alternative #hacktoberfest #spark
Stars: 2.3k
Forks: 1.0k
Last commit: 2 days ago

Elasticsearch Hadoop
Language: Java

Native integration library for using Elasticsearch with Hadoop, Spark, and Hive for real-time search and analytics on big data.

#apache-spark #mapreduce #data-integration
Stars: 2.0k
Forks: 994
Last commit: 10 days ago

Gaffer
Language: Java

A graph database framework for storing and querying large-scale graphs with rich properties and in-database aggregation.

#apache-spark #parquet #entity-relation
Stars: 1.8k
Forks: 364
Last commit: 1 year ago

Genie
Language: Java

A federated Big Data orchestration service that simplifies job execution across distributed clusters by abstracting infrastructure complexity.

#data-orchestration #spark #netflixoss
Stars: 1.8k
Forks: 372
Last commit: 4 months ago

HiBench
Language: Java

A comprehensive benchmark suite for evaluating speed, throughput, and resource utilization of big data frameworks like Hadoop, Spark, and streaming engines.

#apache-spark #performance-testing #distributed-systems
Stars: 1.5k
Forks: 769
Last commit: 5 months ago

hdfs - A native go client for HDFS
Language: Go

A native Go client library and command-line tool for HDFS that connects directly to the namenode via protocol buffers.

#distributed-storage #command-line-tool #protocol-buffers
Stars: 1.4k
Forks: 361
Last commit: 1 year ago

Hadoop

A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources.

#awesome-list #data-engineering #big-data
Stars: 1.1k
Forks: 254
Last commit: 2 years ago

camus
Language: Java

LinkedIn's previous generation Kafka to HDFS pipeline for batch data ingestion.

#batch-processing #linkedin #kafka
Stars: 884
Forks: 454
Last commit: 5 years ago

Snakebite
Language: Python

A pure Python HDFS client and Hadoop minicluster wrapper for interacting with the Hadoop Distributed File System.

#python-hdfs-client #python-library #distributed-storage
Stars: 858
Forks: 213
Last commit: 4 years ago

RHadoop

A collection of R packages for interacting with Hadoop ecosystems, enabling big data analysis from R.

#mapreduce #data-science #hbase
Stars: 760
Forks: 275
Last commit: 10 years ago

docker-spark
Language: Shell

A Docker image for Apache Spark on YARN, built on Hadoop and CentOS for easy deployment.

#apache-spark #containerization #cluster-computing
Stars: 757
Forks: 277
Last commit: 5 years ago

SparkR
Language: R

An R package providing a lightweight frontend to use Apache Spark for distributed data processing from R.

#apache-spark #r-package #data-science
Stars: 642
Forks: 322
Last commit: 10 years ago

OpenSOC

An open-source security analytics platform that integrates big data technologies for centralized security monitoring, threat detection, and investigation.

#security-analytics #real-time-processing #behavioral-analytics
Stars: 582
Forks: 188
Last commit: 6 years ago

gis-tools-for-hadoop

A collection of GIS tools for spatial analysis of big data using Hadoop, integrating with ArcGIS Geoprocessing.

#arcgis #geospatial #apache-hive
Stars: 523
Forks: 251
Last commit: 4 years ago

brushfire
Language: Scala

A Scala framework for distributed supervised learning of decision tree ensemble models, inspired by Google's PLANET.

#ensemble-learning #classification #scalding
Stars: 390
Forks: 43
Last commit: 7 years ago

spatial-framework-for-hadoop
Language: Java

A framework enabling spatial data analysis within Hadoop ecosystems using Hive and SparkSQL.

#geospatial #java #gis
Stars: 376
Forks: 158
Last commit: 13 days ago

Conjecture
Language: Java

A framework for building scalable machine learning models in Hadoop using the Scalding DSL.

#recommender-systems #classification #cross-validation
Stars: 360
Forks: 56
Last commit: 8 years ago

Apache Spot (incubating)
Language: Python

An open-source platform for network security analytics using flow and packet analysis to detect unknown threats at cloud scale.

#security-analytics #telemetry #spot
Stars: 356
Forks: 226
Last commit: 3 years ago

statsd-jvm-profiler
Language: Java

A JVM agent profiler that sends memory, CPU tracing, and CPU load metrics to StatsD, InfluxDB, and other backends.

#jvm-profiling #metrics-collection #cpu-profiling
Stars: 335
Forks: 82
Last commit: 4 months ago

packetpig
Language: Python

An open-source big data security analytics tool that analyzes network packet capture (pcap) files using Apache Pig.

#security-analytics #intrusion-detection #data-visualization
Stars: 298
Forks: 84
Last commit: 8 years ago

ferry
Language: Python

Define, run, and deploy big data applications on AWS, OpenStack, and local machines using Docker.

#devops #spark #data-science
Stars: 254
Forks: 25
Last commit: 11 years ago

hdfs-du
Language: JavaScript

An interactive visualization tool for monitoring Hadoop HDFS cluster usage and file storage efficiency.

#d3-js #javascript-infovis-toolkit #storage-optimization
Stars: 228
Forks: 82
Last commit: 5 years ago

pycascading
Language: Python

A Python wrapper for Cascading that enables building and controlling Hadoop data processing workflows entirely in Python.

#cascading #mapreduce #workflow-engine
Stars: 221
Forks: 35
Last commit: 6 years ago

hadoop-pcap
Language: Java

A Hadoop library for reading and processing packet capture (PCAP) files in MapReduce jobs and Hive queries.

#mapreduce #serde #pcap
Stars: 216
Forks: 101
Last commit: 3 years ago

Crunch
Language: Go

A Go-based toolkit for fast ETL and feature extraction on Hadoop, optimized for rapid development and execution.

#hive #pig #feature-extraction
Stars: 212
Forks: 16
Last commit: 11 years ago

inviso
Language: JavaScript

A lightweight tool for searching Hadoop jobs, visualizing performance, and viewing cluster utilization.

#job-visualization #rest-api #performance-analysis
Stars: 205
Forks: 64
Last commit: 3 years ago

White Elephant
Language: Java

A Hadoop log aggregator and dashboard for visualizing cluster utilization across users.

#jruby #dashboard #log-aggregation
Stars: 190
Forks: 61
Last commit: 12 years ago

Big Data For Chimps
Language: Ruby

A practical guide to exploratory data analytics using Hadoop with Pig and Ruby for terabyte-scale data processing.

#exploratory-analysis #data-science #terabyte-processing
Stars: 169
Forks: 63
Last commit: 11 years ago

Page 1 of 2

Related Tags

#Big Data (41)
#Data Processing (17)
#Mapreduce (10)
#Spark (9)
#Java (9)
#Distributed Computing (9)
#Machine Learning (8)
#Hive (8)
#Apache Spark (8)
#Scala (7)
#Hbase (6)
#Hdfs (6)