An open-source toolkit for analyzing web archives at scale using Apache Spark.
A scalable malware processing and analytics platform built on Hadoop Pig for binary data extraction and analysis.
A collection of libraries for large-scale data processing in Hadoop ecosystems, including Spark, Pig, and incremental MapReduce.
A Go database/sql driver for Apache Avatica server, enabling Go applications to connect to Phoenix and other Avatica-backed databases.
An R extension for distributed computing with Apache Hive that lets you run HQL queries from R and call R functions from within Hive.
A Scalding library for machine learning and statistical analysis, featuring Mahout vector integration, K-Means clustering, and Naive Bayes classifiers.
A collection of interactive Jupyter notebooks for learning Hadoop, Spark, and MapReduce with hands-on tutorials and demos.
A production-grade HBase ORM library for clean, fast, and fun object-oriented data access, also compatible with Google Cloud Bigtable.
Mozilla's utility library for Hadoop, HBase, Pig, and related big data technologies.
A simple utility for testing Apache Hive scripts locally without requiring Java development skills.
A unit test framework for Hive scripts that provides an embedded Hive environment backed by a Derby database and HiveThriftService.
Code samples demonstrating how to use popular applications on Amazon Elastic MapReduce (EMR).
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.