Showing 36 of 258 projects
A Lua library providing a functional, streaming interface to zlib for compression and decompression.
Optimized bit-level Reader and Writer for Go, enabling efficient reading and writing of values of arbitrary bit length.
Tools for compiling and using the Maluuba NewsQA dataset, a machine reading comprehension dataset based on CNN articles.
Framework-agnostic PHP package to load JSON of any size into Laravel lazy collections with minimal memory usage.
An experimental Go client for Apache Spark Connect, enabling Go applications to interact with Spark clusters via gRPC.
A joblib backend that enables Python parallel computing tasks to run on Apache Spark clusters.
A Node.js utility to merge multiple GeoJSON files into a single FeatureCollection, supporting both in-memory and streaming modes.
Standardizes and processes chemical molecule structures for the ChEMBL database using RDKit.
A Ruby wrapper for Apache Spark, enabling large-scale data processing with Ruby's expressive syntax.
A Python wrapper for Cascading that enables building and controlling Hadoop data processing workflows entirely in Python.
A Hadoop library for reading and processing packet capture (PCAP) files in MapReduce jobs and Hive queries.
A Go-based toolkit for fast ETL and feature extraction on Hadoop, optimized for rapid development and execution.
A Python library for building lazy data processing and machine learning workflows that handle datasets larger than memory.
Jackson extension for reading and writing CSV data as JSON-like data structures.
A monolithic codebase that powers the Iowa Environmental Mesonet's environmental data ingest, processing, and web services.
A command-line tool that runs PostgreSQL queries and outputs the results directly in CSV format.
A JavaScript library for transforming complex JSON objects using intuitive field path syntax and chained transformations.
An idiomatic Go wrapper for the GDAL library, providing efficient raster and vector geospatial data processing.
A fast and reliable Go library for reading, writing, and manipulating Microsoft Excel XLSX files.
A Ruby gem for reading CSV files with best practices out of the box and zero configuration.
A Rust library providing helper functions for serde serialization and deserialization of containers, struct fields, and other common patterns.
A CSV parser for Swift that conforms to RFC 4180 for reliable CSV file handling.
Rust bindings for libbz2 providing streaming bzip2 compression and decompression.
A practical guide to exploratory data analytics using Hadoop with Pig and Ruby for terabyte-scale data processing.
A Go library for building data processing workflows and pipelines with functional operations, cycles, and fan-out capabilities.
An Apache Spark framework for efficient data processing, extraction, and derivation from web archives and archival collections.
A Spark library for reading from and writing to Google BigQuery using DataFrames and SQL.
A Go library for filtering, sanitizing, and converting data with built-in rules and functions.
A PHP library providing Python-inspired iteration tools for efficient data processing with loops and streams.
A .NET type provider for reading Excel files with static type safety and IntelliSense support.
A recursive, pattern-matching framework for transforming JSON data using JSPath queries, inspired by XSLT.
A Java library for reading, writing, and transforming public transit data in the GTFS format.
A scalable malware processing and analytics platform built on Hadoop and Apache Pig for binary data extraction and analysis.
A Python package for processing and normalizing high-dimensional morphological feature data from high-throughput cell imaging experiments.
JavaScript bindings for libosmium to work with OpenStreetMap data, suitable for small extracts and prototyping.
Operator and codec library for building real-time streaming applications on Apache Apex.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.