A curated reference hub of tools and real-world examples for designing effective threat detection and response pipelines.
A Docker image for Logstash 1.4.5 with optional Elasticsearch 1.7.0 and Kibana 3.1.2 integration.
An idiomatic Clojure machine learning library providing a unified interface for classification, regression, and unsupervised models.
A Go-based toolkit for fast ETL and feature extraction on Hadoop, optimized for rapid development and execution.
An R package providing a toolbox of pipeline-friendly functions for manipulating and querying non-tabular data stored in list objects.
A monolithic codebase that powers the Iowa Environmental Mesonet's environmental data ingest, processing, and web services.
An R package providing multiple pipeline styles (operator, object, function) for readable function chaining and data transformation.
A CLI tool to send GeoJSON files to geojson.io for instant visualization and editing.
A Go library for building data processing workflows and pipelines with functional operations, cycles, and fan-out capabilities.
A visual development platform for building, deploying, and managing streaming analytics applications with multiple engine bindings.
A serverless data pipeline for crawling PDFs from the web and extracting structured data using AWS Textract.
Python utilities for parallel uploads and downloads to Amazon S3 using multipart uploads and range requests.
An open-source data pipeline that aggregates and standardizes heterogeneous public COVID-19 data from multiple global sources.
Collect, validate, and send ROS 2 data to build APIs and dashboards with reliable data pipelines.
A universal data converter that translates JSON, BSON, YAML, CSV, XML, and MT940 to any format using Go templates.
A Python framework for building and deploying serverless data and ML pipelines on AWS using AWS CDK.
A buffered output plugin for Fluentd that sends time-series data to InfluxDB.
An open-source framework for developing large-scale anomaly detection models using Apache Spark.
A modern analytics pipeline for tracking and analyzing GitHub contributions across repositories with AI-powered summaries and leaderboards.
A Logstash input plugin that reads data from DynamoDB tables via table scans and DynamoDB Streams for near real-time data processing.
A Docker container providing a complete streaming environment for experimenting with Kafka, Spark Streaming, and Cassandra.