A Python library for comparing Pandas, Polars, Spark, and Snowpark DataFrames with detailed reporting and flexible matching.
A tool for generating Word2Vec vectors for DBpedia entities from Wikipedia dumps, linking words and topics to structured knowledge.
An engine for ML/data tracking, visualization, explainability, drift detection, and dashboards, integrated with Polyaxon.
Kotlin bindings and extensions for Apache Spark, enabling idiomatic Kotlin development with data classes, lambdas, and null safety.
A fast Apache Spark testing helper library with beautifully formatted error messages for Scala applications.
A library for writing Apache Spark applications in Haskell, enabling resilient analytics that scale to thousands of nodes.
A Fish shell plugin for generating sparklines in the terminal with improved performance and additional flags.
A free, open-source alternative to Spark UI and Spark History Server with enhanced CPU and memory metrics visualizations.
A serverless proxy for Spark clusters that provides a functional programming framework and deployment model for Spark applications.
A bi-directional connector enabling Apache Spark to read from and write to Neo4j graph databases using Spark DataSource APIs.
A language server implementing the Microsoft Language Server Protocol for Ada, SPARK, and GPR project files.
An idiomatic Clojure dataframe library that runs on Apache Spark, providing a seamless interface for data processing and machine learning.
An open-source FHIR server developed in C#, supporting multiple FHIR versions for healthcare data interoperability.
A distributed Spark/Scala implementation of Isolation Forest and Extended Isolation Forest algorithms for scalable unsupervised outlier detection.
A tool for defining, running, and deploying big data applications on AWS, OpenStack, and local machines using Docker.
A Ruby wrapper for Apache Spark, enabling large-scale data processing with Ruby's expressive syntax.
A Scala/Spark library for measuring fairness and mitigating bias in large-scale machine learning workflows.
An Apache Spark framework for efficient data processing, extraction, and derivation from web archives and archival collections.
An open-source toolkit for analyzing web archives at scale using Apache Spark.
A toolset for formal specification and generation of verifiable binary parsers, message generators, and protocol state machines.
An experimental Rust client for Apache Spark Connect, providing a DataFrame API to interact with Spark clusters.
A Clojure wrapper for Deeplearning4j, providing idiomatic access to neural networks, data import, and distributed training.
A deep learning system for automatic spoken language identification from audio files using TensorFlow and Caffe.
A multi-processor, 64-bit, formally-verified general-purpose operating system for x86-64, written in SPARK/Ada.