A Rust DataFrame and data engineering library with PySpark/SQL-like syntax, built for business data pipelines with Microsoft stack integration.
Elusion is a Rust-based DataFrame and data engineering library that provides PySpark and SQL-like syntax for data transformations. It solves the problem of building efficient data pipelines for business analytics by offering a familiar API with high performance and extensive connector support for Microsoft ecosystems and other data sources.
Elusion is aimed at data engineers and data analysts who work with business data pipelines, especially those in Microsoft-centric environments who need to process CSV, Excel, Parquet, and other common formats with reliable performance.
Developers choose Elusion for its combination of Rust's performance, familiar DataFrame syntax, and deep Microsoft stack integration, all while maintaining flexibility in query construction and offering built-in pipeline orchestration without external tools.
A DataFrame and data engineering library with the familiar syntax of PySpark and SQL, focused on user experience, speed, and accuracy.
Provides native connectors for Fabric OneLake, SharePoint, and Azure Blob Storage, making it seamless for businesses already invested in Microsoft ecosystems, as detailed in the README's feature list.
Offers both fluent chainable operations and raw SQL syntax, allowing queries to be built in any sequence without enforcing order, aligning with the library's core philosophy of developer flexibility.
Includes a scheduler with intervals from 1 minute to 30 days and a medallion architecture framework for bronze/silver/gold pipelines, eliminating dependency on external orchestration tools like Airflow.
Features high-performance copy operations with streaming, batching, and compression, plus Redis caching that yields a 6-10x speedup on repeated queries, as noted in the README's caching section.
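The idea of building a query "in any sequence without enforcing order" can be pictured with a small sketch. The `Query` type and its methods below are hypothetical and are not Elusion's API; they only illustrate one way a chainable builder can collect clauses in whatever order they are called and still emit SQL in the canonical clause order.

```rust
/// Hypothetical query builder for illustration only (not Elusion's API).
/// Each method stores its clause; nothing is rendered until `to_sql`.
#[derive(Default)]
struct Query {
    table: String,
    columns: Vec<String>,
    filters: Vec<String>,
    order_by: Vec<String>,
    limit: Option<usize>,
}

impl Query {
    fn new(table: &str) -> Self {
        Query { table: table.to_string(), ..Default::default() }
    }

    fn select(mut self, cols: &[&str]) -> Self {
        self.columns.extend(cols.iter().map(|c| c.to_string()));
        self
    }

    fn filter(mut self, condition: &str) -> Self {
        self.filters.push(condition.to_string());
        self
    }

    fn order_by(mut self, column: &str) -> Self {
        self.order_by.push(column.to_string());
        self
    }

    fn limit(mut self, n: usize) -> Self {
        self.limit = Some(n);
        self
    }

    /// Render clauses in SQL's fixed order, regardless of call order.
    fn to_sql(&self) -> String {
        let cols = if self.columns.is_empty() {
            "*".to_string()
        } else {
            self.columns.join(", ")
        };
        let mut sql = format!("SELECT {} FROM {}", cols, self.table);
        if !self.filters.is_empty() {
            sql.push_str(" WHERE ");
            sql.push_str(&self.filters.join(" AND "));
        }
        if !self.order_by.is_empty() {
            sql.push_str(" ORDER BY ");
            sql.push_str(&self.order_by.join(", "));
        }
        if let Some(n) = self.limit {
            sql.push_str(&format!(" LIMIT {}", n));
        }
        sql
    }
}

fn main() {
    // Same clauses, different call order.
    let a = Query::new("sales")
        .select(&["region", "amount"])
        .filter("amount > 100")
        .order_by("amount")
        .limit(10);
    let b = Query::new("sales")
        .limit(10)
        .order_by("amount")
        .filter("amount > 100")
        .select(&["region", "amount"]);

    assert_eq!(a.to_sql(), b.to_sql());
    println!("{}", a.to_sql());
}
```

Deferring rendering to a single `to_sql` step is what makes the call order irrelevant in this sketch; whether Elusion works the same way internally is not stated in the listing.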
Built on DataFusion's single-node query engine, it cannot scale horizontally for distributed data processing, limiting usability for very large datasets as admitted in the README.
Relies on Cargo feature flags for modularity, but missing features cause runtime errors (e.g., 'API feature not enabled'), adding setup complexity and potential confusion during development.
Automatically normalizes column names to lowercase and replaces spaces with underscores, which can break queries with special characters, as warned in the BREAKAGE section about group_by_all().
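To see why automatic column-name normalization can surprise users, here is a minimal sketch of the rule as described above: lowercase the name and replace spaces with underscores. The function names are hypothetical and this is not Elusion's implementation; how the library treats other special characters is not stated here, so the sketch leaves them untouched.

```rust
use std::collections::BTreeMap;

/// Assumed normalization rule, taken from the description above:
/// lowercase everything and replace spaces with underscores.
/// Other characters are left as-is (an assumption of this sketch).
fn normalize_column_name(name: &str) -> String {
    name.to_lowercase().replace(' ', "_")
}

/// Group original headers by their normalized form and keep only the
/// groups where two or more distinct headers collapse to the same name.
fn find_collisions(names: &[&str]) -> BTreeMap<String, Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for &name in names {
        groups
            .entry(normalize_column_name(name))
            .or_default()
            .push(name.to_string());
    }
    groups.retain(|_, originals| originals.len() > 1);
    groups
}

fn main() {
    let headers = ["Order ID", "order_id", "Net Sales ($)", "Region"];

    for h in &headers {
        println!("{:?} -> {:?}", h, normalize_column_name(h));
    }

    // "Order ID" and "order_id" end up with the same normalized name.
    let collisions = find_collisions(&headers);
    assert_eq!(collisions.len(), 1);
    println!("collisions: {:?}", collisions);
}
```

Under this rule a header such as `Net Sales ($)` keeps its parentheses and dollar sign, so it would still need quoting in SQL, and two headers that differ only in case or spacing become indistinguishable. Checking source headers for such cases before loading is one way to avoid surprises.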
Elusion is an open-source alternative to the following products:
Pandas is a fast, powerful, and flexible open-source data analysis and manipulation library for Python, built on top of NumPy.
Polars is a fast, multi-threaded DataFrame library implemented in Rust with Python and Node.js bindings, designed for efficient data manipulation and analysis on large datasets.
PySpark is the Python API for Apache Spark, a unified analytics engine for large-scale data processing, enabling distributed data processing with Python.