Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


datafusion

Apache-2.0 · Rust

An extensible SQL query engine written in Rust, using Apache Arrow as its in-memory format for building fast database and analytic systems.

8.9k stars · 2.2k forks · 0 contributors

What is datafusion?

Apache DataFusion is an extensible SQL query engine written in Rust that uses Apache Arrow as its in-memory format. It provides a high-performance foundation for building custom database and analytic systems, with built-in support for SQL, DataFrames, and multiple data formats. It lets teams build fast, tailored data processing engines without starting from scratch.

Target Audience

Developers and engineers building domain-specific query engines, new database platforms, data pipelines, or custom query languages. It is ideal for those needing a performant, extensible base for data-intensive applications.

Value Proposition

Developers choose DataFusion for its excellent performance, full-featured extensibility, and strong community support. Its unique selling point is providing a production-ready, customizable query engine that balances out-of-the-box functionality with deep customization capabilities.

Overview

Apache DataFusion SQL Query Engine

Use Cases

Best For

  • Building domain-specific query engines for specialized workloads
  • Creating new database platforms with custom optimizations
  • Developing high-performance data pipelines for analytics
  • Implementing custom query languages on a robust foundation
  • Accelerating SQL queries in Rust-based data systems
  • Integrating Apache Arrow-based data processing into applications

Not Ideal For

  • Teams needing a complete database with built-in storage, ACID transactions, and user management out of the box
  • Organizations without Rust development expertise, or whose stack is built around non-Rust ecosystems
  • Projects requiring immediate support for data formats beyond CSV, Parquet, JSON, and Avro without custom development
  • Applications that demand a GUI or web interface for ad-hoc querying without additional tooling

Pros & Cons

Pros

High-Performance Execution Engine

Features a columnar, streaming, multi-threaded, and vectorized execution engine optimized for fast data processing, as stated in the README's performance claims.
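To make the terms "columnar", "streaming", and "vectorized" concrete, here is a minimal standard-library sketch of the idea. It is not DataFusion code: the `Batch` type and all function names are hypothetical, and DataFusion itself operates on Arrow record batches rather than plain vectors. Multi-threading is left out to keep the example short.

```rust
/// A toy columnar batch: each field lives in its own contiguous vector.
/// Hypothetical type for illustration only.
struct Batch {
    price: Vec<f64>,
    quantity: Vec<i64>,
}

/// Vectorized filter: evaluate the predicate over a whole column at once,
/// producing a selection mask.
fn filter_mask(quantity: &[i64], min_qty: i64) -> Vec<bool> {
    quantity.iter().map(|&q| q >= min_qty).collect()
}

/// Sum the values of a column that the mask selects.
fn masked_sum(price: &[f64], mask: &[bool]) -> f64 {
    let mut total = 0.0;
    for (p, keep) in price.iter().zip(mask) {
        if *keep {
            total += *p;
        }
    }
    total
}

/// Streaming execution: consume batches one at a time, so only the current
/// batch and the running aggregate are held in memory.
fn sum_price_where_qty_at_least(batches: impl Iterator<Item = Batch>, min_qty: i64) -> f64 {
    batches
        .map(|b| {
            let mask = filter_mask(&b.quantity, min_qty);
            masked_sum(&b.price, &mask)
        })
        .sum()
}

fn main() {
    let batches = vec![
        Batch { price: vec![10.0, 20.0, 30.0], quantity: vec![1, 5, 2] },
        Batch { price: vec![40.0, 50.0], quantity: vec![7, 0] },
    ];
    let total = sum_price_where_qty_at_least(batches.into_iter(), 2);
    println!("total = {total}");
}
```

Storing each field as its own vector lets the filter and the sum each run as a tight loop over one column, and feeding batches through an iterator means the full dataset never has to be resident at once.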

Extensible Architecture

Allows deep customization of data sources, query languages, functions, and operators, enabling tailored solutions for specific workloads like domain-specific query engines.
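As a rough picture of what a function extension point can look like, the sketch below uses a trait plus a registry, written with the standard library only. Every name in it (`ScalarFunction`, `FunctionRegistry`, `CelsiusToFahrenheit`) is hypothetical. DataFusion's real user-defined function interfaces work on Arrow arrays and differ in detail, so treat this as an illustration of the pattern rather than the library's API.

```rust
use std::collections::HashMap;

/// Hypothetical extension point: a scalar function applied to a whole column.
trait ScalarFunction {
    fn name(&self) -> &str;
    fn evaluate(&self, column: &[f64]) -> Vec<f64>;
}

/// Hypothetical registry an engine could consult when resolving a function name.
#[derive(Default)]
struct FunctionRegistry {
    functions: HashMap<String, Box<dyn ScalarFunction>>,
}

impl FunctionRegistry {
    /// Register a function under the name it reports.
    fn register(&mut self, f: Box<dyn ScalarFunction>) {
        self.functions.insert(f.name().to_string(), f);
    }

    /// Look up a function by name and apply it; None if it is not registered.
    fn call(&self, name: &str, column: &[f64]) -> Option<Vec<f64>> {
        self.functions.get(name).map(|f| f.evaluate(column))
    }
}

/// A user-defined function supplied by the application, not the engine.
struct CelsiusToFahrenheit;

impl ScalarFunction for CelsiusToFahrenheit {
    fn name(&self) -> &str {
        "c_to_f"
    }

    fn evaluate(&self, column: &[f64]) -> Vec<f64> {
        column.iter().map(|c| c * 9.0 / 5.0 + 32.0).collect()
    }
}

fn main() {
    let mut registry = FunctionRegistry::default();
    registry.register(Box::new(CelsiusToFahrenheit));
    let out = registry.call("c_to_f", &[0.0, 100.0]);
    println!("{out:?}");
}
```

The engine only depends on the trait, so an application can add new functions without modifying engine code. The same shape extends to custom data sources or operators.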

Dual Query Interfaces

Provides both SQL and DataFrame APIs for flexible querying, catering to different use cases from ad-hoc analysis to programmatic data processing.

Built-in Format Support

Includes native support for popular data formats such as CSV, Parquet, JSON, and Avro, reducing dependency on external libraries for common tasks.

Strong Community Backing

Backed by the Apache Software Foundation, with active development, a Discord community, and related projects such as DataFusion Python that support its continued evolution.

Cons

Rust Dependency Barrier

Requires Rust knowledge for core customization and extensions, which can be a significant hurdle for teams not already invested in the Rust ecosystem.

Limited Out-of-the-Box Features

As a foundational query engine, it lacks many features of mature databases, such as built-in security, transaction management, or GUI tools, necessitating additional development.

Complex Integration for Non-Rust Projects

While Python bindings exist, integrating DataFusion into non-Rust applications may involve performance overhead and complexity, especially for real-time or embedded use cases.

Quick Stats

Stars: 8,856
Forks: 2,155
Contributors: 0
Open Issues: 1,676
Last commit: 22 hours ago
Created: 2021

Tags

#columnar-database #apache-arrow #dataframe #datafusion #sql-query-engine #query-engine #extensible-architecture #python #analytics-engine #big-data #data-processing #rust #arrow #olap #sql

Built With

  • Apache Arrow
  • Rust

Links & Resources

Website

Included in

Rust (56.6k)
Auto-fetched 22 hours ago

Related Projects

Pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Stars: 63,065
Forks: 1,679
Last commit: 1 day ago

polars

Extremely fast Query Engine for DataFrames, written in Rust

Stars: 38,703
Forks: 2,869
Last commit: 3 days ago

CocoIndex

Incremental engine for long horizon agents

Stars: 10,215
Forks: 801
Last commit: 1 day ago

pg_analytics

Simple, Elastic-quality search for Postgres

Stars: 8,913
Forks: 395
Last commit: 23 hours ago