Open-Awesome



OctoSQL

MPL-2.0 · Go · v0.13.0

A CLI tool and dataflow engine that lets you query and join data from multiple databases and file formats using SQL.

GitHub
5.3k stars · 214 forks · 0 contributors

What is OctoSQL?

OctoSQL is a command-line tool and dataflow engine that provides a unified SQL interface for querying, joining, and transforming data from multiple databases and file formats. It solves the problem of data fragmentation by allowing users to run SQL queries across heterogeneous sources like JSON files, CSV, Parquet, and relational databases as if they were a single database.

Target Audience

Data engineers, analysts, and developers who need to query and join data across multiple formats and databases without complex ETL pipelines, especially those working with streaming data or ad-hoc data analysis.

Value Proposition

Developers choose OctoSQL for its ability to seamlessly join data across different sources using standard SQL, its extensible plugin architecture, and its built-in streaming capabilities with strong consistency guarantees, all while offering competitive performance for direct file queries.

Overview

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

Use Cases

Best For

  • Joining data from a CSV file with a PostgreSQL table using SQL
  • Querying and analyzing JSON log files directly from the command line
  • Building real-time data pipelines with streaming SQL and windowed aggregations
  • Creating a unified SQL interface for a custom application with multiple data backends
  • Performing ad-hoc data analysis across mixed file formats without loading into a database
  • Extending SQL support to new data sources via a plugin system

Not Ideal For

  • Production systems requiring sub-second query latency on terabytes of data
  • Teams needing a web-based GUI or collaborative dashboarding tools
  • Environments where downloading and managing external plugins is restricted by security policies
  • Applications heavily dependent on database-specific SQL extensions or stored procedures

Pros & Cons

Pros

Cross-Source SQL Joins

Enables JOIN operations between disparate sources like CSV files and PostgreSQL tables using standard SQL, eliminating the need for manual ETL pipelines.
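
To make the idea concrete, here is a minimal Python sketch of what a cross-source join does conceptually: rows from a CSV file are matched against rows from a relational table on a shared key. This is not OctoSQL's code or API. The sample data, the `invoices` and `customers` names, and the use of an in-memory SQLite table as a stand-in for PostgreSQL are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV source (stands in for a file such as invoices.csv).
INVOICES_CSV = """id,customer_id,amount
1,10,250.0
2,11,99.5
3,10,40.0
"""


def load_customers():
    """Build an in-memory SQLite table standing in for a PostgreSQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(10, "Ada"), (11, "Grace")]
    )
    return conn


def join_invoices_with_customers(csv_text, conn):
    """Hash join: index the database side by key, then stream CSV rows past it."""
    customers_by_id = {
        row[0]: row[1] for row in conn.execute("SELECT id, name FROM customers")
    }
    joined = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = customers_by_id.get(int(row["customer_id"]))
        if name is not None:  # inner join: skip invoices with no matching customer
            joined.append((int(row["id"]), name, float(row["amount"])))
    return joined


result = join_invoices_with_customers(INVOICES_CSV, load_customers())
```

In a tool like OctoSQL the same intent is expressed as a single SQL JOIN, and the engine decides how to fetch and match rows from each source.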

Streaming Dataflow Engine

Handles infinite streams with event-time processing, watermarks, and internally consistent outputs, making it suitable for real-time aggregations and windowed queries.
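
The following Python sketch illustrates the general technique of event-time windowing with a watermark. It is a simplified illustration and does not reflect OctoSQL's internals. The function name, the fixed tumbling windows, and the watermark heuristic (largest event time seen minus an allowed lateness) are assumptions chosen for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, max_lateness):
    """Count events per event-time window.

    events: iterable of (event_time, key) pairs in arrival order, which may
    differ from event-time order.
    A window is emitted once the watermark passes its end. The watermark here
    is an assumed heuristic: max event time seen so far minus max_lateness.
    """
    open_windows = defaultdict(int)  # window_start -> count
    emitted = []
    watermark = float("-inf")
    for event_time, _key in events:
        window_start = (event_time // window_size) * window_size
        if window_start + window_size <= watermark:
            continue  # window already finalized: drop the late event
        open_windows[window_start] += 1
        watermark = max(watermark, event_time - max_lateness)
        # Finalize every window whose end is at or before the watermark.
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                emitted.append((start, open_windows.pop(start)))
    # End of a finite stream: flush whatever is still open.
    for start in sorted(open_windows):
        emitted.append((start, open_windows[start]))
    return emitted


events = [(1, "a"), (3, "a"), (12, "a"), (8, "a"), (25, "a"), (9, "a")]
windows = tumbling_window_counts(events, window_size=10, max_lateness=5)
```

The event at time 8 arrives out of order but is still counted, because the watermark has not yet passed the end of its window. The event at time 9 arrives after that window was emitted and is dropped.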

Extensible Plugin System

Allows adding support for new databases (e.g., PostgreSQL, MySQL) via installable plugins, with a SQL interface for browsing and managing plugins.

Advanced Type System

Features union types, type assertions, and conversion functions (e.g., int(text)), providing robustness for heterogeneous and messy data schemas.
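
As a rough illustration of why union types and conversion functions help with messy data, the Python sketch below models a column whose values may be integers, text, or null. The helper names are hypothetical, and returning `None` when a conversion fails is an assumption made for this example, not behavior taken from OctoSQL's documentation.

```python
def to_int(value):
    """Convert a value to int in the spirit of int(text).

    Assumption for this sketch: unparseable or missing values become None.
    """
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            return None
    return None


def keep_type(values, expected_type):
    """Type-assertion style filter: keep only values of one member type."""
    return [v for v in values if isinstance(v, expected_type)]


# A column with a union type of int, text, and null.
column = [42, "17", " 8 ", "n/a", None]

converted = [to_int(v) for v in column]
only_ints = keep_type(column, int)
only_text = keep_type(column, str)
```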

Query Optimization Transparency

Offers visual query plans with predicate pushdown and join strategy selection (Stream Join, Lookup Join), helping users understand and tune performance.
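
Predicate pushdown can be sketched in a few lines of Python. The `Source` class below is hypothetical and only counts how many rows leave the data source; it is meant to show the effect of the optimization in general, not how OctoSQL implements it.

```python
class Source:
    """Hypothetical data source that counts the rows it hands to the engine."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_sent = 0

    def scan(self, predicate=None):
        """Yield rows; a pushed-down predicate filters them inside the source."""
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_sent += 1
                yield row


def is_large(row):
    return row["amount"] > 900


ROWS = [{"id": i, "amount": i * 10} for i in range(1, 101)]

# Without pushdown: the source sends every row and the engine filters afterwards.
naive = Source(ROWS)
naive_result = [row for row in naive.scan() if is_large(row)]

# With pushdown: the filter runs inside the source, so fewer rows are transferred.
pushed = Source(ROWS)
pushed_result = list(pushed.scan(predicate=is_large))
```

Both variants return the same rows, but the pushed-down scan transfers only the matching ones, which matters most when the source is a remote database.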

Cons

Plugin Management Overhead

Requires manual plugin installation and YAML configuration for databases, adding complexity compared to tools with built-in connectors.

Performance Trade-offs

Benchmarks show it is slower than DataFusion on CSV queries and depends on caching to reach competitive speeds, which points to limited raw throughput.

CLI-Only Interface

Lacks a graphical user interface or web-based IDE, making it less accessible for non-technical users or collaborative workflows.

Immature Ecosystem

The plugin repository is limited compared to established frameworks like Apache Spark, and external contributions to core code are not accepted.

Quick Stats

Stars: 5,255
Forks: 214
Contributors: 0
Open Issues: 42
Last commit: 1 year ago
Created: 2019

Tags

#stream-processing #plugin-system #redis #data-integration #sql-query-engine #query-engine #cli-tool #nosql #database-connector #dataflow #postgresql #cli #mysql #json #data-analysis #go #query #sql

Built With

Go

Included in

Database Tools · 5.1k
Auto-fetched 1 day ago

Related Projects

osquery

SQL powered operating system instrumentation, monitoring, and analytics.

Stars: 23,229
Forks: 2,566
Last commit: 2 days ago

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: 12,745
Forks: 3,577
Last commit: 1 day ago

Steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.

Stars: 7,792
Forks: 333
Last commit: 1 day ago

CloudQuery

Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.

Stars: 6,380
Forks: 546
Last commit: 1 day ago