A portable Python dataframe library that compiles to SQL and works with over 20 backends for unified data manipulation.
Ibis is a portable Python dataframe library that provides a unified API for data manipulation across more than 20 different backends including SQL databases, data warehouses, and dataframe engines. It solves the problem of Python dataframe libraries being tightly coupled to specific execution engines by compiling expressions into the backend's native language, typically SQL.
Ibis is aimed at data engineers, data scientists, and analysts who work with multiple data systems and want a consistent Python interface for data manipulation across different backends.
Developers choose Ibis because it lets them write backend-agnostic data manipulation code that can run on any supported backend, combining Python's flexibility with SQL's performance and easing the move from local development to production deployment.
Ibis provides a consistent Python dataframe API across more than 20 backends, including DuckDB and BigQuery, so the same code can move between backends with minimal changes, as the project's portability section highlights.
Ibis automatically translates Python operations into optimized SQL for most backends, so the work runs in the backend engine while the code stays in Python, as the project's SQL generation examples show.
With interactive mode, users get immediate feedback on data manipulations, which supports iterative exploration, as the project's penguins dataset example demonstrates.
Expressions are lazily evaluated: computation is deferred until a result is actually requested, which can improve efficiency in complex pipelines. The project lists this among its key features.
Not all backends support every Ibis feature uniformly; some operations may have limited support or behave differently, requiring backend-specific tuning and knowledge.
Installing and configuring Ibis with multiple backends involves managing dependencies and connections, which is more complex than using a single-purpose library like pandas.
The SQL compilation layer can introduce performance overhead compared to writing native SQL or using optimized engines directly, especially for simple queries.