A Python library for agile data preparation workflows that works with Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark.
Optimus is a Python library for agile data preparation. It lets users load, process, and plot data and build machine learning models on several backends: Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark. By offering one unified API, it removes the need for engine-specific code, so the same data tasks stay consistent and scale across computing environments.
Optimus is aimed at data scientists, data engineers, and analysts who need to clean, transform, and analyze data across multiple data processing frameworks without rewriting code.
Developers choose Optimus for its opinionated, easy-to-use API that reduces the learning curve and allows seamless switching between local and distributed computing engines, enhancing productivity and workflow agility.
Lets you write code once and run it on Pandas, Dask, cuDF, Dask-cuDF, Vaex, or PySpark, scaling from a local laptop to a remote GPU cluster without rewriting it.
Includes over 100 functions for string manipulation, date processing, and data cleaning, reducing the need for custom code and speeding up common tasks.
Provides out-of-the-box profiling and cleaning functions, making data exploration and issue resolution more efficient for large datasets.
Supports loading from and saving to various formats (CSV, JSON, Parquet) and databases (Oracle, MySQL), simplifying data pipeline integration.
Each backend is installed through its own pip extra (e.g., pip install pyoptimus[dask]), which complicates dependency management and can lead to installation issues.
Compatible only with Python 3.7 and 3.8, which restricts adoption by teams on Python 3.9 or later.
The unified API may not expose all advanced features of underlying engines, potentially leading to performance trade-offs or missing engine-specific optimizations.
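The unified-API idea, and the trade-off noted in the last point, can be illustrated with a minimal sketch. This is not Optimus code and does not reflect its internals: the Frame, RowBackend, and ColumnBackend names are hypothetical, and the two toy backends stand in for real engines such as Pandas or Dask. It only shows how one facade can route the same calls to different storage layouts, and why the facade exposes only what every backend supports.

```python
from typing import Callable, Dict, List

Table = Dict[str, List[str]]


class RowBackend:
    """Hypothetical backend that stores a table as a list of row dicts."""

    def __init__(self, columns: Table) -> None:
        self.names = list(columns)
        length = len(columns[self.names[0]]) if self.names else 0
        self.rows = [
            {name: columns[name][i] for name in self.names}
            for i in range(length)
        ]

    def map_column(self, name: str, func: Callable[[str], str]) -> None:
        for row in self.rows:
            row[name] = func(row[name])

    def to_dict(self) -> Table:
        return {name: [row[name] for row in self.rows] for name in self.names}


class ColumnBackend:
    """Hypothetical backend that stores a table as a dict of column lists."""

    def __init__(self, columns: Table) -> None:
        self.columns = {name: list(values) for name, values in columns.items()}

    def map_column(self, name: str, func: Callable[[str], str]) -> None:
        self.columns[name] = [func(value) for value in self.columns[name]]

    def to_dict(self) -> Table:
        return {name: list(values) for name, values in self.columns.items()}


# Registry of engines, selected by name when a Frame is created.
BACKENDS = {"rows": RowBackend, "columns": ColumnBackend}


class Frame:
    """Engine-agnostic facade: user code calls only these methods.

    Anything a backend can do beyond map_column is not reachable from
    here, which is the lowest-common-denominator trade-off.
    """

    def __init__(self, engine: str, data: Table) -> None:
        self._backend = BACKENDS[engine](data)

    def upper(self, name: str) -> "Frame":
        self._backend.map_column(name, str.upper)
        return self

    def strip(self, name: str) -> "Frame":
        self._backend.map_column(name, str.strip)
        return self

    def to_dict(self) -> Table:
        return self._backend.to_dict()


def clean(engine: str) -> Table:
    """Run the same cleaning pipeline on whichever engine is requested."""
    data = {"name": ["  ada ", "grace"], "city": ["london", " nyc"]}
    return Frame(engine, data).strip("name").upper("name").to_dict()
```

Calling clean("rows") and clean("columns") runs identical pipeline code against both layouts; only the engine name changes. The cost is visible in Frame itself: it can offer only operations that every registered backend implements.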