A Python library that automates the tedious parts of exploratory data analysis: cleaning, feature engineering, visualization, and versioning.
Dora is a Python library that automates the painful parts of exploratory data analysis. It provides convenience functions for data cleaning, feature selection and extraction, visualization, model validation partitioning, and data versioning. It is designed to work alongside common Python data tools like pandas, scikit-learn, and matplotlib.
Data scientists, analysts, and machine learning engineers who regularly perform exploratory data analysis in Python and want to reduce repetitive coding tasks.
Developers choose Dora because it consolidates multiple EDA steps into a single, cohesive toolkit, saving time and ensuring reproducibility through built-in data versioning and transformation logging.
Tools for exploratory data analysis in Python
Provides convenience functions such as impute_missing_values() for mean imputation of missing values and scale_input_values() for standardization, reducing pandas boilerplate, as shown in the README's Cleaning section.
Offers methods for feature removal, one-hot encoding via extract_ordinal_feature(), and custom transformations with extract_feature(), streamlining EDA workflows in the Feature Selection & Extraction examples.
Allows saving snapshots with snapshot() and reverting with use_snapshot(), while logging all transformations to ensure reproducibility, demonstrated in the Data Versioning section.
Designed to work alongside pandas, scikit-learn, and matplotlib, making it easy to incorporate into existing Python data science pipelines, as stated in the Summary.
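To make the cleaning, feature-extraction and versioning features above concrete, here is a rough plain-pandas approximation of what they save you from writing. This is a minimal sketch under stated assumptions, not Dora's implementation or API. `PandasEDA` is a hypothetical class invented for this illustration, and its method names only mirror the Dora functions mentioned above. It assumes mean imputation and z-score standardization of the numeric input columns, with the output column excluded. The toy dataset is made up.

```python
from typing import Callable

import pandas as pd


class PandasEDA:
    """Hypothetical plain-pandas stand-in for the Dora steps listed above.

    Method names mirror Dora's README functions for readability only;
    the behavior here is an assumption, not Dora's actual code.
    """

    def __init__(self, data: pd.DataFrame, output: str):
        self.data = data.copy()
        self.output = output  # target column, excluded from cleaning
        self.logs: list[str] = []  # record of every transformation
        self._snapshots: dict[str, pd.DataFrame] = {}

    def _inputs(self) -> list[str]:
        # Numeric input columns only; the output column is left untouched.
        numeric = self.data.select_dtypes(include="number").columns
        return [c for c in numeric if c != self.output]

    def remove_feature(self, name: str) -> None:
        self.data = self.data.drop(columns=[name])
        self.logs.append(f"remove_feature({name!r})")

    def impute_missing_values(self) -> None:
        # Mean imputation: fill each numeric input's NaNs with its column mean.
        cols = self._inputs()
        self.data[cols] = self.data[cols].fillna(self.data[cols].mean())
        self.logs.append("impute_missing_values()")

    def scale_input_values(self) -> None:
        # Standardization: subtract the mean, divide by the sample std (ddof=1).
        cols = self._inputs()
        self.data[cols] = (self.data[cols] - self.data[cols].mean()) / self.data[cols].std()
        self.logs.append("scale_input_values()")

    def extract_ordinal_feature(self, name: str) -> None:
        # One-hot encode a categorical column into one 0/1 column per value.
        dummies = pd.get_dummies(self.data[name], prefix=name, dtype=int)
        self.data = pd.concat([self.data.drop(columns=[name]), dummies], axis=1)
        self.logs.append(f"extract_ordinal_feature({name!r})")

    def extract_feature(self, old: str, new: str, fn: Callable) -> None:
        # Derive a new column by applying fn element-wise to an existing one.
        self.data[new] = self.data[old].map(fn)
        self.logs.append(f"extract_feature({old!r}, {new!r}, <fn>)")

    def snapshot(self, name: str) -> None:
        # Store a copy so later in-place edits cannot alter the snapshot.
        self._snapshots[name] = self.data.copy()

    def use_snapshot(self, name: str) -> None:
        self.data = self._snapshots[name].copy()
        self.logs.append(f"use_snapshot({name!r})")


# Made-up toy data: "A" is the output column, the rest are inputs.
df = pd.DataFrame({
    "A": [1, 0, 1, 0],
    "age": [20.0, None, 40.0, 30.0],
    "favorite_color": ["red", "blue", "red", "green"],
    "useless_feature": [7, 7, 7, 7],
})

eda = PandasEDA(df, output="A")
eda.snapshot("initial_data")
eda.remove_feature("useless_feature")
eda.impute_missing_values()
eda.scale_input_values()
eda.extract_ordinal_feature("favorite_color")
eda.extract_feature("age", "age_squared", lambda x: x ** 2)
print(eda.logs)
eda.use_snapshot("initial_data")  # revert to the untouched data
```

The design choice worth noting is that every step appends to `logs` and snapshots are stored as copies. That pairing is what makes a sequence of transformations replayable and reversible, which is the reproducibility property the Dora entry highlights.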
The README directs users to install the latest code from GitHub rather than from PyPI, which complicates dependency management and forgoes the stability of versioned package releases.
Only supports basic plotting with plot_feature() and explore(), lacking interactive or advanced graphing options that might be needed for in-depth analysis, as seen in the Visualization section.
Beyond a few code snippets in the README, documentation is minimal: there is no API reference and there are no tutorials, which can hinder adoption and troubleshooting for new users.