An Automated Machine Learning Python package for tabular data with feature engineering, hyperparameter tuning, explanations, and automatic documentation.
MLJAR AutoML is a Python package that automates the machine learning workflow for tabular data. It handles everything from data preprocessing and feature engineering to model training, hyperparameter tuning, and generating detailed documentation. It solves the problem of time-consuming manual ML pipeline construction by providing a streamlined, automated approach with built-in explainability.
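The workflow described above can be sketched in a few lines. The dataset path, target column name, and time limit below are placeholder assumptions; the `AutoML` constructor and `fit`/`predict` calls follow the documented mljar-supervised API.

```python
# Minimal sketch of an mljar-supervised run (PyPI package: mljar-supervised).
# "train.csv" and the "target" column are placeholders, not real files.
automl_params = {
    "mode": "Explain",                 # or "Perform", "Compete", "Optuna"
    "results_path": "automl_results",  # models and reports are saved here
    "total_time_limit": 600,           # overall time budget in seconds
}

try:
    import pandas as pd
    from supervised.automl import AutoML

    df = pd.read_csv("train.csv")
    X, y = df.drop(columns=["target"]), df["target"]

    automl = AutoML(**automl_params)
    automl.fit(X, y)                   # preprocessing, feature engineering, tuning
    predictions = automl.predict(X)
except (ImportError, FileNotFoundError):
    # mljar-supervised/pandas not installed or dataset missing;
    # automl_params above still documents the key knobs.
    pass
```

Everything produced during the run (trained models, leaderboard, Markdown reports) lands under `results_path`.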
Data scientists, ML engineers, and analysts working with structured data who want to accelerate model development, ensure reproducibility, and gain insights into model behavior without sacrificing transparency.
Developers choose MLJAR AutoML for its balance of automation and transparency: it offers multiple tailored modes, extensive model explanations, and automatic report generation, while supporting a wide range of algorithms and fairness-aware training.
Handles the end-to-end workflow, from data preprocessing and feature engineering through algorithm selection and hyperparameter tuning, in a single automated pipeline.
Offers Explain, Perform, Compete, and Optuna modes tailored to different use cases, from quick data exploration to competition-level tuning, each with an adapted validation strategy.
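An approximate summary of how the four modes differ is sketched below; the validation details are paraphrased from the project documentation and may vary by version, and the `pick_mode` helper is a hypothetical illustration, not part of the library.

```python
# Approximate mode characteristics (hedged; check the mljar-supervised
# docs for exact, version-specific defaults).
MODES = {
    "Explain": {"goal": "initial data exploration",
                "validation": "single train/test split",
                "explanations": "full (SHAP, importance, tree plots)"},
    "Perform": {"goal": "production-ready models",
                "validation": "5-fold cross-validation",
                "explanations": "selected"},
    "Compete": {"goal": "best possible accuracy",
                "validation": "10-fold CV with ensembling and stacking",
                "explanations": "minimal"},
    "Optuna":  {"goal": "heavy hyperparameter search",
                "validation": "cross-validation with Optuna-driven tuning",
                "explanations": "minimal"},
}

def pick_mode(time_budget_s: int, need_explanations: bool) -> str:
    """Toy helper (not a library function) choosing a mode by constraints."""
    if need_explanations:
        return "Explain"
    return "Compete" if time_budget_s >= 3600 else "Perform"
```

The mode name is simply passed as the `mode` argument to the `AutoML` constructor.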
Generates detailed Markdown reports with SHAP plots, decision tree visualizations, and feature importance, enhancing interpretability for each model trained.
Supports bias mitigation techniques such as sample weighting and smart grid search over sensitive features, enabling more ethical ML practice in regulated or high-stakes applications.
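To make the sample-weighting idea concrete, here is a stdlib-only sketch of the classic Kamiran-Calders reweighing scheme, where each example is weighted so that sensitive group and label become statistically independent. This illustrates the general technique, not the library's internal implementation.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Kamiran-Calders reweighing: w(g, y) = P(g) * P(y) / P(g, y).
    Underrepresented (group, label) pairs get weights above 1.
    Illustrative only; not mljar-supervised's internal code."""
    n = len(labels)
    p_g = Counter(groups)                # counts per sensitive group
    p_y = Counter(labels)                # counts per label
    p_gy = Counter(zip(groups, labels))  # joint counts
    return [
        (p_g[g] / n) * (p_y[y] / n) / (p_gy[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]

# Toy data: group "a" rarely has label 0, so that pair is upweighted.
groups = ["a", "a", "a", "b"]
labels = [1, 1, 0, 0]
weights = reweighing_weights(groups, labels)  # [0.75, 0.75, 1.5, 0.5]
```

The resulting weights can be fed to any learner that accepts per-sample weights during training.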
Automatically saves all models and detailed reports in the results_path directory, leading to significant disk usage, especially for large datasets or long runs.
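Because every model and report accumulates under the results directory, it is worth monitoring its size during long runs. A small stdlib helper like the following can do that; the directory name is an assumption matching the `results_path` convention above.

```python
from pathlib import Path

def dir_size_mb(path: str) -> float:
    """Total size of all files under `path`, in MiB. Useful for keeping
    an eye on an AutoML results directory (name is an assumption)."""
    root = Path(path)
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file()) / 2**20

# Example: dir_size_mb("automl_results")
```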
Primarily designed for structured data, making it unsuitable for tasks involving images, audio, or unstructured text without manual preprocessing.
Requires Python >=3.9 and a compatible NumPy version, which can cause integration issues in environments with older or restricted Python setups.
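A quick interpreter check before installing can save a confusing failure later; the 3.9 floor comes from the requirement stated above, and the helper below is a hypothetical convenience, not part of the package.

```python
import sys

def python_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter meets the package's stated Python 3.9 floor."""
    return tuple(version_info[:2]) >= (3, 9)

# Example: bail out early with a clear message.
if not python_ok():
    sys.exit("mljar-supervised requires Python 3.9 or newer")
```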