A Python library for automated exploratory data analysis (EDA) with high-density visualizations and target analysis in two lines of code.
Sweetviz is a Python library that automates exploratory data analysis (EDA) by generating detailed, interactive HTML reports from datasets with just two lines of code. It visualizes target characteristics, compares datasets (like training vs. test data), and analyzes feature associations to help data scientists quickly understand their data. The tool is designed to save time during the initial data exploration phase of machine learning projects.
Sweetviz is aimed at data scientists, machine learning engineers, and analysts who need to explore and understand datasets quickly, especially when preparing data for modeling or comparing subsets such as training and test sets.
Developers choose Sweetviz for its ability to produce comprehensive, publication-ready EDA reports with minimal code, unifying numerical, categorical, and mixed-type associations in a single visualization. Its ease of use and integration with Jupyter notebooks make it a time-saving alternative to manual plotting and analysis.
Visualize and compare datasets, target values and associations, with one line of code.
Creates a comprehensive EDA report with just two lines of code: in the basic usage, analyze() builds the report and show_html() writes it out as a full, self-contained HTML application.
Unifies numerical (Pearson's correlation), categorical (uncertainty coefficient), and categorical-numerical (correlation ratio) associations in a single graph, providing insights for all data types without manual integration.
Designed specifically for analyzing target variables and comparing datasets, with dedicated functions: compare() for two separate datasets such as training vs. test data, and compare_intra() for two subpopulations of the same dataset.
Supports Jupyter and Colab notebooks via show_notebook(), which embeds the report inline and accepts notebook-specific parameters for width, height, and scaling.
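The basic usage and comparison helpers described above can be sketched as follows. This is a minimal illustration, assuming Sweetviz is installed (`pip install sweetviz`); the toy dataset, its column names, the output file names, and the helper names `make_toy_frame` and `build_reports` are invented for the example and are not part of the library.

```python
import numpy as np
import pandas as pd


def make_toy_frame(n_rows: int = 200, seed: int = 0) -> pd.DataFrame:
    """Build a small, made-up dataset (hypothetical columns) with a boolean target."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "age": rng.integers(1, 80, size=n_rows),
        "fare": rng.uniform(5.0, 100.0, size=n_rows).round(2),
        "sex": rng.choice(["male", "female"], size=n_rows),
        "survived": rng.random(n_rows) < 0.4,  # boolean target column
    })


def build_reports(df: pd.DataFrame, target: str = "survived") -> None:
    """Write three HTML reports: single dataset, train vs. test, and an intra-set split."""
    # Imported lazily so the data helper above works without Sweetviz installed.
    import sweetviz as sv

    # Basic usage: analyze one DataFrame, then write a self-contained HTML report.
    report = sv.analyze(df, target_feat=target)
    report.show_html("report.html", open_browser=False)

    # compare(): two separate datasets, each passed as [DataFrame, display name].
    half = len(df) // 2
    train, test = df.iloc[:half], df.iloc[half:]
    comparison = sv.compare([train, "Train"], [test, "Test"], target_feat=target)
    comparison.show_html("train_vs_test.html", open_browser=False)

    # compare_intra(): one dataset split into two subpopulations by a boolean condition.
    intra = sv.compare_intra(df, df["sex"] == "male", ["Male", "Female"], target_feat=target)
    intra.show_html("male_vs_female.html", open_browser=False)
```

Calling `build_reports(make_toy_frame())` should write the three HTML files to the working directory. Inside a Jupyter or Colab notebook, the `show_html(...)` calls could be swapped for something like `report.show_notebook(w="100%", h=700, scale=0.8)`; the specific values here are arbitrary.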
The README documents frequent installation problems like ModuleNotFoundError, requiring troubleshooting steps such as uninstalling/reinstalling and checking for script naming conflicts, which can delay setup.
Target analysis currently supports only boolean and numerical features, so categorical target variables, which are common in classification tasks, cannot be analyzed directly; the README notes this limitation in the description of the analyze() target parameter.
Pairwise association analysis runs in quadratic time (n^2) in the number of features. Once a dataset exceeds the library's automatic feature-count threshold, the pairwise_analysis parameter must be set explicitly, which makes the tool slow and cumbersome for datasets with many features.
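To illustrate why the cost grows so quickly, the sketch below counts the unordered feature pairs an association analysis has to cover, and shows how the pairwise step might be skipped through the `pairwise_analysis` parameter of `analyze()`. The helper names are hypothetical, and the pair count is only a rough proxy for runtime, not a benchmark of Sweetviz itself.

```python
from math import comb


def feature_pair_count(n_features: int) -> int:
    """Number of unordered feature pairs, n * (n - 1) / 2, which grows as O(n^2)."""
    return comb(n_features, 2)


def analyze_without_associations(df):
    """Build a report for a wide pandas DataFrame while skipping pairwise associations."""
    # Imported lazily so the counting helper works without Sweetviz installed.
    import sweetviz as sv

    # pairwise_analysis defaults to "auto"; "on" or "off" makes the choice explicit.
    return sv.analyze(df, pairwise_analysis="off")


# Show how the number of pairs grows with the number of features.
for n in (10, 100, 1000):
    print(f"{n:>5} features -> {feature_pair_count(n):>7} pairs")
```

Going from 100 to 1000 features multiplies the number of pairs by roughly 100, which is why an explicit choice between `"on"` and `"off"` is worth making for wide datasets.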