An R package that automates exploratory data analysis and data treatment with one-line reports and visualizations.
DataExplorer is an R package that automates exploratory data analysis (EDA) and data treatment tasks. It provides a suite of functions to quickly profile datasets, generate comprehensive visualizations, and perform basic feature engineering, all designed to accelerate the initial phase of data analysis.
Data scientists, statisticians, and analysts working in R who need to efficiently explore and understand new datasets before modeling or further analysis.
It reduces the time and code required for initial data exploration by offering one-line reporting and a consistent API for common EDA tasks, which makes analyses easier to reproduce and leaves more time for interpreting results.
Automate Data Exploration and Treatment
The create_report() function generates an HTML report covering data structure, missing values, and correlations with a single call, as shown in the airquality and diamonds examples.
Passing plotly = TRUE adds hover, zoom, and pan to visualizations without extra code, as demonstrated in the interactive report feature.
Functions like introduce() and plot_intro() summarize dimensions, column types, and missingness, as seen in the output tables and plots for datasets like airquality.
Offers a wide range of static and interactive plots, from histograms to correlation heatmaps, through a consistent API, such as plot_bar() for frequency distributions and plot_correlation() for relationships.
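As a rough illustration of the functions named above, here is a minimal sketch using the built-in airquality dataset. It assumes DataExplorer is installed along with its dependencies (ggplot2, rmarkdown). The output file name and the pandoc guard are choices made for this example and do not come from the package documentation.

```r
library(DataExplorer)

# Built-in dataset referenced in the examples above
data(airquality)

# Basic profile: row and column counts, column types, missing values
overview <- introduce(airquality)
print(overview)

# The same profile as a plot, plus the share of missing values per column
plot_intro(airquality)
plot_missing(airquality)

# Histograms for continuous columns and a correlation heatmap
plot_histogram(airquality)
plot_correlation(na.omit(airquality))

# Bar charts need discrete columns, which airquality lacks,
# so the diamonds dataset from ggplot2 is used here instead
plot_bar(ggplot2::diamonds)

# Full HTML report, written to a temporary directory.
# Rendering needs pandoc, so the call is skipped when it is unavailable.
if (rmarkdown::pandoc_available()) {
  create_report(airquality, output_file = "report.html", output_dir = tempdir())
}
```

Each plotting function takes the data frame as its first argument, so the calls can be swapped in and out without reshaping the data first.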
The README documents known limitations with plotly conversion: geom_label and facet_wrap may not render correctly, which can degrade interactive plots for complex visualizations.
As an R package, it is not directly usable from Python or other tools, which limits adoption and integration in mixed-language teams.
While it handles grouping and dummifying, its transformations are elementary compared to dedicated wrangling libraries, and it lacks support for advanced operations like time-series feature extraction or ML-specific preprocessing.
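To show the scope of the built-in treatment functions, the sketch below groups sparse categories and then one-hot encodes the result. The toy data frame, its column names, and the 0.25 threshold are hypothetical values chosen for illustration.

```r
library(DataExplorer)

# Hypothetical toy data: one discrete column with a long tail of rare values
df <- data.frame(
  city = c(rep("A", 50), rep("B", 30), rep("C", 12), rep("D", 5), rep("E", 3)),
  value = seq_len(100),
  stringsAsFactors = FALSE
)

# Preview which categories would be kept at this threshold
group_category(df, feature = "city", threshold = 0.25)

# Apply the grouping; the least frequent categories are collapsed together
grouped <- group_category(df, feature = "city", threshold = 0.25, update = TRUE)

# One-hot encode the remaining discrete columns
encoded <- dummify(grouped)
str(encoded)
```

Anything beyond this, such as lag features or scaling pipelines, would need a separate wrangling or preprocessing package.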