A lightweight Python tool for generating rich summary statistics of pandas and Polars dataframes directly in the console.
Skimpy is a Python library that provides detailed, formatted summary statistics for pandas and Polars dataframes directly within the console or interactive Python environment. It enhances exploratory data analysis by offering a more comprehensive and visually appealing alternative to basic methods like `df.describe()`, helping data scientists quickly understand their data's structure and characteristics.
Skimpy is aimed at data scientists and analysts who work with pandas or Polars dataframes and need quick, detailed summaries during exploratory data analysis in interactive environments such as Jupyter notebooks or the Python console.
Developers choose Skimpy because it prints color-coded summaries tailored to each data type without leaving the console, giving a quicker and more informative first look at a dataset than `df.describe()`.
skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.
Leverages the Rich library to display color-coded, formatted tables in the console, with separate sections for numeric, categorical, datetime, and other data types, which improves readability during exploratory analysis.
Works with both pandas and Polars dataframes, adapting summaries to each library's data types, so the same workflow carries across both popular Python data tools.
Provides tailored metrics per column type, including missing values, percentiles, histograms, and uniqueness for numerics, categories, strings, and datetimes, going far beyond basic df.describe().
Offers a simple skim() function and a generate_test_data() utility for immediate experimentation, minimizing setup time in interactive environments like Jupyter notebooks.
The README explicitly recommends setting data types manually for 'richer statistical summaries,' indicating that automatic detection can be inaccurate or less detailed, potentially leading to suboptimal outputs.
Designed solely for interactive console use; it lacks built-in export options to files or integration with web-based dashboards, making it unsuitable for automated reporting workflows.
Focuses on predefined summary formats without obvious configuration for adjusting metrics, thresholds, or output styling, which may restrict advanced users who need tailored summaries.
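The recommendation to set data types manually can be illustrated with plain pandas. The sketch below is an assumption about what such preparation might look like, not a procedure taken from the README: the column names and values are made up, and the final `skim()` call is wrapped in a guard so the type conversion can be tried even where skimpy is not installed.

```python
import pandas as pd

# Hypothetical raw data in which several columns arrive as strings or integers.
raw = pd.DataFrame(
    {
        "city": ["Leeds", "York", "Leeds", "Bath"],
        "joined": ["2021-01-05", "2021-03-17", "2022-07-01", "2023-11-30"],
        "score": ["3.5", "4.0", "2.75", "5.0"],
        "active": [1, 0, 1, 1],
    }
)

# Cast each column to the type that best describes it before summarising.
typed = raw.assign(
    city=raw["city"].astype("category"),
    joined=pd.to_datetime(raw["joined"]),
    score=raw["score"].astype("float64"),
    active=raw["active"].astype("bool"),
)

try:
    from skimpy import skim

    skim(typed)  # summary now reflects the explicit column types
except ImportError:
    print(typed.dtypes)  # fallback: show the resulting types only
```

With explicit types in place, each column should land in the summary section that matches it (numeric, categorical, datetime, and so on) rather than being treated as generic text.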