A powerful Python library for data analysis and manipulation, providing fast, flexible data structures.
A powerful Python library for data manipulation and analysis, providing fast, flexible data structures.
An open-source data-centric AI library for automatically detecting and fixing data quality issues in machine learning datasets.
A fuzzy string matching library for Python that calculates similarity between strings using Levenshtein distance.
A Python library using machine learning for accurate and scalable fuzzy matching, record deduplication, and entity resolution on structured data.
A sample MySQL database with an integrated test suite for testing applications and database servers.
A flexible and expressive API for performing statistical data validation on dataframe-like objects.
A Python library for visualizing missing data in pandas DataFrames using matrix, bar, heatmap, and dendrogram plots.
A Python library that fixes mojibake and other Unicode text glitches by detecting and correcting encoding mix-ups.
An open-source Python library for low-code data preparation, offering fast EDA, data cleaning, and collection from APIs and databases.
A Python library for approximate and phonetic string matching, implementing algorithms like Levenshtein distance and Soundex.
A Python library for agile data preparation workflows that works with Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark.
A Python library providing clean, chainable functions for data cleaning and manipulation with pandas DataFrames.
An R package for reshaping and tidying data into a consistent format for easier analysis.
A cleaned and normalized time series dataset of global COVID-19 confirmed cases, deaths, and recoveries, updated daily.
A PHP library that sanitizes user input to prevent Cross-Site Scripting (XSS) attacks.
An R package for joining data frames on inexact matching using string distance, regex, numeric tolerance, and other fuzzy criteria.
A cohesive set of functions for string manipulation in R, built on stringi with a consistent, user-friendly design.
A Python library that automates the tedious parts of exploratory data analysis with cleaning, feature engineering, visualization, and versioning.
A VS Code extension for visually exploring, cleaning, and transforming tabular data with automatic Pandas code generation.
A tool that automatically builds high-performance, interpretable machine learning models with minimal features using a single line of code.
A JavaScript library for sanitizing and validating objects with synchronous and asynchronous support.
A high-performance data profiler for discovering and validating complex patterns in datasets, enabling data cleaning and quality analysis.
A high-performance data profiler for discovering and validating complex patterns like functional dependencies, inclusion dependencies, and association rules.
A systematic R package for parsing strings and converting them to snake_case, camelCase, and other naming conventions.
A command-line tool for validating, cleaning, and minimizing GTFS transit feed files while preserving semantic equivalence.
A Python tool for fixing invalid GeoJSON objects and files, usable from the command line or as a library.
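Two of the fuzzy matching entries above score string similarity with Levenshtein distance: the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into another. The sketch below is a minimal pure-Python version of the standard dynamic-programming recurrence, not the API of any listed library. The `similarity_ratio` helper is a hypothetical 0 to 100 normalization added for illustration; the listed libraries may compute their reported scores differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    # Keep only the previous row of the dynamic-programming table:
    # prev[j] is the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Hypothetical normalization: 100 * (1 - distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

For example, "kitten" becomes "sitting" with two substitutions (k→s, e→i) and one insertion (g), so the distance is 3. Keeping a single previous row makes memory proportional to the length of `b` rather than to the product of both lengths.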
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.