Generate comprehensive data quality profiling and exploratory data analysis reports for Pandas and Spark DataFrames with a single line of code.
A portable Python dataframe library that compiles to SQL and works with over 20 backends for unified data manipulation.
An open-source library for building massively scalable machine learning pipelines on Apache Spark.
A flexible and expressive API for performing statistical data validation on dataframe-like objects.
A state-of-the-art Natural Language Processing library built on Apache Spark, offering 100,000+ pretrained models and pipelines in 200+ languages.
A curated list of awesome Apache Spark packages, libraries, and resources for data engineers and scientists.
A Python library for agile data preparation workflows that works with Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark.