A Python package for generating synthetic tabular and time-series data using state-of-the-art generative models like GANs and Gaussian Mixtures.
YData Synthetic is a Python library that generates artificial tabular and time-series data using generative models such as GANs and Gaussian Mixtures. It addresses data scarcity, privacy concerns, and dataset bias by producing synthetic datasets that are statistically similar to the originals, so they can be used for machine learning development and data sharing without exposing sensitive real records.
It is aimed at data scientists, machine learning engineers, and researchers who need synthetic data for privacy compliance, dataset balancing, augmentation, or model training when the real data is too sensitive to use or share directly.
Developers choose YData Synthetic for its broad set of generative models, including specialized architectures for tabular and time-series data; its Streamlit UI for low-code workflows; and its Gaussian Mixture model, which generates data quickly without requiring a GPU.
Synthetic data generators for tabular and time-series data
The Streamlit app provides a guided workflow for training and sampling, making synthetic data generation accessible with little or no coding.
Includes multiple GAN architectures, such as CTGAN, WGAN, and DRAGAN, with specialized models for both tabular and time-series data, offering flexibility across use cases.
The fast Gaussian Mixture Model generates synthetic data quickly without GPU resources, which makes it a practical starting point in environments with limited hardware.
CTGAN is designed specifically for synthetic tabular data and supports conditional generation, addressing common challenges in tabular synthesis.
Supports models such as TimeGAN and DoppelGANger for generating synthetic sequential data, extending its utility beyond tabular datasets; examples are provided for stock and FCC MBA datasets.
Focuses solely on tabular and sequential data, lacking built-in support for other common AI data types like images or text, which restricts its applicability in broader machine learning projects.
The Streamlit app requires separate installation with pip install ydata-synthetic[streamlit] and does not support Jupyter Notebooks, adding extra steps and potential compatibility issues for users.
Advanced GAN models like CTGAN and TimeGAN demand significant computational resources and training time, which may be prohibitive for users without access to GPUs or large-scale infrastructure.
Heavily promotes YData Fabric for end-to-end solutions, which could lead to vendor lock-in or reduced focus on enhancing the open-source library's standalone capabilities and documentation.
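To make the GPU-free Gaussian Mixture option mentioned above more concrete, here is a minimal sketch of the general technique: fit a mixture of Gaussians to a numeric column with expectation-maximization, then sample new values from the fitted mixture. It is not YData Synthetic's implementation or API. The function names (fit_gmm_1d, sample_gmm_1d) and the toy data are assumptions made for illustration, and the sketch handles only a single numeric column using the Python standard library.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(values, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture with plain EM (hypothetical helper).

    Returns (weights, means, stds), one entry per component.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Initialise means at evenly spaced quantiles of the data.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    spread = (ordered[-1] - ordered[0]) or 1.0
    stds = [spread / (2 * n_components)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and std of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-6)  # keep std strictly positive
    return weights, means, stds


def sample_gmm_1d(params, n, seed=0):
    """Draw n synthetic values from a fitted mixture (hypothetical helper)."""
    weights, means, stds = params
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Pick a component according to its weight, then sample from it.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[k], stds[k]))
    return samples


# Toy "real" column with two clusters (made-up data for illustration only).
data_rng = random.Random(42)
real = [data_rng.gauss(20.0, 2.0) for _ in range(300)]
real += [data_rng.gauss(60.0, 5.0) for _ in range(300)]

params = fit_gmm_1d(real, n_components=2)
synthetic = sample_gmm_1d(params, n=1000, seed=1)
```

Because fitting needs only a few passes over the data and sampling is a weighted draw followed by a Gaussian draw, this kind of model runs comfortably on a CPU, which is the trade-off against the heavier GAN-based models noted above.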