An intelligent data search and enrichment library for machine learning that automatically finds and adds relevant external features to ML pipelines.
Upgini is an intelligent data search and enrichment library for machine learning that automatically finds and adds relevant external features from hundreds of public, community, and premium data sources to ML pipelines. It replaces manual, time-consuming external data integration with a low-code, automated workflow that identifies the features that actually improve model accuracy.
Upgini is aimed at data scientists, ML engineers, and researchers who work on supervised ML tasks with tabular data and want to improve model performance with external data sources without extensive manual effort.
Developers choose Upgini because it automates the entire feature search and enrichment process, uses techniques such as LLMs for feature generation, validates accuracy uplift, and integrates with existing Scikit-learn pipelines.
Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs
Upgini identifies features that genuinely improve model accuracy, rather than variables that are merely correlated with the target, as emphasized in the README's sections on accuracy uplift validation and automated feature search.
It accesses external data from 239 countries with up to 41 years of history, including weather, economic indicators, and demographics, providing broad enrichment options as listed in the data sources table.
The library offers a Scikit-learn-compatible interface, making it easy to integrate into existing ML pipelines without major changes, as highlighted in the features.
It uses LLMs, GNNs, and RNNs to generate optimal feature sets from connected data sources, enhancing predictive power beyond raw data, as described in the automated feature generation section.
Upgini requires sending data to its servers for processing, which can raise privacy concerns and necessitates an internet connection, as indicated by the need for API keys and cloud-based search.
It is primarily designed for supervised ML on tabular data, so it's not suitable for other data types like images or unstructured text without specific support, as noted in the task descriptions.
While the library is free for basic use, premium features like phone and email enrichment require registration and an API key, which points to a freemium model with limitations, as mentioned in the 'Open up all capabilities' section.
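To make the Scikit-learn-compatible interface and the idea of accuracy uplift validation mentioned above more concrete, here is a minimal local sketch. It does not use Upgini's API: `LookupEnricher`, the `region` key, the `ext_signal` column, and the synthetic data are all hypothetical stand-ins, and the "external source" is just an in-memory table. The sketch only illustrates the general pattern of enriching a dataset through `fit`/`transform` and then checking whether the added feature improves a cross-validated score.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


class LookupEnricher(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for an enricher: joins external features on a key column."""

    def __init__(self, external, key):
        self.external = external
        self.key = key

    def fit(self, X, y=None):
        # Nothing is learned in this toy version; a real enrichment
        # service would search for relevant features at this step.
        return self

    def transform(self, X):
        # A left join keeps every input row, matched or not.
        return X.merge(self.external, on=self.key, how="left")


# Synthetic data (assumption): the target depends on a signal that is
# only available in the external table, keyed by region.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({"region": rng.integers(0, 20, size=n), "base": rng.normal(size=n)})
external = pd.DataFrame({"region": np.arange(20), "ext_signal": rng.normal(size=20)})
signal = external.set_index("region")["ext_signal"]
y = X["base"] + 3.0 * X["region"].map(signal) + rng.normal(scale=0.1, size=n)

enriched = LookupEnricher(external, key="region").fit(X, y).transform(X)


def cv_r2(features):
    """Mean 5-fold cross-validated R^2 of a linear model on the given features."""
    return cross_val_score(LinearRegression(), features, y, cv=5, scoring="r2").mean()


baseline = cv_r2(X[["base"]])
with_external = cv_r2(enriched[["base", "ext_signal"]])
uplift = with_external - baseline
print(f"baseline R2={baseline:.3f}, enriched R2={with_external:.3f}, uplift={uplift:.3f}")
```

Comparing out-of-fold scores, rather than correlation with the target, is what separates a feature that helps the model from one that merely looks related. Because the stand-in follows the `fit`/`transform` convention, it could also be placed in front of an estimator inside an existing pipeline.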