A modular deep learning framework built on PyTorch for neural networks over heterogeneous tabular data.
PyTorch Frame is a deep learning extension for PyTorch designed specifically for heterogeneous tabular data, supporting diverse column types such as numerical, categorical, text, timestamp, and image columns. It provides a modular framework for implementing and experimenting with deep tabular models, aiming to democratize deep learning research on tabular data beyond traditional tree-based models.
Researchers and practitioners working with tabular data who want to apply deep learning models, especially those dealing with multi-modal data types (e.g., text, images) alongside traditional columns. It also targets developers needing to integrate tabular models with other PyTorch libraries or large language models.
PyTorch Frame offers a modular architecture that separates feature encoding, column interaction modeling, and decoding, enabling flexible experimentation and reusability. It uniquely supports integration with external APIs and models (e.g., OpenAI, Hugging Face) for text embedding, and includes benchmark datasets and implementations of state-of-the-art deep tabular models.
Tabular Deep Learning Library for PyTorch
Separates models into FeatureEncoder, TableConv, and Decoder components, enabling flexible experimentation and reuse as shown in the ExampleTransformer implementation.
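The three-stage split can be sketched with plain Python stand-ins. The class and method names below are illustrative only, not PyTorch Frame's actual `torch_frame.nn` API; the point is how the encoder, column-interaction, and decoder stages compose:

```python
# Schematic sketch of the FeatureEncoder -> TableConv -> Decoder pipeline.
# All names and logic here are toy stand-ins for illustration, not the
# library's real classes.

class FeatureEncoder:
    """Maps each raw column value to a numeric representation."""
    def encode(self, row):
        # Toy encoding: numbers pass through, strings become their length.
        return [v if isinstance(v, (int, float)) else len(v) for v in row]

class TableConv:
    """Models interactions between the encoded columns."""
    def interact(self, cols):
        mean = sum(cols) / len(cols)
        return [c - mean for c in cols]  # toy cross-column interaction

class Decoder:
    """Reduces the column representations to a single prediction."""
    def decode(self, cols):
        return sum(cols)

class ExampleModel:
    """Composes the three stages, mirroring the modular split."""
    def __init__(self):
        self.encoder = FeatureEncoder()
        self.conv = TableConv()
        self.decoder = Decoder()

    def forward(self, row):
        return self.decoder.decode(self.conv.interact(self.encoder.encode(row)))
```

Because each stage is a separate object, any one of them can be swapped out (e.g. a different column-interaction module) without touching the other two, which is the reuse the design enables.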
Handles diverse column types like text, images, and embeddings, allowing deep learning on complex tabular data that traditional methods struggle with.
Seamlessly integrates with external APIs like OpenAI and Hugging Face for text embeddings, with code examples provided for each service.
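The plug-in pattern behind this integration can be illustrated with a deterministic stand-in embedder. `hash_embedder` and `TextColumnEncoder` are hypothetical names invented for this sketch; in practice the callable would wrap an OpenAI or Hugging Face client, and PyTorch Frame wires it in through its own text-embedder configuration:

```python
# Sketch of a pluggable text embedder: any callable mapping a list of
# strings to fixed-size vectors can be swapped in. The names below are
# illustrative, not the library's API.
import hashlib

def hash_embedder(texts, dim=4):
    """Deterministic toy embedder; a real setup would call an API or model."""
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode()).digest()
        vectors.append([b / 255 for b in digest[:dim]])  # scale bytes to [0, 1]
    return vectors

class TextColumnEncoder:
    """Wraps a user-supplied embedder for text columns."""
    def __init__(self, embedder):
        self.embedder = embedder

    def encode(self, texts):
        return self.embedder(texts)
```

Swapping services then means swapping the callable, leaving the rest of the pipeline unchanged.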
Includes ready-to-use datasets and implementations of SOTA models like FTTransformer, facilitating research and comparison against GBDTs.
Deep models train significantly more slowly than GBDTs, as the project's own benchmark acknowledges, making them less suitable for time-sensitive applications.
Some models, such as Trompt, run out of memory on the larger benchmark datasets, limiting their use on big data without ample resources.
Relies on third-party services for text embeddings, which can incur costs, require internet access, and add deployment complexity.