A Python implementation of Factorization Machines for recommendation and classification tasks using stochastic gradient descent with adaptive regularization.
pyFM is a Python library that implements Factorization Machines, a model class used for supervised learning tasks like recommendation systems and classification. It estimates interactions between categorical variables in high-dimensional sparse data by combining feature engineering with factorization techniques. The library uses stochastic gradient descent with adaptive regularization as its learning method.
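The pairwise interaction term is what separates a factorization machine from a linear model. The plain-Python sketch below illustrates the standard second-order FM equation, y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. It also shows the well-known reformulation that computes the interaction term in O(k * nnz) time instead of looping over every feature pair. The names fm_score, fm_score_naive and fm_probability are hypothetical helpers, not pyFM functions. Using a logistic sigmoid for binary classification is a common choice but an assumption here, not a documented pyFM detail.

```python
import math
from typing import Dict, List

# Sparse input: {feature_index: value}; only non-zero features are stored.
SparseRow = Dict[int, float]


def fm_score(x: SparseRow, w0: float, w: List[float], V: List[List[float]]) -> float:
    """Second-order FM score using the O(k * nnz) rewrite of the pairwise term."""
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(V[0]) if V else 0
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())            # sum_i v_if * x_i
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())  # sum_i v_if^2 * x_i^2
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_score_naive(x: SparseRow, w0: float, w: List[float], V: List[List[float]]) -> float:
    """Same score computed straight from the definition, looping over all feature pairs."""
    items = sorted(x.items())
    total = w0 + sum(w[i] * xi for i, xi in items)
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            (i, xi), (j, xj) = items[a], items[b]
            total += sum(vi * vj for vi, vj in zip(V[i], V[j])) * xi * xj
    return total


def fm_probability(x: SparseRow, w0: float, w: List[float], V: List[List[float]]) -> float:
    """Binary classification: squash the score with a logistic sigmoid (assumed link)."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


x = {0: 1.0, 2: 1.0}  # e.g. one-hot "user 0" and "item 2"
w0, w = 0.1, [0.2, 0.0, -0.1]
V = [[1.0, 0.5], [0.0, 0.0], [2.0, -1.0]]
print(fm_score(x, w0, w, V))        # ≈ 1.7 (0.1 bias + 0.1 linear + 1.5 interaction)
print(fm_score_naive(x, w0, w, V))  # same value via the direct definition
```

Only the factor vectors of active features enter the interaction term. This is why an FM can estimate an interaction for a user-item pair it has never seen together, as long as each one appears in other pairs.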
Data scientists and machine learning engineers building recommendation systems, click-through rate prediction models, or any application requiring modeling of feature interactions in sparse datasets.
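Developers choose pyFM for its straightforward Python implementation of Factorization Machines, its fit/predict usage that slots in alongside scikit-learn preprocessing such as DictVectorizer, and its adaptive regularization, which tunes the regularization strength during training so it does not have to be searched by hand. The learning rate, number of factors, and number of iterations are still set manually.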
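Adaptive regularization treats the regularization strength as something learned alongside the model. The following is a simplified sketch of the idea, not pyFM's code. It makes several assumptions: half squared error loss, one shared lambda for the linear weights and one for the factor vectors, and a single randomly drawn validation example per update. After each ordinary SGD step, each lambda is moved along the gradient of the validation error with respect to that lambda, differentiating through the step just taken. TinyFM, sgd_step, train, make_toy_ratings and all parameter values are hypothetical. pyFM's actual update rules and defaults may differ.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[Dict[int, float], float]  # (sparse features {index: value}, target)


class TinyFM:
    """Hypothetical second-order FM: SGD on squared error plus a simplified adaptive-lambda step."""

    def __init__(self, n_features: int, k: int, lr: float, reg_lr: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w0 = 0.0
        self.w = [0.0] * n_features
        self.V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_features)]
        self.k, self.lr, self.reg_lr = k, lr, reg_lr
        self.lam_w = 0.0  # shared L2 strength for linear weights, adapted online
        self.lam_v = 0.0  # shared L2 strength for factor vectors, adapted online

    def predict(self, x: Dict[int, float]) -> Tuple[float, List[float]]:
        """Return the FM score and the per-factor sums sum_i v_if * x_i (reused in gradients)."""
        sums = [sum(self.V[i][f] * xi for i, xi in x.items()) for f in range(self.k)]
        pair = 0.5 * sum(
            sums[f] ** 2 - sum((self.V[i][f] * xi) ** 2 for i, xi in x.items())
            for f in range(self.k)
        )
        return self.w0 + sum(self.w[i] * xi for i, xi in x.items()) + pair, sums

    def sgd_step(self, x: Dict[int, float], y: float, x_val: Dict[int, float], y_val: float) -> None:
        # 1) Ordinary SGD step on the training example, using the current lambdas.
        y_hat, sums = self.predict(x)
        err = y_hat - y
        w_old = {i: self.w[i] for i in x}
        v_old = {i: list(self.V[i]) for i in x}
        self.w0 -= self.lr * err  # the bias is left unregularized
        for i, xi in x.items():
            self.w[i] -= self.lr * (err * xi + self.lam_w * w_old[i])
            for f in range(self.k):
                dy_dv = xi * sums[f] - v_old[i][f] * xi * xi
                self.V[i][f] -= self.lr * (err * dy_dv + self.lam_v * v_old[i][f])
        # 2) Move each lambda along d(validation loss)/d(lambda), differentiating through
        #    step 1: d(theta_new)/d(lambda) = -lr * theta_old for each parameter updated there.
        yv_hat, vsums = self.predict(x_val)
        err_v = yv_hat - y_val
        g_w = g_v = 0.0
        for i, xvi in x_val.items():
            if i not in x:
                continue  # untouched by step 1, so it does not depend on lambda
            g_w += xvi * (-self.lr * w_old[i])
            for f in range(self.k):
                dyv_dv = xvi * vsums[f] - self.V[i][f] * xvi * xvi
                g_v += dyv_dv * (-self.lr * v_old[i][f])
        self.lam_w = max(0.0, self.lam_w - self.reg_lr * err_v * g_w)
        self.lam_v = max(0.0, self.lam_v - self.reg_lr * err_v * g_v)


def make_toy_ratings() -> Tuple[List[Sample], List[Sample]]:
    """Hypothetical 4x4 ratings; user u -> feature u, item j -> feature 4 + j."""
    train_set: List[Sample] = []
    val_set: List[Sample] = []
    for u in range(4):
        for j in range(4):
            y = 1.0 + 0.5 * u + 0.3 * j + (1.0 if u == j else 0.0)
            sample = ({u: 1.0, 4 + j: 1.0}, y)
            (val_set if (u + j) % 4 == 0 else train_set).append(sample)
    return train_set, val_set


def train(model: TinyFM, train_set: List[Sample], val_set: List[Sample], epochs: int, seed: int = 0) -> List[float]:
    """Run SGD epochs, pairing each training step with a random validation example."""
    rng = random.Random(seed)
    history: List[float] = []
    for _ in range(epochs):
        order = list(range(len(train_set)))
        rng.shuffle(order)
        for idx in order:
            x, y = train_set[idx]
            x_val, y_val = val_set[rng.randrange(len(val_set))]
            model.sgd_step(x, y, x_val, y_val)
        history.append(sum((model.predict(x)[0] - y) ** 2 for x, y in train_set) / len(train_set))
    return history


train_set, val_set = make_toy_ratings()
model = TinyFM(n_features=8, k=4, lr=0.05, reg_lr=0.01, seed=0)
history = train(model, train_set, val_set, epochs=50)
print(f"epoch 1 train MSE {history[0]:.3f}, epoch 50 train MSE {history[-1]:.3f}")
print(f"lambda_w={model.lam_w:.4f}, lambda_v={model.lam_v:.4f}")
```

The validation data only steers the lambdas, never the model weights directly. That is what lets the regularization strength adapt without a separate grid search.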
Factorization machines in python
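Implements stochastic gradient descent with adaptive regularization, which adjusts the regularization strength automatically during training to limit overfitting without a manual search over regularization values. The verbose training logs show the MSE decreasing from epoch to epoch. That shows the optimizer converging, but a falling training error alone does not demonstrate that overfitting is avoided.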
Designed to work with scikit-learn's DictVectorizer for easy feature encoding from dictionary data, simplifying preprocessing and fitting into existing Python machine learning workflows.
Supports both regression and binary classification, selected with the task parameter, as demonstrated in the README examples for rating prediction and binary classification.
Accepts categorical and real-valued features transformed into sparse matrices via DictVectorizer, mimicking libFM's approach for high-dimensional sparse data common in recommendation systems.
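The dictionary format works because of DictVectorizer's encoding rule. String values become one-hot columns named key=value, while numeric values stay in a single column named after the key. The pure-Python sketch below mirrors that rule, including the sorted feature order DictVectorizer uses by default. It is only for illustration: dict_vectorize is a hypothetical helper. In real code you would call sklearn.feature_extraction.DictVectorizer, whose fit_transform returns a SciPy sparse matrix rather than the list of index-to-value dicts used here.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, int, float]


def dict_vectorize(records: List[Dict[str, Value]]) -> Tuple[Dict[str, int], List[Dict[int, float]]]:
    """Encode dict records: strings -> one-hot "key=value" columns, numbers -> "key" column."""

    def feature(key: str, value: Value) -> Tuple[str, float]:
        if isinstance(value, str):
            return f"{key}={value}", 1.0  # categorical: one-hot indicator
        return key, float(value)          # real-valued: kept as-is

    names = sorted({feature(k, v)[0] for r in records for k, v in r.items()})
    vocab = {name: idx for idx, name in enumerate(names)}
    rows: List[Dict[int, float]] = []
    for r in records:
        row: Dict[int, float] = {}
        for k, v in r.items():
            name, val = feature(k, v)
            row[vocab[name]] = val
        rows.append(row)  # sparse row: only non-zero columns are stored
    return vocab, rows


records = [
    {"user": "u1", "item": "i5", "age": 19},
    {"user": "u2", "item": "i5", "age": 33},
]
vocab, rows = dict_vectorize(records)
print(vocab)  # {'age': 0, 'item=i5': 1, 'user=u1': 2, 'user=u2': 3}
print(rows)
```

Each record activates only a handful of columns out of a vocabulary that grows with the number of distinct users and items. This is the high-dimensional sparse input that factorization machines are designed for.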
The README does not mention GPU support, multi-threading, or other performance optimizations, so pyFM may be slower than C++ libraries such as libFM on large datasets.
The README provides only toy and basic real-world examples; advanced usage, hyperparameter tuning guidance, and production deployment tips are lacking.
The documented workflow expects data as a list of dictionaries for DictVectorizer, which adds a preprocessing step when data already lives in numpy arrays or pandas DataFrames.
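A subtlety in that conversion is that DictVectorizer one-hot encodes only string values. Integer user or item IDs left as numbers would become a single numeric feature. The sketch below converts array-like rows into dict records and casts the columns listed as categorical to str. rows_to_records and its arguments are hypothetical names. With pandas, a comparable result can be obtained with df.astype({"user": str, "item": str}).to_dict(orient="records"), though that keeps the other columns' original dtypes instead of casting them to float.

```python
from typing import Dict, Iterable, List, Sequence, Set, Union

Cell = Union[str, int, float]
Record = Dict[str, Union[str, float]]


def rows_to_records(rows: Iterable[Sequence[Cell]], columns: Sequence[str], categorical: Set[str]) -> List[Record]:
    """Turn array-like rows into dicts; categorical columns become strings so a
    DictVectorizer-style encoder one-hot encodes them instead of treating IDs as numbers."""
    records: List[Record] = []
    for row in rows:
        record: Record = {}
        for col, value in zip(columns, row):
            record[col] = str(value) if col in categorical else float(value)
        records.append(record)
    return records


rows = [[1, 5, 19], [2, 43, 33]]  # e.g. rows of a numpy array: user id, item id, age
print(rows_to_records(rows, ["user", "item", "age"], {"user", "item"}))
# [{'user': '1', 'item': '5', 'age': 19.0}, {'user': '2', 'item': '43', 'age': 33.0}]
```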