A Naive Bayes machine learning implementation in Elixir with multiple models and storage options.
Simple Bayes is a Naive Bayes classifier implementation for Elixir, providing a simple yet powerful probabilistic approach to text categorization and classification tasks. It enables developers to build scalable machine learning models for applications like spam detection, content categorization, and automatic medical diagnosis by offering multiple algorithm models and flexible storage options.
Simple Bayes is aimed at Elixir developers building text classification systems, such as spam filters, content moderation tools, or document categorization pipelines, who need a scalable, configurable Naive Bayes implementation.
Developers choose Simple Bayes for its clean, efficient implementation that balances simplicity with practical features like multiple storage backends (memory, file system, Dets), configurable models (multinomial, binarized multinomial, Bernoulli), and text processing enhancements (stop word filtering, TF-IDF, stemming).
Supports multinomial, binarized multinomial, and Bernoulli models, allowing developers to tailor the classifier to specific text scenarios, as detailed in the feature matrix.
Offers in-memory, file system, and Dets storage options, enabling scalable persistence for different data sizes, with performance trade-offs explicitly noted in the README.
Includes stop word filtering, additive smoothing, TF-IDF weighting, and optional stemming, all configurable to enhance classification accuracy, as shown in usage examples.
Allows keyword weighting during training to emphasize important features, demonstrated with weighted terms in the initial code snippet.
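The practical difference between the three models, and the role of additive smoothing, can be sketched in a few lines. The Python below is a language-neutral illustration written from the textbook definitions of the models; it is not Simple Bayes' Elixir code or API. The function names `train` and `classify`, the whitespace tokenizer, the model name strings, and the default `smoothing` of 1.0 are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train(docs, model="multinomial"):
    """Count terms per category from (category, text) pairs (hypothetical helper)."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, text in docs:
        tokens = text.lower().split()
        if model != "multinomial":
            # Binarized multinomial and Bernoulli count a term once per document.
            tokens = set(tokens)
        term_counts[category].update(tokens)
        doc_counts[category] += 1
    return term_counts, doc_counts


def classify(text, term_counts, doc_counts, model="multinomial", smoothing=1.0):
    """Return the most likely category; smoothing must be > 0 in this sketch."""
    vocab = {term for counts in term_counts.values() for term in counts}
    tokens = text.lower().split()
    if model != "multinomial":
        tokens = list(set(tokens))
    total_docs = sum(doc_counts.values())
    scores = {}
    for category, counts in term_counts.items():
        score = math.log(doc_counts[category] / total_docs)  # log prior
        if model == "bernoulli":
            # Bernoulli scores every vocabulary term, present or absent.
            present = set(tokens)
            for term in vocab:
                p = (counts[term] + smoothing) / (doc_counts[category] + 2 * smoothing)
                score += math.log(p if term in present else 1 - p)
        else:
            # Multinomial variants score only the terms that occur in the text.
            denom = sum(counts.values()) + smoothing * len(vocab)
            for term in tokens:
                if term in vocab:
                    score += math.log((counts[term] + smoothing) / denom)
        scores[category] = score
    return max(scores, key=scores.get)


docs = [
    ("apple", "red sweet"),
    ("apple", "green round"),
    ("banana", "yellow sweet long"),
    ("banana", "yellow soft"),
]
for model in ("multinomial", "binarized_multinomial", "bernoulli"):
    term_counts, doc_counts = train(docs, model)
    print(model, classify("red round", term_counts, doc_counts, model))
```

In this sketch the multinomial model keeps raw term frequencies, the binarized variant caps each term at one occurrence per document, and the Bernoulli model also penalizes vocabulary terms that are missing from the input. Additive smoothing keeps unseen terms from driving a category's probability to zero.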
Only implements Naive Bayes classifiers, making it unsuitable for projects requiring other machine learning algorithms like decision trees or clustering.
Word stemming relies on the optional Stemmer library, an extra dependency to install and maintain beyond the core package.
The README notes that performance varies with data size: file system storage with base64 encoding is faster for smaller datasets, while Dets performs better with larger ones, so choosing the right backend takes manual tuning.