A Ruby library for text classification with Bayesian, LSI, logistic regression, k-NN, and TF-IDF algorithms.
Classifier is a comprehensive Ruby gem for text classification that provides five different machine learning algorithms: Bayesian classification, Latent Semantic Indexing (LSI), logistic regression, k-Nearest Neighbors, and TF-IDF. It enables developers to build applications for spam detection, sentiment analysis, content categorization, and other text processing tasks with features like streaming support and efficient persistence. The library prioritizes performance and flexibility, combining native extensions for speed with a modular design suitable for real-world, large-scale workflows.
Classifier is aimed at Ruby developers building applications that require text classification, such as spam filters, sentiment analysis tools, content moderation systems, or document categorization pipelines. It is particularly suited to those handling large datasets or needing real-time classification with a choice of algorithms.
Developers choose Classifier over alternatives because it offers five algorithms instead of just two, includes an incremental LSI implementation that is 400x faster for streaming updates, and provides native C extensions that deliver 5-50x performance improvements. Its pluggable persistence supports backends such as file, Redis, S3, and SQL, and it can train on multi-gigabyte datasets without loading all data into memory.
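To make the idea of Bayesian text classification concrete, here is a minimal sketch of a multinomial naive Bayes classifier in plain Ruby. It is not the gem's implementation or API: the class name TinyBayes, its method names, and the tokenizer are assumptions made for illustration only.

```ruby
# Illustrative only: a tiny multinomial naive Bayes classifier in plain Ruby.
# TinyBayes and its methods are hypothetical names, not the Classifier gem's API.
class TinyBayes
  def initialize(*categories)
    @word_counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
    categories.each { |category| @word_counts[category] } # register categories
    @totals = Hash.new(0)     # total word occurrences per category
    @doc_counts = Hash.new(0) # training documents per category
    @vocab = {}               # every distinct word seen during training
  end

  def train(category, text)
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @totals[category] += 1
      @vocab[word] = true
    end
    @doc_counts[category] += 1
  end

  # Returns the category with the highest log-probability score.
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum.to_f
    scores = @word_counts.keys.to_h do |category|
      # Add-one smoothing on the prior and on each word likelihood,
      # so unseen words never produce log(0).
      prior = Math.log((@doc_counts[category] + 1) / (total_docs + @word_counts.size))
      likelihood = words.sum do |word|
        Math.log((@word_counts[category][word] + 1.0) / (@totals[category] + @vocab.size))
      end
      [category, prior + likelihood]
    end
    scores.max_by { |_, score| score }.first
  end

  private

  # Assumed tokenizer: lowercase, then split into runs of letters, digits, and apostrophes.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

nb = TinyBayes.new(:spam, :ham)
nb.train(:spam, "win free money now")
nb.train(:spam, "free prize claim now")
nb.train(:ham, "meeting agenda for monday")
nb.train(:ham, "lunch on monday with the team")
puts nb.classify("claim your free money")
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together. A production classifier would typically add stemming and stop-word handling on top of this.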
A general classifier module to allow Bayesian and LSI classifications.
Offers five distinct classifiers including Bayesian, LSI, logistic regression, k-NN, and TF-IDF, providing flexibility for varied text classification tasks like spam detection and sentiment analysis, as highlighted in the README's comparison table.
Includes native C extensions for LSI operations, delivering 5-50x speed improvements, and incremental LSI using Brand's algorithm for 400x faster streaming updates, making it efficient for large datasets.
Supports pluggable storage backends like file, Redis, S3, and SQL, allowing easy model deployment and management in production environments, as demonstrated in the persistence guide.
Enables training on multi-gigabyte datasets without loading all data into memory, and provides a command-line interface with pre-trained models for instant classification without coding, shown in the CLI examples.
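The two ideas above, pluggable persistence and training without loading the whole dataset, can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: MemoryStorage, FileStorage, StreamingCounter, and the batch_size parameter are hypothetical names, and the "model" is reduced to per-category word counts rather than any of the gem's real classifiers.

```ruby
require "json"
require "stringio"

# Hypothetical storage backends: any object responding to #write(data) and #read
# can be plugged in. These are not the gem's own storage classes.
class MemoryStorage
  def write(data)
    @data = data
  end

  def read
    @data
  end
end

class FileStorage
  def initialize(path)
    @path = path
  end

  def write(data)
    File.write(@path, data)
  end

  def read
    File.exist?(@path) ? File.read(@path) : nil
  end
end

# Hypothetical model: per-category word counts, updated batch by batch.
class StreamingCounter
  attr_reader :counts

  def initialize(storage)
    @storage = storage
    @counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  end

  # Reads the IO one line at a time, so at most batch_size lines
  # are held in memory at once, whatever the size of the input.
  def train_from_stream(category, io, batch_size: 2)
    io.each_line.each_slice(batch_size) do |batch|
      batch.each do |line|
        line.downcase.scan(/[a-z0-9']+/) { |word| @counts[category][word] += 1 }
      end
      save # checkpoint after every batch
    end
  end

  def save
    @storage.write(JSON.generate(@counts))
  end

  # Restores counts from the backend; returns self so calls can be chained.
  def load
    raw = @storage.read
    return self unless raw

    JSON.parse(raw).each do |category, words|
      words.each { |word, count| @counts[category][word] = count }
    end
    self
  end
end

storage = MemoryStorage.new
model = StreamingCounter.new(storage)
model.train_from_stream("spam", StringIO.new("free money\nfree prize\nclaim now\n"))
restored = StreamingCounter.new(storage).load
puts restored.counts["spam"]["free"]
```

Because the model only talks to the backend through write and read, swapping MemoryStorage for FileStorage (or a Redis- or S3-backed equivalent) requires no change to the training code.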
Limited to classical machine learning algorithms, with no support for neural networks or deep learning, so it may underperform on complex NLP tasks that require contextual embeddings or transfer learning.
As a Ruby gem, it has a smaller community and fewer integrations compared to Python-dominated ML libraries, potentially hindering collaboration, tooling, and access to pre-trained models from larger ecosystems.
Requires compiling native extensions with 'rake compile', which can be error-prone on some systems and adds an extra deployment step. The LGPL license may also impose restrictions on some commercial uses.