A fast, sklearn-like feature processing library for Go that generates optimized transformers from struct tags.
go-featureprocessing is a Go library that provides fast, scikit-learn-like feature preprocessing for machine learning pipelines. It allows developers to define feature transformations (like scaling, encoding, vectorization) directly on struct fields using tags, then generates optimized, allocation-free transformation code. It solves the problem of slow feature processing in Go by avoiding reflection and cgo overhead while maintaining a simple, declarative API.
It is aimed at Go developers building machine learning pipelines, data processing systems, or high-performance applications that require efficient feature engineering. It is particularly useful for teams moving from Python's scikit-learn who need similar functionality in Go.
Developers choose go-featureprocessing for its speed (roughly 100ns per sample), its ease of use through struct tags, and its ability to generate production-ready transformers without manual optimization. Its distinguishing feature is that it combines sklearn-like convenience with Go-native speed through code generation.
🔥 Fast, simple sklearn-like feature processing for Go
Transforms a sample in roughly 100ns with zero allocations, and the project's benchmarks report it outperforming sklearn in batch mode, which makes it well suited to low-latency applications.
Allows developers to define transformations like min-max scaling or one-hot encoding directly on struct fields using tags, simplifying configuration and reducing boilerplate code.
Uses go:generate to produce optimized, allocation-free transformers with 100% test coverage and benchmarks, ensuring production-ready code without manual optimization efforts.
Transformers can be serialized to/from JSON using standard Go routines, enabling easy saving, loading, and integration with other tools for model deployment.
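The JSON round trip mentioned above can be sketched with the standard `encoding/json` package. The `MinMaxScaler` type, its field names, and the `roundTrip` helper are hypothetical and exist only for this example; the library's own transformer types may be laid out differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MinMaxScaler is a hypothetical fitted transformer with exported fields,
// so encoding/json can serialize it without any custom code.
type MinMaxScaler struct {
	Min float64 `json:"min"`
	Max float64 `json:"max"`
}

// Transform scales v from [Min, Max] onto [0, 1]; a degenerate range yields 0.
func (s MinMaxScaler) Transform(v float64) float64 {
	if s.Max <= s.Min {
		return 0
	}
	return (v - s.Min) / (s.Max - s.Min)
}

// roundTrip saves a scaler to JSON and loads it back, as one might do when
// fitting in one process and serving in another.
func roundTrip(s MinMaxScaler) (MinMaxScaler, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return MinMaxScaler{}, err
	}
	var out MinMaxScaler
	err = json.Unmarshal(data, &out)
	return out, err
}

func main() {
	fitted := MinMaxScaler{Min: 10, Max: 20}
	loaded, err := roundTrip(fitted)
	if err != nil {
		panic(err)
	}
	fmt.Println(loaded.Transform(15)) // prints 0.5
}
```

Keeping the fitted parameters in plain exported fields is what lets the standard marshaling routines handle persistence with no extra serialization layer.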
The reflection-based fallback is labeled as beta, lacks serialization support, and is up to 20x slower, making it a poor fit for production when code generation isn't feasible.
Requires predefined Go structs; any changes to transformations or fields necessitate re-running code generation, which can hinder rapid iteration and dynamic workflows.
The project acknowledges gaps such as advanced NLP support ('more advanced NLP will be added later'), so it may not cover all sklearn preprocessing options, which limits its use in complex pipelines.