A scalable real-time search platform for streaming data using Apache Storm, Kafka, and Lucene.
Straw is a real-time streaming search platform that allows users to register Lucene-style queries and receive instant alerts when those queries match incoming data from streams like Twitter. It solves the problem of monitoring high-velocity text data for specific patterns or keywords at scale. The platform uses a distributed architecture to handle both large volumes of data and many concurrent queries efficiently.
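Streaming search inverts the usual search model: the queries are stored, and each incoming document is tested against them. The toy sketch below illustrates that idea only. It is not Straw's code and does not use Lucene, Luwak, or Elasticsearch. The `StreamMatcher` class, its methods, and the simple all-terms-must-appear query semantics are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StreamMatcher:
    """Hypothetical in-memory matcher: stores queries, tests documents against them."""

    # Maps a query id to the set of lowercase terms that must all appear.
    queries: dict = field(default_factory=dict)

    def register(self, query_id: str, text: str) -> None:
        """Store a query as a set of required terms (AND semantics, an assumption)."""
        terms = frozenset(text.lower().split())
        if not terms:
            raise ValueError("query must contain at least one term")
        self.queries[query_id] = terms

    def match(self, document: str) -> list:
        """Return the ids of all stored queries whose terms all occur in the document."""
        tokens = set(document.lower().split())
        return sorted(qid for qid, terms in self.queries.items() if terms <= tokens)


matcher = StreamMatcher()
matcher.register("q1", "storm kafka")
matcher.register("q2", "lucene")

# Each incoming message is checked against every registered query.
for message in ["Kafka feeds Storm topology", "nothing relevant here"]:
    print(message, "->", matcher.match(message))
```

A real deployment would replace the whitespace tokenizer and subset test with a proper query engine; the sketch only shows the direction of the lookup.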
Straw is aimed at data engineers and developers building real-time alerting systems, social media monitoring tools, or any application requiring low-latency search on continuous data streams.
Developers choose Straw for its scalable, open-source architecture that supports full Lucene query capabilities and offers both cloud deployment and local development options. Its benchmarking utilities and support for multiple search engines (Luwak and Elasticsearch) provide flexibility and performance insights.
A platform for real-time streaming search
Uses Apache Storm to parallelize search workloads, enabling efficient handling of high-volume data streams and many concurrent queries.
Supports both Elasticsearch Percolators and Lucene-Luwak, so the two engines can be benchmarked against each other and chosen per use case.
Includes automated AWS provisioning scripts for production scaling and Docker-based setup for local development, making it adaptable from testing to full deployment.
Enables full Lucene queries on live data streams like Twitter, allowing for immediate alerts and complex pattern matching in real-time scenarios.
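One common way to parallelize this kind of workload is to shard the stored queries across workers and send every document to all shards. The sketch below shows that pattern in plain Python as an assumption about the general approach, not a description of Straw's actual Storm topology. All function names are hypothetical, and the term-set queries stand in for real Lucene queries.

```python
import zlib


def worker_for(query_id: str, num_workers: int) -> int:
    """Assign a query to a worker with a stable hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(query_id.encode("utf-8")) % num_workers


def partition_queries(queries: dict, num_workers: int) -> list:
    """Split {query_id: required_terms} into one dict per worker."""
    shards = [dict() for _ in range(num_workers)]
    for query_id, terms in queries.items():
        shards[worker_for(query_id, num_workers)][query_id] = terms
    return shards


def match_on_shard(shard: dict, document: str) -> set:
    """Work done by a single worker: test the document against its own queries."""
    tokens = set(document.lower().split())
    return {qid for qid, terms in shard.items() if terms <= tokens}


def match_everywhere(shards: list, document: str) -> list:
    """Broadcast the document to every shard and merge the hits."""
    hits = set()
    for shard in shards:  # in a real cluster these calls would run in parallel
        hits |= match_on_shard(shard, document)
    return sorted(hits)


queries = {"q1": {"storm"}, "q2": {"kafka", "storm"}, "q3": {"lucene"}}
shards = partition_queries(queries, num_workers=3)
print(match_everywhere(shards, "Storm and Kafka"))
```

Sharding by query keeps each worker's query set small as the number of registered queries grows, at the cost of delivering every document to every worker.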
The local demo requires running scripts multiple times due to acknowledged bugs, and AWS deployment involves numerous prerequisites and manual configuration steps, increasing initial effort.
Relies on dated dependencies such as Apache Storm 0.9.5 (released in 2015), which may lack modern features, security updates, and community support, making integration with current systems challenging.
The README acknowledges known issues, such as scripts needing to be run twice, and the project has not seen recent updates, which suggests potential instability and sparse troubleshooting resources.