A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events and log data.
Suro is Netflix's distributed data pipeline service. It collects, aggregates, and dispatches large volumes of application events and log data, and is designed to handle high-throughput streaming data with minimal message loss in scalable distributed systems. It also provides flexible routing and an extensible architecture for custom data destinations.
Suro is aimed at engineering teams at large-scale companies that need a reliable, scalable data pipeline for event and log processing, particularly teams already working within the NetflixOSS ecosystem.
Developers choose Suro for its proven scalability at Netflix, best-effort delivery with retry mechanisms, and flexible architecture that integrates easily with existing data infrastructure.
Netflix's distributed Data Pipeline
The distributed architecture scales horizontally to handle increasing loads, which the project lists among its features for high-volume data.
Supports large numbers of connections and high-volume streaming data flow, making it suitable for event-heavy applications like those at Netflix.
Enables dynamic dispatching to different locations with configurable rules, providing adaptability in data flow as highlighted in the README.
A simple design lets users add custom data sinks, which makes it easier to integrate Suro with other systems.
Implements best-effort delivery with retries and store-and-forward to minimize message loss in distributed environments. Delivery is best-effort rather than guaranteed.
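The routing, custom-sink, and store-and-forward ideas above can be sketched in a few lines. This is a conceptual illustration only, not Suro's actual Java API: the class `StoreAndForwardDispatcher`, the `routing_key` field, the `archive` sink name, and the `max_retries` parameter are all hypothetical names chosen for the example.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Message = Dict[str, str]
Sink = Callable[[Message], None]  # a custom sink is any callable that accepts a message


class StoreAndForwardDispatcher:
    """Hypothetical sketch: route by key, retry failed writes, park what still fails."""

    def __init__(self, routes: Dict[str, List[str]], sinks: Dict[str, Sink],
                 max_retries: int = 2) -> None:
        self.routes = routes            # routing key -> names of destination sinks
        self.sinks = sinks              # sink name -> sink callable
        self.max_retries = max_retries
        self.pending: Deque[Tuple[str, Message]] = deque()  # local store for failed writes

    def _try_write(self, sink_name: str, message: Message) -> bool:
        # One initial attempt plus max_retries retries.
        for _ in range(1 + self.max_retries):
            try:
                self.sinks[sink_name](message)
                return True
            except Exception:
                continue
        return False

    def dispatch(self, message: Message) -> None:
        # Messages whose routing key has no rule are dropped in this sketch.
        for sink_name in self.routes.get(message["routing_key"], []):
            if not self._try_write(sink_name, message):
                self.pending.append((sink_name, message))  # store for later forwarding

    def drain(self) -> int:
        """Re-send stored messages once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self.pending)):
            sink_name, message = self.pending.popleft()
            if self._try_write(sink_name, message):
                delivered += 1
            else:
                self.pending.append((sink_name, message))  # still failing, keep it
        return delivered


# Usage: a sink that fails while an outage is simulated.
received: List[Message] = []
outage = {"active": True}


def flaky_sink(message: Message) -> None:
    if outage["active"]:
        raise ConnectionError("sink unavailable")
    received.append(message)


dispatcher = StoreAndForwardDispatcher(
    routes={"request_trace": ["archive"]},
    sinks={"archive": flaky_sink},
)
dispatcher.dispatch({"routing_key": "request_trace", "payload": "GET /"})
stored_during_outage = len(dispatcher.pending)  # 1: retries failed, message was parked

outage["active"] = False
delivered_after_recovery = dispatcher.drain()   # 1: parked message is forwarded
```

The sketch keeps the pending queue in memory for brevity. A real store-and-forward design would typically persist it so that a process restart does not lose parked messages, and delivery remains best-effort either way.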
Requires manual setup of JSON configuration files for routing, sinks, and inputs, which can be cumbersome and error-prone, as shown in the runServer instructions.
Runs on the JVM (the project is written in Java and built with Gradle), which adds memory and performance overhead compared to lighter-weight alternatives and limits its use in non-JVM stacks.
As a NetflixOSS project, it has fewer community contributions and integrations than mainstream tools such as Apache Kafka, which may hinder adoption.
Because it favors reliability over speed, retries can delay delivery, so it may not be ideal for applications that require immediate data availability.