A Scala-based event data simulator that generates realistic web traffic for a fake music streaming service.
Eventsim is a Scala-based event data simulator that generates realistic, pseudo-random web traffic for a fake music streaming service. It creates user sessions with configurable parameters like arrival rates, session lengths, and state transitions, allowing developers to produce large volumes of test data for product development, correctness testing, and demos. The simulator outputs data to files or Apache Kafka, making it suitable for big data pipelines and performance testing.
Eventsim is aimed at data engineers, QA engineers, and developers who need realistic synthetic event data for testing data pipelines, demos, or performance evaluations without touching real user data.
Eventsim provides a deterministic, configurable way to generate statistically realistic event data that mimics real user behavior, with support for scalable parallel execution and integration with Apache Kafka for streaming workflows.
Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.
Uses Poisson processes and log-normal distributions to simulate user arrivals and session times, closely mimicking real web traffic patterns as described in the README.
Allows precise control over parameters like user growth, damping factors, and state transitions via JSON configuration files, enabling tailored data generation for testing scenarios.
Relies on seeded pseudo-random number generators, ensuring the same configuration produces identical data sets every time, which is ideal for reliable testing and debugging.
Can write events directly to Apache Kafka, which makes it easy to feed stream processing pipelines when testing with real-time data.
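The statistical model and the seeded determinism described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not Eventsim's Scala implementation. The function name and the parameters (`arrivals_per_hour`, `median_session_s`, `sigma`) are hypothetical and do not correspond to Eventsim's configuration keys. The sketch assumes session arrivals follow a Poisson process (modelled as exponential gaps between arrivals) and session lengths follow a log-normal distribution.

```python
import math
import random


def simulate_sessions(seed, n_sessions, arrivals_per_hour=30.0,
                      median_session_s=300.0, sigma=0.8):
    """Return a list of (arrival_time_s, session_length_s) tuples.

    All parameter names are illustrative assumptions, not Eventsim settings.
    """
    # A private generator seeded once: the same seed yields the same data.
    rng = random.Random(seed)

    rate_per_s = arrivals_per_hour / 3600.0
    # For a log-normal distribution the median equals exp(mu).
    mu = math.log(median_session_s)

    t = 0.0
    sessions = []
    for _ in range(n_sessions):
        # Exponentially distributed gaps between arrivals give a Poisson process.
        t += rng.expovariate(rate_per_s)
        # Session length drawn from a log-normal distribution.
        length = rng.lognormvariate(mu, sigma)
        sessions.append((t, length))
    return sessions


first = simulate_sessions(seed=42, n_sessions=1000)
second = simulate_sessions(seed=42, n_sessions=1000)

# Two runs with the same seed and parameters produce identical data.
print(first == second)  # True
```

Keeping the generator private to the function, rather than using the module-level `random` state, is what makes a run reproducible regardless of what other code does with random numbers.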
The simulator is designed for a music streaming service; adapting it for other domains requires significant Scala code changes, not just configuration tweaks, as admitted in the README.
Not multi-threaded by default, which can bottleneck performance when generating large datasets, though the README suggests running parallel instances as a workaround.
Requires Java 8 and Scala, with sbt used to build the assembly, which adds setup overhead and can be a barrier for teams unfamiliar with the JVM ecosystem.