LinkedIn's previous-generation Kafka to HDFS pipeline for batch data ingestion.
Camus is LinkedIn's previous-generation data pipeline: a Hadoop MapReduce job that transfers data from Kafka topics to Hadoop HDFS in batch mode. It makes streaming data available for batch analytics by providing a reliable, scalable ingestion path between the two distributed systems.
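As a sketch of what a run looks like in practice: Camus is submitted to Hadoop like any other MapReduce job, pointed at a properties file that describes the Kafka brokers, topics, and HDFS output paths. `com.linkedin.camus.etl.kafka.CamusJob` and the `-P` flag are the documented entry point; the jar name and file paths below are illustrative placeholders.

```sh
# Submit Camus as an ordinary Hadoop MapReduce job.
# Jar name and properties path are placeholders; adjust to your build.
hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar \
  com.linkedin.camus.etl.kafka.CamusJob \
  -P /etc/camus/camus.properties
```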
Data engineers and infrastructure teams who need to move data from Kafka into Hadoop for batch processing, analytics, or data warehousing.
Developers chose Camus for its proven reliability at LinkedIn's scale, its straightforward fit with existing Kafka and Hadoop deployments, and its specialized focus on batch-oriented transfer between the two systems.
Designed and operated by LinkedIn for large-scale data transfer, giving it a proven record of reliability and scalability in production environments.
Directly connects Kafka topics to HDFS with automated partition management and offset tracking, simplifying setup for batch ingestion (see the configuration sketch after this list).
Tailored for scheduled batch processing, making it efficient for analytics workloads that don't require real-time data access.
Writes output as ordinary HDFS files, so existing Hadoop tools and storage infrastructure can process the data without extra plumbing.
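To make the partition-management and offset-tracking points above concrete, here is a minimal `camus.properties` sketch. The property keys follow those shipped in Camus's example configuration; the broker addresses, paths, topic names, and class choices are placeholder assumptions to adapt to your environment.

```properties
# Minimal, illustrative camus.properties (all values are placeholders).
camus.job.name=camus-hourly-ingest

# Kafka source: brokers and topic selection.
kafka.brokers=broker1:9092,broker2:9092
kafka.whitelist.topics=page_views,ad_clicks

# HDFS layout: final data, per-run working dirs, and the execution
# history where Camus persists consumed offsets between runs.
etl.destination.path=/data/camus/topics
etl.execution.base.path=/data/camus/exec
etl.execution.history.path=/data/camus/exec/history

# How incoming messages are decoded and written out.
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.StringRecordWriterProvider

# Time-based output partitioning (hourly).
etl.output.file.time.partition.mins=60
```

Scheduling is then simply a matter of invoking the CamusJob command shown earlier from cron or a workflow scheduler such as Oozie or Azkaban at the desired cadence; each run resumes from the offsets recorded in the execution history path.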
LinkedIn has phased out Camus in favor of Gobblin (now Apache Gobblin), so there will be no future updates, bug fixes, or official support.
Cannot handle real-time streaming data, which is a significant drawback for modern data pipelines that often require low latency.
Only supports output to Hadoop HDFS, making it inflexible for environments moving to cloud storage or other file systems.
Users are forced to migrate to Gobblin or other tools, adding complexity and potential downtime to existing pipelines.