A distributed data integration framework for big data ecosystems, handling ingestion, replication, organization, and lifecycle management for both streaming and batch data.
Apache Gobblin is a distributed data integration framework designed to simplify common big data integration tasks such as data ingestion, replication, organization, and lifecycle management. It handles both streaming and batch data ecosystems, providing a scalable way to manage structured and byte-oriented data across heterogeneous environments. The framework supports lightweight inline transformations during ingestion (while delegating heavy processing to dedicated engines) and is battle-tested at petabyte scale in production environments.
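To make the ingestion model concrete, a Gobblin job is described by a properties file that wires a source, optional converters, and a writer together. The sketch below follows the Wikipedia getting-started example from the Gobblin documentation; treat the specific class names and paths as illustrative rather than a drop-in config for your environment.

```properties
# Minimal Gobblin job configuration (sketch, based on the Wikipedia example)
job.name=PullFromWikipedia
job.group=Wikipedia
job.description=A getting-started ingestion job

# Source: where records come from
source.class=org.apache.gobblin.example.wikipedia.WikipediaSource
source.page.titles=LinkedIn
source.revisions.cnt=5

# Converters: lightweight inline transformations applied per record
converter.classes=org.apache.gobblin.example.wikipedia.WikipediaConverter
extract.namespace=org.apache.gobblin.example.wikipedia

# Writer: where records land
writer.builder.class=org.apache.gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt

# Publisher: atomically moves staged output to the final location
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
```

The same source/converter/writer pipeline structure applies whether the job runs in batch or streaming execution mode.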
Data engineers and architects working with large-scale data ecosystems who need reliable ingestion, replication, and lifecycle management across diverse data sources and sinks. Organizations with complex data integration requirements across Hadoop, cloud storage, and external APIs will benefit most.
Developers choose Apache Gobblin for its proven scalability in production environments, comprehensive feature set for data management, and flexibility in supporting both stream and batch execution modes. Its ability to handle petabyte-scale workflows while providing fault tolerance, data quality checking, and compliance management makes it a robust alternative to building custom integration solutions.
Battle-tested at petabyte scale by companies such as LinkedIn and PayPal, which the README highlights as evidence of reliability for large data workflows.
Offers end-to-end capabilities, including ingestion, compaction, deduplication, and lifecycle management, covering complex data integration needs described in the README.
Supports both stream and batch execution modes, allowing data processing workflows to adapt to either pattern, as noted in the README highlights.
Includes task partitioning, state management, and atomic data publishing, which improve reliability in distributed environments per the README.
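The reliability features above map to a handful of job properties. The snippet below is a sketch using configuration keys documented by Gobblin; the directory paths are placeholders you would replace with your own.

```properties
# State management: watermarks and task state checkpointed between runs
state.store.dir=/gobblin/state-store

# Atomic publishing: writers stage output, then the publisher moves it
# to the final directory in one step so consumers never see partial data
writer.staging.dir=/gobblin/task-staging
writer.output.dir=/gobblin/task-output
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
data.publisher.final.dir=/data/output

# Task partitioning: cap parallelism when running in MapReduce mode
mr.job.max.mappers=20
```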
Delegates complex data processing to external engines such as Spark, adding a dependency and operational overhead, as the README's 'Apache Gobblin is NOT' section acknowledges.
Building from source requires Gradle and a non-trivial set of instructions, which can be daunting for new users, as the build requirements show.
Best suited to Hadoop-centric or cloud-storage environments; teams running fully cloud-native stacks may face extensive integration work.
Time spent in the Apache Incubator suggests ongoing development and the possibility of breaking changes between releases, which can affect stability for production use.