A Python interface to the Amazon Kinesis Client Library for building distributed applications that process streaming data reliably at scale.
Amazon Kinesis Client Library for Python (KCLpy) is a Python interface to the Amazon KCL MultiLangDaemon that enables developers to build robust, distributed applications for processing streaming data from Amazon Kinesis. It abstracts away the complexities of distributed computing, such as load balancing, fault tolerance, and checkpointing, allowing developers to focus solely on implementing their record processing logic. The library leverages a Java-based daemon to provide language-agnostic, battle-tested stream processing infrastructure.
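In practice, an application subclasses a record processor and hands it to the daemon, which drives the lifecycle callbacks over stdin/stdout. Below is a minimal sketch modeled on the sample app shipped with amazon-kinesis-client-python; the processing body and the checkpointing policy are placeholders, not something the library prescribes.

```python
from amazon_kclpy import kcl
from amazon_kclpy.v3 import processor


class RecordProcessor(processor.RecordProcessorBase):
    """Minimal processor; the MultiLangDaemon invokes these callbacks."""

    def initialize(self, initialize_input):
        # Called once when this worker is assigned a shard.
        self.shard_id = initialize_input.shard_id

    def process_records(self, process_records_input):
        for record in process_records_input.records:
            data = record.binary_data  # raw bytes of the Kinesis record
            # ... application-specific processing goes here ...
        # Checkpoint so a restarted worker resumes after these records.
        process_records_input.checkpointer.checkpoint()

    def lease_lost(self, lease_lost_input):
        # The lease was taken by another worker; do not checkpoint here.
        pass

    def shard_ended(self, shard_ended_input):
        # The shard closed (e.g., after a reshard); checkpoint to mark it done.
        shard_ended_input.checkpointer.checkpoint()

    def shutdown_requested(self, shutdown_requested_input):
        # Graceful shutdown; checkpoint progress before exiting.
        shutdown_requested_input.checkpointer.checkpoint()


if __name__ == "__main__":
    kcl.KCLProcess(RecordProcessor()).run()
```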
Python developers and data engineers building scalable, fault-tolerant applications that need to consume and process real-time streaming data from Amazon Kinesis Data Streams. It is well suited to teams that want reliable distributed stream processing without managing the underlying infrastructure complexity themselves.
Developers choose KCLpy because it provides a production-ready, managed solution for distributed stream processing by leveraging the battle-tested Amazon KCL for Java, ensuring high reliability and scalability. Its unique selling point is the abstraction of complex tasks like shard management, checkpointing, and load balancing, offering a simple Python interface while maintaining the robust features of the underlying Java library.
Amazon Kinesis Client Library for Python
Automatically handles load balancing, fault tolerance, and reaction to changes in stream volume, reducing operational overhead.
Manages checkpointing of processed records so that a restarted or reassigned worker can resume where it left off, which is crucial for fault-tolerant applications (see the retry-aware checkpoint sketch after this list).
Leverages the battle-tested Java-based MultiLangDaemon, allowing Python developers to benefit from robust KCL features without writing any Java code.
In KCL 3.x, minimizes data reprocessing during lease reassignments by letting the current owner checkpoint completely before the lease is transferred, as described in the release notes.
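Checkpoint calls can fail transiently, for example when DynamoDB throttles the lease table, so the library's sample app wraps them in a retry loop. A condensed sketch of that pattern follows; the retry count and sleep interval are illustrative, not library defaults.

```python
import time

from amazon_kclpy import kcl

CHECKPOINT_RETRIES = 5  # illustrative values, not library defaults
SLEEP_SECONDS = 5


def checkpoint_with_retries(checkpointer, sequence_number=None,
                            sub_sequence_number=None):
    """Retry transient checkpoint failures; give up on shutdown."""
    for attempt in range(CHECKPOINT_RETRIES):
        try:
            checkpointer.checkpoint(sequence_number, sub_sequence_number)
            return
        except kcl.CheckpointError as e:
            if e.value == 'ShutdownException':
                # Another worker owns the lease now; checkpointing is futile.
                return
            elif e.value == 'ThrottlingException':
                # The checkpoint table throttled the write; back off and retry.
                pass
            elif e.value == 'InvalidStateException':
                # The daemon reported an internal problem; retrying may help.
                pass
        time.sleep(SLEEP_SECONDS)
```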
Requires a Java installation and downloading the MultiLangDaemon jars via setup commands, with environment variables such as KCL_MVN_REPO_SEARCH_URL, adding initial configuration overhead.
Tightly coupled to Amazon Kinesis and AWS services such as DynamoDB for checkpointing, limiting portability and deepening dependence on the AWS ecosystem.
The release notes record breaking changes, such as dependencies incompatible with JDK 8 in version 3.0.2, so upgrades require careful migration planning.
Client library for Amazon Kinesis
Amazon Kinesis Producer Library
The Kinesis Scaling Utility is designed to give you the ability to scale Amazon Kinesis Streams in the same way that you scale EC2 Auto Scaling groups – up or down by a count or as a percentage of the total fleet. You can also simply scale to an exact number of Shards. There is no requirement for you to manage the allocation of the keyspace to Shards when using this API, as it is done automatically.
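The utility itself is Java and configuration-driven, but for orientation, scaling to an exact shard count corresponds to the UpdateShardCount API, shown here via boto3; the stream name and target count are hypothetical.

```python
import boto3

# boto3 Kinesis client; region and credentials come from the environment.
kinesis = boto3.client("kinesis")

# Scale the stream to an exact shard count. UNIFORM_SCALING splits the
# hash keyspace evenly across shards, so no manual keyspace allocation
# is needed (the scaling utility automates this same behavior).
kinesis.update_shard_count(
    StreamName="my-stream",       # hypothetical stream name
    TargetShardCount=8,           # hypothetical target
    ScalingType="UNIFORM_SCALING",
)
```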
The Amazon Kinesis Connector Library is a Java framework that simplifies the integration of Amazon Kinesis data streams with various storage and analytics services. It provides a structured pipeline for processing, transforming, and emitting streaming data to destinations such as DynamoDB, Redshift, S3, and Elasticsearch, enabling real-time data workflows.

## Key Features

- **Modular Pipeline** — Implements interfaces for transformation, filtering, buffering, and emission to define custom data flows.
- **Pre-built Connectors** — Includes ready-to-use connectors for AWS DynamoDB, Redshift, S3, and Elasticsearch.
- **Batch Processing** — Buffers records based on configurable thresholds (count, size, time) for efficient batch writes.
- **Custom Transformations** — Supports user-defined data models and serializers via the ITransformer interface.
- **Sample Implementations** — Provides complete sample applications with Ant/Maven build files for each connector type.

## Philosophy

The library emphasizes a decoupled, extensible architecture where developers can plug in custom logic for each stage of the data pipeline, promoting flexibility and reuse in stream processing applications.