Question 1

How to set up Kreuzberg for OCR on a Python project?

Accepted Answer

Install the kreuzberg package from PyPI, then set up an OCR backend such as Tesseract or PaddleOCR separately. The README specifies that the Python binding supports Tesseract, PaddleOCR, and EasyOCR, but each backend requires its own installation and setup.
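
Because each backend is installed separately, it can help to check what is actually present before configuring anything. The sketch below is only an environment check using the Python standard library; it does not call Kreuzberg itself. It assumes the Tesseract binary is named tesseract on PATH and that PaddleOCR and EasyOCR are importable as paddleocr and easyocr. The function name available_ocr_backends is made up for this example.

```python
import importlib.util
import shutil


def available_ocr_backends() -> dict[str, bool]:
    """Report which OCR backends look installed in this environment."""
    return {
        # Tesseract is a system binary, so look for it on PATH (assumed name).
        "tesseract": shutil.which("tesseract") is not None,
        # PaddleOCR and EasyOCR are Python packages; only check importability.
        "paddleocr": importlib.util.find_spec("paddleocr") is not None,
        "easyocr": importlib.util.find_spec("easyocr") is not None,
    }


if __name__ == "__main__":
    installed = importlib.util.find_spec("kreuzberg") is not None
    print("kreuzberg installed:", installed)
    for name, found in available_ocr_backends().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A backend reported as missing here still has to be installed through its own instructions; the check only tells you which ones are worth configuring.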

Question 2

Kreuzberg vs. Apache Tika for document processing?

Accepted Answer

Kreuzberg offers better performance through its Rust core and native bindings, and adds code intelligence; Tika is Java-based and has a simpler server setup. Kreuzberg fits polyglot environments and AI pipelines, while Tika may be easier in pure Java ecosystems.

Question 3

Does Kreuzberg work in serverless functions like AWS Lambda?

Accepted Answer

Yes, with caveats. The WASM binding targets browsers and Cloudflare Workers, while the native bindings ship large binaries that may exceed a platform's deployment size limits. The platform table lists WASM support, but performance and feature parity depend on the environment.
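
To see whether a native build fits, you can measure the unpacked deployment directory before uploading. The sketch below is a generic size check, not part of Kreuzberg. It assumes AWS Lambda's 250 MB quota for an unzipped .zip deployment package; container images have a separate, larger limit. The helper names directory_size and fits_in_lambda are hypothetical.

```python
import os

# AWS Lambda quota for an unzipped .zip deployment package (code plus layers).
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size(path: str) -> int:
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def fits_in_lambda(path: str, limit: int = LAMBDA_UNZIPPED_LIMIT_BYTES) -> bool:
    """Return True if the directory is within the given size limit."""
    return directory_size(path) <= limit
```

Point it at the directory you would zip, for example the folder that holds your handler plus its installed dependencies.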

Question 4

How to extract functions from a JavaScript file using Kreuzberg?

Accepted Answer

Use the code intelligence feature with tree-sitter; results include functions, classes, and docstrings in ExtractionResult.code_intelligence. The README notes this works for 248 languages, including JavaScript, with semantic chunking.

Question 5

What's the difference between Kreuzberg's CLI and REST API?

Accepted Answer

The CLI is for batch processing and local use, while the REST API enables microservices deployment. Both are built from the same Rust core, but the API adds HTTP overhead for remote access.

Question 6

Can Kreuzberg handle password-protected PDFs?

Accepted Answer

Yes. It supports PDFs encrypted with RC4 or AES, and accepts either a single password or several passwords to try in turn. The README mentions this under key features with a link to configuration details.
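
The "multiple password attempts" idea is simply trying candidates in order until one opens the file. The sketch below illustrates that pattern in plain Python; it is not Kreuzberg's API, and the real configuration option is described in the linked docs. The names open_with_passwords and WrongPassword are hypothetical, and the opener is any callable you supply that raises WrongPassword on a rejected password.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


class WrongPassword(Exception):
    """Hypothetical error raised by the opener when a password is rejected."""


def open_with_passwords(
    opener: Callable[[str], T], candidates: Iterable[str]
) -> Optional[T]:
    """Try each candidate in order; return the first successful result."""
    for password in candidates:
        try:
            return opener(password)
        except WrongPassword:
            continue  # Rejected, so move on to the next candidate.
    return None  # No candidate worked.


def fake_opener(password: str) -> str:
    """Stand-in for a real PDF opener, used only to demonstrate the loop."""
    if password != "s3cret":
        raise WrongPassword(password)
    return "document contents"


if __name__ == "__main__":
    print(open_with_passwords(fake_opener, ["guess", "s3cret"]))
```

Any other exception from the opener is left to propagate, so a corrupt file is not mistaken for a wrong password.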

Question 7

Is Kreuzberg's OCR good for handwritten text?

Accepted Answer

It depends on the backend. Tesseract and PaddleOCR vary in accuracy on handwriting, and VLM OCR via LLMs might help but incurs API costs. The README lists multiple backends but doesn't specify handwriting performance.
