nlg-eval is a Python library for evaluating natural language generation models using multiple unsupervised automated metrics. It computes scores like BLEU, METEOR, ROUGE, and CIDEr by comparing generated text (hypotheses) against reference texts, helping researchers assess model performance quantitatively.
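A minimal sketch of the functional API, following the usage shown in the project README; the sentences below are illustrative placeholders, and the exact metric keys in the returned dict may vary by version:

```python
from nlgeval import compute_individual_metrics

hypothesis = "the cat sat on the mat"
references = [
    "a cat was sitting on the mat",
    "the cat is on the mat",
]

# Returns a dict of metric names to float scores for this one hypothesis,
# e.g. 'Bleu_1' through 'Bleu_4', 'METEOR', 'ROUGE_L', and 'CIDEr'.
metrics = compute_individual_metrics(references, hypothesis)
for name, score in metrics.items():
    print(f"{name}: {score:.4f}")
```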
It is aimed at researchers and developers working on natural language generation, machine translation, text summarization, or dialogue systems who need standardized evaluation metrics.
It consolidates multiple NLG metrics into a single, easy-to-use package with flexible APIs, reducing the need to implement each metric separately and ensuring consistent evaluation across projects.
Consolidates established metrics such as BLEU, METEOR, ROUGE, and CIDEr (the full list is in the README) into one package, reducing implementation effort for standardized NLG evaluation.
Offers command-line, functional, and object-oriented Python APIs, supporting both single examples and batch processing for diverse use cases; see the first sketch after this list.
Includes a setup script to automatically download required models and embeddings, simplifying initial configuration.
Allows custom directories for model storage via environment variables, facilitating shared or Dockerized deployments; the second sketch after this list covers this together with the setup step.
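The README documents all three entry points; the sketch below uses the object-oriented API so the heavy models are loaded once and reused across a batch, with the command-line form shown as a comment. The file names and the exact nesting of the batch reference lists reflect my reading of the README rather than a verified contract:

```python
# Command-line form, per the README (one --references flag per reference file):
#   nlg-eval --hypothesis=hyp.txt --references=ref1.txt --references=ref2.txt
from nlgeval import NLGEval

nlgeval = NLGEval()  # loads the models once; the first call is slow

hypotheses = [
    "the cat sat on the mat",
    "a dog barked at the mailman",
]
# One inner list per reference "stream": references[i][j] is the i-th
# reference for the j-th hypothesis.
references = [
    ["a cat was sitting on the mat", "the dog barked at a mailman"],
    ["the cat is on the mat", "a dog was barking at the mail carrier"],
]

# Batch scoring over the whole corpus at once.
corpus_scores = nlgeval.compute_metrics(references, hypotheses)

# Or score a single example against a flat list of its references.
single_scores = nlgeval.compute_individual_metrics(
    ["a cat was sitting on the mat", "the cat is on the mat"],
    "the cat sat on the mat",
)
print(corpus_scores, single_scores)
```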
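For the setup step and the storage option above, here is a sketch of pointing the library at a shared model directory, e.g. inside a Docker image. NLGEVAL_DATA is my assumption of the variable name the library reads, and the path is hypothetical; verify both against the README:

```python
import os

# Hypothetical shared path baked into a Docker image; the variable name
# NLGEVAL_DATA is an assumption, not a verified contract.
os.environ["NLGEVAL_DATA"] = "/opt/nlgeval-data"

# The one-time download of models and embeddings is a separate CLI step, e.g.:
#   nlg-eval --setup
# (recent versions; older releases shipped a setup script instead)

from nlgeval import NLGEval  # import after the variable is set

nlgeval = NLGEval()  # should now resolve models from the shared directory
```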
Requires a Java installation and large file downloads, with known issues on Windows and macOS High Sierra, complicating cross-platform deployment.
Relies on older models such as SkipThoughts and GloVe, which date from 2017 or earlier and may not reflect current best practices for text embeddings.
The maintainers acknowledge that scores can come out as zero on small datasets, and reliable results may require patches from external repositories.