Question 1

How do I run compare-mt on my own machine translation outputs?

Accepted Answer

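Install the package (the README uses pip for the requirements and setup.py for the package itself), then run the compare-mt command with the reference file followed by one or more system output files. The basic example in the README compares two Slovak-English systems and reports analyses such as word accuracy and n-gram differences; when given an output directory, it writes them to an HTML report.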
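
As a rough sketch of that invocation, the Python snippet below assembles the command line rather than running it. The file paths follow the README's TED example as I recall it, and the `--output_directory` flag is likewise taken from memory of the README, so treat both as assumptions and confirm with `compare-mt --help`. The helper name `build_compare_mt_command` is made up for illustration.

```python
import shlex


def build_compare_mt_command(ref_file, system_files, output_directory=None):
    """Return the argv list for a basic compare-mt run (illustrative helper)."""
    if not system_files:
        raise ValueError("at least one system output file is required")
    command = ["compare-mt"]
    if output_directory is not None:
        # Assumed flag: directory that receives the HTML report.
        command += ["--output_directory", output_directory]
    command.append(ref_file)      # the reference file comes first
    command.extend(system_files)  # then one output file per system
    return command


# Paths assumed to match the example files shipped with the repository.
command = build_compare_mt_command(
    "example/ted.ref.eng",
    ["example/ted.sys1.eng", "example/ted.sys2.eng"],
    output_directory="output/",
)
print(shlex.join(command))
```

Once compare-mt is installed, the same list can be handed to `subprocess.run(command, check=True)`; swap in your own reference and output files in place of the example paths.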

Question 2

compare-mt vs MT-ComparEval: which is better for error analysis?

Accepted Answer

compare-mt is stronger at aggregate statistical analysis and command-line automation for batch processing, while MT-ComparEval is better suited to interactive visualization of individual examples. Choose compare-mt when you want automated, corpus-level insights and MT-ComparEval when you want to inspect outputs by hand; the comparison in the README draws the same distinction.

Question 3

Can compare-mt handle summarization system evaluation?

Accepted Answer

Yes. compare-mt supports ROUGE, so it can score summarization output as well as translation. An example in the README compares two summarization systems using the score types rouge1, rouge2, and rougeL, which makes the analysis specific to the summarization task.
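
A minimal sketch of how such a command might be put together is shown below. It assumes the score types are passed through a `--compare_scores` option, one `score_type=...` value per metric, which is how I remember the README doing it; the file names and the helper `build_rouge_command` are hypothetical.

```python
import shlex


def build_rouge_command(ref_file, system_files,
                        rouge_variants=("rouge1", "rouge2", "rougeL")):
    """Return the argv list for a compare-mt run scored with ROUGE (illustrative)."""
    # Files go first; the score option is assumed to accept several values.
    command = ["compare-mt", ref_file, *system_files]
    command.append("--compare_scores")  # assumed flag name
    command.extend(f"score_type={variant}" for variant in rouge_variants)
    return command


# Hypothetical file names, assumed to be line-aligned with each other.
rouge_command = build_rouge_command(
    "summaries.ref.txt",
    ["summaries.sys1.txt", "summaries.sys2.txt"],
)
print(shlex.join(rouge_command))
```

Placing the files before the score option is a deliberate choice here: if the option takes a variable number of values, trailing file names could otherwise be read as extra score specifications.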

Question 4

What do I need to use COMET with compare-mt?

Accepted Answer

Install unbabel-comet via pip, then pass the source file along with the reference and system outputs and select score_type=comet in the command. The README notes that COMET is built on XLM-R and is GPU-intensive, so a GPU is strongly recommended for reasonable run times.
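
The steps above could be sketched roughly as follows. The `--src_file` and `--compare_scores` flag names and the `example/ted.orig.slk` path are recalled from the README rather than taken from this answer, so verify them against `compare-mt --help`. The helper names are invented for illustration, and the check only tests whether the `comet` module (the import name used by unbabel-comet) can be found; it does not test for a GPU.

```python
import importlib.util
import shlex


def comet_is_installed():
    """Report whether the `comet` module from unbabel-comet can be found."""
    return importlib.util.find_spec("comet") is not None


def build_comet_command(src_file, ref_file, system_files):
    """Return the argv list for a compare-mt run scored with COMET (illustrative)."""
    command = ["compare-mt", ref_file, *system_files]
    command += ["--src_file", src_file]                  # assumed flag name
    command += ["--compare_scores", "score_type=comet"]  # assumed flag name
    return command


# Paths assumed to match the repository's example directory.
comet_command = build_comet_command(
    "example/ted.orig.slk",
    "example/ted.ref.eng",
    ["example/ted.sys1.eng", "example/ted.sys2.eng"],
)

if not comet_is_installed():
    print("unbabel-comet not found; install it with: pip install unbabel-comet")
print(shlex.join(comet_command))
```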

Question 5

How do I analyze word likelihoods from language models with compare-mt?

Accepted Answer

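Use the compare-ll script, passing it the likelihood files for each system and choosing how words are grouped with options such as bucket_type=freq or bucket_type=label. The README gives examples that compare word-level log likelihoods across systems; the same analysis applies to language models, showing which kinds of words each model predicts well or poorly.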
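
For illustration, the sketch below builds a compare-ll command with frequency bucketing. The `--ref`, `--ll-files`, and `--compare-word-likelihoods` flag names are as I remember them from the README and should be checked with `compare-ll --help`. The file names and both helper functions are hypothetical.

```python
import shlex


def format_options(**options):
    """Join keyword options into a comma-separated key=value string."""
    return ",".join(f"{key}={value}" for key, value in options.items())


def build_compare_ll_command(ref_file, likelihood_files, **bucket_options):
    """Return the argv list for a compare-ll run (illustrative helper)."""
    # Flag names below are assumptions recalled from the README.
    command = ["compare-ll", "--ref", ref_file, "--ll-files", *likelihood_files]
    command += ["--compare-word-likelihoods", format_options(**bucket_options)]
    return command


# Hypothetical file names: one likelihood file per system being compared.
ll_command = build_compare_ll_command(
    "test.txt",
    ["sys1.likelihood", "sys2.likelihood"],
    bucket_type="freq",
)
print(shlex.join(ll_command))
```

Switching to `bucket_type="label"` would most likely need further key=value options naming the label file; the README examples are the place to confirm which ones.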
