Conversation
Pull Request Overview
Adds a new translation evaluation feature and updates import paths to the new translation submodule.
- Updates test imports to use `ml_filter.translation.translate`.
- Introduces `translation_evaluation.py` with functions to load references, prepare inputs, and compute COMET scores.
- Extends the CLI (`__main__.py`) with an `evaluate_translations` command.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_translate.py | Updated import path for Translator. |
| tests/conftest.py | Updated import paths for translation clients and Translator. |
| src/ml_filter/translation/translation_evaluation.py | Added new module for reference loading, input prep, and COMET evaluation. |
| src/ml_filter/__main__.py | Imported evaluate_translations and added a CLI subcommand for translation evaluation. |
Comments suppressed due to low confidence (1)
src/ml_filter/translation/translation_evaluation.py:33
The docstring for `_prepare_translation_input` lists a `lang` parameter that isn't in the signature. Please update the docstring to match the actual parameters.

```python
lang: Language code.
```
src/ml_filter/__main__.py
| @click.option("--gold-path", required=True, help="Path to gold reference JSONL file") | ||
| @click.option("--model-name", default="Unbabel/wmt22-cometkiwi-da", help="COMET model to use") | ||
| @click.option("--languages", type=str, required=True, help="Comma-separated list of supported language codes") | ||
| @click.option("--batch-size", help="Batch size for processing translations") |
CLI option --batch-size has no type specified; it will be parsed as a string. Consider adding type=int to ensure batch_size is passed as an integer.
| @click.option("--batch-size", help="Batch size for processing translations") | |
| @click.option("--batch-size", type=int, help="Batch size for processing translations") |
```python
for filename in os.listdir(data_dir):
    if filename.endswith(".jsonl"):
        file_path = os.path.join(data_dir, filename)
        lang = filename.split("_")[5]
```
Splitting by underscore and accessing index 5 is brittle and may raise IndexError for unexpected filenames. Add a check on segment length or use a more robust parsing method.
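A hedged sketch of a more defensive variant; the expected segment count and the position of the language code are assumptions, since the filename convention is not visible in this diff:

```python
parts = filename.split("_")
if len(parts) < 6:  # assumed minimum so that index 5 exists; adjust to the real convention
    logging.warning("Skipping %s: filename does not match the expected convention", filename)
    continue
lang = parts[5]
```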
le1nux left a comment
Functionality-wise I think it is all correct. There are a few stylistic issues.
I did not get to review the plotting functions themselves yet.
```python
    batch_size: int,
    model_name: str = "Unbabel/wmt22-cometkiwi-da",
) -> None:
    """Evaluate translation quality for a set of files using a COMET model.
```
Maybe explain how we actually want to evaluate it? Like 2-3 sentences pitching the idea.
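For illustration, a hedged sketch of what those sentences could say; the wording would need to be adjusted to how the evaluation actually works, which this diff does not fully show:

```python
"""Evaluate translation quality for a set of files using a COMET model.

Sketch of the idea: each translated segment is scored together with its source
text by a pretrained COMET model (e.g. Unbabel/wmt22-cometkiwi-da, a quality
estimation model that does not need a human reference). Scores are aggregated
per language so that systematically weak language pairs stand out, instead of
relying on manual spot checks.
"""
```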
```python
    target_texts = []
    with open(file_path, "r") as f:
        for line_num, line in enumerate(f, 1):
            if not line:
```
`line` is never a boolean, but it is treated here like one. It also works with `None`, but in my opinion it is bad style.
Suggested change:

```diff
- if not line:
+ if line is None:
```
should we maybe even raise an exception here?
line will never be None since it is a string.
The current solution checks for empty strings.
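One way to make the empty-line check explicit, as a sketch (whether whitespace-only lines should also be skipped is a separate decision):

```python
for line_num, line in enumerate(f, 1):
    # `line` is always a str here, never None; skip blank or whitespace-only lines.
    if not line.strip():
        continue
```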
```python
    languages: str,
    batch_size: int,
):
    """CLI entry point for evaluating translation quality."""
```
We should document that the files in data_dir need to follow a certain convention.
```python
target_texts = _prepare_translation_input(file_path, gold_dict)

if target_texts:
```
Suggested change:

```diff
- if target_texts:
+ if len(target_texts) > 0:
```
```python
            except json.JSONDecodeError as e:
                logging.warning(f"Skipping invalid line {line_num} in {file_path}: {e}")
                continue
    return target_texts
```
We should add a warning if len(target_texts) != len(gold_dict).
If some languages miss certain translations, it introduces biases.
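A hedged sketch of that warning, reusing the names from the snippets above:

```python
if len(target_texts) != len(gold_dict):
    logging.warning(
        "Matched %d of %d gold references in %s; missing translations can bias per-language comparisons.",
        len(target_texts),
        len(gold_dict),
        file_path,
    )
```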
```python
# Step 3: Count translation scores
for sample_id, trans_score in id_to_translation_score.items():
    if sample_id in id_to_gt_quality_score:
```
what if this is not the case? Should we issue a warning?
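A hedged sketch of such a warning, assuming the loop from the snippet above and the standard logging module:

```python
for sample_id, trans_score in id_to_translation_score.items():
    if sample_id in id_to_gt_quality_score:
        ...  # existing counting logic
    else:
        # No ground-truth score for this sample: make the mismatch visible instead of silently skipping it.
        logging.warning("Sample %s has a translation score but no ground-truth quality score.", sample_id)
```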
```python
if sample_id in id_to_gt_quality_score:
    gt_score = float(id_to_gt_quality_score[sample_id])
    trans_score = trans_score.lower()
    if trans_score in score_classes:
```
| if filename.endswith(".json"): | ||
| parts = filename.split("_") | ||
| if len(parts) != 8: | ||
| continue |
```python
    languages: list[str],
    output_dir: Path,
) -> None:
    """Plot histograms for translation quality results.
```
we should also mention the convention of the file names in data_dir here
```python
    output_path=output_path,
)

output_path = os.path.join(output_dir, f"{lang}_translation_quality_vs_gt_histogram.png")
```
If we want this for the paper, I would always plot as PDF
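A minimal sketch of that change, assuming the histograms are written with matplotlib (`fig` is a placeholder for the actual figure object):

```python
# Vector output for the paper: matplotlib infers the format from the file extension.
output_path = os.path.join(output_dir, f"{lang}_translation_quality_vs_gt_histogram.pdf")
fig.savefig(output_path)  # writes a PDF when the path ends in .pdf
```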