Apertium - Phrase-based MT system

by Himanshu Choudhary


Previously developed machine translation systems for Sumerian suffer from data sparsity because Sumerian is a low-resource language. Advanced translation techniques work better when more data is available, whereas for low-resource languages rule-based (symbolic) machine translation is a good alternative that can improve accuracy. The objective is to build a Sumerian-English machine translation system using Apertium, a widely used machine translation platform. The project also compares the translation accuracy of the neural network based and rule-based machine translation systems.

The Phrase Based Machine Translation for Sumerian-English can be found here -

The Sumerian and English morphological dictionaries can be found here -

Previous Work


The SOTA neural network based translation system can be found here -

The complete integrated translation pipeline, with POS and NER taggers, can be found here -

Objectives and Deliverables


:heavy_check_mark: –> Completed Tasks :white_check_mark: –> Ongoing Tasks

| # | Status | Objectives | Associated Deliverables | Issue(s) |
|---|---|---|---|---|
| 1 | :heavy_check_mark: | Creating the sux (Sumerian) morphological analyser | Morphological dictionary with default tags for the Sumerian morphological representation; basic morphological analyser | |
| 2 | :heavy_check_mark: | Updating the previous English morphological dictionary with Sumerian words | English morphological dictionary for translating Sumerian words and/or handling bilingual dictionary results | |
| 3 | :heavy_check_mark: | The bilingual dictionary and rules for Sumerian to English translation | The sux-eng.dix file and the .rtx file containing the (basic) transfer rules | |
| 4 | :heavy_check_mark: | Integrated Apertium sux pipeline and testing | The integrated machine translation pipeline, plus a test comparison between the NN-based model (to be added later) and the rule-based model | |
| 5 | :heavy_check_mark: | Updating the transfer rules and SVO reordering, Sumerian to English | The final, compact, updated transfer rules, with post-processing if required | |
| 6 | :heavy_check_mark: | Integrated pipeline and NMT comparison | A notebook interface for trying out both the neural network and rule-based engines (on a sentence or a file), with comparison | |
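The SVO reordering in objective 5 can be pictured with a toy sketch. Sumerian clauses are typically verb-final (subject, object, verb), while English places the verb between subject and object. The snippet below is only an illustration of that word-order change: in the project itself the reordering is expressed as transfer rules in the .rtx file, not in Python, and the `Chunk` type, the role labels `"S"`, `"O"`, `"V"`, and the function name are hypothetical names introduced here.

```python
from typing import List, Tuple

# A chunk is (already-translated text, grammatical role).
# The roles "S", "O", "V" are illustrative labels, not Apertium tags.
Chunk = Tuple[str, str]

# Position of each role in the English (SVO) target order.
TARGET_ORDER = {"S": 0, "V": 1, "O": 2}


def reorder_sov_to_svo(chunks: List[Chunk]) -> List[Chunk]:
    """Reorder clause chunks into subject-verb-object order.

    sorted() is stable, so chunks that share a role keep their relative
    order; chunks with an unknown role are placed after the object.
    """
    return sorted(chunks, key=lambda chunk: TARGET_ORDER.get(chunk[1], len(TARGET_ORDER)))


# Source-order clause: subject, object, verb (verb-final).
clause = [("the king", "S"), ("the temple", "O"), ("built", "V")]
print(" ".join(text for text, _ in reorder_sov_to_svo(clause)))
# prints "the king built the temple"
```

A real transfer rule also has to handle agreement, case markers, and nested phrases, which this sketch deliberately ignores.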

Tentative timeline

:heavy_check_mark: –> Completed Tasks :white_check_mark: –> Ongoing Tasks :raised_hands: –> Work Demonstration

| Week | Objectives | Deliverables |
|---|---|---|
| 1 - 2 | :heavy_check_mark: Learning Apertium (the RBMT toolkit) | :heavy_check_mark: The basic functionality needed to use Apertium |
| 3 - 4 | :heavy_check_mark: Learning Sumerian morphology | :heavy_check_mark: Basic grammar of Sumerian and translations |
| 5 - 8 | :heavy_check_mark: Creating the basic Sumerian analyser, bilingual dictionary, and transfer rules, and testing | :heavy_check_mark: The integrated sux-eng translation pipeline |
| 9 - 11 | :heavy_check_mark: Updating the transfer rules (verbal and noun phrases) and SVO reordering | :heavy_check_mark: Improved machine translation results |
| 11 - 15 | :heavy_check_mark: Working on the integrated pipeline and the neural network based comparison | :heavy_check_mark: Final robust pipeline and model comparisons |

Results & Evaluation


Both systems are evaluated with the BLEU metric on the dev set:

| Machine Translation System | Mean | Median |
|---|---|---|
| Rule based | 19.156 | 20.4517 |
| Neural network | 18.868 | 6.881 |

Mean and median BLEU scores for the rule-based and NMT engines, with n-gram weights of (0.75, 0.25, 0, 0).
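To make the weighting concrete, here is a minimal sketch of a sentence-level BLEU score with weights (0.75, 0.25, 0, 0), meaning that only unigram and bigram precision contribute. It is not the project's evaluation script, which is not shown on this page. The sketch assumes whitespace tokenization, a single reference per sentence, no smoothing, and a 0-100 scale; the function names and the example sentences are made up for illustration.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, weights=(0.75, 0.25, 0.0, 0.0)):
    """Sentence-level BLEU on a 0-100 scale against a single reference.

    Assumptions of this sketch: whitespace tokenization and no smoothing,
    so a weighted n-gram order with zero matches gives a score of 0.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_sum = 0.0
    for n, weight in enumerate(weights, start=1):
        if weight == 0:
            continue  # orders with zero weight do not affect the score
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_sum += weight * math.log(matched / total)
    # Brevity penalty for hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * brevity * math.exp(log_sum)


# Illustrative sentences, not taken from the dev set.
reference = "the king built the temple"
hypothesis = "the king built a temple"
print(round(sentence_bleu(reference, hypothesis), 2))
```

Scoring every dev-set sentence this way yields one score per sentence, from which a mean and a median can be taken.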

Note - The BLEU scores show that the mean is nearly the same for the rule-based and NMT engines, but the median is much higher for the rule-based engine (20.45 versus 6.88). This indicates that the rule-based engine translates most sentences moderately well, whereas the NMT engine performs very well on some sentences and poorly on the rest.
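A small sketch shows how two systems can share a mean yet differ sharply in median. The per-sentence scores below are made-up numbers chosen for illustration, not the project's data.

```python
from statistics import mean, median

# Made-up per-sentence scores for illustration only (NOT the project's results).
consistent = [18.0, 19.0, 20.0, 21.0, 22.0]  # every sentence is middling
skewed = [2.0, 4.0, 6.0, 8.0, 80.0]          # mostly poor, one near-perfect

for name, scores in (("consistent", consistent), ("skewed", skewed)):
    print(f"{name}: mean={mean(scores):.1f} median={median(scores):.1f}")
# prints:
# consistent: mean=20.0 median=20.0
# skewed: mean=20.0 median=6.0
```

A few very high scores pull the mean up without moving the median, which is why reporting both gives a better picture of per-sentence quality.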


Week 0
Week 1
Week 2
Week 3
Week 4
Week 5
Week 6
Week 7
Eval 1
Week 8
Week 9
Week 10
Week 11
Week 12
Week 13
Week 14
Week 15
Eval 2