CDLI-Blog | Apertium - Phrase-based MT system

Apertium - Phrase-based MT system

Previously developed machine translation systems for Sumerian suffer from data sparsity, because Sumerian is a low-resource language. Advanced translation techniques work better when more data is available; for low-resource languages, rule-based (symbolic) machine translation is a good alternative that can be used to improve accuracy. The objective is to build a Sumerian-English machine translation system using Apertium, a widely used machine translation platform. The project also aims to compare the translation accuracy of the neural-network-based and rule-based machine translation systems.

The phrase-based machine translation system for Sumerian-English can be found here -

apertium-sux-eng

The Sumerian and English morph dictionaries can be found here -

Previous Work

The SOTA neural-network-based translation system can be found here -

Semi-Supervised-NMT-for-Sumerian-English

The whole integrated translation pipeline, with POS and NER taggers, can be found here -

Sumerian-Translation-Pipeline

Objectives and Deliverables

–> Completed Tasks –> Ongoing Tasks

#	Objectives	Associated Deliverables
1	Creating the sux (Sumerian) morphological analyser	Morphological dict with default tags for Sumerian morph representation; basic morph analyzer
2	Updating the previous eng morph dict with Sumerian words	English morphological dict for translating Sumerian words and/or handling bi-dictionary results
3	The bilingual dictionary and rules for Sumerian-to-English translation	The sux-eng.dix file and the .rtx file containing the (basic) transfer rules
4	Integrated Apertium sux pipeline and testing	The integrated machine translation pipeline and a test comparison between the NN-based model (to be added later) and the rule-based model
5	Updating transfer rules and SVO reordering, Sumerian to English	The final compact, updated transfer rules, with post-processing if required
6	Integrated pipeline and NMT comparison	A notebook interface to try out both the neural network and rule-based engines (on a sentence or a file), with comparison
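
To make deliverable 3 more concrete, here is a minimal Python sketch of what a single bilingual dictionary entry looks like. It assumes the standard Apertium .dix layout (an `e` entry holding a `p` pair with left `l` and right `r` sides, each carrying an `s` tag). The lemma pair and the `n` (noun) tag are illustrative only and are not copied from the project's sux-eng.dix file; `bidix_entry` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def bidix_entry(source_lemma: str, target_lemma: str, pos_tag: str) -> ET.Element:
    """Build one <e> entry in the usual Apertium bilingual-dictionary shape.

    Hypothetical helper for illustration; real entries are normally written
    by hand in the .dix file.
    """
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    # <l> is the source-language side, <r> the target-language side.
    for side, lemma in (("l", source_lemma), ("r", target_lemma)):
        node = ET.SubElement(pair, side)
        node.text = lemma
        ET.SubElement(node, "s", {"n": pos_tag})  # part-of-speech tag
    return entry


if __name__ == "__main__":
    # Illustrative pair: Sumerian "lugal" -> English "king", tagged as a noun.
    print(ET.tostring(bidix_entry("lugal", "king", "n"), encoding="unicode"))
```

The transfer rules in the .rtx file then operate on the tagged lemmas that such entries produce.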

The related files

English morph dict - https://github.com/apertium/apertium-eng/blob/main/apertium-eng.eng.dix
Sumerian morph dict - https://github.com/cdli-gh/apertium-sux/blob/main/apertium-sux.sux.lexd
Sumerian-English bi-dict - https://github.com/cdli-gh/apertium-sux-eng/blob/main/apertium-sux-eng.sux-eng.dix
Sumerian-English rule transfer - https://github.com/cdli-gh/apertium-sux-eng/blob/main/apertium-sux-eng.sux-eng.rtx
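
Once these files are compiled into a language pair, the translator can be called from Python. The sketch below wraps the `apertium` command-line tool with its `-d` option (the directory holding the compiled pair). The mode name `sux-eng` and the directory name `apertium-sux-eng` are assumptions based on the repository name, and `build_apertium_command` and `translate` are hypothetical helper names.

```python
import subprocess
from typing import List


def build_apertium_command(data_dir: str, mode: str = "sux-eng") -> List[str]:
    """Return the argv list for running an Apertium mode from a local directory.

    The default mode name is an assumption derived from the repository name.
    """
    return ["apertium", "-d", data_dir, mode]


def translate(text: str, data_dir: str, mode: str = "sux-eng") -> str:
    """Pipe text through Apertium; requires apertium and a compiled pair."""
    result = subprocess.run(
        build_apertium_command(data_dir, mode),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise if apertium exits with an error
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Only prints the command, so it runs without Apertium installed.
    print(" ".join(build_apertium_command("apertium-sux-eng")))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact invocation before running it on a whole file.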

Tentative timeline

–> Completed Tasks –> Ongoing Tasks –> Work Demonstration

Week	Objectives	Deliverables
1 - 2	Learning Apertium (the RBMT toolkit)	The basic functionality needed to use Apertium
3 - 4	Learning Sumerian morphology	Basic grammar of Sumerian and translations
5 - 8	Creating the basic Sumerian analyzer, bilingual dict, transfer rules, and testing	The integrated sux-eng translation pipeline
9 - 11	Updating transfer rules (verbal and noun phrases) and SVO reordering	Improved machine translation results
11 - 15	Working on the integrated pipeline and neural-network-based comparison	Final robust pipeline and model comparisons
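
The SVO reordering step exists because Sumerian clauses are verb-final (subject, object, verb), while English needs the verb between subject and object. The toy Python sketch below shows only the idea, on chunks that have already been translated and labelled. It is not the .rtx rule syntax used in the project, and the role labels and the `sov_to_svo` name are made up for illustration.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]  # (role label, translated text); labels are hypothetical

# Target position of each role in an English clause; unknown roles go last.
ENGLISH_ORDER = {"SUBJ": 0, "VERB": 1, "OBJ": 2}


def sov_to_svo(chunks: List[Chunk]) -> List[Chunk]:
    """Reorder the chunks of one verb-final clause into subject-verb-object.

    sorted() is stable, so chunks with the same role keep their original order.
    """
    return sorted(chunks, key=lambda chunk: ENGLISH_ORDER.get(chunk[0], 3))


if __name__ == "__main__":
    clause = [("SUBJ", "the king"), ("OBJ", "the temple"), ("VERB", "built")]
    print(" ".join(text for _, text in sov_to_svo(clause)))
```

Real transfer rules also have to deal with case markers, verbal prefix chains, and nested noun phrases, which this sketch ignores.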

Results & Evaluation

The evaluation is done using the BLEU metric on the dev set as mentioned above -

Machine Translation System	Mean	Median
rule based	19.156	20.4517
neural network	18.868	6.881

Mean and median BLEU scores for the rule-based and NMT engines, computed with n-gram weights of (0.75, 0.25, 0, 0), i.e. unigrams and bigrams only

Note - As the BLEU scores show, the mean scores of the rule-based and NMT engines are nearly the same, but the median is much higher for the rule-based engine. This indicates that the rule-based engine translates most sentences reasonably well, whereas the NMT engine does very well on some sentences and poorly on the rest.
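
For readers who want to see how such numbers can be produced, here is a minimal pure-Python sketch of sentence-level BLEU with the (0.75, 0.25, 0, 0) weights, followed by the mean and median over a set of sentences. It assumes one reference per sentence, whitespace tokenisation, and no smoothing. The post does not say which BLEU implementation was used, so the scores from this sketch may not match the table exactly, and `sentence_bleu` and `summarize` are hypothetical names.

```python
import math
from collections import Counter
from statistics import mean, median
from typing import List, Sequence, Tuple

WEIGHTS = (0.75, 0.25, 0.0, 0.0)  # weights for 1-grams to 4-grams


def ngrams(tokens: Sequence[str], n: int) -> List[tuple]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(reference: Sequence[str], hypothesis: Sequence[str],
                  weights: Sequence[float] = WEIGHTS) -> float:
    """Single-reference sentence BLEU on a 0-100 scale, without smoothing."""
    if not hypothesis:
        return 0.0
    log_score = 0.0
    for n, weight in enumerate(weights, start=1):
        if weight == 0:
            continue  # orders with zero weight do not contribute
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_score += weight * math.log(clipped / total)
    # Brevity penalty: penalise hypotheses shorter than the reference.
    ref_len, hyp_len = len(reference), len(hypothesis)
    penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * penalty * math.exp(log_score)


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, median) of per-sentence scores."""
    return mean(scores), median(scores)


if __name__ == "__main__":
    reference = "the king built the temple".split()
    outputs = ["the king built the temple", "the king built a house", "a b c"]
    scores = [sentence_bleu(reference, out.split()) for out in outputs]
    print(summarize(scores))
```

A system that scores very high on a few sentences and near zero on the rest can reach the same mean as a system with uniformly moderate scores, while its median stays low, which is the pattern described in the note above.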
