Week 0
Week 1
Week 2
Week 3
Week 4
Week 5
Week 6
Week 7
Eval 1
Week 8
Week 9
Week 10
Week 11
Week 12
Week 13
Week 14
Week 15
Eval 2
Previously developed machine translation systems for Sumerian suffer from data sparsity because Sumerian is a low-resource language. Advanced translation techniques work better when more data is available, whereas for low-resource languages, rule-based (symbolic) machine translation is a good alternative that can be used to improve accuracy. The objective is to build a Sumerian-English machine translation system using Apertium, one of the widely used machine translation platforms. The project also aims to compare the translation accuracy of the neural-network-based and rule-based machine translation systems.
The phrase-based machine translation system for Sumerian-English can be found here -
The Sumerian and English morph dictionaries can be found here -
The SOTA neural-network-based translation system can be found here -
The whole integrated translation pipeline with POS and NER taggers can be found here -
–> Completed Tasks –> Ongoing Tasks
# | Status | Objectives | Associated Deliverables | Issue(s) |
---|---|---|---|---|
1 | | Creating the sux (Sumerian) morphological analyser | Morphological dictionary with default tags for Sumerian morph representation; basic morph analyzer | |
2 | | Updating the previous English morph dictionary with Sumerian words | English morphological dictionary for translating Sumerian words and/or handling bilingual dictionary results | |
3 | | Bilingual dictionary and rules for Sumerian-to-English translation | The sux-eng.dix file and the .rtx file containing the (basic) transfer rules | |
4 | | Integrated Apertium sux pipeline and testing | The integrated machine translation pipeline and a test comparison between the NN-based model (to be added later) and the rule-based model | |
5 | | Updating transfer rules and SVO reordering, Sumerian to English | The final, compact, updated transfer rules, with post-processing if required | |
6 | | Integrated pipeline and NMT comparison | A notebook interface for trying out both the neural network and rule-based engines (on a sentence or a file), with a comparison | |
English morph dict - https://github.com/apertium/apertium-eng/blob/main/apertium-eng.eng.dix
Sumerian morph dict - https://github.com/cdli-gh/apertium-sux/blob/main/apertium-sux.sux.lexd
Sumerian-English bi-dict - https://github.com/cdli-gh/apertium-sux-eng/blob/main/apertium-sux-eng.sux-eng.dix
Sumerian-English transfer rules - https://github.com/cdli-gh/apertium-sux-eng/blob/main/apertium-sux-eng.sux-eng.rtx
–> Completed Tasks –> Ongoing Tasks –> Work Demonstration
Week | Objectives | Deliverables |
---|---|---|
1 - 2 | Learning Apertium (the RBMT toolkit) | Basic working knowledge of Apertium |
3 - 4 | Learning Sumerian morphology | Basic grammar of Sumerian and translations |
5 - 8 | Creating the basic Sumerian analyzer, bilingual dictionary, and transfer rules; testing | The integrated sux-eng translation pipeline |
9 - 11 | Updating transfer rules (verbal and noun phrases) and SVO reordering | Improved machine translation results |
11 - 15 | Working on the integrated pipeline and the neural-network-based comparison | Final robust pipeline and model comparisons |
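The SVO reordering step in the plan can be pictured with a small toy sketch. Sumerian clauses are typically verb-final (subject, object, verb), while English expects subject, verb, object. The role labels (`SUBJ`, `OBJ`, `VERB`), the function name, and the English placeholder glosses below are all assumptions made for illustration; the project's actual rules live in the `.rtx` transfer file and are not reproduced here.

```python
def reorder_sov_to_svo(chunks):
    """Move the verb in front of the object in a list of (role, text) chunks.

    Hypothetical helper: roles are illustrative labels, not Apertium tags.
    Clauses without both an object and a verb are returned unchanged.
    """
    roles = [role for role, _ in chunks]
    if "OBJ" not in roles or "VERB" not in roles:
        return list(chunks)

    obj_index = roles.index("OBJ")
    verb_index = roles.index("VERB")
    if verb_index < obj_index:
        # Verb already precedes the object; nothing to do.
        return list(chunks)

    reordered = list(chunks)
    verb = reordered.pop(verb_index)
    reordered.insert(obj_index, verb)
    return reordered


# English glosses stand in for already-translated chunks in source order.
clause = [("SUBJ", "the king"), ("OBJ", "the temple"), ("VERB", "built")]
print(" ".join(text for _, text in reorder_sov_to_svo(clause)))
```

A real transfer rule also has to handle agreement, case markers, and nested phrases, which this sketch deliberately ignores.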
Evaluation uses the BLEU metric on the dev set mentioned above:
Machine Translation System | Mean BLEU | Median BLEU |
---|---|---|
Rule-based | 19.156 | 20.4517 |
Neural network | 18.868 | 6.881 |
Mean and median BLEU scores for the rule-based and NMT engines, computed with n-gram weights of (0.75, 0.25, 0, 0).
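To make the weighting concrete, here is a minimal sketch of sentence-level BLEU with n-gram weights (0.75, 0.25, 0, 0), so that only unigram and bigram precision contribute. It is not the project's evaluation script: the function names, the 0 to 100 scaling, the single-reference setup, and the absence of smoothing are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of the hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    if not hyp_counts:
        return 0.0
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


def sentence_bleu(reference, hypothesis, weights=(0.75, 0.25, 0.0, 0.0)):
    """Unsmoothed single-reference BLEU on a 0-100 scale (illustrative sketch)."""
    if not hypothesis:
        return 0.0
    log_sum = 0.0
    for n, weight in enumerate(weights, start=1):
        if weight == 0:
            continue  # zero-weight orders (here 3- and 4-grams) are ignored
        precision = modified_precision(reference, hypothesis, n)
        if precision == 0:
            return 0.0  # no smoothing: a zero precision zeroes the score
        log_sum += weight * math.log(precision)
    # Brevity penalty punishes hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    return 100.0 * brevity_penalty * math.exp(log_sum)


reference = "the king built the temple".split()
hypothesis = "the king built a temple".split()
print(round(sentence_bleu(reference, hypothesis), 2))
```

Putting most of the weight on unigrams makes the metric more forgiving of word-order errors, which seems a reasonable choice for short sentences from a low-resource language pair.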
Note - The mean BLEU scores are nearly the same for the rule-based and NMT engines, but the median is much higher for the rule-based engine (20.45 vs. 6.88). This indicates that the rule-based engine translates most sentences moderately well, whereas the NMT engine performs very well on some sentences and poorly on the rest.
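The gap between mean and median can be illustrated with a short sketch. The per-sentence scores below are invented purely for illustration and are not the project's results; they only show how one distribution can be consistent while another is skewed by a few very high scores, even though both have the same mean.

```python
from statistics import mean, median


def summarize(scores):
    """Return (mean, median) of a list of per-sentence scores."""
    return mean(scores), median(scores)


# Hypothetical per-sentence BLEU scores, NOT measured values.
consistent_scores = [18.0, 20.0, 21.0, 19.0, 22.0]  # similar quality everywhere
skewed_scores = [2.0, 5.0, 7.0, 6.0, 80.0]          # one excellent, rest poor

print(summarize(consistent_scores))  # mean and median are close
print(summarize(skewed_scores))      # same mean, much lower median
```

When the median sits far below the mean, as in the second list, a handful of strong translations is pulling the average up while the typical sentence scores much lower.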