Week 6: XLM and Pre-Training

by Rachit

project
research
internship
unsupervised
nmt

Week Summary

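This week focused on pre-training with XLM. I cloned Facebook's XLM implementation, studied the code, and heavily modified the data preparation code for Sumerian-English texts. A first pre-training run on 1M Sumerian and 20M English monolingual data (general texts) reached 200 epochs but gave poor results at evaluation, probably because the English data was far out of domain. I then created data_prep_2, rebuilt the data using English from UrIII Admin texts, started a new training run, and wrote an end-to-end inference script for evaluation and for translating a given input.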

Daily Work Update

| # | Day | Date | A short description of the work done |
|---|-----|------|--------------------------------------|
| 1 | Monday | 2020/06/01 | Cloned Facebook's implementation of XLM and studied the code |
| 2 | Tuesday | 2020/06/02 | Rewrote and heavily modified the data preparation code for Sumerian-English texts |
| 3 | Wednesday | 2020/06/03 | Resolved all issues and errors; started pre-training on 1M Sumerian and 20M English monolingual data (general texts) |
| 4 | Thursday | 2020/06/04 | Pre-training continued and reached 200 epochs. Prepared scripts for the next steps |
| 5 | Friday | 2020/06/05 | Stopped training and evaluated the model. Results were poor, probably because the English data was far out of domain. Created data_prep_2 |
| 6 | Saturday | 2020/06/06 | Created data with English from UrIII Admin texts and started training |
| 7 | Sunday | 2020/06/07 | Created an end-to-end inference script for evaluation and for translating a given input |
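
The domain mismatch suspected on Friday can be roughly quantified before committing to a long pre-training run. The snippet below is only a minimal sketch and not part of the actual pipeline: it measures how many tokens of an in-domain sample never occur in the general-text training data. The function names (build_vocab, oov_rate) and the example sentences are hypothetical, and it uses plain whitespace tokens, whereas XLM works on BPE units, so the numbers are only indicative.

```python
from collections import Counter


def build_vocab(lines, min_count=1):
    """Return the set of lowercased whitespace tokens seen at least min_count times."""
    counts = Counter(tok for line in lines for tok in line.lower().split())
    return {tok for tok, n in counts.items() if n >= min_count}


def oov_rate(train_lines, target_lines):
    """Fraction of target-domain tokens that never occur in the training data."""
    vocab = build_vocab(train_lines)
    tokens = [tok for line in target_lines for tok in line.lower().split()]
    if not tokens:
        return 0.0
    unseen = sum(1 for tok in tokens if tok not in vocab)
    return unseen / len(tokens)


# Toy, invented sentences for illustration; not taken from the actual corpora.
general_english = ["the market closed higher today", "the team won the match"]
admin_english = ["1 sheep for the temple", "2 goats for the governor"]

print(f"OOV rate: {oov_rate(general_english, admin_english):.2f}")
```

A high rate on a sample of the English side of the Sumerian-English data would be consistent with the out-of-domain explanation, though it does not prove it. Running the same check on the UrIII Admin English data gives a point of comparison.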