Welcome to CDLI Blogs.
This page should contain the report made for every week.
A complete report of the work done during the week must be written here.
|#||Day||Date||A short description of the work done|
|1||Monday||2020/06/01||Wrote and applied pre-processing scripts on monolingual and parallel data|
|2||Tuesday||2020/06/02||Trained byte-level BPE (BBPE), BPE and BERT WordPiece tokenizers on the pre-processed text, saved the vocabularies and compared the results|
|3||Wednesday||2020/06/03||Experimented with the CLTK Akkadian tokenizer. Created train and test data files. Added an alternate version of the data using last year’s preprocessing.|
|4||Thursday||2020/06/04||Aligned and prepared the data in the input formats expected by FairSeq and OpenNMT|
|5||Friday||2020/06/05||Cleaned and prepared newly obtained data (administrative and non-administrative), analysed supervised techniques|
|6||Saturday||2020/06/06||Completed model pipeline shell scripts and set up GPU server|
|7||Sunday||2020/06/07||Prepared and finalised the benchmark dataset|
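
The train/test preparation and the alignment for FairSeq and OpenNMT (days 3 and 4) can be illustrated with a minimal sketch. This is not the project's actual script: the function names, the `src`/`tgt` language suffixes and the 10% test fraction are assumptions made for illustration. It only shows the general idea that both toolkits read parallel data as two plain-text files, one sentence per line, where line N of the source file corresponds to line N of the target file.

```python
import random
from pathlib import Path


def split_parallel(pairs, test_fraction=0.1, seed=0):
    """Shuffle aligned (source, target) pairs and split them into train and test.

    The pairs are shuffled together, so source and target never go out of sync.
    """
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_test = max(1, int(len(pairs) * test_fraction))
    return pairs[n_test:], pairs[:n_test]


def write_split(pairs, out_dir, prefix, src_lang="src", tgt_lang="tgt"):
    """Write one sentence per line to <prefix>.<src_lang> and <prefix>.<tgt_lang>.

    The suffixes are hypothetical placeholders for the real language codes.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    src_path = out_dir / f"{prefix}.{src_lang}"
    tgt_path = out_dir / f"{prefix}.{tgt_lang}"
    with open(src_path, "w", encoding="utf-8") as fs, \
            open(tgt_path, "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            # Strip stray whitespace so every pair occupies exactly one line.
            fs.write(src.strip() + "\n")
            ft.write(tgt.strip() + "\n")
    return src_path, tgt_path
```

A call such as `write_split(train, "data", "train")` would produce `data/train.src` and `data/train.tgt`. File pairs named with a shared prefix and a language suffix match what `fairseq-preprocess` expects through its `--trainpref`, `--source-lang` and `--target-lang` options, and the same two files can be passed to OpenNMT as separate source and target paths.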