Welcome to CDLI Blogs.
# | Day | Date | A short description of the work done |
---|---|---|---|
1 | Monday | 2020/06/01 | Wrote pre-processing scripts and applied them to the monolingual and parallel data |
2 | Tuesday | 2020/06/02 | Trained BBPE (byte-level BPE), BPE and BertWordPiece tokenizers on the pre-processed text, saved the vocabularies and compared the results |
3 | Wednesday | 2020/06/03 | Experimented with the CLTK Akkadian tokenizer. Created train and test data files. Added an alternate version of the data using last year’s preprocessing. |
4 | Thursday | 2020/06/04 | Aligned and prepared the data in the formats expected by FairSeq and OpenNMT |
5 | Friday | 2020/06/05 | Cleaned and prepared newly obtained administrative and non-administrative data, analysed supervised techniques |
6 | Saturday | 2020/06/06 | Completed model pipeline shell scripts and set up GPU server |
7 | Sunday | 2020/06/07 | Prepared and finalised the benchmark dataset |
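
The train/test split and alignment steps from Wednesday and Thursday can be illustrated with a minimal sketch. This is not the project's actual script: the function names `split_parallel` and `write_parallel`, the `src`/`tgt` file suffixes, the 10% test fraction and the fixed seed are all assumptions made for illustration. It only shows the general idea of writing two line-aligned plain-text files, which is the layout that sequence-to-sequence toolkits such as FairSeq and OpenNMT typically read.

```python
import random
from pathlib import Path


def split_parallel(pairs, test_fraction=0.1, seed=0):
    """Shuffle sentence pairs reproducibly and split them into (train, test)."""
    cleaned = []
    for src, tgt in pairs:
        # Collapse internal whitespace (including newlines) so that each
        # sentence occupies exactly one line in the output files.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        # Skip pairs where either side is empty; they would break alignment.
        if src and tgt:
            cleaned.append((src, tgt))
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    rng.shuffle(cleaned)
    n_test = max(1, int(len(cleaned) * test_fraction)) if cleaned else 0
    return cleaned[n_test:], cleaned[:n_test]


def write_parallel(pairs, out_dir, prefix, src_lang="src", tgt_lang="tgt"):
    """Write line-aligned files <prefix>.<src_lang> and <prefix>.<tgt_lang>."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    src_path = out_dir / f"{prefix}.{src_lang}"
    tgt_path = out_dir / f"{prefix}.{tgt_lang}"
    with open(src_path, "w", encoding="utf-8") as f_src, \
         open(tgt_path, "w", encoding="utf-8") as f_tgt:
        for src, tgt in pairs:
            # Line i of the source file corresponds to line i of the target file.
            f_src.write(src + "\n")
            f_tgt.write(tgt + "\n")
    return src_path, tgt_path
```

Calling `write_parallel(train, "data", "train")` and `write_parallel(test, "data", "test")` would produce `data/train.src`, `data/train.tgt`, `data/test.src` and `data/test.tgt`. Files named this way could then be passed to a tool such as `fairseq-preprocess` via `--trainpref data/train --testpref data/test --source-lang src --target-lang tgt`, assuming the suffixes are used as the language codes.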