Morphological annotation of the gold corpus
Annotation format and fields
Annotation workflow design
Preliminary dictionary data
To populate our dictionary with forms and their associated morphological analyses, we reused the JSON output of the ORACC:ETCSRI project together with an XML harvest of the texts produced with XSLT. The corpus was first converted to a CoNLL-like format, which was also used in our Linked Open Data proof of concept.
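The conversion step above can be sketched as follows. This is a minimal illustration, not the project's converter. It assumes the JSON and XML harvest have already been flattened into a list of token records with the hypothetical keys `form`, `segm` and `xpostag`. It also assumes a leading ID column, and it uses `_` for missing values, which is the usual CoNLL placeholder.

```python
# Columns of the CoNLL-like output. FORM, SEGM and XPOSTAG are the morphology
# columns named in the text; the leading ID column is an assumption.
COLUMNS = ["ID", "FORM", "SEGM", "XPOSTAG"]


def tokens_to_conll(tokens):
    """Serialize token records to a tab-separated, CoNLL-like string.

    `tokens` is a list of dicts with the hypothetical keys "form", "segm"
    and "xpostag". Empty or missing values are written as "_".
    """
    lines = ["#" + "\t".join(COLUMNS)]  # header line, prefixed with "#"
    for i, tok in enumerate(tokens, start=1):
        fields = [
            str(i),  # 1-based token index within the text
            tok.get("form") or "_",
            tok.get("segm") or "_",
            tok.get("xpostag") or "_",
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only, not taken from the ETCSRI data.
    demo = [{"form": "lugal", "segm": "lugal[king]", "xpostag": "N"}, {"form": "x"}]
    print(tokens_to_conll(demo), end="")
```

One flat row per token keeps the output easy to rearrange column by column, which the next step relies on.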
From this CoNLL data file, we created a pseudo CDLI-CoNLL file listing every unique form together with its associated morphological sequence and analysis. We did this by combining and rearranging the data in the columns until the result matched the CDLI-CoNLL morphology columns: FORM, SEGM, and XPOSTAG. The file contains around 5,000 entries and can be used to populate the dictionary of the pre-annotation tool, so that annotation of Sumerian texts can begin. The file is available here (add the link).
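Reducing the CoNLL data to one entry per unique form and analysis could look roughly like the sketch below. It is a hedged illustration, not the procedure actually used. It assumes a tab-separated file whose column header is a `#`-prefixed line containing `FORM`; other `#` lines are treated as comments. The function name `build_form_dictionary` is hypothetical. Skipping tokens with no segmentation and no tag is also a design choice, made here because such tokens give the pre-annotation tool nothing to suggest.

```python
def build_form_dictionary(conll_text):
    """Collect unique (FORM, SEGM, XPOSTAG) triples from CoNLL-like text.

    A form with several distinct analyses yields one entry per analysis,
    so ambiguity is preserved for the annotator to resolve.
    """
    header = None
    entries = set()
    for line in conll_text.splitlines():
        if not line.strip():
            continue  # blank lines separate sentences or texts
        if line.startswith("#"):
            fields = line.lstrip("#").split("\t")
            if "FORM" in fields:  # the column header line
                header = fields
            continue  # any other "#" line is a comment
        if header is None:
            raise ValueError("token row found before a column header")
        row = dict(zip(header, line.split("\t")))
        form = row.get("FORM", "_")
        segm = row.get("SEGM", "_")
        xpos = row.get("XPOSTAG", "_")
        if segm == "_" and xpos == "_":
            continue  # unanalysed token: nothing to pre-fill
        entries.add((form, segm, xpos))
    return sorted(entries)


if __name__ == "__main__":
    # Illustrative input only; the comment line mimics a per-text header.
    sample = (
        "#new_text=hypothetical\n"
        "#ID\tFORM\tSEGM\tXPOSTAG\n"
        "1\tlugal\tlugal[king]\tN\n"
        "2\tlugal\tlugal[king]\tN\n"
        "3\tx\t_\t_\n"
    )
    for form, segm, xpos in build_form_dictionary(sample):
        print(form, segm, xpos, sep="\t")
```

Storing triples in a set collapses repeated tokens across the corpus. Sorting them keeps the resulting dictionary file stable from one run to the next.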
Validation, storage and conversion
The morphology annotators for the MTAAC project are Lucas Reckling, Jinyan Wang, Émilie Pagé-Perron, Ilya Khait, Heather D. Baker, and Robert K. Englund.
Émilie Pagé-Perron, Lucas Reckling