Extended commodity ID module to cover the ~400 most common terms in the Girsu corpus. Sample outputs suggest we get ~85% accuracy for the commodity labeling task, but precision is closer to 95% because of focus on rules which are not overly general. Implemented labeling of modifiers and vessels, allowing distinction between different subtypes of items (male vs female animals, for example).
|#||Day||Date||A short description of the work done|
|1||Monday||2020/06/01||Finish checking wordnet performance: commodity code now covers 400 common terms (26454 total tokens). Testing suggests a need to refactor to jointly classify all words in an entry, rather than iteratively classifying one at a time.|
|2||Tuesday||2020/06/02||Restructured classification code to allow considering all words in an entry. Helps with cases where e.g. mun “salt” can be counted on its own, or can occur as a modifier describing fish. It is only the counted object in the former case.|
|3||Wednesday||2020/06/03||Improved ability to distinguish adjectives/modifiers from commodities. Tests show need to consider full tablet to disambiguate some implied objects (eg ration texts). Fixed handling of some vessels and metals.|
|4||Thursday||2020/06/04||Meeting. Implement handling of vessels in commodity ID module.|
|5||Friday||2020/06/05||Clean, document, and refactor dev branch of commodity ID module. Merge with master branch.|