Week 8

Week Summary

This week we continue our work on publications and proveniences. I’ve also fixed issue #1053.

The infrastructure for connecting entities and publications should be already in place. However, the biggest challenge is to actually find connections between publications and proveniences.

I tried to do some exploration on pdf mining (which inherits from AWCA). But a lot of them are in the transliteration format, so not sure what we can do with those here.

I tried to contact Rune on his idea of this method of finding “high frequency occurence”. His response - this is interesting, but we need to dig deeper into the actual meaning of the publication. Also Emilie mentioned that different assyriologists will have different words to describe a provenience.

Daily Work Update

#	Day	Date	A short description of the work done
1	Monday	2022/08/01	Issue 1053 fix. Discussion on further work on infrastructure code.
2	Tuesday	2022/08/02	Exploration on pdf mining.
3	Wednesday	2022/08/03	Figured out that we should ditch ML-based mining. Just using regex might be better.
4	Thursday	2022/08/04	Ran provenience finding algorithms on existing text files of publications.
5	Friday	2022/08/05	Compiled an initial set of publication and provenience relationships.
6	Saturday	2022/08/06	Contact Rune for his opinion on this matter.
7	Sunday	2022/08/07	Weekend break.