Searching through Ancient Libraries: New Statistical Indexing Methods for 2D Images of Cuneiform Documents
Cuneiform script is the oldest attested writing system in the world: it was used for over three thousand years throughout the ancient Near East, including Ancient Israel. It is recorded on clay tablets and stone slabs and was mainly used to write Akkadian (a Semitic language like Hebrew and Arabic), which was the international language of the day. This project aims to use algorithms to index cuneiform script and to search the large photographic corpus (available as open-source databases) of tens of thousands of cuneiform documents from the libraries of the Neo-Assyrian Empire. Because cuneiform is a syllabic script whose signs are made up of abstract stylus strokes impressed into the clay, it has to be transliterated before modern scholars can read it. However, the millions of signs found in the available photographs and line art are barely indexed. New indexing and search methods will make them accessible to the scholarly world. We pursue this by applying training-based Keyword Spotting (KWS) methods, largely based on statistical optical and language models, that adopt the Query-by-String (QbS) paradigm. The second part of our project is educational: by creating a gamified human-machine interface, we will enable researchers and students across the world to work with the photographs first hand for research and teaching purposes.
Joint Research Projects on Data Science Ludwig-Maximilians-Universität München (LMU) - Tel Aviv University (TAU)
Yoram Cohen (TAU)
Jonathan Berant, NLP and ML Consultant (Computer Science, TAU)