Title:
Describing Language Variation in the Colophons of Armenian Manuscripts
Publisher Information:
European Language Resources Association (ELRA) 2022
Document Type:
Electronic Resource
Availability:
Open access content
info:eu-repo/semantics/openAccess
Note:
English
Other Numbers:
UCDLC oai:dial.uclouvain.be:boreal:262404
boreal:262404
1372960947
Contributing Source:
UNIVERSITE CATHOLIQUE DE LOUVAIN
From OAIster®, provided by the OCLC Cooperative.
Accession Number:
edsoai.on1372960947
Database:
OAIster

Further Information

The colophons of Armenian manuscripts constitute a large textual corpus spanning a millennium of written culture. These texts are highly diverse and rich in linguistic variation, which poses a challenge to NLP tools, especially given that linguistic resources designed or suited for Armenian are still scarce. In this paper, we deal with a sub-corpus of colophons written to commemorate the rescue of a manuscript and dating from 1286 to ca. 1450, a thematic group distinguished by a particularly high concentration of words exhibiting linguistic variation. The text is processed (lemmatization, POS tagging, and inflectional tagging) with the tools of the GREgORI Project, and the results are evaluated. Through a selection of examples, we show how variation is handled at each linguistic level (phonology, orthography, inflection, vocabulary, syntax). We also consider complex variation, at the level of tokens or lemmata. The results of this work are used to enrich and refine the linguistic resources of the GREgORI Project, which in turn benefits the processing of other texts.
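To make the idea of handling orthographic variation during lemmatization concrete, here is a minimal sketch of a lexicon-based lookup that maps a variant spelling back to a standard form before assigning a lemma, POS, and inflectional tags. It is an illustration under stated assumptions, not the GREgORI Project's implementation: the single rule (later "օ" written for classical "աւ", as in "օր" vs "աւր", "day"), the toy lexicon, the tag labels, and the names `VARIANT_RULES`, `LEXICON`, `normalize`, and `analyse` are all hypothetical.

```python
import unicodedata

# Hypothetical variant rules (illustrative assumption, not GREgORI data):
# each pair maps a later spelling onto the classical spelling used as the
# lexicon key. Here "օ" (o) stands for the classical digraph "աւ" (aw).
VARIANT_RULES = [("օ", "աւ")]

# Hypothetical toy lexicon: classical form -> (lemma, POS, inflectional tags).
# The tag labels are invented for this sketch.
LEXICON = {
    "աւր": ("աւր", "NOUN", "Sg.Nom"),  # "day"
}


def normalize(token: str) -> str:
    """NFC-normalize, lowercase, then rewrite variant spellings to standard ones."""
    form = unicodedata.normalize("NFC", token).lower()
    for variant, standard in VARIANT_RULES:
        form = form.replace(variant, standard)
    return form


def analyse(token: str):
    """Return (lemma, POS, tags), or None if the token is unknown.

    The raw (lowercased) form is tried first so that attested spellings are
    never altered; the variant rules are applied only as a fallback.
    """
    raw = unicodedata.normalize("NFC", token).lower()
    if raw in LEXICON:
        return LEXICON[raw]
    return LEXICON.get(normalize(token))


if __name__ == "__main__":
    for tok in ["աւր", "օր", "Օր"]:
        print(tok, analyse(tok))
```

With these assumptions, `analyse("օր")` and `analyse("Օր")` both return the analysis stored under the classical form "աւր". Trying the raw form before applying rules keeps rule-based rewriting from overriding spellings the lexicon already records; a fuller resource would more likely list attested variants explicitly under each lemma rather than rely on rewrite rules alone.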