Lexical masks in JSON


Latest revision as of 15:18, 15 November 2020

We have previously released lexical masks as ShEx files: schemata for lexicographic forms that can be used to validate whether the data for a lexeme is complete.

We found that it was quite challenging to turn these ShEx files into forms for entering the data, such as Lucas Werkmeister’s Lexeme Forms. So we adapted our approach slightly: we now publish JSON files that keep the structures in a format that is easier to parse and understand, and we also provide a script that translates these JSON files into ShEx Entity Schemas.
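To make the idea concrete, here is a minimal sketch of such a JSON-to-ShEx translation in Python. It is not the released script, and the JSON layout (the keys `language`, `lexical_category`, `forms`, `label`, `features`) is a hypothetical one invented for illustration; the actual mask format is described in the documentation linked below. The item IDs are only examples, and the generated text is a simplified ShEx-style schema rather than what the real script produces.

```python
import json

# Hypothetical mask layout, invented for this sketch (not the published format).
EXAMPLE_JSON = """
{
  "language": "Q1860",
  "lexical_category": "Q1084",
  "forms": [
    {"label": "singular", "features": ["Q110786"]},
    {"label": "plural", "features": ["Q146786"]}
  ]
}
"""

# Namespace prefixes commonly used for Wikidata lexeme data.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "dct": "http://purl.org/dc/terms/",
    "wikibase": "http://wikiba.se/ontology#",
}


def mask_to_shex(mask: dict) -> str:
    """Render a (hypothetical) mask dictionary as simplified ShEx text."""
    lines = [f"PREFIX {prefix}: <{iri}>" for prefix, iri in PREFIXES.items()]
    lines += ["", "start = @<#lexeme>", "", "<#lexeme> {"]

    # One constraint per lexeme-level property, then one per expected form.
    constraints = [
        f"  dct:language [ wd:{mask['language']} ]",
        f"  wikibase:lexicalCategory [ wd:{mask['lexical_category']} ]",
    ]
    constraints += [
        f"  ontolex:lexicalForm @<#form_{form['label']}>"
        for form in mask["forms"]
    ]
    lines.append(" ;\n".join(constraints))
    lines.append("}")

    # Each form gets its own shape listing the required grammatical features.
    for form in mask["forms"]:
        features = " ;\n".join(
            f"  wikibase:grammaticalFeature [ wd:{qid} ]"
            for qid in form["features"]
        )
        lines += ["", f"<#form_{form['label']}> {{", features, "}"]

    return "\n".join(lines)


shex = mask_to_shex(json.loads(EXAMPLE_JSON))
print(shex)
```

Because the mask is plain JSON, the same file could also drive a data entry form by iterating over `forms`, which is much harder to do starting from the ShEx text.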

Furthermore, we have published masks for more languages and parts of speech than before.

Full documentation can be found on Wikidata: https://www.wikidata.org/wiki/Wikidata:Lexical_Masks#Paper

Background can be found in the paper: https://www.aclweb.org/anthology/2020.lrec-1.372/

Thanks, Bruno, Saran, and Daniel, for your great work!
