CLARIN-LV

CLARIN (Common Language Resources and Technology Infrastructure) is a research infrastructure founded on the vision that all digital language resources and tools from across Europe and beyond should be accessible through a single sign-on online environment, in support of researchers in the humanities and social sciences.

91 to 98 of 98 Results

Tēzaurs.lv 2020 Oct 5, 2022 Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba, 2019, "Tēzaurs.lv 2020", https://hdl.handle.net/20.500.12574/9, AiLab IMCS UL Tezaurs is a machine-readable lexicon and an online dictionary for Latvian. The initial human-oriented version of this resource was made publicly available in 2009, comprising more than 125,000 entries. Since then, Tezaurs has been updated once every three months and so far it has grown to more than 300,000 entries referring to more than 280 sources. The dic... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
LVBERT - Latvian BERT Apr 19, 2022 Znotiņš, Artūrs, 2020, "LVBERT - Latvian BERT", https://hdl.handle.net/20.500.12574/43, AiLab IMCS UL LVBERT is the first publicly available monolingual BERT language model pre-trained for Latvian. For training we used the original implementation of BERT on TensorFlow with the whole-word masking and the next sentence prediction objectives. We used BERT-BASE configuration with 12 layers, 768 hidden units, 12 heads, 128 sequence length, 128 mini-batc...
Latvian AMR Sembank Apr 15, 2022 Znotiņš, Artūrs; Paikens, Pēteris; Grūzītis, Normunds, 2020, "Latvian AMR Sembank", https://hdl.handle.net/20.500.12574/40, AiLab IMCS UL An automatically derived AMR annotation layer of the FullStack multi-layer text corpus of Latvian. First, Latvian UD Treebank (v2.5) sentences were translated to English using a state-of-the-art Latvian-English neural MT system (Hugo.lv). Second, a state-of-the-art AMR parser for English (AMREager) was applied to the MT-translated sentences. Additi...
Rendering of personal names in Latvian: database Apr 12, 2022 ---, ---, 2017, "Rendering of personal names in Latvian: database", https://hdl.handle.net/20.500.12574/61, Latvian Language Agency The application „Rendering of personal names in Latvian” is an electronic multilingual dictionary of names. Currently, information about the rendering of personal names and versions of rendering, rules of rendering, and further reading for 28 languages can be found on this website. The dictionary is based on the principles of rendering of proper names pu...
LUIS: data collection for task oriented dialogue system creation Apr 6, 2022 Gunta, Nešpore-Bērzkalne; Skadiņa, Inguna; Grūzītis, Normunds; Znotiņš, Artūrs; Goško, Didzis, 2021, "LUIS: data collection for task oriented dialogue system creation", https://hdl.handle.net/20.500.12574/47, AiLab IMCS UL This multi-targeted dataset contains several datasets that allow training goal-oriented dialogue systems for the student service domain in Latvian. The dataset contains a manually annotated dataset of domain-specific dialog intents, a manually created and annotated dataset of generalised and formalised dialog scenarios based on corpus evidence, dataset...
The Linguistic Map Jan 18, 2022 Vanags, Pēteris; Trumpa, Edmunds; Laumane, Benita; Markus, Dace; Šuplinska, Ilga; Ernštreits, Valts; Rapa, Sanda; Pūtele, Iveta; Frīdenberga, Anna; Kazakeviča, Agita; Markus-Narvila, Liene; Leikuma, Lidija, 2016, "The Linguistic Map", https://hdl.handle.net/20.500.12574/60, Latvian Language Agency „The Linguistic Map” has been designed as an electronic informative learning aid, providing an overview of the history of Latvian linguistics and delving into its chronology and themes as well as its branches, sub-branches, and the individuals involved in this work. „The Linguistic Map” currently contains entries about individuals, events, places,...
Corpus of the Tests of the State Language Proficiency Testing Sep 27, 2021 Auziņa, Ilze; Darģis, Roberts; Levāne-Petrova, Kristīne; Pokratniece, Kristīne; Vēvere, Daira, 2018, "Corpus of the Tests of the State Language Proficiency Testing", https://hdl.handle.net/20.500.12574/49, AiLab IMCS UL The Corpus includes a collection of 900 Latvian language proficiency tests: 150 tests per proficiency level (A1, A2, B1, B2, C1, C2). Error annotation has been performed in all texts.
Mühlenbach Endzelin Latvian Dictionary (MEV) Sep 27, 2021 Andronova, Everita; Spektors, Andrejs; Nešpore, Gunta; Grūzītis, Normunds, 2004, "Mühlenbach Endzelin Latvian Dictionary (MEV)", https://hdl.handle.net/20.500.12574/38, AiLab IMCS UL The electronic version of the principal and supplementary volumes of K. Mīlenbahs and J. Endzelīns’ “Dictionary of the Latvian Language”, with the facsimiles of the entries accessible. The dictionary offers a wide range of search functions; the information can be found in the modern orthographic or the original spelling of the entries, in the whole...
