DataverseLV

Metadata Source: Harvested

191 to 200 of 202 Results

Full Stack of Latvian Language Resources for NLU Nov 23, 2022 - CLARIN-LV Grūzītis, Normunds; Pretkalniņa, Lauma; Saulīte, Baiba; Rituma, Laura; Nešpore-Bērzkalne, Gunta; Paikens, Pēteris; Auziņa, Ilze; Znotiņš, Artūrs; Levāne-Petrova, Kristīne; Darģis, Roberts, 2019, "Full Stack of Latvian Language Resources for NLU", https://hdl.handle.net/20.500.12574/5, AiLab IMCS UL This repository contains a multilayer text corpus of Latvian. The multilayer corpus is anchored in cross-lingual state-of-the-art representations: Universal Dependencies (UD), FrameNet, PropBank and Abstract Meaning Representation (AMR). This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
The Lithuanian-Latvian-Latgalian Dictionary Nov 23, 2022 - CLARIN-LV Leikuma, Lidija; Bernāne, Līga; Cibuļs, Juris; Butkus, Alvydas; Butkienė, Violeta; Vaisvalavičienė, Kristina; Sperga, Ilze, 2013, "The Lithuanian-Latvian-Latgalian Dictionary", https://hdl.handle.net/20.500.12574/52, Rēzekne Academy of Technologies "The Lithuanian-Latvian-Latgalian Dictionary" (hereinafter "the LLL dictionary") has been compiled on the basis of "The Lithuanian Language Written Sources Frequency Dictionary" ("Dažninis rašytinės lietuvių kalbų žodynas", hereinafter "the Frequency dictionary"; comp. by Utka A., Kaunas, VDU, 2009). It consists of 42,061 headwords based on... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Annotated longitudinal corpus of Latvian children's language Nov 23, 2022 - CLARIN-LV Auziņa, Ilze; Darģis, Roberts; Rābante-Buša, Guna; Levāne-Petrova, Kristīne; Saulīte, Baiba, 2017, "Annotated longitudinal corpus of Latvian children's language", https://hdl.handle.net/20.500.12574/7, AiLab IMCS UL The collection contains three longitudinal corpora of monolingual Latvian-speaking children and one longitudinal corpus of a simultaneous Latvian-Russian bilingual child. Participants were recorded for 30 minutes each week for 16 months, resulting in 134 hours of speech. 34 hours of the obtained speech samples are orthographically transcribed. This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
LVMED: Latvian Pronunciation Dictionary of the Medical Domain Oct 26, 2022 - CLARIN-LV Darģis, Roberts; Akmane, Agate; Naļivaiko, Inga; Grūzītis, Normunds; Auziņa, Ilze; Saulīte, Baiba; Stepanovs, Kaspars, 2021, "LVMED: Latvian Pronunciation Dictionary of the Medical Domain", https://hdl.handle.net/20.500.12574/68, AiLab IMCS UL A machine-readable pronunciation dictionary of the medical domain derived from a large text corpus of historical medical records. Consists of 109k entries in the CSV format: first column - a wordform; second column - its pronunciation in the IPA encoding. The dictionary contains Latvian words and terms used in the medical domain, as well as abbrevi... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Tēzaurs.lv 2020 Oct 5, 2022 - CLARIN-LV Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba, 2019, "Tēzaurs.lv 2020", https://hdl.handle.net/20.500.12574/9, AiLab IMCS UL Tezaurs is a machine-readable lexicon and an online dictionary for Latvian. The initial human-oriented version of this resource was made publicly available in 2009, comprising more than 125,000 entries. Since then, Tezaurs has been updated once every three months, and so far it has grown to more than 300,000 entries referring to more than 280 sources. The dic... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
LVBERT - Latvian BERT Apr 19, 2022 - CLARIN-LV Znotiņš, Artūrs, 2020, "LVBERT - Latvian BERT", https://hdl.handle.net/20.500.12574/43, AiLab IMCS UL LVBERT is the first publicly available monolingual BERT language model pre-trained for Latvian. For training we used the original implementation of BERT on TensorFlow with the whole-word masking and the next sentence prediction objectives. We used BERT-BASE configuration with 12 layers, 768 hidden units, 12 heads, 128 sequence length, 128 mini-batc... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Latvian AMR Sembank Apr 15, 2022 - CLARIN-LV Znotiņš, Artūrs; Paikens, Pēteris; Grūzītis, Normunds, 2020, "Latvian AMR Sembank", https://hdl.handle.net/20.500.12574/40, AiLab IMCS UL An automatically derived AMR annotation layer of the FullStack multi-layer text corpus of Latvian. First, Latvian UD Treebank (v2.5) sentences were translated to English using a state-of-the-art Latvian-English neural MT system (Hugo.lv). Second, a state-of-the-art AMR parser for English (AMREager) was applied to the MT-translated sentences. Additi... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Rendering of personal names in Latvian: database Apr 12, 2022 - CLARIN-LV ---, ---, 2017, "Rendering of personal names in Latvian: database", https://hdl.handle.net/20.500.12574/61, Latvian Language Agency The application „Rendering of personal names in Latvian” is an electronic multilingual dictionary of names. Currently, information about the rendering of personal names, versions of rendering, rules of rendering, and further reading for 28 languages can be found on this website. The dictionary is based on the principles of rendering of proper names pu... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
LUIS: data collection for task oriented dialogue system creation Apr 6, 2022 - CLARIN-LV Gunta, Nešpore-Bērzkalne; Skadiņa, Inguna; Grūzītis, Normunds; Znotiņš, Artūrs; Goško, Didzis, 2021, "LUIS: data collection for task oriented dialogue system creation", https://hdl.handle.net/20.500.12574/47, AiLab IMCS UL This multi-targeted dataset contains several datasets that make it possible to train goal-oriented dialogue systems for the student service domain in Latvian. The dataset contains a manually annotated dataset of domain-specific dialog intents, a manually created and annotated dataset of generalised and formalised dialog scenarios based on corpus evidence, dataset... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
The Linguistic Map Jan 18, 2022 - CLARIN-LV Vanags, Pēteris; Trumpa, Edmunds; Laumane, Benita; Markus, Dace; Šuplinska, Ilga; Ernštreits, Valts; Rapa, Sanda; Pūtele, Iveta; Frīdenberga, Anna; Kazakeviča, Agita; Markus-Narvila, Liene; Leikuma, Lidija, 2016, "The Linguistic Map", https://hdl.handle.net/20.500.12574/60, Latvian Language Agency „The Linguistic Map” has been designed as an electronic informative learning aid, providing an overview of the history of Latvian linguistics and delving into its chronology and themes as well as its branches, sub-branches, and the individuals involved in this work. „The Linguistic Map” currently contains entries about individuals, events, places,... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.