DataverseLV

Latvia University of Life Sciences and Technologies

University of Latvia

Rīga Stradiņš University

Riga Technical University

CLARIN-LV

231 to 240 of 240 Results

Annotated longitudinal corpus of Latvian children's language

Nov 23, 2022 - CLARIN-LV

Auziņa, Ilze; Darģis, Roberts; Rābante-Buša, Guna; Levāne-Petrova, Kristīne; Saulīte, Baiba, 2017, "Annotated longitudinal corpus of Latvian children's language", https://hdl.handle.net/20.500.12574/7, AiLab IMCS UL

The collection contains three longitudinal corpora of monolingual Latvian-speaking children and one longitudinal corpus of a simultaneous Latvian-Russian bilingual child. Participants were recorded for 30 minutes each week for 16 months, resulting in 134 hours of speech. Of these, 34 hours of speech samples are orthographically transcribed.

This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.

LVMED: Latvian Pronunciation Dictionary of the Medical Domain

Oct 26, 2022 - CLARIN-LV

Darģis, Roberts; Akmane, Agate; Naļivaiko, Inga; Grūzītis, Normunds; Auziņa, Ilze; Saulīte, Baiba; Stepanovs, Kaspars, 2021, "LVMED: Latvian Pronunciation Dictionary of the Medical Domain", https://hdl.handle.net/20.500.12574/68, AiLab IMCS UL

A machine-readable pronunciation dictionary of the medical domain derived from a large text corpus of historical medical records. It consists of 109k entries in CSV format: the first column is a wordform; the second column is its pronunciation in IPA encoding. The dictionary contains Latvian words and terms used in the medical domain, as well as abbrevi...

Tēzaurs.lv 2020

Oct 5, 2022 - CLARIN-LV

Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba, 2019, "Tēzaurs.lv 2020", https://hdl.handle.net/20.500.12574/9, AiLab IMCS UL

Tezaurs is a machine-readable lexicon and an online dictionary for Latvian. The initial human-oriented version of this resource was made publicly available in 2009, comprising more than 125,000 entries. Since then, Tezaurs has been updated once every three months, and so far it has grown to more than 300,000 entries referring to more than 280 sources. The dic...

LVBERT - Latvian BERT

Apr 19, 2022 - CLARIN-LV

Znotiņš, Artūrs, 2020, "LVBERT - Latvian BERT", https://hdl.handle.net/20.500.12574/43, AiLab IMCS UL

LVBERT is the first publicly available monolingual BERT language model pre-trained for Latvian. For training we used the original implementation of BERT on TensorFlow with the whole-word masking and the next sentence prediction objectives. We used the BERT-BASE configuration with 12 layers, 768 hidden units, 12 heads, 128 sequence length, 128 mini-batc...

Latvian AMR Sembank

Apr 15, 2022 - CLARIN-LV

Znotiņš, Artūrs; Paikens, Pēteris; Grūzītis, Normunds, 2020, "Latvian AMR Sembank", https://hdl.handle.net/20.500.12574/40, AiLab IMCS UL

An automatically derived AMR annotation layer of the FullStack multi-layer text corpus of Latvian. First, Latvian UD Treebank (v2.5) sentences were translated to English using a state-of-the-art Latvian-English neural MT system (Hugo.lv). Second, a state-of-the-art AMR parser for English (AMREager) was applied to the MT-translated sentences. Additi...

Rendering of personal names in Latvian: database

Apr 12, 2022 - CLARIN-LV

---, ---, 2017, "Rendering of personal names in Latvian: database", https://hdl.handle.net/20.500.12574/61, Latvian Language Agency

The application „Rendering of personal names in Latvian” is an electronic multilingual dictionary of names. Currently, information about the rendering of personal names, rendering variants, rendering rules, and further reading for 28 languages can be found on this website. The dictionary is based on the principles of rendering of proper names pu...

LUIS: data collection for task oriented dialogue system creation

Apr 6, 2022 - CLARIN-LV

Gunta, Nešpore-Bērzkalne; Skadiņa, Inguna; Grūzītis, Normunds; Znotiņš, Artūrs; Goško, Didzis, 2021, "LUIS: data collection for task oriented dialogue system creation", https://hdl.handle.net/20.500.12574/47, AiLab IMCS UL

This multi-targeted dataset contains several datasets that allow training of goal-oriented dialogue systems for the student service domain in Latvian. The dataset contains a manually annotated dataset of domain-specific dialog intents, a manually created and annotated dataset of generalised and formalised dialog scenarios based on corpus evidence, dataset...

The Linguistic Map

Jan 18, 2022 - CLARIN-LV

Vanags, Pēteris; Trumpa, Edmunds; Laumane, Benita; Markus, Dace; Šuplinska, Ilga; Ernštreits, Valts; Rapa, Sanda; Pūtele, Iveta; Frīdenberga, Anna; Kazakeviča, Agita; Markus-Narvila, Liene; Leikuma, Lidija, 2016, "The Linguistic Map", https://hdl.handle.net/20.500.12574/60, Latvian Language Agency

„The Linguistic Map” has been designed as an electronic informative learning aid, providing an overview of the history of Latvian linguistics and delving into its chronology and themes as well as its branches, sub-branches, and the individuals involved in this work. „The Linguistic Map” currently contains entries about individuals, events, places,...

Corpus of the Tests of the State Language Proficiency Testing

Sep 27, 2021 - CLARIN-LV

Auziņa, Ilze; Darģis, Roberts; Levāne-Petrova, Kristīne; Pokratniece, Kristīne; Vēvere, Daira, 2018, "Corpus of the Tests of the State Language Proficiency Testing", https://hdl.handle.net/20.500.12574/49, AiLab IMCS UL

The corpus includes a collection of 900 Latvian language proficiency tests: 150 tests for each proficiency level (A1, A2, B1, B2, C1, C2). Error annotation has been performed in all texts.

Mühlenbach Endzelin Latvian Dictionary (MEV)

Sep 27, 2021 - CLARIN-LV

Andronova, Everita; Spektors, Andrejs; Nešpore, Gunta; Grūzītis, Normunds, 2004, "Mühlenbach Endzelin Latvian Dictionary (MEV)", https://hdl.handle.net/20.500.12574/38, AiLab IMCS UL

The electronic version of the principal and supplementary volumes of K. Mīlenbahs and J. Endzelīns’ “Dictionary of the Latvian Language”, with the facsimiles of the entries accessible. The dictionary offers a wide range of search functions; the information can be found in the modern orthographic or the original spelling of the entries, in the whole...
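
The LVMED entry above describes a two-column CSV layout (wordform, then IPA pronunciation). The following is a minimal sketch of how such a file might be read in Python. It assumes a comma delimiter and no header row, neither of which the listing confirms. The function name `load_pronunciations` and the sample rows are hypothetical placeholders, not data from the dictionary.

```python
import csv
import io


def load_pronunciations(csv_file):
    """Map each wordform (column 1) to its pronunciation (column 2).

    Assumes a comma-delimited file without a header row; adjust the
    reader settings if the actual file differs.
    """
    lexicon = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:
            # Skip blank or malformed lines rather than failing.
            continue
        wordform, ipa = row[0], row[1]
        lexicon[wordform] = ipa
    return lexicon


# Placeholder rows for illustration only; they are not real dictionary entries.
sample = io.StringIO("example_word_1,ipa_1\nexample_word_2,ipa_2\n")
lexicon = load_pronunciations(sample)
print(lexicon["example_word_1"])
```

For a real file, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`, since IPA symbols and Latvian diacritics need a Unicode-aware encoding.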
