CLARIN-LV

CLARIN (Common Language Resources and Technology Infrastructure) is a research infrastructure initiated from the vision that all digital language resources and tools from all over Europe and beyond should be accessible through a single sign-on online environment to support researchers in the humanities and social sciences.

Spelling normalization tool for Latvian 18th century texts

Nov 6, 2025

Pretkalniņa, Lauma; Andronova, Everita; Frīdenberga, Anna; Skrūzmane, Elga; Siliņa-Piņķe, Renāte; Trumpa, Anta; Vanags, Pēteris, 2025, "Spelling normalization tool for Latvian 18th century texts", https://hdl.handle.net/20.500.12574/140, AiLab IMCS UL

The spelling normalization tool (pilot converter) is meant for converting any 18th-century Latvian Unicode-encoded text into a more modern spelling. This version of the tool normalizes word roots, so it is intended to facilitate user-friendly corpus search in tools such as Sketch Engine. The tool consists of 134 uni...

This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.

Latvian Sign Language Landmark Corpus

Nov 5, 2025

Neimane, Liene Krista, 2025, "Latvian Sign Language Landmark Corpus", https://hdl.handle.net/20.500.12574/139, Liene Krista Neimane

The corpus contains MediaPipe-extracted landmark data representing 45 Latvian Sign Language signs. It includes 33 alphabet letters (a-z), 11 numbers (0-10), and a pause, which were captured from videos featuring several different signers. The collection covers both isolated signs and sign combinations forming complete words or short sentences. For...

Dictionary of Contemporary Latvian Language (MLVV) (2025-06-21)

Oct 6, 2025

Zuicena, Ieva; Auziņa, Ieva; Briede, Santa; Jansone, Irēna Ilga; Kuplā, Ieva; Lejniece, Gunta; Migla, Ilga; Oldere, Laimdota; Ozola, Ārija; Požarnova, Vija; Rapa, Sanda; Roze, Anitra; Šmidebergs, Imants; Šnē, Dorisa; Šnē, Māra; Timuška, Agris; Grasmanis, Mikus; Pretkalniņa, Lauma; Znotiņš, Artūrs, 2025, "Dictionary of Contemporary Latvian Language (MLVV) (2025-06-21)", https://hdl.handle.net/20.500.12574/133, Latvian Language Institute of the University of Latvia

“Contemporary dictionary of Latvian language” (MLVV), developed by the Latvian Language Institute of the University of Latvia, is a new explanatory dictionary based on Latvian language material collected over the last decade. The analysis of the word stock is based on MLVV card files and internet sources, as well as on the last decade’s encyclopaedias and...

Tēzaurs.lv 2025 (Summer Edition)

Oct 6, 2025

Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Lokmane, Ilze; Klints, Agute; Stāde, Madara; Grasmanis, Mikus; Auziņa, Ilze; Znotiņš, Artūrs; Darģis, Roberts; Bārzdiņš, Guntis, 2025, "Tēzaurs.lv 2025 (Summer Edition)", https://hdl.handle.net/20.500.12574/132, AiLab IMCS UL

Tezaurs.lv is the largest open machine-readable dictionary for Latvian. This version contains more than 405,000 entries based on 345 sources. The dictionary is enriched with phonetic, morphological, derivational, semantic and other annotations, inflection tables and corpus examples, and it is integrated with the Latvian WordNet data. This dataset is availab...

Embedding Model Fine-Tuning Dataset

Sep 16, 2025

Deksne, Daiga, 2025, "Embedding Model Fine-Tuning Dataset", https://hdl.handle.net/20.500.12574/136, University of Latvia

The Dataset for Embedding Model Fine-Tuning was created within the framework of the National Research Program project "Analysis of the applicability of artificial intelligence methods in the field of EU fund projects". For this project, we fine-tuned the bge-m3 model developed by BAAI (Chen et al., 2024). For fine-tuning, we collec...

Procurement Validation Dataset

Sep 16, 2025

Deksne, Daiga; Skadiņš, Raivis; Hohbergs, Andris; Jaunzars, Rūdolfs; Petrovs, Andrejs; Rūdule, Justīne; Pinnis, Mārcis, 2025, "Procurement Validation Dataset", https://hdl.handle.net/20.500.12574/135, University of Latvia

The Procurement Validation Dataset was created within the framework of the State Research Programme project "Analysis of the Applicability of Artificial Intelligence Methods in the Field of European Union Fund Projects". The dataset consists of 30 procurement documents evaluated by CFCA experts. The procurement checklists prepared by the experts ha...

Dictionary of Contemporary Latvian Language (MLVV) (2025-03-20)

Jul 31, 2025

Zuicena, Ieva; Auziņa, Ieva; Briede, Santa; Jansone, Irēna Ilga; Kuplā, Ieva; Lejniece, Gunta; Migla, Ilga; Oldere, Laimdota; Ozola, Ārija; Požarnova, Vija; Rapa, Sanda; Roze, Anitra; Šmidebergs, Imants; Šnē, Dorisa; Šnē, Māra; Timuška, Agris; Grasmanis, Mikus; Pretkalniņa, Lauma; Znotiņš, Artūrs, 2025, "Dictionary of Contemporary Latvian Language (MLVV) (2025-03-20)", https://hdl.handle.net/20.500.12574/128, Latvian Language Institute of the University of Latvia

“Contemporary dictionary of Latvian language” (MLVV), developed by the Latvian Language Institute of the University of Latvia, is a new explanatory dictionary based on Latvian language material collected over the last decade. The analysis of the word stock is based on MLVV card files and internet sources, as well as on the last decade’s encyclopaedias and...

Tēzaurs.lv 2025 (Spring Edition)

Jul 31, 2025

Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Lokmane, Ilze; Klints, Agute; Stāde, Madara; Grasmanis, Mikus; Auziņa, Ilze; Znotiņš, Artūrs; Darģis, Roberts; Bārzdiņš, Guntis, 2025, "Tēzaurs.lv 2025 (Spring Edition)", https://hdl.handle.net/20.500.12574/127, AiLab IMCS UL

Tezaurs.lv is the largest open machine-readable dictionary for Latvian. This version contains more than 405,000 entries based on 345 sources. The dictionary is enriched with phonetic, morphological, derivational, semantic and other annotations, inflection tables and corpus examples, and it is integrated with the Latvian WordNet data. This dataset is availab...

LV portāls e-consultations (2020-2024)

Jul 21, 2025

Pauniņš, Artis, 2025, "LV portāls e-consultations (2020-2024)", https://hdl.handle.net/20.500.12574/131, University of Latvia

This dataset contains articles from e-consultations on the legislation of the Republic of Latvia. The articles are stored in JSON files containing the HTML of the questions and answers, along with other metadata such as the source URL, title and authors, which can be used for citation. The citations to all the articles are available here: https://html-preview.github...

Ilvars - Latvian Male VITS Text-to-Speech Model (vers. 2023)

Jul 9, 2025

Darģis, Roberts; Auziņa, Ilze, 2023, "Ilvars - Latvian Male VITS Text-to-Speech Model (vers. 2023)", https://hdl.handle.net/20.500.12574/89, AiLab IMCS UL

A neural text-to-speech (TTS) synthesis model for Latvian, trained with VITS on a 25-hour speech corpus of audiobooks read in a male voice. The model is available for academic and non-commercial purposes via an API. To request API access, send an email to info@ailab.lv.
