CLARIN-LV

CLARIN (Common Language Resources and Technology Infrastructure) is a research infrastructure founded on the vision that all digital language resources and tools from across Europe and beyond should be accessible through a single sign-on online environment, in support of researchers in the humanities and social sciences.

21 to 30 of 98 Results

Ilvars - Latvian Male VITS Text-to-Speech Model (vers. 2022)
Mar 26, 2025
Darģis, Roberts; Auziņa, Ilze, 2022, "Ilvars - Latvian Male VITS Text-to-Speech Model (vers. 2022)", https://hdl.handle.net/20.500.12574/71, AiLab IMCS UL
A neural model for text-to-speech synthesis in Latvian. Trained using VITS on a 20-hour speech corpus of audiobooks read in a male voice. Currently released for research purposes only.
This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.

SELMA Latvian NER Dataset
Mar 26, 2025
Rābante-Buša, Guna; Grūzītis, Normunds; Bārzdiņš, Guntis; Mendes, Afonso, 2022, "SELMA Latvian NER Dataset", https://hdl.handle.net/20.500.12574/98, AiLab IMCS UL
A dataset of hierarchically annotated named entities in Latvian news articles (provided by the Latvian Information Agency LETA) for the development and evaluation of transition-based parsers for named entity recognition (NER).

LATE Dev&Test Set V1 for Latvian ASR
Mar 26, 2025
Darģis, Roberts; Znotiņš, Artūrs; Auziņa, Ilze; Rābante-Buša, Guna, 2024, "LATE Dev&Test Set V1 for Latvian ASR", https://hdl.handle.net/20.500.12574/99, AiLab IMCS UL
A Latvian speech corpus for the development (validation), testing and comparison of ASR models. The audio data is segmented and aligned with the corresponding orthographic transcriptions which are human verified. The LATE-media subset contains both verbatim (raw) and formatted transcriptions (with punctuation, capitalisation, numbers, abbreviations...

Word sense annotated "The Little Prince" fragments in Latvian 1.0
Mar 26, 2025
Nešpore, Gunta; Rituma, Laura, 2023, "Word sense annotated "The Little Prince" fragments in Latvian 1.0", https://hdl.handle.net/20.500.12574/80, AiLab IMCS UL
Annotation of word senses for a running text corpus of 1200 tokens (beginning of The Little Prince by Antoine de Saint-Exupéry) as an evaluation corpus for Latvian WSD systems. Data is provided in a tab-separated format similar to CoNLL, indexing senses to the Tēzaurs.lv word sense IDs as of Tēzaurs.lv 2022 (http://hdl.handle.net/20.500.12574/66) d...

Latvian Blog Corpus 2015
Mar 26, 2025
Laizāns, Mārtiņš; Pretkalniņa, Lauma, 2015, "Latvian Blog Corpus 2015", https://hdl.handle.net/20.500.12574/79, AiLab IMCS UL
Automatically harvested Latvian blog corpus.

Dataset for Latvian Phonetic Analysis
Jan 30, 2025
Trumpa, Edmunds; Ozola, Anete; Jansone, Laura Paula, 2024, "Dataset for Latvian Phonetic Analysis", https://hdl.handle.net/20.500.12574/122, Latvian Language Institute of the University of Latvia
The dataset is intended for the characterization, classification and visualization of the phonetic features of syllable intonation characteristic of the modern Latvian language. The dataset contains the following folders: (1) Questionnaires (4 questionnaires with 171 sentences); (2) Recordings (855 utterances spoken by five speakers); (3) Graphs of...

Dictionary of Contemporary Latvian Language (MLVV) (2024-09-22)
Jan 16, 2025
Zuicena, Ieva; Auziņa, Ieva; Briede, Santa; Jansone, Irēna Ilga; Kuplā, Ieva; Lejniece, Gunta; Migla, Ilga; Oldere, Laimdota; Ozola, Ārija; Požarnova, Vija; Rapa, Sanda; Roze, Anitra; Šmidebergs, Imants; Šnē, Dorisa; Šnē, Māra; Timuška, Agris; Grasmanis, Mikus; Pretkalniņa, Lauma; Znotiņš, Artūrs, 2024, "Dictionary of Contemporary Latvian Language (MLVV) (2024-09-22)", https://hdl.handle.net/20.500.12574/109, Latvian Language Institute of the University of Latvia
The "Dictionary of Contemporary Latvian Language" (MLVV), developed by the Latvian Language Institute of the University of Latvia, is a new explanatory dictionary based on Latvian language materials obtained during the last decade. The analysis of the word stock is based on MLVV card files, internet sources, as well as on the last decade's encyclopaedias and...

Tēzaurs.lv 2024 (Autumn Edition)
Jan 16, 2025
Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Lokmane, Ilze; Klints, Agute; Stāde, Madara; Grasmanis, Mikus; Auziņa, Ilze; Znotiņš, Artūrs; Darģis, Roberts; Bārzdiņš, Guntis, 2024, "Tēzaurs.lv 2024 (Autumn Edition)", https://hdl.handle.net/20.500.12574/110, AiLab IMCS UL
Tezaurs.lv is the largest open machine-readable dictionary for Latvian. This version contains more than 405,000 entries based on 345 sources. The dictionary is enriched with phonetic, morphological, derivational, semantic and other annotations, inflection tables, corpus examples, and it is integrated with the Latvian WordNet data. This dataset is a...

Corpus of Contemporary Latgalian Speech
Jan 7, 2025
Martena, Sanita; Nau, Nicole; Kļavinska, Antra; Juško-Štekele, Angelika; Kociņš-Kūceņš, Armands; Sprukte, Ausma; Briška, Anna; Gusāns, Ingars; Mazure, Laura, 2024, "Corpus of Contemporary Latgalian Speech", https://hdl.handle.net/20.500.12574/105, Rēzekne Academy of Technologies
The corpus consists of audio recordings and their transcripts. It documents natural, spontaneous speech, including field research recordings, interviews, TV and radio broadcasts.

Latgalian Tezaurs 2025 (Winter Edition)
Dec 20, 2024
Kļavinska, Antra; Martena, Sanita; Nau, Nicole; Šuplinska, Ilga; Briška, Anna, 2024, "Latgalian Tezaurs 2025 (Winter Edition)", https://hdl.handle.net/20.500.12574/116, Rēzekne Academy of Technologies
Latgalian Tezaurs (LTG T) is a lexical database and online dictionary of Latgalian (ISO 639-3 ltg). The pilot version of December 2024 contains more than 450 entries, including many idioms and other multi-word units. Entries include spelling variants and dialect forms and name the sources where the lexical unit has been documented. Audio recordings...
