Mar 26, 2025
Darģis, Roberts; Auziņa, Ilze, 2022, "Ilvars - Latvian Male VITS Text-to-Speech Model (vers. 2022)", https://hdl.handle.net/20.500.12574/71, AiLab IMCS UL
A neural model for text-to-speech synthesis in Latvian. Trained using VITS on a 20-hour speech corpus of audiobooks read in a male voice. Currently released for research purposes only. This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Mar 26, 2025
Rābante-Buša, Guna; Grūzītis, Normunds; Bārzdiņš, Guntis; Mendes, Afonso, 2022, "SELMA Latvian NER Dataset", https://hdl.handle.net/20.500.12574/98, AiLab IMCS UL
A dataset of hierarchically annotated named entities in Latvian news articles (provided by the Latvian Information Agency LETA) for the development and evaluation of transition-based parsers for named entity recognition (NER).
Mar 26, 2025
Darģis, Roberts; Znotiņš, Artūrs; Auziņa, Ilze; Rābante-Buša, Guna, 2024, "LATE Dev&Test Set V1 for Latvian ASR", https://hdl.handle.net/20.500.12574/99, AiLab IMCS UL
A Latvian speech corpus for the development (validation), testing and comparison of ASR models. The audio data is segmented and aligned with the corresponding orthographic transcriptions which are human verified. The LATE-media subset contains both verbatim (raw) and formatted transcriptions (with punctuation, capitalisation, numbers, abbreviations...
Mar 26, 2025
Nešpore, Gunta; Rituma, Laura, 2023, "Word sense annotated "The Little Prince" fragments in Latvian 1.0", https://hdl.handle.net/20.500.12574/80, AiLab IMCS UL
Annotation of word senses for a running text corpus of 1200 tokens (beginning of The Little Prince by Antoine de Saint-Exupéry) as an evaluation corpus for Latvian WSD systems. Data is provided in a tab-separated format similar to CoNLL, indexing senses to the Tēzaurs.lv word sense IDs as of Tēzaurs.lv 2022 (http://hdl.handle.net/20.500.12574/66) d...
Mar 26, 2025
Laizāns, Mārtiņš; Pretkalniņa, Lauma, 2015, "Latvian Blog Corpus 2015", https://hdl.handle.net/20.500.12574/79, AiLab IMCS UL
Automatically harvested Latvian blog corpus.
Jan 30, 2025
Trumpa, Edmunds; Ozola, Anete; Jansone, Laura Paula, 2024, "Dataset for Latvian Phonetic Analysis", https://hdl.handle.net/20.500.12574/122, Latvian Language Institute of the University of Latvia
The dataset is intended for the characterization, classification and visualization of the phonetic features of syllable intonation characteristic of the modern Latvian language. The dataset contains the following folders: (1) Questionnaires (4 questionnaires with 171 sentences); (2) Recordings (855 utterances spoken by five speakers); (3) Graphs of...
Jan 16, 2025
Zuicena, Ieva; Auziņa, Ieva; Briede, Santa; Jansone, Irēna Ilga; Kuplā, Ieva; Lejniece, Gunta; Migla, Ilga; Oldere, Laimdota; Ozola, Ārija; Požarnova, Vija; Rapa, Sanda; Roze, Anitra; Šmidebergs, Imants; Šnē, Dorisa; Šnē, Māra; Timuška, Agris; Grasmanis, Mikus; Pretkalniņa, Lauma; Znotiņš, Artūrs, 2024, "Dictionary of Contemporary Latvian Language (MLVV) (2024-09-22)", https://hdl.handle.net/20.500.12574/109, Latvian Language Institute of the University of Latvia
“Contemporary dictionary of Latvian language” (MLVV), developed by the Latvian Language Institute of the University of Latvia, is a new explanatory dictionary based on Latvian language materials obtained during the last decade. The analysis of the word stock is based on MLVV card files, internet sources, as well as on last decade’s encyclopaedias and...
Jan 16, 2025
Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Lokmane, Ilze; Klints, Agute; Stāde, Madara; Grasmanis, Mikus; Auziņa, Ilze; Znotiņš, Artūrs; Darģis, Roberts; Bārzdiņš, Guntis, 2024, "Tēzaurs.lv 2024 (Autumn Edition)", https://hdl.handle.net/20.500.12574/110, AiLab IMCS UL
Tezaurs.lv is the largest open machine-readable dictionary for Latvian. This version contains more than 405,000 entries based on 345 sources. The dictionary is enriched with phonetic, morphological, derivational, semantic and other annotations, inflection tables, corpus examples, and it is integrated with the Latvian WordNet data. This dataset is a...
Jan 7, 2025
Martena, Sanita; Nau, Nicole; Kļavinska, Antra; Juško-Štekele, Angelika; Kociņš-Kūceņš, Armands; Sprukte, Ausma; Briška, Anna; Gusāns, Ingars; Mazure, Laura, 2024, "Corpus of Contemporary Latgalian Speech", https://hdl.handle.net/20.500.12574/105, Rēzekne Academy of Technologies
The corpus consists of audio recordings and their transcripts. It documents natural, spontaneous speech, including field research recordings, interviews, TV and radio broadcasts.
Dec 20, 2024
Kļavinska, Antra; Martena, Sanita; Nau, Nicole; Šuplinska, Ilga; Anna, Briška, 2024, "Latgalian Tezaurs 2025 (Winter Edition)", https://hdl.handle.net/20.500.12574/116, Rēzekne Academy of Technologies
Latgalian Tezaurs (LTG T) is a lexical database and online dictionary of Latgalian (ISO 639-3 ltg). The pilot version of December 2024 contains more than 450 entries, including many idioms and other multi-word units. Entries include spelling variants and dialect forms and name the sources where the lexical unit has been documented. Audio recordings...