Mar 26, 2025 - CLARIN-LV
Auziņa, Ilze; Darģis, Roberts; Rābante-Buša, Guna; Timinska-Ļaksa, Ilze; Gailīte, Elīna; Auziņa, Arta, 2024, "LATE Conversational Speech Corpus V1 (LATE-sarunas)", https://hdl.handle.net/20.500.12574/113, AiLab IMCS UL
The corpus contains recordings of informal conversations, interviews and public speeches, together with their orthographic transcripts. Metadata has been added to each audio recording: gender and age group of the speaker, and the form of speech (dialogue, monologue, spontaneous or prepared speech, etc.). This dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Mar 26, 2025 - CLARIN-LV
Auziņa, Ilze; Darģis, Roberts; Levāne-Petrova, Kristīne; Auziņa, Arta; Saulīte, Baiba; Ļaksa-Timinska, Ilze; Gailīte, Elīna; Nešpore-Bērzkalne, Gunta; Rābante-Buša, Guna; Pokratniece, Kristīne; Klints, Agute, 2024, "LATE Media Speech Corpus V1 (LATE-mediji)", https://hdl.handle.net/20.500.12574/114, AiLab IMCS UL
The corpus contains audio recordings of media broadcasts and their transcripts in orthographic transcription. The data are transcribed in the orthography of Standard Latvian, observing also the principles of punctuation.
Mar 26, 2025 - CLARIN-LV
Auziņa, Ilze; Rābante-Buša, Guna; Darģis, Roberts, 2024, "LATE Phonetically Annotated Speech Corpus V1 (fonLATE)", https://hdl.handle.net/20.500.12574/115, AiLab IMCS UL
A small subset of phonetically annotated data has been derived from the LATE-sarunas and LATE-media. The phonetic annotation is available at two levels: (1) the dictionary or standard pronunciation of a word or segment, regardless of its actual pronunciation made by the particular speaker, and (2) the actual pronunciation of a word or segment.
Mar 26, 2025 - CLARIN-LV
Znotiņš, Artūrs; Auziņa, Ilze; Saulīte, Baiba; Darģis, Roberts; Grūzītis, Normunds, 2024, "LVMED: Test Set for Latvian ASR in the Radiology Domain", https://hdl.handle.net/20.500.12574/117, AiLab IMCS UL
A Latvian speech corpus for the testing and comparison of ASR models in the radiology domain. It consists of authentic dictations of CT, XR, MR, MG, US examination reports.
Mar 26, 2025 - CLARIN-LV
Darģis, Roberts; Auziņa, Ilze, 2022, "Ilvars - Latvian Male VITS Text-to-Speech Model (vers. 2022)", https://hdl.handle.net/20.500.12574/71, AiLab IMCS UL
A neural model for text-to-speech synthesis in Latvian. Trained using VITS on a 20-hour speech corpus of audiobooks read in a male voice. Currently released for research purposes only.
Mar 26, 2025 - CLARIN-LV
Rābante-Buša, Guna; Grūzītis, Normunds; Bārzdiņš, Guntis; Mendes, Afonso, 2022, "SELMA Latvian NER Dataset", https://hdl.handle.net/20.500.12574/98, AiLab IMCS UL
A dataset of hierarchically annotated named entities in Latvian news articles (provided by the Latvian Information Agency LETA) for the development and evaluation of transition-based parsers for named entity recognition (NER).
Mar 26, 2025 - CLARIN-LV
Darģis, Roberts; Znotiņš, Artūrs; Auziņa, Ilze; Rābante-Buša, Guna, 2024, "LATE Dev&Test Set V1 for Latvian ASR", https://hdl.handle.net/20.500.12574/99, AiLab IMCS UL
A Latvian speech corpus for the development (validation), testing and comparison of ASR models. The audio data is segmented and aligned with the corresponding orthographic transcriptions which are human verified. The LATE-media subset contains both verbatim (raw) and formatted transcriptions (with punctuation, capitalisation, numbers, abbreviations...
Mar 26, 2025 - CLARIN-LV
Nešpore, Gunta; Rituma, Laura, 2023, "Word sense annotated "The Little Prince" fragments in Latvian 1.0", https://hdl.handle.net/20.500.12574/80, AiLab IMCS UL
Annotation of word senses for a running text corpus of 1200 tokens (beginning of The Little Prince by Antoine de Saint-Exupéry) as an evaluation corpus for Latvian WSD systems. Data is provided in a tab-separated format similar to CoNLL, indexing senses to the Tēzaurs.lv word sense IDs as of Tēzaurs.lv 2022 (http://hdl.handle.net/20.500.12574/66) d...
Mar 26, 2025 - CLARIN-LV
Laizāns, Mārtiņš; Pretkalniņa, Lauma, 2015, "Latvian Blog Corpus 2015", https://hdl.handle.net/20.500.12574/79, AiLab IMCS UL
Automatically harvested Latvian blog corpus.
Jan 30, 2025 - CLARIN-LV
Trumpa, Edmunds; Ozola, Anete; Jansone, Laura Paula, 2024, "Dataset for Latvian Phonetic Analysis", https://hdl.handle.net/20.500.12574/122, Latvian Language Institute of the University of Latvia
The dataset is intended for the characterization, classification and visualization of the phonetic features of syllable intonation characteristic of the modern Latvian language. The dataset contains the following folders: (1) Questionnaires (4 questionnaires with 171 sentences); (2) Recordings (855 utterances spoken by five speakers); (3) Graphs of...