DataverseLV

Metrics

282 Downloads

Metadata Source: Harvested

LATE Conversational Speech Corpus V1 (LATE-sarunas)

Mar 26, 2025 - CLARIN-LV

Auziņa, Ilze; Darģis, Roberts; Rābante-Buša, Guna; Timinska-Ļaksa, Ilze; Gailīte, Elīna; Auziņa, Arta, 2024, "LATE Conversational Speech Corpus V1 (LATE-sarunas)", https://hdl.handle.net/20.500.12574/113, AiLab IMCS UL

The corpus contains recordings of informal conversations, interviews and public speeches, together with their orthographic transcripts. Each audio recording has metadata: the gender and age group of the speaker, and the form of speech (dialogue, monologue, spontaneous or prepared speech, etc.).

This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.

LATE Media Speech Corpus V1 (LATE-mediji)

Mar 26, 2025 - CLARIN-LV

Auziņa, Ilze; Darģis, Roberts; Levāne-Petrova, Kristīne; Auziņa, Arta; Saulīte, Baiba; Ļaksa-Timinska, Ilze; Gailīte, Elīna; Nešpore-Bērzkalne, Gunta; Rābante-Buša, Guna; Pokratniece, Kristīne; Klints, Agute, 2024, "LATE Media Speech Corpus V1 (LATE-mediji)", https://hdl.handle.net/20.500.12574/114, AiLab IMCS UL

The corpus contains audio recordings of media broadcasts and their orthographic transcripts. The data are transcribed in the orthography of Standard Latvian and also follow its punctuation rules.

LATE Phonetically Annotated Speech Corpus V1 (fonLATE)

Mar 26, 2025 - CLARIN-LV

Auziņa, Ilze; Rābante-Buša, Guna; Darģis, Roberts, 2024, "LATE Phonetically Annotated Speech Corpus V1 (fonLATE)", https://hdl.handle.net/20.500.12574/115, AiLab IMCS UL

A small phonetically annotated subset derived from the LATE-sarunas and LATE-media corpora. The phonetic annotation is available at two levels:
(1) the dictionary (standard) pronunciation of a word or segment, regardless of how the particular speaker actually pronounced it;
(2) the actual pronunciation of the word or segment.

LVMED: Test Set for Latvian ASR in the Radiology Domain

Mar 26, 2025 - CLARIN-LV

Znotiņš, Artūrs; Auziņa, Ilze; Saulīte, Baiba; Darģis, Roberts; Grūzītis, Normunds, 2024, "LVMED: Test Set for Latvian ASR in the Radiology Domain", https://hdl.handle.net/20.500.12574/117, AiLab IMCS UL

A Latvian speech corpus for testing and comparing ASR models in the radiology domain. It consists of authentic dictations of CT, XR, MR, MG and US examination reports.

Ilvars - Latvian Male VITS Text-to-Speech Model (vers. 2022)

Mar 26, 2025 - CLARIN-LV

Darģis, Roberts; Auziņa, Ilze, 2022, "Ilvars - Latvian Male VITS Text-to-Speech Model (vers. 2022)", https://hdl.handle.net/20.500.12574/71, AiLab IMCS UL

A neural model for Latvian text-to-speech synthesis, trained using VITS on a 20-hour speech corpus of audiobooks read in a male voice. It is currently released for research purposes only.

SELMA Latvian NER Dataset

Mar 26, 2025 - CLARIN-LV

Rābante-Buša, Guna; Grūzītis, Normunds; Bārzdiņš, Guntis; Mendes, Afonso, 2022, "SELMA Latvian NER Dataset", https://hdl.handle.net/20.500.12574/98, AiLab IMCS UL

A dataset of hierarchically annotated named entities in Latvian news articles (provided by the Latvian Information Agency LETA) for the development and evaluation of transition-based parsers for named entity recognition (NER).

LATE Dev&Test Set V1 for Latvian ASR

Mar 26, 2025 - CLARIN-LV

Darģis, Roberts; Znotiņš, Artūrs; Auziņa, Ilze; Rābante-Buša, Guna, 2024, "LATE Dev&Test Set V1 for Latvian ASR", https://hdl.handle.net/20.500.12574/99, AiLab IMCS UL

A Latvian speech corpus for the development (validation), testing and comparison of ASR models. The audio data is segmented and aligned with the corresponding human-verified orthographic transcriptions. The LATE-media subset contains both verbatim (raw) and formatted transcriptions (with punctuation, capitalisation, numbers, abbreviations...

Word sense annotated "The Little Prince" fragments in Latvian 1.0

Mar 26, 2025 - CLARIN-LV

Nešpore, Gunta; Rituma, Laura, 2023, "Word sense annotated "The Little Prince" fragments in Latvian 1.0", https://hdl.handle.net/20.500.12574/80, AiLab IMCS UL

Word sense annotation of a 1200-token running-text corpus (the beginning of The Little Prince by Antoine de Saint-Exupéry), intended as an evaluation corpus for Latvian WSD systems. Data is provided in a tab-separated format similar to CoNLL, indexing senses to the Tēzaurs.lv word sense IDs as of Tēzaurs.lv 2022 (http://hdl.handle.net/20.500.12574/66) d...

Latvian Blog Corpus 2015

Mar 26, 2025 - CLARIN-LV

Laizāns, Mārtiņš; Pretkalniņa, Lauma, 2015, "Latvian Blog Corpus 2015", https://hdl.handle.net/20.500.12574/79, AiLab IMCS UL

An automatically harvested corpus of Latvian blogs.

Dataset for Latvian Phonetic Analysis

Jan 30, 2025 - CLARIN-LV

Trumpa, Edmunds; Ozola, Anete; Jansone, Laura Paula, 2024, "Dataset for Latvian Phonetic Analysis", https://hdl.handle.net/20.500.12574/122, Latvian Language Institute of the University of Latvia

The dataset is intended for the characterization, classification and visualization of the phonetic features of syllable intonation in modern Latvian. It contains the following folders: (1) Questionnaires (4 questionnaires with 171 sentences); (2) Recordings (855 utterances spoken by five speakers); (3) Graphs of...
