CLARIN-LV

CLARIN (Common Language Resources and Technology Infrastructure) is a research infrastructure founded on the vision that all digital language resources and tools from across Europe and beyond should be accessible through a single sign-on online environment, to support researchers in the humanities and social sciences.

Dictionary of Latvian Literary Language (LLVV) (2025-03-05)

Apr 22, 2025

Ceplītis, Laimdots; Spektors, Andrejs, 2025, "Dictionary of Latvian Literary Language (LLVV) (2025-03-05)", https://hdl.handle.net/20.500.12574/126, AiLab IMCS UL

In the 20th century, the Latvian Language Institute of the University of Latvia (UL) (formerly the Language and Literature Institute of the Academy of Sciences) produced the largest lexicographic source of the Latvian language, which was digitized (2001–2022) by the UL Institute of Mathematics and Computer Science. The dictionary contains words of standard Latvian used since 19th cen...

This dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.

Dictionary of Latvian Literary Language (LLVV) (2024-02)

Apr 22, 2025

Ceplītis, Laimdots; Spektors, Andrejs, 2024, "Dictionary of Latvian Literary Language (LLVV) (2024-02)", https://hdl.handle.net/20.500.12574/100, AiLab IMCS UL

In the 20th century, the Latvian Language Institute of the University of Latvia (UL) (formerly the Language and Literature Institute of the Academy of Sciences) produced the largest lexicographic source of the Latvian language, which was digitized (2001–2022) by the UL Institute of Mathematics and Computer Science. The dictionary contains words of standard Latvian used since 19th cen...

Corpus of Latvian Early Novels (2025-03-11)

Mar 27, 2025

Baklāne, Anda; Saulespurēns, Valdis; Ozols, Artis; Krasovska, Marlēna; Vēveris, Viesturs; Eglāja-Kristsone, Eva; Rožkalne, Anita; Skaistkalne, Evija, 2025, "Corpus of Latvian Early Novels (2025-03-11)", https://hdl.handle.net/20.500.12574/125, National Library of Latvia

A corpus of Latvian novels first published before 1940.

Corpus of Latvian Early Novels

Mar 26, 2025

Baklāne, Anda; Saulespurēns, Valdis; Ozols, Artis; Krasovska, Marlēna, 2021, "Corpus of Latvian Early Novels", https://hdl.handle.net/20.500.12574/78, National Library of Latvia

A corpus of Latvian novels first published before 1940.

Corpus of Contemporary Latgalian Speech (MuLaR)

Mar 26, 2025

Martena, Sanita; Nau, Nicole; Kļavinska, Antra; Juško-Štekele, Angelika; Kociņš-Kūceņš, Armands; Sprukte, Ausma; Briška, Anna; Gusāns, Ingars; Mazure, Laura, 2025, "Corpus of Contemporary Latgalian Speech (MuLaR)", https://hdl.handle.net/20.500.12574/118, Rēzekne Academy of Technologies

The corpus consists of audio recordings and their transcripts. It documents natural, spontaneous speech, including field research recordings, interviews, and TV and radio broadcasts.

Latvian Sign Language Corpus

Mar 26, 2025

Bethere, Dina; Barone, Lelde; Immure, Inese; Intsone, Agija; Liniņa, Ilona; Ozola, Elza; Romanovska, Agnese; Straupeniece, Daiga; Darģis, Roberts, 2025, "Latvian Sign Language Corpus", https://hdl.handle.net/20.500.12574/121, RTU Liepaja

The corpus contains video news produced by the Latvian Deaf Union and news from Latvian public media with sign language interpretation. Video recordings of Latvian sign language utterances are segmented and arranged in three levels: SIGN, CONCEPT and SENTENCE. The corpus comprises 12,500 signs over 150 minutes. The data can be browsed with the ELAN software.

LATE Conversational Speech Corpus V1 (LATE-sarunas)

Mar 26, 2025

Auziņa, Ilze; Darģis, Roberts; Rābante-Buša, Guna; Timinska-Ļaksa, Ilze; Gailīte, Elīna; Auziņa, Arta, 2024, "LATE Conversational Speech Corpus V1 (LATE-sarunas)", https://hdl.handle.net/20.500.12574/113, AiLab IMCS UL

The corpus contains recordings of informal conversations, interviews and public speeches, together with their orthographic transcripts. Each audio recording is annotated with metadata: the speaker's gender and age group, and the form of speech (dialogue, monologue, spontaneous or prepared speech, etc.).

LATE Media Speech Corpus V1 (LATE-mediji)

Mar 26, 2025

Auziņa, Ilze; Darģis, Roberts; Levāne-Petrova, Kristīne; Auziņa, Arta; Saulīte, Baiba; Ļaksa-Timinska, Ilze; Gailīte, Elīna; Nešpore-Bērzkalne, Gunta; Rābante-Buša, Guna; Pokratniece, Kristīne; Klints, Agute, 2024, "LATE Media Speech Corpus V1 (LATE-mediji)", https://hdl.handle.net/20.500.12574/114, AiLab IMCS UL

The corpus contains audio recordings of media broadcasts and their orthographic transcripts. The data are transcribed according to Standard Latvian orthography, including its punctuation rules.

LATE Phonetically Annotated Speech Corpus V1 (fonLATE)

Mar 26, 2025

Auziņa, Ilze; Rābante-Buša, Guna; Darģis, Roberts, 2024, "LATE Phonetically Annotated Speech Corpus V1 (fonLATE)", https://hdl.handle.net/20.500.12574/115, AiLab IMCS UL

A small subset of phonetically annotated data derived from LATE-sarunas and LATE-mediji. The phonetic annotation is available at two levels: (1) the dictionary (standard) pronunciation of a word or segment, regardless of how the particular speaker actually pronounced it, and (2) the actual pronunciation of the word or segment.

LVMED: Test Set for Latvian ASR in the Radiology Domain

Mar 26, 2025

Znotiņš, Artūrs; Auziņa, Ilze; Saulīte, Baiba; Darģis, Roberts; Grūzītis, Normunds, 2024, "LVMED: Test Set for Latvian ASR in the Radiology Domain", https://hdl.handle.net/20.500.12574/117, AiLab IMCS UL

A Latvian speech corpus for testing and comparing ASR models in the radiology domain. It consists of authentic dictations of CT, XR, MR, MG and US examination reports.