CLARIN (Common Language Resources and Technology Infrastructure) is a research infrastructure founded on the vision that all digital language resources and tools from across Europe and beyond should be accessible through a single sign-on online environment to support researchers in the humanities and social sciences.

1 to 10 of 114 Results
Jan 23, 2026
Auziņa, Ilze; Darģis, Roberts; Rābante-Buša, Guna; Ļaksa-Timinska, Ilze; Gailīte, Elīna; Auziņa, Arta, 2024, "LATE Conversational Speech Corpus V1 (LATE-sarunas)", https://hdl.handle.net/20.500.12574/113, AiLab IMCS UL
The corpus contains recordings of informal conversations, interviews, and public speeches, together with their orthographic transcripts. Each audio recording carries metadata: the speaker's gender and age group, and the form of speech (dialogue, monologue, spontaneous or prepared speech, etc.).
This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Dec 22, 2025
Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Lokmane, Ilze; Klints, Agute; Stāde, Madara; Grasmanis, Mikus; Auziņa, Ilze; Znotiņš, Artūrs; Darģis, Roberts; Bārzdiņš, Guntis, 2025, "Tēzaurs.lv 2026 (Winter Edition)", https://hdl.handle.net/20.500.12574/151, AiLab IMCS UL
Tezaurs.lv is the largest open machine-readable dictionary for Latvian. This version contains more than 410,000 entries based on 350 sources. The dictionary is enriched with phonetic, morphological, derivational, semantic and other annotations, inflection tables, and corpus examples, and is integrated with the Latvian WordNet data. This dataset is availab...
Dec 22, 2025
Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Lokmane, Ilze; Klints, Agute; Stāde, Madara; Grasmanis, Mikus; Auziņa, Ilze; Znotiņš, Artūrs; Darģis, Roberts; Bārzdiņš, Guntis, 2025, "Tēzaurs.lv 2025 (Autumn Edition)", https://hdl.handle.net/20.500.12574/137, AiLab IMCS UL
Tezaurs.lv is the largest open machine-readable dictionary for Latvian. This version contains more than 410,000 entries based on 350 sources. The dictionary is enriched with phonetic, morphological, derivational, semantic and other annotations, inflection tables, and corpus examples, and is integrated with the Latvian WordNet data. This dataset is availab...
Dec 22, 2025
Zuicena, Ieva; Auziņa, Ieva; Briede, Santa; Jansone, Irēna Ilga; Kuplā, Ieva; Lejniece, Gunta; Migla, Ilga; Oldere, Laimdota; Ozola, Ārija; Požarnova, Vija; Rapa, Sanda; Roze, Anitra; Šmidebergs, Imants; Šnē, Dorisa; Šnē, Māra; Timuška, Agris; Grasmanis, Mikus; Pretkalniņa, Lauma; Znotiņš, Artūrs, 2025, "Dictionary of Contemporary Latvian Language (MLVV) (2025-12-21)", https://hdl.handle.net/20.500.12574/150, Latvian Language Institute, Faculty of Humanities, University of Latvia
“Contemporary dictionary of Latvian language” (MLVV), developed by the Latvian Language Institute of the Faculty of Humanities at the University of Latvia, is a new explanatory dictionary based on Latvian language materials obtained during the last decade. The analysis of the word stock is based on MLVV card files and internet sources, as well as on...
Dec 22, 2025
Zuicena, Ieva; Auziņa, Ieva; Briede, Santa; Jansone, Irēna Ilga; Kuplā, Ieva; Lejniece, Gunta; Migla, Ilga; Oldere, Laimdota; Ozola, Ārija; Požarnova, Vija; Rapa, Sanda; Roze, Anitra; Šmidebergs, Imants; Šnē, Dorisa; Šnē, Māra; Timuška, Agris; Grasmanis, Mikus; Pretkalniņa, Lauma; Znotiņš, Artūrs, 2025, "Dictionary of Contemporary Latvian Language (MLVV) (2025-09-22)", https://hdl.handle.net/20.500.12574/138, Latvian Language Institute of the University of Latvia
“Contemporary dictionary of Latvian language” (MLVV), developed by the Latvian Language Institute of the University of Latvia, is a new explanatory dictionary based on Latvian language materials obtained during the last decade. The analysis of the word stock is based on MLVV card files and internet sources, as well as on the last decade’s encyclopaedias and...
Dec 22, 2025
Ceplītis, Laimdots; Spektors, Andrejs, 2025, "Dictionary of Latvian Literary Language (LLVV) (2025-12-21)", https://hdl.handle.net/20.500.12574/149, AiLab IMCS UL
In the 20th century, the Latvian Language Institute of the University of Latvia (UL LLI, formerly the Language and Literature Institute of the Academy of Sciences) produced the largest lexicographic source of the Latvian language, which was digitized (2001–2022) by the Institute of Mathematics and Computer Sciences, UL. The dictionary contains wor...
Dec 22, 2025
Ceplītis, Laimdots; Spektors, Andrejs, 2025, "Dictionary of Latvian Literary Language (LLVV) (2025-03-05)", https://hdl.handle.net/20.500.12574/126, AiLab IMCS UL
In the 20th century, the UL Latvian Language Institute (formerly the Language and Literature Institute of the Academy of Sciences) produced the largest lexicographic source of the Latvian language, which was digitized (2001–2022) by the UL Institute of Mathematics and Computer Sciences. The dictionary contains words of standard Latvian used since 19th cen...
Dec 19, 2025
Grasmanis, Mikus; Valkovska, Baiba; Levāne-Petrova, Kristīne, 2025, "Latvian word frequency dataset", https://hdl.handle.net/20.500.12574/148, AiLab IMCS UL
This frequency list contains the 25,000 most frequent Latvian lemmas, obtained from 18 morphologically annotated corpora totalling 1.5 billion tokens from the Latvian National Corpora Collection (Korpuss.lv) and Tēzaurs.lv. Supporting academic and practical applications, including language teaching, machine translation, and speech technologies, the...
Dec 19, 2025
Reinsone, Sanita; Kaščejeva, Simona; Spektors, Andrejs; Pakalns, Guntis, 2025, "Latvian Folk Legend Corpus of LPT (in Latvian)", https://hdl.handle.net/20.500.12574/147, Digital Humanities Center of the University of Latvia
The corpus includes Latvian legends published in volumes 13, 14, and 15 of "Latvian Folk Tales and Legends" (1925–1937), compiled by Pēteris Šmits. The volumes were digitised in the late 1990s; a revised version and the preparation of the German-language texts were carried out in 2012. Metadata refinement and the development of a new corpus version...
Dec 11, 2025
Auziņa, Ilze; Darģis, Roberts; Rābante-Buša, Guna; Levāne-Petrova, Kristīne; Saulīte, Baiba, 2017, "Annotated longitudinal corpus of Latvian children's language", https://hdl.handle.net/20.500.12574/7, AiLab IMCS UL
The collection contains three longitudinal corpora of monolingual Latvian-speaking children and one longitudinal corpus of a simultaneous Latvian-Russian bilingual child. Participants were recorded for 30 minutes each week for 16 months, resulting in 134 hours of speech. Of these, 34 hours of speech samples are orthographically transcribed.