CLARIN-LV

CLARIN (Common Language Resources and Technology Infrastructure) is a research infrastructure that grew out of the vision that all digital language resources and tools from all over Europe and beyond should be accessible through a single sign-on online environment, in support of researchers in the humanities and social sciences.


Latvian word frequency dataset, Dec 19, 2025. Grasmanis, Mikus; Valkovska, Baiba; Levāne-Petrova, Kristīne, 2025, "Latvian word frequency dataset", https://hdl.handle.net/20.500.12574/148, AiLab IMCS UL. This frequency list contains the 25,000 most frequent Latvian lemmas, obtained from 18 morphologically annotated corpora totalling 1.5 billion tokens from the Latvian National Corpora Collection (Korpuss.lv) and Tēzaurs.lv. Supporting academic and practical applications, including language teaching, machine translation, and speech technologies, the... This dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Latvian Folk Legend Corpus of LPT (in Latvian), Dec 19, 2025. Reinsone, Sanita; Kaščejeva, Simona; Spektors, Andrejs; Pakalns, Guntis, 2025, "Latvian Folk Legend Corpus of LPT (in Latvian)", https://hdl.handle.net/20.500.12574/147, Digital Humanities Center of the University of Latvia. The corpus includes Latvian legends published in volumes 13, 14, and 15 of "Latvian Folk Tales and Legends" (1925–1937), compiled by Pēteris Šmits. The volumes were digitised in the late 1990s; a revised version and the preparation of the German-language texts were carried out in 2012. Metadata refinement and the development of a new corpus version...
Annotated longitudinal corpus of Latvian children's language, Dec 11, 2025. Auziņa, Ilze; Darģis, Roberts; Rābante-Buša, Guna; Levāne-Petrova, Kristīne; Saulīte, Baiba, 2017, "Annotated longitudinal corpus of Latvian children's language", https://hdl.handle.net/20.500.12574/7, AiLab IMCS UL. The collection contains three longitudinal corpora of monolingual Latvian-speaking children and one longitudinal corpus of a simultaneous Latvian-Russian bilingual child. Participants were recorded for 30 minutes each week for 16 months, resulting in 134 hours of speech, of which 34 hours are orthographically transcribed.
Corpus of Latvian Autobiographies, Dec 3, 2025. Reinsone, Sanita; Matulis, Haralds; Ļaksa-Timinska, Ilze; Žvarte, Elvīra, 2025, "Corpus of Latvian Autobiographies", https://hdl.handle.net/20.500.12574/145, Institute of Literature, Folklore and Art of the University of Latvia. The corpus consists of 74 unpublished autobiographies, life stories, and memoirs in Latvian, written between 1900 and 2024. All materials have been collected, digitised, and are preserved in the Autobiography Collection of the Archives of Latvian Folklore, Institute of Literature, Folklore and Art, University of Latvia. The corpus has been created...
Latgalian Tezaurs 2026 (Winter Edition), Nov 27, 2025. Kļavinska, Antra; Martena, Sanita; Nau, Nicole; Šuplinska, Ilga; Anna, Briška, 2025, "Latgalian Tezaurs 2026 (Winter Edition)", https://hdl.handle.net/20.500.12574/144, AiLab IMCS UL. Latgalian Tezaurs (LTG T) is a lexical database and online dictionary of Latgalian (ISO 639-3 ltg). This version contains more than 750 entries, including many idioms and other multi-word units. Entries include spelling variants and dialect forms and name the sources where the lexical unit has been documented. Audio recordings illustrate pronunciat...
Latgalian Tezaurs 2025 (Winter Edition), Nov 27, 2025. Kļavinska, Antra; Martena, Sanita; Nau, Nicole; Šuplinska, Ilga; Anna, Briška, 2024, "Latgalian Tezaurs 2025 (Winter Edition)", https://hdl.handle.net/20.500.12574/116, Rēzekne Academy of Technologies. Latgalian Tezaurs (LTG T) is a lexical database and online dictionary of Latgalian (ISO 639-3 ltg). The pilot version of December 2024 contains more than 450 entries, including many idioms and other multi-word units. Entries include spelling variants and dialect forms and name the sources where the lexical unit has been documented. Audio recordings...
Latvian and Latgalian Parallel Sample Treebank (Cairo), Nov 26, 2025. Pretkalniņa, Lauma; Nešpore-Bērzkalne, Gunta; Pokratniece, Kristīne; Rituma, Laura, 2025, "Latvian and Latgalian Parallel Sample Treebank (Cairo)", https://hdl.handle.net/20.500.12574/143, AiLab IMCS UL. This corpus contains 20 Latvian and Latgalian sample sentences annotated with the same hybrid annotation model used in the Latvian Treebank. The sentences are the same ones used in the "Cairo" sample corpora that showcase annotation choices for Universal Dependencies treebanks, and this corpus serves as a basis for both UD-Latvia...
LVTB - Latvian Treebank v2.16 (2025-05-15), Nov 25, 2025. Rituma, Laura; Pretkalniņa, Lauma; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Grūzītis, Normunds; Znotiņš, Artūrs, 2025, "LVTB - Latvian Treebank v2.16 (2025-05-15)", https://hdl.handle.net/20.500.12574/129, AiLab IMCS UL. The Latvian Treebank (LVTB) has been under development since 2010. It is manually annotated according to a hybrid dependency-constituency grammar model. This version of LVTB contains the data used to derive the corresponding version of the Latvian UD Treebank (UDLV-LVTB).
The Corpus of Early Written Latvian (2022), Nov 20, 2025. Andronova, Everita; Spektors, Andrejs; Vanags, Pēteris; Baltiņa, Maija; Trumpa, Anta; Trumpa, Edmunds; Grūzītis, Normunds; Siliņa-Piņķe, Renāte; Frīdenberga, Anna; Skrūzmane, Elga; Ķauķīte, Sintija; Pretkalniņa, Lauma, 2022, "The Corpus of Early Written Latvian (2022)", https://hdl.handle.net/20.500.12574/90, AiLab IMCS UL. The Corpus of early written Latvian ‘SENIE’ provides access to the texts of written Latvian of the 16th–18th century, and its aim is to facilitate studies of early Latvian in general (e.g. the lexis, morphology and syntax of the texts) and to serve as the basis for "The Historical dictionary of Latvian (16th–17th cc.)". The Corpus was first launche...
The Corpus of Early Written Latvian (2025), Nov 20, 2025. Andronova, Everita; Baltiņa, Maija; Frīdenberga, Anna; Grūzītis, Normunds; Ķauķīte, Sintija; Pokratniece, Kristīne; Pretkalniņa, Lauma; Siliņa-Piņķe, Renāte; Skrūzmane, Elga; Spektors, Andrejs; Spektors, Mārtiņš; Štrausa, Ilze; Trumpa, Anta; Trumpa, Edmunds; Vanags, Pēteris, 2025, "The Corpus of Early Written Latvian (2025)", https://hdl.handle.net/20.500.12574/141, AiLab IMCS UL. The Corpus of early written Latvian 'SENIE' provides access to the texts and facsimiles of written Latvian of the 16th–18th century. Its aim is to facilitate studies of early Latvian in general and to serve as the basis for 'The Historical dictionary of Latvian (16th–17th cc.)'. The corpus serves as a unique digital repository of early Latvian texts, w...