CLARIN (Common Language Resources and Technology Infrastructure) is a research infrastructure founded on the vision that all digital language resources and tools from across Europe and beyond should be accessible through a single sign-on online environment to support researchers in the humanities and social sciences.

31 to 40 of 98 Results
Dec 14, 2024
Baklāne, Anda; Saulespurēns, Valdis; Ozols, Artis, 2022, "'Karogs' corpus", https://hdl.handle.net/20.500.12574/83, National Library of Latvia
The corpus contains texts of the magazine "Karogs" from 1940 to 1994.
This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Dec 14, 2024
Darģis, Roberts, 2022, "Corpus of Latvian PhD Theses (Disertācijas)", https://hdl.handle.net/20.500.12574/93, AiLab IMCS UL
The corpus consists of PhD theses and summaries published at the University of Latvia, Riga Technical University, Riga Stradins University and Liepaja University until 2020.
Dec 14, 2024
Levāne-Petrova, Kristīne; Darģis, Roberts; Pokratniece, Kristīne; Lasmanis, Viesturs Jūlijs, 2023, "Balanced Corpus of Modern Latvian (LVK2022)", https://hdl.handle.net/20.500.12574/84, AiLab IMCS UL
The Balanced Corpus of Modern Latvian (LVK2022) contains unique texts not yet included in the previously developed balanced corpora (LVK2013 and LVK2018). The corpus is primarily based on the design principles of previous balanced corpora. It contains authentic contemporary texts (mostly created after 2000) of various genres with metadata. Unlike its pr...
Dec 14, 2024
Levāne-Petrova, Kristīne; Darģis, Roberts, 2018, "Balanced Corpus of Modern Latvian (LVK2018)", https://hdl.handle.net/20.500.12574/11, AiLab IMCS UL
LVK2018 is a balanced and representative 10 million word text corpus of modern Latvian. It represents five different genres: journalism (60%), fiction (20%), scientific (10%), legal (8%), transcriptions (2%). LVK2018 is an extended version of LVK2013.
Dec 14, 2024
Levāne-Petrova, Kristīne; Pokratniece, Kristīne; Vēvere, Daira; Poikāns, Ilmārs; Andronova, Everita, 2013, "Balanced Corpus of Modern Latvian (LVK2013) Līdzsvarots latviešu valodas korpuss (LVK2013)", https://hdl.handle.net/20.500.12574/44, AiLab IMCS UL
LVK2013 is a representative 4.5 million word corpus of contemporary Latvian. It is designed as a general-language, representative and balanced corpus that aims to cover the variety of existing texts in estimated proportions. The corpus contains six different sections: journalism (55%), fiction (20%), scientific (10%), legal (8%), other texts...
Dec 14, 2024
Spektors, Andrejs; Grūzītis, Normunds; Darģis, Roberts; Auziņa, Ilze; Saulīte, Baiba; Levāne-Petrova, Kristīne, 2018, "Rainis", https://hdl.handle.net/20.500.12574/41, AiLab IMCS UL
This specialised text corpus contains all of Rainis's work: plays, poetry, prose, journalism, translations, letters, etc.
Dec 14, 2024
Darģis, Roberts, 2022, "Corpus of Legal Acts of the Republic of Latvia (Likumi)", https://hdl.handle.net/20.500.12574/65, AiLab IMCS UL
The corpus contains all legal acts of the Republic of Latvia published on the website likumi.lv (until February 2022).
Dec 14, 2024
Auziņa, Ilze; Kaija, Inga; Levāne-Petrova, Kristīne; Pokratniece, Kristīne; Darģis, Roberts, 2021, "Latvian Learner Corpus (LaVa)", https://hdl.handle.net/20.500.12574/42, AiLab IMCS UL
The corpus includes almost 1000 texts created by foreign students studying at a Latvian higher education institution who are learning Latvian as a foreign language in the first or second semester. The morphologically annotated texts have been checked manually; the language learners' errors have been manually annotated.
Dec 14, 2024
Andronova, Everita; Spektors, Andrejs; Vanags, Pēteris; Baltiņa, Maija; Trumpa, Anta; Trumpa, Edmunds; Grūzītis, Normunds; Siliņa-Piņķe, Renāte; Frīdenberga, Anna; Skrūzmane, Elga; Ķauķīte, Sintija; Pretkalniņa, Lauma, 2022, "The Corpus of Early Written Latvian (2022)", https://hdl.handle.net/20.500.12574/90, AiLab IMCS UL
The Corpus of early written Latvian ‘SENIE’ provides access to the texts of written Latvian of the 16th–18th century, and its aim is to facilitate studies of early Latvian in general (e.g. the lexis, morphology and syntax of the texts) and to serve as the basis for "The Historical dictionary of Latvian (16th–17th cc.)". The Corpus was first launche...
Dec 14, 2024
Andronova, Everita; Spektors, Andrejs; Vanags, Pēteris; Baltiņa, Maija; Trumpa, Anta; Trumpa, Edmunds; Grūzītis, Normunds; Siliņa-Piņķe, Renāte; Frīdenberga, Anna; Skrūzmane, Elga; Ķauķīte, Sintija, 2002, "The Corpus of Early Written Latvian", https://hdl.handle.net/20.500.12574/12, AiLab IMCS UL
The Corpus of early written Latvian ‘SENIE’ (16th–18th cc.) provides access to the texts of written Latvian of the 16th–18th century, and its aim is to facilitate studies of early Latvian in general (e.g. the lexis, morphology and syntax of the texts) and to serve as the basis for the Historical dictionary of the Latvian language. The Corpus was fir...