Dec 19, 2025
Grasmanis, Mikus; Valkovska, Baiba; Levāne-Petrova, Kristīne, 2025, "Latvian word frequency dataset", https://hdl.handle.net/20.500.12574/148, AiLab IMCS UL
This frequency list contains the 25,000 most frequent Latvian lemmas, obtained from 18 morphologically annotated corpora totalling 1.5 billion tokens from the Latvian National Corpora Collection (Korpuss.lv) and Tēzaurs.lv. Supporting academic and practical applications, including language teaching, machine translation, and speech technologies, the... This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.
Dec 19, 2025
Reinsone, Sanita; Kaščejeva, Simona; Spektors, Andrejs; Pakalns, Guntis, 2025, "Latvian Folk Legend Corpus of LPT (in Latvian)", https://hdl.handle.net/20.500.12574/147, Digital Humanities Center of the University of Latvia
The corpus includes Latvian legends published in volumes 13, 14, and 15 of "Latvian Folk Tales and Legends" (1925–1937), compiled by Pēteris Šmits. The volumes were digitised in the late 1990s; a revised version and the preparation of the German-language texts were carried out in 2012. Metadata refinement and the development of a new corpus version...
Dec 11, 2025
Auziņa, Ilze; Darģis, Roberts; Rābante-Buša, Guna; Levāne-Petrova, Kristīne; Saulīte, Baiba, 2017, "Annotated longitudinal corpus of Latvian children's language", https://hdl.handle.net/20.500.12574/7, AiLab IMCS UL
The collection contains three longitudinal corpora of monolingual Latvian-speaking children and one longitudinal corpus of a simultaneous Latvian-Russian bilingual child. Participants were recorded for 30 minutes each week for 16 months, resulting in 134 hours of speech. Of these, 34 hours of speech samples have been orthographically transcribed.
Dec 3, 2025
Reinsone, Sanita; Matulis, Haralds; Ļaksa-Timinska, Ilze; Žvarte, Elvīra, 2025, "Corpus of Latvian Autobiographies", https://hdl.handle.net/20.500.12574/145, Institute of Literature, Folklore and Art of the University of Latvia
The corpus consists of 74 unpublished autobiographies, life stories, and memoirs in Latvian, written between 1900 and 2024. All materials have been collected, digitised, and are preserved in the Autobiography Collection of the Archives of Latvian Folklore, Institute of Literature, Folklore and Art, University of Latvia. The corpus has been created...
Nov 27, 2025
Kļavinska, Antra; Martena, Sanita; Nau, Nicole; Šuplinska, Ilga; Anna, Briška, 2025, "Latgalian Tezaurs 2026 (Winter Edition)", https://hdl.handle.net/20.500.12574/144, AiLab IMCS UL
Latgalian Tezaurs (LTG T) is a lexical database and online dictionary of Latgalian (ISO 639-3 ltg). This version contains more than 750 entries, including many idioms and other multi-word units. Entries include spelling variants and dialect forms and name the sources where the lexical unit has been documented. Audio recordings illustrate pronunciat...
Nov 27, 2025
Kļavinska, Antra; Martena, Sanita; Nau, Nicole; Šuplinska, Ilga; Anna, Briška, 2024, "Latgalian Tezaurs 2025 (Winter Edition)", https://hdl.handle.net/20.500.12574/116, Rēzekne Academy of Technologies
Latgalian Tezaurs (LTG T) is a lexical database and online dictionary of Latgalian (ISO 639-3 ltg). The pilot version of December 2024 contains more than 450 entries, including many idioms and other multi-word units. Entries include spelling variants and dialect forms and name the sources where the lexical unit has been documented. Audio recordings...
Nov 26, 2025
Pretkalniņa, Lauma; Nešpore-Bērzkalne, Gunta; Pokratniece, Kristīne; Rituma, Laura, 2025, "Latvian and Latgalian Parallel Sample Treebank (Cairo)", https://hdl.handle.net/20.500.12574/143, AiLab IMCS UL
This corpus contains 20 Latvian and Latgalian sample sentences annotated in the same hybrid annotation model used in the Latvian Treebank. The sentences in this corpus are the same ones used in the "Cairo" sample corpora that showcase annotation choices for Universal Dependencies treebanks, and this corpus serves as a basis for both UD-Latvia...
Nov 25, 2025
Rituma, Laura; Pretkalniņa, Lauma; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Grūzītis, Normunds; Znotiņš, Artūrs, 2025, "LVTB - Latvian Treebank v2.17", https://hdl.handle.net/20.500.12574/142, AiLab IMCS UL
Latvian Treebank (LVTB) has been under development since 2010. It is manually annotated according to a hybrid dependency-constituency grammar model. This version of LVTB contains data used for deriving the corresponding version of Latvian UD Treebank (UDLV-LVTB).
Nov 25, 2025
Rituma, Laura; Pretkalniņa, Lauma; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Grūzītis, Normunds; Znotiņš, Artūrs, 2025, "LVTB - Latvian Treebank v2.16 (2025-05-15)", https://hdl.handle.net/20.500.12574/129, AiLab IMCS UL
Latvian Treebank (LVTB) has been under development since 2010. It is manually annotated according to a hybrid dependency-constituency grammar model. This version of LVTB contains data used for deriving the corresponding version of Latvian UD Treebank (UDLV-LVTB).
Nov 20, 2025
Andronova, Everita; Spektors, Andrejs; Vanags, Pēteris; Baltiņa, Maija; Trumpa, Anta; Trumpa, Edmunds; Grūzītis, Normunds; Siliņa-Piņķe, Renāte; Frīdenberga, Anna; Skrūzmane, Elga; Ķauķīte, Sintija; Pretkalniņa, Lauma, 2022, "The Corpus of Early Written Latvian (2022)", https://hdl.handle.net/20.500.12574/90, AiLab IMCS UL
The Corpus of early written Latvian ‘SENIE’ provides access to the texts of written Latvian of the 16th–18th century, and its aim is to facilitate studies of early Latvian in general (e.g. the lexis, morphology and syntax of the texts) and to serve as the basis for "The Historical dictionary of Latvian (16th–17th cc.)". The Corpus was first launche...
