NLP

This page lists some useful resources for students and researchers interested in text-as-data.

Corpus

Collection of multiple Norwegian corpora suitable for training large language models or conducting independent research. The collection contains government reports, Stortingsforhandlingene (parliamentary proceedings), evaluation reports, laws and NOUs, online newspapers, Wikipedia, and out-of-copyright books from the Norwegian National Library.

Norwegian Parliamentary Debates Dataset 1945–2024

with Jon Fiva and Henning Øien, Accepted, Nature Scientific Data (2024)

Dataset of all Norwegian parliamentary speeches from 1945 to 2024. We also include speaker and speech metadata (e.g., committee membership, district, minister, elected, deputy…). The data can be merged with Fiva and Smith (2022) for comprehensive background data on national-level politicians.
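As a rough illustration of what merging speech data with politician background data could look like, here is a minimal sketch in plain Python. The column names (`speech_id`, `politician_id`, `party`) and all values are made up for this example and are not taken from either dataset; the actual merge key should be looked up in the respective codebooks.

```python
# Toy rows standing in for the two datasets. All column names and values
# are hypothetical; the real files may use different identifiers.
speeches = [
    {"speech_id": 1, "politician_id": "p001", "year": 1950},
    {"speech_id": 2, "politician_id": "p002", "year": 1951},
    {"speech_id": 3, "politician_id": "p999", "year": 1952},  # no background record
]
politicians = [
    {"politician_id": "p001", "party": "Party A"},
    {"politician_id": "p002", "party": "Party B"},
]


def left_join(left, right, key):
    """Attach the matching right-hand row to every left-hand row.

    Assumes `key` is unique in `right` (one row per politician).
    Left-hand rows without a match keep None for the right-hand columns.
    """
    lookup = {row[key]: row for row in right}
    right_cols = {col for row in right for col in row if col != key}
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        extra = {col: match.get(col) for col in right_cols}
        merged.append({**row, **extra})
    return merged


merged = left_join(speeches, politicians, "politician_id")
for row in merged:
    print(row)
```

A left join keeps every speech, so speeches by speakers without a background record are not silently dropped. With real data, the same join is usually done in a data-frame library.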

Norwegian NLP resources