NLP

This page lists some useful resources for students and researchers interested in text-as-data.

Corpora

Norwegian Colossal Corpus

Norwegian Parliamentary Debates Dataset 1945–2024

  with Jon Fiva and Henning Øien, Accepted, Nature Scientific Data (2024)

A dataset of all Norwegian parliamentary speeches from 1945 to 2024. We also include speaker and speech metadata (e.g., committee membership, district, minister, elected, deputy…). The dataset can be merged with Fiva and Smith (2022) for comprehensive background data on national-level politicians.

Norwegian NLP resources


Methods

Intro to Quanteda

Text Algorithms in Economics (Ash and Hansen, 2023)

Text as Data (Gentzkow, Kelly, and Taddy, 2019)

Multilanguage Word Embeddings for Social Scientists (Wirsching et al., 2024)


Other useful resources

Oslo-Bergen Tagger

Friends Don’t Let Friends Make Bad Graphs