flowchart LR
    ACTOR[Actor]
    BELIEF[Policy belief]
    ACTOR -->|Stance| BELIEF
sustainability.discourses: conceptual foundations and technical documentation
An automated content analysis approach to tracking sustainability discourse in media data over time
🚧 A preprint documenting our analysis with more technical details and including model metrics is under preparation 🚧
Introduction
Sustainability is often the subject of societal debate and discourse. Actors involved in urban sustainability governance, for example, are sometimes compelled and sometimes motivated to take public stances on various topics. Within the field of sustainable urban transport, discourse topics may range from support for certain modes of transport over others to reactions to emerging technologies, such as vehicle electrification or mobility platforms.
When actors take stances on salient points of discussion within topics, they create discourse networks over time (Leifeld 2020). Here, we use discourse networks as a heuristic for understanding societal discourse as the interplay between actors, policy issues, and the stances actors take toward those issues and, by extension, toward each other.
Discourse networks are an essential component of governance and a complement to material policymaking (such as the policies actually passed, implemented, not implemented, or evaluated). Discourse establishes different narratives (Schlaufer et al. 2022), shapes which policy options are on the agenda, and can also influence subtle practices of implementation.
The study of discourse networks has until now mostly relied on high-quality annotations of newspaper articles. Here, we illustrate the technical details of a complementary approach relying on automated text analysis of media data to track and analyze discourse networks. We introduce the conceptual foundations and document the technical details of an exemplary analysis pipeline as developed in a project analyzing the discourse around sustainable urban transport in the city of Zürich.
The results of the analysis pipeline can be explored in an interactive web application, which is regularly updated with recent data and improved models.
The components of discourse
Our model, which builds on the discourse network heuristic, distinguishes three essential components of discourse:
- Actors: The agents in the discourse
- Policy beliefs: Key beliefs about the fundamental policy option in a given discourse topic, toward which actors can take a stance
- Stances: A qualified link between organizations and beliefs. Possible stances could for example be Opposition, Support or Neutral
A concrete example, built around a core policy belief in the sustainable urban transport discourse, could look as follows:
flowchart TB
    UMVERKEHR[Organisation: \n umverkehR ]
    FDP[Organisation: \n FDP Zürich ]
    BELIEF[Policy belief: \n German: Der motorisierte Individualverkehr in der Stadt sollte reduziert werden \n English: Motorized individual vehicular traffic in the city should be reduced]
    UMVERKEHR -->|Stance: Support| BELIEF
    FDP -->|Stance: Opposition| BELIEF
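The three components map naturally onto a small data model. The following Python sketch is an illustration, not the project's actual schema; the names `Stance`, `StanceEdge` and `actor_relations` are hypothetical. It encodes the example above as stance edges and derives actor-to-actor relations, counting two actors as agreeing on a belief when they hold the same non-neutral stance and as conflicting when one supports and the other opposes it, loosely mirroring the congruence and conflict networks used in discourse network analysis.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations


class Stance(Enum):
    SUPPORT = "Support"
    OPPOSITION = "Opposition"
    NEUTRAL = "Neutral"


@dataclass(frozen=True)
class StanceEdge:
    """A qualified link between an actor and a policy belief (hypothetical schema)."""
    actor: str
    belief: str
    stance: Stance


def actor_relations(edges):
    """Derive actor-actor relations from stances on shared beliefs.

    Returns a set of (actor_a, actor_b, belief, relation) tuples with actor names
    sorted alphabetically. Identical non-neutral stances yield "agreement",
    support vs. opposition yields "conflict"; neutral stances yield no tie.
    """
    by_belief = {}
    for edge in edges:
        by_belief.setdefault(edge.belief, []).append(edge)

    relations = set()
    for belief, belief_edges in by_belief.items():
        for a, b in combinations(belief_edges, 2):
            if a.actor == b.actor or Stance.NEUTRAL in (a.stance, b.stance):
                continue
            relation = "agreement" if a.stance == b.stance else "conflict"
            first, second = sorted((a.actor, b.actor))
            relations.add((first, second, belief, relation))
    return relations


BELIEF = "Motorized individual vehicular traffic in the city should be reduced"
edges = [
    StanceEdge("umverkehR", BELIEF, Stance.SUPPORT),
    StanceEdge("FDP Zürich", BELIEF, Stance.OPPOSITION),
]
# A single "conflict" tie between FDP Zürich and umverkehR.
print(actor_relations(edges))
```

Aggregating such ties over many paragraphs and time windows is one way the actor-to-actor side of a discourse network could be built up.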
Automated analysis
Data (currently)
Currently, our analysis relies on a media data corpus provided by Swissdox@LiRi, covering the period from 2010 up to the most recent available data.
Analysis pipeline
The diagram below shows an abstracted view of our analysis pipeline. Blue components describe extracted data, while black descriptions on arrows describe modeling steps (algorithmic or manual), which are described in more detail below.
flowchart TD
    article
    article_zh[Zürich related article]
    paragraph
    SDG-target[assigned SDG targets]
    topic[assigned discourse topics and policy beliefs]
    organisation[detected organisations]
    stance[stances of organizations for policy beliefs]
    qual[content analysis]
    org-belief[Organization - Belief pairs]
    article -->|geolocate, filter| article_zh
    article_zh -->|split| paragraph
    paragraph -->|text classification, relevance filter| SDG-target
    SDG-target -->|text classification| topic
    paragraph -->|NER + entity linking| organisation
    organisation --> org-belief
    qual -->|main policy belief per topic| topic
    topic --> org-belief
    org-belief -->|stance detection| stance
    classDef node fill:#fefefe,stroke:#fefefe;
    linkStyle default color:#000000
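To make the data flow of the diagram concrete, the following Python sketch wires the steps together, with each modeling step passed in as a plain function. All names and signatures are hypothetical placeholders, not the project's code, and the toy inputs are German-language strings. One deliberate simplification: here NER only runs on paragraphs that pass the relevance filter, since an organisation-belief pair needs a topic anyway, whereas the diagram applies NER to every paragraph.

```python
from typing import Callable, Iterable


def run_pipeline(
    articles: Iterable[str],
    is_zurich_related: Callable[[str], bool],
    split_paragraphs: Callable[[str], list[str]],
    classify_sdg_targets: Callable[[str], set[str]],
    classify_topics: Callable[[str, set[str]], set[str]],
    detect_organisations: Callable[[str], set[str]],
    core_belief: dict[str, str],
    detect_stance: Callable[[str, str, str], str],
) -> list[tuple[str, str, str]]:
    """Return (organisation, policy belief, stance) triples for relevant paragraphs."""
    results = []
    for article in articles:
        if not is_zurich_related(article):  # geolocate, filter
            continue
        for paragraph in split_paragraphs(article):  # split
            targets = classify_sdg_targets(paragraph)  # relevance filter
            if not targets:
                continue
            topics = classify_topics(paragraph, targets)  # topic classification
            organisations = detect_organisations(paragraph)  # NER + entity linking
            # Pairs are formed only when actor and topic co-occur in the paragraph.
            for topic in sorted(topics):
                belief = core_belief[topic]  # main policy belief per topic
                for org in sorted(organisations):
                    stance = detect_stance(paragraph, org, belief)
                    results.append((org, belief, stance))
    return results


# Toy stand-ins for each step, for illustration only.
triples = run_pipeline(
    articles=["Zürich: Die FDP lehnt weniger Autoverkehr ab.\n\nDas Wetter bleibt sonnig."],
    is_zurich_related=lambda a: "Zürich" in a,
    split_paragraphs=lambda a: a.split("\n\n"),
    classify_sdg_targets=lambda p: {"11.2"} if "Auto" in p else set(),
    classify_topics=lambda p, targets: {"car traffic"},
    detect_organisations=lambda p: {"FDP Zürich"} if "FDP" in p else set(),
    core_belief={"car traffic": "Motorized individual traffic should be reduced"},
    detect_stance=lambda p, org, belief: "Opposition",
)
print(triples)
```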
Locating articles
Our pipeline starts by identifying Zürich-related articles. An article is judged Zürich-related if it either appears in a newspaper-specific local Zürich news section or matches one of a list of regular expressions containing Zürich place names.
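A minimal sketch of this filter in Python, assuming each article comes with a section label and its text. The section names and place-name patterns below are illustrative placeholders, not the project's actual lists.

```python
import re

# Hypothetical examples of newspaper-specific local Zürich sections.
LOCAL_SECTIONS = {"Zürich", "Stadt Zürich", "Region Zürich"}

# Hypothetical place-name patterns; the project's real list is not reproduced here.
PLACE_PATTERNS = [
    re.compile(r"\bZürich\b"),
    re.compile(r"\bZuerich\b", re.IGNORECASE),
    re.compile(r"\b(Oerlikon|Altstetten|Wiedikon|Schwamendingen)\b"),
]


def is_zurich_related(section: str, text: str) -> bool:
    """Keep an article if it is in a local section OR mentions a listed place name."""
    if section in LOCAL_SECTIONS:
        return True
    return any(pattern.search(text) for pattern in PLACE_PATTERNS)


print(is_zurich_related("Wirtschaft", "Neue Tramlinie in Oerlikon eröffnet"))  # True
print(is_zurich_related("Sport", "Der FC Basel gewinnt"))  # False
```

The section check is cheap and precise, so it runs first; the regular expressions catch Zürich coverage that appears in national or thematic sections.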
Splitting articles into paragraphs
The analysis pipeline uses paragraphs as its core textual units. This choice reflects a trade-off between classification accuracy and the amount of article context available to each classification. Accordingly, all articles are first split into paragraphs based on their XML paragraph tags.
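A sketch of paragraph splitting with Python's standard-library XML parser. The assumption that paragraphs are marked with `<p>` elements is ours; the tag names in the actual Swissdox@LiRi export may differ, which is why the tag is a parameter.

```python
import xml.etree.ElementTree as ET


def split_paragraphs(article_xml: str, tag: str = "p") -> list[str]:
    """Return the whitespace-normalized text of every <tag> element, in document order."""
    root = ET.fromstring(article_xml)
    paragraphs = []
    for element in root.iter(tag):
        # itertext() also collects text nested in inline markup such as <b>.
        text = " ".join("".join(element.itertext()).split())
        if text:  # skip empty paragraphs
            paragraphs.append(text)
    return paragraphs


article = """<article>
  <p>Die Stadt plant neue Velowege.</p>
  <p>Die <b>FDP</b> kritisiert den Abbau von Parkplätzen.</p>
  <p>   </p>
</article>"""
print(split_paragraphs(article))
```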
Text classification for SDG targets
In an initial step, a relevance classifier (based on a statistical model trained on annotations and codebooks developed in the project) is used to detect whether one or more relevant SDG targets occur in a paragraph.
Currently implemented is:
- 🚲 SDG target 11.2, sustainable transport systems.
We implement new targets step by step. Currently in progress is:
- 🌆 SDG target 11.1, affordable housing
Planned afterwards is:
- 🌇 Climate adaptation and mitigation
- 🌳 Urban green areas
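The relevance filter can be read as multi-label assignment: each SDG target is scored independently, a paragraph keeps every target whose score reaches a threshold, and it is dropped from further analysis if none does. The Python sketch below assumes the trained classifier returns a probability per target; `toy_score` is a keyword-count stand-in for illustration only, not the project's model, and the default threshold of 0.5 is arbitrary.

```python
from typing import Callable, Mapping

# Targets implemented or in progress, as listed above.
SDG_TARGETS = ("11.2", "11.1")


def assign_sdg_targets(
    paragraph: str,
    score: Callable[[str, str], float],
    thresholds: Mapping[str, float],
) -> set[str]:
    """Keep every target whose score reaches its threshold (default 0.5).

    `score(paragraph, target)` stands in for a trained relevance classifier.
    An empty result means the paragraph is filtered out.
    """
    return {
        target
        for target in SDG_TARGETS
        if score(paragraph, target) >= thresholds.get(target, 0.5)
    }


def toy_score(paragraph: str, target: str) -> float:
    """Toy stand-in scorer (NOT a real model): fraction of two cue hits."""
    cues = {"11.2": ("Tram", "Velo", "Verkehr"), "11.1": ("Miete", "Wohnung")}
    hits = sum(cue in paragraph for cue in cues[target])
    return min(1.0, hits / 2)


print(assign_sdg_targets("Mehr Velowege statt Verkehr", toy_score, {}))  # {'11.2'}
print(assign_sdg_targets("Das Wetter bleibt sonnig", toy_score, {}))  # set()
```

Per-target thresholds allow each target's precision-recall balance to be tuned separately as new targets are added.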
Text classification for discourse topics
Depending on which SDG targets occur in a paragraph, discourse topics related to those targets are assigned to the paragraph using a rule-based classifier.
The full set of topics per SDG target is extracted using close reading of paragraphs and qualitative content analysis.
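A rule-based topic classifier of this kind can be sketched as a set of regular-expression rules, each scoped to one SDG target, so that a topic can only be assigned when its target was detected in the paragraph. The topic labels and patterns below are hypothetical; the project's actual topics come from the qualitative analysis described above.

```python
import re

# Hypothetical rules: SDG target -> {topic: pattern}. Not the project's real codebook.
TOPIC_RULES = {
    "11.2": {
        "cycling": re.compile(r"\bvelo\w*", re.IGNORECASE),
        "parking": re.compile(r"\bparkpl(a|ä)tz\w*", re.IGNORECASE),
        "car traffic": re.compile(r"\b(auto|motorisiert)\w*", re.IGNORECASE),
    },
    "11.1": {
        "rents": re.compile(r"\bmiete\w*", re.IGNORECASE),
    },
}


def assign_topics(paragraph: str, sdg_targets: set[str]) -> set[str]:
    """Return all topics whose rule matches, restricted to the assigned SDG targets."""
    topics = set()
    for target in sdg_targets:
        for topic, pattern in TOPIC_RULES.get(target, {}).items():
            if pattern.search(paragraph):
                topics.add(topic)
    return topics


text = "Die FDP kritisiert den Abbau von Parkplätzen zugunsten von Velowegen."
print(sorted(assign_topics(text, {"11.2"})))  # ['cycling', 'parking']
print(assign_topics(text, {"11.1"}))  # set()
```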
Assignment of core policy beliefs per topic
Every topic in the discourse is assigned a core policy belief based on close reading of the corpus. Core policy beliefs are “watershed” statements that capture the core issue at stake in a topic and on which any actor participating intentionally in the discourse must be able to take a stance.
The main policy belief per topic is also extracted during the same process of close reading of paragraphs and qualitative content analysis used to establish the set of topics.
NER + entity linking
From every paragraph, organisational actors are extracted using a pre-trained named entity recognition (NER) model. The extracted entities are then linked, based on pattern matching, to an existing, curated and regularly updated registry of relevant actors in the discourse.
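The linking step can be sketched as follows, assuming the NER model has already returned raw entity strings. The registry entries and alias patterns are illustrative; the project's registry and NER model are not reproduced here. Whether a bare mention such as "FDP" should resolve to the city party is exactly the kind of decision a curated registry encodes.

```python
import re

# Hypothetical registry: canonical actor name -> alias patterns.
REGISTRY = {
    "umverkehR": [r"umverkehr"],
    "FDP Zürich": [r"FDP(\s+(Stadt\s+)?Zürich)?"],
}

COMPILED = {
    actor: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for actor, patterns in REGISTRY.items()
}


def link_entities(mentions: list[str]) -> set[str]:
    """Map raw NER mentions to canonical registry actors; unmatched mentions are dropped."""
    linked = set()
    for mention in mentions:
        normalized = " ".join(mention.split())  # collapse stray whitespace
        for actor, patterns in COMPILED.items():
            if any(pattern.fullmatch(normalized) for pattern in patterns):
                linked.add(actor)
                break
    return linked


print(sorted(link_entities(["FDP Stadt Zürich", "Umverkehr", "Stadtrat"])))
# ['FDP Zürich', 'umverkehR']
```

Using `fullmatch` rather than `search` keeps a long mention that merely contains an alias from being linked by accident.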
Stance detection
Stance detection combines extracted actors with extracted topics: whenever an actor and a topic co-occur in a paragraph, the goal is to classify the actor's stance toward the topic's core policy belief. This feature was developed as part of an MA thesis by Viviane Walker on using Large Language Models for stance detection and is implemented in the Python package stance-llm.
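The sketch below illustrates the general shape of LLM-based stance detection for one actor-belief pair: build a prompt from the paragraph, the actor and the belief, send it to a model, and map the free-text reply onto a fixed label set. It does not use or reproduce the stance-llm API; the prompt wording, the `llm` callable and the fallback to Neutral are assumptions made for illustration.

```python
import re
from typing import Callable

LABELS = ("Support", "Opposition", "Neutral")

# Hypothetical prompt template, not the one used in stance-llm.
PROMPT = (
    "Text: {paragraph}\n\n"
    "What is the stance of the actor '{actor}' toward the statement "
    "'{belief}'? Answer with exactly one word: Support, Opposition or Neutral."
)


def detect_stance(paragraph: str, actor: str, belief: str, llm: Callable[[str], str]) -> str:
    """Classify one actor-belief pair; `llm` is any function mapping a prompt to a reply."""
    reply = llm(PROMPT.format(paragraph=paragraph, actor=actor, belief=belief))
    # Return the first word in the reply that is a valid label; otherwise Neutral.
    for word in re.findall(r"[A-Za-z]+", reply):
        for label in LABELS:
            if word.lower() == label.lower():
                return label
    return "Neutral"


# Toy stand-in for a real model call, for illustration only.
def fake_llm(prompt: str) -> str:
    return "Opposition."


print(detect_stance(
    "Die FDP kritisiert den Abbau von Parkplätzen.",
    "FDP Zürich",
    "Motorized individual vehicular traffic in the city should be reduced",
    fake_llm,
))  # Opposition
```

Constraining the answer to a closed label set and parsing it defensively keeps malformed model output from leaking into the discourse network.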
We will soon publish a preprint on the lessons learned while developing the package. Key results were presented at the ECPR General Conference in Dublin, 2024.
References
Citation
@online{angst2023,
author = {Angst, Mario},
title = {Sustainability.discourses: Conceptual Foundations and
Technical Documentation},
date = {2023-08-10},
langid = {en}
}