sustainability.discourses: conceptual foundations and technical documentation

An approach to tracking sustainability discourse over time using automated content analysis of media data

Author: Mario Angst

Affiliation: University of Zürich, Digital Society Initiative

Published: March 10, 2025

Note

An open access article documenting our analysis with more technical details and including model metrics is available at:

Angst M, Müller NN, Walker V. Automated extraction of discourse networks from large volumes of media data. Network Science. 2025;13:e4. doi:10.1017/nws.2025.4

Introduction

Sustainability is often the subject of societal debate and discourse. Actors involved in urban sustainability governance, for example, are sometimes forced to take a public stance on various topics and sometimes do so out of their own interest. Within the field of sustainable urban transport, discourse topics may range from support for certain modes of transport over others to reactions to emerging technologies, such as the electrification of vehicles or mobility platforms.

When actors take stances on salient discussion points within topics, they create discourse networks over time (Leifeld 2020). Here, we understand discourse networks as a heuristic that views societal discourse as composed of the interplay between actors, policy issues, and the stances actors take toward those issues and (by extension) vis-à-vis each other.

Discourse networks are an essential component of governance and a complement to material policymaking (such as the actual policies passed, implemented, not implemented or evaluated). Discourse establishes different narratives (Schlaufer et al. 2022), shapes which policy options are on agendas, and also influences, for example, subtle practices of implementation.

The study of discourse networks has until now mostly relied on high-quality annotations of newspaper articles. Here, we present a complementary approach that relies on automated text analysis of media data to track and analyze discourse networks. We introduce its conceptual foundations and document the technical details of an exemplary analysis pipeline, developed in a project analyzing the discourse around sustainable urban transport in the city of Zürich.

The results of the analysis pipeline can be explored in an interactive web application, which is regularly updated with recent data and improved models.

The components of discourse

Our model, which is based on the discourse network heuristic, distinguishes three essential components of discourse:

Actors: The agents in the discourse
Policy beliefs: Key beliefs around the fundamental policy options in a given discourse topic, toward which actors can take a stance
Stances: A qualified link between actors and beliefs. Possible stances could, for example, be Opposition, Support or Neutral

flowchart LR
  ACTOR[Actor]
  BELIEF[Policy belief]
  ACTOR -->|Stance| BELIEF

A concrete example, built around a core policy belief in the sustainable urban transport discourse, could look as follows.

flowchart TB
  UMVERKEHR[Organisation: \n umverkehR ]
  FDP[Organisation: \n FDP Zürich ]
  BELIEF[Policy belief: \n German: Der motorisierte Individualverkehr in der Stadt sollte reduziert werden \n English: Motorized individual vehicular traffic in the city should be reduced]
  UMVERKEHR -->|Stance: Support| BELIEF
  FDP -->|Stance: Opposition| BELIEF
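
The example above can also be written down as a small data structure. The following is a minimal sketch, not the project's actual data model: the names `Stance`, `StanceEdge`, `actors_with_stance` and `opposing_pairs` are hypothetical and chosen for illustration only. It stores each actor-belief link as an edge labeled with a stance and derives, by extension, which actors stand opposed to each other on a belief.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    """Possible stance labels, following the examples named in the text."""
    SUPPORT = "Support"
    OPPOSITION = "Opposition"
    NEUTRAL = "Neutral"


@dataclass(frozen=True)
class StanceEdge:
    """One edge of a discourse network: an actor's stance toward a policy belief."""
    actor: str
    belief: str
    stance: Stance


BELIEF = "Motorized individual vehicular traffic in the city should be reduced"

# The two edges from the diagram above.
edges = [
    StanceEdge("umverkehR", BELIEF, Stance.SUPPORT),
    StanceEdge("FDP Zürich", BELIEF, Stance.OPPOSITION),
]


def actors_with_stance(edges, belief, stance):
    """Return the actors holding a given stance toward a belief, sorted by name."""
    return sorted(e.actor for e in edges if e.belief == belief and e.stance == stance)


def opposing_pairs(edges, belief):
    """Return (supporter, opponent) pairs of actors that disagree on a belief."""
    supporters = actors_with_stance(edges, belief, Stance.SUPPORT)
    opponents = actors_with_stance(edges, belief, Stance.OPPOSITION)
    return [(s, o) for s in supporters for o in opponents]
```

Calling `opposing_pairs(edges, BELIEF)` on the two edges above yields the single pair of umverkehR and FDP Zürich, which illustrates how stances toward a shared belief translate into relations between actors.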

Automated analysis

Data (currently)

Currently, our analysis relies on a media data corpus provided by Swissdox@LiRi. We cover the timespan from 2010 up until the current date (inference is run daily).

Open source / Reproducibility

Code and trained or developed models (statistical and rule-based) to reproduce the classification API used in the sustainability.discourses project are available at https://doi.org/10.5281/zenodo.14702517.

Analysis pipeline

The diagram below shows an abstracted view of our analysis pipeline. Blue components represent extracted data, while black labels on arrows denote modeling steps (algorithmic or manual), each of which is described in more detail below.

flowchart TD
  article
  article_zh[Zürich related article]
  paragraph
  SDG-target[assigned SDG targets]
  topic[assigned discourse topics and policy beliefs]
  organisation[detected organisations]
  stance[stances of organizations for policy beliefs]
  qual[content analysis]
  org-belief[Organization - Belief pairs]
  article -->|geolocate, filter| article_zh
  article_zh -->|split| paragraph
  paragraph -->|text classification, relevance filter| SDG-target
  SDG-target -->|text classification| topic
  paragraph -->|NER + entity linking| organisation
  organisation --> org-belief
  qual -->|main policy belief per topic| topic
  topic --> org-belief
  org-belief -->|stance detection| stance
  classDef node fill:#fefefe,stroke:#fefefe;
  linkStyle default color:#000000

Locating articles

Our pipeline starts by identifying Zürich-related articles. For an article to be judged Zürich-related, it must either appear in a newspaper-specific local Zürich news section or match at least one of a list of regular expressions containing place names for Zürich.
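
The two-part rule described above (local section or place-name match) can be sketched as follows. This is an illustrative sketch under stated assumptions: the patterns in `PLACE_PATTERNS`, the section names in `LOCAL_SECTIONS` and the function name `is_zurich_related` are hypothetical and do not reproduce the project's actual lists.

```python
import re
import unicodedata

# Hypothetical place-name patterns; the project's real list is not reproduced here.
PLACE_PATTERNS = [
    re.compile(r"\bZ(?:ü|ue)rich\b"),
    re.compile(r"\bZ(?:ü|ue)rcher\w*"),
    re.compile(r"\bOerlikon\b"),
]

# Hypothetical names of newspaper sections dedicated to local Zürich news.
LOCAL_SECTIONS = {"Zürich", "Region Zürich"}


def is_zurich_related(text, section=None):
    """Return True if the article is in a local section or mentions a place name."""
    if section in LOCAL_SECTIONS:
        return True
    # Normalize so that "ü" is a single code point regardless of the source encoding.
    normalized = unicodedata.normalize("NFC", text)
    return any(pattern.search(normalized) for pattern in PLACE_PATTERNS)
```

Checking the section first avoids running the regular expressions on articles that already qualify through their placement in the newspaper.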

Splitting articles into paragraphs

The analysis pipeline works with paragraphs as the core textual unit of analysis. This choice reflects a trade-off between classification accuracy and the integration of article context. As such, all articles are initially split into paragraphs for processing, based on XML paragraph tags.
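
Splitting on XML paragraph tags could look roughly like the sketch below. It assumes that each article arrives as a well-formed XML string with a single root element and that paragraphs are marked with a `p` tag. Both are assumptions about the input format rather than documented properties of the corpus, and the function name `split_into_paragraphs` is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_into_paragraphs(article_xml, tag="p"):
    """Return the non-empty text of every paragraph element in an article."""
    root = ET.fromstring(article_xml)
    paragraphs = []
    for element in root.iter(tag):
        # Join text from nested inline elements and collapse runs of whitespace.
        text = " ".join("".join(element.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs


example = (
    "<article>"
    "<p>Erster Absatz.</p>"
    "<p>Zweiter <b>Absatz</b>.</p>"
    "<p>  </p>"
    "</article>"
)
paragraphs = split_into_paragraphs(example)
```

Here `paragraphs` contains the two non-empty paragraphs, with inline markup removed and the whitespace-only paragraph skipped.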

Text classification for SDG targets

In an initial step, a relevance classifier (based on a statistical model trained on annotations and codebooks developed in the project) is used to detect whether one or more relevant SDG targets occur in a paragraph.

Currently implemented is:

🚲 SDG target 11.2, sustainable transport systems.

We implement new targets step by step. Currently in progress is:

🌆 SDG target 11.1, affordable housing

Planned afterwards is:

🌇 Climate adaptation and mitigation
🌳 Urban green areas

Text classification for discourse topics

Depending on the SDG target occurring in a paragraph, discourse topics related to that SDG target are assigned to the paragraph. This assignment is made by a rule-based classifier.

The full set of topics per SDG target is extracted using close reading of paragraphs and qualitative content analysis.
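
A rule-based topic assignment of the kind described in this section might be sketched as below. The topics and keyword patterns in `TOPIC_RULES` are invented for illustration and are not the topics or rules used in the project. The sketch only shows the general mechanism: rules are looked up per SDG target and applied to the paragraph text.

```python
import re

# Hypothetical topics and keyword rules, keyed by SDG target.
TOPIC_RULES = {
    "11.2": {
        "parking": [r"\bparkpl[aä]tz\w*", r"\bparkhaus\w*"],
        "cycling infrastructure": [r"\bvelo\w*", r"\bfahrrad\w*"],
        "speed limits": [r"\btempo\s*30\b"],
    },
}

# Compile all patterns once, case-insensitively.
COMPILED_RULES = {
    target: {
        topic: [re.compile(p, re.IGNORECASE) for p in patterns]
        for topic, patterns in topics.items()
    }
    for target, topics in TOPIC_RULES.items()
}


def assign_topics(paragraph, sdg_targets):
    """Return topics whose rules match, considering only the given SDG targets."""
    topics = []
    for target in sdg_targets:
        for topic, patterns in COMPILED_RULES.get(target, {}).items():
            if any(p.search(paragraph) for p in patterns):
                topics.append(topic)
    return topics
```

Because topics are only looked up for targets already assigned to the paragraph, a paragraph without a relevant SDG target never receives a topic, which mirrors the ordering of the two classification steps.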

Assignment of core policy beliefs per topic

Every topic in the discourse is assigned a core policy belief based on close reading of the corpus. Core policy beliefs are “watershed” statements that capture the core issue at stake in a topic, and on which any actor participating with intent in the discourse needs to be able to take a stance.

The main policy belief per topic is also extracted during the same process of close reading of paragraphs and qualitative content analysis used to establish the set of topics.

NER + entity linking

From every paragraph, organisational actors are extracted using a pre-trained named entity recognition (NER) model. Extracted entities are then linked, based on pattern matching, to an existing, curated and regularly updated registry of relevant actors in the discourse.
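
The linking step can be illustrated with a small sketch. It assumes that the NER model has already produced a list of organisation mentions as plain strings; the NER call itself is not shown. The registry entries and patterns in `ACTOR_REGISTRY` and the function name `link_entity` are hypothetical and stand in for the curated registry mentioned above.

```python
import re
import unicodedata

# Hypothetical registry: canonical actor name -> patterns for known surface forms.
ACTOR_REGISTRY = {
    "umverkehR": [r"umverkehr"],
    "FDP Zürich": [r"fdp(\s+stadt)?\s+zürich", r"stadtzürcher\s+fdp"],
}

COMPILED_REGISTRY = {
    actor: [re.compile(p, re.IGNORECASE) for p in patterns]
    for actor, patterns in ACTOR_REGISTRY.items()
}


def link_entity(mention):
    """Map a detected mention to a canonical actor name, or None if unknown."""
    # Normalize Unicode and collapse whitespace before matching.
    normalized = " ".join(unicodedata.normalize("NFC", mention).split())
    for actor, patterns in COMPILED_REGISTRY.items():
        if any(p.fullmatch(normalized) for p in patterns):
            return actor
    return None


mentions = ["Umverkehr", "FDP  Stadt Zürich", "SBB"]
linked = [link_entity(m) for m in mentions]
```

Mentions without a registry match return `None` and can be dropped, so only actors considered relevant to the discourse move on to later steps.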

Stance detection

Stance detection combines extracted actors with extracted topics. If an actor and a topic co-occur in a paragraph, the goal is to classify the stance of the actor toward the core policy belief of that topic. This feature was developed as part of an MA thesis by Viviane Walker on using Large Language Models for stance detection. It is implemented in the Python package stance-llm.
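
The co-occurrence logic that precedes stance classification can be sketched as follows. This sketch deliberately does not use or imitate the stance-llm API: the classifier is passed in as a generic callable, and `dummy_classifier` is a placeholder that always returns "Neutral". The function names `build_pairs` and `detect_stances` are hypothetical.

```python
from itertools import product


def build_pairs(actors, beliefs):
    """Return every (actor, belief) pair for actors and beliefs found in one paragraph."""
    return list(product(sorted(set(actors)), sorted(set(beliefs))))


def detect_stances(paragraph, actors, beliefs, classify):
    """Classify the stance for each co-occurring actor-belief pair.

    `classify` is any callable taking (paragraph, actor, belief) and
    returning a stance label.
    """
    return [
        {"actor": actor, "belief": belief, "stance": classify(paragraph, actor, belief)}
        for actor, belief in build_pairs(actors, beliefs)
    ]


def dummy_classifier(paragraph, actor, belief):
    """Placeholder standing in for a real stance model; always returns 'Neutral'."""
    return "Neutral"


belief = "Motorized individual vehicular traffic in the city should be reduced"
results = detect_stances(
    "Example paragraph mentioning both organisations.",
    ["umverkehR", "FDP Zürich", "umverkehR"],
    [belief],
    dummy_classifier,
)
```

Deduplicating actors and beliefs before pairing means each actor-belief combination is classified once per paragraph, even if an actor is mentioned several times.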

A preprint on the lessons we learned while developing the package is available at https://doi.org/10.31235/osf.io/5a3k8_v1. Key results are also contained in a presentation given at the ECPR General Conference in Dublin, 2024.

References

Leifeld, Philip. 2020. “Policy Debates and Discourse Network Analysis: A Research Agenda.” Politics and Governance 8 (June): 180. https://doi.org/10.17645/pag.v8i2.3249.

Schlaufer, Caroline, Johanna Kuenzler, Michael D Jones, and Elizabeth A Shanahan. 2022. “The Narrative Policy Framework: A Traveler’s Guide to Policy Stories.” Politische Vierteljahresschrift 63 (2): 249–73. https://doi.org/10.1007/s11615-022-00379-6.

Citation

BibTeX citation:

@online{angst2025,
  author = {Angst, Mario},
  title = {Sustainability.discourses: Conceptual Foundations and
    Technical Documentation},
  date = {2025-03-10},
  langid = {en}
}

For attribution, please cite this work as:

Angst, Mario. 2025. “Sustainability.discourses: Conceptual Foundations and Technical Documentation.” March 10, 2025.