Stance detection with LLMs

Promises and pitfalls of using large language models to identify actor stances in political discourse

Viviane Walker, Mario Angst, Gerold Schneider

University of Zürich

Should you use LLMs for stance detection?

The challenge

General task

In a given text, identify

  • the stance (support, opposition, irrelevant)
  • of any named entity
  • toward any given statement
  • based on the text (see the minimal example below).
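As a minimal illustration: the field names below mirror the task format used in the Software section; the example text itself is hypothetical.

# One hypothetical labeled instance: given a text, a named entity and a statement,
# the label is the entity's stance toward the statement.
example = {
    "text": "The mayor announced that the city will double its budget for bicycle lanes.",
    "ent_text": "The mayor",                                    # named entity
    "statement": "The bicycle as a form of mobility should be promoted.",
    "stance": "support",                                        # support / opposition / irrelevant
}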

Rationale

  • identification of actor stances is highly relevant to social science research
  • this type of stance detection task remains a challenge
  • very general task -> LLMs are very general models

Research Question

Can zero-shot stance classification with LLMs achieve adequate performance in an applied, empirical social science research setting?

State of the Art

  • zero-shot stance detection: an LLM can outperform a fine-tuned BERT (Zhang, Ding, and Jing 2022)
  • prompt engineering for zero-shot stance detection (Liu et al. 2023)
  • little research on real-world task examples
  • most of the world does not speak English

Methods

Prompt chain example: s2

Prompt chain example: is2

Prompt chain example: nise
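The chains referenced above (s2, is2, nise) differ in their steps. As a rough sketch of the general idea behind a two-step chain with constrained generation via guidance (a hypothetical illustration, not the actual chains shipped with stance-llm):

# Hypothetical two-step prompt chain (illustration of the general idea only,
# not the actual s2/is2/nise chains).
# Step 1 filters out irrelevant texts; step 2 classifies support vs. opposition.
# Constrained generation via guidance's select() keeps outputs machine-readable.
from guidance import models, select

lm = models.Transformers("DiscoResearch/DiscoLM_German_7b_v1")

def classify_stance(lm, text, entity, statement):
    # Step 1: does the text express a position of the entity toward the statement?
    step1 = lm + (
        f"Text: {text}\n"
        f"Does the text express a position of {entity} toward the statement "
        f"'{statement}'? Answer: "
    ) + select(["yes", "no"], name="relevant")
    if step1["relevant"] == "no":
        return "irrelevant"
    # Step 2: classify the expressed position.
    step2 = step1 + (
        f"\nWhat is the position of {entity} toward the statement? Answer: "
    ) + select(["support", "opposition"], name="stance")
    return step2["stance"]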

Evaluation data

  • source: Swissdox@LiRI
  • sample size: 1,710 paragraphs
  • characteristics: manually annotated German newspaper article paragraphs
  • domain: urban sustainable transport discourse
  • five normative statements:
    • Air travel should be reduced.
    • The number of parking spaces for motorized individual traffic in the city should be reduced.
    • The bicycle as a form of mobility should be promoted.
    • E-mobility in the form of e-cars, e-buses, e-scooters and e-bikes should be promoted.
    • Driving speed in the city should be reduced to mitigate emissions.

Evaluation setup

Four distinct LLMs fine-tuned on German text:

  • Llamasauerkraut70b
  • Llamasauerkraut8b
  • Llama3kafka8b
  • Llama3discoleo8b

Results

Model                Prompt chain          Precision   Recall   F1 (macro avg)   F1 (weighted)
Llamasauerkraut70b   sis (not masked)      0.67        0.72     0.69             0.74
Llamasauerkraut8b    is2 (not masked)      0.64        0.55     0.55             0.66
Llama3kafka8b        sis (masked entity)   0.58        0.62     0.59             0.66
Llama3discoleo8b     is2 (not masked)      0.60        0.49     0.50             0.63
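For reference, the F1 columns follow the standard multi-class averaging schemes; a minimal sketch of computing macro- and weighted-averaged scores with scikit-learn (the label lists below are hypothetical placeholders, not the evaluation data):

# Minimal sketch: macro- vs. weighted-averaged scores from gold and predicted labels.
# The label lists are hypothetical placeholders, not the evaluation data.
from sklearn.metrics import precision_recall_fscore_support

gold = ["support", "opposition", "irrelevant", "support", "irrelevant"]
pred = ["support", "irrelevant", "irrelevant", "support", "opposition"]

# Macro average: unweighted mean over the three classes.
prec, rec, f1_macro, _ = precision_recall_fscore_support(gold, pred, average="macro")
# Weighted average: per-class scores weighted by class frequency.
_, _, f1_weighted, _ = precision_recall_fscore_support(gold, pred, average="weighted")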

More results

Conclusion

So… Should you use LLMs for stance detection? Probably not.

  • slow inference
  • high-quality evaluation sets needed
  • probably not worth it for non-repeated measurements
  • special considerations to make (entity masking! see the sketch after this list)
  • data redundancy is helpful
  • leverage constrained generation
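Entity masking, as flagged above and in the results table, hides the actor's identity from the model so the stance is judged from the text alone rather than from prior knowledge about the actor. A minimal sketch (the helper function and placeholder token are illustrative, not the stance-llm implementation):

# Illustrative entity-masking helper (not the stance-llm implementation):
# replace the entity mention with a neutral placeholder before classification.
def mask_entity(text: str, ent_text: str, placeholder: str = "X") -> str:
    return text.replace(ent_text, placeholder)

masked = mask_entity(
    "Emily will LLMs in den Papageienzoo sperren und streng beaufsichtigen.",
    "Emily",
)
# masked == "X will LLMs in den Papageienzoo sperren und streng beaufsichtigen."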

Software

pip install stance-llm
pip install guidance 

from guidance import models                   # stance-llm relies on guidance
from stance_llm.process import detect_stance

# Choose an LLM accessible through guidance (Transformers, LlamaCpp, OpenAI)
disco7b = models.Transformers("DiscoResearch/DiscoLM_German_7b_v1")

# The task format: a text, an entity mention within it, and a statement.
# ("Emily wants to lock LLMs in the parrot zoo and supervise them strictly." /
#  "LLMs should be locked in the parrot zoo.")
my_eg = [
    {"text": "Emily will LLMs in den Papageienzoo sperren und streng beaufsichtigen.",
     "ent_text": "Emily",
     "statement": "LLMs sollten in den Papageienzoo gesperrt werden."}]

classification = detect_stance(
    eg=my_eg,
    llm=disco7b,
    chain_label="is",  # This is where you can choose a prompt chain
)

classification.stance

Test it on your own data

More documentation, additional features, and the possibility to contribute can be found here:

https://github.com/urban-sustainability-lab-zurich/stance-llm

Thanks

Additional members of the annotation team:

  • Neitah Müller
  • Myriam Pham-Truffert

Funding:

DIZH (Digitalisierungsinitiative der Zürcher Hochschulen)

Contact:

mario.angst@uzh.ch

References

Liu, Pengfei, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2023. “Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.” ACM Computing Surveys 55 (9): 1–35. https://doi.org/10.1145/3560815.
Zhang, Bowen, Daijun Ding, and Liwen Jing. 2022. “How Would Stance Detection Techniques Evolve After the Launch of ChatGPT?” arXiv preprint arXiv:2212.14548.