Stance detection with LLMs

Promises and pitfalls of using large language models to identify actor stances in political discourse

Viviane Walker, Mario Angst, Gerold Schneider

University of Zürich

Should you use LLMs for stance detection?

The challenge

General task

In a given text, identify

  • the stance (support, opposition, irrelevant)
  • of any named entity
  • toward any given statement
  • based on the text (see the minimal example below).
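As a minimal illustration: the field names below mirror the task format used in the Software section; the example text itself is hypothetical.

# One hypothetical labeled instance: given a text, a named entity and a statement,
# the label is the entity's stance toward the statement.
example = {
    "text": "The mayor announced that the city will double its budget for bicycle lanes.",
    "ent_text": "The mayor",                                    # named entity
    "statement": "The bicycle as a form of mobility should be promoted.",
    "stance": "support",                                        # support / opposition / irrelevant
}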

Rationale

  • identification of actor stances is highly relevant to social science research
  • this type of stance detection task remains a challenge
  • very general task -> LLMs are very general models

Research Question

Can zero-shot stance classification with LLMs achieve adequate performance in an applied, empirical social science research setting?

State of the Art

  • zero-shot stance detection: an LLM can outperform a fine-tuned BERT (Zhang, Ding, and Jing 2022)
  • prompt engineering for zero-shot stance detection (Liu et al. 2023)
  • little research on real-world task examples
  • most of the world does not speak English

Methods

Prompt chain example: s2

Prompt chain example: is2

Prompt chain example: nise
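The chains referenced above (s2, is2, nise) differ in their steps. As a rough sketch of the general idea behind a two-step chain with constrained generation via guidance (a hypothetical illustration, not the actual chains shipped with stance-llm):

# Hypothetical two-step prompt chain (illustration of the general idea only,
# not the actual s2/is2/nise chains).
# Step 1 filters out irrelevant texts; step 2 classifies support vs. opposition.
# Constrained generation via guidance's select() keeps outputs machine-readable.
from guidance import models, select

lm = models.Transformers("DiscoResearch/DiscoLM_German_7b_v1")

def classify_stance(lm, text, entity, statement):
    # Step 1: does the text express a position of the entity toward the statement?
    step1 = lm + (
        f"Text: {text}\n"
        f"Does the text express a position of {entity} toward the statement "
        f"'{statement}'? Answer: "
    ) + select(["yes", "no"], name="relevant")
    if step1["relevant"] == "no":
        return "irrelevant"
    # Step 2: classify the expressed position.
    step2 = step1 + (
        f"\nWhat is the position of {entity} toward the statement? Answer: "
    ) + select(["support", "opposition"], name="stance")
    return step2["stance"]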

Evaluation data

  • source: Swissdox@LiRI
  • sample size: 1,710 paragraphs
  • characteristics: manually annotated German newspaper article paragraphs
  • domain: urban sustainable transport discourse
  • five normative statements:
    • Air travel should be reduced.
    • The number of parking spaces for motorized individual traffic in the city should be reduced.
    • The bicycle as a form of mobility should be promoted.
    • E-mobility in the form of e-cars, e-buses, e-scooters and e-bikes should be promoted.
    • Driving speed in the city should be reduced to mitigate emissions.

Evaluation setup

Four distinct LLMs fine-tuned on German text:

  • Llamasauerkraut70b
  • Llamasauerkraut8b
  • Llama3kafka8b
  • Llama3discoleo8b

Results

Model                Prompt chain          Precision   Recall   F1 (macro avg)   F1 (weighted)
Llamasauerkraut70b   sis (not masked)      0.67        0.72     0.69             0.74
Llamasauerkraut8b    is2 (not masked)      0.64        0.55     0.55             0.66
Llama3kafka8b        sis (masked entity)   0.58        0.62     0.59             0.66
Llama3discoleo8b     is2 (not masked)      0.60        0.49     0.50             0.63
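For reference, the F1 columns follow the standard multi-class averaging schemes; a minimal sketch of computing macro- and weighted-averaged scores with scikit-learn (the label lists below are hypothetical placeholders, not the evaluation data):

# Minimal sketch: macro- vs. weighted-averaged scores from gold and predicted labels.
# The label lists are hypothetical placeholders, not the evaluation data.
from sklearn.metrics import precision_recall_fscore_support

gold = ["support", "opposition", "irrelevant", "support", "irrelevant"]
pred = ["support", "irrelevant", "irrelevant", "support", "opposition"]

# Macro average: unweighted mean over the three classes.
prec, rec, f1_macro, _ = precision_recall_fscore_support(gold, pred, average="macro")
# Weighted average: per-class scores weighted by class frequency.
_, _, f1_weighted, _ = precision_recall_fscore_support(gold, pred, average="weighted")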

More results

Conclusion

So… Should you use LLMs for stance detection? Probably not.

  • slow inference
  • high-quality evaluation sets needed
  • probably not worth it for non-repeated measurements
  • special considerations to make (entity masking! see the sketch after this list)
  • data redundancy is helpful
  • leverage constrained generation
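Entity masking, as flagged above and in the results table, hides the actor's identity from the model so the stance is judged from the text alone rather than from prior knowledge about the actor. A minimal sketch (the helper function and placeholder token are illustrative, not the stance-llm implementation):

# Illustrative entity-masking helper (not the stance-llm implementation):
# replace the entity mention with a neutral placeholder before classification.
def mask_entity(text: str, ent_text: str, placeholder: str = "X") -> str:
    return text.replace(ent_text, placeholder)

masked = mask_entity(
    "Emily will LLMs in den Papageienzoo sperren und streng beaufsichtigen.",
    "Emily",
)
# masked == "X will LLMs in den Papageienzoo sperren und streng beaufsichtigen."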

Software

pip install stance-llm
pip install guidance 

from guidance import models                   # stance-llm relies on guidance
from stance_llm.process import detect_stance

# Choose an LLM accessible through guidance (Transformers, LlamaCpp, OpenAI)
disco7b = models.Transformers("DiscoResearch/DiscoLM_German_7b_v1")

# The task format: a text, an entity mention within it, and a statement.
# ("Emily wants to lock LLMs in the parrot zoo and supervise them strictly." /
#  "LLMs should be locked in the parrot zoo.")
my_eg = [
    {"text": "Emily will LLMs in den Papageienzoo sperren und streng beaufsichtigen.",
     "ent_text": "Emily",
     "statement": "LLMs sollten in den Papageienzoo gesperrt werden."}]

classification = detect_stance(
    eg=my_eg,
    llm=disco7b,
    chain_label="is",  # This is where you can choose a prompt chain
)

classification.stance

Test it on your own data

More documentation, additional features, and the possibility to contribute can be found here:

https://github.com/urban-sustainability-lab-zurich/stance-llm

Thanks

Additional members of the annotation team:

  • Neitah Müller
  • Myriam Pham-Truffert

Funding:

DIZH (Digitalisierungsinitiative der Zürcher Hochschulen)

Contact:

mario.angst@uzh.ch

References

Liu, Pengfei, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2023. “Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.” ACM Computing Surveys 55 (9): 1–35. https://doi.org/10.1145/3560815.
Zhang, Bowen, Daijun Ding, and Liwen Jing. 2022. “How Would Stance Detection Techniques Evolve After the Launch of ChatGPT?” arXiv preprint arXiv:2212.14548.