Documentation
Read more about the methodology, where the skills are generated from, and the research foundations of MediaGuard.
Methodology
MediaGuard is an LLM-based system for detecting rhetorical manipulation techniques in video transcripts and news text. It uses large language models (Mistral) to identify manipulation techniques, guided by skill definitions that encode technique taxonomies and recognition criteria.
Evaluation protocol
- Input: Article text converted to pseudo-transcript format (sentence-level segments with timestamps).
- Analysis: Mistral API (`mistral-small-latest`) with skills context; returns JSON alerts with `technique`, `quote`, `start`, `end`.
- Span-based metrics: a gold span matches a prediction if the predicted quote overlaps the gold span (character-level) and the predicted technique maps to the same gold technique.
- LLM-as-judge: Mistral receives text, gold spans, and predicted alerts, then returns a score 0–1 indicating how well predictions match gold.
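The pseudo-transcript conversion can be sketched as follows. The function name, the fixed seconds-per-sentence timing, and the regex-based sentence split are illustrative assumptions, not the actual pipeline code:

```python
import re

def to_pseudo_transcript(text, seconds_per_sentence=5.0):
    """Split article text into sentence-level segments with synthetic timestamps.

    seconds_per_sentence is an assumption; any monotonic timing scheme
    produces a valid pseudo-transcript.
    """
    # Naive split on terminal punctuation followed by whitespace
    # (a real pipeline may use a proper sentence tokenizer).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    segments = []
    for i, sent in enumerate(sentences):
        segments.append({
            "start": i * seconds_per_sentence,
            "end": (i + 1) * seconds_per_sentence,
            "text": sent,
        })
    return segments

segments = to_pseudo_transcript("Taxes will ruin us all. Experts agree.")
# → two segments covering 0.0–5.0 and 5.0–10.0
```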
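The span-based matching rule can be sketched as a small scorer. Only the overlap-plus-same-technique criterion comes from the protocol above; the tuple layout and the F1 aggregation details are assumptions:

```python
def spans_overlap(a_start, a_end, b_start, b_end):
    """Character-level overlap test between two half-open [start, end) spans."""
    return max(a_start, b_start) < min(a_end, b_end)

def span_f1(gold, predicted):
    """Compute span-based F1.

    gold / predicted: lists of (start, end, technique_slug) tuples.
    A gold span counts as matched if some prediction overlaps it
    and carries the same technique slug.
    """
    matched_gold, matched_pred = set(), set()
    for gi, (gs, ge, gt) in enumerate(gold):
        for pi, (ps, pe, pt) in enumerate(predicted):
            if pt == gt and spans_overlap(gs, ge, ps, pe):
                matched_gold.add(gi)
                matched_pred.add(pi)
    precision = len(matched_pred) / len(predicted) if predicted else 0.0
    recall = len(matched_gold) / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```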
Where skills are generated from
MediaGuard uses a multi-source skill generation approach. Skills are built from the following sources:
1. PRTA (ACL 2020)
Base definitions from the PRTA paper — A System to Support the Analysis of Propaganda Techniques in the News. Provides the Tanbih framework and core technique definitions.
2. SemEval-2020 Task 11
The SemEval-2020 Task 11 taxonomy defines 14 manipulation techniques (e.g., appeal-to-authority, loaded-language, appeal-to-fear-prejudice) with span identification. Our technique slugs align with this taxonomy.
3. PropaInsight (COLING 2025)
PropaInsight enriches each technique with:
- Appeals — Emotions or arousal evoked in readers (e.g., fear, credibility, belonging)
- Intent — Author motive (e.g., persuade, discredit, create urgency)
- Common confusions — Distinguishing cues (e.g., legitimate skepticism vs manufactured doubt)
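A hypothetical enriched record for one technique might look like the following. The field names mirror the bullets above; the exact schema and wording are assumptions, not the on-disk format:

```python
# Illustrative enriched entry for one technique (schema is an assumption).
loaded_language = {
    "slug": "loaded-language",
    # Base definition from PRTA / SemEval-2020 Task 11
    "definition": "Words or phrases with strong emotional connotations "
                  "used to influence the audience.",
    # PropaInsight enrichment
    "appeals": ["fear", "outrage"],                  # emotions evoked in readers
    "intent": "persuade through framing rather than argument",  # author motive
    "confusions": [
        "Strong but accurate wording vs. emotionally charged framing",
    ],
}
```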
4. Merged definitions
The skill generator combines:
- `data/technique-definitions-static.json` — PRTA + SemEval base definitions
- `data/semeval-export.json` — Curated technique examples from the 50-item evaluation set
- `data/propainsight-supplement.json` — Appeals, intent, and confusions per technique
Output: `output/technique-definitions.json` and `output/skills-dataset/{slug}/SKILL.md` for each of the 14 techniques.
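A minimal sketch of the merge step for one technique, assuming each source file decodes to a dict keyed by technique slug. The file names come from the list above; the key layout and merge logic are assumptions about the generator, not its actual code:

```python
def merge_technique(slug, base, examples, supplement):
    """Combine base definition, curated examples, and PropaInsight fields
    for one technique slug.

    base       -> parsed data/technique-definitions-static.json
    examples   -> parsed data/semeval-export.json
    supplement -> parsed data/propainsight-supplement.json
    (All assumed to be dicts keyed by slug.)
    """
    record = {"slug": slug}
    record.update(base.get(slug, {}))            # PRTA + SemEval definition
    record["examples"] = examples.get(slug, [])  # curated evaluation examples
    record.update(supplement.get(slug, {}))      # appeals, intent, confusions
    return record
```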
Benchmark results
MediaGuard was benchmarked on a 50-item evaluation set with gold-annotated spans.
| Agent | Samples | F1 | Judge score |
|---|---|---|---|
| skills-dataset (multi-source) | 50 | 64.2% | 70.0% |
PropaInsight enrichment improves over the single-source baseline (+6.4% F1, +8.2% judge).
14 techniques (SemEval taxonomy)
- appeal-to-authority
- appeal-to-fear-prejudice
- bandwagon-reductio-ad-hitlerum
- black-and-white-fallacy
- causal-oversimplification
- doubt
- exaggeration-minimisation
- flag-waving
- loaded-language
- name-calling-labeling
- repetition
- slogans
- thought-terminating-cliches
- whataboutism-straw-men-red-herring
Further reading
For the full paper-style conclusion, evaluation protocol, and per-technique performance, see the CONCLUSION.md in the repository.