Documentation

Read more about the methodology, where the skills are generated from, and the research foundations of MediaGuard.

Methodology

MediaGuard is an LLM-based system for detecting rhetorical manipulation techniques in video transcripts and news text. It uses large language models (Mistral) to identify manipulation techniques, guided by skill definitions that encode technique taxonomies and recognition criteria.
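A minimal sketch of the response-handling side of this pipeline, assuming the model replies with a JSON array of alerts. The alert fields (technique, quote, start, end) and the model name come from this page; the prompt wording and the `parse_alerts` helper are illustrative assumptions, not the actual implementation:

```python
import json

# Fields each alert is expected to carry, per the evaluation protocol below.
REQUIRED_FIELDS = {"technique", "quote", "start", "end"}

def parse_alerts(raw_response):
    """Parse the model's JSON reply into a list of alert dicts,
    dropping any alert missing a required field."""
    alerts = json.loads(raw_response)
    return [a for a in alerts if REQUIRED_FIELDS <= set(a)]

# Sketch of the analysis call (mistralai SDK; exact prompt is an assumption):
# client = Mistral(api_key=...)
# resp = client.chat.complete(
#     model="mistral-small-latest",
#     messages=[{"role": "system", "content": skills_context},
#               {"role": "user", "content": transcript}],
# )
# alerts = parse_alerts(resp.choices[0].message.content)
```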

Evaluation protocol

  • Input: Article text converted to pseudo-transcript format (sentence-level segments with timestamps).
  • Analysis: Mistral API (mistral-small-latest) with skills context; returns JSON alerts with technique, quote, start, end.
  • Span-based metrics: A prediction matches a gold span if the predicted quote overlaps the gold span at the character level and the predicted technique maps to the same gold technique.
  • LLM-as-judge: Mistral receives text, gold spans, and predicted alerts, then returns a score 0–1 indicating how well predictions match gold.
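The span-based matching above can be sketched as follows. This is a minimal illustration assuming each alert carries character offsets and a technique slug; the greedy one-to-one matching and the helper names are assumptions, not the actual implementation:

```python
def spans_overlap(pred_start, pred_end, gold_start, gold_end):
    """Character-level overlap between a predicted quote and a gold span."""
    return pred_start < gold_end and gold_start < pred_end

def match_alerts(predictions, gold_spans):
    """Greedily match predictions to gold spans (overlap + same technique),
    then compute precision, recall, and F1 over the matches."""
    tp = 0
    matched = set()
    for p in predictions:
        for i, g in enumerate(gold_spans):
            if i in matched:
                continue
            if (spans_overlap(p["start"], p["end"], g["start"], g["end"])
                    and p["technique"] == g["technique"]):
                tp += 1
                matched.add(i)
                break
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```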

Where skills are generated from

MediaGuard uses a multi-source skill generation approach. Skills are built from the following sources:

1. PRTA (ACL 2020)

Base definitions from the PRTA paper — A System to Support the Analysis of Propaganda Techniques in the News. Provides the Tanbih framework and core technique definitions.

2. SemEval-2020 Task 11

The SemEval-2020 Task 11 taxonomy defines 14 manipulation techniques (e.g., appeal-to-authority, loaded-language, appeal-to-fear-prejudice) with span identification. Our technique slugs align with this taxonomy.

3. PropaInsight (COLING 2025)

PropaInsight enriches each technique with:

  • Appeals — Emotions or reactions the text evokes in readers (e.g., fear, credibility, belonging)
  • Intent — Author motive (e.g., persuade, discredit, create urgency)
  • Common confusions — Distinguishing cues (e.g., legitimate skepticism vs manufactured doubt)
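As an illustration, one enrichment entry might look like the sketch below. The field names mirror the bullets above, but the exact schema of data/propainsight-supplement.json is an assumption:

```python
# Hypothetical shape of one technique entry in the PropaInsight supplement.
DOUBT_SUPPLEMENT = {
    "technique": "doubt",
    "appeals": ["credibility", "uncertainty"],  # reactions evoked in readers
    "intent": "discredit a person or institution",
    "common_confusions": [
        "legitimate skepticism backed by evidence vs manufactured doubt",
    ],
}
```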

4. Merged definitions

The skill generator combines:

  • data/technique-definitions-static.json — PRTA + SemEval base definitions
  • data/semeval-export.json — Curated technique examples from the 50-item evaluation set
  • data/propainsight-supplement.json — Appeals, intent, and confusions per technique

Output: output/technique-definitions.json and output/skills-dataset/{slug}/SKILL.md for each of the 14 techniques.
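The merge step can be sketched as a per-slug dictionary union. Only the file names in the comments come from the list above; the per-file schemas and the function itself are assumptions:

```python
def merge_skills(base, examples, supplement):
    """Merge per-technique records from the three sources.

    base:       {slug: definition}  # from data/technique-definitions-static.json
    examples:   {slug: [examples]}  # from data/semeval-export.json
    supplement: {slug: enrichment}  # from data/propainsight-supplement.json
    """
    return {
        slug: {
            **definition,
            "examples": examples.get(slug, []),
            **supplement.get(slug, {}),
        }
        for slug, definition in base.items()
    }
```

Each merged record would then be rendered to output/skills-dataset/{slug}/SKILL.md.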

Benchmark results

MediaGuard was benchmarked on a 50-item evaluation set with gold-annotated spans.

Span-based metrics

  Agent                           Samples   F1
  skills-dataset (multi-source)   50        64.2%

LLM-as-judge

  Agent                           Judge score
  skills-dataset (multi-source)   70.0%

PropaInsight enrichment improves over the single-source baseline (+6.4% F1, +8.2% judge).

14 techniques (SemEval taxonomy)

  • appeal-to-authority
  • appeal-to-fear-prejudice
  • bandwagon-reductio-ad-hitlerum
  • black-and-white-fallacy
  • causal-oversimplification
  • doubt
  • exaggeration-minimisation
  • flag-waving
  • loaded-language
  • name-calling-labeling
  • repetition
  • slogans
  • thought-terminating-cliches
  • whataboutism-straw-men-red-herring
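A simple guard that model output stays within this taxonomy can be written as a set membership check (the set contents are copied from the list above; the helper name is illustrative):

```python
# The 14 SemEval-2020 Task 11 technique slugs used by MediaGuard.
SEMEVAL_TECHNIQUES = {
    "appeal-to-authority", "appeal-to-fear-prejudice",
    "bandwagon-reductio-ad-hitlerum", "black-and-white-fallacy",
    "causal-oversimplification", "doubt", "exaggeration-minimisation",
    "flag-waving", "loaded-language", "name-calling-labeling",
    "repetition", "slogans", "thought-terminating-cliches",
    "whataboutism-straw-men-red-herring",
}

def is_known_technique(slug):
    """True if a predicted technique slug belongs to the taxonomy."""
    return slug in SEMEVAL_TECHNIQUES
```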

Further reading

For the full paper-style conclusion, evaluation protocol, and per-technique performance, see the CONCLUSION.md in the repository.