Topics

Browse posts by category and tag — every topic we cover, with the latest pieces under each.

Categories

attacks 12 posts

Embedding Inversion: Reconstructing Text From Vectors

Embedding inversion recovers the original text from a model's embedding vectors, breaking the assumption that embeddings are an opaque, privacy-safe
Adversarial Attacks on Vision-Language Models: CLIP, LLaVA, GPT-4

Vision-language models expand the adversarial attack surface beyond image classifiers: adversarial images can manipulate text outputs, carry visual
Adversarial Patch Attacks: Physical Perturbations That Fool ML

Adversarial patches are large, visible, localized perturbations designed to survive physical-world conditions — printing, lighting, and camera optics.
Universal Adversarial Perturbations: One Vector That Fools Inputs

Unlike per-image attacks, universal adversarial perturbations are input-agnostic: a single crafted noise vector causes misclassification across virtually
Adversarial Robustness in NLP: Why Text Attacks Are Different

Discrete input spaces, semantic constraints, and human-perceptibility rules change what counts as an adversarial example in text.
Data Poisoning and Backdoor Attacks on Foundation Models

Training data manipulation, backdoor triggers, and Trojan attacks against large-scale models. What the threat model actually requires and where the

defenses 3 posts

primer 2 posts

red-team 1 post

GCG-Class Adversarial Suffix Attacks: A 2026 Practitioner Primer

The math, the cost curve, and why optimization-based attacks are now within reach of solo practitioners. With reproducible setup and what defenders

Research 1 post

UAR: Measuring Neural Network Robustness Against Attacks You Haven't Seen Yet

OpenAI's Unforeseen Attack Robustness metric quantifies how well a classifier holds up against adversarial perturbations outside its training distribution