Tag

#adversarial-ml

16 posts tagged adversarial-ml.

deep-dive

PLAA: What a 92.78% NIDS Evasion Rate Actually Tells You About Feature-Space Attacks

A new arXiv paper builds adversarial network traffic at the packet level instead of the flow level, hitting a 92.78% evasion rate against deep-learning NIDS. Here's why that framing matters more than the number.
June 29, 2026
attacks

Embedding Inversion: Reconstructing Text From Vectors

Embedding inversion recovers the original text from a model's embedding vectors, breaking the assumption that embeddings are an opaque, privacy-safe
June 12, 2026
defenses

Adversarial Training Methods: PGD-AT, TRADES, and MART

Adversarial training is the most defensible empirical robustness method, but 'adversarial training' isn't one thing.
May 21, 2026
defenses

Evaluating Adversarial Robustness Without Fooling Yourself

Most defenses that claim robustness are later broken — not because the idea was bad, but because the evaluation was.
May 21, 2026
primer

Adversarial Examples vs. Data Poisoning: Timing Is Everything

Adversarial examples attack a deployed model at inference; data poisoning attacks the model before it is deployed.
May 10, 2026
primer

Membership Inference vs. Model Inversion: Privacy Attacks

Membership inference asks 'was this sample in the training set?' Model inversion asks 'what samples were in the training set?'
May 10, 2026
attacks

Adversarial Attacks on Vision-Language Models: CLIP, LLaVA, GPT-4

Vision-language models expand the adversarial attack surface beyond image classifiers: adversarial images can manipulate text outputs, carry visual
May 10, 2026
attacks

Adversarial Patch Attacks: Physical Perturbations That Fool ML

Adversarial patches are large, visible, localized perturbations designed to survive physical-world conditions — printing, lighting, and camera optics.
May 10, 2026
attacks

Universal Adversarial Perturbations: One Vector That Fools Inputs

Unlike per-image attacks, universal adversarial perturbations are input-agnostic: a single crafted noise vector causes misclassification across virtually
May 10, 2026
attacks

Adversarial Robustness in NLP: Why Text Attacks Are Different

Discrete input spaces, semantic constraints, and human-perceptibility rules change what counts as an adversarial example in text.
May 9, 2026
attacks

Data Poisoning and Backdoor Attacks on Foundation Models

Training data manipulation, backdoor triggers, and Trojan attacks against large-scale models. What the threat model actually requires and where the
May 9, 2026
attacks

Evasion Attacks on Image Classifiers: FGSM, PGD, and C&W

The three foundational gradient-based evasion attacks, what each one actually optimizes, and what the benchmark numbers mean when you're evaluating a defense.
May 9, 2026
attacks

Model Inversion Attacks: Reconstructing Training Data from Output

From Fredrikson's pharmacogenetics exploit to Geiping's gradient inversion, model inversion attacks recover private training data in ways most ML
May 9, 2026
attacks

Adversarial Transferability: Why Black-Box Attacks Work at All

Adversarial examples transfer across models with different architectures and training sets. Understanding why changes what you think defenses need to
May 9, 2026
red-team

GCG-Class Adversarial Suffix Attacks: A 2026 Practitioner Primer

The math, the cost curve, and why optimization-based attacks are now within reach of solo practitioners. With reproducible setup and what defenders
May 6, 2026
attacks

Model Extraction via Query-Based Functional Stealing

Query-based model stealing attacks can recover a functionally equivalent model from API access alone. The economics matter more than the technique: here's
May 6, 2026