Tag
#adversarial-ml
16 posts tagged adversarial-ml.
- deep-dive
PLAA: What a 92.78% NIDS Evasion Rate Actually Tells You About Feature-Space Attacks
A new arXiv paper builds adversarial network traffic at the packet level instead of the flow level, hitting a 92.78% evasion rate against deep-learning NIDS. Here's why that framing matters more than the number.
- attacks
Embedding Inversion: Reconstructing Text From Vectors
Embedding inversion recovers the original text from a model's embedding vectors, breaking the assumption that embeddings are an opaque, privacy-safe
- defenses
Adversarial Training Methods: PGD-AT, TRADES, and MART
Adversarial training is the most defensible empirical robustness method, but 'adversarial training' isn't one thing.
- defenses
Evaluating Adversarial Robustness Without Fooling Yourself
Most defenses that claim robustness are later broken — not because the idea was bad, but because the evaluation was.
- primer
Adversarial Examples vs. Data Poisoning: Timing Is Everything
Adversarial examples attack a deployed model at inference; data poisoning attacks the model before it is deployed.
- primer
Membership Inference vs. Model Inversion: Privacy Attacks
Membership inference asks 'was this sample in the training set?' Model inversion asks 'what samples were in the training set?'
- attacks
Adversarial Attacks on Vision-Language Models: CLIP, LLaVA, GPT-4
Vision-language models expand the adversarial attack surface beyond image classifiers: adversarial images can manipulate text outputs, carry visual
- attacks
Adversarial Patch Attacks: Physical Perturbations That Fool ML
Adversarial patches are large, visible, localized perturbations designed to survive physical-world conditions — printing, lighting, and camera optics.
- attacks
Universal Adversarial Perturbations: One Vector That Fools Inputs
Unlike per-image attacks, universal adversarial perturbations are input-agnostic: a single crafted noise vector causes misclassification across virtually
- attacks
Adversarial Robustness in NLP: Why Text Attacks Are Different
Discrete input spaces, semantic constraints, and human-perceptibility rules change what counts as an adversarial example in text.
- attacks
Data Poisoning and Backdoor Attacks on Foundation Models
Training data manipulation, backdoor triggers, and Trojan attacks against large-scale models. What the threat model actually requires and where the
- attacks
Evasion Attacks on Image Classifiers: FGSM, PGD, and C&W
The three foundational gradient-based evasion attacks, what each one actually optimizes, and what the benchmark numbers mean when you're evaluating a defense.
- attacks
Model Inversion Attacks: Reconstructing Training Data from Output
From Fredrikson's pharmacogenetics exploit to Geiping's gradient inversion, model inversion attacks recover private training data in ways most ML
- attacks
Adversarial Transferability: Why Black-Box Attacks Work at All
Adversarial examples transfer across models with different architectures and training sets. Understanding why changes what you think defenses need to
- red-team
GCG-Class Adversarial Suffix Attacks: A 2026 Practitioner Primer
The math, the cost curve, and why optimization-based attacks are now within reach of solo practitioners. With reproducible setup and what defenders
- attacks
Model Extraction via Query-Based Functional Stealing
Query-based model stealing attacks can recover a functionally equivalent model from API access alone. The economics matter more than the technique: here's