Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Tags
- #adversarial-ml 15
- #ml-security 9
- #adversarial-examples 5
- #privacy 4
- #adversarial-robustness 2
- #backdoor-attacks 2
- #data-poisoning 2
- #evasion-attacks 2
- #image-classifiers 2
- #llm-security 2
- #membership-inference 2
- #model-inversion 2
- #red-team 2
- #robustness 2
- #adaptive-attacks 1
- #adversarial-defense 1
- #adversarial-nlp 1
- #adversarial-patches 1
- #adversarial-training 1
- #alignment 1
- #api-security 1
- #autoattack 1
- #autonomous-vehicles 1
- #black-box-attacks 1
- #carlini-wagner 1
- #certified-robustness 1
- #clip-attacks 1
- #differential-privacy 1
- #embedding-inversion 1
- #evaluation 1
- #evasion 1
- #federated-learning 1
- #fgsm 1
- #formal-verification 1
- #foundation-models 1
- #gcg 1
- #gdpr 1
- #gradient-inversion 1
- #gradient-masking 1
- #input-agnostic-attacks 1
- #jailbreaking 1
- #mart 1
- #memorization 1
- #metrics 1
- #ml-privacy 1
- #model-extraction 1
- #model-stealing 1
- #multimodal-adversarial-attacks 1
- #neural-networks 1
- #nlp 1
- #object-detection 1
- #optimization-attacks 1
- #pgd 1
- #physical-adversarial-attacks 1
- #production-ml 1
- #randomized-smoothing 1
- #robustbench 1
- #robustness-accuracy-tradeoff 1
- #robustness-evaluation 1
- #text-attacks 1
- #threat-modeling 1
- #trades 1
- #training-data 1
- #training-data-extraction 1
- #training-data-privacy 1
- #transferability 1
- #transformers 1
- #trojan-ml 1
- #unforeseen-attacks 1
- #universal-adversarial-perturbations 1
- #vec2text 1
- #vector-store 1
- #vision-language-models 1
- #visual-adversarial-examples 1
Categories
attacks 12 posts
- Embedding Inversion: Reconstructing Text From Vectors
  Embedding inversion recovers the original text from a model's embedding vectors, breaking the assumption that embeddings are an opaque, privacy-safe…
- Adversarial Attacks on Vision-Language Models: CLIP, LLaVA, GPT-4
  Vision-language models expand the adversarial attack surface beyond image classifiers: adversarial images can manipulate text outputs, carry visual…
- Adversarial Patch Attacks: Physical Perturbations That Fool ML
  Adversarial patches are large, visible, localized perturbations designed to survive physical-world conditions: printing, lighting, and camera optics.
- Universal Adversarial Perturbations: One Vector That Fools Inputs
  Unlike per-image attacks, universal adversarial perturbations are input-agnostic: a single crafted noise vector causes misclassification across virtually…
- Adversarial Robustness in NLP: Why Text Attacks Are Different
  Discrete input spaces, semantic constraints, and human-perceptibility rules change what counts as an adversarial example in text.
- Data Poisoning and Backdoor Attacks on Foundation Models
  Training data manipulation, backdoor triggers, and Trojan attacks against large-scale models. What the threat model actually requires and where the…
defenses 3 posts
- Adversarial Training Methods: PGD-AT, TRADES, and MART
  Adversarial training is the most defensible empirical robustness method, but 'adversarial training' isn't one thing.
- Evaluating Adversarial Robustness Without Fooling Yourself
  Most defenses that claim robustness are later broken, not because the idea was bad, but because the evaluation was.
- Certified Robustness via Randomized Smoothing: What It Guarantees
  Randomized smoothing gives you a provable robustness radius. Understanding what that certificate means in practice, and where it breaks, is more useful…
primer 2 posts
- Adversarial Examples vs. Data Poisoning: Timing Is Everything
  Adversarial examples attack a deployed model at inference; data poisoning attacks the model before it is deployed.
- Membership Inference vs. Model Inversion: Privacy Attacks
  Membership inference asks 'was this sample in the training set?' Model inversion asks 'what samples were in the training set?'