Jailbreak
User crafts the malicious prompt themselves to bypass guardrails.
Crafted input overrides intended instructions — directly, or smuggled in through external / retrieved content the model treats as trusted.
Payload hides in a web page, file, or email the model later reads.
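One way to picture a defense against the indirect case is sketched below: retrieved text is scanned for instruction-override phrases and wrapped in explicit delimiters before it reaches the prompt. The phrase list, the `<untrusted_document>` tag, and both function names are hypothetical, and a pattern match like this is only a weak heuristic, not a complete detector.

```python
import re

# Hypothetical phrase list for illustration; real detectors rely on far broader signals.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior|above) instructions"
    r"|disregard the system prompt",
    re.IGNORECASE,
)


def looks_injected(text: str) -> bool:
    """Return True if the text contains a known instruction-override phrase."""
    return SUSPICIOUS.search(text) is not None


def wrap_untrusted(text: str) -> str:
    """Delimit retrieved content so the prompt presents it as data, not instructions."""
    return "<untrusted_document>\n" + text + "\n</untrusted_document>"
```

Delimiting does not make the content safe by itself; it only gives the system prompt something concrete to refer to when telling the model not to follow instructions found inside retrieved documents.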
The model reveals PII, secrets, proprietary data, or fragments of its own training data — the biggest mover of the 2025 list.
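A minimal sketch of an output-side redaction pass follows, assuming that emails and API keys are the data of interest. The `sk-` key format and the `redact` function name are assumptions made for illustration; production PII detection needs much wider coverage than two regular expressions.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical secret format: "sk-" followed by at least 16 alphanumerics.
API_KEY = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")


def redact(text: str) -> str:
    """Replace email addresses and key-like tokens before the text leaves the system."""
    text = EMAIL.sub("[EMAIL]", text)
    return API_KEY.sub("[SECRET]", text)
```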
Compromised base models, datasets, fine-tune adapters, plugins, or packages anywhere in the AI build chain.
Fine-tuned, re-uploaded copies can hide backdoors with no provenance trail.
Typosquatted and trojanized model artifacts mirror classic package attacks.
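As with classic package attacks, one common mitigation is to pin a digest for each artifact and refuse to load anything that does not match. The sketch below assumes the pinned SHA-256 value comes from a trusted source such as a lock file; the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, pinned_hex: str) -> bool:
    """Return True only if the file's SHA-256 equals the pinned digest."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.lower())
```

A matching digest only proves the file is the one that was pinned. It says nothing about whether the pinned model was trustworthy to begin with.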
Malicious data corrupts the model — introducing backdoors, bias, or degraded behavior at pre-training, fine-tuning, or retrieval time.
A hidden phrase flips the model to attacker-chosen behavior.
Skewed outputs that evade standard accuracy benchmarks.
Unvalidated model output flows straight into downstream systems and executes — turning a text response into XSS, SQLi, or RCE.
Stored / reflected XSS
SQL injection
Remote code execution
Phishing · data exfil
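The first two outcomes above have well-known countermeasures: escape model output before rendering it, and pass it to the database only as a bound parameter. The sketch below uses Python's standard `html` and `sqlite3` modules; the `users` table and both function names are assumptions for illustration.

```python
import html
import sqlite3


def render_reply(model_output: str) -> str:
    """Escape model text so any markup it contains is displayed, not executed."""
    return "<p>" + html.escape(model_output) + "</p>"


def find_user(conn: sqlite3.Connection, name_from_model: str) -> list:
    """Bind the model-supplied value as a parameter instead of splicing it into SQL."""
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name_from_model,)).fetchall()
```

The same principle extends to the remaining outcomes: treat the model's response like any other untrusted user input at every downstream boundary.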
Too much functionality, permission, or autonomy lets an agent take damaging actions on the user's behalf — the defining risk of agentic AI.
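One way to bound agency is sketched below, under the assumption that every tool call passes through a single authorization point. The tool names, the `ToolPolicy` type, and the `approve` callback (standing in for a human confirmation step) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolPolicy:
    allowed: frozenset          # tools the agent may call at all
    needs_approval: frozenset   # subset that requires a human to confirm


def authorize(tool: str, policy: ToolPolicy, approve: Callable[[str], bool]) -> bool:
    """Deny unknown tools, ask a human for risky ones, and allow the rest."""
    if tool not in policy.allowed:
        return False
    if tool in policy.needs_approval:
        return approve(tool)
    return True
```

For example, a policy could allow `search_docs` and `send_email` while listing `send_email` under `needs_approval`, so that any tool outside the allowlist is refused without ever reaching the approval step.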
Secrets, rules, or business logic hidden in the system prompt get extracted by users — assume the prompt is never private.
Flaws in RAG embeddings and vector stores enable poisoning, leakage, and cross-tenant access: a new risk for the RAG era.
Weak namespace isolation returns another customer's chunks.
Stored vectors reconstructed back into sensitive source text.
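The cross-tenant case can be illustrated with a toy in-memory store in which the tenant filter is applied before any similarity ranking. The record layout (`tenant`, `vec`, `text`) and the function names are assumptions; a real vector database would enforce this through its own namespace or metadata-filter mechanism.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(store, tenant_id, query_vec, k=3):
    """Rank only the caller's own chunks; other tenants' rows are never scored."""
    candidates = [c for c in store if c["tenant"] == tenant_id]
    ranked = sorted(candidates, key=lambda c: cosine(c["vec"], query_vec), reverse=True)
    return [c["text"] for c in ranked[:k]]
```

Filtering before ranking means a missing or wrong tenant id returns nothing rather than another customer's nearest chunks.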
Confident but false output — hallucinations users over-trust and act on. The gap between stated confidence and real accuracy is the danger.
Invented citations, APIs, legal cases stated with full fluency.
Plausible snippets with silent vulnerabilities shipped to prod.
Uncontrolled inference volume drains compute and budget — spanning denial-of-service, runaway cloud cost, and model-extraction theft.
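A cost circuit-breaker can be as simple as a per-caller token budget that is checked before each inference call. The sketch below is a minimal version under that assumption; the `TokenBudget` class and its limit are hypothetical, and a real deployment would also reset the budget on a time window and rate-limit requests.

```python
class TokenBudget:
    """Tracks token spend for one caller and refuses requests past the limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> bool:
        """Record the spend and return True, or return False if it would exceed the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True
```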
Stream prompts, tool calls & retrievals through real-time guardrails — injection & PII detection, agency limits, and cost circuit-breakers, watched as they happen.