LLM04: Model Denial of Service
Model Denial of Service (DoS) attacks exploit the high computational cost of LLM inference. Attackers craft inputs that maximize resource consumption — extremely long prompts, recursive reasoning loops, multi-step agent workflows that never terminate, or batch requests that exhaust GPU memory. Unlike traditional network DoS, model DoS targets expensive compute resources.
LLM-Specific DoS Vectors
Traditional DoS floods a server with requests. LLM DoS is different — a single carefully crafted request can consume minutes of GPU time and megabytes of memory. Attack vectors include: (1) Maximum-length prompts with complex reasoning tasks. (2) Recursive tool-calling loops in agent frameworks. (3) Fan-out attacks where one request triggers dozens of LLM sub-calls. (4) Context window stuffing with repetitive content that's expensive to process.
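The fan-out vector (3) is the easiest to see in code. Below is a minimal sketch of the pattern, where `summarize` and `answer_with_sources` are hypothetical names standing in for a real retrieval-augmented pipeline:

```python
def summarize(doc):
    # Stub standing in for one expensive LLM call per document.
    return doc[:10]

def answer_with_sources(query, documents):
    # No cap on len(documents): an attacker who controls the document
    # list turns a single inbound request into arbitrarily many model
    # calls -- the amplification happens server-side, at your expense.
    return [summarize(d) for d in documents]
```

The fix is a hard cap on sub-calls per request (e.g. `documents[:MAX_DOCS]`), enforced before any model call is made.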
Agent Loop Exhaustion
Autonomous agents (LangChain, AutoGPT, Cursor Agent Mode) execute multi-step workflows where each step involves an LLM call. An attacker can craft a task that creates an infinite loop — the agent calls a tool, processes the result, decides it needs more information, calls another tool, and so on. Without a maximum iteration limit, a single malicious request can consume unlimited compute.
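The defense is an explicit iteration cap that fails closed. A minimal sketch of the loop, with stubbed `call_llm` and `run_tool` functions in place of a real framework (the stubs here simulate a runaway agent that always asks for another tool call):

```python
MAX_ITERATIONS = 10

def call_llm(prompt):
    # Stub: a real implementation would call an LLM API. This stub never
    # returns a "finish" action, simulating a runaway agent.
    return {"action": "tool", "tool": "search", "input": prompt}

def run_tool(name, tool_input):
    # Stub tool execution.
    return "result of %s(%s)" % (name, tool_input)

def run_agent(task, max_iterations=MAX_ITERATIONS):
    prompt = task
    for _ in range(max_iterations):
        decision = call_llm(prompt)
        if decision["action"] == "finish":
            return decision.get("output", "")
        prompt = run_tool(decision["tool"], decision["input"])
    # Cap reached: fail closed instead of looping forever.
    raise RuntimeError("agent exceeded %d iterations" % max_iterations)
```

Production frameworks expose the same idea as a parameter (e.g. LangChain's `max_iterations`); the point is that the bound must exist and must abort the request, not silently continue.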
Cost Amplification
With pay-per-token LLM APIs, DoS attacks directly translate to financial damage. An attacker who discovers an unprotected endpoint can generate thousands of dollars in API costs in minutes. This is especially dangerous for applications using GPT-4 or Claude at $15-60 per million tokens.
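The arithmetic is worth making explicit. Using the per-token rates quoted above (illustrative, not current price sheets):

```python
def attack_cost_usd(requests, tokens_per_request, usd_per_million_tokens):
    # Total spend = tokens consumed * price per token.
    return requests * tokens_per_request * usd_per_million_tokens / 1_000_000

# 1,000 requests, each forcing a 100k-token response at $60/M tokens:
cost = attack_cost_usd(1_000, 100_000, 60.0)  # $6,000
```

A script firing those 1,000 requests in parallel finishes in minutes, which is why cost alerts alone are too slow without rate limits in front of them.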
⚔️ Attack Examples & Code Patterns
Agent infinite loop attack
A task that causes an AI agent to loop indefinitely:
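A hypothetical example of such a task (the wording is illustrative; any instruction whose success condition can never be satisfied has the same effect on an uncapped agent):

```python
# The "verification" step can never succeed, so a naive agent keeps
# calling its search tool forever -- each loop iteration is a paid LLM call.
MALICIOUS_TASK = (
    "Search the web for the current time, then verify the result is still "
    "current by searching again. Repeat the verification until two "
    "consecutive results are identical to the millisecond."
)
```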
Token bomb — maximum compute per request
Crafting a prompt that maximizes output tokens:
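A hypothetical example: a short, cheap-to-send prompt engineered to elicit the longest response the model will produce, paired with the server-side bound that defeats it:

```python
# A few dozen input tokens that request an effectively unbounded output.
TOKEN_BOMB = (
    "List every integer from 1 to 1,000,000. For each one, write its "
    "English name, its prime factorization, and its binary representation. "
    "Do not summarize or truncate."
)

# Defense sketch: bound the response server-side, regardless of what the
# prompt asks for. The value is illustrative, not a recommendation.
MAX_OUTPUT_TOKENS = 1024
```

The asymmetry is the point: the attacker pays for ~40 input tokens while the victim pays for the output, so the output limit must be set by the server, never inferred from the request.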
🔍 Detection Checklist
- ☐ Verify all LLM API calls have max_tokens limits set
- ☐ Check agent loops for max_iterations and timeout parameters
- ☐ Ensure API endpoints have per-user rate limiting
- ☐ Monitor LLM API costs with alerts for unusual spikes
- ☐ Test with maximum-length inputs to measure worst-case latency
- ☐ Verify streaming responses have timeout handling
🛡️ Mitigation Strategy
Implement strict token limits for input and output. Apply rate limiting per user/API key. Set timeouts on LLM calls and agent execution loops. Monitor GPU/CPU usage and set cost alerts. Use request queuing with priority-based processing.
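The per-user rate limit is the piece most often missing. A minimal sliding-window sketch (the limit values and `user_id` scheme are illustrative; production systems typically use Redis or an API gateway instead of in-process state):

```python
import time
from collections import defaultdict, deque

RATE_LIMIT = 5       # max requests per user ...
RATE_WINDOW = 60.0   # ... per rolling 60-second window

_request_log = defaultdict(deque)

def allow_request(user_id, now=None):
    now = time.monotonic() if now is None else now
    window = _request_log[user_id]
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] > RATE_WINDOW:
        window.popleft()
    if len(window) >= RATE_LIMIT:
        return False  # over the limit: reject before spending any tokens
    window.append(now)
    return True
```

Check `allow_request` before the LLM call, then wrap the call itself in a timeout, so both the request rate and the per-request duration are bounded.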
How Precogs AI Protects You
Precogs AI identifies unbounded LLM calls in your code — missing token limits, infinite agent loops, and unthrottled API endpoints. AutoFix PRs add proper resource controls and circuit breakers.
How do LLM denial of service attacks work?
LLM DoS attacks exploit the high computational cost of AI inference. A single crafted request can consume minutes of GPU time. Common vectors include agent infinite loops, maximum-length prompts, and token bombs. Prevention requires token limits, timeouts, rate limiting, and iteration caps on agent loops.
Protect Against LLM04: Model Denial of Service
Precogs AI automatically detects LLM04: Model Denial of Service vulnerabilities and generates AutoFix PRs.