LLM04: Model Denial of Service

Verified by Precogs Threat Research
LLM04:2025 · MEDIUM · CWE-400 · CWE-770 · CWE-835

Model Denial of Service (DoS) attacks exploit the high computational cost of LLM inference. Attackers craft inputs that maximize resource consumption — extremely long prompts, recursive reasoning loops, multi-step agent workflows that never terminate, or batch requests that exhaust GPU memory. Unlike traditional network DoS, model DoS targets expensive compute resources.

LLM-Specific DoS Vectors

Traditional DoS floods a server with requests. LLM DoS is different — a single carefully crafted request can consume minutes of GPU time and megabytes of memory. Attack vectors include:

  • Maximum-length prompts with complex reasoning tasks
  • Recursive tool-calling loops in agent frameworks
  • Fan-out attacks where one request triggers dozens of LLM sub-calls
  • Context window stuffing with repetitive content that is expensive to process
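A first-line defense against oversized prompts is a length gate applied before the request ever reaches the model. The sketch below is illustrative: it uses a rough characters-per-token heuristic (about 4 characters per token for English) instead of a real tokenizer, and the limit value and function names are assumptions, not part of any particular API. Production code would use the model's actual tokenizer (e.g. tiktoken) and tuned limits.

```python
# Minimal sketch of a pre-inference length gate.
# MAX_INPUT_TOKENS and the 4-chars-per-token heuristic are
# illustrative assumptions, not vendor-specified values.

MAX_INPUT_TOKENS = 4_000
CHARS_PER_TOKEN = 4  # rough average for English text


def estimate_tokens(text: str) -> int:
    """Cheap token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def validate_prompt(text: str) -> str:
    """Reject prompts that would exceed the input token budget."""
    estimated = estimate_tokens(text)
    if estimated > MAX_INPUT_TOKENS:
        raise ValueError(
            f"Prompt too large: ~{estimated} tokens "
            f"(limit {MAX_INPUT_TOKENS})"
        )
    return text


validate_prompt("What is the capital of France?")  # passes
# validate_prompt("spam " * 100_000)               # raises ValueError
```

Rejecting oversized input cheaply at the edge means the expensive GPU path is never reached for the most obvious stuffing attacks.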

Agent Loop Exhaustion

Autonomous agents (LangChain, AutoGPT, Cursor Agent Mode) execute multi-step workflows where each step involves an LLM call. An attacker can craft a task that creates an infinite loop — the agent calls a tool, processes the result, decides it needs more information, calls another tool, and so on. Without a maximum iteration limit, a single malicious request can consume unlimited compute.

Cost Amplification

With pay-per-token LLM APIs, DoS attacks directly translate to financial damage. An attacker who discovers an unprotected endpoint can generate thousands of dollars in API costs in minutes. This is especially dangerous for applications using GPT-4 or Claude at $15-60 per million tokens.
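A per-user spend budget can act as a circuit breaker against cost amplification. The sketch below is a hypothetical stdlib-only illustration — the price constant, budget, window, and class names are all assumptions, and a real deployment would persist budgets in shared storage rather than process memory.

```python
import time
from collections import defaultdict

# Illustrative assumptions -- not actual vendor pricing logic.
PRICE_PER_1K_OUTPUT_TOKENS = 0.06  # roughly GPT-4-class output pricing
BUDGET_USD_PER_HOUR = 5.00
WINDOW_SECONDS = 3600


class SpendTracker:
    """Tracks per-user LLM spend over a sliding one-hour window."""

    def __init__(self):
        self._events = defaultdict(list)  # user_id -> [(timestamp, usd)]

    def record(self, user_id: str, output_tokens: int) -> None:
        cost = output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
        self._events[user_id].append((time.time(), cost))

    def over_budget(self, user_id: str) -> bool:
        cutoff = time.time() - WINDOW_SECONDS
        recent = [c for t, c in self._events[user_id] if t >= cutoff]
        return sum(recent) > BUDGET_USD_PER_HOUR


tracker = SpendTracker()
tracker.record("alice", 50_000)      # ~$3.00 of output tokens
print(tracker.over_budget("alice"))  # False -- under the $5 budget
tracker.record("alice", 50_000)      # another ~$3.00
print(tracker.over_budget("alice"))  # True -- $6.00 exceeds the budget
```

Checking `over_budget` before each request turns an unbounded cost attack into one capped at the hourly budget per user.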

⚔️ Attack Examples & Code Patterns

Agent infinite loop attack

A task that causes an AI agent to loop indefinitely:

# ❌ VULNERABLE — no iteration limit on agent loop
from langchain.agents import AgentExecutor

agent = AgentExecutor(agent=llm_agent, tools=tools)
# This task causes the agent to endlessly search and re-search
result = agent.run(
    "Find the current stock price of every company in the "
    "S&P 500 and verify each one is correct"
)
# Agent loops: search → verify → "not sure" → search again → ...

# ✅ SAFE — iteration limit + timeout
agent = AgentExecutor(
    agent=llm_agent,
    tools=tools,
    max_iterations=10,        # Hard cap on loops
    max_execution_time=30,    # 30-second timeout
    early_stopping_method="generate"
)

Token bomb — maximum compute per request

Crafting a prompt that maximizes output tokens:

# ❌ VULNERABLE — no output token limit
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": 
        "Write a detailed 50,000-word analysis of..."}],
    # No max_tokens set — model generates until context limit
)
# Cost: ~$3-5 per request at GPT-4 pricing

# ✅ SAFE — enforce output limits
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}],
    max_tokens=1000,   # Hard limit on output
    timeout=15         # 15-second timeout
)

🔍 Detection Checklist

  • Verify all LLM API calls have max_tokens limits set
  • Check agent loops for max_iterations and timeout parameters
  • Ensure API endpoints have per-user rate limiting
  • Monitor LLM API costs with alerts for unusual spikes
  • Test with maximum-length inputs to measure worst-case latency
  • Verify streaming responses have timeout handling

🛡️ Mitigation Strategy

Implement strict token limits for input and output. Apply rate limiting per user/API key. Set timeouts on LLM calls and agent execution loops. Monitor GPU/CPU usage and set cost alerts. Use request queuing with priority-based processing.
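The per-user rate limiting above can be sketched as a token bucket per API key. This stdlib-only example is illustrative — the capacity and refill rate are assumptions, and production systems typically back the bucket with a shared store such as Redis so limits hold across server instances.

```python
import time


class TokenBucket:
    """Per-key token bucket: allows `capacity` burst requests,
    refilling at `rate` requests per second."""

    def __init__(self, capacity: float = 10, rate: float = 1.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=3, rate=0.5)
results = [bucket.allow() for _ in range(5)]
print(results)  # burst of 3 allowed, then denied: [True, True, True, False, False]
```

A denied request should fail fast with an HTTP 429 before any LLM call is made, so attackers pay the queueing cost rather than your GPU budget.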


How Precogs AI Protects You

Precogs AI identifies unbounded LLM calls in your code — missing token limits, infinite agent loops, and unthrottled API endpoints. AutoFix PRs add proper resource controls and circuit breakers.


How do LLM denial of service attacks work?

LLM DoS attacks exploit the high computational cost of AI inference. A single crafted request can consume minutes of GPU time. Common vectors include agent infinite loops, maximum-length prompts, and token bombs. Prevention requires token limits, timeouts, rate limiting, and iteration caps on agent loops.

Protect Against LLM04: Model Denial of Service

Precogs AI automatically detects LLM04 (Model Denial of Service) vulnerabilities and generates AutoFix PRs.