Documentation

Nimblesite vs raw LLM APIs

Why you need more than a stateless LLM call to ship a real AI feature.


Short version: Raw LLM APIs are stateless. Every call is a fresh start. Nimblesite is the agent that wraps all of that — memory, loop, tool dispatch, multi-tenancy — already wired up.

What a raw LLM call looks like

Here's what you write when you talk to Claude or GPT directly:

import anthropic

# load_messages_from_db, save_messages_to_db, run_tool, and my_tools
# are all yours to build and maintain.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def handle_message(conversation_id):
    # Every call is a fresh start. You send the full history every time.
    history = load_messages_from_db(conversation_id)
    history.append({"role": "user", "content": "Hello"})

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=history,
        tools=my_tools,
    )

    # If the response contains tool calls, parse them, run them,
    # append the results to history, and call again.
    while response.stop_reason == "tool_use":
        # Append the assistant turn once, then one result per tool call.
        history.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_tool(block.name, block.input),
                })
        history.append({"role": "user", "content": tool_results})

        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=history,
            tools=my_tools,
        )

    # Save the final history back to the database.
    save_messages_to_db(conversation_id, history)

    return response.content[-1].text

Now multiply that by:

  • Multi-tenancy (each tenant has their own API key and their own messages table scope)
  • Prompt templating (each tenant has their own system prompt)
  • Provider failover (what if Anthropic is down?)
  • Token budget management (what if the history is longer than the context window?)
  • Logging, audit, and replay
  • Handling new SDK versions when the provider ships a breaking change
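To make one of those bullets concrete, here is a minimal sketch of token budget management: trimming old messages so the history fits the context window. The function name and the 4-characters-per-token heuristic are illustrative assumptions, not any provider's API; real implementations count tokens with the provider's tokenizer.

```python
def trim_history(history, max_tokens=100_000, chars_per_token=4):
    """Drop the oldest messages until a rough token estimate fits the budget.

    Illustrative sketch only: uses a crude chars/token heuristic rather
    than a real tokenizer, and ignores system prompts and tool schemas.
    """
    def estimate(msgs):
        return sum(len(str(m["content"])) for m in msgs) // chars_per_token

    trimmed = list(history)
    # Always keep the most recent message; drop from the front (oldest first).
    while len(trimmed) > 1 and estimate(trimmed) > max_tokens:
        trimmed.pop(0)
    return trimmed
```

And this is just one bullet; each of the others needs its own equivalent piece of plumbing.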

What the same thing looks like on Nimblesite

curl -X POST https://api.nimblesite.dev/api/v1/chat/$CONFIG_ID \
  -H "X-API-Key: $TENANT_KEY" \
  -d '{"message": "Hello"}'

That's it. Memory, loop, tool dispatch, multi-tenancy, prompt templating, provider failover, logging, audit — all already done.
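The same call from Python, using only the URL shape, header, and body shown in the curl command above (no SDK required; the helper name is ours, and the response format is not assumed here):

```python
import json
import urllib.request

def build_chat_request(config_id, tenant_key, message):
    """Build the POST request for Nimblesite's chat endpoint.

    Mirrors the curl example: one endpoint, one tenant key, one message.
    """
    return urllib.request.Request(
        f"https://api.nimblesite.dev/api/v1/chat/{config_id}",
        data=json.dumps({"message": message}).encode(),
        headers={"X-API-Key": tenant_key, "Content-Type": "application/json"},
        method="POST",
    )

# To send it:
# with urllib.request.urlopen(build_chat_request(cfg, key, "Hello")) as resp:
#     body = resp.read()
```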

The honest comparison

Dimension          Raw LLM API         Nimblesite
State              You build it        Built in
Agent loop         You run it          We run it
Tool dispatch      You parse & loop    We parse, you execute
Multi-tenancy      You build it        Built in
Prompt templating  You build it        Built in
Model switching    Rewrite your code   Edit one JSON field
Provider churn     Your problem        Our problem
SDK maintenance    Forever             None
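On the "edit one JSON field" row: the idea is that the model lives in the config behind $CONFIG_ID, not in your code. The exact field names below are illustrative assumptions, not Nimblesite's documented schema:

```json
{
  "model": "claude-sonnet-4-6",
  "system_prompt": "You are the support agent for {{tenant_name}}."
}
```

Changing the model means changing the "model" value; no client code is redeployed.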

When a raw LLM API is a better fit

  • You're making one-shot completion calls with no memory and no tools
  • You need absolute control over the exact bytes on the wire to the provider
  • You're implementing novel agent architectures that don't fit a standard loop

When Nimblesite is a better fit

  • Everything else.