Nimblesite vs raw LLM APIs
Short version: raw LLM APIs are stateless, so every call is a fresh start. Nimblesite is the agent layer that wraps all of that for you — memory, loop, tool dispatch, multi-tenancy — already wired up.
What a raw LLM call looks like
Here's what you write when you talk to Claude or GPT directly:
```python
import anthropic

client = anthropic.Anthropic()

def chat(conversation_id, user_message):
    # Every call is a fresh start. You send the full history every time.
    history = load_messages_from_db(conversation_id)
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=history,
        tools=my_tools,
    )
    # If the response asks for tools, run each one, append the results
    # to history, and call again until the model stops asking.
    while response.stop_reason == "tool_use":
        history.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = run_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        history.append({"role": "user", "content": tool_results})
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=history,
            tools=my_tools,
        )
    # Save the final history back to the database.
    save_messages_to_db(conversation_id, history)
    return response.content[-1].text
```
Now multiply that by:
- Multi-tenancy (each tenant has their own API key and their own messages table scope)
- Prompt templating (each tenant has their own system prompt)
- Provider failover (what if Anthropic is down?)
- Token budget management (what if the history is longer than the context window?)
- Logging, audit, and replay
- Handling new SDK versions when the provider ships a breaking change
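Token budget management alone is a real piece of code. A minimal sketch of one common approach — drop the oldest turns until the history fits — where the 4-characters-per-token heuristic and both helper names are assumptions for illustration, not part of any provider SDK:

```python
def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token. Production code would use
    # the provider's token-counting endpoint or a local tokenizer instead.
    return sum(len(str(m["content"])) for m in messages) // 4

def trim_to_budget(messages, budget):
    # Drop the oldest turns first until the history fits the budget,
    # always keeping at least the most recent message.
    trimmed = list(messages)
    while len(trimmed) > 1 and estimate_tokens(trimmed) > budget:
        trimmed.pop(0)
    return trimmed

history = [
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 400},
    {"role": "user", "content": "Hello"},
]
print(trim_to_budget(history, 50))  # keeps only the latest user message
```

Even this toy version raises questions a real implementation has to answer: never orphan a tool_result from its tool_use, and keep the system prompt out of the trim window.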
What the same thing looks like on Nimblesite
```shell
curl -X POST https://api.nimblesite.dev/api/v1/chat/$CONFIG_ID \
  -H "X-API-Key: $TENANT_KEY" \
  -d '{"message": "Hello"}'
```
That's it. Memory, loop, tool dispatch, multi-tenancy, prompt templating, provider failover, logging, audit — all already done.
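The same call from Python, sketched with only the standard library. The URL and header come straight from the curl example; the `build_chat_request` helper is illustrative, and nothing is assumed here about the shape of the response:

```python
import json
import urllib.request

def build_chat_request(config_id, tenant_key, message):
    # Mirrors the curl call above: POST /api/v1/chat/{config_id}
    # with the tenant's key in the X-API-Key header.
    return urllib.request.Request(
        url=f"https://api.nimblesite.dev/api/v1/chat/{config_id}",
        method="POST",
        headers={"X-API-Key": tenant_key, "Content-Type": "application/json"},
        data=json.dumps({"message": message}).encode(),
    )

req = build_chat_request("my-config", "my-tenant-key", "Hello")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```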
The honest comparison
| Dimension | Raw LLM API | Nimblesite |
|---|---|---|
| State | You build it | Built in |
| Agent loop | You run it | We run it |
| Tool dispatch | You parse & loop | We parse, you execute |
| Multi-tenancy | You build it | Built in |
| Prompt templating | You build it | Built in |
| Model switching | Rewrite your code | Edit one JSON field |
| Provider churn | Your problem | Our problem |
| SDK maintenance | Forever | None |
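On the model-switching row: the idea is that the model lives in the agent's config rather than in your code. The field names below are hypothetical, for illustration only, not Nimblesite's documented schema:

```json
{
  "model": "claude-sonnet-4-6",
  "system_prompt": "You are a support agent for {{tenant_name}}."
}
```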
When a raw LLM API is a better fit
- You're making one-shot completion calls with no memory and no tools
- You need absolute control over the exact bytes on the wire to the provider
- You're implementing novel agent architectures that don't fit a standard loop
When Nimblesite is a better fit
- Everything else.