Pay for a service.
Get an agent.
One JSON config in. One HTTP endpoint out. Memory, tools, and every major model included. Stop wiring up message tables and prompt templates — slot an agent into your app and run the tool calls it gives you.
{
  "name": "Site Editor",
  "model": "claude-sonnet-4-6",
  "tools": ["read_file", "write_file"]
}
The boring half of every AI feature, already done.
Stop reinventing message tables and the agent loop. Focus on your product while we handle memory, multi-tenancy, and tool dispatch.
Stateful, by default
Pass a conversation_id and we already know what happened. No messages table. No replay loop. No token-budget management. Memory just works.
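A minimal sketch of what that looks like on the wire, with two turns in one conversation. The field names (`conversation_id`, `message`) are illustrative assumptions, not the documented contract:

```python
# Two turns in one conversation. Field names are illustrative --
# check the API reference for the real request shape.
first_turn = {
    "conversation_id": "conv_123",
    "message": "Open index.html",
}
follow_up = {
    "conversation_id": "conv_123",  # same id => service loads prior turns
    "message": "Now fix the page title",
}
# Note what is absent from the follow-up: no messages array, no
# replayed history, no token accounting. Context is rebuilt server-side.
```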
Two ways to run tools
Client-side tools — agent emits tool calls, your app executes them in your own trust boundary. Or flip one config flag and we run a managed sandbox (filesystem, shell, build pipeline) per agent.
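In client-side mode, the dispatch table lives entirely in your app. A sketch of what that handler can look like, assuming a tool-call shape of `{"tool": ..., "args": ...}` (the real wire format may differ):

```python
# Client-side tool dispatch sketch. The tool-call shape here is an
# assumption for illustration; only your process touches the filesystem.
from pathlib import Path

TOOLS = {
    "read_file": lambda args: Path(args["path"]).read_text(),
    "write_file": lambda args: Path(args["path"]).write_text(args["content"]),
}

def dispatch(tool_call):
    """Run one agent-emitted tool call inside your own trust boundary."""
    handler = TOOLS[tool_call["tool"]]
    return handler(tool_call["args"])
```

Because the table is plain code in your process, adding a tool is adding a dict entry, and nothing secret ever leaves your boundary.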
Every model, one API
Claude, GPT, Gemini, Ollama, DeepSeek — same JSON config, same endpoint. New frontier model drops next week? Change one line. Your code never moves.
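The one-line change, shown against the config above (the replacement model identifier is illustrative; use whichever provider's id you need):

```json
{
  "name": "Site Editor",
  "model": "gpt-4o",
  "tools": ["read_file", "write_file"]
}
```

Everything else in the config, and every call site in your app, stays identical.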
Multi-tenant from day one
Every config, every conversation, every log scoped by tenant. Per-tenant API keys. Hard isolation at the database layer. You ship; we keep your customers apart.
Tools stay in your trust boundary
The agent decides what to call. Your app actually runs it. We never touch your data, your APIs, or your secrets. Auditors love this.
Two HTTP calls, total
POST a config once. POST a chat message forever. That's the entire integration. No SDKs to learn, no framework to import, no Python in your critical path.
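The whole integration, sketched as the two request bodies. The endpoint paths and field names below are assumptions for illustration, not documented routes:

```python
import json

# Hypothetical endpoints -- real paths and field names may differ.
CREATE_AGENT_URL = "https://api.nimblesite.dev/v1/agents"  # POST a config once
CHAT_URL = "https://api.nimblesite.dev/v1/chat"            # POST messages forever

agent_config = {
    "name": "Site Editor",
    "model": "claude-sonnet-4-6",
    "tools": ["read_file", "write_file"],
}

chat_body = {
    "agent": "Site Editor",
    "conversation_id": "conv_123",
    "message": "Update the homepage title.",
}

# Any HTTP client works, e.g.:
#   requests.post(CHAT_URL, json=chat_body, headers={"Authorization": ...})
print(json.dumps(agent_config))
print(json.dumps(chat_body))
```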
Three layers, three jobs.
Nimblesite sits between raw LLM APIs and DIY frameworks to provide a production-ready agent execution layer.
Two execution modes. One API.
Pick per agent how the tools get executed. Memory, model picker, multi-tenancy, and chat contract stay identical.
Questions you're already asking.
I can already call OpenAI directly. Why do I need this?

Because raw LLM APIs are stateless. Every call is a fresh slate. To turn that into a real assistant, you have to store every message, replay the history on every call, parse tool calls, dispatch them, and loop until the model is done — per tenant, per conversation. That whole stack is the agent loop. We run it for you.
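For concreteness, here is a stubbed sketch of that loop, the code you would otherwise maintain yourself. `call_model` and `run_tool` are stand-ins (a real loop would hit an LLM API and your actual tools):

```python
# The DIY agent loop the service replaces, with stubbed model and tools.

def call_model(history):
    """Stub: emits a tool call first, then a final answer."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool_call": {"tool": "read_file", "args": {"path": "notes.txt"}}}
    return {"content": "Done."}

def run_tool(call):
    """Stub dispatcher: a real app would execute the named tool."""
    return f"<contents of {call['args']['path']}>"

def agent_loop(history):
    # Replay the full history on every call, dispatch any tool call,
    # and loop until the model returns plain content.
    while True:
        reply = call_model(history)
        if "tool_call" not in reply:
            history.append({"role": "assistant", "content": reply["content"]})
            return reply["content"]
        history.append({"role": "tool", "content": run_tool(reply["tool_call"])})

answer = agent_loop([{"role": "user", "content": "Summarize notes.txt"}])
```

Multiply this by per-tenant storage, token budgets, and provider quirks, and it becomes the bulk of the feature.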
How do you handle conversation history?
Persisted in PostgreSQL, scoped per conversation and per tenant. You pass a conversation_id on follow-up chat calls; we load the prior turns, hand them to the model, store the new turns, and return the response. You never write a messages table again.
Why don't you execute tools?
Because tools touch your data, your APIs, and your secrets. We don't want any of those, and you don't want us to have them. The agent decides what to call; you run it in your own trust boundary. This is the correct architecture.
Can I self-host?
No. Nimblesite is a proprietary hosted API — the only way to use it is to call api.nimblesite.dev with your API key. Enterprise customers can discuss dedicated single-tenant deployments; contact sales.
What models are supported?
Anthropic, OpenAI, Google Gemini, Ollama, DeepSeek, and anything PydanticAI supports. Switching providers is a JSON edit and the conversation memory carries across.