Agents as a Service

Pay for a service.
Get an agent.

One JSON config in. One HTTP endpoint out. Memory, tools, and every major model included. Stop wiring up message tables and prompt templates — slot an agent into your app and run the tool calls it gives you.

agent.config.json
POST /api/v1/configs
{
  "name": "Site Editor",
  "model": "claude-sonnet-4-6",
  "tools": ["read_file", "write_file"]
}

The boring half of every AI feature, already done.

Stop reinventing message tables and the agent loop. Focus on your product while we handle memory, multi-tenancy, and tool dispatch.

Stateful, by default

Pass a conversation_id and we already know what happened. No messages table. No replay loop. No token-budget management. Memory just works.
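
In practice the follow-up turn is just one more POST body. A minimal sketch, assuming a chat endpoint that accepts message and conversation_id fields (the only endpoint documented above is POST /api/v1/configs, so these field names are an assumption):

```python
def chat_turn(message, conversation_id=None):
    """Build the body for a chat call. Pass conversation_id to continue
    an existing conversation; omit it to start a fresh one.
    Field names are illustrative, not a documented schema."""
    body = {"message": message}
    if conversation_id is not None:
        body["conversation_id"] = conversation_id
    return body

# First turn: the service creates a conversation and returns its id.
first = chat_turn("Rewrite the homepage hero copy")
# Follow-up: same id back in, prior turns are loaded server-side.
followup = chat_turn("Now shorten it", conversation_id="conv_123")
```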

Two ways to run tools

Client-side tools — agent emits tool calls, your app executes them in your own trust boundary. Or flip one config flag and we run a managed sandbox (filesystem, shell, build pipeline) per agent.

Every model, one API

Claude, GPT, Gemini, Ollama, DeepSeek — same JSON config, same endpoint. New frontier model drops next week? Change one line. Your code never moves.
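
For example, moving the hero config above from Claude to GPT is one changed line ("gpt-4o" here is just a stand-in for whichever provider model id you pick):

```json
{
  "name": "Site Editor",
  "model": "gpt-4o",
  "tools": ["read_file", "write_file"]
}
```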

Multi-tenant from day one

Every config, every conversation, every log scoped by tenant. Per-tenant API keys. Hard isolation at the database layer. You ship; we keep your customers apart.

Tools stay in your trust boundary

The agent decides what to call. Your app actually runs it. We never touch your data, your APIs, or your secrets. Auditors love this.

Two HTTP calls, total

POST a config once. POST a chat message forever. That's the entire integration. No SDKs to learn, no framework to import, no Python in your critical path.

Three layers, three jobs.

Nimblesite sits between raw LLM APIs and DIY frameworks to provide a production-ready agent execution layer.

Capability           | LLM APIs      | Frameworks            | Nimblesite
Conversation memory  | Ephemeral     | Manual (Redis/Vector) | Built-in
Tool dispatch        | Raw JSON only | Boilerplate heavy     | Declarative
Multi-tenancy        | Stateless     | Build it yourself     | Per-tenant keys
Implementation       | 100+ lines    | 500+ lines            | 2 HTTP calls

Two execution modes. One API.

Choose per agent how tools are executed. Memory, model picker, multi-tenancy, and the chat contract stay identical.

Default

Client-side tools

The agent decides what to call. Your app runs it. Same pattern as OpenAI function calling and Anthropic tool use — but persisted, multi-tenant, and stateful out of the box.

  • Tools run in your existing backend, mobile app, or CLI.
  • Your data, your APIs, your secrets — never touched by us.
  • No SDK. A tool is just a function in your app.
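
A minimal sketch of the client-side pattern, using an in-memory stand-in for real storage. The tool-call shape ({"tool": ..., "args": ...}) is an assumption for illustration, not Nimblesite's documented schema:

```python
FILES = {"index.html": "<h1>Hello</h1>"}  # stand-in for your real storage

def read_file(path):
    return FILES[path]

def write_file(path, content):
    FILES[path] = content
    return "ok"

# A tool is just a local function; the registry maps names to callables.
TOOLS = {"read_file": read_file, "write_file": write_file}

def dispatch(tool_calls):
    """Run each tool call the agent emitted, inside your own trust
    boundary, and collect the results to send back."""
    return [{"tool": c["tool"], "result": TOOLS[c["tool"]](**c["args"])}
            for c in tool_calls]

# Example: the agent asked to read a file, then rewrite it.
out = dispatch([
    {"tool": "read_file", "args": {"path": "index.html"}},
    {"tool": "write_file", "args": {"path": "index.html",
                                    "content": "<h1>Hi</h1>"}},
])
```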
New

Sandboxed agents

We provision a managed sandbox per agent — filesystem, shell, network, build pipeline. Same idea as Code Interpreter or E2B, with conversation memory and the model picker wired in.

  • Durable per-config workspace, persists across conversations.
  • Out-of-band SSH / web-terminal access for your users.
  • One HTTP turn per reply — no client-side tool loop to render.
Compare execution modes →

Architectural integrity

Built for engineers who want a stable HTTP contract, not a framework that breaks every time the model SDK does.

2 HTTP calls, total

Hard tenant isolation

Every row has a tenant_id. Every query filters by it. API keys are hashed at rest.

Every model, one API.
Anthropic Claude · OpenAI GPT · Google Gemini · Ollama · DeepSeek

Questions you're already asking.

I can already call OpenAI directly. Why do I need this?

Because raw LLM APIs are stateless. Every call is a fresh slate. To turn that into a real assistant, you have to store every message, replay the history on every call, parse tool calls, dispatch them, and loop until the model is done — per tenant, per conversation. That whole stack is the agent loop. We run it for you.
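
The loop described above, sketched with stubs standing in for a real LLM API and your tool executor:

```python
def diy_agent_loop(history, user_message, call_model, run_tool):
    """The loop you'd otherwise own: append the user turn, replay the
    full history to the model, execute any tool call it emits, and
    repeat until the model returns plain text."""
    history.append({"role": "user", "content": user_message})
    while True:
        reply = call_model(history)      # full replay, every call
        history.append(reply)
        if "tool_call" not in reply:
            return reply["content"]      # model is done
        result = run_tool(reply["tool_call"])
        history.append({"role": "tool", "content": result})

# Stub model: asks for a tool once, then answers.
def fake_model(history):
    if any(m.get("role") == "tool" for m in history):
        return {"role": "assistant", "content": "done"}
    return {"role": "assistant", "content": "", "tool_call": "read_file"}

history = []
answer = diy_agent_loop(history, "summarize index.html",
                        fake_model, lambda tool: "<h1>Hello</h1>")
```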

How do you handle conversation history?

Persisted in PostgreSQL, scoped per conversation and per tenant. You pass a conversation_id on follow-up chat calls; we load the prior turns, hand them to the model, store the new turns, and return the response. You never write a messages table again.

Why don't you execute tools?

Because tools touch your data, your APIs, and your secrets. We don't want any of those, and you don't want us to have them. The agent decides what to call; you run it in your own trust boundary. This is the correct architecture.

Can I self-host?

No. Nimblesite is a proprietary hosted API — the only way to use it is to call api.nimblesite.dev with your API key. Enterprise customers can discuss dedicated single-tenant deployments; contact sales.

What models are supported?

Anthropic, OpenAI, Google Gemini, Ollama, DeepSeek, and anything PydanticAI supports. Switching providers is a JSON edit and the conversation memory carries across.

Ship the agent, not the plumbing.

Two HTTP calls and you have a working agent in your product. Free local prototype, no credit card required.