Models

Every major model behind one API. Switch providers with a JSON edit.

Nimblesite sits on top of PydanticAI, so it supports every model provider PydanticAI does. You select a model in your agent config, and the platform handles the rest.

Switching providers

This is the whole story:

{
  "model_config": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-6"
  }
}

Change it to:

{
  "model_config": {
    "provider": "openai",
    "model": "gpt-4o"
  }
}

Your app code doesn't change. Your tool contract doesn't change. The agent's memory persists. The integration is identical.
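
Concretely, the request your app sends never names a provider. As a sketch (the field names here are illustrative, not the platform's actual API), a chat request is the same JSON before and after the switch:

{
  "agent_id": "support-bot",
  "messages": [
    { "role": "user", "content": "Where is my order?" }
  ]
}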

Supported providers

Provider     Example models                                          Billing
anthropic    claude-sonnet-4-6, claude-opus-4-7, claude-haiku-4-5    Wallet, 2× vendor rate
openai       gpt-5, gpt-5-mini, gpt-4o                               Wallet, 2× vendor rate
google       gemini-2.5-pro, gemini-2.5-flash                        Wallet, 2× vendor rate
deepseek     deepseek-chat, deepseek-reasoner                        Wallet, 2× vendor rate
ollama       Any model running on an Ollama instance you operate     Free (you run the host)

New providers land as PydanticAI adds them. If you need one we don't list, open an issue.

Inference parameters

The model_config block accepts standard inference parameters:

{
  "model_config": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-6",
    "temperature": 0.7,
    "max_tokens": 4096,
    "top_p": 0.9
  }
}

Unknown fields are passed through to the underlying provider. If a provider supports something exotic, you can use it.
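
For example, Anthropic models accept a top_k sampling parameter that isn't in the standard set above. Because unknown fields pass through, it can sit alongside them (top_k is Anthropic's parameter, not one the platform defines):

{
  "model_config": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-6",
    "temperature": 0.7,
    "top_k": 40
  }
}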

How inference is billed

Nimblesite holds the provider keys and runs inference for you, then deducts the cost from your prepaid wallet at roughly 2× the underlying vendor rate. You never manage ANTHROPIC_API_KEY / OPENAI_API_KEY / GEMINI_API_KEY yourself, you never see a separate bill from those providers, and you never have to update keys when you switch models. One wallet, every model.
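
A rough worked example, with illustrative numbers rather than quoted rates: if a vendor charges $3 per million input tokens, the same million tokens deduct about $6 from your wallet.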

What the 2× pays for:

  • The chat → tool-call → result loop, persisted per tenant.
  • Conversation memory in your tenant's Postgres rows — not a third-party vector store.
  • Multi-tenant isolation at the database layer.
  • Sandbox provisioning and lifecycle for sandboxed agents.
  • Ongoing tracking of new providers, model SKUs, and pricing changes.

Per-token rates and per-second container rates are listed on the pricing page. Partner accounts may negotiate a lower markup.

Local development with Ollama

For zero-cost local development, point provider: "ollama" at a local Ollama instance:

{
  "model_config": {
    "provider": "ollama",
    "model": "llama3.2"
  }
}

Set OLLAMA_HOST=http://localhost:11434 in your environment and the platform talks to the local instance. Great for CI, offline work, and testing without burning tokens.
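
If your Ollama instance runs somewhere other than localhost (a shared dev box, a CI runner), point the same variable at it; the hostname below is illustrative:

OLLAMA_HOST=http://ollama.internal:11434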

Model churn is our job

The model landscape moves every week. A new frontier model, a new SDK, a new pricing tier, a new tool-call schema. We track that so you don't have to. When a new model lands, it becomes available in Nimblesite as a one-line config change — no refactor of your app, no breaking API, no migration.

That is a big part of what you're paying for.