AI overview

Flow runs AI models on-device with no network calls for local inference. An ai node gets a model one of two ways:

Local LLMs are served by the managed llama-server sidecar. You browse and download them from the Model Hub, and they run through the OpenAI-compatible local provider. See the local runtime.
Cloud providers are reached through the node’s provider field. You have to opt in, and they are gated. See cloud providers.

Capability-driven execution

An ai node binds any model from the Model Hub by modelId, and that model’s capabilities drive both the inspector options and how the node executes:

Reasoning. A thinking toggle passes a per-request reasoning flag. The reasoning traces are stripped so that only the answer is surfaced.
Vision. An image input, given as a path or a URL, is sent as a multimodal content part. Local paths are read and inlined.
Tool use. You bind sandboxed adapters (fs / shell / cli / utility) as tools. The model calls them in a bounded in-node loop. Each call runs through the real, workspace-confined adapter, and results are fed back until the model answers. This is distinct from agentic whole-flow generation.
Embeddings. This drops the sampling knobs and routes to an embeddings call, returning a vector.
Classification. You add a label set, and the node constrains the model to one of those labels. It emits a branchable label (when {{nodeId.label}} == '...').
Structured output. A structured task takes the node’s output schema and prompts for matching JSON. It parses the JSON leniently and spreads the fields into the node output so that each one is branchable.
Otherwise it is a chat or completion call that uses a system prompt, user input, and sampling settings.

A node-level task field (generate / embedding / classify / structured) selects the execution path.

The common contract

Every AI node runs through the same orchestration discipline:

Stateless. Retry state is held by the orchestration engine, not the model. On each retry the orchestrator passes the original input, the model’s previous suggestion, the outcome of applying it, and a retry counter.
PII-sanitized input. The sanitizer redacts dataset names, hostnames, IPs, and credentials before any prompt reaches any model, whether local or cloud.
Local stays local. The managed server listens on localhost only. The only thing that leaves the machine is an ai node with an opt-in cloud provider.
Advisory, never authoritative. A failed AI node never blocks an otherwise valid execution graph. See the execution model.

AI governance

An ai node can opt into a contract (contract: true). The model must then return a structured envelope, which is a primary output plus a confidence score. The engine, never the model, routes that output by the contract’s thresholds. Above autoApproveAbove the output flows on. Inside the review band it pauses at a human review gate. Below suppressBelow it is suppressed onto the node’s .fail fallback. The thresholds live on the node, never in the prompt, and a contract-bound node must declare a .fail edge. Pre-run validation blocks the node otherwise. Every invocation, routing decision, and human verdict lands in the run’s AI decision audit trail.

Governance is enforced the same way on every edition, which means Studio, the CLI/TUI, Flow Code, and the Server. The merged governance verdict is surfaced before a run. That verdict covers contract and model compliance along with the static pre-apply warnings.

Around the contract sit the rest of the controls:

Agent-feature gating. The in-node tool loop and the autonomous run are both opt-ins. They are turned on by org settings (allow_agent_tool_loop, allow_autonomous_run) together with a per-node allowToolLoop. The model can never grant itself tools.
Input security. Untrusted input is fenced in a structural boundary and scanned for prompt-injection patterns. A high-severity signal forces the contract’s human review gate.
Token-level confidence. With confidenceType: "token_level" the engine derives confidence from the provider’s token logprobs instead of trusting the model’s self-report.
Context-window strategy. Oversized input is bounded before it reaches the model, using contextWindowStrategy and maxInputTokens.
Contract-version pinning. A flow declares a target contract-version, and governance flags any node whose major version differs.
Extensible PII rules. Admins layer org-specific redaction patterns over the built-in sanitizer. See credentials and PII.
Reasoning-domain isolation. The managed model server runs under an OS-level sandbox, so loopback-only is enforced by the OS and not just by configuration.

Failover

An ai node can carry a fallbackProvider, along with an optional fallback model. When the primary provider hard-fails, the node retries once on the fallback, and a log line marks the switch.

Agentic mode

An ai node, whether local or cloud, can run in agentic mode. On Run it turns a natural-language request into a flow graph. That graph is reviewed before it merges onto the canvas, and then the node flips into a monitor role. See agentic and monitor.

Model Hub - browse, download, load.
Local runtime - the managed server.
Cloud providers - the opt-in carve-out.