AI overview
Flow runs AI models on-device with no network calls for local
inference. An ai node gets a model one of two ways:
- Local LLMs are served by the managed
llama-serversidecar. You browse and download them from the Model Hub, and they run through the OpenAI-compatiblelocalprovider. See the local runtime. - Cloud providers are reached through the node’s
providerfield. You have to opt in, and they are gated. See cloud providers.
Capability-driven execution
Section titled “Capability-driven execution”An ai node binds any model from the Model Hub by modelId, and that model’s
capabilities drive both the inspector options and how the node executes:
- Reasoning. A thinking toggle passes a per-request reasoning flag. The reasoning traces are stripped so that only the answer is surfaced.
- Vision. An image input, given as a path or a URL, is sent as a multimodal content part. Local paths are read and inlined.
- Tool use. You bind sandboxed adapters (
fs/shell/cli/utility) as tools. The model calls them in a bounded in-node loop. Each call runs through the real, workspace-confined adapter, and results are fed back until the model answers. This is distinct from agentic whole-flow generation. - Embeddings. This drops the sampling knobs and routes to an embeddings call, returning a vector.
- Classification. You add a label set, and the node constrains the model to
one of those labels. It emits a branchable
label(when {{nodeId.label}} == '...'). - Structured output. A structured task takes the node’s output schema and prompts for matching JSON. It parses the JSON leniently and spreads the fields into the node output so that each one is branchable.
- Otherwise it is a chat or completion call that uses a system prompt, user input, and sampling settings.
A node-level task field (generate / embedding / classify /
structured) selects the execution path.
The common contract
Section titled “The common contract”Every AI node runs through the same orchestration discipline:
- Stateless. Retry state is held by the orchestration engine, not the model. On each retry the orchestrator passes the original input, the model’s previous suggestion, the outcome of applying it, and a retry counter.
- PII-sanitized input. The sanitizer redacts dataset names, hostnames, IPs, and credentials before any prompt reaches any model, whether local or cloud.
- Local stays local. The managed server listens on localhost only. The only
thing that leaves the machine is an
ainode with an opt-in cloud provider. - Advisory, never authoritative. A failed AI node never blocks an otherwise valid execution graph. See the execution model.
AI governance
Section titled “AI governance”An ai node can opt into a contract (contract: true). The model must then
return a structured envelope, which is a primary output plus a confidence score.
The engine, never the model, routes that output by the contract’s
thresholds. Above autoApproveAbove the output flows on. Inside the review band
it pauses at a human review gate. Below suppressBelow it is suppressed onto the
node’s .fail fallback. The thresholds live on the node, never in the prompt,
and a contract-bound node must declare a .fail edge. Pre-run validation blocks
the node otherwise. Every invocation, routing decision, and human verdict lands
in the run’s AI decision audit trail.
Governance is enforced the same way on every edition, which means Studio, the CLI/TUI, Flow Code, and the Server. The merged governance verdict is surfaced before a run. That verdict covers contract and model compliance along with the static pre-apply warnings.
Around the contract sit the rest of the controls:
- Agent-feature gating. The in-node tool loop and the autonomous run are
both opt-ins. They are turned on by org settings
(
allow_agent_tool_loop,allow_autonomous_run) together with a per-nodeallowToolLoop. The model can never grant itself tools. - Input security. Untrusted input is fenced in a structural boundary and scanned for prompt-injection patterns. A high-severity signal forces the contract’s human review gate.
- Token-level confidence. With
confidenceType: "token_level"the engine derives confidence from the provider’s token logprobs instead of trusting the model’s self-report. - Context-window strategy. Oversized input is bounded before it reaches the
model, using
contextWindowStrategyandmaxInputTokens. - Contract-version pinning. A flow declares a target
contract-version, and governance flags any node whose major version differs. - Extensible PII rules. Admins layer org-specific redaction patterns over the built-in sanitizer. See credentials and PII.
- Reasoning-domain isolation. The managed model server runs under an OS-level sandbox, so loopback-only is enforced by the OS and not just by configuration.
Failover
Section titled “Failover”An ai node can carry a fallbackProvider, along with an optional fallback
model. When the primary provider hard-fails, the node retries once on the
fallback, and a log line marks the switch.
Agentic mode
Section titled “Agentic mode”An ai node, whether local or cloud, can run in agentic mode. On Run it
turns a natural-language request into a flow graph. That graph is reviewed
before it merges onto the canvas, and then the node flips into a monitor role.
See agentic and monitor.
Related
Section titled “Related”- Model Hub - browse, download, load.
- Local runtime - the managed server.
- Cloud providers - the opt-in carve-out.