Execution model

How the Flow orchestration engine schedules graph execution, manages model lifetime, handles context, and degrades when AI inference fails.

Scheduling

By default the engine runs one node at a time. It topologically sorts the graph, then visits each reachable node in that order, and each node completes before the next one starts. Conditional edges still decide which nodes are reachable. This sequential pass runs unless you opt into the concurrency settings below.
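The sequential pass can be sketched in a few lines of Python. This is a minimal illustration, not the engine's implementation: `run_sequential`, the `predecessors` mapping, and the handler callables are hypothetical names, and reachability through conditional edges is left out.

```python
from graphlib import TopologicalSorter

def run_sequential(predecessors, handlers):
    """Run every node once, in topological order, one at a time.

    predecessors: node name -> set of upstream node names (hypothetical shape).
    handlers: node name -> zero-argument callable that does the node's work.
    """
    finished = []
    for node in TopologicalSorter(predecessors).static_order():
        handlers[node]()        # returns before the next node starts
        finished.append(node)
    return finished

# Hypothetical three-node chain: submit -> download -> notify
predecessors = {"submit": set(), "download": {"submit"}, "notify": {"download"}}
handlers = {name: (lambda: None) for name in predecessors}
order = run_sequential(predecessors, handlers)
```

Because each handler call returns before the loop advances, no two nodes overlap in this sketch, which is the property the default mode relies on.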

Sub-flows are scheduled as composite units. Each sub-flow collapses to a single virtual node in the outer topological sort. When the scheduler reaches it, the inner sub-graph runs as one unit. If any member node fails, the whole unit re-runs, up to its retry count, before the failure propagates. The unit’s exit status then gates its outgoing edges.
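A rough sketch of the composite-unit retry behavior follows. It rests on two assumptions that the text above does not settle: the retry count is read as the number of re-runs allowed after the first attempt, and an attempt stops at the first failing member. `run_subflow` and its arguments are hypothetical names.

```python
def run_subflow(members, retries):
    """Run the members of a sub-flow as one unit, re-running the whole unit on failure.

    members: list of zero-argument callables that return True on success.
    retries: re-runs allowed after the first attempt (assumed meaning).
    Returns (exit_status, attempts_used).
    """
    attempts = 0
    for _ in range(retries + 1):
        attempts += 1
        # all() short-circuits, so the attempt ends at the first failing member
        if all(member() for member in members):
            return "succeeded", attempts
    return "failed", attempts

# Hypothetical member that fails once, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    return calls["n"] >= 2

status, attempts = run_subflow([lambda: True, flaky], retries=2)
```

Only the final exit status leaves the function, which mirrors how the unit's status, rather than any single member's, gates the outgoing edges.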

You can opt into concurrency by raising max_parallel_nodes above its default of 1. When you do, independent branches of an acyclic flow run at the same time: a node becomes ready only once every one of its predecessors has reached a terminal status, so it still sees all of its upstream outputs exactly as it would in the sequential pass. Pausing holds back new work, and cancelling stops new dispatch while letting in-flight nodes finish, so a node is never aborted partway through a side effect.
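The readiness rule can be illustrated as follows. This is a sketch under assumptions: the `pending` and `running` status names and the `ready_nodes` helper are hypothetical, and only the terminal statuses (`succeeded`, `failed`, `skipped`) and the `max_parallel_nodes` setting come from this page.

```python
TERMINAL = {"succeeded", "failed", "skipped"}

def ready_nodes(predecessors, status, max_parallel_nodes):
    """Return the pending nodes that may be dispatched right now.

    A node is ready only when every predecessor has a terminal status,
    and dispatch never exceeds the free share of the parallel budget.
    """
    running = sum(1 for s in status.values() if s == "running")
    free = max(0, max_parallel_nodes - running)
    ready = [
        node
        for node, preds in predecessors.items()
        if status.get(node) == "pending"
        and all(status.get(p) in TERMINAL for p in preds)
    ]
    return ready[:free]

# Hypothetical diamond: a -> b, a -> c, then b and c -> d
predecessors = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
status = {"a": "succeeded", "b": "pending", "c": "pending", "d": "pending"}
parallel = ready_nodes(predecessors, status, max_parallel_nodes=2)
serial = ready_nodes(predecessors, status, max_parallel_nodes=1)
```

With a budget of 2, both independent branches are dispatched together; with the default of 1, the same rule yields one node at a time. Node `d` stays back in both cases until `b` and `c` are terminal.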

Concurrency is bounded on two further axes. Per-node-type limits (node_type_limits) cap how many nodes of a given kind run at once within the overall parallel budget, so you can, for example, allow several nodes to run together while still bounding how many shell calls happen at the same time. Local AI inference runs through a bounded queue (ai_inference_concurrency): when that queue stays saturated past ai_inference_queue_timeout_ms, an AI node takes its .fail fallback path rather than stalling the rest of the flow. The idea is that AI assistance degrades before graph execution does.
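One way to picture the bounded inference queue is a semaphore with a timed acquire. The sketch below is an assumption about the mechanism, not the engine's code: `InferenceQueue`, `infer`, and `fallback` are hypothetical names, and only the two setting names come from this page.

```python
import threading

class InferenceQueue:
    """Bounded inference slots; a caller that waits too long takes its fallback."""

    def __init__(self, ai_inference_concurrency, ai_inference_queue_timeout_ms):
        self._slots = threading.BoundedSemaphore(ai_inference_concurrency)
        self._timeout_s = ai_inference_queue_timeout_ms / 1000

    def run(self, infer, fallback):
        # Wait for a free slot, but only up to the configured timeout
        if not self._slots.acquire(timeout=self._timeout_s):
            return fallback()   # queue stayed saturated: degrade, do not stall
        try:
            return infer()
        finally:
            self._slots.release()

queue = InferenceQueue(ai_inference_concurrency=1, ai_inference_queue_timeout_ms=10)
result = queue.run(lambda: "annotation", lambda: "fallback")
```

The timed acquire is what keeps a saturated queue from blocking the caller indefinitely: the AI node gives up its annotation, and the rest of the flow keeps moving.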

Flows that contain a confirmation gate or an AI-review gate still run sequentially even when concurrency is enabled, because the review verdict is shared across the run.

Edge outcomes and reachability

Edges carry an outcome field that gates whether the target node runs. The field can be pass, fail, or always.

A node runs if and only if at least one of these is true:

It has no incoming edges (a source / entry node).
At least one incoming edge fires given the source node’s terminal status:

Edge outcome	Fires when source is…
`pass`	`succeeded`
`fail`	`failed` or `skipped`
`always`	any terminal status (`always` is the default outcome)

If no incoming edge fires, the node is automatically skipped. There is no adapter call, no AI invocation, and no side effect.

[Submit JCL] --pass--> [Download Spool] --pass--> [Notify Success]
            \--fail--> [Notify Failure]

If Submit JCL succeeds, only the top branch runs. If it fails, only Notify Failure runs, and Download Spool and Notify Success are skipped with a clear reason. The canvas shows outcomes as two source handles on each node, a green pass handle and a red fail handle. Each edge carries a colored pill at its midpoint.
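The firing rules and the Submit JCL example can be checked with a small sketch. The `should_run` helper and the tuple representation of edges are hypothetical; the outcome names and terminal statuses are the ones described in this section.

```python
# Which terminal source statuses make an edge with a given outcome fire
FIRES_ON = {
    "pass": {"succeeded"},
    "fail": {"failed", "skipped"},
    "always": {"succeeded", "failed", "skipped"},
}

def should_run(incoming, status):
    """Decide whether a node runs.

    incoming: list of (source_node, edge_outcome) pairs.
    status: source_node -> terminal status.
    """
    if not incoming:            # entry node: no incoming edges
        return True
    return any(status[src] in FIRES_ON[outcome] for src, outcome in incoming)

# Submit JCL has failed, so Download Spool is skipped in turn
status = {"Submit JCL": "failed", "Download Spool": "skipped"}
download = should_run([("Submit JCL", "pass")], status)
notify_failure = should_run([("Submit JCL", "fail")], status)
notify_success = should_run([("Download Spool", "pass")], status)
```

Here `download` and `notify_success` are false and `notify_failure` is true, matching the failure case in the diagram.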

Model lifecycle

LLMs are downloaded from the Model Hub and loaded on demand. Loading a model starts the managed llama-server against it. With the default max_loaded_models of 1, one model is served at a time: loading a new model stops the previous server, while an already-loaded model is reused instantly.

Raising max_loaded_models keeps several models resident at once, each served by its own llama-server on its own loopback port, and each AI node is routed to the server that hosts its model. When you load beyond that limit, or beyond a memory budget set as a fraction of total RAM, the least-recently-used model is evicted. A model can also be warm-loaded into the registry ahead of time without becoming the active model, so a later run that targets it pays no cold-start cost. Each resident model gets its own inference slots, so two resident models do not contend for one shared pool. The resident set is managed from the Model Hub, the desktop app, the server and Flow Code editions, and the model picker in Flow Code for VS Code.
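The least-recently-used eviction described above can be sketched with an ordered mapping. This is an illustration under assumptions: `ResidentModels`, its `load` method, and the port numbers are hypothetical, and the memory budget and server process management are omitted.

```python
from collections import OrderedDict

class ResidentModels:
    """Track resident models and evict the least recently used one when full."""

    def __init__(self, max_loaded_models=1):
        self.max_loaded_models = max_loaded_models
        self._models = OrderedDict()    # model name -> loopback port, oldest first

    def load(self, name, port):
        """Make a model resident; return the names of any models evicted."""
        if name in self._models:
            self._models.move_to_end(name)   # already loaded: reuse, mark as recent
            return []
        evicted = []
        while len(self._models) >= self.max_loaded_models:
            oldest, _ = self._models.popitem(last=False)
            evicted.append(oldest)
        self._models[name] = port
        return evicted

    def resident(self):
        return list(self._models)

registry = ResidentModels(max_loaded_models=2)
registry.load("model-a", 8081)
registry.load("model-b", 8082)
registry.load("model-a", 8081)            # touch model-a so model-b is now oldest
evicted = registry.load("model-c", 8083)
```

With the default limit of 1, the same logic reduces to the single-model behavior: every new load evicts the previous model.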

Context handling

Local inference runs against the loaded LLM, and the model’s own context window bounds the prompt. Before any text reaches a model, whether local or cloud, the PII Sanitizer redacts datasets, hostnames, credentials, and IPs. See Credentials and PII.

Scheduled execution

Saved flows can run on a recurring timer. See Scheduling for cadences, catch-up policies, and management surfaces. Scheduled runs execute on the same path as manual runs, so history and audit are the same for both.

Runtime failure handling

Inference failures do not stop the surrounding graph.

Failure	Behavior
Model file corrupted or missing	Clear error; prompts re-download from the Model Hub
Inference crashes	AI node marked failed; the graph continues with the step skipped or escalated, per the node’s fallback policy
Confidence below the low threshold	Deterministic fallback path: raw execution output is surfaced; the AI annotation is skipped
Timeout	Same as low confidence: the deterministic fallback path is taken

The principle: AI is advisory. A failed AI node never blocks an otherwise valid execution graph.

A contract-bound ai node adds engine-enforced governance on top of this. The model returns a confidence score, and the engine routes the output by the contract’s thresholds. The model never does this routing itself. Output above the high threshold is auto-approved. Output inside the review band goes to a human review gate. Output below the low threshold is suppressed onto the node’s .fail fallback. In headless runs on the CLI or in CI, review-band outputs fail onto the fallback rather than being auto-approved. See AI governance.
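The threshold routing can be summarized as a small function. This is a sketch, not the engine's code: the `route` function, its return labels, and the example threshold values are hypothetical, and the handling of a score that lands exactly on a threshold (treated here as inside the review band) is an assumption.

```python
def route(confidence, low, high, headless=False):
    """Route an AI node's output by the contract's confidence thresholds."""
    if confidence > high:
        return "auto_approve"
    if confidence < low:
        return "fallback"           # suppressed onto the .fail fallback
    # Review band: no human is available in headless runs, so fall back
    return "fallback" if headless else "human_review"

# Hypothetical contract thresholds
LOW, HIGH = 0.4, 0.8
interactive = route(0.6, LOW, HIGH)
in_ci = route(0.6, LOW, HIGH, headless=True)
```

The function takes the score as input and owns the decision, which reflects the point that routing is done by the engine and never by the model.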