Execution model
How the Flow orchestration engine schedules graph execution, manages model lifetime, handles context, and degrades when AI inference fails.
Scheduling
By default the engine runs one node at a time. It topologically sorts the graph, then visits each reachable node in turn, and each node completes before the next one starts. Conditional edges still decide which nodes are reachable. This sequential pass runs unless you opt into the concurrency settings below.
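As a rough illustration of the sequential pass, the sketch below sorts a small graph with Python's standard `graphlib` module and runs one node at a time. The function name `run_sequentially`, the `run_node` callback, and the node names are hypothetical, and reachability checks are left out for brevity; this is not the engine's actual code.

```python
from graphlib import TopologicalSorter

def run_sequentially(predecessors, run_node):
    """Visit nodes in topological order, one at a time.

    `predecessors` maps each node to the set of nodes it depends on.
    `run_node` is a hypothetical callback that executes a single node.
    """
    order = list(TopologicalSorter(predecessors).static_order())
    results = {}
    for node in order:
        # Each node finishes before the next one starts.
        results[node] = run_node(node, results)
    return order, results

# A three-node chain: submit -> download -> notify.
graph = {"submit": set(), "download": {"submit"}, "notify": {"download"}}
order, results = run_sequentially(graph, lambda node, done: f"{node} ok")
```

Every node sees the results of all earlier nodes through `results`, which mirrors the guarantee that upstream outputs are complete before a node runs.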
Sub-flows are scheduled as composite units. Each subflow collapses to a
single virtual node in the outer topological sort. When the scheduler reaches
it, the inner sub-graph runs as one unit. If any member fails, the whole unit
re-runs up to its retry count before the failure propagates. The unit’s exit
status then gates its outgoing edges.
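The retry behaviour of a composite unit can be sketched as below. This is a simplified model under stated assumptions: `run_unit` is a hypothetical name, each member is a callable that returns True on success, and the retry count is taken to mean the number of re-runs after the first attempt.

```python
def run_unit(members, retries):
    """Run all members in order; if any member fails, re-run the whole unit.

    Assumption: `retries` counts re-runs, so the unit is attempted at most
    `retries + 1` times before the failure propagates.
    """
    for _ in range(retries + 1):
        # all() stops at the first failing member, ending this attempt.
        if all(member() for member in members):
            return "succeeded"
    return "failed"
```

The returned status is what would gate the unit's outgoing edges.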
You can opt into concurrency by raising max_parallel_nodes above its default
of 1. When you do, independent branches of an acyclic flow run at the same
time: a node becomes ready only once every one of its predecessors has reached a
terminal status, so it still sees all of its upstream outputs exactly as it
would in the sequential pass. Pausing holds back new work, and cancelling stops
new dispatch while letting in-flight nodes finish, so a node is never aborted
partway through a side effect.
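The readiness rule can be sketched as a small function. Only `max_parallel_nodes` comes from the text above; `ready_nodes`, the status strings, and the data shapes are assumptions made for illustration.

```python
# Hypothetical terminal status names.
TERMINAL = {"succeeded", "failed", "skipped"}

def ready_nodes(predecessors, status, running, max_parallel_nodes=1):
    """Return the nodes that may be dispatched now.

    A node is ready only when every predecessor has a terminal status,
    and dispatch never exceeds the parallel budget.
    """
    free_slots = max_parallel_nodes - len(running)
    if free_slots <= 0:
        return []
    ready = [
        node
        for node, preds in predecessors.items()
        if node not in status
        and node not in running
        and all(status.get(p) in TERMINAL for p in preds)
    ]
    return ready[:free_slots]

# Diamond graph: a fans out to b and c, which join at d.
diamond = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
```

With `a` finished and a budget of 2, both `b` and `c` are dispatched together, while `d` waits until both have reached a terminal status.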
Concurrency is bounded on two further axes. Per-node-type limits
(node_type_limits) cap how many nodes of a given kind run at once within the
overall parallel budget, so you can, for example, allow several nodes to run
together while still bounding how many shell calls happen at the same time.
Local AI inference runs through a bounded queue (ai_inference_concurrency):
when that queue stays saturated past ai_inference_queue_timeout_ms, an AI node
takes its .fail fallback path rather than stalling the rest of the flow. The
idea is that AI assistance degrades before graph execution does.
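One way to picture the bounded inference queue is a semaphore with a timed acquire. The sketch below reuses the setting names `ai_inference_concurrency` and `ai_inference_queue_timeout_ms` from the text; the `InferenceQueue` class and its `run` method are hypothetical and only illustrate the degrade-first behaviour.

```python
import threading

class InferenceQueue:
    """Hypothetical bounded queue for local AI inference."""

    def __init__(self, ai_inference_concurrency, ai_inference_queue_timeout_ms):
        self._slots = threading.BoundedSemaphore(ai_inference_concurrency)
        self._timeout_s = ai_inference_queue_timeout_ms / 1000

    def run(self, infer, fallback):
        # Wait for a free slot, but only up to the configured timeout.
        if not self._slots.acquire(timeout=self._timeout_s):
            # Queue stayed saturated: take the fallback path instead of stalling.
            return fallback()
        try:
            return infer()
        finally:
            self._slots.release()
```

A caller passes the inference call and the fallback as two callables, so a saturated queue costs at most one timeout before the flow moves on.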
Flows that contain a confirmation gate or an AI-review gate still run sequentially even when concurrency is enabled, because the review verdict is shared across the run.
Edge outcomes and reachability
Edges carry an outcome field that gates whether the target node runs. The
field can be pass, fail, or always.
A node runs if and only if at least one of these is true:
- It has no incoming edges (a source / entry node).
- At least one incoming edge fires given the source node’s terminal status:
| Edge outcome | Fires when source is… |
|---|---|
| pass | succeeded |
| fail | failed or skipped |
| always | any terminal status (the default) |
If no incoming edge fires, the node is automatically skipped. There is no
adapter call, no AI invocation, and no side effect.
    [Submit JCL] --pass--> [Download Spool] --pass--> [Notify Success]
                \--fail--> [Notify Failure]

If Submit JCL succeeds, only the top branch runs. If it fails, only
Notify Failure runs, and the others are skipped with a clear reason. The
canvas shows outcomes as two source handles on each node, a green pass handle
and a red fail handle. Each edge then carries a colored pill at its midpoint.
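The firing rule and the Submit JCL example can be sketched in a few lines. This is a minimal illustration, not the engine's implementation: `Edge`, `should_run`, `resolve`, and the status strings are assumed names, and `results` stands in for what each node would return if it ran.

```python
from dataclasses import dataclass

# Which source statuses make each edge outcome fire (per the table above).
FIRES_ON = {
    "pass": {"succeeded"},
    "fail": {"failed", "skipped"},
    "always": {"succeeded", "failed", "skipped"},
}

@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    outcome: str = "always"  # the documented default

def should_run(node, edges, status):
    incoming = [e for e in edges if e.target == node]
    if not incoming:
        return True  # entry node
    return any(status[e.source] in FIRES_ON[e.outcome] for e in incoming)

def resolve(order, edges, results):
    """Walk nodes in topological order and assign each a terminal status."""
    status = {}
    for node in order:
        status[node] = results[node] if should_run(node, edges, status) else "skipped"
    return status

edges = [
    Edge("Submit JCL", "Download Spool", "pass"),
    Edge("Download Spool", "Notify Success", "pass"),
    Edge("Submit JCL", "Notify Failure", "fail"),
]
order = ["Submit JCL", "Download Spool", "Notify Success", "Notify Failure"]
all_ok = {name: "succeeded" for name in order}
```

Calling `resolve(order, edges, all_ok)` marks Notify Failure as skipped. Changing the Submit JCL result to "failed" skips Download Spool and Notify Success and runs only Notify Failure.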
Model lifecycle
LLMs are downloaded from the Model Hub and loaded on
demand. Loading a model starts the managed llama-server against it. By
default (max_loaded_models of 1) one model is served at a time, so loading a
new model stops the previous server, and an already-loaded model is reused
instantly.
Raising max_loaded_models keeps several models resident at once, each served
by its own llama-server on its own loopback port, and each AI node is routed
to the server that hosts its model. When you load beyond that limit, or beyond a
memory budget set as a fraction of total RAM, the least-recently-used model is
evicted. A model can also be warm-loaded into the registry ahead of time without
becoming the active model, so a later run that targets it pays no cold-start
cost, and each resident model gets its own inference slots so two resident
models do not contend for one shared pool. The resident set is managed from the
Model Hub, the desktop app, the server and Flow Code editions, and the model
picker in Flow Code for VS Code.
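The least-recently-used eviction described above can be sketched with an ordered dictionary. Only `max_loaded_models` is taken from the text; the `ResidentModels` class, its `load` method, and the string used as a server handle are hypothetical, and the memory budget is left out to keep the example short.

```python
from collections import OrderedDict

class ResidentModels:
    """Hypothetical registry of loaded models with LRU eviction."""

    def __init__(self, max_loaded_models=1):
        self.max_loaded_models = max_loaded_models
        self._loaded = OrderedDict()  # model name -> placeholder server handle

    def load(self, name):
        """Return (handle, evicted_names) for the requested model."""
        evicted = []
        if name in self._loaded:
            # Already resident: reuse it and mark it most recently used.
            self._loaded.move_to_end(name)
            return self._loaded[name], evicted
        while len(self._loaded) >= self.max_loaded_models:
            oldest, _ = self._loaded.popitem(last=False)
            evicted.append(oldest)
        self._loaded[name] = f"server-for-{name}"
        return self._loaded[name], evicted
```

With the default limit of 1, loading a second model evicts the first, which matches the single-server behaviour described earlier in this section.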
Context handling
Local inference runs against the loaded LLM, and the model’s own context window bounds the prompt. Before any text reaches a model, whether local or cloud, the PII Sanitizer redacts datasets, hostnames, credentials, and IPs. See Credentials and PII.
Scheduled execution
Saved flows can run on a recurring timer. See Scheduling for cadences, catch-up policies, and management surfaces. Scheduled runs execute on the same path as manual runs, so history and audit are the same for both.
Runtime failure handling
Inference failures do not stop the surrounding graph.
| Failure | Behavior |
|---|---|
| Model file corrupted or missing | Clear error; prompts re-download from the Model Hub |
| Inference crashes | AI node marked failed; the graph continues with the step skipped or escalated, per node fallback policy |
| Confidence below the low threshold | Deterministic fallback path: raw execution output is surfaced; the AI annotation is skipped |
| Timeout | Same as low confidence: the deterministic fallback path is taken |
The principle: AI is advisory. A failed AI node never blocks an otherwise valid execution graph.
A contract-bound ai node adds engine-enforced governance on top of this.
The model returns a confidence score, and the engine routes the output by the
contract’s thresholds. The model never does this routing itself. Output above
the high threshold is auto-approved. Output inside the review band goes to a
human review gate. Output below the low threshold is suppressed onto the node’s
.fail fallback. Headless runs on the CLI or in CI fail review-band outputs
onto the fallback rather than auto-approving them. See
AI governance.
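The threshold routing can be sketched as a pure function. The name `route`, its parameters, and the returned labels are assumptions for illustration, and the text does not say how values exactly on a threshold are treated, so this sketch places them in the review band.

```python
def route(confidence, low, high, headless=False):
    """Route an AI node's output by contract thresholds (hypothetical sketch).

    Assumption: a confidence exactly equal to `low` or `high` falls in the
    review band.
    """
    if confidence > high:
        return "auto_approve"
    if confidence < low:
        return "fallback"
    # Review band: a human decides, unless nobody is there to review.
    return "fallback" if headless else "human_review"
```

Because the engine calls the routing function rather than the model, the same confidence score always produces the same decision for a given contract.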