Managing Concurrency in AI Workflows: Parallel Branches and Shared State
How we handle concurrent workflow execution in Rust using tokio::spawn and RwLock, the limitations we've hit, and where we're going next.

When building an AI orchestration platform, you eventually hit the concurrency problem. If a workflow needs to search the web, query a database, and summarize a document, doing those sequentially wastes time. You need parallel branches.
But how do you manage the state of a complex, long-running AI workflow across multiple parallel executions?
In our system, we rely on a deterministic workflow engine built in Rust. Here is how we actually handle concurrency in production today, and why we are planning to change it.
If you look at the core of our workflow engine, you won't find an Actor Model (yet). You'll find standard Rust concurrency primitives.
The entire execution state is wrapped in an Arc<RwLock<WorkflowExecution>>.
```rust
// The execution state is shared across parallel branches
let execution = Arc::new(RwLock::new(initial_state));
```
When a workflow hits a ParallelNode, the engine fans out. We use tokio::spawn to kick off concurrent branches.
```rust
let mut handles = vec![];
for branch in parallel_branches {
    let state_clone = Arc::clone(&execution);
    handles.push(tokio::spawn(async move {
        // Execute branch logic
        let result = execute_branch(branch).await;
        // Acquire write lock to update the shared execution state
        let mut state = state_clone.write().await;
        state.apply_result(result)?;
        Ok::<_, anyhow::Error>(())
    }));
}

// Collect results from all branches
for handle in handles {
    match handle.await {
        Ok(Ok(_)) => { /* branch succeeded */ }
        Ok(Err(e)) => { /* branch returned an error */ }
        Err(e) => { /* task panicked */ }
    }
}
```
tokio::spawn is incredibly lightweight, so we can fan out parallel branches with minimal overhead. Rust's Send and Sync bounds then force the issue: to share mutable state across those tasks, we need a synchronization primitive like RwLock, and the code simply won't compile without one. The lock then enforces exclusive write access at runtime.

This shared-state model got us to production, but it has cracks showing under the weight of AI workloads.
As workflows get more complex, branches spend more time fighting for the RwLock. If one branch is trying to read the state to format a prompt, and another branch is writing a massive tool result, the reader must yield and wait for the lock to be released.
Currently, we explicitly stub out and do not support invoking nested agents inside parallel branches. Why? Because managing the lifecycle, memory isolation, and state mutation of an entirely separate agent within a shared-state parallel branch is a recipe for deadlock.
Tokio isolates panics per task, so one branch crashing won't bring down others. But without a dedicated supervisor, recovery is manual. If a branch panics or hangs indefinitely (a common issue when dealing with third-party LLM APIs), we don't have a clean way to automatically restart just that branch. The workflow engine waits, and eventually, the whole execution times out.
We've reached the limits of Arc<RwLock<T>> for AI orchestration. Our next major architectural shift is moving to the Actor Model.
Instead of sharing state, each agent will become an isolated actor with its own private state and a mailbox (tokio::mpsc channels).
When we implement this, a supervisor will monitor JoinHandles and automatically restart unresponsive agents with exponential backoff.

For now, the RwLock and tokio::spawn pattern holds the line. It's not perfect, but it's the pragmatic reality of building fast: we didn't over-engineer an actor system before we needed it.
But the time for actors is coming.