Nov 12, 2024 · 5 min read

Why I Chose Rust for AI Infrastructure

TypeScript was easier. Python was expected. I went with Rust anyway. Here's why.


When I started building an AI agent orchestration platform, everyone assumed I'd use Python. "That's what the AI ecosystem uses." Or TypeScript. "Fast iteration, good DX."

I chose Rust instead.

Not because I wanted to be different. Because the requirements demanded it.

The Requirements

This wasn't a weekend project or a prototype. It was infrastructure that needed to:

  1. Run long-lived processes — Workflows that execute for minutes or hours
  2. Handle concurrent agents — Thousands of parallel executions
  3. Be reliable — Crashes are unacceptable when handling customer operations
  4. Perform well — Low latency for real-time streaming
  5. Deploy simply — Single binary, minimal dependencies

Python and TypeScript can do many things. But these requirements pointed somewhere else.

Why Not Python

Python is the default for AI. The ecosystem is unmatched. Every model, every library, every tutorial.

But for infrastructure:

Memory management is unpredictable. When you're running thousands of concurrent workflows, GC pauses matter. Python's GIL makes true parallelism painful.

Runtime errors everywhere. Type hints are optional and unenforced, so bugs slip through to runtime. In a chat interface, a runtime error is annoying. In infrastructure, it's downtime.

Deployment is a mess. venvs, pip conflicts, version mismatches. "It works on my machine" shouldn't be your deployment strategy.

I still use Python for ML tasks. But not for the core engine.

Why Not TypeScript

TypeScript was tempting. I've shipped production TypeScript systems. The DX is great, types are good, the ecosystem is huge.

But:

Memory usage. Node isn't memory-efficient for long-running processes with lots of concurrent state. When each workflow has state, and you're running thousands, it adds up.

Concurrency model. The single-threaded event loop works great for I/O-bound web servers. Less great for compute-heavy workflow orchestration.

Stability guarantees. TypeScript types are erased at runtime. The type system helps during development, but anything crossing a runtime boundary (JSON parsing, external API responses) can still violate the types in production.

For APIs and UIs, TypeScript is excellent. For core infrastructure, I wanted more guarantees.

Why Rust

Rust checks every box:

1. Predictable Performance

No garbage collector. No GC pauses. No surprise latency spikes.

When I say "this workflow step takes 50ms," it takes 50ms. Not 50ms plus maybe a GC pause.

2. True Concurrency

Async/await on tokio's multi-threaded runtime gives real parallelism. Each workflow execution can run independently without fighting for locks.

The actor model maps naturally: each agent has its own message queue, processes messages in isolation, communicates through channels.
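
For the curious, here's roughly what that pattern looks like with tokio's mpsc channels. The Agent and AgentMsg names are illustrative, not lifted from the real codebase:

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum AgentMsg {
    Run { input: String },
    Shutdown,
}

struct Agent {
    // Each agent owns its inbox; nobody else can touch it.
    inbox: mpsc::Receiver<AgentMsg>,
}

impl Agent {
    async fn run(mut self) {
        // Process messages in isolation, one at a time, no shared locks.
        while let Some(msg) = self.inbox.recv().await {
            match msg {
                AgentMsg::Run { input } => println!("processing: {input}"),
                AgentMsg::Shutdown => break,
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(32);
    // Spawned tasks run in parallel on tokio's multi-threaded runtime.
    let handle = tokio::spawn(Agent { inbox: rx }.run());

    tx.send(AgentMsg::Run { input: "hello".into() }).await.unwrap();
    tx.send(AgentMsg::Shutdown).await.unwrap();
    handle.await.unwrap();
}
```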

3. Correctness by Default

Rust's type system catches entire categories of bugs at compile time:

  • Null pointer issues → Option
  • Data races → ownership model
  • Unhandled errors → Result

When the compiler is happy, I'm confident. Not "it compiles so it works" confident — but "entire classes of runtime errors are eliminated" confident.
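
If you haven't written Rust, here's a toy sketch of what that looks like in practice. The function and error names are made up for illustration, not taken from the platform:

```rust
#[derive(Debug)]
enum StepError {
    MissingInput,
}

// Option forces the caller to handle the "not found" case explicitly;
// there is no null to dereference by accident.
fn find_input(key: &str) -> Option<String> {
    (key == "prompt").then(|| "hello".to_string())
}

// Result makes errors part of the signature; ignoring one is a
// compiler warning, not a silent failure in production.
fn run_step(key: &str) -> Result<String, StepError> {
    let input = find_input(key).ok_or(StepError::MissingInput)?;
    Ok(input.to_uppercase())
}

fn main() {
    match run_step("prompt") {
        Ok(out) => println!("step succeeded: {out}"),
        Err(err) => eprintln!("step failed: {err:?}"),
    }
}
```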

4. Single Binary Deployment

cargo build --release produces a single self-contained binary (fully static if you target musl). Ship it. Run it. No runtime dependencies, no version conflicts, no deployment surprises.

For infrastructure that runs in production, this simplicity matters.

5. Long-term Maintainability

Rust code is explicit. The ownership model forces you to think about who owns data and how it flows.

When I come back to code six months later, the types tell me what's happening. In dynamic languages, I'd be guessing.

The Trade-offs

Rust isn't free:

Learning curve. The borrow checker takes time to understand. The first few weeks are frustrating.

Slower initial development. Writing Rust takes longer than Python for equivalent features. The compiler demands attention.

Ecosystem gaps. The AI ecosystem in Rust is growing but not as mature as Python's. Some things you have to build yourself.

Hiring. Fewer Rust developers than Python or TypeScript developers. The talent pool is smaller.

These are real costs. I accepted them because the benefits outweigh them for this use case.

Where Rust Shines in This Stack

The core engine: workflow execution, memory management, concurrent agent orchestration. This is where performance and correctness matter most.

Specifically:

  • Workflow DAG execution with cycle support
  • Checkpointing and crash recovery
  • Concurrent tool execution with timeout handling
  • Memory retrieval and graph traversal (see Building AI That Remembers)
  • Rate limiting and cost tracking

Each of these benefits from Rust's guarantees.
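
As one concrete example, the timeout handling falls out of tokio almost directly. This is a hedged sketch with placeholder tool names, not the production code:

```rust
use std::time::Duration;
use tokio::time::timeout;

// Stand-in for a real tool call (HTTP request, subprocess, etc.).
async fn call_tool(name: &str) -> String {
    tokio::time::sleep(Duration::from_millis(50)).await;
    format!("{name}: ok")
}

#[tokio::main]
async fn main() {
    let tools = ["search", "calculator", "code_runner"];

    // Run every tool concurrently, each with its own deadline.
    let handles: Vec<_> = tools
        .iter()
        .map(|&name| {
            tokio::spawn(async move {
                match timeout(Duration::from_secs(2), call_tool(name)).await {
                    Ok(result) => result,
                    Err(_) => format!("{name}: timed out"),
                }
            })
        })
        .collect();

    for handle in handles {
        println!("{}", handle.await.unwrap());
    }
}
```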

Where I Use Other Languages

Rust isn't the answer to everything:

  • Python for ML experimentation and model integration
  • TypeScript for web UIs and admin dashboards
  • SQL for database operations (via SurrealDB)

The right tool for the job. Rust is the right tool for the infrastructure core.

The Ecosystem I'm Using

For those curious, the Rust ecosystem that makes this work:

  • tokio — Async runtime
  • axum — HTTP server
  • surrealdb — Unified database (relational + graph + vector)
  • rig-core — LLM provider abstraction
  • governor — Rate limiting
  • tracing — Observability

The ecosystem is smaller than Node or Python, but mature enough for production use.
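
To give a flavor of how a few of these fit together, here's a minimal health-check server with structured logging. I'm assuming axum 0.7 and tokio 1.x; the route and port are placeholders:

```rust
use axum::{routing::get, Router};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // tracing gives structured logs and spans across the async runtime.
    tracing_subscriber::fmt::init();

    let app = Router::new().route("/health", get(|| async { "ok" }));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await?;
    tracing::info!("listening on 8080");
    axum::serve(listener, app).await?;
    Ok(())
}
```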

One Year In

After building with Rust for a year:

What I expected: Performance, reliability, deployment simplicity.

What surprised me: How much faster development became once I was past the learning curve. The compiler catches so many bugs that I rarely debug runtime issues. Refactoring is safe because the compiler validates changes.

What I'd do differently: Start with more testing infrastructure. Rust makes correctness easy, but you still need tests for business logic.


Would I recommend Rust for every AI project? No. For experiments and prototypes, Python is faster. For web apps, TypeScript is more ergonomic.

But for infrastructure that needs to run reliably at scale? Rust has earned its place.
