Human-in-the-Loop Done Right: Beyond Simple Approve/Reject
Most HITL implementations are an afterthought. Here's how to build human oversight that actually works in production AI systems.

Every AI product pitch includes "human in the loop" somewhere on slide 12. Usually it means: "and then a human approves it."
That's not a system. That's a checkbox.
Real human-in-the-loop is harder. It means designing workflows where human judgment integrates seamlessly — without creating bottlenecks, without approval fatigue, without breaking the user experience.
The naive approach: require human approval for everything.
AI drafts email → Human approves → Send
AI classifies ticket → Human approves → Route
AI suggests price → Human approves → Apply
AI retrieves data → Human approves → Display
Week 1: humans review carefully. Week 3: humans skim and approve. Week 5: humans approve everything on autopilot.
You've built a system that slows everything down while providing zero oversight. Congratulations.
The fix isn't more approvals. It's fewer, better-placed approvals.
Not every step needs human review. Only these four (a sketch of them as a single predicate follows this list):

**Actions that can't be undone.** If you can undo it, you probably don't need approval. If you can't, you probably do.

**Actions where the cost of being wrong is significant.** Low stakes? Let it through. A wrong product recommendation is annoying. A wrong legal classification is a lawsuit.

**Actions close to the edge of what the AI should do.** Anything near a policy boundary: a refund just under the agent's limit, a reply touching a regulated topic.

**Actions you want to validate before trusting the AI more broadly.** A new capability runs behind a gate first; its approval record becomes the evidence for removing the gate later.
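Concretely, the four criteria can collapse into one placement predicate. A minimal sketch; the `ProposedAction` fields are illustrative, not a real schema:

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    """Illustrative shape for a proposed action; adapt the fields to your own schema."""
    reversible: bool            # can the action be undone after execution?
    cost_of_error: float        # estimated impact (dollars here) if the AI is wrong
    near_policy_boundary: bool  # e.g., a refund just under the agent's limit
    canary: bool                # new capability still being validated

def needs_human_gate(action: ProposedAction, cost_threshold: float = 100.0) -> bool:
    # Gate only when one of the four criteria fires; everything else flows through.
    return (
        not action.reversible
        or action.cost_of_error >= cost_threshold
        or action.near_policy_boundary
        or action.canary
    )
```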
Human gates are workflow nodes. Not special cases. Not afterthoughts. First-class nodes in the execution graph.
    flowchart TD
        A[LLM Analysis] --> B[Propose Action]
        B --> C{Human Gate}
        C -->|Approved| D[Execute Action]
        C -->|Rejected| E[Revision Flow]
        C -->|Modified| F[Execute Modified Action]
        class C decision
        class D success
        class E reject
        class F special
Each human gate has:
    human_gate:
      description: "Approve refund of $49.99"
      context:
        - order_details
        - customer_history
        - refund_reason
      options:
        - approve
        - reject
        - modify
      timeout: 300  # seconds
      timeout_action: reject  # what happens if nobody responds
      assignee: support_team
      priority: high
The gate isn't just "approve/reject." It provides context so the human can make an informed decision. It has a timeout so the system doesn't block forever.
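If gates are first-class nodes, they deserve a first-class representation. A minimal sketch with fields mirroring the YAML above; the class and field names are illustrative, not from any particular framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class HumanGate:
    """A human gate as a first-class workflow node (mirrors the YAML config above)."""
    description: str
    proposed_action: dict    # e.g. {"type": "refund", "amount": 49.99}
    context: dict            # everything the reviewer needs to decide
    options: tuple = ("approve", "reject", "modify")
    timeout_seconds: int = 300
    timeout_action: str = "reject"   # policy applied if nobody responds
    assignee: str = "support_team"
    priority: str = "high"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def timeout_at(self) -> datetime:
        # The deadline is fixed at creation time and enforced by the server, not the UI.
        return self.created_at + timedelta(seconds=self.timeout_seconds)
```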
This is where most HITL implementations break. What happens when nobody responds?
**Option 1: Auto-reject (safe default).** Execution stops gracefully. The user is notified. No action is taken.

Good for: financial transactions, irreversible actions.

**Option 2: Auto-approve (trusted default).** Execution continues. The action proceeds.

Good for: low-risk actions where the gate is for auditing, not blocking.

**Option 3: Auto-skip.** The gate is skipped entirely. Execution continues to the next node without executing the gated action.

Good for: optional enhancements that aren't critical to the flow.

**Option 4: Escalate.** The timeout triggers escalation to a different team or higher authority.

Good for: high-stakes decisions that need a response.
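The four policies reduce to one dispatch on the configured `timeout_action`. A sketch that assumes the hypothetical `HumanGate` above and an execution engine with `stop`, `proceed`, and `reassign` methods (all placeholders):

```python
def apply_timeout_action(gate, execution):
    """Dispatch the gate's configured timeout policy (engine methods are placeholders)."""
    if gate.timeout_action == "reject":
        execution.stop(reason="gate timed out", notify_user=True)  # safe default
    elif gate.timeout_action == "approve":
        execution.proceed(gate.proposed_action)  # gate was for auditing, not blocking
    elif gate.timeout_action == "skip":
        execution.proceed(None)                  # continue without the gated action
    elif gate.timeout_action == "escalate":
        execution.reassign(gate, assignee="escalation_team")
    else:
        raise ValueError(f"unknown timeout_action: {gate.timeout_action}")
```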
In my system, timeout is server-enforced:
    On execution resume:
      1. Check timeout_at
      2. If expired → apply timeout_action
      3. If not expired → process human response
No client-side timers. The server enforces the policy. Even if the UI crashes, the timeout still fires.
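In code, the resume path is a single comparison against `timeout_at`. A sketch reusing the hypothetical pieces above; `process_response` is sketched in the modification section below:

```python
from datetime import datetime, timezone

def resume_gate(gate, response, execution):
    """Server-side resume: the clock check happens here, never in the client."""
    if datetime.now(timezone.utc) >= gate.timeout_at:
        # Expired: the configured policy wins, even over a late human response.
        apply_timeout_action(gate, execution)
    else:
        process_response(gate, response, execution)
```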
The approval interface shows:
    ┌──────────────────────────────────────────┐
    │ Refund Request                           │
    │                                          │
    │ Customer: Jane Smith (5yr customer)      │
    │ Order: #12345 ($49.99)                   │
    │ Reason: Product defective                │
    │ Agent recommendation: Approve            │
    │ Confidence: 0.87                         │
    │                                          │
    │ Previous refunds: 1 in 12 months         │
    │ Customer LTV: $2,340                     │
    │                                          │
    │ [Approve] [Modify Amount] [Reject]       │
    │                                          │
    │ Auto-reject in 4:32                      │
    └──────────────────────────────────────────┘
The human sees everything they need to decide. Not just "approve this?" but "here's the full context, the AI's recommendation, and the confidence level."
Modification is the most underused option. The AI's answer is close but not quite right. Instead of rejecting and restarting, the human modifies the proposal:
AI proposes: Refund $49.99
Human modifies: Refund $35.00 (partial, keep shipping)
Execution continues with modified value
This is faster than rejection loops. The human corrects the course without stopping the workflow.
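A sketch of that resolution logic, again with hypothetical names; `response.overrides` carries the human's corrections:

```python
def process_response(gate, response, execution):
    """Resolve a human response; 'modify' keeps the workflow moving with a corrected value."""
    if response.option == "approve":
        execution.proceed(gate.proposed_action)
    elif response.option == "reject":
        execution.stop(reason="rejected by reviewer", notify_user=True)
    elif response.option == "modify":
        # Merge the human's overrides, e.g. {"amount": 35.00}, and continue -- no restart.
        execution.proceed({**gate.proposed_action, **response.overrides})
```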
"Approve, but only if the customer confirms their address."
    flowchart TD
        A[Human Gate] -->|Approved with condition| B[Condition Node]
        B --> C[Customer Responds]
        C -->|Confirmed| D[Execute]
        C -->|Not confirmed| E[Cancel]
        class A decision
        class D success
        class E reject
The human adds a condition. The workflow handles the rest.
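One way to model this is to let the approval splice a condition node between the gate and the execute step. A sketch against a hypothetical workflow-graph API:

```python
def approve_with_condition(gate, response, workflow):
    """Splice a condition node between the gate and the action (graph API is hypothetical)."""
    condition = workflow.add_node(
        kind="await_customer_confirmation",
        prompt=response.condition,       # e.g. "please confirm your address"
        on_confirmed=gate.execute_node,  # Confirmed -> Execute
        on_declined="cancel",            # Not confirmed -> Cancel
    )
    workflow.connect(gate, condition)    # the gate now feeds the condition node
```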
When volume is high, individual approvals don't scale. Batch review groups similar decisions:
    ┌─ Batch Review: Low-Risk Refunds ─────────┐
    │                                          │
    │ ☑ #12345 - $12.99  - Defective           │
    │ ☑ #12346 - $8.50   - Wrong item          │
    │ ☐ #12347 - $299.00 - Changed mind        │
    │ ☑ #12348 - $15.00  - Late delivery       │
    │                                          │
    │ [Approve Selected (3)]  [Review #12347]  │
    └──────────────────────────────────────────┘
Routine decisions get batched. Outliers get individual attention. The human's time is spent where it matters.
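The grouping itself is a small policy function. A sketch; the $50 cap and the reason whitelist are illustrative knobs:

```python
ROUTINE_REASONS = {"defective", "wrong item", "late delivery"}

def partition_for_review(pending, amount_cap=50.00):
    """Split pending refund gates into one batchable checklist plus individual reviews."""
    batch, individual = [], []
    for gate in pending:
        ctx = gate.context
        if ctx["amount"] <= amount_cap and ctx["reason"] in ROUTINE_REASONS:
            batch.append(gate)       # low-risk and routine: reviewed as a group
        else:
            individual.append(gate)  # e.g. the $299.00 "changed mind" outlier
    return batch, individual
```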
HITL isn't just about gatekeeping. It's about learning.
Every approval, rejection, and modification is training data:
    Action: Refund $49.99
    AI confidence: 0.87
    Human decision: Approved
    → AI learns: this pattern is correct

    Action: Refund $499.00
    AI confidence: 0.62
    Human decision: Rejected (customer has 5 refunds this month)
    → AI learns: check refund frequency for large amounts
Over time, the AI gets better and the approval rate climbs, so you can lower the confidence threshold for auto-approval and send fewer decisions through the gate.
The goal isn't permanent human gates. It's temporary human gates that teach the system when it can be trusted.
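A minimal sketch of that loop: log (pattern, human decision) pairs and periodically relax the auto-approval bar where humans almost always agree. All numbers here are illustrative knobs, not recommendations:

```python
from collections import defaultdict

def updated_thresholds(decisions, thresholds, min_samples=200, agreement=0.95, step=0.02):
    """Lower the auto-approval confidence bar for patterns humans consistently approve.

    `decisions` is an iterable of (pattern, human_approved) pairs.
    """
    outcomes = defaultdict(list)
    for pattern, approved in decisions:
        outcomes[pattern].append(approved)
    new = dict(thresholds)
    for pattern, results in outcomes.items():
        enough_data = len(results) >= min_samples
        if enough_data and sum(results) / len(results) >= agreement:
            # Humans almost always agree here: trust the AI a little more.
            new[pattern] = max(0.50, new.get(pattern, 0.90) - step)
    return new
```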
Track these metrics:
| Metric | Target | Signal |
|---|---|---|
| Auto-approval rate | Increasing over time | System is learning |
| Human override rate | Decreasing over time | AI alignment improving |
| Time to decision | < 5 minutes | No bottleneck |
| Timeout rate | < 5% | Humans are responsive |
| Modify vs reject ratio | High modify, low reject | AI is close but needs tuning |
If the auto-approval rate isn't increasing, the feedback loop isn't working. If the timeout rate is high, the gates are in the wrong place or the team isn't staffed for it.
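All five metrics fall out of one decision log. A sketch, assuming each record carries the outcome, the AI's recommendation, and timestamps; the field names are illustrative:

```python
from statistics import median

def gate_metrics(log):
    """Compute the table's metrics from a decision log.

    Each record: {"outcome": "approve|reject|modify|timeout", "auto": bool,
                  "ai_recommendation": str, "created_at": float, "decided_at": float}
    with timestamps as epoch seconds.
    """
    timeouts = [r for r in log if r["outcome"] == "timeout"]
    decided = [r for r in log if r["outcome"] != "timeout"]
    human = [r for r in decided if not r["auto"]]
    return {
        "auto_approval_rate": sum(r["auto"] for r in decided) / max(len(decided), 1),
        "override_rate": sum(r["outcome"] != r["ai_recommendation"] for r in human)
                         / max(len(human), 1),
        "seconds_to_decision": median(r["decided_at"] - r["created_at"] for r in human)
                               if human else None,
        "timeout_rate": len(timeouts) / max(len(log), 1),
        "modify_vs_reject": sum(r["outcome"] == "modify" for r in human)
                            / max(sum(r["outcome"] == "reject" for r in human), 1),
    }
```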
More gates don't mean more safety. They mean more fatigue.
Fix: Start with gates on irreversible actions only. Add more based on data, not fear.
"Approve action?" without context leads to rubber-stamping.
Fix: Always show the full context, the AI's reasoning, and the confidence score.
Execution blocks forever waiting for a human response. Users see stuck workflows.
Fix: Every gate has a timeout. Every timeout has an action. No exceptions.
Thousands of approval decisions accumulate. Nobody analyzes them.
Fix: Review approval patterns monthly. Identify gates that can be automated. Identify patterns where AI consistently fails.
Approve or reject. Nothing else. The human can't modify, can't add conditions, can't partially approve.
Fix: Give humans real control. Modify, conditional approve, partial approve. The richer the options, the better the outcomes.
Human-in-the-loop isn't a feature you add at the end. It's an architectural decision that shapes how your AI system works.
Put gates where they matter — irreversible actions, high stakes, policy boundaries. Give humans context to decide quickly. Enforce timeouts so nothing blocks forever. Collect feedback so the system learns.
The best HITL systems make themselves less necessary over time. That's the goal.