Technical White Paper

THRAWN

The first native macOS autonomous multi-agent AI command system — built for operators, not researchers.

Classification: Confidential · Internal
Version: 2.0 (V2 Agents)
Platform: macOS · Swift / SwiftUI
Date: April 2026

An AI workforce. Not a chatbot.

"Thrawn is the first system that treats AI agents as persistent team members with memory, identity, scheduled work, and real system access — running autonomously while you sleep."

Multi-agent AI is a crowded conversation. Most implementations are cloud-hosted SaaS platforms with vendor lock-in, or fragile Python scripts that require an engineer to maintain. Neither gives a solo founder or small team what they actually need: autonomous AI teammates that show up every day, remember what they learned, and get work done without babysitting.

Thrawn is a native macOS application — written entirely in Swift and SwiftUI — that runs a persistent multi-agent command system locally. It ships with a full agent fleet, a structured task board protocol, real shell execution capabilities, multi-provider LLM routing, and a self-improvement mechanism through persistent memory and accumulated skill files.

This document covers the technical architecture, the major evolutionary jumps the system has undergone, the novel design decisions that distinguish it from comparable efforts, and the philosophy behind the V2 agent model.

7 Active Agents · 15m Lead Heartbeat · 3 LLM Providers · Runs Unattended

Why existing tools don't cut it

The current state of multi-agent AI tooling falls into four unsatisfying categories:

Category Examples Persistent? Local? Real Shell Access? Self-Improving?
Cloud SaaS agents AutoGPT, CrewAI Cloud Partial
Python frameworks LangGraph, CrewAI Sandboxed
IDE agents Cursor, Copilot Partial Code only
OS-level assistants Apple Intelligence Partial
Thrawn ✓ Always-on ✓ Native ✓ Full ✓ Memory + Skills

The key gap: no existing tool treats agents as persistent entities with memory, identity, and scheduled autonomous work. Every session starts from zero. Thrawn doesn't.


How it's built

Thrawn is a fully native Swift Package Manager project, compiled to a macOS .app bundle with ad-hoc code signing. No Electron. No Python runtime. No web server. The entire system runs as a single native process.

System Architecture, Layer View
UI layer: AgentRailView · ThreadStore · TaskBoardView · FlowBoardView · SetupWizard
Core services: AgentScheduler · TaskDispatcher · ExecutionService · AgentSpecStore · FlightRecorder
Provider router: OllamaClient (local, free) · AnthropicClient (cheap) · OpenAIClient (premium)
Agent fleet: Thrawn · R2-D2 · C-3PO · Qui-Gon · Lando · Boba · Bart ★ v2
Persistence: ~/Library/Application Support/Thrawn/ (task board, agent memory, skill files, handoffs, flight recorder) plus the macOS Keychain

Key Architectural Decisions

🍎

100% Native Swift

No Electron, no Python runtime, no web server. Swift Package Manager compiles the project into an ad-hoc-signed macOS .app. This means low memory overhead, fast startup, and direct access to native macOS APIs such as Keychain, with Contacts and Calendar within reach, rather than only what the web exposes.

💾

Persistent File-Based Memory

Every agent has a knowledge/ directory that survives restarts. Agents append entries to facts.md, write skill files, and read both back on the next heartbeat. Learning accumulates across sessions, not just within one.

Timer-Driven Heartbeats

No cron, no external process. Swift timers fire every 30 seconds to check the schedule. Thrawn fires every 15 minutes; specialists every hour at their own offsets. The factory runs while you sleep.
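A minimal sketch of the schedule check, assuming hypothetical names (HeartbeatSlot, HeartbeatScheduler, defaultRoster) and the minute offsets from the agent roster in this paper. Because the timer ticks every 30 seconds, each minute gets two ticks, so the scheduler remembers the last minute each agent fired in and never fires it twice for the same slot.

```swift
import Foundation

/// One agent's firing minutes within each hour (hypothetical type).
struct HeartbeatSlot {
    let agentId: String
    let minutes: Set<Int>
}

/// Offsets taken from the roster in this paper.
let defaultRoster = [
    HeartbeatSlot(agentId: "Thrawn", minutes: [0, 15, 30, 45]),
    HeartbeatSlot(agentId: "R2-D2", minutes: [10]),
    HeartbeatSlot(agentId: "C-3PO", minutes: [20]),
    HeartbeatSlot(agentId: "Qui-Gon", minutes: [30]),
    HeartbeatSlot(agentId: "Lando", minutes: [40]),
    HeartbeatSlot(agentId: "Boba Fett", minutes: [50]),
    HeartbeatSlot(agentId: "Bart", minutes: [15]),
]

/// Decides which agents are due on each 30-second tick.
final class HeartbeatScheduler {
    private let slots: [HeartbeatSlot]
    private let calendar: Calendar
    /// Last minute (counted from the Unix epoch) each agent fired in, so two
    /// ticks inside the same minute do not fire the same agent twice.
    private var lastFiredMinute: [String: Int] = [:]

    init(slots: [HeartbeatSlot], calendar: Calendar = .current) {
        self.slots = slots
        self.calendar = calendar
    }

    func dueAgents(at date: Date) -> [String] {
        let minuteOfHour = calendar.component(.minute, from: date)
        let epochMinute = Int(date.timeIntervalSince1970 / 60)
        var due: [String] = []
        for slot in slots where slot.minutes.contains(minuteOfHour) {
            if lastFiredMinute[slot.agentId] != epochMinute {
                lastFiredMinute[slot.agentId] = epochMinute
                due.append(slot.agentId)
            }
        }
        return due
    }

    /// Schedules the repeating 30-second Foundation timer on the current run loop.
    func start(onHeartbeat: @escaping (String) -> Void) -> Timer {
        Timer.scheduledTimer(withTimeInterval: 30, repeats: true) { [weak self] _ in
            guard let self = self else { return }
            for agent in self.dueAgents(at: Date()) { onHeartbeat(agent) }
        }
    }
}
```

Keying the guard on the epoch minute rather than the tick keeps the check correct even if a tick arrives late, as long as it lands inside the scheduled minute.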

🔀

Provider Router with Fallback Chain

Every agent resolves its model tier (local, cheap, or premium) at heartbeat time. If OpenAI is down, the agent falls back to Anthropic; if Anthropic is down too, it falls back to Ollama. Reliability beats model quality. Nothing silently fails.
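A minimal sketch of a fallback chain, under stated assumptions: LLMProvider is a hypothetical closure-based stand-in for OllamaClient, AnthropicClient and OpenAIClient, and calls are synchronous for brevity even though the real clients stream. Each failure is recorded rather than swallowed, so an exhausted chain still surfaces an error with every reason attached.

```swift
/// Hypothetical stand-in for OllamaClient, AnthropicClient and OpenAIClient.
struct LLMProvider {
    let name: String
    let send: (String) throws -> String      // prompt in, completion out
}

enum ProviderError: Error {
    case unavailable(String)
    case chainExhausted([String])            // every failure, in order
}

/// Tries providers in order and returns the first reply plus the provider that produced it.
func sendWithFallback(_ prompt: String, chain: [LLMProvider]) throws -> (provider: String, reply: String) {
    var failures: [String] = []
    for provider in chain {
        do {
            let reply = try provider.send(prompt)
            return (provider.name, reply)
        } catch {
            failures.append("\(provider.name): \(error)")   // keep the reason, try the next provider
        }
    }
    throw ProviderError.chainExhausted(failures)
}
```

For Thrawn's premium tier the chain would be ordered OpenAI, Anthropic, Ollama; a local-tier specialist could start directly at Ollama.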

🔓

UNLEASHED Mode

When enabled, shell commands extracted from an agent's LLM response are actually executed, and their output is fed back to the agent on its next heartbeat. File writes, API calls, curl requests: agents do real work, not simulated work.
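A minimal sketch of the extract-then-execute step, assuming the LLM marks commands with bash or sh code fences. extractShellBlocks and runShellBlock are hypothetical names, not ExecutionService's actual API. Foundation's Process runs each block through /bin/bash -c and captures the output so it can be fed back on the next heartbeat.

```swift
import Foundation

/// Three backticks, built at runtime so this sketch can sit inside a fenced document.
let fence = String(repeating: "`", count: 3)

/// Returns the bodies of bash/sh fenced blocks in an LLM response, in order.
func extractShellBlocks(from response: String) -> [String] {
    var blocks: [String] = []
    var current: [String]? = nil                 // non-nil while inside a shell block
    for line in response.components(separatedBy: "\n") {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        if var body = current {
            if trimmed == fence {                 // closing fence ends the block
                blocks.append(body.joined(separator: "\n"))
                current = nil
            } else {
                body.append(line)
                current = body
            }
        } else if trimmed == fence + "bash" || trimmed == fence + "sh" {
            current = []                          // opening fence of a shell block
        }
    }
    return blocks                                 // an unterminated block is dropped, not run
}

/// Runs one block with /bin/bash -c and returns its exit status and combined stdout/stderr.
func runShellBlock(_ script: String) throws -> (status: Int32, output: String) {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/bin/bash")
    process.arguments = ["-c", script]
    let pipe = Pipe()
    process.standardOutput = pipe
    process.standardError = pipe
    try process.run()
    // Drain the pipe before waiting, so a chatty command cannot fill the buffer and deadlock.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return (process.terminationStatus, String(decoding: data, as: UTF8.self))
}
```

Dropping an unterminated block instead of running a partial command is a deliberately conservative choice: with real shell access, a truncated response should do nothing.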

📋

Structured Task Board Protocol

Agents never edit the task board directly. They write to agent-updates.json; the TaskDispatcher processes it and applies changes. This serializes all board mutations and prevents race conditions between agents.


The model doesn't matter. Availability does.

Thrawn's ProviderRouter is a deliberate departure from how most systems handle LLM selection. Rather than locking an agent to a provider, the router resolves the best available option at the moment of the heartbeat — and always has a fallback.

Provider Resolution — Decision Flow
On each agent heartbeat, resolvedTier(agentId) reads the tier from the AgentSpec. .local → Ollama (kimi-k2.5). .premium → OpenAI GPT-4.1 if an OpenAI key is configured; otherwise Anthropic (Sonnet), then Ollama as the last resort. ★ Thrawn (lead) runs on the premium tier. Specialists run on local by default.

This design means the system never fails silently. An agent might respond more slowly if degraded to Ollama, but as long as the local Ollama server is running, it always responds. Reliability is the first-class constraint.

Per-agent model binding is stored in agent-specs.json as either .inherit (follows the standard loadout default) or .explicit(tier). Thrawn and Bart are explicitly set to .premium. The rest of the V1 squad inherits, which resolves to .local (Ollama), keeping costs near zero for routine operational work.
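The binding maps naturally onto a two-case enum. This is a sketch under assumptions: ModelTier, ModelBinding and the dictionary-based resolvedTier signature are illustrative, and the on-disk encoding of agent-specs.json is not shown.

```swift
/// Model tiers named in this paper.
enum ModelTier: String { case local, cheap, premium }

/// The two binding forms stored per agent.
enum ModelBinding: Equatable {
    case inherit                 // follow the standard loadout default
    case explicit(ModelTier)     // pinned tier, e.g. Thrawn and Bart use .premium
}

struct AgentSpec {
    let agentId: String
    let binding: ModelBinding
}

/// Resolves the tier at heartbeat time; agents without a spec also get the loadout default.
func resolvedTier(_ agentId: String, specs: [String: AgentSpec],
                  loadoutDefault: ModelTier = .local) -> ModelTier {
    guard let spec = specs[agentId] else { return loadoutDefault }
    switch spec.binding {
    case .inherit: return loadoutDefault
    case .explicit(let tier): return tier
    }
}
```

Because .inherit defers to the loadout default, changing that single default moves every inheriting specialist at once, while Thrawn and Bart stay pinned.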


The factory floor

The task board is a Markdown file at ~/Library/Application Support/Thrawn/workspace/ops/TASK_BOARD.md. It is the single source of truth for all work in flight, and every agent reads it on every heartbeat. Only Thrawn moves a task into Ready for a specialist; specialists pick up only the Ready tasks assigned to them.

Task Lifecycle — Owner/Status State Machine
Thrawn (Owner=Thrawn, Status=Planning) → assigns → Specialist (Owner=Agent, Status=Ready ★) → heartbeat picks up → In Progress (Owner=Agent, Status=InProgress) → completes → Back to Thrawn (Owner=Thrawn, Status=Ready ★)
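The relay can be encoded as three guarded transitions. This is a minimal sketch with hypothetical names (BoardTask, assign, pickUp, complete); it enforces the relay's invariants that only Ready tasks can be picked up and that every completed task returns to Thrawn.

```swift
enum TaskStatus: String {
    case planning = "Planning"
    case ready = "Ready"
    case inProgress = "In Progress"
}

struct BoardTask {
    static let lead = "Thrawn"
    var owner: String
    var status: TaskStatus
}

enum TransitionError: Error { case illegal(String) }

/// Thrawn hands a task it owns to a specialist; the task lands in Ready.
func assign(_ task: inout BoardTask, to agent: String) throws {
    guard task.owner == BoardTask.lead, agent != BoardTask.lead else {
        throw TransitionError.illegal("only Thrawn assigns work to a specialist")
    }
    task.owner = agent
    task.status = .ready
}

/// Ready is the only pickup lane: a specialist may start only a Ready task it owns.
func pickUp(_ task: inout BoardTask, by agent: String) throws {
    guard agent != BoardTask.lead, task.owner == agent, task.status == .ready else {
        throw TransitionError.illegal("\(agent) cannot pick up this task")
    }
    task.status = .inProgress
}

/// Completion always routes the task back to the hub, again in Ready.
func complete(_ task: inout BoardTask, by agent: String) throws {
    guard task.owner == agent, task.status == .inProgress else {
        throw TransitionError.illegal("\(agent) is not working on this task")
    }
    task.owner = BoardTask.lead
    task.status = .ready
}
```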

"Ready is the only pickup lane. Thrawn is always the hub." — The core invariant of the task relay system.

Agents write their updates to agent-updates.json — a sidecar file that the TaskDispatcher processes asynchronously. This prevents any agent from corrupting the board directly. The dispatcher is the only writer to TASK_BOARD.md, making the system safe even with multiple concurrent heartbeats.
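A sketch of the single-writer pattern, assuming a hypothetical JSON shape for agent-updates.json (an array of objects with agentId, taskId, status and an optional note). A serial DispatchQueue stands in for however TaskDispatcher actually serializes work, and an in-memory dictionary stands in for TASK_BOARD.md.

```swift
import Foundation
import Dispatch

/// Hypothetical shape of one entry in agent-updates.json.
struct AgentUpdate: Codable {
    let agentId: String
    let taskId: String
    let status: String
    let note: String?
}

/// The board's only writer. Heartbeats hand over sidecar data; every mutation
/// runs on one serial queue, so two agents can never interleave writes.
final class TaskDispatcher {
    private let queue = DispatchQueue(label: "thrawn.task-dispatcher")   // label is illustrative
    private var board: [String: String] = [:]                            // taskId -> status

    func process(sidecarJSON: Data) throws {
        // Decode first: a malformed sidecar is rejected before the board is touched.
        let updates = try JSONDecoder().decode([AgentUpdate].self, from: sidecarJSON)
        queue.sync {
            for update in updates { board[update.taskId] = update.status }   // applied in file order
        }
    }

    func status(of taskId: String) -> String? {
        queue.sync { board[taskId] }
    }

    /// Renders the board as Markdown list lines, sorted by task ID.
    func renderMarkdown() -> String {
        queue.sync {
            board.sorted { $0.key < $1.key }
                .map { "- \($0.key): \($0.value)" }
                .joined(separator: "\n")
        }
    }
}
```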


How we got here

Thrawn has gone through several distinct jumps in capability. Each one represents a different class of problem solved — not an incremental improvement but a qualitative shift in what the system can do.

Origin
Architecture

Native macOS App · Ollama-Only

The core decision: build a native Swift app, not a script. Established the SwiftUI architecture, ObservableObject graph, and the fundamental loop: heartbeat → LLM → file write → board update. Chose Ollama for zero-cost local inference as the always-available backbone.

V1 Squad
Agents

Full Dev-Ops Squad Deployed

R2-D2 (Dev), C-3PO (Data), Qui-Gon (Research), Lando (Marketing), Boba Fett (QA). Each with its own heartbeat offset, identity file, heartbeat instructions, and knowledge directory. The factory came online — six agents running concurrently on independent schedules.

Milestone
Security

UNLEASHED Mode · Real Shell Execution

The most significant capability jump. In UNLEASHED mode, agents' bash command blocks are parsed from the LLM response and executed via ExecutionService. Results feed back into the next heartbeat. Agents stopped being chatbots and became real workers — writing files, calling APIs, running scripts.

Routing
Model

Multi-Provider Router · Anthropic + Gemini

Added AnthropicClient and GeminiAPIClient with the ProviderRouter fallback chain. Any agent can now be pinned to a tier (local/cheap/premium) without changing code — resolved at heartbeat time. This decoupled model selection from agent identity for the first time.

Premium
Model

OpenAI GPT-4.1 · Thrawn Goes Premium

Full OpenAI SSE streaming integration. The ProviderBackend enum gained a .openai case; AgentScheduler gained the sendToActiveProvider switch branch; ThreadStore gained full provider-aware routing. Thrawn (lead) and V2 agents run on GPT-4.1. Specialists stay on free local Ollama. Cost is controlled by design.
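A sketch of the SSE parsing step, based on the public format of OpenAI's Chat Completions streaming: each event arrives as a data: line carrying a JSON chunk, and the stream ends with data: [DONE]. ChatChunk, SSEEvent, parseSSELine and assembleReply are hypothetical names; in an app, the lines could come from URLSession's AsyncBytes.lines on macOS 12 or later.

```swift
import Foundation

/// Minimal slice of a Chat Completions stream chunk; unknown keys are ignored.
struct ChatChunk: Decodable {
    struct Choice: Decodable {
        struct Delta: Decodable { let content: String? }
        let delta: Delta
    }
    let choices: [Choice]
}

enum SSEEvent: Equatable {
    case text(String)   // a content delta to append
    case done           // the terminal "data: [DONE]"
    case ignored        // blank keep-alives, comments, role-only or unparseable chunks
}

/// Classifies one line of the SSE stream.
func parseSSELine(_ line: String) -> SSEEvent {
    guard line.hasPrefix("data:") else { return .ignored }
    let payload = line.dropFirst(5).trimmingCharacters(in: .whitespaces)
    if payload == "[DONE]" { return .done }
    guard let chunk = try? JSONDecoder().decode(ChatChunk.self, from: Data(payload.utf8)),
          let text = chunk.choices.first?.delta.content else { return .ignored }
    return .text(text)
}

/// Concatenates deltas until [DONE]; anything after the terminator is discarded.
func assembleReply(from lines: [String]) -> String {
    var reply = ""
    for line in lines {
        switch parseSSELine(line) {
        case .text(let delta): reply += delta
        case .done: return reply
        case .ignored: continue
        }
    }
    return reply
}
```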

Thread
UX

Command Tab Routes to GPT-4.1

ThreadStore — the direct chat system — was extended to route to OpenAI when configured, falling back to Ollama. Every Command thread is now running on the same brain as Thrawn's autonomous heartbeat. Consistent intelligence across interactive and autonomous modes.

V2
Agents

V2 Agent Philosophy · Bart Deploys

A fundamental rethink of what an agent is. V1 = job description. V2 = distilled human archetype. Bart Simpson: smart ass, brilliant, multi-source web research from a single prompt, no methodology explanation. The architecture now supports personality-first agents with single-purpose superpowers running on premium models.

Identity
UX

Pixel Art Agent Profiles

Agents now have 8-bit pixel art profile pictures on their cards, loaded from DiceBear's pixel-art sprite API and seeded by agent ID for consistency. The status jewel moves to an overlay dot. When offline, the card falls back to a deterministic procedural pixel pattern. Identity is now visual.
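A sketch of both paths, under assumptions: the DiceBear URL shape (the 9.x version segment and PNG format) is a guess at the public HTTP API, and fnv1a and fallbackPattern are hypothetical helpers. The key detail is the hash: Swift's built-in Hasher is randomly seeded per process, so hashValue would give an agent a different fallback pattern on every launch, while a fixed algorithm such as 64-bit FNV-1a keeps it deterministic.

```swift
import Foundation

/// Hypothetical DiceBear request; the "9.x" version and PNG format are assumptions.
func avatarURL(for agentId: String) -> URL? {
    var components = URLComponents(string: "https://api.dicebear.com/9.x/pixel-art/png")
    components?.queryItems = [URLQueryItem(name: "seed", value: agentId)]
    return components?.url
}

/// 64-bit FNV-1a over the UTF-8 bytes: same input, same output, on every launch.
func fnv1a(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325          // FNV offset basis
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3          // FNV prime, wrapping multiply
    }
    return hash
}

/// Offline fallback: an 8x8 grid whose left half comes from the hash bits
/// and whose right half mirrors it, giving a symmetric sprite.
func fallbackPattern(for agentId: String) -> [[Bool]] {
    let size = 8
    var bits = fnv1a(agentId)                         // 8 rows x 4 columns use 32 of the 64 bits
    var grid = Array(repeating: Array(repeating: false, count: size), count: size)
    for row in 0..<size {
        for col in 0..<(size / 2) {
            let on = (bits & 1) == 1
            bits >>= 1
            grid[row][col] = on
            grid[row][size - 1 - col] = on            // mirror onto the right half
        }
    }
    return grid
}
```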


Who's running

Heartbeat Schedule — One Hour Window
Legend: T = Thrawn (every 15 min, :00/:15/:30/:45) · B★ = Bart · v2 (hourly, :15) · R2, 3P, QG, La, Bo = V1 specialists (hourly, offset)
🎯
Thrawn
Lead · Premium · :00/:15/:30/:45
⚙️
R2-D2
Dev · Local · :10
🗂️
C-3PO
Data & API · Local · :20
🔍
Qui-Gon
Research · Local · :30
✍️
Lando Calrissian
Marketing · Local · :40
🎯
Boba Fett
QA & Recon · Local · :50
😈
Bart ★
Research & Intel · GPT-4.1 · :15

Real people, distilled

V1 agents are job descriptions. V2 agents are people. The distinction sounds subtle but it changes everything about how you think about what to build next and what to give them to do.

V1 — Job Description Model

Agent defined by role, responsibilities, outputs, escalation criteria. Complete. Formal. Built to cover a function exhaustively. Like a job posting.

V2 — Human Archetype Model

Agent defined by personality, quirks, specific superpowers, and specific blind spots. Think of a real crew: Frank knows every coffee spot and never misses a day. Danny is a genius but a wildman; you can't count on him. Brenda keeps Danny in line. Built like a real team, not an org chart.

The insight is that specialization through personality is more powerful than specialization through job function. A V2 agent doesn't need to be capable of everything in its domain; it needs to be extraordinary at one thing, and its constraints (personality, reliability, style) are a feature, not a bug.

V2 Agent Characteristics Spectrum: Reliability 95% · Specialization 88% · Personality Depth 92% · Breadth of Role 40%

Low breadth is intentional. A V2 agent that does one thing extraordinarily well and has a distinct voice is more valuable than a generalist that covers everything adequately.


Agents that get smarter over time

Most AI agent systems are stateless — every session starts fresh. Thrawn has two mechanisms for persistent learning:

Continuous Learning Loop
Heartbeat fires → system prompt built → memory + skills (facts.md + skill files) injected into the prompt → LLM responds with richer context → agent writes new facts + skill files, which feed back into the next prompt

Memory (facts.md) — A persistent Markdown file each agent appends to. User preferences, project context, routing decisions that worked. Specific, dated, accumulating. Injected into the next heartbeat prompt so the agent starts where it left off.

Skill Files — When an agent solves something complex, it writes a skill file: the exact procedure, gotchas, when to use it. Next session, that skill is injected back. The agent doesn't re-learn; it references. This is the closest thing to genuine institutional memory in current AI systems.
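A sketch of the write and inject halves of the loop, assuming a hypothetical layout of knowledge/facts.md plus knowledge/skills/*.md and hypothetical names (AgentMemory, appendFact, writeSkill, buildSystemPrompt). Facts are dated, and skills are injected in file-name order so the prompt stays stable between heartbeats.

```swift
import Foundation

struct AgentMemory {
    let knowledgeDir: URL                       // e.g. <agent>/knowledge (hypothetical layout)

    var factsURL: URL { knowledgeDir.appendingPathComponent("facts.md") }
    var skillsDir: URL { knowledgeDir.appendingPathComponent("skills") }

    /// Appends one dated fact by rewriting the whole file atomically.
    func appendFact(_ fact: String, on date: Date = Date()) throws {
        try FileManager.default.createDirectory(at: knowledgeDir, withIntermediateDirectories: true)
        let day = ISO8601DateFormatter.string(from: date, timeZone: TimeZone(identifier: "UTC")!,
                                              formatOptions: [.withFullDate, .withDashSeparatorInDate])
        let existing = (try? String(contentsOf: factsURL, encoding: .utf8)) ?? ""
        try (existing + "- [\(day)] \(fact)\n").write(to: factsURL, atomically: true, encoding: .utf8)
    }

    /// Saves a procedure the agent worked out, one Markdown file per skill.
    func writeSkill(named name: String, body: String) throws {
        try FileManager.default.createDirectory(at: skillsDir, withIntermediateDirectories: true)
        try body.write(to: skillsDir.appendingPathComponent(name + ".md"), atomically: true, encoding: .utf8)
    }

    /// Builds the heartbeat system prompt: identity, then memory, then skills.
    func buildSystemPrompt(identity: String) -> String {
        var sections = [identity]
        if let facts = try? String(contentsOf: factsURL, encoding: .utf8), !facts.isEmpty {
            sections.append("## Memory (facts.md)\n" + facts)
        }
        let files = (try? FileManager.default.contentsOfDirectory(
            at: skillsDir, includingPropertiesForKeys: nil)) ?? []
        for file in files.sorted(by: { $0.lastPathComponent < $1.lastPathComponent })
        where file.pathExtension == "md" {
            if let body = try? String(contentsOf: file, encoding: .utf8) {
                sections.append("## Skill: \(file.deletingPathExtension().lastPathComponent)\n" + body)
            }
        }
        return sections.joined(separator: "\n\n")
    }
}
```

Rewriting the whole file with atomically: true trades a little I/O for crash safety: a heartbeat killed mid-write leaves the previous facts.md intact rather than a half-written one.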


Why this is different

Multi-agent frameworks are not new. What's new here is the combination — and the platform decision.

🏠

Native First, Always

Writing a multi-agent system in Swift for macOS is unusual to the point of being novel. It means zero Python dependency, direct Keychain access, macOS-native scheduling, and the ability to eventually access Calendar, Contacts, Photos — surfaces no web-based agent can touch.

🧠

Identity-Based Routing

The AgentSpecStore resolves which model tier each agent uses based on its identity, not the request. Thrawn is always premium. V1 specialists are always local. The cost structure is baked into the agent design, not decided at call time.

🔄

Separation of Board and Mutation

Agents never write the task board directly. They write a sidecar update file. The dispatcher serializes all mutations. This is a safety pattern borrowed from distributed systems applied to a local multi-agent problem — and it works.

👥

Personality-First Agent Design

V2 agents are defined by who they are, not what they do. This is a meaningfully different mental model that produces better prompts, more consistent behavior, and a more intuitive interface for operators adding new agents.

💤

Unattended Operation

The system is designed to run while the operator is away: overnight, across meetings, on weekends. No operator intervention is required. Heartbeats fire from Swift timers. UNLEASHED execution happens automatically. The factory never stops.

📈

Accumulating Intelligence

Each heartbeat is slightly smarter than the last because agents append to memory and skills. The value of the system increases over time — not because the models improve, but because the context does. This is a compounding moat.


Where this goes

The current system is functional, battle-tested, and running autonomously. The architectural foundation supports expansions that most comparable systems would require a complete rewrite to achieve.

👁️

Vision Agents

Screen capture integration is already in ScreenCaptureStore. Vision-capable agents that can read and act on what's on screen are a single provider + prompt extension away.

📅

Calendar / Contacts Integration

As a native macOS app, Thrawn can request access to Calendar, Contacts, and Reminders through system entitlements and the standard macOS permission prompts. Agents that know your schedule and your people, without any OAuth dance.

🤝

Agent-to-Agent Communication

Current architecture is hub-and-spoke through the task board. Direct agent handoffs — R2-D2 asking Bart to research something mid-task — are a natural next step.

🎙️

Voice Interface

The existing thread system already handles async messaging. A voice layer that converts speech to thread messages and reads responses is architecturally straightforward.

🌐

Browser Automation

UNLEASHED mode combined with AppleScript and the Accessibility APIs gives agents the ability to drive Safari and Chrome. Real browser control without third-party dependencies.

🏪

Agent Marketplace

V2's personality-first agent design is inherently shareable. An agent is an identity file + heartbeat file + spec entry. Distributable. Installable. A community of archetypes.


A different bet

"Everyone is building agents. Very few are building teammates."

The dominant paradigm in AI tooling today is session-based, request-response, UI-first. Thrawn makes the opposite bets: persistence over sessions, scheduled over reactive, native over web, personality over function, local over cloud-first.

None of these bets are obviously correct for a mass market. All of them are obviously correct for the operator — the founder, the solo executive, the small team — who needs AI teammates that show up, remember, and work autonomously. That person doesn't need another chatbot. They need a factory.

Thrawn is that factory. And it's running right now.

Always Running · Sessions of Memory · V2 Agent Model Live · 0 External Dependencies