dimamitya

← writing

boring enough to trust

most agent demos are exciting. most agents in production are not. that’s the goal.

a harness is the scaffolding around a model: the eval tasks, the sandbox, the verification, the retries, the place where you decide what “done” means. it’s unglamorous. it’s also the only reason you can trust a fleet of agents to run without you watching.

the job

more soon.