software engineer · ai agent systems

i build reliable systems for AI agents.

harness juggler. rl environments, orchestration, agent-safety, and the eval harnesses that keep fleets of agents honest.

book a call · email me

// about

about

i build the systems ai agents run on: rl environments, eval harnesses, multi-agent orchestration, and the reward design and agent-safety that keep them honest. reward that resists goodhart; safety that's architectural, not just prompt-level.

before that, three years shipping web and smart contracts. i like systems that are boring enough to trust: explicit over implicit, signal without noise, the minimum code that solves today's problem.

// skills

skills

languages

rust
python
typescript

ai / agents / eval

rl-environment design
eval-harness design
reward & grader design
anti-reward-hacking
agent-trajectory data
multi-agent orchestration
agent-safety architecture
llm evaluation

tools

claude code
codex
gemini cli
harbor
terminal-bench

backend / systems

fastapi
async / concurrent
postgres
redis
distributed systems
docker
prometheus

infra

fleet automation
provisioning clis
ci/cd
linux

// work

selected work

claudima ↗
production multi-agent platform in rust: json-schema-typed tools, a sandboxed subagent-spawning primitive, and a two-process agent-safety model that structurally blocks prompt injection from reaching code execution.
foundry ↗
solo full-stack parametric jewelry cad studio. next.js + fastapi + cadquery, with a castability engine.
open source ↗
tooling and harness extensions to open-source agentic-evaluation frameworks like harbor and terminal-bench.

// activity

activity

14,941 contributions in the last year

last year on github ↗

// writing

writing

boring enough to trust · 2026-06-30

all posts →

// work with me

work with me

pick what fits, then book below.

intro · 15m · free
quick fit-check. is this worth both our time?
agent reliability audit · paid
i pressure-test your agent stack for reliability problems, prompt injection, reward hacking, and eval gaps, then hand you a prioritized report.
harness / eval build · project
a custom eval harness, benchmark, or rl environment for your agents or models.
working session · 60m · paid
bring a live problem: a flaky multi-agent setup, a reward getting gamed, sandbox design. we fix it together.
hiring · 30m · free
you're hiring for agent-infra or eval and think i fit.

embed not loading? open the booking page ↗

// contact

contact

the fastest ways to reach me:
