Research.tech · Krakow · April 2026

Harness
engineering.

Making AI-assisted development actually reliable.

The problem

Better prompts won't save you.

  • Complex task, inconsistent output.
  • More instructions, more drift.
  • Bigger model, same failure mode.
The fix

Harness = scaffolding around the model.

01 — Context

What it sees.

AGENTS.md hierarchy, path-scoped rules, generated docs injected on demand.

02 — Enforcement

What it can't skip.

Hooks that run code, not vibes. Verification as a hard gate, not a prompt.

03 — Determinism

What it shouldn't think about.

If inputs map to outputs, ship a CLI. The agent invokes — it doesn't reinvent.

04 — Dispatch

Who helps it.

One orchestrator, specialist sub-agents, parallel models, budgeted edits.

Before we dive in

Run it remote. Not on your laptop.

  • Disposable — agents can't wreck your machine.
  • Reproducible — Dockerfile + JSON config lock the env.
  • Parallel — many agent sessions, no collisions.
  • Powerful — throw a 32-core VM at it when you want.

A devcontainer is Docker + a small spec. VS Code, Codespaces, or any SSH target opens it.
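A minimal sketch of such a spec. The image, host requirements, and setup command here are illustrative assumptions, not taken from the talk:

```jsonc
// .devcontainer/devcontainer.json (JSONC) — hypothetical minimal spec
{
  "name": "agent-sandbox",
  "build": { "dockerfile": "Dockerfile" },   // env locked by the Dockerfile
  "hostRequirements": { "cpus": 32 },        // ask the host for the big VM
  "postCreateCommand": "bun install"         // reproducible setup on every open
}
```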

Demo 01

Context in action.

  • AGENTS.md hierarchy
  • .claude/rules/*
  • rules-injector hook

code.claude.com/docs/en/hooks-guide
Pillar 02 — Enforcement

Hooks run code, not vibes.

Prompts apply only when the model decides to apply them. Hooks fire every time. That's the difference between suggestion and guarantee.
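Concretely, a hook is registered as configuration the runtime executes on every matching event. A sketch of a PreToolUse hook in .claude/settings.json, where the matcher and command are assumptions chosen for illustration:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "bun run check:architecture" }
        ]
      }
    ]
  }
}
```

Every Edit or Write tool call now triggers the check, whether or not the model remembers it exists.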

Hooks — surface

Every meaningful moment is a hook point.

Claude Code hook lifecycle
Hooks — resolution

You target with surgical precision.

How a PreToolUse hook resolves
Prompts suggest.
Hooks guarantee.
Pillar 03 — Determinism

If it's deterministic, don't ask the LLM.

  • Same inputs → same outputs → ship a CLI.
  • The agent invokes. It doesn't reinvent.
  • Cheap, fast, testable, diffable. No token tax.
  • check:architecture
  • generate:skills-inventory
  • rt-debug
  • investigate
  • agent-browser
  • bun run verify
code.claude.com/docs/en/skills
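The first bullet above, same inputs to same outputs, can be sketched as a tiny check the agent invokes instead of reasoning its way through. All names here are hypothetical:

```typescript
// check-imports.ts — hypothetical deterministic check: same input, same output.
// The agent runs `bun check-imports.ts <file>` instead of judging layering
// rules itself.
const FORBIDDEN: Record<string, string[]> = {
  // UI code must not import from the database layer.
  "src/ui/": ["src/db/"],
};

export function violations(file: string, imports: string[]): string[] {
  const out: string[] = [];
  for (const [scope, banned] of Object.entries(FORBIDDEN)) {
    if (!file.startsWith(scope)) continue;
    for (const imp of imports) {
      if (banned.some((b) => imp.startsWith(b))) {
        out.push(`${file}: forbidden import ${imp}`);
      }
    }
  }
  return out;
}
```

Because the rule table is plain data, the check is cheap to run, trivial to unit-test, and its output diffs cleanly in CI.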
Trusted workflows

Skills turn prompts into procedures.

  • Rigid workflow — same steps, every time.
  • Verification built in — the skill ends when it's proven.
  • Dispatches sub-agents — the skill decides when to fan out.
  • Versioned in the repo: .claude/skills/*/SKILL.md.

Recipes, not suggestions. The skill enforces the steps.
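A skill file under .claude/skills/ might look like this. The name, frontmatter fields, and steps are invented for illustration:

```markdown
---
name: watch-pr
description: Poll an open PR until checks pass, then report.
---

1. Fetch the PR's check status.
2. If a check failed, read its logs and propose a minimal fix.
3. Run `bun run verify`; the skill only ends when it passes.
4. If the failure spans several services, dispatch one sub-agent per service.
```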

Demo 02

/investigate in action.

Sentry · spans · logs · Postgres → parallel hypotheses → one root-cause report
Pillar 04 — Dispatch

One orchestrator. Many specialists.

  • Claude Code runs as the orchestrator.
  • Codex · Gemini · Claude dispatched in parallel.
  • .subagents/ — pessimist, optimist, risk-analyst.
  • Budget system — N direct edits per sub-agent dispatch.
code.claude.com/docs/en/scheduled-tasks
The time axis

Loops: agents that don't sleep.

  • Wrap any skill on an interval — /loop 3m /watch-pr
  • Polls, retries, audits — without you in the seat.
  • Scheduled tasks turn Claude from REPL into a daemon.

Leave it running. Come back to results.
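The wrapper itself is simple. A hypothetical sketch of the interval logic, bounded here so it terminates, unlike a real always-on loop:

```typescript
// loop.ts — hypothetical sketch of a /loop-style wrapper: re-run a task
// on a fixed interval, a bounded number of times for demonstration.
export async function loop(
  task: () => Promise<void>,
  intervalMs: number,
  times: number,
): Promise<void> {
  for (let i = 0; i < times; i++) {
    await task(); // e.g. invoke the /watch-pr skill non-interactively
    if (i < times - 1) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}
```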

The closed loop

Nothing is "done" until verified.

  • Edit → verify-reminder → escalating nudge.
  • Finish → verify-stop → hard block.
  • Skill → adherence budget → block on drift.

Self-correcting, not drifting.
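The hard block can be as small as a Stop hook that exits non-zero when verification fails. A sketch, assuming the convention that a hook's non-zero exit blocks the action and its stderr is fed back to the agent:

```shell
# verify_gate — hypothetical Stop-hook body. Runs the given verify command;
# returns 2 (block, with a message on stderr for the agent) when it fails,
# 0 (allow the stop) when it passes.
verify_gate() {
  if ! "$@"; then
    echo "Verification failed; keep working until it passes." >&2
    return 2
  fi
}
```

In a real hook script this would wrap something like `bun run verify` and end with `exit`, so the exit code reaches the runtime.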

If you're building on agents

Four rules.

  • 01 — Hook what matters. Don't trust prompts.
  • 02 — Ship a CLI for deterministic work. Don't make the LLM think.
  • 03 — Make verification non-skippable. A gate, not a suggestion.
  • 04 — Orchestrate models. Don't pick one.

Questions & pushback welcome.

Yuriy Babyak
Co-founder, Research.tech
Scan to connect