The previous post covered how we structured the codebase: Effect conventions, ast-grep enforcement, Drift, and CLAUDE.md to collaborate with Claude Code. You describe what you want, review the output, iterate. That works well. This post is about what happens when you step away entirely: giving the agent a list of issues and letting it work through them while you do something else. In autonomous mode, there’s no mid-session correction. Anything the enforcement layer doesn’t catch compounds across commits, which is why clean issue tracking, issue review, and end-of-session QA matter more, not less.

Otter is an opinionated monorepo template that includes the enforcement layer (Effect-ts, ast-grep rules and Drift), plus fp issue tracking, lifecycle extensions, and agent skills for autonomous work.

fp: Issue Tracking Built for Agents

Otter uses fp for task tracking. It’s local-first: issues live as markdown files in .fp/ and are committed alongside the code.

What makes fp work for agents is the lifecycle it enforces. FP_AGENTS.md defines the full work session flow, and Claude Code loads it as part of project instructions:

fp issue list --status todo                  # find available work
fp issue update --status in-progress <id>    # claim before starting
# ... work and commit ...
fp comment <id> "progress note"              # log at milestones
fp issue assign <id> --rev <commit>          # attach commits to the issue
fp issue update --status done <id>           # mark complete

The agent claims a task before starting, logs as it goes, and closes with the commits attached. You can come back hours later and reconstruct what happened from the issue’s history alone.

fp Extensions

fp supports TypeScript hooks that fire on lifecycle events. Otter ships with three.

auto-done

When all children of a parent issue are marked done, the parent closes automatically. No loose epics left open after the work is finished.

check-before-done

Before the agent can mark an issue done, it must run bun run check. If the check fails, the transition is blocked. This came from Anthropic’s long-running agents post: verify codebase health at the start and end of every task, or you’ll compound problems across sessions. An agent that leaves broken code behind makes the next agent’s job harder.
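
A sketch of how such a hook might be shaped. The event name, payload, and return type here are assumptions, not fp’s documented API; the point is that the transition to done gets intercepted and refused when the check fails:

// Hypothetical hook shape -- fp's real extension API may differ.
import { execSync } from "node:child_process"

export default {
  event: "issue:status-change",                        // assumed event name
  run(change: { issueId: string; nextStatus: string }) {
    if (change.nextStatus !== "done") return { allow: true }
    try {
      execSync("bun run check", { stdio: "inherit" })  // same gate the agent runs manually
      return { allow: true }
    } catch {
      return { allow: false, reason: "bun run check failed; fix the codebase before closing the issue" }
    }
  },
}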

update-docs

Transitioning an issue to done triggers a reminder to update docs and re-stamp any drift anchors on changed code.

These hooks work the same way ast-grep does: they fire at the moment they’re relevant. CLAUDE.md tells the agent the conventions at session start; the hooks surface them again at the decision point, when the original instructions have long since scrolled out of context.

Writing Good Issues

Before handing off to an autonomous agent, there’s one thing worth doing carefully: writing the issues.

An issue the agent can actually work from has a clear title, enough context that it doesn’t have to guess at intent, and subissues for any separable concerns:

fp issue create --title "Add rate limiting to the GitHub adapter"
fp issue create --title "Validate GitHub adapter request schema" --parent 12
fp issue create --title "Return typed RateLimitError on 429 responses" --parent 12
fp issue create --title "Log retry attempts with Effect.logWarning" --parent 12

The agent walks the tree with fp tree, picks up subissues in order, marks them done as it goes.
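
The subissues above are separable because each maps onto a distinct, small piece of Effect code. As a rough sketch of what the second and third might produce (the RateLimitError shape, callGitHub, and the backoff policy are all made up for illustration; tapError, logWarning, retry, and Schedule are real Effect APIs):

import { Data, Effect, Schedule } from "effect"

// Subissue: typed error for 429 responses (hypothetical shape).
class RateLimitError extends Data.TaggedError("RateLimitError")<{
  readonly retryAfterMs: number
}> {}

// Stand-in for the GitHub adapter call the issues are about.
declare const callGitHub: Effect.Effect<unknown, RateLimitError>

// Subissue: log each retry attempt, then retry rate-limit failures with backoff.
const withRateLimiting = callGitHub.pipe(
  Effect.tapError((e) =>
    Effect.logWarning(`GitHub rate limited, retrying in ${e.retryAfterMs}ms`)
  ),
  Effect.retry({
    schedule: Schedule.exponential("200 millis"),
    while: (e) => e._tag === "RateLimitError",
  })
)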

QA Scenarios and Drift

Before handing off, it’s worth setting up QA scenarios alongside the issues. These are markdown files that describe how to test the feature: what commands to run and what output to expect. The agent runs them after completing work to confirm the implementation actually functions.

The problem is the same one Drift was built for: as code changes across sessions, the scenarios go stale. An agent renames a command, and the scenario is still testing the old one. Unlike broken types, there’s no compile error to catch it.

Anchoring QA scenarios with Drift solves this. When the code a scenario references changes, drift lint fails. This is the same gate as ast-grep. The agent can’t mark the task done without addressing it.

One detail worth knowing: drift’s --doc-is-still-accurate flag replaces --force. Before relinking an anchor, the agent has to explicitly confirm it has read both the scenario and the changed code. Without this, agents tend to relink blindly: the lint error clears, but the scenario still describes behavior that no longer exists.

Commit Before Context Compaction

FP_AGENTS.md has a rule that sounds minor but matters a lot in practice: commit before context compaction.

When a Claude Code session runs long, the context window compresses prior conversation history. Work that hasn’t been committed by then exists only as uncommitted changes on disk. The commit rule is an insurance policy against partial sessions.

In practice: commit after each subissue closes, commit after each meaningful milestone within one, and leave an fp comment before ending the session. Every commit is tied to an issue, and nothing is lost if a session terminates mid-task.

What bun run check Catches

In interactive mode, you catch problems as they appear. In autonomous mode, bun run check is that feedback loop. The agent runs it after every change and uses the output to self-correct before marking a task done.

bun run check

Which runs:

  1. oxlint: standard TypeScript issues
  2. oxfmt: formatting (2-space, 100-char, double quotes, sorted imports)
  3. tsgo: type checking with the native TypeScript compiler
  4. ast-grep scan: the Effect-specific architectural rules
  5. drift lint: stale documentation specs

If any of these fail, the agent reads the error, fixes it, and re-runs. The rules are specific enough that most failures are self-correcting: no-silent-catch fires, the agent adds the log before the catchAll, the check passes.
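
As a concrete picture of that loop, here is roughly what a no-silent-catch fix looks like. catchAll, tapError, and logWarning are real Effect combinators; fetchUser and FetchError are made-up names for illustration:

import { Effect } from "effect"

class FetchError {
  readonly _tag = "FetchError"
  constructor(readonly message: string) {}
}

// Made-up effect that can fail; stands in for whatever the rule flagged.
declare const fetchUser: (id: string) => Effect.Effect<{ name: string }, FetchError>

// Before: the failure is swallowed with no trace -- no-silent-catch fires here.
const silent = fetchUser("42").pipe(
  Effect.catchAll(() => Effect.succeed(null))
)

// After: the agent adds a log before the catchAll and the check passes.
const logged = fetchUser("42").pipe(
  Effect.tapError((e) => Effect.logWarning(`fetchUser failed: ${e.message}`)),
  Effect.catchAll(() => Effect.succeed(null))
)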

The only thing that isn’t automated is you reviewing the diff before merging. That review is much more targeted because the mechanical stuff was already caught.

Debugging What Happened

When something goes wrong in an autonomous session, you need to reconstruct what happened without replaying the conversation.

fp comments are a breadcrumb trail. The agent logs at every milestone. If an issue ended in-progress instead of done, the comments tell you exactly where it got stuck.

Effect traces are execution evidence. Run with EFFECT_TRACE=1 and the span output is structured JSON on stdout. The agent can grep it to verify behavior without you being there:

EFFECT_TRACE=1 bun run my-app 2>&1 | grep '"name":"fetchUser"'

Spans include duration, parent spans, and any logs emitted inside them. A span with "status":"ERROR" tells the agent something failed and where to look. That’s a lot more actionable than a stack trace.
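
For context, getting a function into that span output is a single withSpan call. withSpan is Effect’s real tracing combinator; the fetchUser name matches the grep above, and the URL is invented:

import { Effect } from "effect"

// Wraps the work in a span named "fetchUser" so it shows up in the trace output,
// with the user id attached as a span attribute.
const fetchUser = (id: string) =>
  Effect.tryPromise(() =>
    fetch(`https://api.example.com/users/${id}`).then((res) => res.json())
  ).pipe(Effect.withSpan("fetchUser", { attributes: { userId: id } }))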

Where Orchestration Gets Hard

The fp workflow handles a single agent working through a task list sequentially. Where it breaks down is dependent tasks: when one piece of work can’t start until another finishes. Without explicit dependencies, an agent will pick up tasks in the wrong order, or an orchestrator needs to be smart enough to reason about sequencing on its own. Neither is solved out of the box. If you’re building toward parallel agents working the same codebase, that’s the problem you’ll hit first. We’ll cover task management for parallel agents in a future article.

What Changes in Practice

The enforcement layer from the first post is the foundation. Without it, autonomous sessions accumulate problems faster than they can be caught. But enforcement alone isn’t enough. The task lifecycle, QA scenarios, and commit discipline are what make it possible to hand off a session and come back to something coherent.

The agent isn’t doing more. The difference is that the work is tracked, the verification runs automatically, and the audit trail is there when you come back. Otter packages all of this into a template so the scaffolding is in place before the first commit.