Alex Chernysh · Agentic behaviorist · Tel Aviv

Spec-Driven Development: the workflow I actually use

How I use a lightweight spec-driven workflow in real projects, what SDDRush automates, and where Kotef fits if you want a stronger agent layer.

February 6, 2026·5 min read
Agents · Workflow

I use a lightweight spec-driven workflow in real projects. The point is simple: move intent, research, architecture, and tickets out of chat memory and into files. Over time I packaged the repeatable parts into SDDRush.

The loop in one pass

  • write the brief in plain language
  • gather only the research that changes decisions
  • make architecture explicit before implementation starts
  • sync the backlog into durable tickets
  • implement against the ticket, then clean up the state

Figure: Compact SDD view. The shape is intentionally small: pin down the work, turn it into tickets, then implement and clean up.

What it solves

The failure mode is old and still common. The plan lives in chat.

Looks fast for a day or two. Then the task spans another session, another engineer, or another agent, and the shape of the work starts to drift.

What breaks first:

  • The original intent gets watered down.
  • Architecture gets re-decided in fragments.
  • Tickets stop matching the real plan.
  • Handoff depends on memory and optimism.

That is the part I want to remove.

What SDD looks like in practice

The workflow is compact:

  1. write a project brief
  2. gather focused research
  3. make architecture decisions explicit
  4. sync the backlog into concrete tickets
  5. implement against the open ticket
  6. clean up status and close what is actually done

Nothing exotic. That is the point. I am not trying to invent a new religion for software delivery. I want a repo trail that survives more than one conversation.

Practical threshold

If a task has enough ambiguity that you would need to explain your reasoning twice, it usually deserves a short spec trail once.

What SDDRush gives you

SDDRush is the small toolkit I built around this workflow. It handles the parts I got tired of recreating by hand.

  • sdd-init creates the .sdd workspace and the basic file structure.
  • sdd-prompts renders repo-aware prompts from the current project state.
  • sdd-backlog syncs and maintains backlog tickets.
  • sdd-status gives a quick read on what is still open and what has already moved.

Enough for a lot of work. If the repo matters, the task spans multiple sessions, and you do not want the project brief to dissolve into folklore, the toolkit is useful.
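
To make the "quick read" idea concrete, here is a minimal sketch of a status summary over ticket files. It is not how sdd-status is implemented. It assumes tickets are plain files grouped into one directory per state under a backlog directory; only backlog/open is named in this note, so any other state name, and the function name backlog_summary, are hypothetical.

```python
from pathlib import Path


def backlog_summary(backlog_dir):
    """Count ticket files per state directory (assumed layout, see lead-in)."""
    root = Path(backlog_dir)
    summary = {}
    # Each immediate subdirectory is treated as one ticket state.
    for state_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[state_dir.name] = sum(1 for f in state_dir.iterdir() if f.is_file())
    return summary
```

The point of the sketch is that status is derived from files in the repo, not from anyone's memory of the last conversation.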

Quick start

bash bin/sdd-init /path/to/project --stack "Python/FastAPI" --domain "legal"
python bin/sdd-prompts /path/to/project
python bin/sdd-backlog sync /path/to/project
python bin/sdd-status /path/to/project

Then fill the core .sdd files, implement against backlog/open, and run janitor/status once the repo evidence catches up.
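
If you run the quick start often, the four commands can be wrapped in a small script. This is a sketch under assumptions: it reuses the exact commands shown above, assumes it is run against a local SDDRush checkout (the toolkit_root parameter), and the function names are mine, not part of the toolkit.

```python
import subprocess


def quick_start_commands(project, stack, domain):
    """Return the four quick-start commands as argv lists, in order."""
    project = str(project)
    return [
        ["bash", "bin/sdd-init", project, "--stack", stack, "--domain", domain],
        ["python", "bin/sdd-prompts", project],
        ["python", "bin/sdd-backlog", "sync", project],
        ["python", "bin/sdd-status", project],
    ]


def run_quick_start(project, stack, domain, toolkit_root="."):
    """Run each command from the toolkit checkout and stop at the first failure."""
    for argv in quick_start_commands(project, stack, domain):
        print("$", " ".join(argv))
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(argv, cwd=toolkit_root, check=True)


# Example call (hypothetical paths):
# run_quick_start("/path/to/project", "Python/FastAPI", "legal", toolkit_root="/path/to/SDDRush")
```

Passing argv lists instead of a shell string avoids quoting problems with values like "Python/FastAPI", and stopping at the first failure keeps a half-initialized workspace from being silently synced.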

Where Kotef fits

If you want to push the same workflow further, Kotef is the agent layer I built around the same ideas. The more ambitious companion. Not the thing you need first.

Kotef can:

  • bootstrap SDD state for a repo
  • reason over tickets instead of freestyling from a naked prompt
  • run a planner → researcher → coder → verifier → janitor loop
  • keep runtime state, checkpoints, and resume paths for longer tasks
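
The planner → researcher → coder → verifier → janitor loop with checkpoints can be pictured roughly like this. It is a minimal sketch of the shape, not Kotef's actual implementation: the stage names come from this note, while run_loop, the handler signature, and the JSON checkpoint format are assumptions made for illustration.

```python
import json
from pathlib import Path

STAGES = ["planner", "researcher", "coder", "verifier", "janitor"]


def run_loop(ticket, handlers, checkpoint_path):
    """Run the stages in order, saving state after each so a rerun can resume."""
    path = Path(checkpoint_path)
    if path.exists():
        # Resume: load what earlier runs already finished.
        state = json.loads(path.read_text())
    else:
        state = {"ticket": ticket, "done": [], "notes": {}}
    for stage in STAGES:
        if stage in state["done"]:
            continue  # completed in an earlier run
        # Each handler sees the ticket and the notes left by earlier stages.
        state["notes"][stage] = handlers[stage](ticket, state["notes"])
        state["done"].append(stage)
        path.write_text(json.dumps(state, indent=2))
    return state
```

If a handler raises partway through, the checkpoint still lists the stages that finished, so the next call picks up at the failed stage instead of starting from a blank prompt.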

Useful when you want a durable coding and research agent for real repositories. Does not change the underlying point. The workflow needs structure before the agent deserves autonomy.

When this approach pays off

I reach for this setup when:

  • the task has more than one non-trivial decision
  • the repo context matters
  • the work will span more than one session
  • another engineer may need to audit the path later

I do not reach for it when the task is tiny and the overhead outweighs the value.

This is also the workflow I trust for benchmark-style work where grounding, latency, and telemetry matter at once. The Agentic RAG Legal Challenge is a current example, where the system is judged as a full pipeline rather than as a clever prompt.

What translated well there was discipline. Small diffs with visible lineage. Explicit gates for grounding and provenance. Branches that looked good but made the evidence trail worse got killed early.

What did not help: broadening context for its own sake, changing too many variables in one run, and trusting a cleaner-looking answer before the path behind it was stable.

What not to do

Common mistakes:

  • turning every small bugfix into a ceremony
  • starting with the agent while the task is still vague
  • treating prompt output as durable project memory
  • writing specs that sound complete but do not constrain implementation

The method should reduce guesswork. If it mainly produces prettier paperwork, something has gone wrong.

This workflow is useful because it makes intent easier to inspect, implementation easier to hand off, and backlog state harder to fake. SDDRush exists because I use this pattern enough to want the boring parts scaffolded. Kotef exists because sometimes I want to automate more of the same loop without throwing the structure away.

Resources

Repositories

  • SDDRush on GitHub
  • Kotef on GitHub
