Most agent loops in 2025 picked a fitness function once and held it constant. SYNAPSE was an attempt to put the decision about what "good" means for this run inside the loop.
The shape of the loop
Most agent frameworks in mid-2025 read the same way. A planner emits a step. An executor runs it. A scorer reports a number. The loop closes. The fitness function (the thing that decides whether the run is going well) is a constant. You set it at the start and live with it.
SYNAPSE tries to break that assumption. In real engineering work the right trade-off is not stable. A first pass cares about correctness. A hardening pass cares about safety and risk. A spike near a deadline cares about wall-clock time at the expense of maintainability. If the agent operates across all those phases, it has to be allowed to revise the criteria, explicitly, legibly, with a record.
The loop has five steps. Generate a candidate. Validate it against quality gates. Score the result against the current metric profile. Adjust the profile if the scenario warrants. Pick the next move. The first and last steps touch an LLM. The middle three are deterministic Python.
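The five steps can be compressed into a toy loop. Everything here is a hedged sketch: the MCDM7 axis names come from this post, but the gate threshold, the weight-shift rule, and the greedy selection are invented stand-ins for the LLM and the real scorer.

```python
import random

MCDM7 = ["PerfGain", "SecRisk", "DevTime", "Maintainability",
         "Cost", "Scalability", "DX"]
COSTS = {"SecRisk", "DevTime", "Cost"}  # lower is better on these axes

def generate(rng):
    # Step 1 (an LLM in the real loop): propose a candidate as scores in [0, 1].
    return {c: rng.random() for c in MCDM7}

def validate(candidate):
    # Step 2: deterministic quality gate; reject candidates with extreme risk.
    return candidate["SecRisk"] < 0.9

def score(candidate, weights):
    # Step 3: weighted sum against the current profile, inverting cost axes.
    return sum(w * ((1 - candidate[c]) if c in COSTS else candidate[c])
               for c, w in weights.items())

def adjust(weights, candidate):
    # Step 4: if risk is creeping up, shift weight toward SecRisk, renormalise.
    if candidate["SecRisk"] > 0.6:
        weights = dict(weights)
        weights["SecRisk"] += 0.05
        total = sum(weights.values())
        weights = {c: w / total for c, w in weights.items()}
    return weights

def run_loop(seed=0, steps=20):
    rng = random.Random(seed)  # deterministic: the whole run replays from a seed
    weights = {c: 1 / len(MCDM7) for c in MCDM7}
    best, best_score = None, float("-inf")
    for _ in range(steps):
        cand = generate(rng)              # step 1
        if not validate(cand):            # step 2
            continue
        s = score(cand, weights)          # step 3
        weights = adjust(weights, cand)   # step 4
        if s > best_score:                # step 5 (greedy stand-in)
            best, best_score = cand, s
    return best, best_score, weights
```

Note that `adjust` runs after scoring, so a profile change affects the next candidate rather than the one that triggered it, which matches "re-evaluates the next candidate against the new weights" below.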
The novel piece is step four. The agent reads the scenario, picks a metric profile (lean into safety because the corridor is noisy; lean into time because the deadline is hard), and re-evaluates the next candidate against the new weights. The MCDM7 vocabulary (PerfGain, SecRisk, DevTime, Maintainability, Cost, Scalability, DX) gives the profile a shape you can argue with. Weight vectors live in a config. Decision logs live in a file. Nothing hides in chat.
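As a concrete shape, a profile can be nothing more than a named weight vector over the seven axes, and a profile switch nothing more than an appended log record. The profile names, weights, and log fields below are illustrative, not the prototype's actual config:

```python
import json

# Two hypothetical profiles over the MCDM7 axes; each weight vector sums to 1.0.
PROFILES = {
    "safety_first": {"PerfGain": 0.10, "SecRisk": 0.35, "DevTime": 0.05,
                     "Maintainability": 0.15, "Cost": 0.10,
                     "Scalability": 0.10, "DX": 0.15},
    "deadline":     {"PerfGain": 0.15, "SecRisk": 0.05, "DevTime": 0.40,
                     "Maintainability": 0.05, "Cost": 0.15,
                     "Scalability": 0.10, "DX": 0.10},
}

decision_log = []  # the prototype keeps this in a file; a list stands in here

def switch_profile(current, target, why):
    # Every change of criteria leaves a legible record you can diff and replay.
    decision_log.append({"from": current, "to": target, "why": why})
    return target

active = "deadline"
active = switch_profile(active, "safety_first", "corridor is noisy under wind")
print(json.dumps(decision_log[-1], sort_keys=True))
```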
The synthetic experiment
The conceptual loop above asks for a much bigger evaluation harness than I built. What actually shipped is a single proof-of-concept run: a continuous 2D pathfinding problem under dynamic wind. Two agents try to move a simulated drone from a start point to a goal under conflicting pressures — time, energy, safety margin, payload integrity.
- StaticAgent uses a fixed weight vector across the whole run.
- SYNAPSEAgent reads the scenario, picks a metric profile (here: lean into safety because wind makes the corridor noisy), and re-evaluates each step.
The question was narrow: under one adversarial scenario, does adapting the criteria actually change the chosen path in a measurable way?
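The mechanism under test reduces to a toy decision: a short cut close to an obstacle versus a slower detour with margin. All numbers and weight vectors below are invented for illustration; only the mechanism, a re-weighted argmax, mirrors what the two agents do.

```python
# Candidate moves scored on time and energy (lower is better) and safety
# margin (higher is better); the utility negates the cost terms.
candidates = {
    "cut_through_corridor": {"time": 2.0, "energy": 3.0, "margin": 0.2},
    "detour_with_margin":   {"time": 3.5, "energy": 2.2, "margin": 1.5},
}

def utility(move, w):
    m = candidates[move]
    return (-w["time"] * m["time"]
            - w["energy"] * m["energy"]
            + w["margin"] * m["margin"])

static_w   = {"time": 0.5, "energy": 0.3, "margin": 0.2}  # fixed all run
adaptive_w = {"time": 0.2, "energy": 0.3, "margin": 0.5}  # re-weighted: wind is up

def pick(w):
    return max(candidates, key=lambda mv: utility(mv, w))

print(pick(static_w))    # the static agent takes the risky cut
print(pick(adaptive_w))  # the adaptive agent pays time for margin
```

Same candidates, different weights, different path: that flip is the entire hypothesis of the experiment.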
| Agent | Energy | Safety (lower = safer) | Time | Path found |
|---|---|---|---|---|
| StaticAgent | 170.28 | 3.97 | 59.71 s | yes |
| SYNAPSEAgent | 122.32 | 1.24 | 61.50 s | yes |
SYNAPSEAgent used ~28% less energy and scored ~3.2× better on the safety metric (1.24 vs 3.97), with a ~3% time penalty. The CSV is committed verbatim at results/experiment_results_20250708_225100.csv. Nothing has been smoothed.
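The quoted deltas follow directly from the table:

```python
static  = {"energy": 170.28, "safety": 3.97, "time": 59.71}
synapse = {"energy": 122.32, "safety": 1.24, "time": 61.50}

energy_saving = 1 - synapse["energy"] / static["energy"]  # ~0.28, i.e. ~28% less
safety_ratio  = static["safety"] / synapse["safety"]      # ~3.2x better
time_penalty  = synapse["time"] / static["time"] - 1      # ~0.03, i.e. ~3% slower
```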
What worked
A few things held up better than I expected.
The architecture pattern. Separating WHAT the agent optimises for from HOW it executes is a clean cut. Once the metric profile is a first-class object (a dict in a YAML file, not a sentence in a prompt), the agent stops arguing with itself about whether to favour speed or safety. It picks a profile, executes against it, and the next adjustment is visible in a diff.
The vocabulary. MCDM7 is opinionated enough to be useful and small enough to remember. It maps onto the trade-offs senior engineers already negotiate verbally. Making them explicit is the move.
Deterministic where possible. The orchestrator burns no LLM tokens on its own scheduling. Every decision the control plane makes is reproducible from a config and a seed. That is a debuggability argument. When something misbehaves, you read the decision log instead of guessing what the model thought.
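The replayability claim can be stated as a property: the same config and seed produce the same decision log, byte for byte. A minimal sketch, with a hypothetical log format (JSON lines) rather than the prototype's actual one:

```python
import json
import random

def schedule(config, seed):
    """Deterministic control plane: a pure function of config and seed, no LLM."""
    rng = random.Random(seed)
    log = []
    for step in range(config["steps"]):
        log.append({"step": step, "action": rng.choice(config["actions"])})
    return log

config = {"steps": 5, "actions": ["generate", "validate", "score", "adjust"]}

run_a = schedule(config, seed=42)
run_b = schedule(config, seed=42)
assert run_a == run_b  # replay the run; read the log instead of guessing

# Plain JSON lines: diffable, greppable, committable next to the results CSV.
log_text = "\n".join(json.dumps(rec) for rec in run_a)
```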
What did not work
Some pieces were genuinely premature.
No real CLI agents to orchestrate. Pre-Claude-Code era. No mature agentic CLI that takes a brief, edits files, runs tests, and returns. SYNAPSE describes a control plane for tools that did not yet exist. The LlamaAdapter in the prototype is a thin wrapper around Ollama. It can generate a candidate. It cannot operate a repository.
No grounding layer. Pre-MCP. The agent has no standardised way to call into a filesystem, a build tool, a linter, or a database. Every adapter has to be hand-rolled. The cost of "give the agent a real tool" is high enough that the prototype only validates the loop on a closed simulation.
The experiment is too small to claim anything statistical. One scenario, one seed, no factorial design, no significance test. The numbers are real. The story they tell is small. The roadmap file in the repo notes this as the gap between "research preview" and something publishable.
What this became
SYNAPSE was the sketch. The shape it argued for matured across two follow-up projects.
Kotef — durable single-agent runner
Kotef took SYNAPSE's loop and put it on real repositories. A supervisor flow (planner → researcher → coder → verifier → janitor) runs against a real codebase, with durable state in .sdd/runtime/, MCP-grounded tools, resume by thread ID. Single-agent. The metric profile became a quality-gate config. The adaptive layer became a backlog-driven planner that re-derives priorities each tick. Kotef is the reason I trusted that the loop survived contact with file systems.
Bernstein — the deterministic control plane at scale
What Kotef was for one agent, Bernstein is for many. A deterministic Python scheduler decomposes a goal, dispatches short-lived agents into isolated git worktrees, verifies output through a janitor, commits what survives. The LLM writes code. The orchestrator decides what runs and what merges. The decision log is a directory of plain files. The vocabulary changed (tasks, adapters, budget caps, MCP integration). The shape (generate, validate, score, adjust criteria, choose next move) is the same five-step loop SYNAPSE drew on the whiteboard. Kotef's lessons about durable state and backlog-driven planning landed there too.
The repo stays public because the lineage is more honest than the polish. SYNAPSE is a small piece of evidence that the loop works on the easy case. The production systems that came later are the real argument. This one is the index card pinned above the desk.
The orchestrator is the product. It is allowed to change its mind about what counts as success. The condition is that every such change stays legible and replayable, and that a human can override it.
Repositories
- SYNAPSE on GitHub — the 2025 prototype
- Kotef on GitHub — durable single-agent runner that came next
- Bernstein on GitHub — multi-agent control plane shipped from the same DNA
Related reading
- I ran 12 AI agents for 47 hours
- Building agentic AI systems that hold up
- Spec-driven development: the workflow I actually use
- Getting AI-assisted development to green
Further reading
- Hwang & Yoon, Multiple Attribute Decision Making (1981) — the TOPSIS lineage that runs through every adaptive-metric routine here.
- Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., 2018) — the policy-iteration framing for the outer loop.
- Brooks, The Mythical Man-Month (1975) — conceptual integrity as the engineer's first job.
— Alex Chernysh, alexchernysh.com