RightLayout: Shipping a Mac AI Tool, Then Letting Go
Why I trained a small CoreML model from scratch for a Mac keyboard-layout corrector, used it for months, and then open-sourced it instead of scaling it.
Notes
Notes on retrieval, evals, observability, and the engineering that starts once the demo is the easy part.
Why I leave the future to astrology and reach for reference classes, premortems, and calibration logs instead. Disciplined uncertainty in plain text.
Why I stopped applying the LinkedIn way and built a quiet service that does the grind for me, and now for you. Drop a résumé once, get a ranked daily shortlist with a one-line pitch hint per role.
Open-source deterministic orchestrator for parallel CLI coding agents. Runs Claude Code, Codex CLI, and Gemini CLI in parallel: zero coordination tokens, 37 adapters, janitor verification, git worktree isolation.
A short note from Israel on what repeated alarms do to attention, engineering judgment, and team habits, and which working practices make interruption easier to absorb.
A practical blueprint for legal QA, shaped in part by work around the Agentic RAG Legal Challenge: document identity, hybrid retrieval, structured answers, page-level grounding, telemetry, and evals.
A practical guide to LLM product safety: prompt injection, excessive agency, unsafe outputs, evals, and sober boundaries.
A practical memo on calm authority, visible product care, restrained motion, and why trustworthy interfaces feel expensive.
Repair loops, small diffs, test trust, and how to get CI back to green without trashing the codebase.
Practical guidance on tool contracts, context engineering, evals, approvals, and telemetry.
HyDE, query rewrite, decomposition, step-back prompting, and fusion for RAG: which query transformation technique fixes which retrieval failure, and when the extra latency pays off.
How to reduce hallucinations in LLM systems with better retrieval, abstention, verification, evals, and guardrails.
Chunking, titles, metadata, parent-child structure, reranking, and corpus QA for RAG systems.
How I use a lightweight spec-driven workflow in real projects, what SDDRush automates, and where Kotef fits if you want a stronger agent layer.
LLM evals for continuous delivery: turn production failures into automated tests, grade traces with task-specific graders, and block bad releases with eval-driven gates.
Prompt design now means response formats, examples, tools, and eval loops, not incantations.
How to make BI pages support decisions through narrative, visual hierarchy, and trust.
SYNAPSE was a 2025 framework for AI agents that adapt their own success criteria via multi-criteria decision-making (MCDM). The deterministic-control-plane idea later shipped as Bernstein.