Alex Chernysh · Agentic behaviorist · Tel Aviv

Prompt Engineering: From Phrasing to Policy

Prompt design now means response formats, examples, tools, and eval loops, not incantations.

January 29, 2026 · 4 min read
Agents · Prompting

Prompt engineering used to get treated like copywriting on caffeine. In practice it is closer to policy design. Define the task, the boundaries, the pattern. Check the result against reality. The magic phrase never showed up.

Why this shifted

OpenAI, Anthropic, and Google ended up at roughly the same lesson. Prompts matter, but only inside a wider system that includes tools, context, examples, and evaluation.

Prompting is one layer
Output quality usually comes from several layers working together.

What actually matters

  • the instruction layer defines role and boundaries
  • examples compress desired behavior faster than prose
  • retrieved context decides what can be said truthfully
  • response formats decide what can be checked
  • evals decide whether the prompt change was worth it

The prompt is not the product

In a modern system the prompt sits among several layers: system instruction, user task framing, retrieved context, examples, tool configuration, output schema, evals and downstream checks.

That is why prompt work feels less like wordsmithing now. It is closer to operating a multi-layer interface contract.

Prompt as phrasing

The team tweaks wording until the answer sounds better.

  • style dominates the discussion
  • examples are treated as optional
  • output shape is vague
  • regressions are discovered late

Prompt as policy

The prompt becomes one governed layer in a larger system.

  • instructions are narrow and explicit
  • examples show the target behavior
  • schemas keep outputs machine-checkable
  • evals decide whether the change helped

Clear instructions still win

The most durable advice in this field is also the least exciting. Make the instructions clear and specific.

The model should know its role, the task, the constraints, the shape of the output, and what to do when the task is underspecified. Clarity outlives cleverness.
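
The checklist above can be made concrete as a small structured spec that refuses to render unless every field is present. This is a minimal sketch; the `InstructionSpec` class and its field names are illustrative, not from any library.

```python
# A sketch of a clear-instruction spec: role, task, constraints, output
# shape, and an explicit fallback for underspecified tasks. All names here
# are hypothetical illustrations of the checklist in the text.
from dataclasses import dataclass


@dataclass
class InstructionSpec:
    role: str
    task: str
    constraints: list[str]
    output_shape: str
    when_underspecified: str

    def render(self) -> str:
        # Every field is mandatory by construction, so no layer gets skipped.
        lines = [
            f"Role: {self.role}",
            f"Task: {self.task}",
            "Constraints:",
            *[f"- {c}" for c in self.constraints],
            f"Output: {self.output_shape}",
            f"If the task is underspecified: {self.when_underspecified}",
        ]
        return "\n".join(lines)


spec = InstructionSpec(
    role="Support triage assistant",
    task="Classify the ticket and draft a one-line summary.",
    constraints=["Quote only the ticket text", "No speculation about intent"],
    output_shape="JSON with keys: category, summary",
    when_underspecified="Ask one clarifying question instead of guessing.",
)
print(spec.render())
```

The point is not the class itself but the forcing function: a prompt that cannot be assembled without a fallback clause never ships without one.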

Few-shot examples beat another paragraph

Strong examples compress the desired behavior into something the model can see. They show the format, the brevity level, the refusal style, the citation posture, the style boundaries.

If the examples are good, some of the instruction prose can shrink.
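
One common way to pack few-shot examples is as alternating chat turns ahead of the real question. A sketch under that assumption; the message-dict shape mirrors the pattern most provider APIs use, but adapt it to your actual client, and the example pairs are invented.

```python
# Few-shot examples as prior chat turns: each (question, ideal answer) pair
# demonstrates format, brevity, and refusal style before the real task.
def build_messages(system: str, examples: list[tuple[str, str]], user: str):
    messages = [{"role": "system", "content": system}]
    for question, ideal in examples:
        # Each pair shows the target behavior instead of describing it.
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": ideal})
    messages.append({"role": "user", "content": user})
    return messages


msgs = build_messages(
    system="Answer in one sentence. Cite the doc id or say 'no source'.",
    examples=[
        ("What is the SLA?", "99.9% monthly uptime [doc:sla-2]."),
        ("Who founded the company?", "No source in the provided context."),
    ],
    user="What regions are supported?",
)
print(len(msgs))  # system + two example pairs + the live question
```

Note that the second example demonstrates the refusal style, which is exactly the kind of behavior that is tedious to specify in prose.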

Structure beats vibes

Anthropic's prompt-engineering docs keep recommending the same boring set of techniques. Clarity. Examples. Explicit structure. Role prompting. Thinking. Prompt chaining where it actually helps.

I prefer prompts that keep each layer visible.

<role>You are a grounded assistant for production AI operations.</role>
<constraints>
- Use only retrieved context for factual claims.
- If support is missing, say so directly.
</constraints>
<context>[retrieved evidence]</context>
<task>[user question]</task>
<output_format>[exact shape]</output_format>

The model does not become perfect. The failure does become easier to diagnose.
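
Keeping each layer visible also means assembling the template from separate variables rather than one hand-edited string. A sketch that fills the tags shown above, assuming nothing beyond plain string formatting:

```python
# Assemble the layered prompt from its parts so each layer stays a
# separate, inspectable, and independently testable variable.
def render_prompt(role, constraints, context, task, output_format):
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"<role>{role}</role>\n"
        f"<constraints>\n{constraint_lines}\n</constraints>\n"
        f"<context>{context}</context>\n"
        f"<task>{task}</task>\n"
        f"<output_format>{output_format}</output_format>"
    )


prompt = render_prompt(
    role="You are a grounded assistant for production AI operations.",
    constraints=[
        "Use only retrieved context for factual claims.",
        "If support is missing, say so directly.",
    ],
    context="[retrieved evidence]",
    task="[user question]",
    output_format="[exact shape]",
)
print(prompt.splitlines()[0])
```

When the constraints live in a list, a diff on a prompt change reads like a diff on code, which is where the rest of this note is headed.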

Tools and response formats matter as much as wording

A lot of the prompt's quality is decided outside the sentence layer.

In real systems the difference comes from which tools are available, how narrowly their contracts are defined, whether the output schema is strict enough to validate against, whether the model is allowed to abstain.

That is why prompt conversations spill over into product requirements and interface design. An elegant prompt sitting on loose tools and a vague output shape behaves loosely.
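
"Strict enough to validate against" can be as small as this: the output must parse, carry exactly the expected keys, and pick from a closed category set that includes an explicit abstention. The key names and the abstain convention are assumptions for illustration, not a standard.

```python
# A minimal output-contract check: valid JSON, exact keys, closed category
# set. "abstain" is a first-class answer, so the model is allowed to decline.
import json

ALLOWED_CATEGORIES = {"billing", "bug", "feature", "abstain"}


def check_output(raw: str) -> tuple[bool, str]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if set(data) != {"category", "rationale"}:
        return False, "unexpected keys"
    if data["category"] not in ALLOWED_CATEGORIES:
        return False, f"unknown category {data['category']!r}"
    return True, "ok"


ok, why = check_output('{"category": "abstain", "rationale": "no evidence"}')
print(ok, why)
bad, why2 = check_output('{"category": "other", "rationale": ""}')
print(bad, why2)
```

The check is trivial, but it converts "the answer looks off" into a named failure the pipeline can count.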

Query transformation is retrieval design

Every retrieval or routing improvement gets called a "prompt breakthrough" sooner or later. Usually a category mistake.

Rewrite, decomposition, step-back prompting, and similar techniques can help retrieval. They are search-control moves with a latency cost. Evaluate them as such. Keep them when they earn their place. Drop the mythology.
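
Treating these as search-control moves means counting their cost explicitly. A sketch where each transform pass is tallied as one extra model call; the `rewrite` and `decompose` functions are cheap stand-ins for what would be LLM calls in a real system.

```python
# Query transforms as an explicit pipeline with a per-pass cost counter.
# Both transform functions are toy stand-ins for real LLM-backed rewriters.
def rewrite(q: str) -> list[str]:
    # Stand-in for a query-rewrite call: normalize casing and punctuation.
    return [q.lower().rstrip("?")]


def decompose(q: str) -> list[str]:
    # Stand-in for decomposition: split a compound question on "and".
    return [part.strip() for part in q.split(" and ")]


def plan_queries(q: str, transforms) -> tuple[list[str], int]:
    queries, extra_calls = [q], 0
    for t in transforms:
        queries = [sub for query in queries for sub in t(query)]
        extra_calls += 1  # the latency cost the text warns about
    return queries, extra_calls


queries, cost = plan_queries(
    "What changed in pricing and when does it apply?", [rewrite, decompose]
)
print(queries, cost)
```

With the cost surfaced as a number, "does this transform earn its place" becomes a measurable question rather than a vibe.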

Prompt changes evaluated like code changes

OpenAI's current eval guidance lands at one consequence. If the output matters, prompt changes belong in the same measurement culture as code changes.

A material prompt change should answer four questions. What behavior is it supposed to improve? Which eval set should move? What new failure mode could appear? What does "worse" look like now?

If none of those are measurable, the team is arguing about taste.
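
The measurement culture can start as small as this: a labeled eval set, one metric, and a gate that blocks the change unless the metric moves without introducing new failures. A toy sketch; `old_model` and `new_model` stand in for calls with the old and new prompt.

```python
# A toy eval gate for prompt changes: accuracy must not drop, and no case
# that previously passed may start failing. Models here are stand-ins.
def run_eval(model_fn, cases):
    failures = [c["input"] for c in cases if model_fn(c["input"]) != c["expected"]]
    return {"accuracy": 1 - len(failures) / len(cases), "failures": failures}


def gate(old, new):
    # A new failure on a previously-passing case is a regression, full stop.
    regressions = set(new["failures"]) - set(old["failures"])
    return new["accuracy"] >= old["accuracy"] and not regressions


cases = [
    {"input": "2+2", "expected": "4"},
    {"input": "3+3", "expected": "6"},
]
old_model = lambda q: "4"  # v1 prompt: always answers "4"
new_model = lambda q: str(sum(int(x) for x in q.split("+")))  # v2: actually adds
old, new = run_eval(old_model, cases), run_eval(new_model, cases)
print(gate(old, new))
```

Tracking `failures` by input, not just the aggregate score, is what answers the third question: it shows which new failure mode appeared, not merely that one did.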

A good prompt does four jobs at once

It narrows the task. It narrows the output shape. It narrows the model's freedom under uncertainty. It leaves enough room for the useful part of the work.

Skip one and the system drifts. Skip two and you might as well have left the field blank.

Related reading

  • Which query transformation techniques actually help RAG?
  • How to run LLM evals in production

Further reading

  • Anthropic prompt engineering overview
  • OpenAI: Evaluation best practices
  • OpenAI: Agents guide
  • Google Gemini prompting strategies

