Prompt engineering used to get treated like copywriting on caffeine. In practice it is closer to policy design. Define the task, the boundaries, the pattern. Check the result against reality. The magic phrase never showed up.
The prompt is not the product
In a modern system the prompt sits among several layers: system instruction, user task framing, retrieved context, examples, tool configuration, output schema, evals, and downstream checks.
That is why prompt work feels less like wordsmithing now. It is closer to operating a multi-layer interface contract.
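As a rough picture, the layers can be kept as separate fields and assembled into one request. A minimal sketch, with illustrative names and a generic chat-message shape rather than any particular SDK:

```python
# Minimal sketch of the layers a single request passes through.
# Field names and message shape are illustrative, not tied to a specific SDK.
from dataclasses import dataclass, field

@dataclass
class PromptLayers:
    system_instruction: str                  # role, boundaries, refusal policy
    task_framing: str                        # the user's request, restated as a task
    retrieved_context: list[str]             # evidence the model may rely on
    examples: list[dict] = field(default_factory=list)    # few-shot demonstrations
    tool_config: list[dict] = field(default_factory=list) # tool contracts, passed alongside the messages
    output_schema: dict | None = None        # shape that downstream checks validate against

def assemble(layers: PromptLayers) -> list[dict]:
    """Flatten the layers into a chat-style message list."""
    messages = [{"role": "system", "content": layers.system_instruction}]
    for ex in layers.examples:
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["output"]})
    context = "\n".join(layers.retrieved_context)
    messages.append({"role": "user", "content": f"<context>{context}</context>\n{layers.task_framing}"})
    return messages
```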
Prompt as phrasing
The team tweaks wording until the answer sounds better.
- style dominates the discussion
- examples are treated as optional
- output shape is vague
- regressions are discovered late
Prompt as policy
The prompt becomes one governed layer in a larger system.
- instructions are narrow and explicit
- examples show the target behaviour
- schemas keep outputs machine-checkable (sketched after this list)
- evals decide whether the change helped
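The "machine-checkable" part is the hinge. A sketch of what it means in practice, using pydantic for validation; the field names are illustrative:

```python
# Sketch of "machine-checkable output": the prompt promises a shape,
# and a validator enforces it before anything downstream consumes it.
from pydantic import BaseModel, ValidationError

class Answer(BaseModel):
    claim: str
    supported: bool
    citations: list[str]

def check(raw_json: str) -> Answer | None:
    try:
        return Answer.model_validate_json(raw_json)
    except ValidationError:
        return None  # a schema failure is a measurable regression, not a style debate
```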
Clear instructions still win
The most durable advice in this field is also the least exciting. Make the instructions clear and specific.
The model should know its role, the task, the constraints, the shape of the output, and what to do when the task is underspecified. Clarity outlives cleverness.
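Written out, that advice looks mundane. One illustrative instruction, with the underspecified case handled explicitly; the wording is an example, not a template:

```python
# Illustrative system instruction covering role, task, constraints,
# output shape, and behaviour when the input is underspecified.
SYSTEM_INSTRUCTION = """\
You summarise incident reports for an on-call engineer.
Task: summarise the report provided inside <report> tags.
Constraints:
- Use only facts stated in the report.
- Keep the summary under 120 words.
Output: JSON with keys "summary" (string) and "severity" ("low", "medium", "high", or "unknown").
If the report lacks the detail needed to judge severity, set "severity" to "unknown"
and name the missing detail in "summary".
"""
```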
Few-shot examples beat another paragraph
Strong examples compress the desired behaviour into something the model can see. They show the format, the brevity level, the refusal style, the citation posture, the style boundaries.
If the examples are good, some of the instruction prose can shrink.
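A sketch of what those examples look like as data: one grounded answer and one refusal in the target style, interleaved as user/assistant turns ahead of the real question. The contents are invented for illustration:

```python
# Illustrative few-shot pairs: one grounded answer, one refusal in the desired style.
# They are interleaved as user/assistant turns before the real question.
FEW_SHOT = [
    {"input": "What does the retrieved doc say about retry limits?",
     "output": '{"answer": "Retries are capped at 3 per request.", "supported": true}'},
    {"input": "What is the CEO's opinion on retries?",
     "output": '{"answer": "The retrieved context does not cover this.", "supported": false}'},
]
```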
Structure beats vibes
Anthropic's prompt-engineering docs keep recommending the same boring set of techniques. Clarity. Examples. Explicit structure. Role prompting. Chain-of-thought thinking. Prompt chaining where it actually helps.
I prefer prompts that keep each layer visible.
<role>You are a grounded assistant for production AI operations.</role>
<constraints>
- Use only retrieved context for factual claims.
- If support is missing, say so directly.
</constraints>
<context>[retrieved evidence]</context>
<task>[user question]</task>
<output_format>[exact shape]</output_format>

The model does not become perfect. The failure does become easier to diagnose.
Tools and response formats matter as much as wording
A lot of the prompt's quality is decided outside the sentence layer.
In real systems the difference comes from which tools are available, how narrowly their contracts are defined, whether the output schema is strict enough to validate against, and whether the model is allowed to abstain.
That is why prompt conversations spill over into product requirements and interface design. An elegant prompt sitting on loose tools and a vague output shape behaves loosely.
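A sketch of what "narrow contract" and "allowed to abstain" can look like at the tool layer, in the JSON-schema style most function-calling APIs use; the tool names are made up:

```python
# Illustrative tool contracts: one tightly scoped search tool, one explicit abstain path.
SEARCH_TICKETS = {
    "name": "search_tickets",
    "description": "Search closed support tickets. Returns at most 5 matches.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords only, not a full question."},
            "product": {"type": "string", "enum": ["billing", "auth", "api"]},
        },
        "required": ["query", "product"],
        "additionalProperties": False,
    },
}

ABSTAIN = {
    "name": "abstain",
    "description": "Call this when the retrieved context cannot support an answer.",
    "parameters": {
        "type": "object",
        "properties": {"reason": {"type": "string"}},
        "required": ["reason"],
    },
}
```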
Query transformation is retrieval design
Every retrieval or routing improvement gets called a "prompt breakthrough" sooner or later. Usually a category mistake.
Rewrite, decomposition, step-back prompting, and similar techniques can help retrieval. They are search-control moves with a latency cost. Evaluate them as such. Keep them when they earn their place. Drop the mythology.
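Evaluating them as search-control moves means measuring both sides of the trade. A minimal sketch, where `rewrite_query` and `retrieve` stand in for whatever the system actually uses:

```python
import time

# Sketch: query rewriting as a retrieval step with a measured latency cost.
# `rewrite_query` and `retrieve` are placeholders for the real components.
def retrieve_with_rewrite(question: str, rewrite_query, retrieve) -> dict:
    t0 = time.perf_counter()
    rewritten = rewrite_query(question)   # the extra model call is the latency cost
    docs = retrieve(rewritten)
    elapsed = time.perf_counter() - t0
    # Log both; the technique stays only if recall gains beat the added latency.
    return {"docs": docs, "rewritten": rewritten, "latency_s": elapsed}
```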
Prompt changes evaluated like code changes
OpenAI's current eval guidance lands on one consequence. If the output matters, prompt changes belong in the same measurement culture as code changes.
A material prompt change should answer four questions. What behaviour is it supposed to improve? Which eval set should move? What new failure mode could appear? What does "worse" look like now?
If none of those are measurable, the team is arguing about taste.
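A minimal sketch of what "measurable" means here: a labelled eval set the team trusts, a score for each prompt, and a gate on the difference. `run_model` and the per-case format are placeholders:

```python
# Sketch of gating a prompt change on an eval set.
# `run_model(prompt, input)` and the per-case `check` callables are placeholders.
def score(prompt: str, cases: list[dict], run_model) -> float:
    passed = 0
    for case in cases:
        output = run_model(prompt, case["input"])
        passed += int(case["check"](output))   # each case carries its own pass/fail check
    return passed / len(cases)

def should_ship(old_prompt: str, new_prompt: str, cases: list[dict], run_model, min_gain: float = 0.0) -> bool:
    """Ship only if the eval set the change was supposed to move actually moves."""
    return score(new_prompt, cases, run_model) - score(old_prompt, cases, run_model) > min_gain
```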
A good prompt does four jobs at once
It narrows the task. It narrows the output shape. It narrows the model's freedom under uncertainty. It leaves enough room for the useful part of the work.
Skip one and the system drifts. Skip two and you might as well have left the field blank.