Alex Chernysh · Agentic behaviorist · Tel Aviv

Which Query Transformation Techniques Actually Help RAG?

Query rewrite, decomposition, step-back prompting, HyDE, fusion, and when each one is worth the extra latency.

February 24, 2026 · 6 min read
RAG · Retrieval · Prompting

Query transformation helps when it fixes a specific retrieval failure. It turns into expensive theatre the moment it gets added because the architecture diagram looked lonely.

Default stance

Do not add another transformation step until you can name the failure mode it fixes and the latency you are willing to pay for it.

Targeted transformation

The query is reshaped to solve a known retrieval problem.

  • better recall on underspecified questions
  • better routing to the right corpus slice
  • measurable gain in top-k quality

Transformation by habit

The system adds more steps because more steps look advanced.

  • latency goes up
  • failure analysis gets murkier
  • the retriever still misses for the old reasons

Decision table

  • use rewrite when the user query is vague, noisy, or elliptical
  • use decomposition when one question really contains several retrievable sub-questions
  • use step-back when a direct query is too narrow to pull the governing concept
  • use HyDE when lexical phrasing is weak but a hypothetical answer can anchor semantic retrieval
  • use fusion only when multiple retrieval views genuinely improve recall enough to justify the cost

Query transformation is a family, not a technique

People talk about query transformation like it is one pattern. It is not.

The common families do different jobs. Rewrite the query into a clearer version. Decompose one question into several smaller ones. Form a more abstract step-back question. Generate a hypothetical answer or document (HyDE). Run several retrieval variants and fuse the results.

Treating them as interchangeable means comparing methods that solve different problems. The conclusion sounds confident and is mostly noise.

Rewrite when the query is the problem

The simplest case is still common. The user asks something vague, shorthand, or context-dependent.

Examples.

  • "What changed after the last one?"
  • "Can we do that under the policy?"
  • "How long is it now?"

These are hard to retrieve against directly. A rewrite can help by restoring missing nouns, narrowing time references, or making the target object explicit.

Rewrite is the cheapest transformation in the toolbox. It is also the easiest to overuse. If the original query is already specific, a rewrite often adds latency without adding signal.
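A minimal sketch of that gating idea in Python. The underspecification heuristic, the pronoun list, and the prompt wording are all illustrative assumptions, and `call_llm` stands in for whatever completion client you use:

```python
# Sketch: rewrite only when the query looks underspecified.
# The gate and the prompt are illustrative, not a recommendation.

VAGUE_PRONOUNS = {"it", "that", "this", "they", "one"}

def looks_underspecified(query: str) -> bool:
    """Cheap gate: very short queries, or queries leaning on unresolved pronouns."""
    words = query.lower().rstrip("?").split()
    return len(words) < 6 or any(w in VAGUE_PRONOUNS for w in words)

REWRITE_PROMPT = (
    "Rewrite this search query so it is self-contained. "
    "Restore missing nouns and make time references explicit.\n"
    "Conversation context: {context}\nQuery: {query}\nRewritten query:"
)

def maybe_rewrite(query: str, context: str, call_llm) -> str:
    """Return the query unchanged when it is already specific, so the
    rewrite step only costs latency when it can plausibly add signal."""
    if not looks_underspecified(query):
        return query
    return call_llm(REWRITE_PROMPT.format(context=context, query=query))
```

The point of the gate is the last paragraph above in code form: a specific query skips the model call entirely.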

Decomposition for multi-fact answers

Useful when the user thinks they asked one question but the corpus needs several lookup moves. Compare two policies. Answer with both definition and exception paths. Compute a result from several retrieved facts.

A single retrieval pass underperforms here because each sub-question has its own evidence locus.

The catch. More retrieval passes mean more latency, more fusion logic, and more ways to contaminate the final context with unrelated material. I use decomposition when the task genuinely needs several evidence pulls. I avoid it when the real issue is poor corpus preparation hiding in costume.
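The shape of the multi-pass flow, sketched under assumptions: `decompose` would be an LLM call in practice and `retrieve` is whatever retriever you already have; here the interface is the point, not the implementations.

```python
def retrieve_decomposed(question, decompose, retrieve, k_per_sub=4):
    """One retrieval pass per sub-question, merged and deduplicated
    by chunk id, preserving first-seen order.

    decompose: callable returning a list of sub-questions
    retrieve:  callable (query, k) -> list of (chunk_id, text)
    """
    seen, merged = set(), []
    for sub in decompose(question):
        for chunk_id, text in retrieve(sub, k=k_per_sub):
            if chunk_id not in seen:          # keep contamination visible:
                seen.add(chunk_id)            # every chunk enters once
                merged.append((chunk_id, text))
    return merged
```

Deduplication matters here: without it, every shared chunk shows up once per sub-question and crowds the context window.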

Step-back for concept-level retrieval

Step-back prompting first asks a broader question, then retrieves against that abstraction alongside the original query.

Useful when the direct query is too concrete and skips the concept that governs the answer. A narrow operational question may retrieve better once the system also asks a broader question about the policy principle or legal category in play.

The gain is conceptual recall. The cost is another model call and another retrieval branch. If the corpus is well structured and the original query is good, step-back does little. If the user is circling a concept they cannot quite name, it can help a lot.
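A sketch of the two-branch retrieval, with the prompt wording as an assumption and `call_llm` / `retrieve` as stand-ins for your own clients. Labeling each hit with its branch is a small choice that keeps the "failure analysis gets murkier" cost in check:

```python
STEP_BACK_PROMPT = (
    "Given the specific question below, write one broader question about the "
    "general principle or category that governs the answer.\n"
    "Specific question: {query}\nBroader question:"
)

def step_back_retrieve(query, call_llm, retrieve, k=5):
    """Retrieve against both the original query and its abstraction.
    Results keep a branch label so misses can be traced to a branch."""
    broader = call_llm(STEP_BACK_PROMPT.format(query=query))
    results = [("direct", hit) for hit in retrieve(query, k=k)]
    results += [("step_back", hit) for hit in retrieve(broader, k=k)]
    return results
```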

HyDE is a retrieval trick

HyDE generates a hypothetical answer or document, embeds the synthetic text, and retrieves based on it.

The use case is straightforward. A user query may be too short or too awkward to anchor good semantic retrieval, while a plausible synthetic answer produces a better embedding target.

This can lift recall. It can also retrieve beautifully around the wrong idea when the hypothetical answer drifts. So HyDE belongs in the retrieval-aid bucket, not the smartness-multiplier bucket. Measure it on top-k quality, not in the abstract.
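The mechanics, sketched with a brute-force cosine scan so it stays self-contained; in production the corpus would sit in a vector index, and `call_llm` / `embed` are stand-ins for your own clients:

```python
import math

HYDE_PROMPT = "Write a short plausible passage that answers: {query}\nPassage:"

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def hyde_retrieve(query, call_llm, embed, corpus, k=3):
    """HyDE: embed a synthetic answer instead of the raw query, then rank
    the corpus by similarity to that embedding.

    corpus: list of (doc_id, text). Brute-force scan for clarity only.
    """
    hypothetical = call_llm(HYDE_PROMPT.format(query=query))
    q_vec = embed(hypothetical)
    scored = [(cosine(q_vec, embed(text)), doc_id) for doc_id, text in corpus]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]
```

Note where the risk lives: everything downstream of `hypothetical` is anchored to the model's guess, which is exactly the drift failure described above.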

Fusion combines weak views into a stronger set

Fusion methods run several retrieval branches and merge results, often with reciprocal-rank-style logic. Attractive when different query variants surface different relevant chunks.

Less attractive when all branches mostly retrieve the same material, when the corpus is small enough that one good retrieval pass already covers it, or when reranking is strong enough that fusion adds little besides cost.

Fusion can work well. It also has a habit of looking useful in architecture diagrams long before it proves useful in production.
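The merge step itself is small. A sketch of reciprocal rank fusion over ranked lists of doc ids; the constant `k=60` is the value from the original RRF paper, not something I tuned:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids with reciprocal rank fusion.
    Each branch contributes 1 / (k + rank) per document; the constant k
    damps the influence of any single branch's top rank."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear across branches accumulate score, which is the whole bet: fusion only pays when the branches genuinely disagree about what to surface.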

Measure retrieval gain per unit of latency

The practical question is not "did a clever transformation run?" The practical question is closer to this.

How much top-k evidence quality did we buy per added millisecond and per new failure mode?

For each transformation worth keeping you want to know:

  • top-k recall before and after
  • reranker lift before and after
  • latency added
  • failure classes improved
  • failure classes introduced

Without that, you ship a query pipeline that is verbose, slow, and only spiritually better.
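The before/after comparison is cheap to automate once you have labeled misses. A sketch, assuming you can express both pipelines as callables returning ranked chunk ids; latency and failure-class tracking would sit alongside this, not inside it:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant chunk ids that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def transformation_report(queries, baseline, candidate, k=5):
    """Mean recall@k before and after a transformation, plus the delta.

    queries:   list of (query, relevant_chunk_ids)
    baseline:  callable query -> ranked chunk ids (no transformation)
    candidate: callable query -> ranked chunk ids (with transformation)
    """
    before = sum(recall_at_k(baseline(q), rel, k) for q, rel in queries)
    after = sum(recall_at_k(candidate(q), rel, k) for q, rel in queries)
    n = len(queries)
    return {"before": before / n, "after": after / n, "delta": (after - before) / n}
```

If the delta does not survive contact with a real labeled set, the transformation is the "spiritually better" kind.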

Most systems should use fewer techniques

If the corpus is well prepared and the query is decent, the default stack stays small. Direct retrieval. Optional rewrite for low-quality user phrasing. Rerank. Answer.

Only add more when a specific class of misses persists. The order I trust.

  1. improve corpus quality
  2. improve direct retrieval
  3. add reranking
  4. then test transformations selectively

Less exciting than a diagram with five branches. Easier to debug.

A starting matrix

If I had to choose quickly.

Symptom → Better first move

  • query is vague or elliptical → rewrite
  • one answer depends on several distinct facts → decomposition
  • direct question misses the governing concept → step-back
  • semantic recall is weak on short or awkward queries → HyDE
  • several query variants each surface useful evidence → fusion
  • retrieval misses because the corpus is messy → fix ingestion first

That last row carries most of the weight. It deserves to.
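The matrix above is really a lookup table, and it can live in code as one. The failure-mode labels here are my own shorthand for how you might tag misses, not a standard taxonomy:

```python
# Sketch: the starting matrix as a router. Labels are illustrative.
ROUTE = {
    "vague_query": "rewrite",
    "multi_fact": "decomposition",
    "missing_concept": "step_back",
    "weak_semantic_recall": "hyde",
    "complementary_variants": "fusion",
    "messy_corpus": "fix_ingestion",
}

def pick_transformation(failure_mode: str) -> str:
    """Map an observed retrieval failure mode to the first move worth
    testing; anything unlabeled stays on plain direct retrieval."""
    return ROUTE.get(failure_mode, "direct_retrieval")
```

The default branch is deliberate: an unclassified miss earns no extra pipeline step until it is classified.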

What I would do first

I would not build all five techniques and pray.

I would.

  1. collect real retrieval misses
  2. label them by failure mode
  3. test one transformation per failure class
  4. keep only the transformations that improve evidence quality enough to justify the delay

The system does not need a richer theory of prompts. It needs a better reason for every extra step.

Related reading

  • Most RAG failures start in the documents
  • Prompt engineering: from phrasing to policy

Further reading

  • Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE)
  • Step-Back Prompting: Evoking Reasoning via Abstraction
  • Lost in the Middle: How Language Models Use Long Contexts
