
Prompt Engineering for Code: Practical Patterns That Work

Concrete, repeatable prompt patterns for working with coding agents in 2026 — persona, constraints, examples, chain-of-thought, and repo-aware grounding.

11 min read · prompt engineering, AI coding, developer workflow, prompts, LLM patterns
MaxtDesign
Engineering

By 2026 every developer claims to "know how to prompt." Watch over their shoulder for ten minutes and the gap between productive use and frustrated tab-closing is enormous — and it has almost nothing to do with secret incantations. The patterns that separate good prompting from bad are concrete, learnable, and embarrassingly familiar: they look like writing a decent ticket. If you can write a clear spec for a junior engineer, you can prompt a coding agent. If you can't, no clever opener is going to save you.

The "prompt engineering" framing has been oversold. There is no secret handshake. What moves output quality is what moves any technical brief: precision about what you want, what you don't want, and what's already in the room. The patterns below have survived four model generations and will survive four more, because they're not really about LLMs. They're about communication.

The five durable patterns

Models change every quarter; these five patterns have survived every generation since GPT-3.5 and are still load-bearing in Claude 4.x and the current crop of frontier models. Anthropic's own prompt engineering overview orders them roughly the same way, and that ordering matters: start with clarity and structure, only reach for chain-of-thought and examples when those don't get you there.

Persona prompting — when it helps, when it's noise

"You are a senior staff engineer with 20 years of experience" is the prompting equivalent of homeopathy. It does almost nothing on modern models for code tasks, because the model already knows what good code looks like. The training data is drenched in senior engineers; flattering the model with the title doesn't unlock a hidden tier. What does help is when persona narrows a real ambiguity: "respond as a reviewer who cares about API stability, not performance" actually changes which suggestions surface, because now the model has an axis to choose along when those concerns collide.

Before:

You are a 10x rockstar TypeScript ninja. Refactor this function.

After:

Refactor this function. Optimize for readability over cleverness.
The audience is a developer joining the team next week who hasn't
seen the codebase. Prefer named helpers over inline expressions.

When this fails: persona becomes noise the moment it's a vibe rather than a tradeoff. "Act as a 10x engineer" gives the model nothing to optimize against; "act as a reviewer focused on backwards compatibility, not performance" tells it which axis to push on when those collide. If you can't name the tradeoff the persona is supposed to resolve, drop the persona entirely.

Signals you're using it well: the persona corresponds to a real role on your team (security reviewer, performance reviewer, junior onboarder), and removing it would change the answer. If swapping "staff engineer" for "junior engineer" wouldn't change the diff, the persona is decoration.

Constraint stacking (must / must not / prefer)

Constraints are where prompts earn their keep. The trick is to be explicit about which constraints are hard and which are soft. Models are good at following hard rules and good at making tradeoffs against soft preferences — but only if you tell them which is which. The three-bucket structure (Must / Must not / Prefer) is mechanical on purpose: it forces you to decide before the model has to guess. A prompt without that ranking is a prompt where every constraint has equal weight, which means none of them do.

Before:

Add pagination to /api/posts. Make it production-ready and don't
break anything.

After:

Add pagination to the /api/posts route.

Must:
- Keep existing query params working (no breaking changes).
- Use cursor-based pagination, not offset.
- Return Link headers per RFC 5988.

Must not:
- Touch the response body shape for existing clients.
- Add new dependencies.

Prefer:
- Reuse the cursor helper in lib/pagination.ts if it fits.
- Keep the diff under ~80 lines.

When this fails: constraints contradict each other and you haven't ranked them. "Must be backwards compatible" + "must use the new schema" forces the model to pick a winner silently, and it usually picks wrong. Either mark one as a hard must and the other as a prefer, or split into two changes.

Signals you're using it well: the diff comes back and you can point at each line and say which constraint produced it. If the agent had to break a soft constraint to satisfy a hard one, it tells you so in the summary instead of hiding it.
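
For a sense of what the hard constraints in the pagination prompt buy, here is a minimal sketch of the Link-header piece of such a diff. The helper and types are hypothetical stand-ins, not the lib/pagination.ts the prompt references, and the framework wiring is omitted:

interface Page<T> {
  items: T[];
  nextCursor: string | null; // null on the last page
}

// Builds an RFC 5988 Link header for cursor-based pagination.
function buildLinkHeader(baseUrl: string, nextCursor: string | null): string | null {
  if (!nextCursor) return null; // last page: omit the header entirely
  const next = `${baseUrl}?cursor=${encodeURIComponent(nextCursor)}`;
  return `<${next}>; rel="next"`;
}

// In the route handler, existing query params stay untouched and the
// header rides alongside the unchanged response body:
//   const link = buildLinkHeader("/api/posts", page.nextCursor);
//   if (link) headers.set("Link", link);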

Examples (one-shot, few-shot) — when format matters

For anything where output shape matters — JSON schemas, commit messages, test names, comment style — one well-chosen example beats three paragraphs of description. The Anthropic cookbook is full of patterns where a single in-domain example collapses error rates by an order of magnitude. For freeform reasoning, examples can over-anchor the model; for structured output, they're nearly always worth the tokens.

Before:

Write a commit message for this diff. Use conventional commits and
keep it short.

After:

Write a commit message for this diff. Match this style exactly:

  fix(auth): clear stale session cookie on 401

  The middleware kept stale session cookies alive after a 401,
  which caused a redirect loop on /me. Clearing the cookie on
  the 401 path resolves the loop without affecting valid sessions.

  Refs: INC-2231

Use the same scope vocabulary (auth, billing, ui, db, infra).
Subject under 60 chars. Body wraps at 72.

When this fails: the example is too narrow and the model copies its content instead of its shape. If you give one example of a fix commit and ask for a feature commit, half the time you'll get something that calls itself a fix. Two or three examples covering different cases beats one perfect one.

Signals you're using it well: output matches your house style on the first try without "no, like this" follow-ups, and you can drop the example block into a reusable prompt template without rewriting it for each task.

Chain-of-thought — when it earns the latency

"Think step by step" used to be a magic phrase. On modern reasoning-tuned models it's mostly redundant — and on agentic tools like Claude Code and Cursor's agent mode, the model is already planning. Where chain-of-thought still pays off is when the task involves multiple constraints that interact (concurrency + ordering + error handling), or when you want the model to expose its reasoning so you can catch flawed assumptions before it writes 200 lines of code. The point isn't to make the model smarter; it's to slow it down enough for you to spot the assumption that would have become a bug. Treat the planning step as a cheap rehearsal — the failures that show up there cost minutes; the same failures discovered after a 200-line diff cost an afternoon.

Before:

Add a retry with backoff to the webhook dispatcher.

After:

Before writing code, walk me through:
1. What ordering guarantees the existing dispatcher provides.
2. How retries interact with our idempotency keys.
3. Where backoff state lives (in-memory vs. persisted) and why.
4. Failure modes you want me to sign off on.

Then propose the diff. Stop after step 4 for review.

When this fails: on simple tasks, chain-of-thought just adds tokens and latency for no gain — and worse, it can talk the model into second-guessing a clean answer. If the task fits in one obvious diff, skip it.

Signals you're using it well: the plan exposes at least one assumption you would have missed, or the model surfaces a failure mode (race, partial write, ordering hazard) before any code is generated. If the plan reads like a restatement of the prompt, it didn't help.
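
For context, here is roughly where that prompt lands once the plan is approved. A minimal sketch with hypothetical names, in-memory backoff state (the tradeoff step 3 asks you to sign off on), and the idempotency key held constant across attempts so retries can dedupe on the receiving end:

// Hypothetical retry wrapper for the webhook dispatcher.
async function dispatchWithRetry<T>(
  send: (payload: unknown, idempotencyKey: string) => Promise<T>,
  payload: unknown,
  idempotencyKey: string, // same key on every attempt, so the receiver can dedupe
  maxAttempts = 5,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await send(payload, idempotencyKey);
    } catch (error) {
      lastError = error;
      // Exponential backoff with jitter: ~1s, 2s, 4s... plus up to 250ms of noise.
      // The backoff state lives in this loop (in-memory), so a restart forgets it.
      const delayMs = 1000 * 2 ** attempt + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}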

Repository-aware prompting — referencing real files, real symbols, real conventions

The single highest-leverage move once you're working in a repo: stop describing the codebase and start pointing at it. "Match the style of lib/db/queries.ts" beats "use idiomatic TypeScript." "Extend the UserRepo interface, don't add a new one" beats "follow our conventions." Coding agents have file access — use it. Generic style guidance is the model's training-data average; pointing at a real file is your team's actual taste. The two are rarely the same, and the delta is usually where reviewers spend their time. If you're picking a tool partly on how well it does this, our comparison of Copilot, Cursor, Claude Code, Windsurf, Replit and Bolt grades each one on repo grounding.

Before:

Add a service to send transactional emails. Follow our conventions.

After:

Add a TransactionalEmailService.

- Mirror the shape of lib/services/notification-service.ts (constructor
  injection, no static methods, async methods return Result<T, E>).
- Register it in lib/services/registry.ts the same way NotificationService
  is registered.
- Use the Resend client already configured in lib/email/resend.ts.
- Add a fake at __tests__/fakes/email.ts following the pattern in
  __tests__/fakes/notifications.ts.

Read those files first. Don't invent a new pattern.

When this fails: the file you pointed at is itself the inconsistent one — the agent will faithfully copy a pattern your team has been trying to deprecate. Point at the canonical example, not the most recent one. When in doubt, name two files and tell the model which is the source of truth.

Signals you're using it well: the diff fits cleanly into a grep across the repo (the new symbol shows up alongside its siblings, the imports look like every other file in that directory), and code review takes minutes because there's nothing surprising in the shape — only the behavior.
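
For a sense of what "mirror the shape" produces, here is a rough skeleton of the service that prompt asks for. The class name comes from the prompt; the Result type and the client interface are assumptions about what the referenced files export:

// Hypothetical skeleton of what the repo-grounded prompt above should produce.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface EmailClient {
  send(input: { to: string; subject: string; html: string }): Promise<{ id: string }>;
}

class TransactionalEmailService {
  // Constructor injection, no static methods: the shape borrowed from notification-service.ts.
  constructor(private readonly client: EmailClient) {}

  async sendReceipt(to: string, orderId: string): Promise<Result<string, Error>> {
    try {
      const { id } = await this.client.send({
        to,
        subject: `Receipt for order ${orderId}`,
        html: `<p>Thanks for your order ${orderId}.</p>`,
      });
      return { ok: true, value: id };
    } catch (error) {
      return { ok: false, error: error as Error };
    }
  }
}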

Patterns specific to coding agents (Cursor, Claude Code, Windsurf)

When the model can run shells, edit files, and call tools, the prompting game changes. You're not asking for an answer — you're scoping a session.

  • Sub-tasking ("break this into commits"). Ask the agent to plan the change as a sequence of atomic commits before touching code. You get a review surface; the agent gets a structure that prevents drift. Example: "Split this auth refactor into commits — one for the interface change, one for each consumer migration, one for deleting the old code. Don't start commit N until N-1 typechecks."
  • Self-review ("now critique your own diff"). After the agent completes a change, ask it to re-read its own diff against the original constraints. It catches at least a third of the regressions a human reviewer would have flagged, for free. Example: "Re-read the diff you just produced. For each Must in the original prompt, quote the exact lines that satisfy it. Flag anything you can't quote."
  • Test-first ("write the test, then make it pass"). Forces the agent to commit to behavior before implementation. Especially powerful for bug fixes — write the failing test that reproduces the bug, then fix it. Example: "Reproduce INC-2231 as a failing Vitest case in auth.test.ts. Run it and paste the failure output. Only then propose the fix." A sketch of what that failing test looks like follows this list.
  • Plan-mode workflows. Cursor and Claude Code both expose explicit plan-then-execute modes. Use them for any change that touches more than two files. The plan is cheaper than a wrong implementation. Example: in Claude Code, hit Shift+Tab into plan mode and prompt "lay out the file-by-file change list with rationale; do not edit yet" — then approve or revise before unlocking edits.
  • Tool / MCP grounding. If you've wired up MCP servers (Postgres, GitHub, your own internal docs), name them in the prompt. "Use the postgres MCP to confirm the column type before writing the migration" turns a guess into a verified change. Another example: "Use the GitHub MCP to read the last three PRs that touched billing/ and match their commit-message style." For a deeper take on stitching these into longer workflows, see our agentic AI workflows orchestration guide.
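
The failing test the test-first prompt asks for might look like this. It's a sketch only: the handler import and the response shape are invented for illustration, but the discipline (reproduce first, fix second) is the point:

// auth.test.ts — hypothetical reproduction of the INC-2231 redirect loop.
// Written to fail against the current middleware; the fix comes only after this is red.
import { describe, expect, it } from "vitest";
import { handleUnauthorized } from "./middleware"; // hypothetical handler under test

describe("INC-2231: stale session cookie on 401", () => {
  it("clears the session cookie so /me stops redirect-looping", () => {
    const response = handleUnauthorized({ cookies: { session: "stale-token" } });

    expect(response.status).toBe(401);
    // The regression: the cookie survived the 401 and kept the loop alive.
    expect(response.headers["Set-Cookie"]).toContain("session=;");
  });
});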

The anti-patterns

  • "Be a 10x dev" with no constraints. The model has nothing to optimize for. You'll get plausible-looking code that ignores your conventions, because you didn't mention any. In practice: you ask for a "production-ready rate limiter" and get a Redis-backed token bucket, when your codebase already has an in-memory limiter and you didn't want a new dependency.
  • Mega-prompts mixing concerns. Refactor + add tests + write docs + fix the bug, all in one turn. The model will half-do all four. Split it. In practice: the bug gets fixed but the new tests don't actually exercise the regression, because the model spent its budget on the doc rewrite you also asked for.
  • Forgetting the model can't see your terminal or files unless you point it. "Why is this failing?" is unanswerable without the failure. Paste the error, name the file, or let the agent run the command itself. In practice: you get five paragraphs of speculation about possible causes when one paste of the stack trace would have pinpointed it.
  • Over-specification preempting the model's reasoning. If you already wrote the algorithm in pseudo-code, you've turned a smart collaborator into a transpiler. Specify the contract; let the model pick the implementation. Then review. In practice: you dictate a nested-loop solution and the model faithfully implements it, when a one-line Map lookup was sitting right there (sketched below).
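
That last failure mode in miniature, with invented types. Both versions satisfy the same contract; only one had to be dictated line by line:

interface User { id: string; name: string }
interface Post { id: string; ownerId: string }
type PostWithOwner = Post & { owner?: User };

// The dictated version: the nested scan the prompt spelled out step by step.
function attachOwnersDictated(posts: Post[], users: User[]): PostWithOwner[] {
  return posts.map((post) => {
    let owner: User | undefined;
    for (const user of users) {
      if (user.id === post.ownerId) owner = user; // O(posts × users)
    }
    return { ...post, owner };
  });
}

// What a model given only the contract would likely reach for.
function attachOwners(posts: Post[], users: User[]): PostWithOwner[] {
  const byId = new Map(users.map((u): [string, User] => [u.id, u]));
  return posts.map((post) => ({ ...post, owner: byId.get(post.ownerId) }));
}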

A short prompt workshop

One real example, refactored. Same task, two prompts — the difference is entirely discipline.

Bad:

hey can you add caching to our API? it's slow. make it fast.
use redis or whatever. thanks!

Good:

Add response caching to the GET /api/products list endpoint
in app/api/products/route.ts.

Context:
- The query hits Postgres via lib/db/products.ts:listProducts().
- p95 latency is ~480ms; target is <80ms for cache hits.
- We already have a Redis client in lib/cache/redis.ts (see existing
  use in lib/cache/sessions.ts for the pattern).

Must:
- Cache by serialized query params + user role.
- TTL: 60 seconds.
- Invalidate on POST/PUT/DELETE to /api/products.
- Add a unit test in __tests__/api/products.cache.test.ts covering
  hit, miss, and invalidation.

Must not:
- Cache responses for authenticated admin users (they expect fresh data).
- Add new dependencies.

Plan first, then implement. Stop after the plan for me to review.

The good prompt isn't longer because it's flowery — it's longer because it has done the thinking the bad prompt was hoping the model would do telepathically. Notice there's no persona, no "act as," no chain-of-thought incantation. Just a spec.
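
For the curious, the heart of the diff that prompt should produce is a cache key plus one guard. A rough sketch, with the Redis client narrowed to the two calls it needs; your lib/cache/redis.ts will look different:

// Hypothetical core of the caching layer the workshop prompt asks for.
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const TTL_SECONDS = 60;

// Must: cache by serialized query params + user role.
function cacheKey(params: URLSearchParams, role: string): string {
  const sorted = new URLSearchParams(params);
  sorted.sort(); // stable key regardless of param order
  return `products:list:${role}:${sorted.toString()}`;
}

async function cachedList(
  cache: CacheClient,
  params: URLSearchParams,
  role: string,
  load: () => Promise<unknown>,
): Promise<unknown> {
  if (role === "admin") return load(); // Must not: admins always get fresh data

  const key = cacheKey(params, role);
  const hit = await cache.get(key);
  if (hit !== null) return JSON.parse(hit);

  const fresh = await load();
  await cache.set(key, JSON.stringify(fresh), TTL_SECONDS);
  return fresh;
}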

Building a prompt library on your team

Once you've found three or four prompts that consistently produce good output for recurring tasks — migration scaffolding, PR descriptions, incident postmortems, test stubs — stop retyping them. A team prompt library is just version-controlled institutional memory, and treating it like code is the cheapest productivity win available to a dev org in 2026.

Where to keep them: in the repo, alongside the code they prompt against. A prompts/ directory at the root, with one Markdown file per prompt, beats Notion or a shared doc every time — because it's diff-able, reviewable, and travels with the codebase it knows about. Notion is fine for high-level playbooks; it's wrong for prompts that reference real file paths and symbols, because those drift the moment someone renames a directory.

How to version them: normal git. Tag the prompt with the model and date it was last validated against (tested: claude-4.5-sonnet, 2026-03), because behavior shifts across model versions and a prompt that worked in November may not work the same way in April. Who curates: the same person who curates your team's testing patterns or component library — usually a tech lead with strong taste. Open contributions, gated review. When to retire: the moment a prompt produces output you have to fix more than half the time. A bad prompt in a shared library is worse than no prompt — it gives juniors false confidence and wastes everyone's tokens.
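
What one of those files might look like; the front-matter keys are a suggestion, not a standard:

prompts/add-cached-endpoint.md

---
tested: claude-4.5-sonnet, 2026-03
owner: @tech-lead (whoever curates the library)
retire-when: output needs manual fixes more than half the time
---

Add response caching to the GET <route> endpoint in <file>.

Context:
- Name the query helper and where the Redis client already lives.
- State the current p95 and the target latency for cache hits.

Must / Must not / Prefer:
- Fill in per task. Keep the hard constraints hard.

Plan first, then implement. Stop after the plan for review.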

Prompt engineering is not magic. It's spec-writing — the discipline you'd apply to a ticket destined for a competent contractor in a different timezone, who won't Slack you for clarification and will bill for the round trip if you were vague. State the constraints, link the files, name what must not change. The compiler tolerated a decade of sloppy intent; coding agents won't, and that's the gift. If you want help picking the tool that best rewards this discipline, see our decision framework for AI code assistants, or — to find out where your team actually sits on the AI maturity curve — the AI IQ Diagnostic will tell you in about 25 minutes.

Need help putting this into practice?

MaxtDesign builds the AI-powered web stacks the articles describe — from agentic workflows to performance-first WordPress + WooCommerce. Talk to us about your project.