Coding With AI Is a New Skill – And We’re All Wasting Time on the Wrong Decisions


If you’re using tools like Cursor, VS Code + AI extensions, or Zed with agents, you’ve probably felt this:

“I spend way too much time picking models and modes instead of actually shipping code.”

You get:

  • Multiple models (Composer, Claude, GPT-5, Grok, Qwen…)

  • “Thinking” vs non-thinking variants

  • Toggles like Max mode, cloud vs local, Ask vs Agent vs Plan

  • And a UI that changes every few weeks

On paper this is “control”. In practice it’s cognitive overhead.

This post is a first-principles way to think about coding with AI so you can stop micro-optimizing knobs and get maximum quality per buck (and per minute).


1. Coding with AI is its own skill

It’s easy to think of AI as a smarter autocomplete. It isn’t. To use coding assistants effectively you need to learn a new skillset:

  • Framing tasks so a model can actually solve them

  • Choosing the right level of autonomy (advise vs edit vs refactor whole project)

  • Knowing when to iterate and when to start over

  • Optimizing for your time, not just tokens

What makes this hard is that most tools expose low-level levers (model, mode, “thinking”, Max, cloud) instead of giving you a simple “do-what-I-mean” abstraction.

So let’s build your own abstraction.


2. Forget the UI: there are only three real decisions

Ignore all the product marketing and menus for a second. Under the hood, almost everything reduces to three questions:

  1. How big / smart a model do I want?
    (Small/cheap vs large/expensive; “thinking” vs “fast”.)

  2. How much of my repo / context does it need to see?
    (Normal vs “Max mode” / huge context.)

  3. How much autonomy do I give it?
    (Explain only? Edit a few files? Refactor the world? Run in the cloud without me watching?)

Everything else (Ask vs Agent vs Plan, local vs cloud, etc.) is just preconfigured answers to those three.

Once you internalize that, the UX stops feeling like chaos. You’re just tuning:

Model size × Context budget × Autonomy
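If it helps, you can picture those three decisions as one small config tuple. This is a sketch with made-up names, not any real tool's settings; the point is that every menu toggle is just setting one of three fields:

```python
from dataclasses import dataclass

# Hypothetical names for illustration only; no tool exposes exactly this API.
@dataclass(frozen=True)
class AiCallConfig:
    model_tier: str      # "small" | "mid" | "big"
    context_budget: str  # "normal" | "max"
    autonomy: str        # "ask" | "agent" | "cloud"

# "Ask mode", a "Max-mode cloud agent", etc. are just presets over the tuple:
ASK_MODE = AiCallConfig(model_tier="big", context_budget="normal", autonomy="ask")
MAX_CLOUD_AGENT = AiCallConfig(model_tier="big", context_budget="max", autonomy="cloud")
```

Seen this way, a new mode in the UI is never a new concept; it's a new preset.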


3. The cost/quality tradeoff: stop fighting physics

A lot of people (maybe you) are doing this pattern:

  1. Try with a cheaper / smaller model.

  2. Get a mediocre answer.

  3. Switch to a top model (Claude Sonnet, GPT-5, etc.).

  4. Finally get something usable.

On paper this “saves money”. In reality, it often doesn’t.

Let’s reason it out.

  • Let C_small be the cost of a small/mid model call.

  • Let C_big be the cost of a top model call. (Say ~10× C_small for intuition.)

  • Let p be the fraction of tasks where you end up using the big model anyway.

Two strategies:

  • Strategy A – Always use big model
    Cost ≈ C_big

  • Strategy B – Try small, escalate if needed
    Cost ≈ C_small + p * C_big

If C_small = 1 and C_big = 10:

  • Always-big → cost = 10

  • Try-small → cost = 1 + 10p

You only beat always-big if:

1 + 10p < 10, i.e. p < 0.9

So if you escalate to a big model more than ~90% of the time, the “cheap first” step is pure overhead.
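The break-even arithmetic fits in a few lines of Python, using the same C_small = 1, C_big = 10 numbers:

```python
def expected_cost(c_small: float, c_big: float, p: float) -> float:
    """Expected cost of 'try small first, escalate with probability p'."""
    return c_small + p * c_big

C_SMALL, C_BIG = 1.0, 10.0  # the ~10x intuition from above

# Escalate half the time: 1 + 0.5 * 10 = 6, cheaper than always-big (10).
assert expected_cost(C_SMALL, C_BIG, 0.5) < C_BIG

# Escalate 95% of the time: the cheap first try is pure overhead.
assert expected_cost(C_SMALL, C_BIG, 0.95) > C_BIG
```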

In practice:

  • For hard, ambiguous tasks (understanding systems, ugly bugs, architecture), most people do escalate. So going straight to the best model is actually rational.

  • For small, mechanical tasks, you rarely escalate, so a cheaper model really can save money.

The trick isn’t “optimize every call”. The trick is:

Know which tasks almost always need a top model, and stop pretending otherwise.


4. Classify tasks, not prompts

Instead of choosing a model for each prompt, classify the type of work first.

Bucket 1 – Hard / ambiguous / high-value work

Examples:

  • Understanding an unfamiliar subsystem end-to-end

  • Designing a new feature that cuts across backend, frontend, DB

  • Debugging a weird bug that involves async jobs, queues, and timeouts

  • Reasoning about tradeoffs (performance, correctness, architecture)

Characteristics:

  • You’d personally need a while to think

  • There isn’t a single obvious local fix

  • You expect multiple iterations and back-and-forth

Use case: You want a smart junior/staff engineer thinking with you.

Strategy:
Go straight to your best model (Claude Sonnet, GPT-5, etc.), often with “thinking” / deeper reasoning turned on.

Don’t bother with cheaper models here. Your own time is more expensive than the token difference.


Bucket 2 – Local / routine / mechanical work

Examples:

  • “Add tests for this function and adapt fixtures.”

  • “Extract this bit of logic into a helper and update this file + its tests.”

  • “Generate a typed API client from this OpenAPI schema.”

  • “Add logging around this one endpoint.”

Characteristics:

  • Scope is limited to 1–3 files

  • Expected behavior is clear

  • You can quickly tell if it’s wrong just by reading the diff

Use case: You want a smart autocomplete that can handle slightly non-trivial patterns.

Strategy:
Use a cheaper, mid-tier coding model (Composer, Qwen, Grok-code, etc.) as default.

  • If it nails it on the first or second try → great, you saved money.

  • If it struggles repeatedly for this exact type of task, upgrade that task type to Bucket 1 in your mental model and use a top model next time.


Bucket 3 – Bulk / repo-wide but well-specified work

Examples:

  • Updating license headers everywhere

  • Renaming a logging library and updating all call sites

  • Performing a very specific, repetitive change across many files

Characteristics:

  • High file count, but low conceptual complexity

  • You can write a very explicit spec: exactly what changes, where

Here you have two good options:

  1. Use AI once to generate a codemod, then run that codemod with normal tooling (no AI for the bulk).

  2. Use a “Max mode” or background agent, but only when the spec is crystal clear and the change is mechanical.

Background/cloud agents are terrible at vague, creative tasks. They are good at “update this pattern everywhere exactly like this”.
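To make option 1 concrete, here's roughly what a codemod script might look like for the logging-library rename. The `oldlog` → `structlog` names are purely illustrative; the idea is that AI helps you write this script once, and then normal tooling does the bulk:

```python
# Codemod sketch: a hypothetical rename of `oldlog` to `structlog` across a
# repo. AI helps write THIS script once; the actual transformation is then
# deterministic, and no AI touches the bulk of the files.
import pathlib
import re

OLD, NEW = "oldlog", "structlog"
PATTERN = re.compile(rf"\b{OLD}\b")

def rewrite(path: pathlib.Path) -> bool:
    """Rewrite one file in place; return True if anything changed."""
    text = path.read_text()
    new_text = PATTERN.sub(NEW, text)
    if new_text != text:
        path.write_text(new_text)
        return True
    return False

if __name__ == "__main__":
    root = pathlib.Path("src")
    changed = [p for p in root.rglob("*.py") if rewrite(p)] if root.exists() else []
    print(f"updated {len(changed)} files")
```

Unlike an agent run, this is reviewable, repeatable, and free to re-run after a bad merge.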


5. Model size, “thinking” and Max mode in plain language

Now let’s map the knobs to this framework.

Model size

  • Big / top models (Sonnet, GPT-5, etc.):
    Better reasoning, more robust across weird prompts, more expensive.

  • Mid / small models (Composer-lite, Qwen, etc.):
    Fine for localized code, tests, boilerplate, cheaper.

Mapping:

  • Bucket 1 → Always big

  • Bucket 2 → Default to mid, escalate only if it clearly fails

  • Bucket 3 → Depends on whether the transformation is semantic or mechanical:

    • Mechanical → mid/small or no AI at all

    • Semantic → big model once to define the pattern, then script/codemod


“Thinking” models

A “thinking” mode basically means:

  • The model spends more internal compute to reason before answering

  • It might run extra internal steps, plan, and then answer

  • It tends to be slower and more expensive, but handles harder problems

Mapping:

  • Bucket 1:

    • “Thinking” ON is usually worth it for tough bugs and design questions.
  • Bucket 2:

    • “Thinking” often unnecessary – you want precision and speed.
  • Bucket 3:

    • If the change is mechanical, “thinking” is overkill. If the change is subtle/semantic, have the big model reason once about the pattern, then let code/scripts do the bulk.

“Max mode” / huge context

“Max” is essentially:

  • Give the model way more context (more of your repo + history)

  • Allow many more tool calls / steps per “task”

  • Accept higher cost and latency

This is not “make it smarter”; it’s “let it see and touch more at once.”

Mapping:

  • Bucket 1:

    • Sometimes useful when you’re trying to understand a huge codebase, but dangerous if you pair it with too much autonomy.
  • Bucket 2:

    • Mostly unnecessary. Your scope is intentionally small.
  • Bucket 3:

    • Useful if you’re doing massive transformations and want the AI to navigate many files, as long as the spec is strict.

As a default:

  • Max OFF unless:

    • The task is explicitly repo-wide or doc-heavy, and

    • You already know what you want it to do.


6. Autonomy: Ask vs Agent vs Plan vs “do everything in the cloud”

The other dimension is: how much power do you give it?

Low autonomy – “Ask mode”

  • The AI explains, reviews, suggests.

  • You apply changes manually.

This is ideal for Bucket 1 (understanding, design, root-cause analysis). You want conversation, not automated edits.

Medium autonomy – “Agent mode on a leash”

  • The AI can edit files, but you keep tasks small.

  • You review every diff like a PR.

This fits Bucket 2: local changes where the impact is limited and obvious.

You can even tell an Agent: “Do not edit files yet, only propose a patch” – turning it into Ask mode with extra repo context.

High autonomy – “Cloud agent, big plan, lots of freedom”

  • The AI runs in a separate environment, possibly with Max context.

  • It can edit many files and run tools, often without interactive feedback.

This only belongs in Bucket 3 – mechanical repo-wide work with a tight spec.

Using high-autonomy modes for Bucket 1 or 2 (“figure out this fuzzy feature idea and implement it end-to-end”) is almost guaranteed to produce garbage. You’re skipping the human-in-the-loop phase where requirements actually get nailed down.


7. A practical playbook you can actually use

You don’t need to think from scratch every time. Here’s a simple policy you can adopt:

Step 1 – Classify the task

Before typing the prompt, ask:

“Is this hard/ambiguous, local/mechanical, or bulk/mechanical?”

Then:

  • Hard / ambiguous → Bucket 1

  • Local / mechanical → Bucket 2

  • Bulk / mechanical → Bucket 3

Step 2 – Apply the corresponding defaults

Bucket 1

  • Model: Best available (Sonnet/GPT-5)

  • Thinking: ON

  • Max: OFF by default, ON only when you really need repo-wide visibility to understand something

  • Autonomy: Ask mode; Agent only for small, clearly scoped edits you’ve agreed on

Bucket 2

  • Model: Mid-tier coding model (Composer/Qwen/etc.)

  • Thinking: OFF (or minimal)

  • Max: OFF

  • Autonomy: Agent allowed, but limited to small scopes; you review diffs

If it fails more than once for this sort of task, reclassify this specific pattern as Bucket 1.

Bucket 3

  • Model:

    • Use top model once to help you define a precise spec or codemod script

    • Use scripts/tools for the actual transformation

  • Thinking: ON only during spec design, OFF for mechanical execution

  • Max: ON only if you actually need massive context or multi-file reasoning

  • Autonomy: Cloud/background agent only for strictly specified, mechanical jobs
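The whole playbook collapses into a lookup table. The knob names below are placeholders; map them onto whatever your tool actually calls "thinking", "Max mode", Ask/Agent, and so on:

```python
# Placeholder knob names encoding the defaults above; not any tool's real API.
DEFAULTS = {
    "bucket1": {"model": "big", "thinking": True, "max": False, "autonomy": "ask"},
    "bucket2": {"model": "mid", "thinking": False, "max": False, "autonomy": "agent"},
    "bucket3": {"model": "big-for-spec-then-scripts",
                "thinking": "spec-design-only",
                "max": "only-if-needed",
                "autonomy": "cloud"},
}

def defaults_for(task_kind: str) -> dict:
    """Map Step 1's classification straight to Step 2's knob settings."""
    bucket = {
        "hard/ambiguous": "bucket1",
        "local/mechanical": "bucket2",
        "bulk/mechanical": "bucket3",
    }[task_kind]
    return DEFAULTS[bucket]
```

One classification up front, zero per-prompt knob fiddling afterwards.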


8. Optimize for your time, not just token spend

The final mental shift is this:

  • Tokens are cheap compared to your time and frustration.

  • Chasing small token savings by trying multiple cheap models and then escalating often loses in the real world.

For interactive development:

  • Use the best model for anything that feels like “real engineering” (understanding, design, complex bugs).

  • Use cheaper models or no AI at all for tiny, obvious, mechanical things.

  • Use AI + scripts for big mechanical tasks, not AI alone.

That’s how you actually get quality per buck – and, more importantly, quality per hour of your life.