Coding With AI Is a New Skill – And We’re All Wasting Time on the Wrong Decisions

If you’re using tools like Cursor, VS Code + AI extensions, or Zed with agents, you’ve probably felt this:
“I spend way too much time picking models and modes instead of actually shipping code.”
You get:
Multiple models (Composer, Claude, GPT-5, Grok, Qwen…)
“Thinking” vs non-thinking variants
Toggles like Max mode, cloud vs local, Ask vs Agent vs Plan
And a UI that changes every few weeks
On paper this is “control”. In practice it’s cognitive overhead.
This post is a first-principles way to think about coding with AI so you can stop micro-optimizing knobs and get maximum quality per buck (and per minute).
1. Coding with AI is its own skill
It’s easy to think of AI as a smarter autocomplete. It isn’t. To use coding assistants effectively you need to learn a new skillset:
Framing tasks so a model can actually solve them
Choosing the right level of autonomy (advise vs edit vs refactor whole project)
Knowing when to iterate and when to start over
Optimizing for your time, not just tokens
What makes this hard is that most tools expose low-level levers (model, mode, “thinking”, Max, cloud) instead of giving you a simple “do-what-I-mean” abstraction.
So let’s build your own abstraction.
2. Forget the UI: there are only three real decisions
Ignore all the product marketing and menus for a second. Under the hood, almost everything reduces to three questions:
1. How big / smart a model do I want? (Small/cheap vs large/expensive; “thinking” vs “fast”.)
2. How much of my repo / context does it need to see? (Normal vs “Max mode” / huge context.)
3. How much autonomy do I give it? (Explain only? Edit a few files? Refactor the world? Run in the cloud without me watching?)
Everything else (Ask vs Agent vs Plan, local vs cloud, etc.) is just preconfigured answers to those three.
Once you internalize that, the UX stops feeling like chaos. You’re just tuning:
Model size × Context budget × Autonomy
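As a rough sketch of that abstraction (all names here are illustrative, not any tool’s actual API), the whole decision space fits in one small structure:

```python
from dataclasses import dataclass
from enum import Enum

class ModelSize(Enum):
    SMALL = "small"   # cheap, fast, fine for mechanical edits
    BIG = "big"       # expensive, better at reasoning

class ContextBudget(Enum):
    NORMAL = "normal"  # current file(s) plus a little context
    MAX = "max"        # large slice of the repo + history

class Autonomy(Enum):
    ASK = "ask"                # explain/suggest only, you apply changes
    EDIT = "edit"              # can edit files, you review every diff
    BACKGROUND = "background"  # runs unattended, e.g. a cloud agent

@dataclass
class AiCallConfig:
    model: ModelSize
    context: ContextBudget
    autonomy: Autonomy

# A typical "Agent mode" is roughly this preconfigured answer:
agent_mode = AiCallConfig(ModelSize.BIG, ContextBudget.NORMAL, Autonomy.EDIT)
```

Every named mode in the UI is just one point in this three-dimensional grid.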
3. The cost/quality tradeoff: stop fighting physics
A lot of people (maybe you) are doing this pattern:
Try with a cheaper / smaller model.
Get a mediocre answer.
Switch to a top model (Claude Sonnet, GPT-5, etc.).
Finally get something usable.
On paper this “saves money”. In reality, it often doesn’t.
Let’s reason it out.
Let C_small be the cost of a small/mid model call.
Let C_big be the cost of a top model call. (Say ~10× C_small for intuition.)
Let p be the fraction of tasks where you end up using the big model anyway.
Two strategies:
Strategy A – Always use big model: Cost ≈ C_big
Strategy B – Try small, escalate if needed: Cost ≈ C_small + p * C_big
If C_small = 1 and C_big = 10:
Always-big → cost = 10
Try-small → cost = 1 + 10p
You only beat always-big if:
1 + 10p < 10 → p < 0.9
So if you escalate to a big model more than ~90% of the time, the “cheap first” step is pure overhead.
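The break-even point is easy to check numerically (the 10× ratio is just the illustrative number from above):

```python
def expected_cost(p: float, c_small: float = 1.0, c_big: float = 10.0) -> float:
    """Expected cost of 'try small, escalate with probability p'."""
    return c_small + p * c_big

# Break-even: c_small + p * c_big == c_big  →  p == 1 - c_small / c_big
p_break_even = 1 - 1.0 / 10.0  # 0.9

print(expected_cost(0.5))   # 6.0  → escalating half the time still saves money
print(expected_cost(0.95))  # 10.5 → worse than always using the big model
```

Note that the break-even fraction depends only on the cost ratio: the cheaper the small model relative to the big one, the more escalation you can tolerate before “cheap first” stops paying off.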
In practice:
For hard, ambiguous tasks (understanding systems, ugly bugs, architecture), most people do escalate. So going straight to the best model is actually rational.
For small, mechanical tasks, you rarely escalate, so a cheaper model really can save money.
The trick isn’t “optimize every call”. The trick is:
Know which tasks almost always need a top model, and stop pretending otherwise.
4. Classify tasks, not prompts
Instead of choosing a model for each prompt, classify the type of work first.
Bucket 1 – Hard / ambiguous / high-value work
Examples:
Understanding an unfamiliar subsystem end-to-end
Designing a new feature that cuts across backend, frontend, DB
Debugging a weird bug that involves async jobs, queues, and timeouts
Reasoning about tradeoffs (performance, correctness, architecture)
Characteristics:
You’d personally need a while to think
There isn’t a single obvious local fix
You expect multiple iterations and back-and-forth
Use case: You want a smart junior/staff engineer thinking with you.
Strategy:
Go straight to your best model (Claude Sonnet, GPT-5, etc.), often with “thinking” / deeper reasoning turned on.
Don’t bother with cheaper models here. Your own time is more expensive than the token difference.
Bucket 2 – Local / routine / mechanical work
Examples:
“Add tests for this function and adapt fixtures.”
“Extract this bit of logic into a helper and update this file + its tests.”
“Generate a typed API client from this OpenAPI schema.”
“Add logging around this one endpoint.”
Characteristics:
Scope is limited to 1–3 files
Expected behavior is clear
You can quickly tell if it’s wrong just by reading the diff
Use case: You want a smart autocomplete that can handle slightly non-trivial patterns.
Strategy:
Use a cheaper, mid-tier coding model (Composer, Qwen, Grok-code, etc.) as default.
If it nails it on the first or second try → great, you saved money.
If it struggles repeatedly for this exact type of task, upgrade that task type to Bucket 1 in your mental model and use a top model next time.
Bucket 3 – Bulk / repo-wide but well-specified work
Examples:
Updating license headers everywhere
Renaming a logging library and updating all call sites
Performing a very specific, repetitive change across many files
Characteristics:
High file count, but low conceptual complexity
You can write a very explicit spec: exactly what changes, where
Here you have two good options:
Use AI once to generate a codemod, then run that codemod with normal tooling (no AI for the bulk).
Use a “Max mode” or background agent, but only when the spec is crystal clear and the change is mechanical.
Background/cloud agents are terrible at vague, creative tasks. They are good at “update this pattern everywhere exactly like this”.
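For example, once a top model has helped you pin down the exact pattern, the bulk change itself can be an ordinary script. A minimal codemod sketch (the module names and paths are made up for illustration):

```python
import re
from pathlib import Path

# Hypothetical rename: old_logger → new_logger at every import site.
OLD = re.compile(r"\bfrom old_logger import get_logger\b")
NEW = "from new_logger import get_logger"

def apply_codemod(root: str, dry_run: bool = True) -> list[str]:
    """Rewrite every matching .py file under `root`; return the files touched."""
    changed = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text()
        new_text = OLD.sub(NEW, text)
        if new_text != text:
            changed.append(str(path))
            if not dry_run:
                path.write_text(new_text)
    return changed
```

Run it with `dry_run=True` first, review the list of files, then flip the flag. The AI’s job was only to help you write `OLD` and `NEW` precisely; the bulk execution is deterministic and free.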
5. Model size, “thinking” and Max mode in plain language
Now let’s map the knobs to this framework.
Model size
Big / top models (Sonnet, GPT-5, etc.): better reasoning, more robust across weird prompts, more expensive.
Mid / small models (Composer-lite, Qwen, etc.): fine for localized code, tests, boilerplate, cheaper.
Mapping:
Bucket 1 → Always big
Bucket 2 → Default to mid, escalate only if it clearly fails
Bucket 3 → Depends on whether the transformation is semantic or mechanical:
Mechanical → mid/small or no AI at all
Semantic → big model once to define the pattern, then script/codemod
“Thinking” models
A “thinking” mode basically means:
The model spends more internal compute to reason before answering
It might run extra internal steps, plan, and then answer
It tends to be slower and more expensive, but handles harder problems
Mapping:
Bucket 1:
- “Thinking” ON is usually worth it for tough bugs and design questions.
Bucket 2:
- “Thinking” often unnecessary – you want precision and speed.
Bucket 3:
- If the change is mechanical, “thinking” is overkill. If the change is subtle/semantic, have the big model reason once about the pattern, then let code/scripts do the bulk.
“Max mode” / huge context
“Max” is essentially:
Give the model way more context (more of your repo + history)
Allow many more tool calls / steps per “task”
Accept higher cost and latency
This is not “make it smarter”; it’s “let it see and touch more at once.”
Mapping:
Bucket 1:
- Sometimes useful if understanding a huge codebase, but dangerous if you give it too much autonomy.
Bucket 2:
- Mostly unnecessary. Your scope is intentionally small.
Bucket 3:
- Useful if you’re doing massive transformations and want the AI to navigate many files, as long as the spec is strict.
As a default:
Max OFF unless:
The task is explicitly repo-wide or doc-heavy, and
You already know what you want it to do.
6. Autonomy: Ask vs Agent vs Plan vs “do everything in the cloud”
The other dimension is: how much power do you give it?
Low autonomy – “Ask mode”
The AI explains, reviews, suggests.
You apply changes manually.
This is ideal for Bucket 1 (understanding, design, root-cause analysis). You want conversation, not automated edits.
Medium autonomy – “Agent mode on a leash”
The AI can edit files, but you keep tasks small.
You review every diff like a PR.
This fits Bucket 2: local changes where the impact is limited and obvious.
You can even tell an Agent: “Do not edit files yet, only propose a patch” – turning it into Ask mode with extra repo context.
High autonomy – “Cloud agent, big plan, lots of freedom”
The AI runs in a separate environment, possibly with Max context.
It can edit many files and run tools, often without interactive feedback.
This only belongs in Bucket 3 – mechanical repo-wide work with a tight spec.
Using high-autonomy modes for Bucket 1 or 2 (“figure out this fuzzy feature idea and implement it end-to-end”) is almost guaranteed to produce garbage. You’re skipping the human-in-the-loop phase where requirements actually get nailed down.
7. A practical playbook you can actually use
You don’t need to think from scratch every time. Here’s a simple policy you can adopt:
Step 1 – Classify the task
Before typing the prompt, ask:
“Is this hard/ambiguous, local/mechanical, or bulk/mechanical?”
Then:
Hard / ambiguous → Bucket 1
Local / mechanical → Bucket 2
Bulk / mechanical → Bucket 3
Step 2 – Apply the corresponding defaults
Bucket 1
Model: Best available (Sonnet/GPT-5)
Thinking: ON
Max: OFF by default, ON only when you really need repo-wide visibility to understand something
Autonomy: Ask mode; Agent only for small, clearly scoped edits you’ve agreed on
Bucket 2
Model: Mid-tier coding model (Composer/Qwen/etc.)
Thinking: OFF (or minimal)
Max: OFF
Autonomy: Agent allowed, but limited to small scopes; you review diffs
If it fails more than once for this sort of task, reclassify this specific pattern as Bucket 1.
Bucket 3
Model:
Use top model once to help you define a precise spec or codemod script
Use scripts/tools for the actual transformation
Thinking: ON only during spec design, OFF for mechanical execution
Max: ON only if you actually need massive context or multi-file reasoning
Autonomy: Cloud/background agent only for strictly specified, mechanical jobs
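The whole playbook compresses into one lookup table (the bucket names and settings mirror the defaults above; nothing here is any tool’s real config):

```python
# Per-bucket defaults: (model, thinking, max_context, autonomy)
DEFAULTS = {
    "hard_ambiguous":   ("big", True,  False, "ask"),
    "local_mechanical": ("mid", False, False, "agent-small-scope"),
    "bulk_mechanical":  ("big-for-spec-then-scripts", False,
                         "only-if-needed", "background-if-strict-spec"),
}

def settings_for(bucket: str) -> tuple:
    """Return the default (model, thinking, max, autonomy) for a task bucket."""
    return DEFAULTS[bucket]

print(settings_for("local_mechanical"))
# ('mid', False, False, 'agent-small-scope')
```

The point isn’t to run this literally; it’s that three buckets and four knobs are the entire decision. If your classification is right, the knob settings follow automatically.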
8. Optimize for your time, not just token spend
The final mental shift is this:
Tokens are cheap compared to your time and frustration.
Chasing small token savings by trying multiple cheap models and then escalating often loses in the real world.
For interactive development:
Use the best model for anything that feels like “real engineering” (understanding, design, complex bugs).
Use cheaper models or no AI at all for tiny, obvious, mechanical things.
Use AI + scripts for big mechanical tasks, not AI alone.
That’s how you actually get quality per buck – and, more importantly, quality per hour of your life.