AI Coding Roundup - 26th Mar 2026
The signals from the noise.
New stuff
- Gemini 3.1 Pro beats Opus 4.6 on many benchmarks and is half the price
- Claude Code security scanning with auto-fixes; similar for Codex (research preview)
- Reliability issues
- Cursor automations (also for Codex), e.g., repo summary, find vulnerability, PR review, etc.
- GPT-5.4 is another benchmark-topper (steer is useful)
- Composer 2 - fast mode has a good speed/cost trade-off
Agents and IDEs
- The end of the (classic) IDE; agent orchestration instead
- Managing parallel agents (i.e., agent-first rather than editor-first; like the CLI, but a GUI makes multiple agents easier to manage)
- Cursor agent view and worktrees, though awkward as it applies to the local checkout - now Glass (agent GUI)
- The Codex app has nice worktrees, as it makes PRs without touching the local checkout
- Claude Code desktop is similar to Codex
How agentic coding works
- agents combine (sketched in code at the end of this section):
- user interface (CLI, IDE, cloud, background)
- model (foundation, multimodal tokens (e.g., text, image, audio), reasoning/thinking/effort level (i.e., it decomposes the problem by chatting to itself))
- harness (the loop of chat and tools)
- each turn of the loop
- build context
- always: static prefixes - system prompt, AGENTS.md/CLAUDE.md, skill descriptions, MCP servers, etc.
- all prior turns (prompts, responses, tool results)
- this is the memory being persisted, i.e., the model is stateless
- caching
- across requests: prompt caching
- static prefixes after first turn
- within a request: key-value (KV) cache
- models are autoregressive — without caching, generating token 100 recomputes key/value attention vectors for tokens 1-99
- prefill phase: process entire prompt in parallel (compute-bound), builds the KV cache
- decode phase: generate tokens one at a time using cached KVs (memory bandwidth-bound — GPU waits reading KVs from VRAM)
- send context (limited by context window, which is limited by KV cache memory) to model
- model response
- probability distribution over next token, sampled to give next action
- could be tool calls (e.g., read, write, bash, python) or skill use
- loop repeats: the next turn builds context again
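
A minimal sketch of the harness loop above. The model is stateless, so the whole transcript is re-sent every turn; `fake_model` is a scripted stand-in for a real provider call (hypothetical, so the example actually runs), and the tool set is illustrative.

```python
# Minimal harness loop: build context -> model -> tool call or final answer.
import json
import subprocess

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM call: one tool call, then a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "bash", "args": {"cmd": "echo hello from a tool"}}
    return {"text": "DONE: " + messages[-1]["content"].strip()}

def run_tool(name: str, args: dict) -> str:
    """Illustrative dispatch; real harnesses offer read/write/bash/etc."""
    if name == "bash":
        out = subprocess.run(args["cmd"], shell=True,
                             capture_output=True, text=True)
        return out.stdout + out.stderr
    return f"unknown tool: {name}"

def agent_loop(task: str, model=fake_model, max_turns: int = 20) -> str:
    # static prefix goes first, so prompt caching can reuse it across turns
    messages = [{"role": "system", "content": "You are a coding agent."},
                {"role": "user", "content": task}]
    for _ in range(max_turns):
        action = model(messages)              # send context, sample next action
        if "tool" in action:                  # sampled a tool call
            messages.append({"role": "assistant", "content": json.dumps(action)})
            messages.append({"role": "tool",
                             "content": run_tool(action["tool"], action["args"])})
            continue                          # next turn rebuilds context
        return action["text"]                 # final text answer
    return "turn limit reached"

print(agent_loop("say hello via a tool"))
```

And a toy calculation of why the KV cache matters: without it, every decode step would reprocess the whole prefix. The numbers count token passes, not real FLOPs.

```python
# 10k-token prompt, 500 generated tokens.
def ops_without_cache(prompt_len: int, new_tokens: int) -> int:
    # no KV cache: each new token reprocesses the entire prefix
    return sum(prompt_len + i for i in range(1, new_tokens + 1))

def ops_with_cache(prompt_len: int, new_tokens: int) -> int:
    # prefill once (parallel, compute-bound), then one pass per decoded
    # token against cached KVs (memory-bandwidth-bound)
    return prompt_len + new_tokens

print(ops_without_cache(10_000, 500))   # 5,125,250 token passes
print(ops_with_cache(10_000, 500))      # 10,500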
Skills
- prefer skills over rules or commands
- rules are global persistent instructions (they waste context)
- commands are manual-only
- skills are on-demand, automatic, progressively disclosed (i.e., the agent reads the name and description first), modular abilities (sketched below)
- skills can still be called manually too
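
A minimal sketch of progressive disclosure, assuming skills live as markdown files whose first line is the description (an illustrative layout, not any particular tool's real format):

```python
# Only name + description are always in context; the full body is read
# from disk when the agent chooses the skill.
from pathlib import Path

SKILLS_DIR = Path("skills")          # e.g. skills/deploy.md (assumed layout)

def skill_index() -> str:
    """Cheap, always-in-context index: one line per skill."""
    return "\n".join(f"- {f.stem}: {f.read_text().splitlines()[0]}"
                     for f in sorted(SKILLS_DIR.glob("*.md")))

def load_skill(name: str) -> str:
    """Full instructions, pulled into context only on demand."""
    return (SKILLS_DIR / f"{name}.md").read_text()
```

The index costs a few tokens per skill; the full instructions only cost context on the turns that actually use them.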
Subagents
Single-agent autonomy
- Ralph loop and Loom - an infinite loop that continuously clears context rot; persist the good stuff in files (need to define; sketched below)
- when to use
- Claude's Ralph (plugin)
- Claude's /loop
- autoresearch from Karpathy for automated research, e.g., making AI loss go down
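
A hedged sketch of a Ralph-style loop, assuming a headless agent CLI called `my-agent` (hypothetical) and pytest as the done-check: fresh context every iteration, with files as the only memory.

```python
# Ralph-style loop: every iteration is a fresh session (no context rot
# carried over), so anything worth keeping must be written to files.
import subprocess
from pathlib import Path

NOTES = Path("PROGRESS.md")                      # durable memory between runs
TASK = ("Make the test suite pass. Read PROGRESS.md first; "
        "append anything you learned before finishing.")

def done() -> bool:
    return subprocess.run(["pytest", "-q"]).returncode == 0

NOTES.touch()
while not done():
    # "my-agent" is a placeholder for any headless coding-agent CLI
    subprocess.run(["my-agent", "--prompt", TASK], check=False)
```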
Multi-agent orchestration
- challenging - collaboration, coordination, sync, problem decomposition, verification, resilience, recovery, distributed systems, delegation, review, redirection, latency, accuracy, scalability, throughput, interfaces, safety, reliability (a minimal fan-out sketch follows this list)
- analogies to non-deterministic microservices, a factory line with randomness, managing/directing a team of human devs
- most important lesson from years of distributed systems: keep everything on a single machine for as long as humanly possible
- Cursor’s attempts - recursion, cloud agents
- Claude agent teams
- Anthropic used parallel long-running agents to make a C compiler
- nice reflection from Chris Lattner
- Cloudflare made a Next.js copy - faster and smaller
- Gas Town and Wasteland
- OpenClaw (example use)
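
A minimal fan-out/fan-in sketch under strong assumptions: tasks are independent, each worker gets its own git worktree, and `my-agent` is the same hypothetical CLI as above; the hard parts listed earlier (verification, recovery, merging) are left to the coordinator.

```python
# Fan out independent tasks to parallel agents, one git worktree each;
# a human (or a reviewer agent) merges the resulting branches.
from concurrent.futures import ThreadPoolExecutor
import subprocess

TASKS = ["fix lexer bug", "add parser tests", "update docs"]

def worker(i: int, task: str) -> str:
    wt = f"../wt-{i}"
    subprocess.run(["git", "worktree", "add", wt, "-b", f"agent-{i}"], check=True)
    # hypothetical CLI and flags; isolated checkout avoids clobbering
    subprocess.run(["my-agent", "--cwd", wt, "--prompt", task], check=False)
    return f"agent-{i}"                      # branch to review and merge

with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
    branches = list(pool.map(worker, range(len(TASKS)), TASKS))
print("review and merge:", branches)
```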
Tips
- simplicity is even more important, specifically things that don't intertwine (simple != easy)
- recommend reading best practices (mostly transferable), e.g., Claude Code, Cursor
Cost recommendations
Models
- Avoid using expensive models (e.g., Opus) for every task — match model to task complexity
- Avoid max mode / extended thinking on routine tasks — a significant cost multiplier; reserve for genuinely hard problems
- Avoid fast mode unless essential, and use extra-high effort only with care — both are significant cost multipliers
- Use auto/routing models when available — they pick the cheapest capable model; avoid locking to a specific expensive model
Context
- Clear the context window regularly — cost scales with length, can summarize and start fresh
- Avoid large context when a snippet would do — use @file references to pull only relevant chunks
- Use .cursorignore — unfiltered large repos inflate every context window
- Disable idle MCP servers — tool definitions add to context even when unused; prefer CLI tools (gh, aws, gcloud)
- Keep AGENTS.md/CLAUDE.md lean — it loads on every session; keep under ~200 lines and move workflow-specific instructions to skills
- Batch related changes into one session — switching between many short sessions re-loads context each time
Prompts and outputs
- Front-load context in one precise prompt — vague prompts lead to multiple clarification rounds
- Give the agent relevant context upfront — avoids redundant file reads and searches
- Ask for short answers when that’s all you need — output tokens cost ~3–5× more than input tokens
- Avoid attaching large screenshots or images unnecessarily — vision inputs cost more per token than text
Agents
- Avoid multi-agent orchestration unless necessary — parallel agents multiply cost fast
- Avoid spawning subagents for many small tasks — a single focused agent is often cheaper
- Set a turn or tool-call limit on long-running agents (sketched after this list)
- Diagnose failures before retrying — avoid letting agents loop without fixing the root cause
- Course-correct early — stop immediately if heading wrong; use rewind in Cursor
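
A sketch of that turn/budget guard, with illustrative per-token prices and a stubbed agent step (both assumptions):

```python
# Hard caps on both turns and spend around an agent loop.
IN_PRICE, OUT_PRICE = 3e-6, 15e-6     # $/token, illustrative only

def step_agent() -> dict:
    """Stand-in for one real agent turn; returns token usage."""
    return {"in": 40_000, "out": 1_200, "done": False}

spent, MAX_USD, MAX_TURNS = 0.0, 2.00, 50
for turn in range(MAX_TURNS):
    usage = step_agent()
    spent += usage["in"] * IN_PRICE + usage["out"] * OUT_PRICE
    if usage["done"] or spent > MAX_USD:
        break
print(f"stopped after {turn + 1} turns, ${spent:.2f}")
```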
Automation
- Avoid agent review on every commit or PR — trigger only when justified
- Avoid automated CI/CD triggers running agents on every push
- Use a simpler tool when it fits — grep, a script, or a template is free; reserve agents for genuinely ambiguous work
Visibility
- Treat AI cost like cloud cost — make it visible and part of design decisions, not just a billing surprise
- Check model prices and usage regularly (a back-of-envelope estimator is sketched below)
- Cursor usage dashboard — spot runaway spend early
- Review which models and features are consuming most tokens
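
A back-of-envelope estimator to make spend visible. The price table is a placeholder (check your provider's pricing page), but the input/output asymmetry matches the pattern noted above:

```python
# Rough cost visibility: $ per 1M tokens as (input, output) placeholders.
PRICES = {
    "small":  (0.25, 1.25),
    "medium": (3.00, 15.00),
    "large":  (15.00, 75.00),
}

def cost(model: str, in_tok: int, out_tok: int) -> float:
    pi, po = PRICES[model]
    return (in_tok * pi + out_tok * po) / 1e6

# A heavy day of agent use on a mid-tier model:
print(f"${cost('medium', 20_000_000, 1_500_000):.2f}")   # $82.50
```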
Resources
Opinions
- Try always having an agent running
- vericoding seems promising - proving the correctness of AI-generated code
- Taste also applies to deciding what to do - like choosing what to read when there's an effectively infinite number of books
- taste might be automatable too
- some nice ideas for using AI coding for research - /experiment
- try to get AIs to come up with ideas - unknown, expensive
Free time
- Questions?
- Discussion?
- Demo requests?
- Anything anyone wants to share?
- Anything you'd like me to cover next time?