AI Coding Roundup - 26th Mar 2026
The signals from the noise.
New stuff
- Gemini 3.1 Pro beats Opus 4.6 on many benchmarks and is half the price
- Claude Code security scanning with auto-fixes; similar for Codex (research preview)
- Reliability issues
- Cursor automations (also for Codex), e.g., repo summary, find vulnerability, PR review, etc.
- GPT-5.4 is another benchmark-topper (steer is useful)
- Composer 2 - fast mode has a good speed/cost trade-off
Agents and IDEs
- The end of the (classic) IDE; agent orchestration instead
- Managing parallel agents (i.e., agent-first rather than editor-first; like the CLI, but a GUI makes multiple agents easier to manage)
- Cursor agent view and worktrees, though awkward as it applies to the local checkout - now Glass (agent GUI)
- The Codex app has nice worktrees, as it makes PRs without touching the local checkout
- Claude Code desktop is similar to Codex
How agentic coding works
- agents combine (sketched in code at the end of this section):
- user interface (CLI, IDE, cloud, background)
- model (foundation, multimodal tokens (e.g., text, image, audio), reasoning/thinking/effort level (i.e., it decomposes the problem by chatting to itself))
- harness (the loop of chat and tools)
- each turn of the loop
- build context
- always: static prefixes - system prompt, AGENTS.md/CLAUDE.md, skill descriptions, MCP servers, etc.
- all prior turns (prompts, responses, tool results)
- this is the memory being persisted, i.e., the model is stateless
- caching
- across requests: prompt caching
- static prefixes after first turn
- within a request: key-value (KV) cache
- models are autoregressive — without caching, generating token 100 recomputes key/value attention vectors for tokens 1-99
- prefill phase: process entire prompt in parallel (compute-bound), builds the KV cache
- decode phase: generate tokens one at a time using cached KVs (memory bandwidth-bound — GPU waits reading KVs from VRAM)
- send context (limited by context window, which is limited by KV cache memory) to model
- model response
- probability distribution over next token, sampled to give next action
- could be tool calls (e.g., read, write, bash, python) or skill use
- loop repeats: the next turn builds context again
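
A minimal sketch of the harness loop above. The model is stateless, so the whole transcript is re-sent every turn; `fake_model` is a scripted stand-in for a real provider call (hypothetical, so the example actually runs), and the tool set is illustrative.

```python
# Minimal harness loop: build context -> model -> tool call or final answer.
import json
import subprocess

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM call: one tool call, then a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "bash", "args": {"cmd": "echo hello from a tool"}}
    return {"text": "DONE: " + messages[-1]["content"].strip()}

def run_tool(name: str, args: dict) -> str:
    """Illustrative dispatch; real harnesses offer read/write/bash/etc."""
    if name == "bash":
        out = subprocess.run(args["cmd"], shell=True,
                             capture_output=True, text=True)
        return out.stdout + out.stderr
    return f"unknown tool: {name}"

def agent_loop(task: str, model=fake_model, max_turns: int = 20) -> str:
    # static prefix goes first, so prompt caching can reuse it across turns
    messages = [{"role": "system", "content": "You are a coding agent."},
                {"role": "user", "content": task}]
    for _ in range(max_turns):
        action = model(messages)              # send context, sample next action
        if "tool" in action:                  # sampled a tool call
            messages.append({"role": "assistant", "content": json.dumps(action)})
            messages.append({"role": "tool",
                             "content": run_tool(action["tool"], action["args"])})
            continue                          # next turn rebuilds context
        return action["text"]                 # final text answer
    return "turn limit reached"

print(agent_loop("say hello via a tool"))
```

And a toy calculation of why the KV cache matters: without it, every decode step would reprocess the whole prefix. The numbers count token passes, not real FLOPs.

```python
# 10k-token prompt, 500 generated tokens.
def ops_without_cache(prompt_len: int, new_tokens: int) -> int:
    # no KV cache: each new token reprocesses the entire prefix
    return sum(prompt_len + i for i in range(1, new_tokens + 1))

def ops_with_cache(prompt_len: int, new_tokens: int) -> int:
    # prefill once (parallel, compute-bound), then one pass per decoded
    # token against cached KVs (memory-bandwidth-bound)
    return prompt_len + new_tokens

print(ops_without_cache(10_000, 500))   # 5,125,250 token passes
print(ops_with_cache(10_000, 500))      # 10,500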
Skills
- prefer skills over rules or commands
- rules are global persistent instructions (they waste context)
- commands are manual-only
- skills are on-demand, automatic, progressively disclosed (i.e., the agent reads the name and description first), modular abilities (sketched below)
- skills can still be called manually too
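
A minimal sketch of progressive disclosure, assuming skills live as markdown files whose first line is the description (an illustrative layout, not any particular tool's real format):

```python
# Only name + description are always in context; the full body is read
# from disk when the agent chooses the skill.
from pathlib import Path

SKILLS_DIR = Path("skills")          # e.g. skills/deploy.md (assumed layout)

def skill_index() -> str:
    """Cheap, always-in-context index: one line per skill."""
    return "\n".join(f"- {f.stem}: {f.read_text().splitlines()[0]}"
                     for f in sorted(SKILLS_DIR.glob("*.md")))

def load_skill(name: str) -> str:
    """Full instructions, pulled into context only on demand."""
    return (SKILLS_DIR / f"{name}.md").read_text()
```

The index costs a few tokens per skill; the full instructions only cost context on the turns that actually use them.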
Subagents
Single-agent autonomy
- Ralph loop and Loom - an infinite loop that continuously clears context rot; persist the good stuff in files (need to define; sketched below)
- when to use
- Claude's Ralph (plugin)
- Claude's /loop
- autoresearch from Karpathy for automated research, e.g., making AI loss go down
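
A hedged sketch of a Ralph-style loop, assuming a headless agent CLI called `my-agent` (hypothetical) and pytest as the done-check: fresh context every iteration, with files as the only memory.

```python
# Ralph-style loop: every iteration is a fresh session (no context rot
# carried over), so anything worth keeping must be written to files.
import subprocess
from pathlib import Path

NOTES = Path("PROGRESS.md")                      # durable memory between runs
TASK = ("Make the test suite pass. Read PROGRESS.md first; "
        "append anything you learned before finishing.")

def done() -> bool:
    return subprocess.run(["pytest", "-q"]).returncode == 0

NOTES.touch()
while not done():
    # "my-agent" is a placeholder for any headless coding-agent CLI
    subprocess.run(["my-agent", "--prompt", TASK], check=False)
```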
Multi-agent orchestration
- challenging - collaboration, coordination, sync, problem decomposition, verification, resilience, recovery, distributed systems, delegation, review, redirection, latency, accuracy, scalability, throughput, interfaces, safety, reliability (a minimal fan-out sketch follows this list)
- analogies to non-deterministic microservices, a factory line with randomness, managing/directing a team of human devs
- most important lesson from years of distributed systems: keep everything on a single machine for as long as humanly possible
- Cursor’s attempts - recursion, cloud agents
- Claude agent teams
- Anthropic used parallel long-running agents to make a C compiler
- nice reflection from Chris Lattner
- Cloudflare made a Next.js copy - faster and smaller
- Gas Town and Wasteland
- OpenClaw (example use)
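
A minimal fan-out/fan-in sketch under strong assumptions: tasks are independent, each worker gets its own git worktree, and `my-agent` is the same hypothetical CLI as above; the hard parts listed earlier (verification, recovery, merging) are left to the coordinator.

```python
# Fan out independent tasks to parallel agents, one git worktree each;
# a human (or a reviewer agent) merges the resulting branches.
from concurrent.futures import ThreadPoolExecutor
import subprocess

TASKS = ["fix lexer bug", "add parser tests", "update docs"]

def worker(i: int, task: str) -> str:
    wt = f"../wt-{i}"
    subprocess.run(["git", "worktree", "add", wt, "-b", f"agent-{i}"], check=True)
    # hypothetical CLI and flags; isolated checkout avoids clobbering
    subprocess.run(["my-agent", "--cwd", wt, "--prompt", task], check=False)
    return f"agent-{i}"                      # branch to review and merge

with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
    branches = list(pool.map(worker, range(len(TASKS)), TASKS))
print("review and merge:", branches)
```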
Tips
- simplicity is even more important, specifically things that don't intertwine (simple != easy)
- recommend reading best practices (mostly transferable), e.g., Claude Code, Cursor
Cost recommendations
Models
- Avoid using expensive models (e.g., Opus) for every task — match model to task complexity
- Avoid max mode / extended thinking on routine tasks — a significant cost multiplier; reserve for genuinely hard problems
- Avoid fast mode unless essential, and use extra-high effort only with care — both are significant cost multipliers
- Use auto/routing models when available — they pick the cheapest capable model; avoid locking to a specific expensive model
Context
- Clear the context window regularly — cost scales with length, can summarize and start fresh
- Avoid large context when a snippet would do — use @file references to pull only relevant chunks
- Use .cursorignore — unfiltered large repos inflate every context window
- Disable idle MCP servers — tool definitions add to context even when unused; prefer CLI tools (gh, aws, gcloud)
- Keep AGENTS.md/CLAUDE.md lean — it loads on every session; keep under ~200 lines and move workflow-specific instructions to skills
- Batch related changes into one session — switching between many short sessions re-loads context each time
Prompts and outputs
- Front-load context in one precise prompt — vague prompts lead to multiple clarification rounds
- Give the agent relevant context upfront — avoids redundant file reads and searches
- Ask for short answers when that’s all you need — output tokens cost ~3–5× more than input tokens
- Avoid attaching large screenshots or images unnecessarily — vision inputs cost more per token than text
Agents
- Avoid multi-agent orchestration unless necessary — parallel agents multiply cost fast
- Avoid spawning subagents for many small tasks — a single focused agent is often cheaper
- Set a turn or tool-call limit on long-running agents (sketched after this list)
- Diagnose failures before retrying — avoid letting agents loop without fixing the root cause
- Course-correct early — stop immediately if heading wrong; use rewind in Cursor
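
A sketch of that turn/budget guard, with illustrative per-token prices and a stubbed agent step (both assumptions):

```python
# Hard caps on both turns and spend around an agent loop.
IN_PRICE, OUT_PRICE = 3e-6, 15e-6     # $/token, illustrative only

def step_agent() -> dict:
    """Stand-in for one real agent turn; returns token usage."""
    return {"in": 40_000, "out": 1_200, "done": False}

spent, MAX_USD, MAX_TURNS = 0.0, 2.00, 50
for turn in range(MAX_TURNS):
    usage = step_agent()
    spent += usage["in"] * IN_PRICE + usage["out"] * OUT_PRICE
    if usage["done"] or spent > MAX_USD:
        break
print(f"stopped after {turn + 1} turns, ${spent:.2f}")
```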
Automation
- Avoid agent review on every commit or PR — trigger only when justified
- Avoid automated CI/CD triggers running agents on every push
- Use a simpler tool when it fits — grep, a script, or a template is free; reserve agents for genuinely ambiguous work
Visibility
- Treat AI cost like cloud cost — make it visible and part of design decisions, not just a billing surprise
- Check model prices and usage regularly (a back-of-envelope estimator is sketched below)
- Cursor usage dashboard — spot runaway spend early
- Review which models and features are consuming most tokens
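
A back-of-envelope estimator to make spend visible. The price table is a placeholder (check your provider's pricing page), but the input/output asymmetry matches the pattern noted above:

```python
# Rough cost visibility: $ per 1M tokens as (input, output) placeholders.
PRICES = {
    "small":  (0.25, 1.25),
    "medium": (3.00, 15.00),
    "large":  (15.00, 75.00),
}

def cost(model: str, in_tok: int, out_tok: int) -> float:
    pi, po = PRICES[model]
    return (in_tok * pi + out_tok * po) / 1e6

# A heavy day of agent use on a mid-tier model:
print(f"${cost('medium', 20_000_000, 1_500_000):.2f}")   # $82.50
```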
Resources
Opinions
- Try always having an agent running
- vericoding seems promising - proving the correctness of AI-generated code
- Taste also applies to deciding what to do - like choosing what to read when there's an effectively infinite number of books
- taste might be automatable too
- some nice ideas for using AI coding for research - /experiment
- try to get AIs to come up with ideas - unknown, expensive
Free time
- Questions?
- Discussion?
- Demo requests?
- Anything anyone wants to share?
- Anything you'd like me to cover next time?