AI Coding Roundup - 26th Mar 2026

The signals from the noise.

New stuff

Agents and IDEs

How agentic coding works

  • agents combine:
    • user interface (cli, ide, cloud, background)
    • model (foundation, multimodal tokens (e.g., text, image, audio), reasoning/thinking/effort level (i.e., the model decomposes the problem by talking to itself before answering))
    • harness (the loop that wires chat and tools together)
      • each turn of the loop
        • build context
          • always: static prefixes - system prompt, AGENTS.md/CLAUDE.md, skill descriptions, MCP servers, etc.
          • all prior turns (prompts, responses, tool results)
            • this is how memory is persisted; the model itself is stateless
          • caching
            • across requests: prompt caching
              • static prefixes after first turn
            • within a request: key-value (KV) cache
              • models are autoregressive — without caching, generating token 100 recomputes key/value attention vectors for tokens 1-99
              • prefill phase: process entire prompt in parallel (compute-bound), builds the KV cache
              • decode phase: generate tokens one at a time using cached KVs (memory bandwidth-bound — the GPU stalls while reading KVs from VRAM)
        • send context (limited by context window, which is limited by KV cache memory) to model
        • model response
          • a probability distribution over the next token, sampled to produce the next action
          • could be tool calls (e.g., read, write, bash, python) or skill use
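The turn loop above can be sketched as a minimal harness. This is a toy illustration, not any vendor's API: `call_model`, `run_tool`, and the message shapes are all hypothetical placeholders.

```python
# Minimal agent-harness sketch: build context, call model, run tools, repeat.
# `call_model`, `run_tool`, and the message format are placeholder assumptions.

STATIC_PREFIX = [
    {"role": "system", "content": "system prompt + AGENTS.md + skill descriptions"},
]

def run_tool(name, args):
    # Placeholder dispatch for tools like read/write/bash.
    return f"<result of {name}({args})>"

def call_model(messages):
    # Placeholder: a real harness sends `messages` to a model API and
    # samples an action from its next-token distribution. Here we just finish.
    return {"type": "final", "content": "done"}

def agent_loop(user_prompt, max_turns=10):
    history = [{"role": "user", "content": user_prompt}]  # prior turns = the agent's memory
    for _ in range(max_turns):
        context = STATIC_PREFIX + history   # static prefix is cacheable after turn 1
        response = call_model(context)
        history.append({"role": "assistant", "content": response["content"]})
        if response["type"] == "tool_call":
            result = run_tool(response["name"], response["args"])
            history.append({"role": "tool", "content": result})  # tool result feeds next turn
        else:
            return response["content"]      # model chose to answer instead of act
    return None

print(agent_loop("fix the failing test"))
```

Note the loop re-sends the full history each turn — that is why the static prefix is worth caching and why cost grows with session length.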

Skills

  • prefer skills over rules or commands
    • rules are global persistent instructions (waste context)
    • commands are manual only
    • skills are on-demand, automatic, progressively disclosed (i.e., the agent reads only the name and description until the skill is invoked), modular abilities
      • can also be invoked manually
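One way to structure a skill: a directory containing a SKILL.md whose frontmatter carries the name and description the agent reads first (the layout follows Claude Code's convention; the skill itself is a made-up example):

```markdown
---
name: release-notes
description: Draft release notes from merged changes. Use when the user asks to prepare a release.
---

# Release notes

1. Run `git log --oneline <last-tag>..HEAD` to list changes.
2. Group commits by area and summarize each group.
3. Output markdown suitable for the release page.
```

Only the frontmatter lands in context at session start; the body loads when the skill is actually used — that is the progressive disclosure.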

Subagents

Single-agent autonomy

Multi-agent orchestration

  • challenging - collaborate, coordinate, sync, problem decomposition, verify, resilience, recovery, distributed systems, delegation, review, redirect, latency, accuracy, scalability, throughput, interfaces, safety, reliability
    • analogies to non-deterministic microservices, a factory line with randomness, managing/directing a team of human devs
    • most important lesson from years of distributed systems: keep everything on a single machine for as long as humanly possible
  • Cursor’s attempts - recursion, cloud agents
  • Claude agent teams
    • anthropic used parallel long-running agents to make a C compiler
  • cloudflare built a next.js clone - faster and smaller
  • gas town and wasteland
  • openclaw (example use)

Tips

Cost recommendations

Models

  • Avoid using expensive models (e.g., Opus) for every task — match model to task complexity
  • Avoid max mode / extended thinking on routine tasks — a significant cost multiplier; reserve for genuinely hard problems
  • Avoid fast mode unless essential, and use extra-high effort only with care — both are significant cost multipliers
  • Use auto / routing models when available — picks the cheapest capable model; avoid locking to a specific expensive model
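Routing can be as simple as a complexity heuristic picking the cheapest capable tier. Tier names, prices, and thresholds below are placeholder assumptions, not real model pricing:

```python
# Toy router: match model tier to task complexity instead of
# defaulting to the most expensive model. Names/prices are made up.
TIERS = [
    ("small",  1.0),   # routine edits, renames, boilerplate
    ("medium", 5.0),   # typical feature work
    ("large", 25.0),   # cross-cutting refactors, hard debugging
]

def route(complexity):
    # complexity: 0.0 (trivial) .. 1.0 (hard)
    if complexity < 0.3:
        return TIERS[0]
    if complexity < 0.7:
        return TIERS[1]
    return TIERS[2]

print(route(0.2)[0], route(0.9)[0])
```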

Context

  • Clear the context window regularly — cost scales with context length, so summarize and start fresh
  • Avoid large context when a snippet would do — use @file references to pull only relevant chunks
  • Use .cursorignore — unfiltered large repos inflate every context window
  • Disable idle MCP servers — tool definitions add to context even when unused; prefer CLI tools (gh, aws, gcloud)
  • Keep AGENTS.md/CLAUDE.md lean — it loads on every session; keep under ~200 lines and move workflow-specific instructions to skills
  • Batch related changes into one session — switching between many short sessions reloads context each time
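Rough arithmetic shows why clearing context pays off even with prompt caching. The prices below are illustrative placeholders (cache reads are commonly billed at a fraction of the full input rate; treat the exact multipliers as assumptions):

```python
# Rough cost model: one long growing session vs. restarting fresh.
# Prices are illustrative placeholders, not any vendor's current rates.
INPUT_PER_MTOK = 3.00    # $/million fresh input tokens (assumed)
CACHED_PER_MTOK = 0.30   # $/million cached input tokens (assumed ~0.1x)

def turn_cost(fresh_tokens, cached_tokens):
    return (fresh_tokens * INPUT_PER_MTOK + cached_tokens * CACHED_PER_MTOK) / 1_000_000

# 20-turn session where context grows by 5k tokens per turn:
growing = sum(turn_cost(5_000, 5_000 * t) for t in range(20))
# Same work split into 4 fresh 5-turn sessions:
fresh = 4 * sum(turn_cost(5_000, 5_000 * t) for t in range(5))
print(f"growing session ${growing:.2f} vs restarted ${fresh:.2f}")
```

Even at a heavily discounted cache-read rate, re-sending an ever-growing history every turn dominates; the restarted sessions come out cheaper under these assumptions.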

Prompts and outputs

  • Front-load context in one precise prompt — vague prompts lead to multiple clarification rounds
  • Give the agent relevant context upfront — avoids redundant file reads and searches
  • Ask for short answers when that’s all you need — output tokens cost ~3–5× more than input tokens
  • Avoid attaching large screenshots or images unnecessarily — vision inputs cost more per token than text

Agents

  • Avoid multi-agent orchestration unless necessary — parallel agents multiply cost fast
  • Avoid spawning subagents for many small tasks — a single focused agent is often cheaper
  • Set a turn or tool-call limit on long-running agents
  • Diagnose failures before retrying — avoid letting agents loop without fixing the root cause
  • Course-correct early — stop immediately if heading wrong; use rewind in Cursor
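A turn or tool-call cap can be a plain counter around the loop. Sketch only: `step` is a hypothetical stand-in for one model-plus-tools turn.

```python
class BudgetExceeded(Exception):
    pass

def run_with_budget(step, max_tool_calls=25):
    # `step` performs one turn and returns (done, tool_calls_used).
    used = 0
    while True:
        done, calls = step()
        used += calls
        if done:
            return used
        if used >= max_tool_calls:
            # Stop and surface the failure for diagnosis instead of
            # letting the agent loop on a broken approach.
            raise BudgetExceeded(f"hit cap after {used} tool calls")

# Toy step: finishes on the 3rd turn, using 2 tool calls per turn.
state = {"turns": 0}
def toy_step():
    state["turns"] += 1
    return state["turns"] >= 3, 2

print(run_with_budget(toy_step))  # → 6
```

Raising instead of silently retrying forces the "diagnose before retrying" step above.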

Automation

  • Avoid agent review on every commit or PR — trigger only when justified
  • Avoid automated CI/CD triggers running agents on every push
  • Use a simpler tool when it fits — grep, a script, or a template is free; reserve agents for genuinely ambiguous work

Visibility

  • Treat AI cost like cloud cost — make it visible and part of design decisions, not just a billing surprise
  • Check model prices and usage regularly
  • Cursor usage dashboard — spot runaway spend early
  • Review which models and features are consuming most tokens

Resources

Opinions

Free time

  • Questions?
  • Discussion?
  • Demo requests?
  • Anything anyone wants to share?
  • Anything you'd like me to cover next time?