Issue 326 Apr 2026
Anthropic ships Computer Use as a production API, not a demo.
Matthew Paver · Editor · London · 5 min read
The lead this week is Anthropic's Computer Use API, which moves agent control from research demo to production endpoint. The rest of the issue is context for teams already shipping with LLMs.
If you have been holding off on agent work waiting for the demo-to-production gap to close, this is the moment that gap closed for one major model provider.
The interesting question is which agent workflows your team is now willing to ship that you would have left on the shelf last quarter.
Spotlight
Computer Use moving to a stable API tier is the most likely change to near-term agent roadmaps for teams shipping AI products.
Underhyped
Llama 3.3 70B's instruction-following gains deserve more attention than they are getting; on real workloads the cost-per-good-output keeps falling.
Risk to watch
Keep an eye on Mistral Large 2 pricing. Pricing shifts in this tier tend to ripple through vendor budgets within a quarter.
Filed under: Industry, Models, Repos, Research, Tools
Lead story
Anthropic has shipped Computer Use as a public beta on the Claude 3.5 Sonnet API, letting models drive a screen via cursor, keyboard, and screenshots inside a sandboxed VM the developer provides.
Our take
The gap between a Twitter demo and a billable API endpoint is the gap that decides whether agent ideas leave the prototype folder. Anthropic just closed it for one major model.
Try this week
Pick one internal workflow that is currently a tab-switching slog and prototype it with Computer Use this week. Decide on cost, latency, and failure modes before pitching wider use.
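The shape of such a prototype is a short observe-act loop: screenshot the VM, ask the model for one action, apply it, repeat. A minimal sketch of that loop; `ask_model` and the screenshot/input plumbing are hypothetical stand-ins for the Anthropic client and your VM harness, not real API calls:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done"
    payload: dict

def ask_model(screenshot: bytes, goal: str) -> Action:
    """Stub: a real call would send the screenshot to the model and get an action back."""
    return Action(kind="done", payload={"note": f"goal acknowledged: {goal}"})

def run_agent(goal: str, max_steps: int = 20) -> list[Action]:
    """Core loop: observe the screen, ask for one action, apply it, repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = b""                  # a real harness would capture the VM screen here
        action = ask_model(screenshot, goal)
        history.append(action)
        if action.kind == "done":         # model signals task completion
            break
        # a real harness would inject the click/keystroke into the VM here
    return history

steps = run_agent("archive all read emails")
print(len(steps), steps[-1].kind)
```

The `max_steps` cap and the per-step history are where your cost, latency, and failure-mode numbers come from: every iteration is a billable model call, and the history is your audit trail when a run goes wrong.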
Learn this week
Recommended reading tied to this week's lead.
Prompt of the week
You are an expert reviewer. Read the diff below. Identify only changes that materially affect runtime behaviour, security posture, or public API. Ignore stylistic edits and rewordings. Return up to five bullets, each citing the file and line.

Works with: Claude, ChatGPT, Gemini
What changed in industry
2 min
Meta has released Llama 3.3 70B with a focus on instruction-following, multilingual coverage, and tool use, positioned as a drop-in replacement for the much larger 405B model. If the cost-per-good-output claims hold on your workload, the economics of self-hosted inference shift again. Worth a benchmark before any commitment to a closed-source tier.
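"Cost-per-good-output" is easy to compute once you have a pass/fail grader for your workload. A minimal sketch; the `grade` function and per-token prices are made-up illustrations, not real tariffs:

```python
def cost_per_good_output(outputs, token_counts, price_per_mtok, grade):
    """Total spend divided by the number of outputs the grader accepts.

    outputs        -- model completions for a fixed eval set
    token_counts   -- billed tokens per completion
    price_per_mtok -- $ per million tokens (illustrative, not a real tariff)
    grade          -- your pass/fail judgment for the workload
    """
    total_cost = sum(t * price_per_mtok / 1_000_000 for t in token_counts)
    good = sum(1 for o in outputs if grade(o))
    return float("inf") if good == 0 else total_cost / good

# Toy comparison: a cheaper model can win even with a lower pass rate.
grade = lambda o: o == "ok"
small = cost_per_good_output(["ok", "ok", "bad", "ok"], [500] * 4, 0.6, grade)
large = cost_per_good_output(["ok", "ok", "ok", "ok"], [500] * 4, 3.0, grade)
print(small < large)
```

The point of the metric is that it folds quality and price into one number: a model that passes 75% of cases at a fifth of the price beats a perfect one, for this workload, at these (invented) prices.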
Google has rolled out Gemini 2.0 Flash with native tool use, multimodal output, and an experimental free tier through AI Studio aimed at developers prototyping agent workflows. A genuinely free tier removes one of the last frictions to evaluating Google as a primary model provider. Useful if you have been waiting for a sane prototyping path.
Anthropic has published a pattern catalogue for building agentic systems, distinguishing 'workflows' (LLM + code with predictable control flow) from 'agents' (LLM-driven control flow) and laying out which primitives compose well. Most production agent code today reaches for the most agentic option when a simpler workflow would ship faster and break less. The catalogue gives teams a vocabulary for that decision.
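The distinction is concrete in code: a workflow keeps control flow in your program, an agent hands it to the model each turn. A minimal sketch with a stubbed model call (`llm` here is a placeholder, not any real API):

```python
def llm(prompt: str) -> str:
    """Stub model call; a real one would hit a model API."""
    if "classify" in prompt:
        return "refund"
    if "next step" in prompt:
        return "stop"
    return f"handled: {prompt}"

# Workflow: the if/else lives in *your* code, so behaviour is predictable.
def workflow(ticket: str) -> str:
    label = llm(f"classify: {ticket}")
    if label == "refund":
        return llm(f"draft refund reply: {ticket}")
    return llm(f"draft general reply: {ticket}")

# Agent: the model decides the next step every turn, so control flow is open-ended.
def agent(ticket: str, max_turns: int = 5) -> list[str]:
    trace = []
    for _ in range(max_turns):
        decision = llm(f"next step for: {ticket}")
        trace.append(decision)
        if decision == "stop":
            break
    return trace

print(workflow("double charge"))
print(agent("double charge"))
```

The workflow version is testable branch by branch; the agent version can only be bounded (`max_turns`) and observed. That asymmetry is the catalogue's argument for reaching for the agent loop last.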
Research worth your time
1 min
Researchers from Harvard, Stanford, MIT, and Carnegie Mellon have proposed precision-aware scaling laws covering both training and inference, arguing that lower precisions earlier in the pipeline can be cheaper than the field has assumed. If the law generalises, the right precision floor for a given budget moves down a step. Worth tracking if you set inference budgets or train your own models.
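The core move, as we read the paper, is to fold precision into a Chinchilla-style loss via an effective parameter count, roughly N_eff = N · (1 − exp(−P/γ)) for P-bit weights, where γ is a fitted constant. A toy sketch of that term; the γ value here is illustrative, not the paper's fit:

```python
import math

def effective_params(n_params: float, precision_bits: float, gamma: float = 3.0) -> float:
    """Effective parameter count under low-precision weights, per our reading
    of the paper's proposed form: N_eff = N * (1 - exp(-P / gamma)).
    gamma is a fitted constant; 3.0 is illustrative, not the paper's value."""
    return n_params * (1.0 - math.exp(-precision_bits / gamma))

# The penalty fades quickly with precision: 16-bit weights are nearly "full" N,
# while 4-bit weights behave like a noticeably smaller model.
for bits in (4, 8, 16):
    print(f"{bits}-bit -> {effective_params(1.0, bits):.3f} of nominal N")
```

The budget implication is direct: if a quantised model behaves like a smaller one, the cheapest way to hit a target loss may be more low-precision parameters rather than fewer high-precision ones.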
Tools to try
1 min
DeepSeek has released V3, a 671B-parameter Mixture-of-Experts model, with open weights, a detailed technical report, and reproducible training cost figures. Open weights at this capability tier change the build-versus-buy maths for teams that can host inference. The training cost numbers are also a useful reference point for planning.
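The mechanism that makes a 671B-parameter model affordable to serve is sparse routing: each token activates only a few experts. A toy sketch of top-k gating, unrelated to DeepSeek's actual routing or kernels:

```python
import math

def topk_route(logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their gates.
    Only these experts run for this token; the rest stay idle."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

def moe_forward(x: float, router_logits: list[float], experts, k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs for one token."""
    return sum(gate * experts[i](x) for i, gate in topk_route(router_logits, k))

# Four toy experts; in a real model the router logits come from a learned layer.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
y = moe_forward(3.0, [0.1, 2.0, -1.0, 0.5], experts)
print(round(y, 3))
```

Per-token compute scales with k, not with the total expert count, which is why total parameters and serving cost decouple at this tier.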
LangChain has continued investing in LangGraph as the durable-state layer for multi-agent systems, with new patterns for human-approval gates and persistence across sessions. If you have hit the limits of a single agent loop, the question is which framework's state model survives a real production deployment. LangGraph is one of the few credible answers.
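The pattern underneath a human-approval gate is an agent loop that checkpoints its state and pauses until a human resumes it. A framework-free sketch of that pattern; nothing here is LangGraph's API:

```python
import json

def run_until_gate(state: dict) -> dict:
    """Advance the plan until a step needs human sign-off, then persist
    state and stop. A real system would store this durably, not in memory."""
    for step in state["plan"][state["cursor"]:]:
        if step["needs_approval"] and not step.get("approved"):
            state["status"] = "waiting_for_human"
            return state                      # pause here; resume later
        state["log"].append(f"ran {step['name']}")
        state["cursor"] += 1
    state["status"] = "done"
    return state

state = {
    "plan": [
        {"name": "draft_email", "needs_approval": False},
        {"name": "send_email", "needs_approval": True},
    ],
    "cursor": 0,
    "log": [],
}
state = run_until_gate(state)
saved = json.dumps(state)                     # checkpoint survives a restart

# Human approves out-of-band; the loop resumes from the checkpoint.
state = json.loads(saved)
state["plan"][1]["approved"] = True
state = run_until_gate(state)
print(state["status"], state["log"])
```

This is the "state model" question in miniature: whether a framework can serialise mid-run state, hold it for hours, and resume exactly where it stopped is what separates demo loops from production deployments.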
Open-source picks
1 min
vLLM has continued to ship support for new model architectures, FP8 inference paths, and improved scheduling, keeping pace with the wave of recent open-weight model releases. If you self-host inference, vLLM remains the pragmatic default for high-throughput serving. Worth a re-evaluation if your last benchmark was more than two months ago.
OpenAI has open-sourced an official Python agents SDK with primitives for multi-agent handoffs, guardrails, and tracing, deliberately small enough to read in one sitting. Having a first-party reference implementation reduces the framework-shopping tax for teams that already standardise on OpenAI APIs. Worth comparing against LangGraph and your in-house rig.
Set in Space Grotesk and Source Serif 4. Compiled in London.