Issue 326 Apr 2026
Anthropic ships Computer Use as a production API, not a demo.
Matthew Paver · Editor · London · 5 min read
The lead this week is Anthropic's Computer Use API, which moves agent control from research demo to production endpoint. The rest of the issue is context for teams already shipping with LLMs.
If you have been holding off on agent work waiting for the demo-to-production gap to close, this is the moment that gap closed for one major model provider.
The interesting question is which agent workflows your team is now willing to ship that you would have left on the shelf last quarter.
Spotlight
Computer Use moving to a stable API tier is the most likely change to near-term agent roadmaps for teams shipping AI products.
Underhyped
Llama 3.3 70B's instruction-following gains deserve more attention than they are getting; on real workloads the cost-per-good-output keeps falling.
Risk to watch
Keep an eye on Mistral Large 2 pricing. Pricing shifts in this tier tend to ripple through vendor budgets within a quarter.
Filed under: Industry, Models, Repos, Research, Tools
Lead story
Anthropic has shipped Computer Use as a public beta on the Claude 3.5 Sonnet API, letting models drive a screen via cursor, keyboard, and screenshots inside a sandboxed VM the developer provides.
Our take
The gap between a Twitter demo and a billable API endpoint is the gap that decides whether agent ideas leave the prototype folder. Anthropic just closed it for one major model.
Try this week
Pick one internal workflow that is currently a tab-switching slog and prototype it with Computer Use this week. Decide on cost, latency, and failure modes before pitching wider use.
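The shape of such a prototype is a short observe-act loop: screenshot the VM, ask the model for one action, apply it, repeat. A minimal sketch of that loop; `ask_model` and the screenshot/input plumbing are hypothetical stand-ins for the Anthropic client and your VM harness, not real API calls:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done"
    payload: dict

def ask_model(screenshot: bytes, goal: str) -> Action:
    """Stub: a real call would send the screenshot to the model and get an action back."""
    return Action(kind="done", payload={"note": f"goal acknowledged: {goal}"})

def run_agent(goal: str, max_steps: int = 20) -> list[Action]:
    """Core loop: observe the screen, ask for one action, apply it, repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = b""                  # a real harness would capture the VM screen here
        action = ask_model(screenshot, goal)
        history.append(action)
        if action.kind == "done":         # model signals task completion
            break
        # a real harness would inject the click/keystroke into the VM here
    return history

steps = run_agent("archive all read emails")
print(len(steps), steps[-1].kind)
```

The `max_steps` cap and the per-step history are where your cost, latency, and failure-mode numbers come from: every iteration is a billable model call, and the history is your audit trail when a run goes wrong.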
Learn this week
Recommended reading tied to this week's lead.
Prompt of the week
You are an expert reviewer. Read the diff below. Identify only changes that materially affect runtime behaviour, security posture, or public API. Ignore stylistic edits and rewordings. Return up to five bullets, each citing the file and line.

Works with: Claude, ChatGPT, Gemini
What changed in industry
2 min
Meta has released Llama 3.3 70B with a focus on instruction-following, multilingual coverage, and tool use, positioned as a drop-in replacement for the much larger 405B model. If the cost-per-good-output claims hold on your workload, the economics of self-hosted inference shift again. Worth a benchmark before any commitment to a closed-source tier.
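"Cost-per-good-output" is easy to compute once you have a pass/fail grader for your workload. A minimal sketch; the `grade` function and per-token prices are made-up illustrations, not real tariffs:

```python
def cost_per_good_output(outputs, token_counts, price_per_mtok, grade):
    """Total spend divided by the number of outputs the grader accepts.

    outputs        -- model completions for a fixed eval set
    token_counts   -- billed tokens per completion
    price_per_mtok -- $ per million tokens (illustrative, not a real tariff)
    grade          -- your pass/fail judgment for the workload
    """
    total_cost = sum(t * price_per_mtok / 1_000_000 for t in token_counts)
    good = sum(1 for o in outputs if grade(o))
    return float("inf") if good == 0 else total_cost / good

# Toy comparison: a cheaper model can win even with a lower pass rate.
grade = lambda o: o == "ok"
small = cost_per_good_output(["ok", "ok", "bad", "ok"], [500] * 4, 0.6, grade)
large = cost_per_good_output(["ok", "ok", "ok", "ok"], [500] * 4, 3.0, grade)
print(small < large)
```

The point of the metric is that it folds quality and price into one number: a model that passes 75% of cases at a fifth of the price beats a perfect one, for this workload, at these (invented) prices.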
Google has rolled out Gemini 2.0 Flash with native tool use, multimodal output, and an experimental free tier through AI Studio aimed at developers prototyping agent workflows. A genuinely free tier removes one of the last frictions to evaluating Google as a primary model provider. Useful if you have been waiting for a sane prototyping path.
Anthropic has published a pattern catalogue for building agentic systems, distinguishing 'workflows' (LLM + code with predictable control flow) from 'agents' (LLM-driven control flow) and laying out which primitives compose well. Most production agent code today reaches for the most agentic option when a simpler workflow would ship faster and break less. The catalogue gives teams a vocabulary for that decision.
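The distinction is concrete in code: a workflow keeps control flow in your program, an agent hands it to the model each turn. A minimal sketch with a stubbed model call (`llm` here is a placeholder, not any real API):

```python
def llm(prompt: str) -> str:
    """Stub model call; a real one would hit a model API."""
    if "classify" in prompt:
        return "refund"
    if "next step" in prompt:
        return "stop"
    return f"handled: {prompt}"

# Workflow: the if/else lives in *your* code, so behaviour is predictable.
def workflow(ticket: str) -> str:
    label = llm(f"classify: {ticket}")
    if label == "refund":
        return llm(f"draft refund reply: {ticket}")
    return llm(f"draft general reply: {ticket}")

# Agent: the model decides the next step every turn, so control flow is open-ended.
def agent(ticket: str, max_turns: int = 5) -> list[str]:
    trace = []
    for _ in range(max_turns):
        decision = llm(f"next step for: {ticket}")
        trace.append(decision)
        if decision == "stop":
            break
    return trace

print(workflow("double charge"))
print(agent("double charge"))
```

The workflow version is testable branch by branch; the agent version can only be bounded (`max_turns`) and observed. That asymmetry is the catalogue's argument for reaching for the agent loop last.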
Research worth your time
1 min
Researchers from Harvard, Stanford, MIT, and Carnegie Mellon have proposed precision-aware scaling laws covering both training and inference, arguing that lower precisions earlier in the pipeline can be cheaper than the field has assumed. If the law generalises, the right precision floor for a given budget moves down a step. Worth tracking if you set inference budgets or train your own models.
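The core move, as we read the paper, is to fold precision into a Chinchilla-style loss via an effective parameter count, roughly N_eff = N · (1 − exp(−P/γ)) for P-bit weights, where γ is a fitted constant. A toy sketch of that term; the γ value here is illustrative, not the paper's fit:

```python
import math

def effective_params(n_params: float, precision_bits: float, gamma: float = 3.0) -> float:
    """Effective parameter count under low-precision weights, per our reading
    of the paper's proposed form: N_eff = N * (1 - exp(-P / gamma)).
    gamma is a fitted constant; 3.0 is illustrative, not the paper's value."""
    return n_params * (1.0 - math.exp(-precision_bits / gamma))

# The penalty fades quickly with precision: 16-bit weights are nearly "full" N,
# while 4-bit weights behave like a noticeably smaller model.
for bits in (4, 8, 16):
    print(f"{bits}-bit -> {effective_params(1.0, bits):.3f} of nominal N")
```

The budget implication is direct: if a quantised model behaves like a smaller one, the cheapest way to hit a target loss may be more low-precision parameters rather than fewer high-precision ones.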
Tools to try
1 min
DeepSeek has released V3, a 671B-parameter Mixture-of-Experts model, with open weights, a detailed technical report, and reproducible training cost figures. Open weights at this capability tier change the build-versus-buy maths for teams that can host inference. The training cost numbers are also a useful reference point for planning.
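The mechanism that makes a 671B-parameter model affordable to serve is sparse routing: each token activates only a few experts. A toy sketch of top-k gating, unrelated to DeepSeek's actual routing or kernels:

```python
import math

def topk_route(logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their gates.
    Only these experts run for this token; the rest stay idle."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

def moe_forward(x: float, router_logits: list[float], experts, k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs for one token."""
    return sum(gate * experts[i](x) for i, gate in topk_route(router_logits, k))

# Four toy experts; in a real model the router logits come from a learned layer.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
y = moe_forward(3.0, [0.1, 2.0, -1.0, 0.5], experts)
print(round(y, 3))
```

Per-token compute scales with k, not with the total expert count, which is why total parameters and serving cost decouple at this tier.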
LangChain has continued investing in LangGraph as the durable-state layer for multi-agent systems, with new patterns for human-approval gates and persistence across sessions. If you have hit the limits of a single agent loop, the question is which framework's state model survives a real production deployment. LangGraph is one of the few credible answers.
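The pattern underneath a human-approval gate is an agent loop that checkpoints its state and pauses until a human resumes it. A framework-free sketch of that pattern; nothing here is LangGraph's API:

```python
import json

def run_until_gate(state: dict) -> dict:
    """Advance the plan until a step needs human sign-off, then persist
    state and stop. A real system would store this durably, not in memory."""
    for step in state["plan"][state["cursor"]:]:
        if step["needs_approval"] and not step.get("approved"):
            state["status"] = "waiting_for_human"
            return state                      # pause here; resume later
        state["log"].append(f"ran {step['name']}")
        state["cursor"] += 1
    state["status"] = "done"
    return state

state = {
    "plan": [
        {"name": "draft_email", "needs_approval": False},
        {"name": "send_email", "needs_approval": True},
    ],
    "cursor": 0,
    "log": [],
}
state = run_until_gate(state)
saved = json.dumps(state)                     # checkpoint survives a restart

# Human approves out-of-band; the loop resumes from the checkpoint.
state = json.loads(saved)
state["plan"][1]["approved"] = True
state = run_until_gate(state)
print(state["status"], state["log"])
```

This is the "state model" question in miniature: whether a framework can serialise mid-run state, hold it for hours, and resume exactly where it stopped is what separates demo loops from production deployments.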
Open-source picks
1 min
vLLM has continued to ship support for new model architectures, FP8 inference paths, and improved scheduling, keeping pace with the wave of recent open-weight model releases. If you self-host inference, vLLM remains the pragmatic default for high-throughput serving. Worth a re-evaluation if your last benchmark was more than two months ago.
OpenAI has open-sourced an official Python agents SDK with primitives for multi-agent handoffs, guardrails, and tracing, deliberately small enough to read in one sitting. Having a first-party reference implementation reduces the framework-shopping tax for teams that already standardise on OpenAI APIs. Worth comparing against LangGraph and your in-house rig.
Set in Space Grotesk and Source Serif 4. Compiled in London.