Frontier models at 10× less cost
New Coding Agents: Cursor Composer 2.5, Qwen3.7-Max, Grok Build Goes Wide
Cursor Composer 2.5 launched on Moonshot's Kimi K2.5, matching Claude Opus 4.7 and GPT-5.5 on Terminal-Bench 2.0 (69.3%) and edging past GPT-5.5 on SWE-Bench Multilingual at 10× lower cost ($0.50/$2.50 per million tokens vs. $5–$30/M for frontier rivals); Cursor has hit $3B ARR with SpaceX's $60B acquisition expected ~30 days after the SpaceX IPO this summer.
Alibaba's Qwen3.7-Max — a closed-source, API-only model — can run for 35 continuous hours of autonomous execution, executing 1,158 tool calls and 432 kernel evaluations in a single task with a 10× speedup; it natively speaks the Anthropic API protocol for cross-harness compatibility, scored 76.4 on MCP-Atlas (Coding Agent), and is priced at $2.50/$7.50 per million tokens.
Grok Build opened to all SuperGrok and X Premium Plus subscribers on May 25 — beyond the $300/month SuperGrok Heavy early-access tier launched May 15; features include plan-before-execute mode, parallel subagents in git worktrees, AGENTS.md/plugins/hooks/skills/MCP server support, headless mode for CI/CD, and ACP support; xAI simultaneously shipped a Grok + OpenCode integration powering the open-source OpenCode terminal agent. (Previously: Grok Build missed six consecutive "next week" deadlines — it is now broadly live.)
DeepSeek Reasonix (MIT, May 24–25) is a terminal agent engineered around DeepSeek V4's prefix caching; one user processed 435M input tokens in a day at a 99.82% cache hit rate for ~$12 vs. ~$61 without caching; features include plan mode, MCP support, and a vision-based self-testing loop where the agent screenshots its own output to verify UI correctness.
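The caching arithmetic in the Reasonix item can be sketched as a blended input price. The per-million prices below are not from the source and are not published DeepSeek rates; they are assumptions back-solved so that the totals land near the reported ~$12 and ~$61.

```python
# Hypothetical prices (USD per million input tokens), back-solved from the
# reported totals. They are NOT published DeepSeek V4 rates.
MISS_PRICE_PER_M = 0.14    # assumed price for uncached input tokens
HIT_PRICE_PER_M = 0.0274   # assumed price for cache-hit input tokens


def input_cost(tokens_m: float, hit_rate: float) -> float:
    """Blended input cost in USD for `tokens_m` million input tokens."""
    hit_tokens = tokens_m * hit_rate
    miss_tokens = tokens_m * (1.0 - hit_rate)
    return hit_tokens * HIT_PRICE_PER_M + miss_tokens * MISS_PRICE_PER_M


with_cache = input_cost(435, 0.9982)   # figures reported in the item
without_cache = input_cost(435, 0.0)   # same volume, no cache hits
print(f"with cache: ${with_cache:.2f}, without: ${without_cache:.2f}")
```

Under these assumed prices, almost all of the cached bill comes from the discounted hit tokens, so the saving depends on how large the cache-hit discount is as well as on the hit rate.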
Anthropic source strings now reference "claude-mythos-1-preview" for Claude Code and Claude Security integration; some users briefly saw "Mythos 1" appear in the UI; Claude Opus 4.8 is in partner evaluation and expected within weeks. Glasswing's update stated: "once we've developed the far stronger safeguards we need, we look forward to making Mythos-class models available through a general release."
Anthropic's First Profitable Quarter in Sight; BMS Deploys to 30,000
Anthropic generated $4.8B in revenue in Q1 2026 and is targeting $10.9B in Q2, more than all of 2025's revenue in a single quarter; hitting that target would make Q2 its first profitable quarter; both Anthropic and OpenAI are now eyeing 2026 IPOs.
Bristol Myers Squibb deployed Claude Enterprise to 30,000 employees — positioning Claude as its unified intelligence platform across drug discovery, clinical development, regulatory submissions, manufacturing, and commercial workflows; Anthropic's top-50 customer base is now 40% financial and life-sciences institutions.
Exa raised $250M Series C at a $2.2B valuation (a16z led, May 20) — the company builds the search infrastructure layer powering web retrieval inside Cursor, Cognition (Devin), and other agent platforms, targeting hundreds of thousands of agent searches per second; the raise was announced the morning after Google I/O declared its own search box obsolete.
Salesforce CEO Benioff confirmed zero new software engineers hired in 2026 — AI productivity is absorbing headcount growth rather than generating layoffs; Gartner simultaneously warned that falling token unit costs will not reduce aggregate AI spend as agentic models require far more tokens per task: "Chief Product Officers should not confuse the deflation of commodity tokens with the democratization of frontier reasoning." Goldman Sachs projects a 24× increase in token consumption by 2030 driven entirely by agents.
Google Unifies Five Coding Tools into Antigravity by June 18
Google is consolidating its coding tools, including Gemini CLI, Gemini Code Assist, and AI Studio, into a single Antigravity 2.0 platform; June 18 is the migration deadline for free and Pro users, with a longer Enterprise window; a new Antigravity desktop app and CLI replace all prior tools. (Previously: Antigravity 2.0 shipped at Google I/O; the forced sunset of legacy tools and the hard deadline are new.)
Google simultaneously cut its top AI tier from $250 to $200/month and added a new $100/month AI Ultra plan with higher usage limits — a direct response to pricing pressure from Anthropic's Max plan and OpenAI's $100/month team offering, compressing the premium tier range across all three major labs.
Supply Chain Emergency: A Worm Inside GitHub, 495 Malicious Models
TeamPCP (UNC6780) breached GitHub via the Nx Console VS Code extension (2.2M installs), which was compromised for ~18 minutes, exposing ~3,800 internal GitHub repositories containing infrastructure configs and staging credentials; the group's Mini Shai-Hulud self-replicating worm simultaneously hit the @antv npm ecosystem (639 malicious versions across 323 packages), a GitHub Actions workflow, and Microsoft's Python durabletask client; Sigstore certificates were exploited to make compromised packages appear legitimate; TeamPCP claims Claude was used to write malware components.
JFrog's 2026 Software Supply Chain Report finds npm malicious package activity surged 451% in 2025 and 495 malicious AI models were detected on public registries (Hugging Face); 53% of organizations pull AI models directly from public registries; 97% of enterprises claim certified AI governance but only ~19% have active oversight; attackers have pivoted to IDE extensions as the primary breach vector, the exact channel TeamPCP exploited.
1Password's Environments MCP for Codex provisions just-in-time credentials injected at runtime into authorized processes only — values never appear in code, prompts, terminals, or model context; Codex can be instructed to store credentials it creates back to the 1Password vault; CTO Nancy Wang: "A credential that persists is already compromised."
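The runtime-injection idea in the 1Password item can be sketched with the standard library alone. This is not the 1Password or Codex API: `fetch_secret` is a hypothetical stand-in for a vault lookup, and `DEMO_DB_TOKEN` is an invented variable name. The sketch only shows a secret placed in one child process's environment, not in the parent's environment, the source code, or the terminal output.

```python
import os
import subprocess
import sys


def fetch_secret(name: str) -> str:
    """Hypothetical stand-in for a vault lookup; returns a dummy value."""
    return f"dummy-value-for-{name}"


def run_with_secret(cmd: list[str], env_name: str) -> subprocess.CompletedProcess:
    # Copy the parent environment and add the secret to the child's copy only.
    child_env = dict(os.environ)
    child_env[env_name] = fetch_secret(env_name)
    return subprocess.run(
        cmd, env=child_env, capture_output=True, text=True, check=True
    )


# The child reports only the length of the secret, never the value itself.
result = run_with_secret(
    [sys.executable, "-c", "import os; print(len(os.environ['DEMO_DB_TOKEN']))"],
    "DEMO_DB_TOKEN",
)
child_saw_length = int(result.stdout.strip())
parent_has_secret = "DEMO_DB_TOKEN" in os.environ
```

Because the value lives only in the child's environment, it disappears when that process exits, which is the property the "just-in-time" framing relies on.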
Nudge Security launched browser-extension-based shadow AI agent discovery (May 27) — covering agents in Atlassian Rovo, ChatGPT Workspace Agents, Cursor Automations, Retool Agents, and Zapier Agents without relying on platform APIs that many agent-building tools don't yet expose; each discovered agent maps to its human creator with risk enrichment covering hardcoded credentials and unauthenticated connections.
Anthropic's Containment Architecture: Approval Fatigue and the MitM Proxy Fix
Anthropic published "How We Contain Claude", describing three production containment architectures: Ephemeral containers (claude.ai code execution via gVisor, per-session filesystem); Human-in-the-loop sandbox (Claude Code; key failure: approval fatigue caused users to approve sensitive actions without scrutiny, mitigated by OS-level Seatbelt/bubblewrap sandbox); and Local VM (Cowork; key failure: traffic to api.anthropic.com was permitted, enabling exfiltration via compromised API keys, fixed with a man-in-the-middle proxy that inspects all outbound calls).
The stated design principle is "Containment first, model steering second": environmental boundaries are deterministic, while probabilistic model defenses can fail; Anthropic flags persistent memory poisoning, multi-agent trust escalation, and agent identity as emerging challenges not yet addressed by any current architecture.
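The Cowork failure mode above (an allowed host used with someone else's API key) can be illustrated with a small policy check. This is a sketch under stated assumptions, not Anthropic's proxy: the `OutboundRequest` type, the `allow` function, and the rule "the key must match the one provisioned for this session" are all hypothetical. Only the host name comes from the item, and `x-api-key` is the header the Anthropic API uses for authentication.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.anthropic.com"}  # host named in the item above


@dataclass
class OutboundRequest:
    url: str
    headers: dict[str, str]  # header names assumed to be lowercased already


def allow(request: OutboundRequest, session_key: str) -> bool:
    """Hypothetical egress policy: host allowlist plus a credential check."""
    host = urlsplit(request.url).hostname
    if host not in ALLOWED_HOSTS:
        return False
    # A host allowlist alone would pass any request to an allowed host.
    # Assumption: the proxy also requires the session's own provisioned key.
    return request.headers.get("x-api-key") == session_key


url = "https://api.anthropic.com/v1/messages"
legit = allow(OutboundRequest(url, {"x-api-key": "session-key"}), "session-key")
foreign_key = allow(OutboundRequest(url, {"x-api-key": "attacker-key"}), "session-key")
other_host = allow(
    OutboundRequest("https://example.com/upload", {"x-api-key": "session-key"}),
    "session-key",
)
```

The second case is the one a pure host allowlist misses: the destination is permitted, but the credential is not the session's own, so the check is deterministic and does not depend on model behavior.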