Frontier models at 10× less cost

New Coding Agents: Cursor Composer 2.5, Qwen3.7-Max, Grok Build Goes Wide

  • Cursor Composer 2.5 launched on Moonshot's Kimi K2.5, matching Claude Opus 4.7 and GPT-5.5 on Terminal-Bench 2.0 (69.3%) and edging past GPT-5.5 on SWE-Bench Multilingual at 10× lower cost ($0.50/$2.50 per million tokens vs. $5–$30/M for frontier rivals); Cursor has hit $3B ARR with SpaceX's $60B acquisition expected ~30 days after the SpaceX IPO this summer.

  • Alibaba's Qwen3.7-Max — a closed-source, API-only model — can run for 35 continuous hours of autonomous execution, executing 1,158 tool calls and 432 kernel evaluations in a single task with a 10× speedup; it natively speaks the Anthropic API protocol for cross-harness compatibility, scored 76.4 on MCP-Atlas (Coding Agent), and is priced at $2.50/$7.50 per million tokens.

  • Grok Build opened to all SuperGrok and X Premium Plus subscribers on May 25 — beyond the $300/month SuperGrok Heavy early-access tier launched May 15; features include plan-before-execute mode, parallel subagents in git worktrees, AGENTS.md/plugins/hooks/skills/MCP server support, headless mode for CI/CD, and ACP support; xAI simultaneously shipped a Grok + OpenCode integration powering the open-source OpenCode terminal agent. (Previously: Grok Build missed six consecutive "next week" deadlines — it is now broadly live.)

  • DeepSeek Reasonix (MIT, May 24–25) — a terminal agent engineered around DeepSeek V4's prefix caching — one user processed 435M input tokens in a day at a 99.82% cache hit rate for ~$12 vs. ~$61 without caching; features include plan mode, MCP support, and a vision-based self-testing loop where the agent screenshots its own output to verify UI correctness.

  • Anthropic source strings now reference "claude-mythos-1-preview" for Claude Code and Claude Security integration; some users briefly saw "Mythos 1" appear in the UI; Claude Opus 4.8 is in partner evaluation and expected within weeks. Glasswing's update stated: "once we've developed the far stronger safeguards we need, we look forward to making Mythos-class models available through a general release."


Anthropic's First Profitable Quarter in Sight; BMS Deploys to 30,000


Google Unifies Five Coding Tools into Antigravity by June 18

  • Google is consolidating Gemini CLI, Gemini Code Assist, and AI Studio into a single Antigravity 2.0 platformJune 18 is the migration deadline for free and Pro users, with a longer Enterprise window; new Antigravity desktop app and CLI replace all prior tools. (Previously: Antigravity 2.0 shipped at Google I/O — the forced sunset of legacy tools and the hard deadline are new.)

  • Google simultaneously cut its top AI tier from $250 to $200/month and added a new $100/month AI Ultra plan with higher usage limits — a direct response to pricing pressure from Anthropic's Max plan and OpenAI's $100/month team offering, compressing the premium tier range across all three major labs.


Supply Chain Emergency: A Worm Inside GitHub, 495 Malicious Models

  • TeamPCP (UNC6780) breached GitHub via the Nx Console VS Code extension (2.2M installs) — compromised for ~18 minutes, exposing ~3,800 internal GitHub repositories containing infrastructure configs and staging credentials; the group's Mini Shai-Hulud self-replicating worm simultaneously hit the @antv npm ecosystem (639 malicious versions across 323 packages), a GitHub Actions workflow, and Microsoft's Python durabletask client — Sigstore certificates were exploited to make compromised packages appear legitimate; TeamPCP claims Claude was used to write malware components.

  • JFrog's 2026 Software Supply Chain Report finds npm malicious package activity surged 451% in 2025 and 495 malicious AI models were detected on public registries (Hugging Face); 53% of organizations pull AI models directly from public registries; 97% of enterprises claim certified AI governance but only ~19% have active oversight; attackers have pivoted to IDE extensions as the primary breach vector — the exact channel TeamPCP exploited.

  • 1Password's Environments MCP for Codex provisions just-in-time credentials injected at runtime into authorized processes only — values never appear in code, prompts, terminals, or model context; Codex can be instructed to store credentials it creates back to the 1Password vault; CTO Nancy Wang: "A credential that persists is already compromised."

  • Nudge Security launched browser-extension-based shadow AI agent discovery (May 27) — covering agents in Atlassian Rovo, ChatGPT Workspace Agents, Cursor Automations, Retool Agents, and Zapier Agents without relying on platform APIs that many agent-building tools don't yet expose; each discovered agent maps to its human creator with risk enrichment covering hardcoded credentials and unauthenticated connections.


Anthropic's Containment Architecture: Approval Fatigue and the MitM Proxy Fix

  • Anthropic published "How We Contain Claude" — three production containment architectures: Ephemeral containers (claude.ai code execution via gVisor, per-session filesystem); Human-in-the-loop sandbox (Claude Code — key failure: approval fatigue caused users to approve sensitive actions without scrutiny, mitigated by OS-level Seatbelt/bubblewrap sandbox); and Local VM (Cowork — key failure: traffic to api.anthropic.com was permitted, enabling exfiltration via compromised API keys, fixed with a man-in-the-middle proxy that inspects all outbound calls).

  • The stated design principle: "Containment first, model steering second" — environmental boundaries are deterministic while probabilistic model defenses can fail; Anthropic flags persistent memory poisoning, multi-agent trust escalation, and agent identity as emerging challenges not yet addressed by any current architecture.

Get Agentic software development in your inbox

Subscribe to receive new issues as they're published.