DiffusionGemma 4x faster, dynamic Claude Code workflows in GA, Grok Voice #1 EVA-Bench

Article translated from fr to en with gpt-5.4-mini.

June 10, 2026 marks a packed day: Google DeepMind launches DiffusionGemma, a new diffusion-based text generation architecture reaching 1,000 tokens per second on H100, immediately optimized by NVIDIA for local hardware. On the developer tooling side, Anthropic moves Claude Code dynamic workflows into general availability with agent recursion up to 5 levels, and xAI positions Grok Voice Think Fast 1.0 as number one on the EVA-Bench benchmark. GitHub, OpenAI, Perplexity, and Cohere round out a day rich in announcements.


DiffusionGemma: parallel generation of 256-token blocks, 4x faster on GPU

June 10 — Google DeepMind launches DiffusionGemma, an experimental open model with 26 billion parameters (Mixture of Experts architecture) released under the Apache 2.0 license. Its distinguishing feature: instead of generating one token after another like any classic autoregressive model, it generates whole blocks of 256 tokens simultaneously by applying the same iterative denoising principle used by image diffusion models.
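To make the contrast with token-by-token decoding concrete, here is a toy sketch of block-wise iterative unmasking, one common way discrete text diffusion is decoded. The announcement does not describe DiffusionGemma's actual sampler, so everything here is an assumption for illustration: `toy_denoiser` is a hypothetical stand-in for the model, and the choice of 8 passes per block is arbitrary.

```python
import random

MASK = None  # marks a position that has not been decoded yet


def toy_denoiser(block, rng):
    """Hypothetical stand-in for the model: propose a (token, confidence)
    pair for every masked position in one parallel pass."""
    return {i: (f"tok{i}", rng.random()) for i, t in enumerate(block) if t is MASK}


def decode_block(block_size=256, steps=8, seed=0):
    """Fill one block in `steps` parallel passes instead of `block_size`
    sequential ones. Both numbers are illustrative assumptions."""
    rng = random.Random(seed)
    block = [MASK] * block_size
    per_step = -(-block_size // steps)  # ceiling division
    passes = 0
    while any(t is MASK for t in block):
        proposals = toy_denoiser(block, rng)
        passes += 1
        # Commit the most confident proposals; the rest stay masked
        # and are re-proposed on the next pass.
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:per_step]
        for i in best:
            block[i] = proposals[i][0]
    return block, passes


block, passes = decode_block()
print(passes, "passes for", len(block), "tokens")  # 8 passes for 256 tokens
```

An autoregressive decoder would need 256 sequential model calls for the same block; in this toy, the number of passes is set by `steps` rather than by the block length.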

Result: up to 4x faster on a dedicated GPU. The model activates only 3.8 billion parameters during inference and fits in 18 GB of VRAM once quantized, which puts it within reach of high-end consumer GPUs. Bidirectional attention opens up uses that are difficult for autoregressive models: inline editing, code completion, amino acid sequences, mathematical graphs.
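As a rough sanity check on the VRAM figure, the sketch below estimates the memory needed for the weights alone at a few precisions. It assumes all 26 billion parameters must stay resident (typical for Mixture of Experts models, even though only 3.8 billion are active per step) and ignores activations and caches. The source does not state which quantization scheme yields 18 GB, so the bit widths are assumptions.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 26e9  # total parameter count from the announcement

for bits in (16, 8, 4):  # assumed precisions, not taken from the source
    print(f"{bits}-bit: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
# 16-bit: 52.0 GB
# 8-bit: 26.0 GB
# 4-bit: 13.0 GB
```

Under these assumptions, the quoted 18 GB sits between the 4-bit and 8-bit estimates, which would be consistent with a low-bit quantization plus runtime overhead.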

NVIDIA immediately optimized DiffusionGemma for its GPUs by leveraging Tensor Cores, where autoregressive architectures are constrained by memory bandwidth. Measured performance on different hardware:

| Hardware | Performance |
| --- | --- |
| NVIDIA H100 (server) | 1,000 tokens/s |
| NVIDIA DGX Station | up to 800 tokens/s |
| NVIDIA DGX Spark (local) | 150 tokens/s |
| GeForce RTX 5090 (quantized) | 700+ tokens/s |
| GeForce RTX 4090 (quantized) | llama.cpp support coming soon |

Weights are available on Hugging Face with immediate support in HF Transformers, vLLM, and Unsloth. The model is also free to try on build.nvidia.com.

Important: Google explicitly states that output quality remains below that of standard Gemma 4 models. DiffusionGemma targets developers exploring interactive local workflows — rapid iteration, inline editing — not production.

“DiffusionGemma is our new experimental open model with up to 4x faster output on dedicated GPUs. Instead of predicting word-by-word, it generates entire blocks of text simultaneously.” — @GoogleDeepMind on X

🔗 Google DeepMind announcement · 🔗 NVIDIA optimization


Claude Code v2.1.172: dynamic workflows in general availability, recursive sub-agents up to 5 levels

June 10 — Anthropic announces that Claude Code’s dynamic workflows are moving into general availability. Introduced in research preview on May 28, they allow Claude to design its own orchestration and launch dozens to hundreds of sub-agents in parallel to handle complex end-to-end tasks.

The v2.1.172 CLI version, released the same day, introduces the key associated capability: sub-agents can now create their own sub-agents, up to 5 levels of nesting. This is the technical foundation that makes dynamic workflows operational at scale.

Main use cases:

  • Bug hunting across an entire repository, security audits
  • Code migrations touching thousands of files (e.g., Bun’s Zig→Rust port in 11 days)
  • Adversarial verification of a result before delivery

Availability and conditions:

| Item | Detail |
| --- | --- |
| Plans | Max, Team, Enterprise (if enabled by the admin), Claude API |
| Cloud platforms | Amazon Bedrock, Vertex AI, Microsoft Foundry |
| Activation | Command Create a workflow or parameter ultracode (effort xhigh) |
| Sub-agent depth | Up to 5 levels |
| CLI version | v2.1.172 |

Note: dynamic workflows consume significantly more tokens than a standard Claude Code session. Claude Code displays a confirmation before the first launch. Enterprise admins can disable the feature via managed settings.

Other changes in v2.1.172: a fix for sessions getting permanently stuck with 1M context without credits; a search bar in the /plugin browser; Amazon Bedrock now reads the AWS region from ~/.aws if AWS_REGION is not set; and numerous stability fixes for background agents.

🔗 @claudeai announcement · 🔗 Dynamic Workflows blog · 🔗 CHANGELOG


Grok Voice Think Fast 1.0 — number one on EVA-Bench

June 10 — xAI announces Grok Voice Think Fast 1.0, its voice model positioned on the Pareto frontier of ServiceNow AI Research’s EVA-Bench ranking. The Pareto frontier means that no other system in the evaluation simultaneously surpasses its accuracy and user experience quality.
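The Pareto definition above can be sketched in a few lines. The system names and scores below are invented for illustration and are not EVA-Bench data; the two axes stand in for accuracy and user experience quality.

```python
def pareto_frontier(systems):
    """Return the names of systems that no other system dominates.

    `systems` maps a name to an (accuracy, experience) pair. A system is
    dominated when another one is at least as good on both axes and
    strictly better on at least one.
    """
    frontier = []
    for name, (acc, ux) in systems.items():
        dominated = any(
            a >= acc and u >= ux and (a > acc or u > ux)
            for other, (a, u) in systems.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical scores, for illustration only.
scores = {
    "A": (0.90, 0.70),
    "B": (0.80, 0.85),
    "C": (0.75, 0.60),
    "D": (0.85, 0.80),
}
print(pareto_frontier(scores))  # ['A', 'B', 'D']
```

In this made-up example the frontier holds three systems at once, each representing a different trade-off between the two axes; only "C" is beaten on both.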

xAI highlights three characteristics: natural timing, context-appropriate intonation, and a perceived warmth similar to a human’s. The model is available via the xAI voice API at x.ai/api/voice, at a price presented as significantly lower than competitors’.

| Metric | Value |
| --- | --- |
| EVA-Bench ranking | Pareto frontier (number 1) |
| Availability | x.ai/api/voice API |
| Pricing position | Fraction of competitors’ price (according to xAI) |

“Grok Voice offers state-of-the-art performance with human-like timing, tone, and warmth. And it’s a fraction the price of competitors.” — @xai on X

🔗 EVA-Bench results


NVIDIA Confidential Computing in Apple Private Cloud Compute (WWDC 2026)

June 9 — Announced at WWDC 2026, this Apple–NVIDIA–Google tripartite integration marks a major step for cloud AI privacy. NVIDIA Blackwell GPUs with Confidential Computing are now integrated into Apple’s Private Cloud Compute (PCC) infrastructure, which extends beyond Apple data centers to Google Cloud.

The goal: process Apple Intelligence requests on the server side with an absolute cryptographic privacy guarantee — no one, not even the system builders, can access users’ data, conversations, or chats.

Protection mechanisms:

  • Hardware-rooted trust: verification that the infrastructure has not been tampered with
  • Encrypted communication paths between components
  • Remote attestation: the software verifies the platform’s security state before any sensitive data transfer
  • Support for accelerated inference without compromising GPU performance
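The remote attestation step in the list above can be pictured with a deliberately simplified sketch: the client releases data only if the platform reports a software measurement it already trusts. This is not Apple's or NVIDIA's protocol. Real attestation relies on hardware-signed evidence and signature verification, which are omitted here, and the names `measure`, `TRUSTED_MEASUREMENTS`, and `send_if_attested` are hypothetical.

```python
import hashlib
import hmac


def measure(image: bytes) -> str:
    """Measurement of a software image, here simply its SHA-256 digest."""
    return hashlib.sha256(image).hexdigest()


# Hypothetical allowlist of measurements the client is willing to trust.
TRUSTED_MEASUREMENTS = {measure(b"approved-inference-stack-v1")}


def send_if_attested(reported_measurement: str, payload: bytes, send) -> bool:
    """Call `send(payload)` only when the reported measurement is trusted.

    A real verifier would first check a hardware-rooted signature over the
    measurement; that step is left out of this sketch.
    """
    trusted = any(
        hmac.compare_digest(reported_measurement, known)
        for known in TRUSTED_MEASUREMENTS
    )
    if not trusted:
        return False  # refuse to transfer sensitive data
    send(payload)
    return True


sent = []
print(send_if_attested(measure(b"approved-inference-stack-v1"), b"request", sent.append))
print(send_if_attested(measure(b"tampered-stack"), b"request", sent.append))
```

The first call releases the payload and the second is refused, so `sent` ends up holding a single request.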

This architecture allows Apple to extend Apple Intelligence to Google Cloud while maintaining its privacy commitments, a rare combination in the industry. For NVIDIA, it marks large-scale adoption of Blackwell Confidential Computing in a consumer deployment.

🔗 NVIDIA blog


Anthropic: scheduled agents, secret vaults, and regulatory framework

Claude Managed Agents — scheduled deployments and variable vaults

June 9 — Two new features are arriving in public beta in Claude Managed Agents, announced at Code with Claude Tokyo:

Scheduled deployments: agents can now run automatically on a schedule, without manual intervention — daily reports, periodic checks, regular data pipelines.

Variables in vaults: agents access their secrets and configurations through a managed vault, without exposing keys in code or session configurations.

| Feature | Status |
| --- | --- |
| Scheduled deployments | Public beta |
| Variables in vaults | Public beta |
| Platform | Claude Managed Agents |

🔗 What’s new in Claude Managed Agents

Policy on the AI Exponential — Anthropic’s regulatory framework

June 10 — Anthropic publishes Policy on the AI Exponential, a public policy framework accompanied by an essay by Dario Amodei. The finding: AI capabilities are advancing at an exponential rate that the legislative process was not designed to keep pace with.

The document targets models trained with more than 10²⁵ floating-point operations (FLOP), developed by companies generating more than USD 500 million in AI-related revenue or spending more than USD 1 billion on AI R&D. It identifies four categories of catastrophic risk: biological risk, cyber risk, loss of control over AI systems, and automation of AI R&D itself.
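Read literally, the scope described above combines a compute threshold with either of two company-size thresholds. The sketch below encodes that reading; the exact logic in Anthropic's document may differ, and the function and parameter names are hypothetical.

```python
COMPUTE_THRESHOLD_FLOP = 1e25   # training compute threshold from the text
REVENUE_THRESHOLD_USD = 500e6   # AI-related revenue threshold
RD_SPEND_THRESHOLD_USD = 1e9    # AI R&D spending threshold


def in_scope(training_flop, ai_revenue_usd, ai_rd_spend_usd):
    """True when a model falls under the framework, under this article's
    reading: a large training run AND a large developer (by revenue OR
    by R&D spending)."""
    large_model = training_flop > COMPUTE_THRESHOLD_FLOP
    large_developer = (
        ai_revenue_usd > REVENUE_THRESHOLD_USD
        or ai_rd_spend_usd > RD_SPEND_THRESHOLD_USD
    )
    return large_model and large_developer


# Hypothetical developers, for illustration only.
print(in_scope(3e25, 2e9, 0))      # True: large run, large revenue
print(in_scope(3e25, 10e6, 50e6))  # False: large run, small developer
print(in_scope(5e23, 2e9, 3e9))    # False: run below the compute threshold
```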

| Proposed obligation | Description |
| --- | --- |
| Transparency | Mandatory tests, publication of results |
| Independent evaluation | At least one qualified external evaluator |
| Security | Protection of weights against state actors |
| Government authority | Power to block or delay dangerous models |

“AI is advancing at a pace our policymaking institutions were never built for—and the gap between the two is becoming the central challenge of the technology.” — @AnthropicAI on X

🔗 Policy on the AI Exponential


GitHub Copilot: app open to everyone, visible agent sessions, and CLI security review

Copilot App — open technical preview with no waitlist

June 10 — The GitHub Copilot app technical preview is now available to all Copilot Pro, Pro+, Max, Business, and Enterprise subscribers, with no waitlist. This agent-focused desktop app centralizes agent session management, pull request creation, and development task orchestration, from ticket to PR in one place.

🔗 @github announcement

Copilot Chat now shows agent sessions

June 10 — GitHub improves the transition between Copilot Chat and the cloud agent. Two new tools are available in Copilot Chat: Get agent logs (logs from an agent session on a pull request, directly queryable in the conversation) and Session search (search and summarize past sessions by topic, title, or date). The status of an ongoing session is now reflected in real time in chat.

🔗 Changelog

Copilot CLI — /security-review command (experimental public preview)

June 10 — A new /security-review slash command is available in experimental public preview in GitHub Copilot CLI. It analyzes local code changes directly from the terminal: injections (SQL, commands), XSS, insecure data handling, path traversal, weak cryptography. Results are scored by severity and confidence, with suggestions that can be applied without leaving the terminal. The command is independent of GitHub code scanning and Dependabot — it complements them with lightweight on-demand analysis.

🔗 Changelog

Manus — Zoom Connector

June 9 — Manus launches the Zoom connector, allowing the agent to automatically analyze meeting content accessible from the connected account: summaries, transcripts, recordings, notes, agendas, whiteboards, participant information. Three main use cases: on-demand meeting analysis, automatic recurring review with reporting in Slack or email, and trend analysis across multiple meetings. Limitation: Manus only accesses resources that the connected Zoom account is authorized to see.

🔗 Manus Blog


xAI and Kimi: partnerships and swarm agents

Grok + eToro — Tori agent powered by real-time X data

June 10 — xAI and eToro announce that Tori, eToro’s AI agent (40 million users in 75 countries), now integrates xAI models and real-time data from the X platform to analyze market sentiment. Tori can read sentiment changes in real time, track live signals, and analyze information. The same real-time intelligence is available to all developers via the xAI API console.

🔗 xAI News

Kimi Agent Swarm — Prediction of the 104 matches in the 2026 World Cup

June 9 — Kimi (Moonshot AI) deploys 300 sub-agents in parallel to predict the 104 matches of the 2026 FIFA World Cup. Each agent has its own analytical angle: tactics, player form, historical data, public sentiment, weather, psychology, odds movement. The system uses Elo/FIFA models, Poisson/Dixon-Coles, Monte Carlo simulations, and dynamic Bayesian updating. Identified signal: Germany’s estimated title probability at ~11.3% versus ~7.4% in betting markets.
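To give a feel for the Poisson family of models named above, here is a minimal sketch that turns two expected-goal rates into win, draw, and loss probabilities, then computes the gap between a model probability and a market probability using the Germany figures from the text. It uses independent Poisson goal counts, without the Dixon-Coles low-score correction, and the expected-goal values are invented. It is not Kimi's system.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam**k / factorial(k)


def match_probabilities(lam_a, lam_b, max_goals=10):
    """Win/draw/loss probabilities for team A, assuming independent
    Poisson goal counts truncated at `max_goals` per team."""
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, lam_a) * poisson_pmf(b, lam_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


def edge(model_prob, market_prob):
    """Gap between the model's probability and the market's."""
    return model_prob - market_prob


# Hypothetical expected goals: team A 1.8, team B 1.1.
win, draw, loss = match_probabilities(1.8, 1.1)
print(f"win={win:.3f} draw={draw:.3f} loss={loss:.3f}")

# Title probabilities quoted in the text: ~11.3% (model) vs ~7.4% (market).
print(f"edge={edge(0.113, 0.074):.3f}")  # edge=0.039
```

The "signal" reported in the article corresponds to that positive gap of roughly 3.9 percentage points between the model estimate and the market-implied probability.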

🔗 @Kimi_Moonshot announcement


OpenAI Codex: migration from Claude Code and Ableton showcase

Codex app 26.608 — Migration from Claude Code and plugin redesign

June 9 — The Codex app 26.608 update introduces a Migrate to Codex flow allowing automatic import of configuration from Claude Code and Claude Cowork, including on first app launch. The plugins interface has been completely redesigned with separate tabs, a marketplace with category filters, and improved keyboard navigation. Settings search now extends to Git and visual customizations.

| Feature | Detail |
| --- | --- |
| Claude Code/Cowork migration | Automatic import, including during onboarding |
| Plugins screen | Tabs, marketplace, category filters |
| Settings search | Extended to Git, visual customizations |

🔗 Codex Changelog


Perplexity and Cohere: multi-model orchestration and voice benchmark

Perplexity Computer integrates Claude Fable 5 as orchestrator

June 10 — Perplexity announces the integration of Claude Fable 5 as the orchestrator model in Perplexity Computer, its multi-step agentic interface. This integration is reserved for Pro and Max subscribers.

🔗 @perplexity_ai announcement

Cohere Transcribe ranks #1 on Hugging Face’s Far-Field ASR benchmark

June 10 — Cohere Transcribe, Cohere’s open-source speech recognition model, ranks first on Hugging Face’s new Far-Field ASR benchmark, designed to test robustness in real-world audio environments (meeting rooms, contact centers, phone calls).

| Model | Far-Field ASR WER |
| --- | --- |
| Cohere Transcribe | 17.9 |
| IBM Granite Speech | ~19.8 |
| NVIDIA Parakeet | ~21.5 |

The model remains under the Apache 2.0 license and can run locally. It had already ranked first on the general-purpose OpenASR leaderboard in March 2026.
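For readers unfamiliar with the metric in the table, word error rate (WER) is conventionally the word-level edit distance between the reference transcript and the model output, divided by the number of reference words, so lower is better. The sketch below shows that standard definition; the benchmark's exact text normalization is not described in the source, and the example sentences are invented.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) divided
    by the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substitution ("the" -> "a") and one deletion ("room") over 6 words.
score = wer("please book the large meeting room", "please book a large meeting")
print(f"{100 * score:.1f}")  # 33.3
```

The table values read as percentages on this scale, assuming the benchmark follows the usual convention.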

🔗 Announcement @cohere


Gemini App: new features for small businesses

June 10 — At the Google for Brazil event in São Paulo, Google announced two Gemini App features aimed at small businesses, with a global rollout planned for June 2026.

Google Business Profile connection: users connect their profile directly in the Gemini app. Once connected, Gemini accesses customer reviews, questions, and performance data to provide personalized recommendations: monthly performance analysis, drafting responses to reviews in the brand’s voice, updating hours and the profile.

Business notebooks: a centralized space where the business organizes its conversations, sources, and Google Business Profile. Gemini uses it as a knowledge base to maintain conversation continuity and provide proactive alerts (an unanswered customer question, missing holiday hours).

🔗 Google Blog


Briefs

  • Gemini outage on June 10 — Product director Josh Woodward reports a service outage at 7:31 p.m., with partial fixes already deployed. 🔗 @joshwoodward
  • GitHub Enterprise — 500 cost centers — The limit of cost centers per enterprise increases from 250 to 500, automatically and with no configuration required. 🔗 Changelog
  • Dependabot supports Deno — Deno version updates are supported via a deno entry in .github/dependabot.yml (security updates not covered for now). 🔗 Changelog
  • npm v12 — breaking changes in July 2026 — Install scripts, Git dependencies, and remote URLs will be blocked by default. Updating to npm 11.16.0+ is recommended to prepare. 🔗 Changelog
  • Alibaba Wan — Fisheye Lens — New tool that transforms standard images into circular ultra-wide-angle fish-eye-style views, added to Wan’s visual skills gallery. 🔗 @Alibaba_Wan
  • Z-Image-Engineer-V6 — Swappable text encoder for Z-Image-Turbo (Tongyi Lab / Alibaba), turning simple prompts into cinematic descriptions. Available on Hugging Face. 🔗 @Ali_TongyiLab
  • Qwen-Image-Edit-2511 + LoRA — New community Hugging Face space for Qwen-Image-Edit-2511 with a versatile LoRA matrix (face swap, poses, virtual try-on, multi-angle rendering). 🔗 @Ali_TongyiLab
  • ChatGPT for iOS 1.2026.153 — New Codex Mobile features: worktrees, /goal.
  • Codex in Ableton Live — @OpenAIDevs highlights musician @sound4movement, who uses Codex to automatically configure Ableton Live from a track description. 🔗 @OpenAIDevs
  • Cohere Labs — AI and the future of work — Publication of a report on the evidence gaps in the debate over AI’s impact on employment, launching a new research direction. 🔗 @cohere

What it means

New inference architectures: is token-by-token coming to an end? DiffusionGemma is the first large-scale public demonstration of an open text diffusion architecture, and NVIDIA’s immediate interest (it optimized the model the very day it launched) suggests that this direction is being taken seriously at the industrial level. The 4x gain on dedicated GPUs matters because it shifts the bottleneck from memory bandwidth (the Achilles’ heel of autoregressive models) to Tensor Core compute. The current limitation (quality below Gemma 4) and the explicit targeting of developers rather than production indicate that this is a research path, not an immediate replacement. Still, Grok Voice’s place on the EVA-Bench Pareto frontier, in a different domain (voice), shows that the race for efficiency is now being run on several fronts in parallel.

Agentic autonomy: from promise to infrastructure. The GA release of Claude Code’s dynamic workflows with 5-level recursion, combined with scheduled deployments and secret vaults in Claude Managed Agents, makes a paradigm shift tangible: agents are no longer one-off tools but persistent, schedulable processes with secure access to secrets. The Kimi Agent Swarm initiative (300 sub-agents over 104 matches) illustrates the same movement on Moonshot AI’s side. And Perplexity Computer integrating Claude Fable 5 as an orchestrator signals that competition in agents is playing out as much at the tooling level as in the models themselves.

Privacy and trust: the Apple–NVIDIA–Google axis. NVIDIA Confidential Computing’s integration into Apple PCC on Google Cloud is structurally significant: it shows that a consumer-scale deployment can combine GPU acceleration, cryptographic privacy guarantees, and third-party cloud infrastructure. This is not a niche: Apple Intelligence reaches hundreds of millions of devices. If this architecture spreads, it could become the de facto norm for AI services handling sensitive personal data.

Developer ecosystem: consolidation and tooling competition. The fact that Codex 26.608 offers a migration flow from Claude Code is telling: it acknowledges that developers have invested in configuring competing tools and that the cost of switching must be lowered. GitHub Copilot, meanwhile, is accelerating its “agent-native” approach (app without a waitlist, agent sessions visible in chat, security review in the CLI). June 10 sketches an ecosystem where differentiation depends less on raw model capability than on the depth of integration into everyday developer workflows.


Sources