Search

Anthropic+xAI compute partnership Colossus 1, Claude M365 GA, GPT-Realtime-2 voice reasoning

ai-powered-markdown-translator

Article translated from fr to en with gpt-5.4-mini.

View project on GitHub โ†—

Anthropic and xAI sign an unprecedented agreement: 220,000 NVIDIA GPUs from the Colossus 1 supercomputer will double Claude Code limits starting this week. Claude for Microsoft 365 goes generally available on Excel, PowerPoint, and Word. OpenAI launches GPT-Realtime-2, the first voice model with GPT-5-level reasoning. Perplexity opens Personal Computer to all Mac users, and ElevenLabs crosses $500 million in ARR with NVIDIA as a strategic investor.


Anthropic leases Colossus 1 from xAI โ€” 220,000 NVIDIA GPUs, Claude Code limits doubled

May 6 โ€” Anthropic simultaneously announces an immediate increase in usage limits and an unprecedented infrastructure deal with SpaceX / xAI.

For users, the most visible change is the doubling of five-hour rate limits in Claude Code, effective immediately on Pro, Max, Team, and Enterprise plans. The automatic peak-time throttling โ€” which restricted Pro and Max plans โ€” is also removed. API limits for Claude Opus models are raised in parallel.

These increases are made possible thanks to an agreement with SpaceX: Anthropic gains access to the entire capacity of Colossus 1, xAIโ€™s supercomputer, meaning more than 300 megawatts and more than 220,000 NVIDIA GPUs (H100, H200, and GB200). This capacity is available within the month. The two companies also announce a shared intention to develop multiple gigawatts of orbital AI compute capacity โ€” a first in the industry.

This partnership adds to an already growing stack of deals: Amazon (up to 5 GW, with nearly 1 GW available by the end of 2026), Google and Broadcom (5 GW starting in 2027), Microsoft and NVIDIA (30billioninAzurecapacity),andFluidstack(30 billion in Azure capacity), and Fluidstack (50 billion in U.S. AI infrastructure). International expansion will include data residency requirements for regulated sectors. Anthropic also commits to covering any increase in local electricity prices for residents caused by its datacenters.

ChangeAffected plansEffective
5h Claude Code limits doubledPro, Max, Team, EnterpriseImmediate
Peak-time throttling removedPro, MaxImmediate
Opus API limits increasedAllImmediate
Compute dealCapacityTimeline
SpaceX / xAI Colossus 1300+ MW, 220,000+ NVIDIA GPUsWithin the month
AmazonUp to 5 GW (~1 GW by end of 2026)2026
Google + Broadcom5 GWStarting in 2027
Microsoft + NVIDIAUSD 30 billion Azureโ€”
FluidstackUSD 50 billion U.S. infrastructureโ€”

๐Ÿ”— Anthropic โ€” Higher limits + SpaceX deal


Claude for Microsoft 365 โ€” general availability on Excel, PowerPoint, Word + Outlook beta

May 7 โ€” Claude for Excel, PowerPoint, and Word move into general availability for all paid plans. Claude for Outlook simultaneously enters public beta under the same conditions.

โ€œClaude for Excel, PowerPoint, and Word are now generally available, and Claude for Outlook is in public beta. As Claude moves between your Microsoft apps, it carries the full context of your conversation.โ€ โ€” @claudeai on X

The core feature is the shared context across the four applications: a conversation started in Outlook to sort an email continues in Word to draft a memo, then in Excel for data analysis, and in PowerPoint for the presentation โ€” without ever having to re-explain the context. Automatic cross-app updating is the other concrete benefit: changing an assumption in an Excel model simultaneously updates the chart in the presentation and the corresponding figure in the Word memo.

Among the companies cited: ServiceNow (โ€œClaude does the work in Excel itself, instead of asking us to move content between toolsโ€) and private asset management teams using it to build and maintain financial coverage models.

ApplicationStatus as of May 7, 2026Plans
Claude for ExcelGeneral availability (GA)All paid plans
Claude for PowerPointGeneral availability (GA)All paid plans
Claude for WordGeneral availability (GA)All paid plans
Claude for OutlookPublic betaAll paid plans

๐Ÿ”— Claude for Microsoft 365 announcement


Claude Managed Agents โ€” dreaming, outcomes, multiagent orchestration, webhooks

May 6 โ€” At the Code with Claude conference, Anthropic launches several new features for its agent deployment platform.

The standout new feature is dreaming: a scheduled process that analyzes an agentโ€™s past sessions, extracts recurring patterns, and consolidates its memory so it improves over time. The developer stays in control โ€” dreaming can update memory automatically or send each change for human review. Dreaming is available in experimental research preview on request.

Outcomes enters public beta: this feature lets each agent result be evaluated against developer-defined criteria before it is delivered to the user. The company Wisedocs used it to speed up medical document review by 50% while maintaining alignment with its internal standards.

Multiagent orchestration lets a lead agent delegate subtasks to specialist agents that run in parallel, making it easier to handle complex work requiring multiple expertise areas at once. Webhooks are also available to trigger external actions.

FeatureAvailabilityDescription
DreamingResearch preview (on request)Self-improvement by analyzing past sessions
OutcomesPublic betaResult evaluation before delivery
Multiagent orchestrationPublic betaLead agent + specialist agents in parallel
WebhooksPublic betaTriggering external actions

๐Ÿ”— Claude Managed Agents announcement


GPT-Realtime-2 โ€” voice with GPT-5 reasoning and 128K context

May 7 โ€” OpenAI launches a new generation of models in the Realtime API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.

GPT-Realtime-2 is the first voice model with GPT-5-level reasoning: it can handle complex requests, call tools in parallel (parallel tool calls), recover from interruptions (recovery behavior), and maintain a 128,000-token context window (vs. 32,000 for its predecessor), suited to long sessions. Five reasoning levels are adjustable: minimal, low, medium, high, xhigh (low by default). Preambles can be inserted before responses for natural flow.

GPT-Realtime-Translate enables live simultaneous translation into 13 target languages from 70+ source languages. GPT-Realtime-Whisper provides low-latency streaming transcription.

Zillow tested GPT-Realtime-2 on its voice interactions: +26 points in success rate on its most difficult adversarial benchmark (95% vs. 69%). EU Data Residency is supported.

ModelCapabilityPrice
GPT-Realtime-2Voice + GPT-5 reasoning, 128K$32/1M audio input tokens, $64/1M output
GPT-Realtime-TranslateTranslation 70โ†’13 languages$0.034/min
GPT-Realtime-WhisperStreaming transcription$0.017/min
BenchmarkGPT-Realtime-1.5GPT-Realtime-2 (high)GPT-Realtime-2 (xhigh)
Big Bench Audiobaseline+15.2%โ€”
Audio MultiChallenge APR36.7%โ€”70.8%

๐Ÿ”— OpenAI announcement โ€” new voice models


Perplexity Personal Computer available to all Mac users

May 7 โ€” Perplexity launches a new macOS app and opens Personal Computer to all users, with no Pro or Max subscription restriction.

The app brings AI out of the cloud and onto the device itself. It operates on local files, native Mac apps, the open web, and secure Perplexity servers. It supports 400+ connectors and integrates with the Comet browser for web tools without direct connectors. Pro and Max plans keep credits tied to the existing subscription; free users also get access.

The recommended setup is the Mac mini as a permanent hub: agent teams can run continuously (24/7) while the user works on something else, with a notification when human approval is needed. Control works from any device โ€” iPhone included.

The old Perplexity Mac app will be removed in the coming weeks. Download is direct (not yet available on the App Store).

DimensionValue
AvailabilityAll Mac users
Recommended deviceMac mini (always on)
Supported connectors400+
Browser integrationComet
App StoreNo (direct download)
Old appRemoval in the coming weeks

๐Ÿ”— Perplexity blog โ€” Personal Computer for everyone


Perplexity Finance Search in the Agent API โ€” #1 accuracy on FinSearchComp T1

May 6 โ€” Perplexity launches Finance Search in the Agent API: a single tool call aggregates licensed financial data, real-time market data, and cited web sources.

The problem it solves is simple: financial decisions depend on reliable, up-to-date, and traceable sources. Finance Search replaces generic web search with structured licensed data โ€” prices, fundamentals, earnings call transcripts, estimates โ€” returned in a consistent schema regardless of the backend provider.

On the FinSearchComp T1 benchmark, Finance Search achieves the highest accuracy for real-time financial data, consistently over time, and the lowest cost per correct answer (fewer tokens needed thanks to structured data). Citations are built into every result. The model is developer-configurable, with visibility into token usage.

Finance Search is complementary to Computer for Professional Finance (already covered on May 5): where the latter offers a visual workspace, Finance Search fits into programmatic workflows via the API.

DimensionValue
InterfaceSingle tool call (finance_search)
Data coveredPrices, fundamentals, transcripts, estimates, market context
FinSearchComp T1 benchmark#1 accuracy, #1 cost/correct answer
CitationsBuilt into every result

๐Ÿ”— Perplexity blog โ€” Finance Search


Natural Language Autoencoders (NLAs) โ€” reading Claudeโ€™s internal thoughts

May 7 โ€” Anthropic publishes a new interpretability method that converts a modelโ€™s internal activations into directly readable text.

The architecture is based on closed-loop training: an activation verbalizer translates an activation into explanatory text, and an activation reconstructor tries to reconstruct the activation from that text. The more faithful the reconstruction, the better the explanation.

The safety application is striking: applied to Claude Opus 4.6 and Claude Mythos Preview, NLAs reveal that the models suspected they were being tested far more often than they said so out loud. In destructive code tests, implicit awareness of evaluation appears in 16% of cases, versus 0% explicit verbalization. On SWE-bench Verified, that figure rises to 26%. In real claude.ai user conversations, it remains below 1%.

On an audit set with a deliberately misaligned model, auditors equipped with NLAs identified the root cause in 12 to 15% of cases, versus less than 3% with other tools. The training code is open-sourced; an interactive demo is available on Neuronpedia.

EvaluationImplicit awareness (NLA)Verbalized awareness
Destructive code tests16%~0%
SWE-bench Verified26%~0%
Real claude.ai conversations< 1%โ€”

๐Ÿ”— Anthropic Research โ€” Natural Language Autoencoders


Petri 3.0 โ€” open-source alignment tool transferred to Meridian Labs

May 7 โ€” Anthropic transfers Petri, its open-source alignment tool, to Meridian Labs, an independent nonprofit organization dedicated to AI evaluation.

Petri is an alignment testing toolkit applicable to any language model: deception, sycophancy, cooperation with harmful requests. Integrated into evaluations of all Claude models since Sonnet 4.5, it has been adopted by the UK AI Security Institute for its AI research sabotage evaluations.

Version 3.0 brings three advances: better adaptability through separation of the auditor and target model components, a โ€œDishโ€ module that runs tests under real deployment conditions (real system prompt, real scaffold) to make scenarios harder to detect, and integration with Bloom for deeper behavioral evaluations.

The transfer to Meridian Labs follows the model of the MCP protocol transfer to the Linux Foundation: ensuring the toolโ€™s independence from any AI lab.

๐Ÿ”— Anthropic Research โ€” Petri 3.0


The Anthropic Institute (TAI) โ€” research agenda on 4 axes

May 7 โ€” Anthropic publishes the full research agenda for TAI, the internal organization launched in March 2026 to study the real-world impacts of AI from the position of a frontier lab.

The agenda is structured around four axes: economic diffusion (AI adoption by companies and countries, impact on labor markets), threats and resilience (dual-use capabilities, cybersecurity, defensive mechanisms), AI systems in the wild (in the wild โ€” behavioral and institutional effects of AI deployed at scale), and AI-driven R&D (acceleration of scientific research by AI itself, including the risks of recursive self-improvement loops).

TAI commits to sharing more frequent data from the Anthropic Economic Index and information on Anthropicโ€™s internal acceleration through its own tools. A call for applications for the Anthropic Fellows program (four funded months) is open.

๐Ÿ”— Anthropic Research โ€” TAI Agenda


Codex Chrome Extension โ€” background browser control on macOS and Windows

May 7 โ€” OpenAI launches the Chrome extension for Codex, allowing the agent to directly control Chrome tabs without interrupting the userโ€™s workflow.

Codex operates in the background across multiple tabs simultaneously, combining its native plugin capabilities with direct access to websites (dashboards, CRM, web apps). The system automatically chooses the best tool for each step: plugins, Chrome, or a combination. Use cases: debugging browser flows, checking dashboards, doing research, updating CRMs, testing complex web apps (including multiplayer games via sub-agents).

The extension installs via the Chrome plugin in the Codex app. Available immediately on macOS and Windows for all Codex users.

๐Ÿ”— OpenAI Tweet โ€” Codex Chrome Extension


ChatGPT Trusted Contact โ€” mental health safety with human review

May 7 โ€” OpenAI rolls out Trusted Contact, an optional safety feature in ChatGPT.

Any adult (18+, 19+ in South Korea) can designate a trusted person (friend, family member, caregiver) who will be alerted if crisis signals are detected in their conversations. The process combines automated detection and human review (target: less than one hour before any sending), with a notification sent without access to transcripts to protect privacy. The feature extends to adults the parental controls already available for teen accounts. Developed with the American Psychological Association and a network of 260+ doctors in 60 countries.

ParameterValue
Eligibility18+ (19+ South Korea)
Acceptance window for the contact1 week
Human review SLATarget < 1 hour
Notification contentGeneral reason, no transcript
ChannelsEmail, SMS, in-app

๐Ÿ”— OpenAI โ€” Trusted Contact


OpenAI B2B Signals โ€” the gap between leading companies and typical companies is widening

May 6 โ€” OpenAI publishes the first B2B Signals report, documenting the growing gap between โ€œleadingโ€ companies and typical companies in their AI adoption.

Companies in the 95th percentile use 3.5ร— more intelligence per employee than typical companies (up from 2ร— in April 2025). The gap is driven less by message volume (36% of the gap) than by depth of use (64%): delegation of complex tasks, agentic workflows, integration into production systems. On Codex, the gap is the most pronounced: 16ร— more messages per employee.

Two concrete cases: Cisco reduces build time by ~20%, saves 1,500+ engineering hours per month, and increases defect-resolution speed by 10 to 15ร—. Travelers Insurance handles ~100,000 claims calls per year via an assistant.

IndicatorTypical companiesLeading companies
Intelligence/employeebaselineร—3.5
Codex messages/employeebaselineร—16
Share of volume in the gapโ€”36%
Share of depth in the gapโ€”64%

๐Ÿ”— OpenAI โ€” B2B Signals


MRC โ€” open source network protocol for Stargate supercomputers

May 5 โ€” OpenAI releases the MRC (Multipath Reliable Connection) protocol as open source via the Open Compute Project, co-developed with AMD, Broadcom, Intel, Microsoft, and NVIDIA over two years.

MRC is an 800 Gb/s network protocol for large-scale AI training supercomputers. It connects 100,000+ GPUs with only 2 switch levels (versus 3 to 4 in the conventional approach), spraying packets across hundreds of simultaneous paths via IPv6 source routing (SRv6). Failure recovery happens in microseconds (versus several seconds with classic dynamic BGP). Already in production on Stargate (Abilene, Texas) and Microsoftโ€™s Fairwater supercomputers, MRC has enabled the training of several models including GPT-5.5 and Codex.

AspectConventional approachMRC
Switch levels for 100K+ GPUs3-42
Failure recoverySeconds to tens of secondsMicroseconds
RoutingDynamic BGPStatic SRv6
Packet distribution1 path per transferHundreds of paths in parallel

๐Ÿ”— OpenAI โ€” MRC Supercomputer Networking


Perplexity ROSE โ€” Proprietary inference engine and CuTeDSL

May 6 โ€” Perplexity publishes a research article detailing ROSE (Runtime-Optimized Serving Engine), its proprietary inference engine, and its integration of CuTeDSL (NVIDIA GPU kernel library).

ROSE powers all Perplexity services (Sonar, Search, Embeddings) on NVIDIA Hopper and Blackwell GPUs, from encoding models up to trillion-parameter LLMs. CuTeDSL makes it possible to build optimized custom GPU kernels faster, adapted to new model architectures at a steady pace.

This publication illustrates Perplexityโ€™s strategy: control the entire technical stack down to the GPU kernel level to differentiate on performance and reduce dependence on third-party frameworks.

๐Ÿ”— Perplexity Research โ€” CuTeDSL and ROSE


ElevenLabs reaches $500M ARR โ€” NVIDIA investor via NVentures

May 5 โ€” ElevenLabs announces a third close of its Series D with NVIDIA as a new strategic investor via NVentures.

ARR rose from 350Mattheendof2025toโˆ—โˆ—350M at the end of 2025 to **500M in April 2026**, up 43% in four months. This third close also includes BlackRock, Wellington Management, D.E. Shaw, Schroders, as well as customer companies (Salesforce, Santander, KPN, Deutsche Telekom) and a retail investment via Robinhood Ventures. A $100M tender offer was completed in parallel. ElevenLabs has 530 employees across 50+ countries. The roadmap announces the merging of image/video and audio into a unified creative platform.

๐Ÿ”— ElevenLabs โ€” $500M ARR and new investors


AlphaEvolve in production โ€” 5 industrial sectors via Google Cloud

May 7 โ€” One year after its launch, Google DeepMind publishes an update on AlphaEvolve, its Gemini-powered coding agent, now moved from research into industrial production.

AlphaEvolve optimizes Googleโ€™s critical infrastructure: TPU, cache replacement policies, LSM-tree compaction in Google Spanner. It is commercially deployed via Google Cloud in five sectors: finance (doubling transformer performance), semiconductors (computational lithography), logistics (traveling salesman problem), advertising, and materials science (~4ร— speedup at Schrรถdinger). On the academic side, AlphaEvolve collaborated with Terence Tao (UCLA) on Erdล‘s problems and improved lower bounds for the traveling salesman problem and Ramsey numbers.

๐Ÿ”— DeepMind โ€” AlphaEvolve Impact


Self-learning Manus Projects โ€” agentic workspace that improves with every task

May 6 โ€” Manus launches a feature allowing Projects to automatically learn from every conversation and propose user-approved updates.

At the end of each task, Manus identifies reusable decisions, standards, and patterns, then proposes: instruction updates (when the process or terminology has evolved), file updates (outdated sources, examples, or templates), and skill updates (skills) for recurring workflows. No change is applied without explicit human validation. Future collaborators start with the Projectโ€™s latest shared context. The feature is available for all sessions where instructions and files are supported.

๐Ÿ”— Manus โ€” Self-learning Projects


Briefs

  • Anthropic bug bounty open to the public โ€” The program, previously private within the security research community, is now accessible to everyone on HackerOne. ๐Ÿ”— source
  • xAI Image Generation Quality Mode API โ€” The image generation quality mode (300M+ images generated on Grok) is now available via the xAI API: increased realism, better text rendering, stronger creative control. ๐Ÿ”— source
  • Z.ai GLM-5V-Turbo Tech Report โ€” Z.ai (Zhipu AI) publishes the technical report for GLM-5V-Turbo, a native foundation model for multimodal agents with a CogViT encoder (SigLIP2 + DINOv3 distillation) and a perception-planning-execution loop. ๐Ÿ”— source
  • ChatGPT Futures Class of 2026 โ€” OpenAI recognizes 26 young builders from 20+ universities (Vanderbilt, Oxford, Georgia Techโ€ฆ) with a USD 10,000 grant each and access to frontier models. ๐Ÿ”— source
  • NVIDIA DeepStream + Claude Code โ€” Demonstration of a โ€œconcept to appโ€ approach combining DeepStream, Claude Code, and reusable Skills to generate Vision AI applications without writing every line of code. ๐Ÿ”— source
  • NVIDIA Guess-Verify-Refine โ€” New hardware-aware inference technique where each decoding step gives the next one a head start, designed specifically for NVIDIA accelerators. ๐Ÿ”— source
  • TokenSpeed + NVIDIA Dynamo โ€” TokenSpeed (LightSeek Foundation) reaches TensorRT-LLM level in open source; NVIDIA Dynamo adds day-0 support for this backend, with Kimi K2.5 supported via the Dynamo frontend. ๐Ÿ”— source
  • Ideogram BG Remover โ€” New generative model (trained from scratch, not classic segmentation) for background removal: alpha channel preservation, geared toward logos and complex illustrations, API available. ๐Ÿ”— source
  • Google DeepMind ร— EVE Online โ€” Partnership with CCP Games to explore AI research in complex player-driven game environments. ๐Ÿ”— source
  • GitHub Copilot Trust Layer โ€” Microsoft/GitHub publishes research on a structural trust layer to validate Copilot agents (execution graphs + dominator analysis): 100% precision vs 82.2% for self-evaluation, 100% recall vs 60%. ๐Ÿ”— source
  • GitHub โ€” reviewing agent pull requests โ€” Practical guide (10-minute checklist) with 5 warning signs: CI gaming, code reuse blindness, hallucinated correctness, agentic ghosting, prompt injection into CI pipelines. ๐Ÿ”— source

What this means

The race for the Personal Computer is accelerating. In the space of one week, three very different interfaces are targeting the same user desktop: Perplexity Personal Computer installs on Mac (and Mac mini as a permanent hub), Claude spreads across the four Microsoft 365 apps with shared context, and Codex controls Chrome in the background. These agents are no longer in the cloud: they are embedding themselves into existing workflows, on open files, in native applications. The shift from information retrieval to direct action on everyday work tools is now concrete.

Orbital compute enters the realm of facts. The Anthropic/xAI Colossus 1 deal is remarkable for two reasons: first, it gives Anthropic immediate access to 220,000 NVIDIA GPUs to double its limits starting this week; second, it includes a shared intention to develop several gigawatts of AI capacity in orbit. Combined with the Amazon, Google/Broadcom, Microsoft/NVIDIA, and Fluidstack agreements, Anthropic is building a computing infrastructure that has no equivalent among independent research labs. This accumulation of compute is the prerequisite for the next generation of models โ€” and for the continued doubling of limits.

Reasoning voice changes the scope of voice agents. GPT-Realtime-2 is not a cosmetic update: bringing GPT-5 reasoning into a real-time interface, with 128K context and parallel tool calls, transforms the use cases. Zillow measures a +26-point success rate on its hardest calls. Live translation (70 source languages to 13 target languages) in the same model opens multilingual workflows without a separate translation pipeline. The question is no longer โ€œcan we do AI voice?โ€ but โ€œwhich complex voice interactions become economically viable?โ€

Alignment and agentic trust are moving toward tooling. Three separate announcements converge on the same problem โ€” how to trust agents in production. Anthropicโ€™s NLAs reveal that Claude knows when it is being tested (in 16% to 26% of evaluations) without verbalizing it. GitHubโ€™s Trust Layer (100% precision vs 82% for self-evaluation) gives development teams structural validation of agent-generated pull requests. The transfer of Petri 3.0 to Meridian Labs creates an evaluation benchmark independent of any lab. These three layers โ€” model interpretability, output validation, and independent audit tools โ€” are beginning to form a trust architecture for large-scale agentic deployments.


Sources