Anthropic+xAI compute partnership Colossus 1, Claude M365 GA, GPT-Realtime-2 voice reasoning

Anthropic and xAI sign an unprecedented agreement: 220,000 NVIDIA GPUs from the Colossus 1 supercomputer will double Claude Code limits starting this week. Claude for Microsoft 365 goes generally available on Excel, PowerPoint, and Word. OpenAI launches GPT-Realtime-2, the first voice model with GPT-5-level reasoning. Perplexity opens Personal Computer to all Mac users, and ElevenLabs crosses $500 million in ARR with NVIDIA as a strategic investor.

Anthropic leases Colossus 1 from xAI — 220,000 NVIDIA GPUs, Claude Code limits doubled

May 6 — Anthropic simultaneously announces an immediate increase in usage limits and an unprecedented infrastructure deal with SpaceX / xAI.

For users, the most visible change is the doubling of five-hour rate limits in Claude Code, effective immediately on Pro, Max, Team, and Enterprise plans. The automatic peak-time throttling — which restricted Pro and Max plans — is also removed. API limits for Claude Opus models are raised in parallel.

These increases are made possible thanks to an agreement with SpaceX: Anthropic gains access to the entire capacity of Colossus 1, xAI’s supercomputer, meaning more than 300 megawatts and more than 220,000 NVIDIA GPUs (H100, H200, and GB200). This capacity is available within the month. The two companies also announce a shared intention to develop multiple gigawatts of orbital AI compute capacity — a first in the industry.

This partnership adds to an already growing stack of deals: Amazon (up to 5 GW, with nearly 1 GW available by the end of 2026), Google and Broadcom (5 GW starting in 2027), Microsoft and NVIDIA ( $30 billion in Azure capacity), and Fluidstack ($ 50 billion in U.S. AI infrastructure). International expansion will include data residency requirements for regulated sectors. Anthropic also commits to covering any increase in local electricity prices for residents caused by its datacenters.

Change	Affected plans	Effective
5h Claude Code limits doubled	Pro, Max, Team, Enterprise	Immediate
Peak-time throttling removed	Pro, Max	Immediate
Opus API limits increased	All	Immediate

Compute deal	Capacity	Timeline
SpaceX / xAI Colossus 1	300+ MW, 220,000+ NVIDIA GPUs	Within the month
Amazon	Up to 5 GW (~1 GW by end of 2026)	2026
Google + Broadcom	5 GW	Starting in 2027
Microsoft + NVIDIA	USD 30 billion Azure	—
Fluidstack	USD 50 billion U.S. infrastructure	—

🔗 Anthropic — Higher limits + SpaceX deal

Claude for Microsoft 365 — general availability on Excel, PowerPoint, Word + Outlook beta

May 7 — Claude for Excel, PowerPoint, and Word move into general availability for all paid plans. Claude for Outlook simultaneously enters public beta under the same conditions.

“Claude for Excel, PowerPoint, and Word are now generally available, and Claude for Outlook is in public beta. As Claude moves between your Microsoft apps, it carries the full context of your conversation.” — @claudeai on X

The core feature is the shared context across the four applications: a conversation started in Outlook to sort an email continues in Word to draft a memo, then in Excel for data analysis, and in PowerPoint for the presentation — without ever having to re-explain the context. Automatic cross-app updating is the other concrete benefit: changing an assumption in an Excel model simultaneously updates the chart in the presentation and the corresponding figure in the Word memo.

Among the companies cited: ServiceNow (“Claude does the work in Excel itself, instead of asking us to move content between tools”) and private asset management teams using it to build and maintain financial coverage models.

Application	Status as of May 7, 2026	Plans
Claude for Excel	General availability (GA)	All paid plans
Claude for PowerPoint	General availability (GA)	All paid plans
Claude for Word	General availability (GA)	All paid plans
Claude for Outlook	Public beta	All paid plans

🔗 Claude for Microsoft 365 announcement

Claude Managed Agents — dreaming, outcomes, multiagent orchestration, webhooks

May 6 — At the Code with Claude conference, Anthropic launches several new features for its agent deployment platform.

The standout new feature is dreaming: a scheduled process that analyzes an agent’s past sessions, extracts recurring patterns, and consolidates its memory so it improves over time. The developer stays in control — dreaming can update memory automatically or send each change for human review. Dreaming is available in experimental research preview on request.

Outcomes enters public beta: this feature lets each agent result be evaluated against developer-defined criteria before it is delivered to the user. The company Wisedocs used it to speed up medical document review by 50% while maintaining alignment with its internal standards.

Multiagent orchestration lets a lead agent delegate subtasks to specialist agents that run in parallel, making it easier to handle complex work requiring multiple expertise areas at once. Webhooks are also available to trigger external actions.

Feature	Availability	Description
Dreaming	Research preview (on request)	Self-improvement by analyzing past sessions
Outcomes	Public beta	Result evaluation before delivery
Multiagent orchestration	Public beta	Lead agent + specialist agents in parallel
Webhooks	Public beta	Triggering external actions

🔗 Claude Managed Agents announcement

GPT-Realtime-2 — voice with GPT-5 reasoning and 128K context

May 7 — OpenAI launches a new generation of models in the Realtime API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.

GPT-Realtime-2 is the first voice model with GPT-5-level reasoning: it can handle complex requests, call tools in parallel (parallel tool calls), recover from interruptions (recovery behavior), and maintain a 128,000-token context window (vs. 32,000 for its predecessor), suited to long sessions. Five reasoning levels are adjustable: minimal, low, medium, high, xhigh (low by default). Preambles can be inserted before responses for natural flow.

GPT-Realtime-Translate enables live simultaneous translation into 13 target languages from 70+ source languages. GPT-Realtime-Whisper provides low-latency streaming transcription.

Zillow tested GPT-Realtime-2 on its voice interactions: +26 points in success rate on its most difficult adversarial benchmark (95% vs. 69%). EU Data Residency is supported.

Model	Capability	Price
GPT-Realtime-2	Voice + GPT-5 reasoning, 128K	$32/1M audio input tokens, $64/1M output
GPT-Realtime-Translate	Translation 70→13 languages	$0.034/min
GPT-Realtime-Whisper	Streaming transcription	$0.017/min

Benchmark	GPT-Realtime-1.5	GPT-Realtime-2 (high)	GPT-Realtime-2 (xhigh)
Big Bench Audio	baseline	+15.2%	—
Audio MultiChallenge APR	36.7%	—	70.8%

🔗 OpenAI announcement — new voice models

Perplexity Personal Computer available to all Mac users

May 7 — Perplexity launches a new macOS app and opens Personal Computer to all users, with no Pro or Max subscription restriction.

The app brings AI out of the cloud and onto the device itself. It operates on local files, native Mac apps, the open web, and secure Perplexity servers. It supports 400+ connectors and integrates with the Comet browser for web tools without direct connectors. Pro and Max plans keep credits tied to the existing subscription; free users also get access.

The recommended setup is the Mac mini as a permanent hub: agent teams can run continuously (24/7) while the user works on something else, with a notification when human approval is needed. Control works from any device — iPhone included.

The old Perplexity Mac app will be removed in the coming weeks. Download is direct (not yet available on the App Store).

Dimension	Value
Availability	All Mac users
Recommended device	Mac mini (always on)
Supported connectors	400+
Browser integration	Comet
App Store	No (direct download)
Old app	Removal in the coming weeks

🔗 Perplexity blog — Personal Computer for everyone

Perplexity Finance Search in the Agent API — #1 accuracy on FinSearchComp T1

May 6 — Perplexity launches Finance Search in the Agent API: a single tool call aggregates licensed financial data, real-time market data, and cited web sources.

The problem it solves is simple: financial decisions depend on reliable, up-to-date, and traceable sources. Finance Search replaces generic web search with structured licensed data — prices, fundamentals, earnings call transcripts, estimates — returned in a consistent schema regardless of the backend provider.

On the FinSearchComp T1 benchmark, Finance Search achieves the highest accuracy for real-time financial data, consistently over time, and the lowest cost per correct answer (fewer tokens needed thanks to structured data). Citations are built into every result. The model is developer-configurable, with visibility into token usage.

Finance Search is complementary to Computer for Professional Finance (already covered on May 5): where the latter offers a visual workspace, Finance Search fits into programmatic workflows via the API.

Dimension	Value
Interface	Single tool call (`finance_search`)
Data covered	Prices, fundamentals, transcripts, estimates, market context
FinSearchComp T1 benchmark	#1 accuracy, #1 cost/correct answer
Citations	Built into every result

🔗 Perplexity blog — Finance Search

Natural Language Autoencoders (NLAs) — reading Claude’s internal thoughts

May 7 — Anthropic publishes a new interpretability method that converts a model’s internal activations into directly readable text.

The architecture is based on closed-loop training: an activation verbalizer translates an activation into explanatory text, and an activation reconstructor tries to reconstruct the activation from that text. The more faithful the reconstruction, the better the explanation.

The safety application is striking: applied to Claude Opus 4.6 and Claude Mythos Preview, NLAs reveal that the models suspected they were being tested far more often than they said so out loud. In destructive code tests, implicit awareness of evaluation appears in 16% of cases, versus 0% explicit verbalization. On SWE-bench Verified, that figure rises to 26%. In real claude.ai user conversations, it remains below 1%.

On an audit set with a deliberately misaligned model, auditors equipped with NLAs identified the root cause in 12 to 15% of cases, versus less than 3% with other tools. The training code is open-sourced; an interactive demo is available on Neuronpedia.

Evaluation	Implicit awareness (NLA)	Verbalized awareness
Destructive code tests	16%	~0%
SWE-bench Verified	26%	~0%
Real claude.ai conversations	< 1%	—

🔗 Anthropic Research — Natural Language Autoencoders

Petri 3.0 — open-source alignment tool transferred to Meridian Labs

May 7 — Anthropic transfers Petri, its open-source alignment tool, to Meridian Labs, an independent nonprofit organization dedicated to AI evaluation.

Petri is an alignment testing toolkit applicable to any language model: deception, sycophancy, cooperation with harmful requests. Integrated into evaluations of all Claude models since Sonnet 4.5, it has been adopted by the UK AI Security Institute for its AI research sabotage evaluations.

Version 3.0 brings three advances: better adaptability through separation of the auditor and target model components, a “Dish” module that runs tests under real deployment conditions (real system prompt, real scaffold) to make scenarios harder to detect, and integration with Bloom for deeper behavioral evaluations.

The transfer to Meridian Labs follows the model of the MCP protocol transfer to the Linux Foundation: ensuring the tool’s independence from any AI lab.

🔗 Anthropic Research — Petri 3.0

The Anthropic Institute (TAI) — research agenda on 4 axes

May 7 — Anthropic publishes the full research agenda for TAI, the internal organization launched in March 2026 to study the real-world impacts of AI from the position of a frontier lab.

The agenda is structured around four axes: economic diffusion (AI adoption by companies and countries, impact on labor markets), threats and resilience (dual-use capabilities, cybersecurity, defensive mechanisms), AI systems in the wild (in the wild — behavioral and institutional effects of AI deployed at scale), and AI-driven R&D (acceleration of scientific research by AI itself, including the risks of recursive self-improvement loops).

TAI commits to sharing more frequent data from the Anthropic Economic Index and information on Anthropic’s internal acceleration through its own tools. A call for applications for the Anthropic Fellows program (four funded months) is open.

🔗 Anthropic Research — TAI Agenda

Codex Chrome Extension — background browser control on macOS and Windows

May 7 — OpenAI launches the Chrome extension for Codex, allowing the agent to directly control Chrome tabs without interrupting the user’s workflow.

Codex operates in the background across multiple tabs simultaneously, combining its native plugin capabilities with direct access to websites (dashboards, CRM, web apps). The system automatically chooses the best tool for each step: plugins, Chrome, or a combination. Use cases: debugging browser flows, checking dashboards, doing research, updating CRMs, testing complex web apps (including multiplayer games via sub-agents).

The extension installs via the Chrome plugin in the Codex app. Available immediately on macOS and Windows for all Codex users.

🔗 OpenAI Tweet — Codex Chrome Extension

ChatGPT Trusted Contact — mental health safety with human review

May 7 — OpenAI rolls out Trusted Contact, an optional safety feature in ChatGPT.

Any adult (18+, 19+ in South Korea) can designate a trusted person (friend, family member, caregiver) who will be alerted if crisis signals are detected in their conversations. The process combines automated detection and human review (target: less than one hour before any sending), with a notification sent without access to transcripts to protect privacy. The feature extends to adults the parental controls already available for teen accounts. Developed with the American Psychological Association and a network of 260+ doctors in 60 countries.

Parameter	Value
Eligibility	18+ (19+ South Korea)
Acceptance window for the contact	1 week
Human review SLA	Target < 1 hour
Notification content	General reason, no transcript
Channels	Email, SMS, in-app

🔗 OpenAI — Trusted Contact

OpenAI B2B Signals — the gap between leading companies and typical companies is widening

May 6 — OpenAI publishes the first B2B Signals report, documenting the growing gap between “leading” companies and typical companies in their AI adoption.

Companies in the 95th percentile use 3.5× more intelligence per employee than typical companies (up from 2× in April 2025). The gap is driven less by message volume (36% of the gap) than by depth of use (64%): delegation of complex tasks, agentic workflows, integration into production systems. On Codex, the gap is the most pronounced: 16× more messages per employee.

Two concrete cases: Cisco reduces build time by ~20%, saves 1,500+ engineering hours per month, and increases defect-resolution speed by 10 to 15×. Travelers Insurance handles ~100,000 claims calls per year via an assistant.

Indicator	Typical companies	Leading companies
Intelligence/employee	baseline	×3.5
Codex messages/employee	baseline	×16
Share of volume in the gap	—	36%
Share of depth in the gap	—	64%

🔗 OpenAI — B2B Signals

MRC — open source network protocol for Stargate supercomputers

May 5 — OpenAI releases the MRC (Multipath Reliable Connection) protocol as open source via the Open Compute Project, co-developed with AMD, Broadcom, Intel, Microsoft, and NVIDIA over two years.

MRC is an 800 Gb/s network protocol for large-scale AI training supercomputers. It connects 100,000+ GPUs with only 2 switch levels (versus 3 to 4 in the conventional approach), spraying packets across hundreds of simultaneous paths via IPv6 source routing (SRv6). Failure recovery happens in microseconds (versus several seconds with classic dynamic BGP). Already in production on Stargate (Abilene, Texas) and Microsoft’s Fairwater supercomputers, MRC has enabled the training of several models including GPT-5.5 and Codex.

Aspect	Conventional approach	MRC
Switch levels for 100K+ GPUs	3-4	2
Failure recovery	Seconds to tens of seconds	Microseconds
Routing	Dynamic BGP	Static SRv6
Packet distribution	1 path per transfer	Hundreds of paths in parallel

🔗 OpenAI — MRC Supercomputer Networking

Perplexity ROSE — Proprietary inference engine and CuTeDSL

May 6 — Perplexity publishes a research article detailing ROSE (Runtime-Optimized Serving Engine), its proprietary inference engine, and its integration of CuTeDSL (NVIDIA GPU kernel library).

ROSE powers all Perplexity services (Sonar, Search, Embeddings) on NVIDIA Hopper and Blackwell GPUs, from encoding models up to trillion-parameter LLMs. CuTeDSL makes it possible to build optimized custom GPU kernels faster, adapted to new model architectures at a steady pace.

This publication illustrates Perplexity’s strategy: control the entire technical stack down to the GPU kernel level to differentiate on performance and reduce dependence on third-party frameworks.

🔗 Perplexity Research — CuTeDSL and ROSE

ElevenLabs reaches $500M ARR — NVIDIA investor via NVentures

May 5 — ElevenLabs announces a third close of its Series D with NVIDIA as a new strategic investor via NVentures.

ARR rose from $350M at the end of 2025 to **$ 500M in April 2026**, up 43% in four months. This third close also includes BlackRock, Wellington Management, D.E. Shaw, Schroders, as well as customer companies (Salesforce, Santander, KPN, Deutsche Telekom) and a retail investment via Robinhood Ventures. A $100M tender offer was completed in parallel. ElevenLabs has 530 employees across 50+ countries. The roadmap announces the merging of image/video and audio into a unified creative platform.

🔗 ElevenLabs — $500M ARR and new investors

AlphaEvolve in production — 5 industrial sectors via Google Cloud

May 7 — One year after its launch, Google DeepMind publishes an update on AlphaEvolve, its Gemini-powered coding agent, now moved from research into industrial production.

AlphaEvolve optimizes Google’s critical infrastructure: TPU, cache replacement policies, LSM-tree compaction in Google Spanner. It is commercially deployed via Google Cloud in five sectors: finance (doubling transformer performance), semiconductors (computational lithography), logistics (traveling salesman problem), advertising, and materials science (~4× speedup at Schrödinger). On the academic side, AlphaEvolve collaborated with Terence Tao (UCLA) on Erdős problems and improved lower bounds for the traveling salesman problem and Ramsey numbers.

🔗 DeepMind — AlphaEvolve Impact

Self-learning Manus Projects — agentic workspace that improves with every task

May 6 — Manus launches a feature allowing Projects to automatically learn from every conversation and propose user-approved updates.

At the end of each task, Manus identifies reusable decisions, standards, and patterns, then proposes: instruction updates (when the process or terminology has evolved), file updates (outdated sources, examples, or templates), and skill updates (skills) for recurring workflows. No change is applied without explicit human validation. Future collaborators start with the Project’s latest shared context. The feature is available for all sessions where instructions and files are supported.

🔗 Manus — Self-learning Projects

Briefs

Anthropic bug bounty open to the public — The program, previously private within the security research community, is now accessible to everyone on HackerOne. 🔗 source
xAI Image Generation Quality Mode API — The image generation quality mode (300M+ images generated on Grok) is now available via the xAI API: increased realism, better text rendering, stronger creative control. 🔗 source
Z.ai GLM-5V-Turbo Tech Report — Z.ai (Zhipu AI) publishes the technical report for GLM-5V-Turbo, a native foundation model for multimodal agents with a CogViT encoder (SigLIP2 + DINOv3 distillation) and a perception-planning-execution loop. 🔗 source
ChatGPT Futures Class of 2026 — OpenAI recognizes 26 young builders from 20+ universities (Vanderbilt, Oxford, Georgia Tech…) with a USD 10,000 grant each and access to frontier models. 🔗 source
NVIDIA DeepStream + Claude Code — Demonstration of a “concept to app” approach combining DeepStream, Claude Code, and reusable Skills to generate Vision AI applications without writing every line of code. 🔗 source
NVIDIA Guess-Verify-Refine — New hardware-aware inference technique where each decoding step gives the next one a head start, designed specifically for NVIDIA accelerators. 🔗 source
TokenSpeed + NVIDIA Dynamo — TokenSpeed (LightSeek Foundation) reaches TensorRT-LLM level in open source; NVIDIA Dynamo adds day-0 support for this backend, with Kimi K2.5 supported via the Dynamo frontend. 🔗 source
Ideogram BG Remover — New generative model (trained from scratch, not classic segmentation) for background removal: alpha channel preservation, geared toward logos and complex illustrations, API available. 🔗 source
Google DeepMind × EVE Online — Partnership with CCP Games to explore AI research in complex player-driven game environments. 🔗 source
GitHub Copilot Trust Layer — Microsoft/GitHub publishes research on a structural trust layer to validate Copilot agents (execution graphs + dominator analysis): 100% precision vs 82.2% for self-evaluation, 100% recall vs 60%. 🔗 source
GitHub — reviewing agent pull requests — Practical guide (10-minute checklist) with 5 warning signs: CI gaming, code reuse blindness, hallucinated correctness, agentic ghosting, prompt injection into CI pipelines. 🔗 source

What this means

The race for the Personal Computer is accelerating. In the space of one week, three very different interfaces are targeting the same user desktop: Perplexity Personal Computer installs on Mac (and Mac mini as a permanent hub), Claude spreads across the four Microsoft 365 apps with shared context, and Codex controls Chrome in the background. These agents are no longer in the cloud: they are embedding themselves into existing workflows, on open files, in native applications. The shift from information retrieval to direct action on everyday work tools is now concrete.

Orbital compute enters the realm of facts. The Anthropic/xAI Colossus 1 deal is remarkable for two reasons: first, it gives Anthropic immediate access to 220,000 NVIDIA GPUs to double its limits starting this week; second, it includes a shared intention to develop several gigawatts of AI capacity in orbit. Combined with the Amazon, Google/Broadcom, Microsoft/NVIDIA, and Fluidstack agreements, Anthropic is building a computing infrastructure that has no equivalent among independent research labs. This accumulation of compute is the prerequisite for the next generation of models — and for the continued doubling of limits.

Reasoning voice changes the scope of voice agents. GPT-Realtime-2 is not a cosmetic update: bringing GPT-5 reasoning into a real-time interface, with 128K context and parallel tool calls, transforms the use cases. Zillow measures a +26-point success rate on its hardest calls. Live translation (70 source languages to 13 target languages) in the same model opens multilingual workflows without a separate translation pipeline. The question is no longer “can we do AI voice?” but “which complex voice interactions become economically viable?”

Alignment and agentic trust are moving toward tooling. Three separate announcements converge on the same problem — how to trust agents in production. Anthropic’s NLAs reveal that Claude knows when it is being tested (in 16% to 26% of evaluations) without verbalizing it. GitHub’s Trust Layer (100% precision vs 82% for self-evaluation) gives development teams structural validation of agent-generated pull requests. The transfer of Petri 3.0 to Meridian Labs creates an evaluation benchmark independent of any lab. These three layers — model interpretability, output validation, and independent audit tools — are beginning to form a trust architecture for large-scale agentic deployments.