GPT-5.5 Instant new ChatGPT default, Grok 4.3 on xAI API, Anthropic x Blackstone enterprise

Busy week: OpenAI makes GPT-5.5 Instant the default model for all ChatGPT users in place of GPT-5.3 Instant, xAI launches Grok 4.3 on its API with a one-million-token context, and Anthropic announces a new enterprise AI services company co-founded with Blackstone, Hellman & Friedman, and Goldman Sachs. On the tooling side, GitHub ships three security updates through its MCP server (two reaching GA, one entering public preview), Perplexity launches a product dedicated to financial teams, and Runway presents real-time video agents generated from a single image.

GPT-5.5 Instant — new default ChatGPT model

May 5 — OpenAI replaces GPT-5.3 Instant with GPT-5.5 Instant as the default ChatGPT model for all users. The rollout spans two days.

Dimension	Improvement vs GPT-5.3 Instant
Hallucinations (medicine, law, finance)	-52.5%
Incorrect claims (reported by users)	-37.3%
Response verbosity	-30.2% words on average

The model also improves image analysis, STEM answers, and the decision to use web search. Responses are more concise without losing substance, with less unnecessary formatting and fewer irrelevant follow-up questions.

Memory sources — OpenAI introduces “memory sources” across all ChatGPT models: when a response is personalized from saved memories, past conversations, or connected Gmail, the user sees exactly which sources were used and can correct or delete them. Personalization from past conversations and files is initially reserved for Plus and Pro subscribers (web), with other plans to follow.

Availability:

Gradual rollout over 2 days for all ChatGPT users
Available via API under the alias chat-latest
GPT-5.3 Instant remains accessible for 3 months for paying subscribers

🔗 Official GPT-5.5 Instant announcement

Grok 4.3 launched on the xAI API — 1M token context, #1 agentic tool calling

May 5 — xAI announces via X the launch of Grok 4.3 on the xAI API (console.x.ai). The model is presented as the fastest and smartest in the lineup to date.

Feature	Value
Context window	1 million tokens
Agentic tool calling benchmark	#1 (@ArtificialAnlys leaderboard)
Instruction following benchmark	#1 (@ArtificialAnlys leaderboard)
Enterprise domains	#1 case law and corporate finance (@ValsAI)
Availability	xAI API (console.x.ai) — not yet on grok.com

Grok 4.3 is now live on the xAI API. It’s our fastest, most intelligent model to date. It tops the @ArtificialAnlys leaderboards in agentic tool calling and instruction following, and ranks #1 in @ValsAI enterprise domains like case law and corporate finance. Grok 4.3 supports a 1 million token context. — @xai on X

The tweet generated 25.7 million views and 6,029 likes. Note: no dedicated page on x.ai/news at the time of the announcement — the launch went exclusively through X.

Anthropic and Blackstone, Hellman & Friedman, Goldman Sachs — new enterprise AI services company

May 4 — Anthropic, Blackstone, Hellman & Friedman, and Goldman Sachs announce the creation of a new enterprise AI services company, backed by a consortium of additional alternative investors.

The goal: deploy Claude in the core operations of large companies for tasks that require intensive engineering and deep industry knowledge. According to Anthropic, enterprise demand for Claude exceeds what a single distribution model can absorb.

The typical operating model starts with a small team working closely with the client to identify friction points, then building Claude agents tailored to the business. The concrete example given: a multi-site network of medical practices where Claude handles clinical documentation, repetitive administrative tasks, and coordination between specialties, allowing clinicians to focus on patient care.

The new company will join the Claude Partner Network, alongside Accenture, Deloitte, and PwC. It represents a structural step in Anthropic’s enterprise distribution strategy: rather than selling only API licenses, the company is now engaging in complex operational deployments with top-tier financial partners.

🔗 Official announcement

Claude agents for financial services and insurance

May 5 — Anthropic launches ten ready-to-run agent templates for financial services and insurance. They are available as plugins in Claude Cowork or Claude Code, or as standalone Claude Managed Agents on the Claude platform.

Research and client coverage:

Agent	Role
Pitch builder	Target lists, comparables, pitchbooks
Meeting preparer	Client and counterparty briefs
Earnings reviewer	Transcript reading and model updates
Model builder	Financial model creation from filings and data
Market researcher	Sector monitoring and news synthesis

Finance and operations:

Agent	Role
Valuation reviewer	Valuation checks
General ledger reconciler	Accounting reconciliation and NAV calculations
Month-end closer	Monthly close and accounting entries
Statement auditor	Financial statement review
KYC screener	Entity file assembly and compliance screening

Claude now integrates into Microsoft Excel, PowerPoint, Word, and Outlook (in progress) via add-ins. Claude Cowork’s Dispatch feature lets you assign tasks by text or voice from anywhere.

New data connectors: Dun & Bradstreet, Fiscal AI, Financial Modeling Prep, Guidepoint, IBISWorld, SS&C IntraLinks, Third Bridge, Verisk, and a Moody’s MCP (ratings and data on more than 6,000 entities).

Among the clients cited: Citadel, FIS, BNY, Carlyle, Mizuho, Travelers, Walleye Capital (100% of employees use Claude Code), Hg, Morningstar, FactSet. These agents are optimized for Claude Opus 4.7, ranked #1 on the Vals AI Finance Agent benchmark.

🔗 Official announcement

Perplexity Computer for Professional Finance

May 5 — Perplexity launches Computer for Professional Finance, a version of Computer designed specifically for analysis and investment teams: buy-side and sell-side analysts, hedge funds, and private equity.

Dimension	Value
Included workflows	35 (10 segments)
Integrated data providers	14 (including Quartr, Fiscal)
Premium MCP connectors	Morningstar, PitchBook, Daloopa, Carbon Arc
Available platforms	Microsoft Teams, Agent API
Coming soon	Excel add-in
FinSearchComp T1 benchmark	1st (accuracy, cost, latency)

Teams with licensed subscriptions can connect their own credentials via MCP connectors to access Morningstar, PitchBook, Daloopa, and Carbon Arc. Others get access to built-in financial tools powered by 14 data providers.

Every numerical value links back to its source: for values from SEC documents, Computer shows the calculation and points to the exact pages in the document. On the FinSearchComp T1 benchmark (time-sensitive data extraction), Perplexity ranks first in accuracy, cost per correct answer, and latency — including real-time prices, crypto prices, and exchange rates.

🔗 Perplexity blog — Computer for Professional Finance

Runway Characters — real-time video agent from a single image

May 4 — Runway announces Characters, a technology that turns a single image into a real-time conversational video agent.

Metric	Value
End-to-end latency	1.75 seconds
Video quality	24 fps HD
Required image source	1 image only
Cold starts	60× faster (peer-to-peer GPU)

The 1.75-second delay is measured from the moment the user stops speaking to the character’s first response. Runway simultaneously published two engineering papers: the first describes the real-time video agent architecture, the second explains how peer-to-peer GPU infrastructure reduces cold-start times by a factor of 60.

Target use cases include conversational agents, interactive real-time characters, and video interfaces for applications. The technology marks a shift from offline video rendering to synchronous interaction.

🔗 Runway Characters announcement tweet

GitHub MCP Server — three security updates

May 5 — GitHub publishes three security updates for its MCP server on the same day.

Secret scanning GA

Secret scanning via GitHub MCP Server moves to general availability, after being in preview since March 2026. In GitHub Copilot CLI, installation is done with /plugin install advanced-security@copilot-plugins; in VS Code, the advanced-security plugin exposes the /secret-scanning command.

Aspect	Detail
Status	GA (general availability)
Availability	Repositories with GitHub Secret Protection enabled
Integrations	Copilot CLI, VS Code, any MCP-compatible IDE

MCP tools now respect existing push protection customizations — bypass behavior is consistent with the repository or organization configuration.

🔗 Changelog — Secret scanning GA

Dependency scanning in public preview

Dependency vulnerability detection via MCP Server moves to public preview. The system queries the GitHub Advisory Database and returns structured results with affected packages, severity, and recommended fixed versions.

Aspect	Detail
Status	Public preview
Availability	Repositories with Dependabot alerts enabled
CLI activation	`copilot --add-github-mcp-toolset dependabot`
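
To make "structured results" concrete, here is a minimal sketch of how an agent might triage such output. The field names (`package`, `severity`, `fixed_version`), the severity labels, and the sample findings are hypothetical, chosen only to mirror the three pieces of information listed above; the actual response schema of the MCP tool may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record shape; not the GitHub MCP Server's actual schema.
@dataclass
class Finding:
    package: str
    severity: str                 # "low", "moderate", "high", or "critical"
    fixed_version: Optional[str]  # None when no patched release exists yet

SEVERITY_RANK = {"low": 0, "moderate": 1, "high": 2, "critical": 3}

def triage(findings: list[Finding], minimum: str = "high") -> list[Finding]:
    """Keep findings at or above `minimum` severity, most severe first."""
    floor = SEVERITY_RANK[minimum]
    kept = [f for f in findings if SEVERITY_RANK[f.severity] >= floor]
    return sorted(kept, key=lambda f: SEVERITY_RANK[f.severity], reverse=True)

# Invented sample data, for illustration only.
sample = [
    Finding("example-http", "moderate", "2.4.1"),
    Finding("example-yaml", "critical", "6.0.2"),
    Finding("example-auth", "high", None),
]

for f in triage(sample):
    fix = f"upgrade to {f.fixed_version}" if f.fixed_version else "no fix available yet"
    print(f"[{f.severity}] {f.package}: {fix}")
```

Sorting by severity and surfacing the fixed version first is one plausible way for a coding agent to decide which upgrades to propose before a commit.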

🔗 Changelog — Dependency scanning

GitHub Advanced Security × Microsoft Defender for Cloud GA

The GitHub Advanced Security × Microsoft Defender for Cloud integration also moves to GA. It correlates container images deployed in cloud environments with GitHub source code, bringing runtime context into security views.

New filters available in the organization view: has:deployment, runtime-risk:internet-exposed, runtime-risk:sensitive-data. Security campaigns can be assigned directly to the GitHub Copilot coding agent.

🔗 Changelog — Code-to-cloud GA

Model Spec Midtraining (MSM) — agentic misalignment reduced from 68% to 5%

May 5 — Anthropic researchers publish “Model Spec Midtraining” (MSM), an alignment method inserted between pretraining and alignment fine-tuning (AFT).

The principle: models are trained on a synthetic corpus of documents discussing the content of their Model Spec before learning to follow its rules. The idea is that understanding why a rule exists improves the robustness of its application.

Model	Misalignment (AFT only)	With MSM + AFT
Qwen2.5-32B	68%	5%
Qwen3-32B	54%	7%

MSM also makes AFT much more data-efficient: 40 to 60 times less AFT data is needed to achieve comparable performance. The authors also show that explaining the motivations behind the rules (rather than multiplying sub-rules) improves out-of-distribution generalization.
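
For a sense of scale, the figures in the table can be converted into relative reductions. The sketch below is plain arithmetic on the numbers reported above; the function name is illustrative, and the 10,000-example AFT baseline is a hypothetical figure chosen only to show what a 40 to 60 times reduction would look like.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate dropped, relative to its starting value."""
    return (before - after) / before

# Misalignment rates from the table above (AFT only, then MSM + AFT).
results = {
    "Qwen2.5-32B": (0.68, 0.05),
    "Qwen3-32B": (0.54, 0.07),
}

for model, (aft_only, with_msm) in results.items():
    drop = relative_reduction(aft_only, with_msm)
    print(f"{model}: {aft_only:.0%} -> {with_msm:.0%} ({drop:.1%} relative reduction)")

# Hypothetical baseline, not from the paper: 10,000 AFT examples without MSM.
baseline_examples = 10_000
low, high = baseline_examples // 60, baseline_examples // 40
print(f"With MSM: roughly {low} to {high} AFT examples for comparable performance")
```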

🔗 MSM paper — alignment.anthropic.com

NotebookLM Mind Maps: customization, organization, and navigation

May 5 — NotebookLM improves its Mind Maps with three features rolled out simultaneously.

Feature	Description
Customization	Guide the map with specific user instructions
Organization	Rename and share Mind Map charts instantly
Navigation	Smooth transitions between nodes

The rollout is gradual for all users. The update extends NotebookLM’s run of rapid improvements since early April: automatic source organization (April 24, with 100% rollout reached on May 5) and integration into the Gemini mobile app (April 30).

🔗 NotebookLM tweet

Genspark sb-git — Git server rewritten for AI agents

May 5 — Genspark launches sb-git, a Git server rewritten from scratch for AI agents. It supports complete Git semantics: versioning, branches, diff, blame, rollback, and push.

Aspect	Detail
CLI	`gsk` (init, clone-url, cat, commit)
Compatibility	Claude Code, OpenClaw, any Git agent
Storage	1 GB (free), 10 GB (Plus/Pro)
Account required	No — no GitHub account needed
Availability	Immediate (web + mobile)

Neither a GitHub account nor prior repository setup is required. The focus is on compatibility with common AI agents (Claude Code, OpenClaw) without installation friction.

🔗 Genspark sb-git tweet

NVIDIA + ServiceNow — Project Arc, autonomous long-running desktop agent

May 5 — At the ServiceNow Knowledge 2026 conference, Jensen Huang and Bill McDermott announce the expansion of their partnership around autonomous AI agents in the enterprise.

ServiceNow launches Project Arc, an autonomous long-running desktop agent designed for knowledge workers: developers, IT teams, administrators. The agent uses NVIDIA OpenShell (open source sandbox) for governance and security, and connects natively to the ServiceNow platform via ServiceNow Action Fabric.

Metric	Value
Blackwell vs Hopper efficiency	50× tokens/watt
Cost reduction per million tokens	~35×
Nemotron 3 Super (open source)	#1 EnterpriseOps-Gym (NOWAI-Bench)
Tickets resolved autonomously	90% (ServiceNow + Apriel/Nemotron)

🔗 NVIDIA Blog — ServiceNow

NVIDIA NemoClaw + OpenClaw — persistent open source agent surpassing React on GitHub

April 30 — OpenClaw (created by Peter Steinberger) surpassed 250,000 GitHub stars in 60 days, overtaking React to become the platform’s most-starred project. NVIDIA is collaborating with the community to secure this persistent self-hosted AI agent project.

NVIDIA is launching NemoClaw, a one-command-installable reference implementation, combining OpenClaw + NVIDIA OpenShell + Nemotron with security-hardening configurations enabled by default.

Metric	Value
OpenClaw GitHub stars	250,000+ (March 2026)
Growth	#1 GitHub project in 60 days (surpasses React)
Agent inference multiplier vs reasoning AI	1,000×
NemoClaw installation	1 command

🔗 NVIDIA Blog — OpenClaw/NemoClaw

Luma AI Uni-1.1 API — image generation that reasons about creative briefs

May 5 — Luma AI is launching the Uni-1.1 API, an image generation model designed to reason about creative briefs rather than tokens. Unlike traditional APIs that require prompt engineering, Uni-1.1 understands the aesthetic context of each visual tradition and produces usable results on the first try.

Use cases cited: fashion tools, architectural renders, manga pipelines, cinematic content. No middleware required. The API is available at lumalabs.ai/api.

🔗 Luma AI Uni-1.1 Tweet

ChatGPT Ads Manager self-serve and CPC bidding

May 5 — OpenAI is expanding its advertising program with two new features: a self-serve tool (Ads Manager, in beta in the US) and the launch of CPC bidding mode (cost per click).

Mode	Status	Description
CPM (cost per thousand impressions)	Existing	Available since the program launched
CPC (cost per click)	New	The advertiser pays only when a click actually happens
Ads Manager self-serve (beta)	New	Available to US advertisers
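
The difference between the two billing modes comes down to simple arithmetic. The sketch below compares campaign cost under each mode and computes the click-through rate at which they cost the same. All prices and volumes are hypothetical, since the announcement gives no rates.

```python
def cpm_cost(impressions: int, cpm: float) -> float:
    """Cost when paying a fixed price per 1,000 impressions."""
    return impressions / 1000 * cpm

def cpc_cost(clicks: int, cpc: float) -> float:
    """Cost when paying only for clicks."""
    return clicks * cpc

def break_even_ctr(cpm: float, cpc: float) -> float:
    """Click-through rate at which CPM and CPC billing cost the same."""
    return cpm / (1000 * cpc)

# Hypothetical numbers for illustration; not OpenAI's actual pricing.
impressions, clicks = 200_000, 1_500
cpm, cpc = 20.0, 2.0

print(f"CPM billing: ${cpm_cost(impressions, cpm):,.2f}")
print(f"CPC billing: ${cpc_cost(clicks, cpc):,.2f}")
print(f"Observed CTR: {clicks / impressions:.2%}")
print(f"Break-even CTR: {break_even_ctr(cpm, cpc):.2%}")
```

With these assumed figures the observed click-through rate sits below the break-even point, so CPC billing would be the cheaper option; above that point, CPM would be.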

Agency partners: Dentsu, Omnicom, Publicis, WPP. Technology partners: Adobe, Criteo, Kargo, Pacvue, StackAdapt. OpenAI also launched a Conversions API and pixel tracking to measure post-click actions without exposing individual conversations to advertisers.

🔗 OpenAI advertising announcement

Perplexity Premium Health Sources

May 5 — Perplexity launches Premium Health Sources. More than one in ten queries on the platform are about health. The sources available at launch are NEJM, BMJ Journals, and BMJ Best Practice — medical references usually reserved for institutional subscriptions.

In Computer, these sources activate automatically for health questions without manual selection. Each answer includes traceable citations. Upcoming sources: Micromedex, EBSCOhost, Health Affairs, VisualDx, American Academy of Orthopaedic Surgeons, American Diabetes Association, Springer Publishing.

🔗 Perplexity Blog — Premium Health Sources

Briefs

Manus — Automatic connector recommendation — Manus now detects which connector (Slack, Notion, Gmail, Google Drive) is needed to complete a task and recommends it in the conversation, without leaving the thread. Activation still requires user confirmation. 🔗 source
Black Forest Labs — FLUX Creator Program — BFL is opening a selective creator program for early access to upcoming FLUX models, with amplification of their work through BFL channels. 🔗 source
GPT-5.5 Instant System Card — First System Card in the Instant lineup classified as “High capability” in OpenAI’s Preparedness Framework categories of Cybersecurity and Biology & Chemistry. Additional safeguards have been implemented accordingly. 🔗 source
OpenAI — WebRTC relay+transceiver architecture — OpenAI publishes an engineering article describing the redesign of its WebRTC infrastructure for real-time voice (ChatGPT Voice, Realtime API), serving more than 900 million weekly users. The architecture separates packet routing (lightweight, stateless relay) from protocol termination (transceiver, stateful), enabling standard Kubernetes deployment with a reduced public UDP footprint. 🔗 source
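
The split between stateless routing and stateful termination described in the WebRTC brief can be illustrated with a toy model. Everything below is an assumption made for illustration: the class names, the hash-based routing rule, and the packet fields are invented here and are not taken from OpenAI's article.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Packet:
    session_id: str
    payload: bytes

@dataclass
class Transceiver:
    """Stateful side: terminates the protocol and remembers each session."""
    name: str
    sessions: dict[str, int] = field(default_factory=dict)

    def handle(self, packet: Packet) -> None:
        # Per-session state lives here (a packet counter stands in for it).
        self.sessions[packet.session_id] = self.sessions.get(packet.session_id, 0) + 1

class Relay:
    """Stateless side: picks a transceiver from the packet alone."""
    def __init__(self, transceivers: list[Transceiver]) -> None:
        self.transceivers = transceivers

    def route(self, packet: Packet) -> Transceiver:
        # Deterministic hash, so any relay replica makes the same choice
        # without storing a session table.
        digest = hashlib.sha256(packet.session_id.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.transceivers)
        return self.transceivers[index]

pool = [Transceiver("t0"), Transceiver("t1"), Transceiver("t2")]
relay_a, relay_b = Relay(pool), Relay(pool)

for i in range(4):
    pkt = Packet("session-42", f"frame {i}".encode())
    # Alternate relays to show that routing does not depend on relay state.
    relay = relay_a if i % 2 == 0 else relay_b
    relay.route(pkt).handle(pkt)

owner = relay_a.route(Packet("session-42", b""))
print(owner.name, owner.sessions)
```

Because the relay keeps no session table, replicas can be added or restarted freely, which is the property that makes this kind of split friendly to a standard Kubernetes deployment; all session state stays on the single transceiver that owns it.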

What this means

Finance as the primary playground for enterprise AI. Within 24 hours, Anthropic and Perplexity each published announcements explicitly targeting financial teams (ten Claude agent templates covering valuation, KYC, and month-end close; Computer for Professional Finance with 35 workflows and 14 data providers), while xAI highlighted Grok 4.3’s #1 ranking on the Vals AI benchmark in corporate finance and case law. The convergence is no accident: finance combines high volumes of structured documents, strict precision requirements, and a tolerance for premium tool costs, making it natural ground for the first high-value autonomous agent deployments.

The race for default models. GPT-5.5 Instant reduces hallucinations by 52.5% compared with its immediate predecessor, and Grok 4.3 reaches a one-million-token context with measured, published agentic performance. These two models launch on the same day. The goal is no longer just to publish the best academic benchmarks, but to be the model loaded by default in consumer interfaces (ChatGPT) or activated first in developer pipelines (xAI API).

MCP as a developer security standard. GitHub simultaneously released three security updates through its MCP server (secret scanning GA, dependency scanning in preview, code-to-cloud GA). This coordinated rollout turns GitHub’s MCP server into a native security integration channel for coding agents — Copilot CLI, VS Code, and any MCP-compatible IDE can now scan for secrets and vulnerable dependencies before every commit, directly in the agent workflow.

Persistent agents and real-time infrastructure. Runway Characters (a video agent with 1.75 s latency from an image), ServiceNow’s Project Arc (a long-running desktop agent), OpenClaw/NemoClaw (250,000 GitHub stars, 1,000× more inference demand than reasoning AI), and Genspark sb-git (Git rewritten for agents) all signal the same shift: AI agents are moving out of the one-off query era and into that of persistent processes, with radically different infrastructure needs — state storage, real-time latency, native versioning.