Search

Claude Opus 4.6 and GPT-5.3-Codex: Dual Launch, Gemini 3 Update

Claude Opus 4.6 and GPT-5.3-Codex: Dual Launch, Gemini 3 Update

Double launch at the summit: Anthropic releases Claude Opus 4.6 with 1M token context and agent teams, while OpenAI responds with GPT-5.3-Codex and an enterprise platform. Google pushes Gemini 3 on all fronts, and GitHub finally answers an 8-year-old request.


Claude Opus 4.6: SOTA in agentic coding and 1M context

February 5 — Anthropic launches Claude Opus 4.6, a major update to its most intelligent model. The model improves in planning, long sessions, code review, and offers for the first time a 1 million token context in beta for an Opus model.

BenchmarkScoreDetail
Terminal-Bench 2.0SOTAHighest agentic coding score
Humanity’s Last ExamSOTAMultidisciplinary reasoning
GDPval-AA+144 Elo vs GPT-5.2Professional work (finance, legal)
BrowseCompSOTAComplex information retrieval
MRCR v2 (8-needle 1M)76%vs 18.5% for Sonnet 4.5

API and Product New Features

FeatureDescription
Agent teamsMultiple Claude Code agents in parallel (research preview)
Adaptive thinkingThe model chooses when to use deep thinking
Effort controls4 levels: low, medium, high (default), max
Context compactionAutomatic context summarization for long sessions
128k output tokensLonger outputs in a single request
Claude in PowerPointResearch preview (Max, Team, Enterprise)

Pricing: Unchanged at 5/5/25 per million tokens (input/output). Premium pricing beyond 200k tokens (10/10/37.50).

Availability: claude.ai, API (claude-opus-4-6), and all major cloud platforms.

Engineering blogs: Infrastructure noise and C compiler

Anthropic publishes two technical articles on the same day. The first quantifies infrastructure noise in agentic coding benchmarks: on Terminal-Bench 2.0, resource configuration alone can create gaps of 6 percentage points between setups. The second documents the construction of a C compiler in Rust by 16 Claude agents in parallel: 100,000 lines of code, capable of compiling the Linux 6.9 kernel on x86, ARM, and RISC-V, in ~2,000 Claude Code sessions for ~$20,000.

Opus 4.6 in GitHub Copilot

The same day, Claude Opus 4.6 becomes available in GA in GitHub Copilot via Agent HQ, after the public preview announced the day before.

🔗 Opus 4.6 Announcement | Infrastructure noise | Building a C compiler


GPT-5.3-Codex: coding frontier + pro knowledge

February 5 — OpenAI launches GPT-5.3-Codex, which merges the coding performance of GPT-5.2-Codex with the reasoning capabilities of GPT-5.2, all 25% faster.

BenchmarkScore
SWE-Bench Pro (Public)56.8%
Terminal-Bench 2.077.3%
OSWorld-Verified64.7%
GDPval (wins or ties)70.9%
Cybersecurity CTF77.6%
SWE-Lancer IC Diamond81.4%

GPT-5.3-Codex is the first model to have contributed to its own creation: the team used preliminary versions to debug training, manage deployment, and analyze test results.

Beyond code

The model produces presentations, spreadsheets, data analysis, and handles productivity tasks in a desktop environment (64.7% on OSWorld-Verified).

Cybersecurity: high capability

GPT-5.3-Codex is the first model rated high capability for cybersecurity under OpenAI’s preparedness framework, and the first specifically trained to identify software vulnerabilities.

🔗 GPT-5.3-Codex Blog | System Card


OpenAI: Frontier, MCP Apps, security and biotech

OpenAI Frontier: enterprise agent platform

February 5 — OpenAI launches Frontier, a platform to develop, deploy, and manage AI agents in the enterprise. Agents receive shared business context, permissions, and learn from experience.

AspectDetail
First customersHP, Intuit, Oracle, State Farm, Thermo Fisher, Uber
AI PartnersAbridge, Clay, Ambience, Decagon, Harvey, Sierra
ApproachForward Deployed Engineers (FDE) integrated into teams
StandardsOpen standards, compatible with existing systems

ChatGPT: MCP Apps in beta

February 5MCP Apps arrive in beta in ChatGPT Business, Enterprise, and Edu. New partner connectors: Amplitude, Fireflies, Vercel, Monday.com, Stripe, Hex, Egnyte, and others. Organizations can build custom MCP apps via developer mode.

Trusted Access for Cyber

February 5 — OpenAI launches Trusted Access for Cyber, a trust-based access pilot program for advanced cyber capabilities. Users can verify their identity at chatgpt.com/cyber. $10 million in API credits are allocated to cyber defense via the Cybersecurity Grant Program.

GPT-5 reduces protein synthesis cost

February 5 — In partnership with Ginkgo Bioworks, OpenAI connects GPT-5 to a robotic lab to optimize cell-free protein synthesis (CFPS). Result: 40% reduction in production cost and 57% improvement in reagent cost, after 36,000 compositions tested on 580 automated plates in six rounds of experimentation.

🔗 OpenAI Frontier | MCP Apps | Trusted Access for Cyber | GPT-5 proteins


Google: Gemini 3, Super Bowl and NotebookLM

Gemini 3: updates and Super Bowl

February 5-6 — Google pushes Gemini 3 on all fronts. Gemini 3 Flash, launched recently, offers Pro-level reasoning at Flash speed: 90.4% on GPQA Diamond and 33.7% on Humanity’s Last Exam (without tools). Gemini 3 becomes the default model for AI Overviews in Google Search.

Google is also preparing a 60-second Gemini ad for Super Bowl LX (February 8) — the “New Home” spot shows a child preparing for a move with the help of Gemini, illustrating search capabilities in Google Photos and image generation.

NotebookLM: Infographics and Slide Decks

NotebookLM, now built on Gemini 3, rolls out Infographics and Slide Decks for Free and Pro users. Slide Decks are already the second most popular output studio. Ultra users can remove the watermark.

🔗 Gemini 3 Flash | Gemini 3 App | NotebookLM Infographics


GitHub: pinned comments on Issues

February 5 — GitHub launches pinned comments on Issues. It is now possible to pin a comment to the top of an issue from the context menu. A feature requested since 2017 to highlight decisions, updates, and key next steps in long threads.

🔗 Changelog


What this means

February 5, 2026, will remain as a landmark day: Anthropic and OpenAI simultaneously launch their most advanced coding models. Claude Opus 4.6 dominates professional work and information retrieval benchmarks, while GPT-5.3-Codex excels in terminal coding and computer use. Both models claim SOTA (State Of The Art) on Terminal-Bench 2.0 — Anthropic’s article on infrastructure noise makes perfect sense.

Beyond the models, the platform battle is intensifying: OpenAI Frontier attacks the enterprise with agents deployed at Oracle and Uber, while Anthropic bets on the developer ecosystem (GitHub, Xcode, Claude Code). Google advances on all fronts with Gemini 3 in Search, Chrome, and NotebookLM, and prepares the Super Bowl to anchor Gemini in the mainstream.


Sources