The week ends with several significant announcements: OpenAI's GPT-5.4 consolidates native computer use with 75% on OSWorld and a one-million-token context window, NotebookLM introduces Cinematic Video Overviews with Gemini as director, and Codex extends support to Windows with a native sandbox. On the developer tooling side, Anthropic improves the skill-creator and launches HTTP hooks in Claude Code, and GitHub enables Copilot Memory by default for Pro users.
GPT-5.4 – Native computer use, 1M tokens, tool search
March 5, 2026 – OpenAI launches GPT-5.4, its frontier model for professional work. Available in ChatGPT (under the name GPT-5.4 Thinking), in the API (identifier gpt-5.4), and in Codex, this model consolidates the reasoning, coding, and agentic workflow capabilities introduced in previous models into a single architecture.
The most significant technical novelty is the native integration of computer use: GPT-5.4 can operate graphical interfaces via screenshots and keyboard/mouse input without third-party plugins. On OSWorld-Verified (the reference benchmark for interaction with real software interfaces), GPT-5.4 reaches 75.0%, versus 47.3% for GPT-5.2. The context window increases to 1 million tokens in Codex and the API.
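The screenshot-in, action-out loop this implies can be sketched schematically; all names below are illustrative placeholders, not OpenAI's API.

```python
# Schematic sketch of a "computer use" agent loop: capture the screen, ask
# the model for a UI action, execute it, repeat until done. Every name here
# (run_agent, the action dict shape) is a hypothetical illustration.

def run_agent(model, capture, execute, is_done, max_steps=50):
    """Drive a UI task step by step.

    model(screenshot) -> action dict, e.g. {"type": "click", "x": 120, "y": 48}
    capture()         -> current screenshot bytes
    execute(action)   -> performs the keyboard/mouse action
    is_done(action)   -> True when the model signals task completion
    """
    for _ in range(max_steps):
        action = model(capture())
        if is_done(action):
            return action
        execute(action)
    return None  # step budget exhausted
```

The point of the benchmark numbers above is that this loop now runs inside the generalist model itself, with no separate "computer use" plugin supplying the observe/act scaffolding.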
Another notable addition is tool search: instead of receiving the full list of available tools on every call, the model receives a lightweight list and searches for tools on demand. OpenAI measures a 47% reduction in token consumption on multi-tool workflows (tested on Scale MCP Atlas). The /fast mode in Codex runs 1.5× faster at equal intelligence.
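The tool-search pattern described above can be sketched as a registry with a lightweight index plus on-demand schema resolution; the registry contents and function names are assumptions for illustration, not OpenAI's actual API.

```python
# Hedged sketch of "tool search": ship only a lightweight index with each
# call, and resolve full tool schemas on demand. Illustrative registry.

TOOLS = {
    "get_weather": {"description": "Fetch current weather for a city",
                    "parameters": {"city": "string"}},
    "send_email":  {"description": "Send an email to a recipient",
                    "parameters": {"to": "string", "body": "string"}},
    "query_db":    {"description": "Run a read-only SQL query",
                    "parameters": {"sql": "string"}},
}

def tool_index():
    """Lightweight list sent with every call: names only, no schemas."""
    return sorted(TOOLS)

def search_tools(query: str):
    """Resolve full schemas only for tools matching the query."""
    q = query.lower()
    return {name: spec for name, spec in TOOLS.items()
            if q in name or q in spec["description"].lower()}
```

With a large registry, only the matching schemas ever enter the prompt, which is where the reported token savings would come from.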
Benchmarks:
| Evaluation | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| GDPval (professional work) | 83.0 % | 70.9 % | 70.9 % |
| SWE-Bench Pro | 57.7 % | 56.8 % | 55.6 % |
| OSWorld-Verified (computer use) | 75.0 % | 74.0 % | 47.3 % |
| BrowseComp (web search) | 82.7 % | 77.3 % | 65.8 % |
| Toolathlon (tool usage) | 54.6 % | 51.9 % | 46.3 % |
| ARC-AGI-2 (abstract reasoning) | 73.3 % | – | 52.9 % |
API pricing:
| Model | Input | Output |
|---|---|---|
| gpt-5.2 | $1.75 / M tokens | $14 / M tokens |
| gpt-5.4 | $2.50 / M tokens | $15 / M tokens |
| gpt-5.2-pro | $21 / M tokens | $168 / M tokens |
| gpt-5.4-pro | $30 / M tokens | $180 / M tokens |
GPT-5.4 Thinking is available today to ChatGPT Plus, Team, and Pro subscribers. GPT-5.2 Thinking will remain available under "Legacy Models" until June 5, 2026. On safety, OpenAI classifies GPT-5.4 as "High cyber capability" in its Preparedness Framework. The company simultaneously publishes CoT-Control, an open-source evaluation suite measuring chain-of-thought controllability across 13 frontier models; the low scores (0.1% to 15.4%) indicate that monitoring chains of thought remains a reliable safety tool.
🔗 Introducing GPT-5.4 | OpenAI
NotebookLM – Cinematic Video Overviews
March 4, 2026 – NotebookLM introduces Cinematic Video Overviews in its Studio. These videos go beyond the Audio Overviews (podcast format) launched in 2024 and the standard video templates.
The idea: Gemini is positioned as director. The model analyzes the userโs sources, decides on the most suitable format (tutorial, documentary, etc.), selects a visual style, generates images, then self-critiques before producing the final version. The result is an immersive, personalized video unique to each set of sources.
The feature has been available to Google AI Ultra subscribers, in English, since March 4, 2026; full rollout to Ultra users was confirmed the same day. Pro subscriber access is on the roadmap, with no precise timeline. The announcement tweet received 3 million views.
🔗 NotebookLM announcement on X
OpenAI – Codex on Windows, CoT-Control research
Codex available on Windows
March 4, 2026 – The Codex application is now available on Windows, with a native agent sandbox and support for Windows development environments via PowerShell. Two new skills are available: $aspnet-core for Blazor, ASP.NET MVC, and Razor Pages applications, and $winui-app for native Windows apps built with WinUI 3.
🔗 @OpenAIDevs on X
Research – chain-of-thought controllability
March 5, 2026 – OpenAI publishes "Reasoning models struggle to control their chains of thought, and that's good." The open-source evaluation suite CoT-Control measures chain-of-thought controllability across 13 frontier models. Scores range from 0.1% to 15.4%, indicating that current models struggle to deliberately alter their reasoning to bypass monitoring systems, a result presented as positive for safety. OpenAI plans to include these metrics in future models' system cards.
🔗 CoT-Control research | OpenAI
Anthropic – Skill-creator and HTTP hooks
Improved skill-creator
March 3, 2026 – Anthropic releases a major update to its skill-creator tool for Claude Code and Claude.ai. The announcement introduces two formal types of Agent Skills:
| Type | Description | Durability |
|---|---|---|
| Capability uplift | Helps Claude do something it does not yet do well | May become obsolete if the model improves |
| Encoded preference | Encodes team processes and preferences | Durable, depends on fidelity to the real workflow |
New features: evals (automated tests) that verify a skill produces the expected result, a benchmark mode that measures success rate, time, and token consumption, and multi-agent support to run evaluations in parallel without cross-contamination between tests. An A/B comparator mode compares two versions of a skill. The skill-creator is available now on Claude.ai and Cowork; for Claude Code it installs as a plugin.
🔗 Improving skill-creator: Test, measure, and refine Agent Skills
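The benchmark and A/B comparison ideas can be illustrated with a toy harness; this is a sketch of the concept only, not Anthropic's skill-creator API, and the "skills" here are plain Python functions standing in for real skills.

```python
# Toy A/B benchmark: run two versions of a skill against the same eval
# cases and compare success rates (plus wall-clock time, as a stand-in for
# the time/token metrics the real benchmark mode reports).
import time

def run_benchmark(skill, cases):
    """Return (success_rate, elapsed_seconds) over (input, expected) pairs."""
    start = time.perf_counter()
    passed = sum(1 for inp, expected in cases if skill(inp) == expected)
    return passed / len(cases), time.perf_counter() - start

# Two candidate versions of a toy skill: normalize a ticket title.
def skill_a(title):
    return title.strip().lower()

def skill_b(title):
    return " ".join(title.split()).lower()

CASES = [("  Fix Login Bug ", "fix login bug"),
         ("Update  Docs", "update docs")]

rate_a, _ = run_benchmark(skill_a, CASES)  # fails the double-space case
rate_b, _ = run_benchmark(skill_b, CASES)  # passes both cases
```

Running each version against a shared eval set is what makes an A/B verdict meaningful: here skill_b wins on success rate over identical cases.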
HTTP hooks in Claude Code
March 4, 2026 – Claude Code launches HTTP hooks, an alternative to the existing command hooks. Instead of running a local shell script, Claude Code sends an event to a user-chosen URL and waits for a response. Use cases: building a web app to visualize progress, managing permissions, or synchronizing state between multiple Claude Code instances via a database. HTTP hooks work in plugins, custom agents, and managed enterprise settings.
🔗 Tweet @dickson_tsai
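A minimal hook endpoint could look like the following, using only the Python standard library. The event field names and the response shape are assumptions for illustration, not Claude Code's documented hook schema.

```python
# Sketch of an HTTP hook receiver: Claude Code POSTs a JSON event, the
# endpoint replies with a JSON decision. Payload fields are hypothetical.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_hook(event: dict) -> dict:
    """Decide how to respond to a hook event (assumed field names)."""
    if event.get("hook_event_name") == "PreToolUse" and \
       event.get("tool_name") == "Bash":
        return {"decision": "ask"}  # e.g. require confirmation for shell use
    return {}                       # empty object: no opinion, proceed

class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_hook(event)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve locally:
# HTTPServer(("127.0.0.1", 8787), HookHandler).serve_forever()
```

Because the hook is just an HTTP endpoint, the same URL can back a progress dashboard or be shared by several Claude Code instances, which is what enables the state-synchronization use case.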
Gemini CLI v0.32.0 – Generalist Agent by default
March 3, 2026 – Gemini CLI version 0.32.0 enables the Generalist Agent by default to improve task delegation and routing. The update also adds Model Steering directly in the workspace, improvements to Plan Mode (opening and editing plans in an external editor, multi-selection management for complex tasks), interactive shell autocompletion, and parallel loading of extensions for faster startup.
🔗 Gemini CLI changelog
GitHub Copilot – Memory by default, mobile, and metrics
Copilot Memory enabled by default
March 4, 2026 – GitHub enables Copilot Memory by default for all Pro and Pro+ plan users. The feature, previously in preview via opt-in, allows Copilot to retain persistent repository-level information: coding conventions, architectural patterns, critical dependencies.
Memories are strictly limited to a single repository and validated against the current code before application, avoiding use of stale context. They automatically expire after 28 days. The feature is active in the coding agent, code review, and the Copilot CLI: knowledge discovered by one agent is immediately available to the others. Users can disable Copilot Memory in their settings (Settings > Features > Copilot Memory); Enterprise admins retain full control.
🔗 Copilot Memory now on by default for Pro and Pro+ users
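The retention semantics described above (repo-scoped entries with a 28-day expiry) can be sketched as a tiny data model; this is a hypothetical illustration, not GitHub's implementation.

```python
# Sketch of repo-scoped memories with a 28-day TTL: lookups filter by
# repository and drop expired entries. Data model is hypothetical.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=28)

@dataclass
class Memory:
    repo: str
    fact: str
    created: datetime

def live_memories(store, repo, now):
    """Return non-expired facts, strictly scoped to a single repository."""
    return [m.fact for m in store
            if m.repo == repo and now - m.created < TTL]

now = datetime(2026, 3, 5, tzinfo=timezone.utc)
store = [
    Memory("org/app", "Uses 4-space indentation", now - timedelta(days=3)),
    Memory("org/app", "Pins requests==2.31", now - timedelta(days=40)),    # expired
    Memory("org/lib", "Prefers pytest fixtures", now - timedelta(days=1)), # other repo
]
```

The strict repo filter mirrors the isolation guarantee above, and the TTL is why a memory cannot silently outlive a month of code churn (on top of the validation against current code that GitHub describes).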
Live notifications for agents in GitHub Mobile
March 4, 2026 – GitHub Mobile receives real-time notifications for Copilot agent sessions. Developers can follow their agents' progress, whether the session was started from a desktop or from the phone.
🔗 GitHub Mobile | Announcement on X
Grok Code Fast 1 in Copilot Free Auto
March 4, 2026 – GitHub adds xAI's Grok Code Fast 1 to Copilot Free's automatic model selection (Auto). The model can now be chosen by Copilot during chat sessions in Visual Studio Code, Visual Studio, JetBrains IDEs, Xcode, and Eclipse.
🔗 Grok Code Fast 1 in Copilot Free auto model selection
Copilot CLI metrics at user level
March 5, 2026 – GitHub expands Copilot usage metrics to include user-level CLI activity. The update follows last week's enterprise-level release. Admins can now identify active CLI users, view request and session counts, and track token consumption by user.
🔗 Copilot usage metrics – user-level CLI activity
Perplexity – GPT-5.4 and Voice Mode in Computer
GPT-5.4 Thinking available on Perplexity
March 5, 2026 – GPT-5.4 and GPT-5.4 Thinking are now accessible in Perplexity for Pro and Max subscribers. The Thinking version activates GPT-5.4's extended reasoning for deeper answers to complex queries.
🔗 Announcement on X
Voice Mode in Perplexity Computer
March 4, 2026 – Perplexity introduces a Voice Mode in Perplexity Computer. The interface, which already allowed searching, coding, and deploying projects, now accepts voice instructions directly.
🔗 Announcement on X
Cohere × Aston Martin F1 – multi-year partnership
March 4, 2026 – Cohere announces a multi-year partnership with the Aston Martin Aramco F1 team. Every team member will have access to enterprise models and Cohere's agentic AI platform (North) to work in one of the most demanding data environments in world sport. The Cohere logo will appear on the car starting at the 2026 Australian Grand Prix.
Black Forest Labs – Self-Flow, multi-modal research
March 4, 2026 – Black Forest Labs (creators of FLUX) releases Self-Flow in research preview. This approach trains multi-modal generative models (image, video, audio, text) without relying on external models for representation, using a self-supervised flow matching method.
Results shown: up to 2.8× faster cross-modal convergence, better temporal coherence in video, and crisper typographic rendering. Demos include a 4B-parameter video model trained on 6M videos, a 4B-parameter image model trained on 200M images, and a joint audio-video model. BFL positions the approach as a step toward world models: "Self-Flow opens a path toward world models: combining visual scalability with semantic abstraction for planning and understanding."
🔗 Tweet @bfl_ml
In brief
Runway launched a unified model hub on March 3, centralizing access to third-party image, video, audio and language models directly within the platform. 🔗 Announcement
Claude reached #1 on the iOS App Store in 14 countries simultaneously on March 5: Australia, Austria, Belgium, Canada, France, Germany, Ireland, Italy, New Zealand, Norway, Singapore, Switzerland, United Kingdom, United States. 🔗 Tweet
Manus published its annual letter on March 5 for its first anniversary, highlighting user testimonials (a mother, an 86-year-old linguist, a florist). 🔗 Letter
Grok surpassed one million reviews on the US App Store. 🔗 Tweet @grok
What this means
GPT-5.4 confirms that computer use is moving from experimental to an integrated capability within a generalist model. The 75% score on OSWorld-Verified and the 47% token reduction via tool search are concrete measures of a paradigm shift: AI agents can now operate complex software interfaces without specialized infrastructure.
On the developer tools side, the week shows convergence: Anthropic improves how agent skills are tested and supervised, GitHub enables persistent memory for its coding agents, and Perplexity adds voice mode to its Computer agent. Agentic runtimes are gaining layers of memory, observability (HTTP hooks, mobile notifications) and natural interaction (voice).
NotebookLMโs Cinematic Video Overviews illustrate a different axis: generating long-form educational content from personal sources. Gemini as director โ analyze, critique, recombine โ is an example of AI as a meta-production tool rather than just a generation assistant.
Sources
- Introducing GPT-5.4 | OpenAI
- @OpenAI on X
- @OpenAIDevs on X – Codex for Windows
- Codex for Windows | OpenAI Developers
- Reasoning models CoT-Control | OpenAI
- NotebookLM announcement on X
- Improving skill-creator | Anthropic
- HTTP hooks Claude Code – @dickson_tsai
- Claude #1 App Store – @RyD0ne
- Gemini CLI changelog
- Copilot Memory now on by default | GitHub
- GitHub Mobile live agent notifications
- Grok Code Fast 1 in Copilot Free auto | GitHub
- Copilot CLI metrics user-level | GitHub
- GPT-5.4 on Perplexity
- Perplexity Computer Voice Mode
- Cohere × Aston Martin F1
- BFL Self-Flow
- Runway Hub multi-model
- Manus anniversary letter
- Grok 1M reviews on App Store
This document was translated from the fr version into the en language using the gpt-5-mini model. For more information on the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator