March 17, 2026 is marked by NVIDIA GTC and several major launches. OpenAI releases GPT-5.4 mini and nano, its most capable compact models to date, which come close to the full model on several benchmarks. The NVIDIA Nemotron Coalition grows with the addition of Mistral AI and Perplexity. Perplexity also opens Comet Enterprise with full MDM governance. Claude Code v2.1.77 doubles the default generation limit for Opus 4.6. GitHub, Anthropic, AWS, Google, and OpenAI jointly commit $12.5 million to open source security.
GPT-5.4 mini and nano: OpenAI’s compact models
17 March — OpenAI launches GPT-5.4 mini and GPT-5.4 nano, its most powerful compact models to date. These two variants bring GPT-5.4 capabilities into formats optimized for high-volume workloads, with lower latency and lower cost.
GPT-5.4 mini significantly improves on GPT-5 mini in code, reasoning, multimodal understanding, and tool use, while running more than twice as fast. It comes close to the performance of the full GPT-5.4 model on several key evaluations, including SWE-Bench Pro and OSWorld-Verified.
GPT-5.4 nano is the smallest and least expensive version of the GPT-5.4 family, designed for tasks where speed and cost matter most: classification, data extraction, ranking, and simple code sub-agents.
| Evaluation | GPT-5.4 | GPT-5.4 mini | GPT-5.4 nano | GPT-5 mini |
|---|---|---|---|---|
| SWE-Bench Pro (public) | 57.7 % | 54.4 % | 52.4 % | 45.7 % |
| Terminal-Bench 2.0 | 75.1 % | 60.0 % | 46.3 % | 38.2 % |
| Toolathlon | 54.6 % | 42.9 % | 35.5 % | 26.9 % |
| GPQA Diamond | 93.0 % | 88.0 % | 82.8 % | 81.6 % |
| OSWorld-Verified | 75.0 % | 72.1 % | 39.0 % | 42.0 % |
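To make "comes close to the full model" concrete, the scores in the table can be expressed as a share of the full GPT-5.4 result. This is a minimal sketch: the figures are copied from the table, while the dictionary keys and the `retention` helper are illustrative names, not part of any OpenAI tooling.

```python
# Scores in percent, copied from the benchmark table above.
# Keys ("full", "mini", "nano") are illustrative labels, not API model IDs.
SCORES = {
    "SWE-Bench Pro (public)": {"full": 57.7, "mini": 54.4, "nano": 52.4},
    "Terminal-Bench 2.0": {"full": 75.1, "mini": 60.0, "nano": 46.3},
    "Toolathlon": {"full": 54.6, "mini": 42.9, "nano": 35.5},
    "GPQA Diamond": {"full": 93.0, "mini": 88.0, "nano": 82.8},
    "OSWorld-Verified": {"full": 75.0, "mini": 72.1, "nano": 39.0},
}


def retention(benchmark: str, model: str) -> float:
    """Share of the full GPT-5.4 score that `model` reaches, in percent."""
    row = SCORES[benchmark]
    return round(100.0 * row[model] / row["full"], 1)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: mini {retention(name, 'mini')} %, nano {retention(name, 'nano')} %")
```

Read this way, mini stays above 94 % of the full model on SWE-Bench Pro, GPQA Diamond, and OSWorld-Verified, but drops to roughly 79 % on Terminal-Bench 2.0 and Toolathlon.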
The use cases fall into three categories: code assistants (GPT-5.4 mini excels in fast coding workflows, debugging loops, and frontend generation), sub-agents (in Codex, GPT-5.4 can delegate subtasks to GPT-5.4 mini using only 30% of the GPT-5.4 quota), and interface control (computer use), where GPT-5.4 mini quickly interprets screenshots of dense interfaces.
| Model | Availability | Input price | Output price | Context |
|---|---|---|---|---|
| GPT-5.4 mini | API, Codex, ChatGPT Free/Go | $0.75/million tokens | $4.50/million tokens | 400,000 tokens |
| GPT-5.4 nano | API only | $0.20/million tokens | $1.25/million tokens | — |
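For a rough sense of what these prices mean for a high-volume workload, the arithmetic can be sketched as follows. The prices come from the table; the workload (one million requests of 2,000 input and 200 output tokens each) is a made-up example, and the helper name is hypothetical.

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "GPT-5.4 mini": {"input": 0.75, "output": 4.50},
    "GPT-5.4 nano": {"input": 0.20, "output": 1.25},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for the given token volumes."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return round(total / 1_000_000, 2)


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 requests, 2,000 input + 200 output tokens each.
    requests = 1_000_000
    for model in PRICES:
        cost = estimate_cost(model, requests * 2_000, requests * 200)
        print(f"{model}: ${cost:,.2f}")
```

Under these assumed volumes the estimate comes to about $2,400 for mini and $650 for nano, which is why nano is pitched at classification and extraction at scale.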
In ChatGPT, GPT-5.4 mini is available to Free and Go users via the “Thinking” feature in the + menu. For paid plans, it serves as a fallback model when GPT-5.4 Thinking hits its rate limit.
🔗 Introducing GPT-5.4 mini and nano
NVIDIA GTC 2026: Nemotron Coalition and Dynamo 1.0
NVIDIA’s GTC conference, which began on March 16, was the catalyst for several major announcements across the industry: the formation of an open coalition around frontier open source models, the production release of an inference operating system, and the announcement of a data blueprint for physical AI.
Mistral joins the NVIDIA Nemotron Coalition
16 March — Mistral AI announces a strategic partnership with NVIDIA to co-develop frontier open source AI models. Mistral becomes a founding member of the NVIDIA Nemotron Coalition, combining its frontier architecture with NVIDIA’s compute infrastructure and development tools.
| Aspect | Detail |
|---|---|
| Mistral role | Founding member, frontier architecture + full-stack AI offering |
| NVIDIA contribution | GPU infrastructure + development tools |
| Goal | Co-develop open frontier-level models |
Perplexity also joins the coalition
16 March — Perplexity announces that it is joining the same NVIDIA Nemotron Coalition. Key points: Perplexity fine-tunes different open models for each stage of its response pipeline (query analysis, reasoning, final answer). The Nemotron 3 Super model (120 billion parameters, MoE architecture) is now available in the Perplexity search bar, the Agent API, and Perplexity Computer.
🔗 Perplexity blog – Nemotron Coalition 🔗 NVIDIA announcement
Dynamo 1.0: inference operating system goes to production
16 March — NVIDIA announces at GTC the production release of Dynamo 1.0, presented as the “inference operating system” for AI factories. Dynamo boosts inference performance on Blackwell GPUs by up to 7x compared with unoptimized deployments. The move to v1.0 marks its transition from the experimental phase to industrial production.
🔗 NVIDIA Dynamo 1.0 announcement
Physical AI Data Factory Blueprint
16 March — NVIDIA unveils the Physical AI Data Factory Blueprint: a reference architecture for turning accelerated computing into high-quality training data for robotics, AI vision agents, and autonomous vehicles. This blueprint enables companies to synthetically generate training data for physical AI at scale.
🔗 NVIDIA Physical AI announcement
Cohere + NVIDIA: sovereign AI on DGX Spark
16 March — Also at GTC, Cohere and NVIDIA announce a partnership to develop sovereign, secure, and efficient AI. It has two main tracks: NVIDIA ecosystem-native models (custom models optimized for the latest NVIDIA architecture, targeting specialized enterprise workloads) and North on DGX Spark (Cohere’s North agentic platform will be available on NVIDIA DGX Spark, running locally with low latency for sensitive data). Target sectors include finance, healthcare, and the public sector.
🔗 Cohere blog – NVIDIA sovereign AI
Perplexity Comet Enterprise: MDM governance and CrowdStrike integration
17 March — Perplexity launches Comet Enterprise for all Enterprise subscribers. The AI browser moves to an enterprise version with full deployment governance.
| Feature | Description |
|---|---|
| MDM deployment | Silent installer, deployment across thousands of machines, audit logs |
| Granular telemetry | Per-user tracking |
| CrowdStrike Falcon | Anti-phishing protection, exfiltration detection (screenshots, downloads) |
| Real-time intervention | Possible via CrowdStrike integration |
| Privacy | Perplexity never trains its models on enterprise data |
Early users include Fortune companies, AWS, AlixPartners, Gunderson Dettmer, and Bessemer Venture Partners. Documented use cases cover client meeting preparation (real-time news), SOW contract analysis, financial calculations, and sector research.
🔗 Perplexity blog – Comet Enterprise
Claude Code v2.1.77: 64k tokens by default for Opus 4.6
17 March — Claude Code v2.1.77 is released with a significant increase in generation limits and several critical bug fixes.
| Model | Default limit | Maximum limit |
|---|---|---|
| Claude Opus 4.6 | 64,000 tokens | 128,000 tokens |
| Claude Sonnet 4.6 | — | 128,000 tokens |
The default limit for Opus 4.6 doubles, from 32k to 64k tokens, enabling much longer responses without additional configuration.
New features:
- `allowRead` in sandboxes: new filesystem configuration setting that re-enables reads in areas covered by a `denyRead` rule. Useful for granular security configurations.
- `/copy N`: the `/copy` command now accepts an optional index. `/copy 2` copies the assistant’s second-most-recent response without navigating history.
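The interplay between the two read rules can be pictured with a small path-matching sketch. Only the names `allowRead` and `denyRead` come from the release notes. The precedence model (an allow entry re-opens a path inside a denied area), the function names, and the example paths are assumptions for illustration, not Claude Code's actual implementation or settings format.

```python
from pathlib import PurePosixPath


def _covers(rule: str, path: str) -> bool:
    """True if `path` is the rule's directory itself or lies underneath it."""
    rule_path = PurePosixPath(rule)
    target = PurePosixPath(path)
    return target == rule_path or rule_path in target.parents


def can_read(path: str, deny_read: list, allow_read: list) -> bool:
    """Assumed model: reads are allowed unless denied, and an allow entry
    re-enables reads inside a denied area."""
    if not any(_covers(rule, path) for rule in deny_read):
        return True
    return any(_covers(rule, path) for rule in allow_read)


if __name__ == "__main__":
    # Hypothetical rule sets.
    deny = ["/home/user/secrets"]
    allow = ["/home/user/secrets/public"]
    for candidate in (
        "/home/user/secrets/key.pem",
        "/home/user/secrets/public/readme.txt",
        "/home/user/project/main.py",
    ):
        print(candidate, can_read(candidate, deny, allow))
```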
Notable fixes:
- “Always Allow” on compound bash commands: the rule was saved for the full string (`cd src && npm test`) instead of per subcommand. Fixed.
- Auto-updater: it started parallel downloads during repeated window opens and closes, which could accumulate dozens of gigabytes in memory. Fixed.
- `--resume` truncating history: a race condition between memory extraction writes and the main transcript could lead to silent truncation. Fixed.
- `PreToolUse` hooks bypassing `deny` rules: a hook returning `"allow"` bypassed `deny` permission rules, including enterprise-managed settings. Important security fix.
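Two of these fixes concern permission logic, and the intended behavior can be sketched in a few lines: rules apply per subcommand, and a deny rule wins over a hook that returns "allow". Everything here is a simplified assumption: the separator list, the prefix-based rule matching, and the function names are hypothetical and do not reflect Claude Code's real parser or rule syntax.

```python
import re

# Simplified set of shell operators that chain commands.
# Quoting, subshells, and redirections are deliberately ignored.
_SEPARATORS = re.compile(r"\s*(?:&&|\|\||;|\|)\s*")


def split_subcommands(command: str) -> list:
    """Split a compound command into its individual subcommands."""
    return [part for part in _SEPARATORS.split(command.strip()) if part]


def resolve(hook_decision, deny_rules: list, command: str) -> str:
    """Deny rules take precedence, even when a hook returned "allow".

    Rule matching is a plain prefix check here (an assumption).
    """
    for sub in split_subcommands(command):
        if any(sub.startswith(rule) for rule in deny_rules):
            return "deny"
    return hook_decision or "ask"


if __name__ == "__main__":
    print(split_subcommands("cd src && npm test"))
    print(resolve("allow", ["rm "], "cd src && rm -rf build"))
    print(resolve("allow", ["rm "], "cd src && npm test"))
```

Saving an "Always Allow" rule per element of `split_subcommands(...)` rather than for the full string is the behavior the first fix describes.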
Technical article: how the Claude Code team uses Skills
17 March — Thariq (@trq212), an engineer on the Claude Code team at Anthropic, publishes “Lessons from Building Claude Code: How We Use Skills”, the second article in the series after “Seeing like an Agent” (February 27, 3.6 million views).
The article documents how Skills have become one of the most widely used extension points in Claude Code — flexible, easy to maintain, and enabling teams to define reusable workflows directly in their development environment. Boris Cherny (@bcherny), head of Claude Code, shared the article and called it “Really great writeup”. The author also announces the upcoming release of an open source iMessage skill as a concrete example.
“Using Skills well is a skill issue. I didn’t quite realize how much until I wrote this.” (@trq212 on X)
Codex Security: why there is no SAST report
16 March — OpenAI publishes a technical article explaining the design choice behind Codex Security: why the system does not rely on static analysis (SAST) as a starting point.
The approach rests on four pillars: contextual reading (analyzing the full code path with repository context), targeted micro-fuzzing (reducing to the smallest testable fragment to write micro-fuzzers), constraint reasoning (using a Python environment with z3-solver to formalize complex problems), and sandbox validation (distinguishing “this could be a problem” from “this is a problem” with a compiled PoC). The article illustrates these principles with CVE-2024-29041 (Express), an open redirect where malformed URLs bypassed allowlist implementations.
🔗 Why Codex Security Doesn’t Include a SAST Report
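The idea behind targeted micro-fuzzing can be illustrated on a generic open-redirect allowlist. The sketch below does not reproduce CVE-2024-29041 or OpenAI's tooling. It compares a deliberately flawed prefix check with a parsed-hostname check and reports inputs on which they disagree. The host name, candidate URLs, and function names are all hypothetical.

```python
from urllib.parse import urlsplit

ALLOWED_HOST = "example.com"  # hypothetical allowlisted host


def naive_is_allowed(url: str) -> bool:
    """Flawed check: string prefix match on the raw URL."""
    return url.startswith(f"https://{ALLOWED_HOST}")


def strict_is_allowed(url: str) -> bool:
    """Stricter check: parse the URL, then compare scheme and hostname exactly."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname == ALLOWED_HOST


def micro_fuzz(candidates: list) -> list:
    """Return the inputs the naive check accepts but the strict check rejects."""
    return [u for u in candidates if naive_is_allowed(u) and not strict_is_allowed(u)]


# A tiny hand-written corpus; a real fuzzer would generate these.
CANDIDATES = [
    "https://example.com/account",
    "https://example.com.evil.test/login",
    "https://example.com@evil.test/",
    "https://evil.test/?next=https://example.com",
]

bypasses = micro_fuzz(CANDIDATES)

if __name__ == "__main__":
    for url in bypasses:
        print("bypass:", url)
```

Each reported URL is a concrete, reproducible input, which corresponds to the "this is a problem" side of the distinction the article draws.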
Gemini Personal Intelligence: free expansion in the United States
17 March — Google expands Personal Intelligence to more users for free in the United States. This feature, previously reserved for paying subscribers, is now available to free-tier accounts via three surfaces: AI Mode in Google Search, the Gemini app (iOS/Android), and the Gemini in Chrome extension.
Personal Intelligence securely connects the user’s Google apps (Gmail, Google Photos, YouTube, Search) to provide personalized answers. Examples include shopping recommendations tailored to past purchases, technical help targeted at the exact device bought (pulled from Gmail receipts), and personalized travel itineraries based on hotel confirmations. The user chooses which apps to connect and can disable them at any time. The feature is available for personal Google accounts only, not for Workspace enterprise or education accounts.
🔗 Google blog – Personal Intelligence
AlphaFold Database: millions of new protein complex structures
17 March — Google DeepMind announces the expansion of the AlphaFold Database (AFDB) with millions of new protein complex structures predicted by AI, in collaboration with EMBL-EBI (European Bioinformatics Institute), NVIDIA, and Seoul National University. The new structures cover, among others, WHO priority bacterial pathogens — the most dangerous and antibiotic-resistant bacteria. This expansion moves from individual proteins to protein complexes (interactions between several proteins), a qualitative leap for medical and pharmaceutical research.
🔗 Pushmeet Kohli announcement on X
xAI: Grok Text-to-Speech API and first place in video editing
Text-to-Speech API
16 March — xAI announces the availability of the Grok Text-to-Speech API, offering natural and expressive voices for developers. LiveKit integrated this TTS into LiveKit Inference at launch.
Grok Imagine #1 in video editing
15 March — Grok Imagine reaches first place in video editing on the Design Arena ranking, with an Elo of 1290. The Imagine API is now available to developers. The feature covers adding, removing, and swapping objects in video scenes.
Perplexity Computer: full control of Comet and Android
Computer controls Comet without MCP
March 16 — Computer can now take full control of the Comet browser to carry out autonomous tasks: the browser agent can access any site or connected app, without connectors or MCP. Available to all Computer users on Comet.
Computer on Android
March 16 — Perplexity Computer is now available on Android, extending the iOS launch on March 13 to all mobile platforms.
Manus: local desktop and Google Workspace at developer level
Manus “My Computer” on macOS and Windows
March 16 — Manus announces “My Computer”, a central feature of the new Manus Desktop app (macOS and Windows). Previously limited to a cloud sandbox, Manus can now run directly on the local machine via command-line instructions in a local terminal — with explicit user approval at each step.
Use cases span a wide range: sorting and renaming thousands of files, creating native desktop apps (example cited: a Mac app for real-time translation and subtitling built in 20 minutes, without opening Xcode), or using the local GPU to train machine learning models. My Computer complements existing cloud Connectors (Google Calendar, Gmail) rather than replacing them.
🔗 Manus tweet · 🔗 Manus blog
Manus masters Google Workspace with precision
March 17 — Manus is rolling out a major update to its Google Workspace connector, based on the Google Workspace CLI (an open-source tool from the Google team). The previous version treated Google files as monolithic blocks; the new version enables granular actions:
| Area | New capabilities |
|---|---|
| Google Docs | Surgical text replacements, replies to specific comments |
| Google Sheets | Reading across multiple sheets, updating a specific cell, duplicating tabs |
| Google Slides | Editing existing presentations (slide title, timeline updates) |
| Google Drive | Folder reorganization |
The update is free and backward compatible.
🔗 Manus tweet · 🔗 Manus blog
GitHub: /fleet for bulk maintenance and $12.5M for open source
Copilot /fleet: maintenance across the entire repo fleet
March 15 — GitHub demonstrates the /fleet command in GitHub Copilot. With a single instruction, developers managing multiple repositories can delegate repetitive maintenance tasks (configuration updates, dependency fixes) to the agent across their entire fleet, rather than repository by repository.
$12.5M for open source security
March 17 — GitHub, Anthropic, AWS, Google, and OpenAI are joining forces in a collective commitment of $12.5 million in support of Alpha-Omega, the Linux Foundation program dedicated to securing the open source ecosystem.
Key GitHub points: 280,000+ maintainers across hundreds of millions of public repositories will be eligible for free access to GitHub Copilot Pro. GitHub is also injecting $5.5M in Azure credits for training. The GitHub Secure Open Source Fund, which has already supported 138 projects, will open its fourth round at the end of April 2026.
The context is significant: AI has greatly accelerated vulnerability discovery, which increases maintainers’ workload. The stated goal is for AI to reduce that burden rather than increase it.
🔗 GitHub Blog article 🔗 Linux Foundation announcement
Z.ai GLM-5-Turbo: high speed for agent environments
March 15 — Z.ai launches GLM-5-Turbo, a high-speed variant of GLM-5 optimized for agent environments (notably OpenClaw). The same day, usage limits are tripled for GLM Coding Plan subscribers. Available on OpenRouter and via the direct API.
Kimi publishes a paper on Attention Residuals
March 16-17 — Moonshot AI publishes a research paper on Attention Residuals on arXiv: a depth-wise aggregation approach that replaces standard residual connections with a recurrence inspired by the time/depth duality. The analysis shows that this approach naturally mitigates hidden-state magnitude growth. Elon Musk replied “Impressive work from Kimi” on the announcement tweet (4.5 million views).
🔗 Kimi tweet · 🔗 arXiv 2603.15031
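The magnitude-growth point can be illustrated with a toy comparison. The sketch assumes that depth-wise aggregation mixes earlier layer outputs with softmax weights. That is one reading of the summary above and is not taken from the paper's equations. The vectors, scores, and function names are made up.

```python
import math


def softmax(scores: list) -> list:
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def norm(vector: list) -> float:
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in vector))


def residual_sum(outputs: list) -> list:
    """Standard residual stream: plain sum of all layer outputs."""
    return [sum(column) for column in zip(*outputs)]


def attention_aggregate(outputs: list, scores: list) -> list:
    """Assumed variant: softmax-weighted (convex) mix of layer outputs."""
    weights = softmax(scores)
    return [sum(w * x for w, x in zip(weights, column)) for column in zip(*outputs)]


# Toy data: 8 layers that all emit the same unit-norm vector,
# the worst case for norm growth under plain summation.
layers = [[0.6, 0.8] for _ in range(8)]
scores = [0.1 * i for i in range(8)]  # hypothetical per-layer scores

if __name__ == "__main__":
    print("residual sum norm:", norm(residual_sum(layers)))
    print("weighted mix norm:", norm(attention_aggregate(layers, scores)))
```

Because softmax weights are non-negative and sum to one, the mixed state can never have a larger norm than the largest individual layer output, whereas the plain sum can grow linearly with depth.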
ElevenLabs × Deloitte: omnichannel agents for the enterprise
March 14 — ElevenLabs and Deloitte announce a strategic partnership combining the ElevenLabs Agents platform with Deloitte’s industry expertise, to help large enterprises deploy omnichannel conversational agents. The partnership targets regulated industries (finance, healthcare, public sector). Deloitte provides business integration, ElevenLabs supplies the AI audio infrastructure (voice, transcription, agents).
Briefs
Tongyi Fun-CineForge (Alibaba, March 16) — Tongyi Lab open-sources Fun-CineForge, an AI cinematic dubbing system approaching professional movie quality. Available on GitHub, HuggingFace, and ModelScope. 🔗 Announcement on X
What this means
NVIDIA GTC 2026 crystallizes an important dynamic: several leading AI labs (Mistral, Perplexity, Cohere) are aligning around NVIDIA infrastructure to co-develop open frontier models or sovereign deployments. This convergence around an open coalition stands in contrast to the recent period of fragmentation — and signals that large-scale pretraining has become too costly to be handled in silos.
GPT-5.4 mini confirms a major trend: “small-format” models are no longer degraded versions but competitive alternatives. With 54.4% on SWE-Bench Pro versus 57.7% for the full model, and a 19x lower cost, GPT-5.4 mini redefines the performance/price ratio for coding workflows.
March 17 also illustrates the rise of local and desktop agents: Manus “My Computer” moves out of the cloud to access the local machine, Perplexity Computer takes control of Comet without MCP, and Claude Code doubles its default generation window for Opus 4.6. The era of the agent that merely suggests is giving way to the era of the agent that executes.
Sources
- Introducing GPT-5.4 mini and nano – OpenAI
- Why Codex Security Doesn’t Include a SAST Report – OpenAI
- Mistral × NVIDIA – X announcement
- Perplexity joins the NVIDIA Nemotron Coalition
- NVIDIA Nemotron Coalition
- NVIDIA Dynamo 1.0 – X
- NVIDIA Physical AI Data Factory Blueprint – X
- Cohere + NVIDIA sovereign AI
- Perplexity Comet Enterprise
- Claude Code v2.1.77 CHANGELOG
- Thariq – Skills article
- Google Personal Intelligence expansion
- AlphaFold Database expansion – X
- xAI TTS API – X
- Grok Imagine #1 Design Arena – X
- Perplexity Computer controls Comet – X
- Perplexity Computer Android – X
- Manus My Computer
- Manus Google Workspace CLI
- GitHub Copilot /fleet – X
- GitHub + Alpha-Omega $12.5M
- Linux Foundation – open source security fund
- Z.ai GLM-5-Turbo – X
- Kimi Attention Residuals – X
- Kimi Attention Residuals – arXiv
- ElevenLabs × Deloitte
- Tongyi Fun-CineForge – X
This document was translated from the fr version to the en language using the gpt-5.4-mini model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator