March 17, 2026 is marked by NVIDIA GTC and several major launches. OpenAI releases GPT-5.4 mini and nano, its most capable compact models to date, which come close to the full model on several benchmarks. The NVIDIA Nemotron Coalition gains momentum with Mistral AI and Perplexity joining. Perplexity also opens Comet Enterprise with full MDM governance, Claude Code v2.1.77 doubles the default generation limit for Opus 4.6, and GitHub, Anthropic, AWS, Google, and OpenAI jointly commit $12.5 million to open source security.
GPT-5.4 mini and nano: OpenAI’s compact models
March 17 — OpenAI launches GPT-5.4 mini and GPT-5.4 nano, its highest-performing compact models to date. These two variants bring GPT-5.4 capabilities into formats optimized for high-volume workloads, with reduced latency and lower cost.
GPT-5.4 mini significantly improves on GPT-5 mini in code, reasoning, multimodal understanding, and tool use, while running more than twice as fast. It approaches the performance of the full GPT-5.4 model on several key evaluations, including SWE-Bench Pro and OSWorld-Verified.
GPT-5.4 nano is the smallest and least expensive version of the GPT-5.4 family, designed for tasks where speed and cost come first: classification, data extraction, ranking, and simple code sub-agents.
| Evaluation | GPT-5.4 | GPT-5.4 mini | GPT-5.4 nano | GPT-5 mini |
|---|---|---|---|---|
| SWE-Bench Pro (public) | 57.7% | 54.4% | 52.4% | 45.7% |
| Terminal-Bench 2.0 | 75.1% | 60.0% | 46.3% | 38.2% |
| Toolathlon | 54.6% | 42.9% | 35.5% | 26.9% |
| GPQA Diamond | 93.0% | 88.0% | 82.8% | 81.6% |
| OSWorld-Verified | 75.0% | 72.1% | 39.0% | 42.0% |
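To put a number on how close the compact models come to the full model, the sketch below divides each variant's score by the GPT-5.4 score. It uses only the figures from the table above; the dictionary keys and the function name are illustrative choices, not official identifiers.

```python
# Scores copied from the table above, in percent.
SCORES = {
    "SWE-Bench Pro (public)": {"full": 57.7, "mini": 54.4, "nano": 52.4},
    "Terminal-Bench 2.0": {"full": 75.1, "mini": 60.0, "nano": 46.3},
    "Toolathlon": {"full": 54.6, "mini": 42.9, "nano": 35.5},
    "GPQA Diamond": {"full": 93.0, "mini": 88.0, "nano": 82.8},
    "OSWorld-Verified": {"full": 75.0, "mini": 72.1, "nano": 39.0},
}


def relative_to_full(variant: str) -> dict[str, float]:
    """Return each score of `variant` as a fraction of the full model's score."""
    return {
        name: round(row[variant] / row["full"], 3)
        for name, row in SCORES.items()
    }


if __name__ == "__main__":
    for name, ratio in relative_to_full("mini").items():
        print(f"{name}: {ratio:.1%} of the GPT-5.4 score")
```

On these figures, GPT-5.4 mini is closest to the full model on OSWorld-Verified, GPQA Diamond, and SWE-Bench Pro, and furthest on Terminal-Bench 2.0 and Toolathlon.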
Use cases fall into three categories: code assistants (GPT-5.4 mini excels in fast coding workflows, debugging loops, frontend generation), sub-agents (in Codex, GPT-5.4 can delegate subtasks to GPT-5.4 mini using only 30% of the GPT-5.4 quota), and interface control (computer use), where GPT-5.4 mini quickly interprets screenshots of dense interfaces.
| Model | Availability | Input price | Output price | Context |
|---|---|---|---|---|
| GPT-5.4 mini | API, Codex, ChatGPT Free/Go | $0.75/million tokens | $4.50/million tokens | 400,000 tokens |
| GPT-5.4 nano | API only | $0.20/million tokens | $1.25/million tokens | — |
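As a rough way to compare the two price points, the following sketch computes the cost of a single request from the per-million-token prices in the table above. The dictionary keys are labels chosen for this example and are not necessarily the API model identifiers; the token counts in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per million input tokens
    output_per_million: float  # USD per million output tokens


# Prices from the table above; the keys are illustrative labels.
PRICES = {
    "gpt-5.4-mini": Pricing(input_per_million=0.75, output_per_million=4.50),
    "gpt-5.4-nano": Pricing(input_per_million=0.20, output_per_million=1.25),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request with the given token counts."""
    price = PRICES[model]
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    for model in PRICES:
        print(model, f"${request_cost(model, 10_000, 1_000):.5f}")
```

For a hypothetical request with 10,000 input tokens and 1,000 output tokens, this gives $0.012 on GPT-5.4 mini and $0.00325 on GPT-5.4 nano.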
In ChatGPT, GPT-5.4 mini is available to Free and Go users via the “Thinking” feature in the + menu. For paid plans, it serves as a fallback model when the GPT-5.4 Thinking rate limit is reached.
🔗 Introducing GPT-5.4 mini and nano
NVIDIA GTC 2026: the Nemotron Coalition and Dynamo 1.0
NVIDIA’s GTC conference, which began on March 16, was the catalyst for several major industry announcements: the formation of an open coalition around open source frontier models, the production release of an inference operating system, and the announcement of a data blueprint for physical AI.
Mistral joins the NVIDIA Nemotron Coalition
March 16 — Mistral AI announces a strategic partnership with NVIDIA to co-develop open source frontier AI models. Mistral becomes a founding member of the NVIDIA Nemotron Coalition, combining its frontier architecture with NVIDIA’s compute infrastructure and development tools.
| Aspect | Detail |
|---|---|
| Mistral role | Founding member, frontier architecture + full-stack AI offering |
| NVIDIA input | GPU infrastructure + development tools |
| Goal | Co-develop open frontier-level models |
Perplexity also joins the coalition
March 16 — Perplexity announces that it is joining the same NVIDIA Nemotron Coalition. Key points: Perplexity fine-tunes different open models for each stage of its response pipeline (query analysis, reasoning, final answer). The Nemotron 3 Super model (120 billion parameters, MoE architecture) is now available in the Perplexity search bar, the Agent API, and Perplexity Computer.
🔗 Perplexity blog – Nemotron Coalition 🔗 NVIDIA announcement
Dynamo 1.0: the inference operating system enters production
March 16 — At GTC, NVIDIA announces the production release of Dynamo 1.0, presented as the “inference operating system” for AI factories. Dynamo boosts inference performance on Blackwell GPUs by up to 7x compared with non-optimized deployments. The move to v1.0 marks its transition from the experimental phase into industrial production.
🔗 NVIDIA Dynamo 1.0 announcement
Physical AI Data Factory Blueprint
March 16 — NVIDIA unveils the Physical AI Data Factory Blueprint: a reference architecture for turning accelerated computing into high-quality training data for robotics, AI vision agents, and autonomous vehicles. This blueprint enables enterprises to synthetically generate training data for physical AI at large scale.
🔗 NVIDIA Physical AI announcement
Cohere + NVIDIA: sovereign AI on DGX Spark
March 16 — Cohere and NVIDIA partner to develop sovereign, secure, and efficient AI, also announced at GTC. Two main pillars: NVIDIA ecosystem-native models (custom models optimized for the latest NVIDIA architecture, targeting specialized enterprise workloads) and North on DGX Spark (Cohere’s agentic North platform will be available on NVIDIA DGX Spark, on-premises and with low latency for sensitive data). The target sectors are finance, healthcare, and the public sector.
🔗 Cohere blog – NVIDIA sovereign AI
Perplexity Comet Enterprise: MDM governance and CrowdStrike integration
March 17 — Perplexity launches Comet Enterprise for all Enterprise subscribers. The AI browser moves into an enterprise version with full deployment governance.
| Feature | Description |
|---|---|
| MDM deployment | Silent installer, deployment across thousands of machines, audit logs |
| Granular telemetry | Per-user tracking |
| CrowdStrike Falcon | Anti-phishing protection, exfiltration detection (screenshots, downloads) |
| Real-time intervention | Possible via the CrowdStrike integration |
| Privacy | Perplexity never trains its models on enterprise data |
Early users include Fortune-ranked companies, AWS, AlixPartners, Gunderson Dettmer, and Bessemer Venture Partners. Documented use cases cover client meeting preparation (real-time news), SOW contract analysis, financial calculations, and sector research.
🔗 Perplexity blog – Comet Enterprise
Claude Code v2.1.77: 64k tokens by default for Opus 4.6
March 17 — Claude Code v2.1.77 is released with a significant increase in generation limits and several critical bug fixes.
| Model | Default limit | Maximum limit |
|---|---|---|
| Claude Opus 4.6 | 64,000 tokens | 128,000 tokens |
| Claude Sonnet 4.6 | — | 128,000 tokens |
The default limit for Opus 4.6 doubles (from 32k to 64k tokens), enabling much longer responses without additional configuration.
New features:
- `allowRead` in sandboxes: a new filesystem configuration parameter that re-authorizes reads in areas covered by a `denyRead` rule. Useful for granular security configurations.
- `/copy N`: the `/copy` command now accepts an optional index. For example, `/copy 2` copies the second most recent assistant response without navigating through history.
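The interaction between the two read rules can be pictured with a small model. This is only an illustration of the precedence described above, not Claude Code's implementation: the function name, the prefix-based matching, and the example paths are all assumptions, and the actual settings schema is not given here.

```python
from pathlib import PurePosixPath


def is_read_allowed(path: str, deny_read: list[str], allow_read: list[str]) -> bool:
    """Toy model: a read inside a denied area is blocked unless an
    allow entry re-authorizes it."""
    target = PurePosixPath(path)

    def under(root: str) -> bool:
        # True when `target` is `root` itself or lives somewhere below it.
        root_path = PurePosixPath(root)
        return target == root_path or root_path in target.parents

    if any(under(root) for root in allow_read):
        return True
    return not any(under(root) for root in deny_read)


if __name__ == "__main__":
    deny = ["/home/dev/.config"]       # hypothetical denied area
    allow = ["/home/dev/.config/git"]  # hypothetical re-authorized subfolder
    print(is_read_allowed("/home/dev/.config/git/config", deny, allow))
    print(is_read_allowed("/home/dev/.config/secrets.json", deny, allow))
```

In this model, a broad deny rule can stay in place while one subfolder is opened back up, which is the kind of granular configuration the new parameter targets.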
Notable fixes:
- “Always Allow” on compound bash commands: the rule was being saved for the full string (`cd src && npm test`) instead of per sub-command. Fixed.
- Auto-updater: it started parallel downloads when windows were repeatedly opened and closed, potentially accumulating tens of gigabytes in memory. Fixed.
- `--resume` truncating history: a race condition between memory extraction writes and the main transcript could lead to silent truncation. Fixed.
- `PreToolUse` hooks bypassing `deny` rules: a hook returning `"allow"` bypassed `deny` permission rules, including enterprise-managed settings. Important security fix.
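The “Always Allow” fix comes down to splitting a compound command into its sub-commands before saving a rule. The sketch below shows that idea with Python's standard `shlex` module. It is an assumption-laden illustration, not Claude Code's actual parser: the separator set and the function name are chosen for this example.

```python
import shlex

# Shell operators treated as sub-command boundaries in this sketch.
SEPARATORS = {"&&", "||", ";", "|"}


def split_subcommands(command: str) -> list[str]:
    """Split a compound shell command into its individual sub-commands."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True

    parts: list[str] = []
    current: list[str] = []
    for token in lexer:
        if token in SEPARATORS:
            if current:
                parts.append(shlex.join(current))
                current = []
        else:
            current.append(token)
    if current:
        parts.append(shlex.join(current))
    return parts


if __name__ == "__main__":
    # Each sub-command could then be approved and stored on its own.
    print(split_subcommands("cd src && npm test"))
```

With per-sub-command rules, approving `npm test` once also covers it when it appears in another chain, whereas a rule saved for the full string only ever matches that exact string. A production parser would also need to handle redirections, subshells, and command substitution, which this sketch does not.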
Technical article: how the Claude Code team uses Skills
March 17 — Thariq (@trq212), an engineer on the Claude Code team at Anthropic, publishes “Lessons from Building Claude Code: How We Use Skills”, the second article in the series after “Seeing like an Agent” (February 27, 3.6 million views).
The article documents how Skills have become one of Claude Code’s most widely used extension points — flexible, easy to maintain, and allowing teams to define reusable workflows directly in their development environment. Boris Cherny (@bcherny), head of Claude Code, shared the article, calling it a “Really great writeup”. The author also announces the upcoming open source release of an iMessage skill as a concrete example.
“Using Skills well is a skill issue. I didn’t quite realize how much until I wrote this.” — @trq212 on X
Codex Security: why there is no SAST report
March 16 — OpenAI publishes a technical article explaining the design choice behind Codex Security: why the system does not rely on static analysis (SAST) as a starting point.
The approach rests on four pillars: contextual reading (analyzing the full code path with repository context), targeted micro-fuzzing (reducing to the smallest testable fragment to write micro-fuzzers), constraint reasoning (using a Python environment with z3-solver to formalize complex problems), and sandbox validation (distinguishing “this could be a problem” from “this is a problem” with a compiled PoC). The article illustrates these principles with CVE-2024-29041 (Express), an open redirect where malformed URLs bypassed allowlist implementations.
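To show why allowlist checks on redirect targets are easy to get wrong, here is a generic sketch of the bug class. It is not a reproduction of CVE-2024-29041 or of the Codex Security pipeline: the allowlist, the function names, and the test URLs are hypothetical. In the spirit of the sandbox validation step, it pairs a suspicious check with concrete inputs that actually get through.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com"}  # hypothetical allowlist


def naive_is_allowed(url: str) -> bool:
    """Prefix check: looks reasonable, but only compares raw strings."""
    return any(url.startswith(f"https://{host}") for host in ALLOWED_HOSTS)


def parsed_is_allowed(url: str) -> bool:
    """Parse the URL first, then compare the hostname exactly."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


if __name__ == "__main__":
    candidates = [
        "https://example.com/account",              # legitimate target
        "https://example.com.attacker.test/login",  # allowlisted name used as a subdomain label
        "https://example.com@attacker.test/",       # allowlisted name placed in the userinfo part
    ]
    for url in candidates:
        print(url, naive_is_allowed(url), parsed_is_allowed(url))
```

Both crafted URLs pass the prefix check but are rejected once the hostname is parsed, which is the difference between "this could be a problem" and a demonstrated bypass.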
🔗 Why Codex Security Doesn’t Include a SAST Report
Gemini Personal Intelligence: free expansion in the United States
March 17 — Google expands Personal Intelligence to more users for free in the United States. This feature, previously reserved for paid subscribers, is now available to free-tier accounts via three surfaces: AI Mode in Google Search, the Gemini app (iOS/Android), and the Gemini in Chrome extension.
Personal Intelligence securely connects the user’s Google apps (Gmail, Google Photos, YouTube, Search) to provide personalized answers. Examples: shopping recommendations adapted to past purchases, technical assistance targeting the exact device purchased (extracted from Gmail receipts), personalized travel itineraries based on hotel confirmations. The user chooses which apps to connect and can disable them at any time. Available for personal Google accounts only (not enterprise/education Workspace).
🔗 Google blog – Personal Intelligence
AlphaFold Database: millions of new protein complex structures
March 17 — Google DeepMind announces the expansion of the AlphaFold Database (AFDB) with millions of new AI-predicted protein complex structures, in collaboration with EMBL-EBI (European Bioinformatics Institute), NVIDIA, and Seoul National University. The new structures notably cover the WHO’s priority bacterial pathogens — the most dangerous and antibiotic-resistant bacteria. This expansion moves from the level of individual proteins to protein complexes (interactions between multiple proteins), a qualitative leap for medical and pharmaceutical research.
🔗 Pushmeet Kohli announcement on X
xAI: Grok Text-to-Speech API and first place in video editing
Text-to-Speech API
March 16 — xAI announces the availability of the Grok Text-to-Speech API, offering natural and expressive voices for developers. LiveKit integrated this TTS into LiveKit Inference at launch.
Grok Imagine #1 in video editing
March 15 — Grok Imagine reaches first place in video editing on the Design Arena leaderboard, with an Elo of 1290. The Imagine API is now accessible to developers. The feature covers adding, removing, and swapping objects in video scenes.
Perplexity Computer: full control of Comet and Android
Computer controls Comet without MCP
March 16 — Computer can now take full control of the Comet browser to perform autonomous tasks: the browser agent can access any connected site or application, without connectors or MCP. Available to all Computer users on Comet.
Computer on Android
March 16 — Perplexity Computer is now available on Android, extending the March 13 iOS launch to all mobile platforms.
Manus: local desktop and developer-grade Google Workspace
Manus “My Computer” on macOS and Windows
March 16 — Manus announces “My Computer”, a core feature of the new Manus Desktop app (macOS and Windows). Until now limited to a cloud sandbox, Manus can now run directly on the local machine via command-line instructions in a local terminal — with explicit user approval at every step.
Use cases cover a broad spectrum: sorting and renaming thousands of files, creating native desktop applications (example cited: a real-time translation and subtitling Mac app created in 20 minutes, without opening Xcode), or using the local GPU to train machine learning models. My Computer complements existing cloud Connectors (Google Calendar, Gmail) rather than replacing them.
🔗 Manus tweet · 🔗 Manus blog
Manus masters Google Workspace with precision
March 17 — Manus rolls out a major update to its Google Workspace connector, based on the Google Workspace CLI (an open source tool from the Google team). The previous version treated Google files as monolithic blocks; the new version enables granular actions:
| Area | New capabilities |
|---|---|
| Google Docs | Surgical text replacements, replies to specific comments |
| Google Sheets | Reading across multiple sheets, updating a specific cell, duplicating tabs |
| Google Slides | Editing existing presentations (slide title, timeline update) |
| Google Drive | Folder reorganization |
The update is free and backward-compatible.
🔗 Manus tweet · 🔗 Manus blog
GitHub: /fleet for fleet-wide maintenance and $12.5M for open source
Copilot /fleet: maintenance across the entire repository fleet
March 15 — GitHub demonstrates the /fleet command in GitHub Copilot. With a single instruction, developers who manage multiple repositories can delegate repetitive maintenance tasks (configuration updates, dependency fixes) to the agent across their entire fleet, rather than repository by repository.
$12.5M for open source security
March 17 — GitHub, Anthropic, AWS, Google, and OpenAI are joining forces in a collective $12.5 million commitment to Alpha-Omega, the Linux Foundation program dedicated to securing the open source ecosystem.
Key points on GitHub’s side: 280,000+ maintainers across hundreds of millions of public repositories will be eligible for free access to GitHub Copilot Pro. GitHub is also injecting $5.5M in Azure credits for training. The GitHub Secure Open Source Fund, which has already supported 138 projects, opens its fourth cohort at the end of April 2026.
The context is significant: AI has considerably accelerated vulnerability discovery, increasing the burden on maintainers. The stated goal is for AI to reduce that burden rather than increase it.
🔗 GitHub Blog article 🔗 Linux Foundation announcement
Z.ai GLM-5-Turbo: high speed for agent environments
March 15 — Z.ai launches GLM-5-Turbo, a high-speed variant of GLM-5 optimized for agent environments (notably OpenClaw). The same day, usage limits are tripled for GLM Coding Plan subscribers. Available on OpenRouter and via the direct API.
Kimi publishes a paper on Attention Residuals
March 16–17 — Moonshot AI publishes a research paper on Attention Residuals on arXiv. The approach replaces standard residual connections with a depth-wise aggregation, a recurrence over layers inspired by the time/depth duality. The analysis shows that this naturally mitigates hidden-state magnitude growth. Elon Musk replied “Impressive work from Kimi” to the announcement tweet (4.5 million views).
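The magnitude-growth issue can be seen by unrolling a standard residual stack. The second expression below is only an assumed, illustrative form of a depth-wise aggregation (softmax weights over earlier layer outputs); the paper's exact formulation is not reproduced here, and the symbols $f_l$, $v_i$, $s_{l,i}$, and $\alpha_{l,i}$ are notation introduced for this sketch.

```latex
% Standard residual connection, unrolled over L layers:
% every layer output enters the sum with weight 1, so the norm of h_L can grow with depth.
h_l = h_{l-1} + f_l(h_{l-1})
\quad\Longrightarrow\quad
h_L = h_0 + \sum_{l=1}^{L} f_l(h_{l-1})

% Assumed illustrative alternative: a convex combination of earlier outputs v_i,
% with weights given by a softmax over scores s_{l,i}.
h_l = \sum_{i=0}^{l-1} \alpha_{l,i}\, v_i,
\qquad
\alpha_{l,i} = \frac{\exp(s_{l,i})}{\sum_{j=0}^{l-1} \exp(s_{l,j})}

% Because the weights are nonnegative and sum to 1, the triangle inequality gives
\lVert h_l \rVert \le \sum_{i=0}^{l-1} \alpha_{l,i}\, \lVert v_i \rVert \le \max_{0 \le i < l} \lVert v_i \rVert
```

Under that assumption, the aggregated state stays bounded by the largest contributing output instead of accumulating one term per layer.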
🔗 Kimi tweet · 🔗 arXiv 2603.15031
ElevenLabs × Deloitte: omnichannel agents for the enterprise
March 14 — ElevenLabs and Deloitte announce a strategic partnership combining the ElevenLabs Agents platform with Deloitte’s industry expertise to help large enterprises deploy omnichannel conversational agents. The partnership targets regulated companies (finance, healthcare, public services). Deloitte provides business integration, while ElevenLabs supplies the AI audio infrastructure (voice, transcription, agents).
Briefs
Tongyi Fun-CineForge (Alibaba, March 16) — Tongyi Lab open-sources Fun-CineForge, an AI cinematic dubbing system approaching professional cinema quality. Available on GitHub, HuggingFace, and ModelScope. 🔗 Announcement on X
What it means
NVIDIA GTC 2026 crystallizes an important dynamic: several leading AI labs (Mistral, Perplexity, Cohere) are aligning around NVIDIA infrastructure to co-develop open frontier models or sovereign deployments. This convergence around an open coalition contrasts with the recent period of fragmentation — and signals that large-scale pre-training has become too costly to handle in silos.
GPT-5.4 mini confirms a major trend: “small-format” models are no longer degraded versions but competitive alternatives. With 54.4% on SWE-Bench Pro versus 57.7% for the full model, and a 19x lower cost, GPT-5.4 mini redefines the performance/price ratio for coding workflows.
March 17 also illustrates the rise of local and desktop agents: Manus “My Computer” moves out of the cloud to access the local machine, Perplexity Computer takes control of Comet without MCP, and Claude Code doubles its default generation window for Opus 4.6. The era of the agent that merely suggests is giving way to the era of the agent that executes.
Sources
- Introducing GPT-5.4 mini and nano – OpenAI
- Why Codex Security Doesn’t Include a SAST Report – OpenAI
- Mistral × NVIDIA – X announcement
- Perplexity joins the NVIDIA Nemotron Coalition
- NVIDIA Nemotron Coalition
- NVIDIA Dynamo 1.0 – X
- NVIDIA Physical AI Data Factory Blueprint – X
- Cohere + NVIDIA sovereign AI
- Perplexity Comet Enterprise
- Claude Code v2.1.77 CHANGELOG
- Thariq – Skills article
- Google Personal Intelligence expansion
- AlphaFold Database expansion – X
- xAI TTS API – X
- Grok Imagine #1 Design Arena – X
- Perplexity Computer controls Comet – X
- Perplexity Computer Android – X
- Manus My Computer
- Manus Google Workspace CLI
- GitHub Copilot /fleet – X
- GitHub + Alpha-Omega $12.5M
- Linux Foundation – open source security funding
- Z.ai GLM-5-Turbo – X
- Kimi Attention Residuals – X
- Kimi Attention Residuals – arXiv
- ElevenLabs × Deloitte
- Tongyi Fun-CineForge – X
This document has been translated from the fr version into en using the gpt-5.5 model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator