GPT-5.4 mini and nano launched by OpenAI, Mistral joins NVIDIA's Nemotron Coalition, Perplexity Comet Enterprise available

March 17, 2026 is marked by NVIDIA GTC and several major launches. OpenAI releases GPT-5.4 mini and nano, its most capable compact models to date, which come close to the full model on several benchmarks. The NVIDIA Nemotron Coalition grows with the addition of Mistral AI and Perplexity. Perplexity simultaneously opens Comet Enterprise with full MDM governance, Claude Code v2.1.77 doubles the generation limit for Opus 4.6, and GitHub, Anthropic, Google, and OpenAI join forces to fund open source security to the tune of $12.5 million.


GPT-5.4 mini and nano: OpenAI’s compact models

17 March — OpenAI launches GPT-5.4 mini and GPT-5.4 nano, its most powerful compact models to date. These two variants bring GPT-5.4 capabilities into formats optimized for high-volume workloads, with lower latency and lower cost.

GPT-5.4 mini significantly improves on GPT-5 mini in code, reasoning, multimodal understanding, and tool use, while running more than twice as fast. It comes close to the performance of the full GPT-5.4 model on several key evaluations, including SWE-Bench Pro and OSWorld-Verified.

GPT-5.4 nano is the smallest and least expensive version of the GPT-5.4 family, designed for tasks where speed and cost matter most: classification, data extraction, ranking, and simple code sub-agents.

| Evaluation | GPT-5.4 | GPT-5.4 mini | GPT-5.4 nano | GPT-5 mini |
|---|---|---|---|---|
| SWE-Bench Pro (public) | 57.7% | 54.4% | 52.4% | 45.7% |
| Terminal-Bench 2.0 | 75.1% | 60.0% | 46.3% | 38.2% |
| Toolathlon | 54.6% | 42.9% | 35.5% | 26.9% |
| GPQA Diamond | 93.0% | 88.0% | 82.8% | 81.6% |
| OSWorld-Verified | 75.0% | 72.1% | 39.0% | 42.0% |

The use cases fall into three categories: code assistants (GPT-5.4 mini excels in fast coding workflows, debugging loops, and frontend generation), sub-agents (in Codex, GPT-5.4 can delegate subtasks to GPT-5.4 mini using only 30% of the GPT-5.4 quota), and interface control (computer use), where GPT-5.4 mini quickly interprets screenshots of dense interfaces.

| Model | Availability | Input price | Output price | Context |
|---|---|---|---|---|
| GPT-5.4 mini | API, Codex, ChatGPT Free/Go | $0.75/million tokens | $4.50/million tokens | 400,000 tokens |
| GPT-5.4 nano | API only | $0.20/million tokens | $1.25/million tokens | |

In ChatGPT, GPT-5.4 mini is available to Free and Go users via the “Thinking” feature in the + menu. For paid plans, it serves as a fallback model when GPT-5.4 Thinking hits its rate limit.
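To make the pricing concrete, here is a minimal sketch that turns the per-million-token rates from the table above into a per-request cost estimate. The prices come from the article; the request sizes in the example are illustrative assumptions.

```python
# Estimate per-request cost from the per-million-token prices listed above.
# PRICES come from the article's pricing table; token counts are assumptions.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},  # $ per million tokens
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, from input/output token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 4_000, 1_000):.6f}")
# → gpt-5.4-mini: $0.007500
# → gpt-5.4-nano: $0.002050
```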

🔗 Introducing GPT-5.4 mini and nano


NVIDIA GTC 2026: Nemotron Coalition and Dynamo 1.0

NVIDIA’s GTC conference, which began on March 16, was the catalyst for several major announcements across the industry: the formation of an open coalition around frontier open source models, the production release of an inference operating system, and the announcement of a data blueprint for physical AI.

Mistral joins the NVIDIA Nemotron Coalition

16 March — Mistral AI announces a strategic partnership with NVIDIA to co-develop frontier open source AI models. Mistral becomes a founding member of the NVIDIA Nemotron Coalition, combining its frontier architecture with NVIDIA’s compute infrastructure and development tools.

| Aspect | Detail |
|---|---|
| Mistral role | Founding member, frontier architecture + full-stack AI offering |
| NVIDIA contribution | GPU infrastructure + development tools |
| Goal | Co-develop open frontier-level models |

🔗 Mistral announcement on X

Perplexity also joins the coalition

16 March — Perplexity announces that it is joining the same NVIDIA Nemotron Coalition. Perplexity fine-tunes a different open model for each stage of its response pipeline (query analysis, reasoning, final answer), and the Nemotron 3 Super model (120 billion parameters, MoE architecture) is now available in the Perplexity search bar, the Agent API, and Perplexity Computer.
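The staged-pipeline idea can be sketched as a simple routing table, one model per stage. This is a minimal illustration of the pattern only: the model identifiers and the `run_model` stub are hypothetical placeholders, not Perplexity's actual stack.

```python
# Sketch of a multi-model response pipeline: each stage routes to its own
# (hypothetically named) fine-tuned model. run_model is a placeholder for
# a real inference call.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    model: str

PIPELINE = [
    Stage("query_analysis", "nemotron-3-super-qa"),      # hypothetical names
    Stage("reasoning", "nemotron-3-super-reason"),
    Stage("final_answer", "nemotron-3-super-answer"),
]

def run_model(model: str, text: str) -> str:
    # Placeholder: tag the text with the model that processed it.
    return f"[{model}] {text}"

def answer(query: str) -> str:
    out = query
    for stage in PIPELINE:
        out = run_model(stage.model, out)
    return out

print(answer("What is Dynamo 1.0?"))
```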

🔗 Perplexity blog – Nemotron Coalition 🔗 NVIDIA announcement

Dynamo 1.0: inference operating system goes to production

16 March — NVIDIA announces at GTC the production release of Dynamo 1.0, presented as the “inference operating system” for AI factories. Dynamo boosts inference performance on Blackwell GPUs by up to 7x compared with unoptimized deployments. The move to v1.0 marks its transition from the experimental phase to industrial production.

🔗 NVIDIA Dynamo 1.0 announcement

Physical AI Data Factory Blueprint

16 March — NVIDIA unveils the Physical AI Data Factory Blueprint: a reference architecture for turning accelerated computing into high-quality training data for robotics, AI vision agents, and autonomous vehicles. This blueprint enables companies to synthetically generate training data for physical AI at scale.

🔗 NVIDIA Physical AI announcement

Cohere + NVIDIA: sovereign AI on DGX Spark

16 March — Cohere and NVIDIA are partnering to develop sovereign, secure, and efficient AI, also announced at GTC. Two main tracks: NVIDIA ecosystem-native models (custom models optimized for the latest NVIDIA architecture, targeting specialized enterprise workloads) and North on DGX Spark (Cohere’s North agentic platform will be available on NVIDIA DGX Spark, local and low-latency for sensitive data). Target sectors include finance, healthcare, and the public sector.

🔗 Cohere blog – NVIDIA sovereign AI


Perplexity Comet Enterprise: MDM governance and CrowdStrike integration

17 March — Perplexity launches Comet Enterprise for all Enterprise subscribers. The AI browser moves to an enterprise version with full deployment governance.

| Feature | Description |
|---|---|
| MDM deployment | Silent installer, deployment across thousands of machines, audit logs |
| Granular telemetry | Per-user tracking |
| CrowdStrike Falcon | Anti-phishing protection, exfiltration detection (screenshots, downloads) |
| Real-time intervention | Possible via CrowdStrike integration |
| Privacy | Perplexity never trains its models on enterprise data |

Early users include Fortune companies, AWS, AlixPartners, Gunderson Dettmer, and Bessemer Venture Partners. Documented use cases cover client meeting preparation (real-time news), SOW contract analysis, financial calculations, and sector research.

🔗 Perplexity blog – Comet Enterprise


Claude Code v2.1.77: 64k tokens by default for Opus 4.6

17 March — Claude Code v2.1.77 is released with a significant increase in generation limits and several critical bug fixes.

| Model | Default limit | Maximum limit |
|---|---|---|
| Claude Opus 4.6 | 64,000 tokens | 128,000 tokens |
| Claude Sonnet 4.6 | 128,000 tokens | |

The default limit for Opus 4.6 doubles, from 32k to 64k tokens, enabling much longer responses without additional configuration.

New features:

  • allowRead in sandboxes: new filesystem configuration setting allowing reads to be re-enabled in areas covered by a denyRead rule. Useful for granular security configurations.
  • /copy N: the /copy command now accepts an optional index — /copy 2 copies the assistant’s second-most-recent response without navigating history.
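As an illustration of how `allowRead` complements `denyRead`, here is a hedged configuration sketch. Only the `allowRead`/`denyRead` setting names come from the changelog; the surrounding structure and paths are assumptions, not Claude Code's documented settings layout.

```json
{
  "sandbox": {
    "filesystem": {
      "denyRead": ["~/secrets/**"],
      "allowRead": ["~/secrets/public-keys/**"]
    }
  }
}
```

The intent: a broad `denyRead` rule blocks the whole directory, while `allowRead` re-opens a narrow subtree inside it.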

Notable fixes:

  • “Always Allow” on compound bash commands: the rule was saved for the full string (cd src && npm test) instead of per subcommand. Fixed.
  • Auto-updater: started parallel downloads during repeated window opens and closes, which could accumulate dozens of gigabytes in memory. Fixed.
  • --resume truncating history: a race condition between memory extraction writes and the main transcript could lead to silent truncation. Fixed.
  • PreToolUse hooks bypassing deny rules: a hook returning "allow" bypassed deny permission rules, including enterprise-managed settings. Important security fix.

🔗 Claude Code CHANGELOG


Technical article: how the Claude Code team uses Skills

17 March — Thariq (@trq212), an engineer on the Claude Code team at Anthropic, publishes “Lessons from Building Claude Code: How We Use Skills”, the second article in the series after “Seeing like an Agent” (February 27, 3.6 million views).

The article documents how Skills have become one of the most widely used extension points in Claude Code — flexible, easy to maintain, and enabling teams to define reusable workflows directly in their development environment. Boris Cherny (@bcherny), head of Claude Code, shared the article and called it “Really great writeup”. The author also announces the upcoming release of an open source iMessage skill as a concrete example.

“Using Skills well is a skill issue. I didn’t quite realize how much until I wrote this.” — @trq212 on X

🔗 Publication tweet


Codex Security: why there is no SAST report

16 March — OpenAI publishes a technical article explaining the design choice behind Codex Security: why the system does not rely on static analysis (SAST) as a starting point.

The approach rests on four pillars: contextual reading (analyzing the full code path with repository context), targeted micro-fuzzing (reducing to the smallest testable fragment to write micro-fuzzers), constraint reasoning (using a Python environment with z3-solver to formalize complex problems), and sandbox validation (distinguishing “this could be a problem” from “this is a problem” with a compiled PoC). The article illustrates these principles with CVE-2024-29041 (Express), an open redirect where malformed URLs bypassed allowlist implementations.
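The "targeted micro-fuzzing" pillar can be illustrated with a minimal sketch: isolate a suspect routine as its smallest testable fragment, then fuzz only that fragment against a property it should uphold. The `parse_port` helper below is a hypothetical vulnerable function, not code from Codex Security or Express.

```python
# Micro-fuzzing sketch: fuzz a single isolated routine against a property.
# parse_port is a deliberately buggy, hypothetical example target.
import random

def parse_port(s: str) -> int:
    # Bug: no upper-bound check, so ports above 65535 slip through.
    n = int(s)
    if n < 0:
        raise ValueError("negative port")
    return n

def micro_fuzz(fn, trials=1000, seed=42):
    """Feed random inputs to fn; collect inputs violating the port range."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        n = rng.randint(0, 1 << 20)
        try:
            out = fn(str(n))
            if not (0 <= out <= 65535):  # property the routine should uphold
                failures.append(n)
        except ValueError:
            pass  # rejecting an input is acceptable behavior
    return failures

bad = micro_fuzz(parse_port)
print(f"{len(bad)} inputs violated the port-range property")
```

Once a violation is found, the sandbox-validation step would turn "this could be a problem" into "this is a problem" with a concrete proof of concept.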

🔗 Why Codex Security Doesn’t Include a SAST Report


Gemini Personal Intelligence: free expansion in the United States

17 March — Google expands Personal Intelligence to more users for free in the United States. This feature, previously reserved for paying subscribers, is now available to free-tier accounts via three surfaces: AI Mode in Google Search, the Gemini app (iOS/Android), and the Gemini in Chrome extension.

Personal Intelligence securely connects the user’s Google apps (Gmail, Google Photos, YouTube, Search) to provide personalized answers. Examples include shopping recommendations tailored to past purchases, technical help targeted at the exact device bought (pulled from Gmail receipts), and personalized travel itineraries based on hotel confirmations. The user chooses which apps to connect and can disable them at any time. Available for personal Google accounts only, not Workspace enterprise/education accounts.

🔗 Google blog – Personal Intelligence


AlphaFold Database: millions of new protein complex structures

17 March — Google DeepMind announces the expansion of the AlphaFold Database (AFDB) with millions of new protein complex structures predicted by AI, in collaboration with EMBL-EBI (European Bioinformatics Institute), NVIDIA, and Seoul National University. The new structures cover, among others, WHO priority bacterial pathogens — the most dangerous and antibiotic-resistant bacteria. This expansion moves from individual proteins to protein complexes (interactions between several proteins), a qualitative leap for medical and pharmaceutical research.

🔗 Pushmeet Kohli announcement on X


xAI: Grok Text-to-Speech API and first place in video editing

Text-to-Speech API

16 March — xAI announces the availability of the Grok Text-to-Speech API, offering natural and expressive voices for developers. LiveKit integrated this TTS into LiveKit Inference at launch.

🔗 xAI announcement on X

Grok Imagine #1 in video editing

15 March — Grok Imagine reaches first place in video editing on the Design Arena ranking, with an Elo of 1290. The Imagine API is now available to developers. The feature covers adding, removing, and swapping objects in video scenes.

🔗 Grok announcement on X


Perplexity Computer: full control of Comet and Android

Computer controls Comet without MCP

March 16 — Computer can now take full control of the Comet browser to carry out autonomous tasks: the browser agent can access any site or connected app, without connectors or MCP. Available to all Computer users on Comet.

🔗 Perplexity tweet

Computer on Android

March 16 — Perplexity Computer is now available on Android, extending the iOS launch on March 13 to all mobile platforms.

🔗 Perplexity Android tweet


Manus: local desktop and Google Workspace at developer level

Manus “My Computer” on macOS and Windows

March 16 — Manus announces “My Computer”, a central feature of the new Manus Desktop app (macOS and Windows). Previously limited to a cloud sandbox, Manus can now run directly on the local machine via command-line instructions in a local terminal — with explicit user approval at each step.

Use cases span a wide range: sorting and renaming thousands of files, creating native desktop apps (example cited: a Mac app for real-time translation and subtitling built in 20 minutes, without opening Xcode), or using the local GPU to train machine learning models. My Computer complements existing cloud Connectors (Google Calendar, Gmail) rather than replacing them.

🔗 Manus tweet · 🔗 Manus blog

Manus masters Google Workspace with precision

March 17 — Manus is rolling out a major update to its Google Workspace connector, based on the Google Workspace CLI (an open-source tool from the Google team). The previous version treated Google files as monolithic blocks; the new version enables granular actions:

| Area | New capabilities |
|---|---|
| Google Docs | Surgical text replacements, replies to specific comments |
| Google Sheets | Reading across multiple sheets, updating a specific cell, duplicating tabs |
| Google Slides | Editing existing presentations (slide titles, timeline updates) |
| Google Drive | Folder reorganization |

The update is free and backward compatible.

🔗 Manus tweet · 🔗 Manus blog


GitHub: /fleet for bulk maintenance and $12.5M for open source

Copilot /fleet: maintenance across the entire repo fleet

March 15 — GitHub demonstrates the /fleet command in GitHub Copilot. With a single instruction, developers managing multiple repositories can delegate repetitive maintenance tasks (configuration updates, dependency fixes) to the agent across their entire fleet, rather than repository by repository.

🔗 GitHub tweet

$12.5M for open source security

March 17 — GitHub, Anthropic, AWS, Google, and OpenAI are joining forces in a collective commitment of $12.5 million in support of Alpha-Omega, the Linux Foundation program dedicated to securing the open source ecosystem.

Key GitHub points: 280,000+ maintainers across hundreds of millions of public repositories will be eligible for free access to GitHub Copilot Pro. GitHub is also injecting $5.5M in Azure credits for training. The GitHub Secure Open Source Fund, which has already supported 138 projects, will open its fourth round at the end of April 2026.

The context is significant: AI has greatly accelerated vulnerability discovery, which increases maintainers’ workload. The stated goal is for AI to reduce that burden rather than increase it.

🔗 GitHub Blog article 🔗 Linux Foundation announcement


Z.ai GLM-5-Turbo: high speed for agent environments

March 15 — Z.ai launches GLM-5-Turbo, a high-speed variant of GLM-5 optimized for agent environments (notably OpenClaw). The same day, usage limits are tripled for GLM Coding Plan subscribers. Available on OpenRouter and via the direct API.

🔗 Z.ai announcement on X


Kimi publishes a paper on Attention Residuals

March 16-17 — Moonshot AI publishes a research paper on Attention Residuals on arXiv: a new deep aggregation approach that replaces standard residual connections with recurrence inspired by the time/depth duality (depth-wise aggregation). The analysis shows that this approach naturally mitigates hidden-state magnitude growth issues. Elon Musk replied “Impressive work from Kimi” on the announcement tweet (4.5 million views).
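The hidden-state magnitude claim can be illustrated numerically. The sketch below contrasts a plain residual stream, whose norm compounds with depth, against a convex depth-wise recurrence that keeps it bounded. This is a toy model of the general idea under assumed update rules, not the paper's actual formulation.

```python
# Toy comparison: residual stream norm growth vs. depth-wise aggregation.
# The update rules and alpha value are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, layers = 64, 48

def block(h):
    # Stand-in "layer": a random direction with unit scale relative to h.
    return rng.standard_normal(d) / np.sqrt(d) * np.linalg.norm(h)

h_res = rng.standard_normal(d)
h_agg = h_res.copy()
alpha = 0.9  # aggregation weight (an assumed value)

for _ in range(layers):
    h_res = h_res + block(h_res)                        # standard residual
    h_agg = alpha * h_agg + (1 - alpha) * block(h_agg)  # convex recurrence

print(f"residual stream norm:   {np.linalg.norm(h_res):.1f}")
print(f"aggregated stream norm: {np.linalg.norm(h_agg):.3f}")
```

With independent unit-scale layer outputs, the residual norm multiplies by roughly √2 per layer, while the convex combination contracts it, which is the qualitative behavior the paper's analysis attributes to its aggregation scheme.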

🔗 Kimi tweet · 🔗 arXiv 2603.15031


ElevenLabs × Deloitte: omnichannel agents for the enterprise

March 14 — ElevenLabs and Deloitte announce a strategic partnership combining the ElevenLabs Agents platform with Deloitte’s industry expertise, to help large enterprises deploy omnichannel conversational agents. The partnership targets regulated industries (finance, healthcare, public sector). Deloitte provides business integration, ElevenLabs supplies the AI audio infrastructure (voice, transcription, agents).

🔗 ElevenLabs blog


Briefs

Tongyi Fun-CineForge (Alibaba, March 16) — Tongyi Lab open-sources Fun-CineForge, an AI cinematic dubbing system approaching professional movie quality. Available on GitHub, HuggingFace, and ModelScope. 🔗 Announcement on X


What this means

NVIDIA GTC 2026 crystallizes an important dynamic: several leading AI labs (Mistral, Perplexity, Cohere) are aligning around NVIDIA infrastructure to co-develop open frontier models or sovereign deployments. This convergence around an open coalition stands in contrast to the recent period of fragmentation — and signals that large-scale pretraining has become too costly to be handled in silos.

GPT-5.4 mini confirms a major trend: “small-format” models are no longer degraded versions but competitive alternatives. With 54.4% on SWE-Bench Pro versus 57.7% for the full model, and a 19x lower cost, GPT-5.4 mini redefines the performance/price ratio for coding workflows.

March 17 also illustrates the rise of local and desktop agents: Manus “My Computer” moves out of the cloud to access the local machine, Perplexity Computer takes control of Comet without MCP, and Claude Code doubles its default generation window for Opus 4.6. The era of the agent that merely suggests is giving way to the era of the agent that executes.


Sources

This document was translated from the fr version to the en language using the gpt-5.4-mini model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator