GPT-5.4 mini and nano launched by OpenAI, Mistral joins NVIDIA's Nemotron Coalition, Perplexity Comet Enterprise available

March 17, 2026 is marked by NVIDIA GTC and several major launches. OpenAI releases GPT-5.4 mini and nano, its most capable compact models to date, which come close to the full model on several benchmarks. The NVIDIA Nemotron Coalition grows with the addition of Mistral AI and Perplexity. Perplexity simultaneously opens Comet Enterprise with full MDM governance, Claude Code v2.1.77 doubles the generation limit for Opus 4.6, and GitHub, Anthropic, Google, and OpenAI join forces to fund open source security to the tune of $12.5 million.


GPT-5.4 mini and nano: OpenAI’s compact models

17 March — OpenAI launches GPT-5.4 mini and GPT-5.4 nano, its most powerful compact models to date. These two variants bring GPT-5.4 capabilities into formats optimized for high-volume workloads, with lower latency and lower cost.

GPT-5.4 mini significantly improves on GPT-5 mini in code, reasoning, multimodal understanding, and tool use, while running more than twice as fast. It comes close to the performance of the full GPT-5.4 model on several key evaluations, including SWE-Bench Pro and OSWorld-Verified.

GPT-5.4 nano is the smallest and least expensive version of the GPT-5.4 family, designed for tasks where speed and cost matter most: classification, data extraction, ranking, and simple code sub-agents.

| Evaluation | GPT-5.4 | GPT-5.4 mini | GPT-5.4 nano | GPT-5 mini |
|---|---|---|---|---|
| SWE-Bench Pro (public) | 57.7% | 54.4% | 52.4% | 45.7% |
| Terminal-Bench 2.0 | 75.1% | 60.0% | 46.3% | 38.2% |
| Toolathlon | 54.6% | 42.9% | 35.5% | 26.9% |
| GPQA Diamond | 93.0% | 88.0% | 82.8% | 81.6% |
| OSWorld-Verified | 75.0% | 72.1% | 39.0% | 42.0% |

The use cases fall into three categories: code assistants (GPT-5.4 mini excels in fast coding workflows, debugging loops, and frontend generation), sub-agents (in Codex, GPT-5.4 can delegate subtasks to GPT-5.4 mini using only 30% of the GPT-5.4 quota), and interface control (computer use), where GPT-5.4 mini quickly interprets screenshots of dense interfaces.

| Model | Availability | Input price | Output price | Context |
|---|---|---|---|---|
| GPT-5.4 mini | API, Codex, ChatGPT Free/Go | $0.75/million tokens | $4.50/million tokens | 400,000 tokens |
| GPT-5.4 nano | API only | $0.20/million tokens | $1.25/million tokens | |

In ChatGPT, GPT-5.4 mini is available to Free and Go users via the “Thinking” feature in the + menu. For paid plans, it serves as a fallback model when GPT-5.4 Thinking hits its rate limit.
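To make the pricing concrete, here is a minimal sketch that turns the per-million-token rates from the table above into a per-request cost estimate. The prices come from the article; the request sizes in the example are illustrative assumptions.

```python
# Estimate per-request cost from the per-million-token prices listed above.
# PRICES come from the article's pricing table; token counts are assumptions.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},  # $ per million tokens
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, from input/output token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 4_000, 1_000):.6f}")
# → gpt-5.4-mini: $0.007500
# → gpt-5.4-nano: $0.002050
```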

🔗 Introducing GPT-5.4 mini and nano


NVIDIA GTC 2026: Nemotron Coalition and Dynamo 1.0

NVIDIA’s GTC conference, which began on March 16, was the catalyst for several major announcements across the industry: the formation of an open coalition around frontier open source models, the production release of an inference operating system, and the announcement of a data blueprint for physical AI.

Mistral joins the NVIDIA Nemotron Coalition

16 March — Mistral AI announces a strategic partnership with NVIDIA to co-develop frontier open source AI models. Mistral becomes a founding member of the NVIDIA Nemotron Coalition, combining its frontier architecture with NVIDIA’s compute infrastructure and development tools.

| Aspect | Detail |
|---|---|
| Mistral role | Founding member, frontier architecture + full-stack AI offering |
| NVIDIA contribution | GPU infrastructure + development tools |
| Goal | Co-develop open frontier-level models |

🔗 Mistral announcement on X

Perplexity also joins the coalition

16 March — Perplexity announces that it is joining the same NVIDIA Nemotron Coalition. Perplexity fine-tunes a different open model for each stage of its response pipeline (query analysis, reasoning, final answer), and the Nemotron 3 Super model (120 billion parameters, MoE architecture) is now available in the Perplexity search bar, the Agent API, and Perplexity Computer.
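The staged-pipeline idea can be sketched as a simple routing table, one model per stage. This is a minimal illustration of the pattern only: the model identifiers and the `run_model` stub are hypothetical placeholders, not Perplexity's actual stack.

```python
# Sketch of a multi-model response pipeline: each stage routes to its own
# (hypothetically named) fine-tuned model. run_model is a placeholder for
# a real inference call.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    model: str

PIPELINE = [
    Stage("query_analysis", "nemotron-3-super-qa"),      # hypothetical names
    Stage("reasoning", "nemotron-3-super-reason"),
    Stage("final_answer", "nemotron-3-super-answer"),
]

def run_model(model: str, text: str) -> str:
    # Placeholder: tag the text with the model that processed it.
    return f"[{model}] {text}"

def answer(query: str) -> str:
    out = query
    for stage in PIPELINE:
        out = run_model(stage.model, out)
    return out

print(answer("What is Dynamo 1.0?"))
```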

🔗 Perplexity blog – Nemotron Coalition 🔗 NVIDIA announcement

Dynamo 1.0: inference operating system goes to production

16 March — NVIDIA announces at GTC the production release of Dynamo 1.0, presented as the “inference operating system” for AI factories. Dynamo boosts inference performance on Blackwell GPUs by up to 7x compared with unoptimized deployments. The move to v1.0 marks its transition from the experimental phase to industrial production.

🔗 NVIDIA Dynamo 1.0 announcement

Physical AI Data Factory Blueprint

16 March — NVIDIA unveils the Physical AI Data Factory Blueprint: a reference architecture for turning accelerated computing into high-quality training data for robotics, AI vision agents, and autonomous vehicles. This blueprint enables companies to synthetically generate training data for physical AI at scale.

🔗 NVIDIA Physical AI announcement

Cohere + NVIDIA: sovereign AI on DGX Spark

16 March — Cohere and NVIDIA are partnering to develop sovereign, secure, and efficient AI, also announced at GTC. Two main tracks: NVIDIA ecosystem-native models (custom models optimized for the latest NVIDIA architecture, targeting specialized enterprise workloads) and North on DGX Spark (Cohere’s North agentic platform will be available on NVIDIA DGX Spark, local and low-latency for sensitive data). Target sectors include finance, healthcare, and the public sector.

🔗 Cohere blog – NVIDIA sovereign AI


Perplexity Comet Enterprise: MDM governance and CrowdStrike integration

17 March — Perplexity launches Comet Enterprise for all Enterprise subscribers. The AI browser moves to an enterprise version with full deployment governance.

| Feature | Description |
|---|---|
| MDM deployment | Silent installer, deployment across thousands of machines, audit logs |
| Granular telemetry | Per-user tracking |
| CrowdStrike Falcon | Anti-phishing protection, exfiltration detection (screenshots, downloads) |
| Real-time intervention | Possible via CrowdStrike integration |
| Privacy | Perplexity never trains its models on enterprise data |

Early users include Fortune companies, AWS, AlixPartners, Gunderson Dettmer, and Bessemer Venture Partners. Documented use cases cover client meeting preparation (real-time news), SOW contract analysis, financial calculations, and sector research.

🔗 Perplexity blog – Comet Enterprise


Claude Code v2.1.77: 64k tokens by default for Opus 4.6

17 March — Claude Code v2.1.77 is released with a significant increase in generation limits and several critical bug fixes.

| Model | Default limit | Maximum limit |
|---|---|---|
| Claude Opus 4.6 | 64,000 tokens | 128,000 tokens |
| Claude Sonnet 4.6 | 128,000 tokens | |

The default limit for Opus 4.6 doubles, from 32k to 64k tokens, enabling much longer responses without additional configuration.

New features:

  • allowRead in sandboxes: new filesystem configuration setting allowing reads to be re-enabled in areas covered by a denyRead rule. Useful for granular security configurations.
  • /copy N: the /copy command now accepts an optional index — /copy 2 copies the assistant’s second-most-recent response without navigating history.
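As an illustration of how `allowRead` complements `denyRead`, here is a hedged configuration sketch. Only the `allowRead`/`denyRead` setting names come from the changelog; the surrounding structure and paths are assumptions, not Claude Code's documented settings layout.

```json
{
  "sandbox": {
    "filesystem": {
      "denyRead": ["~/secrets/**"],
      "allowRead": ["~/secrets/public-keys/**"]
    }
  }
}
```

The intent: a broad `denyRead` rule blocks the whole directory, while `allowRead` re-opens a narrow subtree inside it.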

Notable fixes:

  • “Always Allow” on compound bash commands: the rule was saved for the full string (cd src && npm test) instead of per subcommand. Fixed.
  • Auto-updater: started parallel downloads during repeated window opens and closes, which could accumulate dozens of gigabytes in memory. Fixed.
  • --resume truncating history: a race condition between memory extraction writes and the main transcript could lead to silent truncation. Fixed.
  • PreToolUse hooks bypassing deny rules: a hook returning "allow" bypassed deny permission rules, including enterprise-managed settings. Important security fix.

🔗 Claude Code CHANGELOG


Technical article: how the Claude Code team uses Skills

17 March — Thariq (@trq212), an engineer on the Claude Code team at Anthropic, publishes “Lessons from Building Claude Code: How We Use Skills”, the second article in the series after “Seeing like an Agent” (February 27, 3.6 million views).

The article documents how Skills have become one of the most widely used extension points in Claude Code — flexible, easy to maintain, and enabling teams to define reusable workflows directly in their development environment. Boris Cherny (@bcherny), head of Claude Code, shared the article and called it “Really great writeup”. The author also announces the upcoming release of an open source iMessage skill as a concrete example.

“Using Skills well is a skill issue. I didn’t quite realize how much until I wrote this.” — @trq212 on X

🔗 Publication tweet


Codex Security: why there is no SAST report

16 March — OpenAI publishes a technical article explaining the design choice behind Codex Security: why the system does not rely on static analysis (SAST) as a starting point.

The approach rests on four pillars: contextual reading (analyzing the full code path with repository context), targeted micro-fuzzing (reducing to the smallest testable fragment to write micro-fuzzers), constraint reasoning (using a Python environment with z3-solver to formalize complex problems), and sandbox validation (distinguishing “this could be a problem” from “this is a problem” with a compiled PoC). The article illustrates these principles with CVE-2024-29041 (Express), an open redirect where malformed URLs bypassed allowlist implementations.
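The "targeted micro-fuzzing" pillar can be illustrated with a minimal sketch: isolate a suspect routine as its smallest testable fragment, then fuzz only that fragment against a property it should uphold. The `parse_port` helper below is a hypothetical vulnerable function, not code from Codex Security or Express.

```python
# Micro-fuzzing sketch: fuzz a single isolated routine against a property.
# parse_port is a deliberately buggy, hypothetical example target.
import random

def parse_port(s: str) -> int:
    # Bug: no upper-bound check, so ports above 65535 slip through.
    n = int(s)
    if n < 0:
        raise ValueError("negative port")
    return n

def micro_fuzz(fn, trials=1000, seed=42):
    """Feed random inputs to fn; collect inputs violating the port range."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        n = rng.randint(0, 1 << 20)
        try:
            out = fn(str(n))
            if not (0 <= out <= 65535):  # property the routine should uphold
                failures.append(n)
        except ValueError:
            pass  # rejecting an input is acceptable behavior
    return failures

bad = micro_fuzz(parse_port)
print(f"{len(bad)} inputs violated the port-range property")
```

Once a violation is found, the sandbox-validation step would turn "this could be a problem" into "this is a problem" with a concrete proof of concept.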

🔗 Why Codex Security Doesn’t Include a SAST Report


Gemini Personal Intelligence: free expansion in the United States

17 March — Google expands Personal Intelligence to more users for free in the United States. This feature, previously reserved for paying subscribers, is now available to free-tier accounts via three surfaces: AI Mode in Google Search, the Gemini app (iOS/Android), and the Gemini in Chrome extension.

Personal Intelligence securely connects the user’s Google apps (Gmail, Google Photos, YouTube, Search) to provide personalized answers. Examples include shopping recommendations tailored to past purchases, technical help targeted at the exact device bought (pulled from Gmail receipts), and personalized travel itineraries based on hotel confirmations. The user chooses which apps to connect and can disable them at any time. Available for personal Google accounts only, not Workspace enterprise/education accounts.

🔗 Google blog – Personal Intelligence


AlphaFold Database: millions of new protein complex structures

17 March — Google DeepMind announces the expansion of the AlphaFold Database (AFDB) with millions of new protein complex structures predicted by AI, in collaboration with EMBL-EBI (European Bioinformatics Institute), NVIDIA, and Seoul National University. The new structures cover, among others, WHO priority bacterial pathogens — the most dangerous and antibiotic-resistant bacteria. This expansion moves from individual proteins to protein complexes (interactions between several proteins), a qualitative leap for medical and pharmaceutical research.

🔗 Pushmeet Kohli announcement on X


xAI: Grok Text-to-Speech API and first place in video editing

Text-to-Speech API

16 March — xAI announces the availability of the Grok Text-to-Speech API, offering natural and expressive voices for developers. LiveKit integrated this TTS into LiveKit Inference at launch.

🔗 xAI announcement on X

Grok Imagine #1 in video editing

15 March — Grok Imagine reaches first place in video editing on the Design Arena ranking, with an Elo of 1290. The Imagine API is now available to developers. The feature covers adding, removing, and swapping objects in video scenes.

🔗 Grok announcement on X


Perplexity Computer: full control of Comet and Android

Computer controls Comet without MCP

March 16 — Computer can now take full control of the Comet browser to carry out autonomous tasks: the browser agent can access any site or connected app, without connectors or MCP. Available to all Computer users on Comet.

🔗 Perplexity tweet

Computer on Android

March 16 — Perplexity Computer is now available on Android, extending the iOS launch on March 13 to all mobile platforms.

🔗 Perplexity Android tweet


Manus: local desktop and Google Workspace at developer level

Manus “My Computer” on macOS and Windows

March 16 — Manus announces “My Computer”, a central feature of the new Manus Desktop app (macOS and Windows). Previously limited to a cloud sandbox, Manus can now run directly on the local machine via command-line instructions in a local terminal — with explicit user approval at each step.

Use cases span a wide range: sorting and renaming thousands of files, creating native desktop apps (example cited: a Mac app for real-time translation and subtitling built in 20 minutes, without opening Xcode), or using the local GPU to train machine learning models. My Computer complements existing cloud Connectors (Google Calendar, Gmail) rather than replacing them.

🔗 Manus tweet · 🔗 Manus blog

Manus masters Google Workspace with precision

March 17 — Manus is rolling out a major update to its Google Workspace connector, based on the Google Workspace CLI (an open-source tool from the Google team). The previous version treated Google files as monolithic blocks; the new version enables granular actions:

| Area | New capabilities |
|---|---|
| Google Docs | Surgical text replacements, replies to specific comments |
| Google Sheets | Reading across multiple sheets, updating a specific cell, duplicating tabs |
| Google Slides | Editing existing presentations (slide titles, timeline updates) |
| Google Drive | Folder reorganization |

The update is free and backward compatible.

🔗 Manus tweet · 🔗 Manus blog


GitHub: /fleet for bulk maintenance and $12.5M for open source

Copilot /fleet: maintenance across the entire repo fleet

March 15 — GitHub demonstrates the /fleet command in GitHub Copilot. With a single instruction, developers managing multiple repositories can delegate repetitive maintenance tasks (configuration updates, dependency fixes) to the agent across their entire fleet, rather than repository by repository.

🔗 GitHub tweet

$12.5M for open source security

March 17 — GitHub, Anthropic, AWS, Google, and OpenAI are joining forces in a collective commitment of $12.5 million in support of Alpha-Omega, the Linux Foundation program dedicated to securing the open source ecosystem.

Key GitHub points: 280,000+ maintainers across hundreds of millions of public repositories will be eligible for free access to GitHub Copilot Pro. GitHub is also injecting $5.5M in Azure credits for training. The GitHub Secure Open Source Fund, which has already supported 138 projects, will open its fourth round at the end of April 2026.

The context is significant: AI has greatly accelerated vulnerability discovery, which increases maintainers’ workload. The stated goal is for AI to reduce that burden rather than increase it.

🔗 GitHub Blog article 🔗 Linux Foundation announcement


Z.ai GLM-5-Turbo: high speed for agent environments

March 15 — Z.ai launches GLM-5-Turbo, a high-speed variant of GLM-5 optimized for agent environments (notably OpenClaw). The same day, usage limits are tripled for GLM Coding Plan subscribers. Available on OpenRouter and via the direct API.

🔗 Z.ai announcement on X


Kimi publishes a paper on Attention Residuals

March 16-17 — Moonshot AI publishes a research paper on Attention Residuals on arXiv: a new deep aggregation approach that replaces standard residual connections with recurrence inspired by the time/depth duality (depth-wise aggregation). The analysis shows that this approach naturally mitigates hidden-state magnitude growth issues. Elon Musk replied “Impressive work from Kimi” on the announcement tweet (4.5 million views).
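The hidden-state magnitude claim can be illustrated numerically. The sketch below contrasts a plain residual stream, whose norm compounds with depth, against a convex depth-wise recurrence that keeps it bounded. This is a toy model of the general idea under assumed update rules, not the paper's actual formulation.

```python
# Toy comparison: residual stream norm growth vs. depth-wise aggregation.
# The update rules and alpha value are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, layers = 64, 48

def block(h):
    # Stand-in "layer": a random direction with unit scale relative to h.
    return rng.standard_normal(d) / np.sqrt(d) * np.linalg.norm(h)

h_res = rng.standard_normal(d)
h_agg = h_res.copy()
alpha = 0.9  # aggregation weight (an assumed value)

for _ in range(layers):
    h_res = h_res + block(h_res)                        # standard residual
    h_agg = alpha * h_agg + (1 - alpha) * block(h_agg)  # convex recurrence

print(f"residual stream norm:   {np.linalg.norm(h_res):.1f}")
print(f"aggregated stream norm: {np.linalg.norm(h_agg):.3f}")
```

With independent unit-scale layer outputs, the residual norm multiplies by roughly √2 per layer, while the convex combination contracts it, which is the qualitative behavior the paper's analysis attributes to its aggregation scheme.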

🔗 Kimi tweet · 🔗 arXiv 2603.15031


ElevenLabs × Deloitte: omnichannel agents for the enterprise

March 14 — ElevenLabs and Deloitte announce a strategic partnership combining the ElevenLabs Agents platform with Deloitte’s industry expertise, to help large enterprises deploy omnichannel conversational agents. The partnership targets regulated industries (finance, healthcare, public sector). Deloitte provides business integration, ElevenLabs supplies the AI audio infrastructure (voice, transcription, agents).

🔗 ElevenLabs blog


Briefs

Tongyi Fun-CineForge (Alibaba, March 16) — Tongyi Lab open-sources Fun-CineForge, an AI cinematic dubbing system approaching professional movie quality. Available on GitHub, HuggingFace, and ModelScope. 🔗 Announcement on X


What this means

NVIDIA GTC 2026 crystallizes an important dynamic: several leading AI labs (Mistral, Perplexity, Cohere) are aligning around NVIDIA infrastructure to co-develop open frontier models or sovereign deployments. This convergence around an open coalition stands in contrast to the recent period of fragmentation — and signals that large-scale pretraining has become too costly to be handled in silos.

GPT-5.4 mini confirms a major trend: “small-format” models are no longer degraded versions but competitive alternatives. With 54.4% on SWE-Bench Pro versus 57.7% for the full model, and a 19x lower cost, GPT-5.4 mini redefines the performance/price ratio for coding workflows.

March 17 also illustrates the rise of local and desktop agents: Manus “My Computer” moves out of the cloud to access the local machine, Perplexity Computer takes control of Comet without MCP, and Claude Code doubles its default generation window for Opus 4.6. The era of the agent that merely suggests is giving way to the era of the agent that executes.


Sources

This document was translated from the fr version to the en language using the gpt-5.4-mini model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator