
Claude Code auto mode, visual shopping in ChatGPT, Grok Imagine multi-image to video

A dense two days on March 23 and 24: Anthropic rolls out auto mode in Claude Code — an action classifier that makes approval decisions on the user’s behalf — and publishes an engineering article on its GAN-inspired multi-agent architecture. OpenAI launches visual shopping in ChatGPT with the Agentic Commerce Protocol, while xAI brings multi-image-to-video generation to the Grok Imagine API. GitHub Copilot, Google DeepMind, and the Anthropic Science Blog round out this overview.


Claude Code: auto mode, a classifier between you and commands

March 24 — Claude Code adds a third permission level: auto mode. Until now, the tool either let you manually approve every file write and bash command, or disabled permission checks entirely. Auto mode introduces a middle path: Claude makes the decisions itself, guided by a classifier that analyzes each action before execution.

The mechanism is simple — before each tool call, the classifier evaluates whether the action is potentially destructive. Actions deemed safe run automatically. Risky actions are blocked, and Claude looks for an alternative approach without interrupting the user.
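The gate described above can be sketched as a pre-execution check. Everything below (the classify_action helper, the destructive-pattern list) is an illustrative stand-in — the real classifier is internal to Claude Code:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    command: str

def classify_action(call: ToolCall) -> str:
    """Hypothetical stand-in for Claude Code's internal classifier.
    Flags obviously destructive shell patterns; everything else is 'safe'."""
    destructive = ("rm -rf", "git push --force", "drop table")
    if any(pattern in call.command.lower() for pattern in destructive):
        return "risky"
    return "safe"

def auto_mode_gate(call: ToolCall) -> bool:
    """Return True if the action may run automatically; False means the
    action is blocked and Claude should look for an alternative approach."""
    return classify_action(call) == "safe"

print(auto_mode_gate(ToolCall("bash", "ls -la")))         # True: runs automatically
print(auto_mode_gate(ToolCall("bash", "rm -rf /tmp/x")))  # False: blocked
```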

Anthropic says this mode reduces risk without eliminating it, and recommends using it in isolated environments. To enable it, run claude --enable-auto-mode, then cycle to auto mode with Shift+Tab.

The feature is available as a research preview on the Team plan. Deployment for Enterprise and API was announced for the following days.

New in Claude Code: auto mode. Instead of approving every file write and bash command, or skipping permissions entirely, auto mode lets Claude make permission decisions on your behalf. Safeguards check each action before it runs.

@claudeai on X

🔗 Announcement on X


Multi-agent architecture: Anthropic Engineering’s GAN approach

March 24 — In an article published on the Anthropic Engineering Blog, Prithvi Rajasekaran (Labs team) describes a multi-agent architecture designed to push Claude beyond its limits in two areas: interface design and long-running autonomous app development.

The approach draws inspiration from Generative Adversarial Networks (GAN): a generator agent produces the code or design, while a separate evaluator agent scores the result and provides critical feedback. This decoupling solves a known problem — Claude tends to self-evaluate too leniently. A dedicated evaluator, progressively calibrated with examples, becomes an effective improvement lever.
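The generator–evaluator decoupling amounts to a refinement loop: generate, score, feed the critique back in. A minimal sketch of that control flow, with stub agents — the score scale, threshold, and agent functions are illustrative, not Anthropic's actual harness:

```python
from typing import Callable, Tuple

def refine(generate: Callable[[str, str], str],
           evaluate: Callable[[str], Tuple[float, str]],
           task: str, threshold: float = 8.0, max_iters: int = 15) -> str:
    """GAN-style loop: a generator produces an artifact, a separate
    evaluator scores it and returns criticism used in the next attempt."""
    feedback = ""
    artifact = ""
    for _ in range(max_iters):
        artifact = generate(task, feedback)
        score, feedback = evaluate(artifact)
        if score >= threshold:
            break  # evaluator is satisfied; stop iterating
    return artifact

# Stub agents, just to show the loop terminating after one round of feedback.
def fake_generate(task: str, feedback: str) -> str:
    return task + ("!" if feedback else "")

def fake_evaluate(artifact: str) -> Tuple[float, str]:
    return (9.0, "") if artifact.endswith("!") else (5.0, "too generic")

print(refine(fake_generate, fake_evaluate, "landing page"))  # landing page!
```

The point of the separation is calibration: the evaluator can be tuned independently with scored examples, instead of trusting the generator's lenient self-assessment.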

For frontend design, the evaluator gets access to the Playwright MCP to navigate live pages. Four criteria are used: design quality and consistency, originality (penalizing generic “AI slop” patterns), technical craft, and functionality. In 10 to 15 iterations, the generator produces noticeably more distinctive interfaces.

For application development, the architecture adds a planner: it turns a one-sentence prompt into a complete product specification. Generator and evaluator negotiate “sprint contracts” before each implementation, defining success criteria. The evaluator tests the application via Playwright and can fail a sprint, forcing a revision.

| Approach | Duration | Cost | Result |
|---|---|---|---|
| Solo Opus 4.5 agent | 20 min | $9 | Broken app |
| Full harness | 6 h | $200 | Functional app |
| Harness with Opus 4.6 | 4 h | $124.70 | Functional app + integrated Claude agent |

With Opus 4.6 — which no longer suffers from “context anxiety” — the author was able to simplify the architecture, remove session resets, and reduce costs. The guiding principle remains: regularly audit the harness to remove what the model can now do on its own.

🔗 Full article 🔗 Announcement on X


Computer Use in Cowork and Claude Code (macOS, Pro/Max)

March 23 — Claude can now use your computer to complete tasks directly. As a research preview, this feature is available in Claude Cowork and Claude Code, on macOS only.

Claude can open applications, navigate in the browser, and fill in spreadsheets. The idea: hand off a task from your phone, do something else, and come back to finished work. It is also possible to set recurring tasks — scan emails every morning, generate a report every Friday.

The Computer Use feature is available on the Pro and Max plans by updating the desktop app and pairing it with the mobile app.

🔗 Announcement on X 🔗 Cowork product page


Anthropic Economic Index: “Learning curves” (5th report)

March 24 — Anthropic publishes its fifth report of the Anthropic Economic Index, titled “Learning curves”, based on Claude usage data from February 2026 (about 1 million conversations, from February 5 to 12).

The report documents two major developments since November 2025. First, a diversification of uses: the ten most frequent tasks on Claude.ai now account for only 19% of traffic, down from 24% three months earlier. This trend is partly explained by the migration of coding tasks to the API, driven by the growth of Claude Code.

Second, the “learning curve” effect: long-time users (more than six months) show a success rate 4 to 5 percentage points higher. They work on more complex problems, collaborate more, and delegate less in automatic mode. The authors see this as a signal of learning-by-doing, even though a survivor bias remains possible.

On model choice, the data confirm that users favor Opus for high-value tasks: every $10/hour increase in the estimated value of a task is accompanied by a 1.5-point rise in the share of Opus use on Claude.ai, and 2.8 points via the API.

🔗 Full report 🔗 Announcement on X


Anthropic Science Blog: a new blog for AI in scientific research

March 23 — Anthropic launches the Anthropic Science Blog, dedicated to the intersection between AI and scientific research. The goal is to document how AI accelerates researchers’ work and to explore the questions this transformation raises.

The blog will publish three types of content: in-depth articles on specific results with AI’s role detailed (Features), practical guides by scientific field (Workflows), and field-coverage roundups (Field notes).

Two launch articles accompany this release: “Vibe physics: The AI grad student” by Matthew Schwartz (a physicist supervised by Claude on a real calculation), and a tutorial on orchestrating Claude Code for multi-day scientific tasks.

This blog is anchored in Anthropic’s existing initiatives: the AI for Science program (API credits for researchers), Claude for Life Sciences (partnerships with pharma and biotech), and the Genesis Mission.

🔗 Launch article 🔗 Announcement on X


Visual shopping in ChatGPT and Agentic Commerce Protocol

March 24 — OpenAI launches a visual and immersive shopping experience directly in ChatGPT. Users can browse products visually, compare them side by side with details (price, reviews, features), and refine their search in conversation — without leaving ChatGPT. It is also possible to upload an inspiration photo to find similar items.

To power this feature, OpenAI expands the Agentic Commerce Protocol (ACP) to product discovery. This protocol becomes the connection layer between merchants and users: merchants share their catalogs via ACP, and the data flows directly into ChatGPT. Salesforce and Stripe are already integrated as third-party providers.

| Detail | Info |
|---|---|
| Availability | All Free, Go, Plus, Pro users — rollout this week |
| Image upload | Inspiration photo to find similar items |
| Integrated merchants | Target, Sephora, Nordstrom, Lowe’s, Best Buy, The Home Depot, Wayfair |
| Shopify | Catalogs already integrated without merchant action |

Walmart is the first merchant to offer a native ChatGPT app: from discovery in ChatGPT to a Walmart environment with account linking, loyalty program, and payments. Available in the browser, with iOS and Android versions coming. Note: OpenAI is dropping its initial “Instant Checkout” feature, deemed not flexible enough for merchants, and is focusing on discovery.

🔗 Official announcement


OpenAI: open-source safety policies for teenagers

March 24 — OpenAI publishes an open-source set of safety policies to help developers build teen-appropriate experiences. These policies are provided as prompts that can be used directly with gpt-oss-safeguard, OpenAI’s open-weight safety model.

The goal: help developers translate abstract safety goals into precise operational rules. Six areas are covered:

| Area | Description |
|---|---|
| Graphic violent content | Filtering explicit violence |
| Graphic sexual content | Filtering explicit sexuality |
| Dangerous body ideals | Eating disorders, risky behaviors |
| Dangerous activities and challenges | Risky viral challenges |
| Romantic or violent role play | Inappropriate interactions |
| Adult-only goods and services | Alcohol, tobacco, gambling |
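In practice, a policy of this kind is just a prompt passed alongside the content to be classified. A minimal sketch of that composition — the template below is an assumption for illustration, not OpenAI's actual format (the real policy texts live in the RMC GitHub repository):

```python
def build_safeguard_prompt(policy: str, content: str) -> str:
    """Compose the input for a policy-conditioned safety model such as
    gpt-oss-safeguard: the policy states the rules, and the content is
    judged against them. Template structure is illustrative only."""
    return (
        "You are a content safety classifier. Apply the policy strictly.\n\n"
        f"## Policy\n{policy}\n\n"
        f"## Content\n{content}\n\n"
        "Answer with ALLOW or BLOCK and a one-line reason."
    )

policy = "Block depictions of risky viral challenges aimed at minors."
prompt = build_safeguard_prompt(policy, "Try this 48-hour blackout challenge!")
print(prompt.splitlines()[0])  # the classifier instruction line
```

Swapping the policy text changes the rules without retraining anything, which is what lets a single open-weight model cover all six areas above.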

These policies were developed with Common Sense Media and everyone.ai. Published via the ROOST Model Community (RMC GitHub), they are explicitly presented as a starting point, not a complete solution.

🔗 Official announcement


OpenAI Foundation: at least $1 billion deployed

March 24 — Bret Taylor, chairman of the OpenAI Foundation board, announces that the Foundation is beginning to deploy the resources resulting from the autumn 2025 recapitalization. At least $1 billion will be invested over the year in four areas: life sciences (Alzheimer’s, high-mortality diseases), employment and economic impact, AI resilience (child safety, biosecurity), and community programs.

🔗 Official announcement


OpenAI: Library tab to manage files in ChatGPT

March 23 — OpenAI adds new file management features in ChatGPT: recent files directly accessible from the toolbar, the ability to query an already uploaded document, and a Library tab in the web sidebar to find all files. Available for Plus, Pro, and Business subscribers, with rollout coming to the European Economic Area, Switzerland, and the United Kingdom.

🔗 Announcement on X


Gemini 3.1 Flash-Lite: a browser that generates pages in real time

March 24 — Google DeepMind publishes a demo of Gemini 3.1 Flash-Lite: an experimental browser that generates each web page on the fly as you click, search, and browse. No preexisting HTML page — each piece of content is created in real time by the model. The demo is accessible directly from Google AI Studio and generated strong engagement (85,000 views in a few hours).

🔗 AI Studio demo 🔗 Announcement on X


Google DeepMind × Agile Robots: robotics partnership

March 24 — Google DeepMind announces a research partnership with Agile Robots, a specialist in humanoid robotics. The agreement will integrate Gemini foundation models into Agile Robots’ robotic hardware as part of Google DeepMind’s Gemini Robotics strategy.

🔗 Announcement on X


Grok Imagine: multi-image video on API (#1 Arena Elo 1342)

March 24 — xAI announces two new capabilities for its Grok Imagine API: video generation from multiple images (multi-image to video) and video extension.

Developers can submit up to 7 input images to generate a coherent video via the grok-imagine-video model. The API works asynchronously: you submit the request, then poll until the status is done. Outputs support 16:9 at 720p.
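The submit-then-poll pattern generalizes to any asynchronous generation API. A sketch of the polling half, with the status fetcher injected so the loop itself is generic — in real use, get_status would be an HTTP GET against the Imagine API, whose exact endpoint paths and field names should be taken from the documentation, not from this example:

```python
import time
from typing import Callable, Dict

def poll_until_done(get_status: Callable[[str], Dict],
                    request_id: str,
                    interval_s: float = 0.0,
                    max_polls: int = 50) -> Dict:
    """Poll an asynchronous job until its status field reads 'done'.
    Raises TimeoutError if the job does not finish within max_polls."""
    for _ in range(max_polls):
        job = get_status(request_id)
        if job.get("status") == "done":
            return job
        time.sleep(interval_s)  # back off between polls
    raise TimeoutError(f"job {request_id} did not finish in {max_polls} polls")

# Simulated backend: the job finishes on the third poll.
_states = iter(["queued", "running", "done"])
def fake_status(request_id: str) -> Dict:
    return {"id": request_id, "status": next(_states)}

print(poll_until_done(fake_status, "job-123")["status"])  # done
```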

According to Design Arena, Grok Imagine immediately took the top spot in the Multi Image to Video Arena leaderboard with an Elo score of 1342.

🔗 @grok announcement 🔗 Imagine API documentation


GitHub Copilot: @copilot to edit a PR directly

March 24 — GitHub Copilot can now directly edit any pull request on demand. By mentioning @copilot in a comment with a natural-language instruction — fix failing tests, address a review comment, add a unit test — the agent works in its cloud environment, validates its work with tests and linters, then pushes the changes to the branch. The previous behavior (opening a new PR) remains available if explicitly requested. Available on all paid Copilot plans.

🔗 GitHub changelog


GitHub Copilot: Gemini 3.1 Pro in JetBrains, Xcode, and Eclipse

March 23 — GitHub Copilot expands Gemini 3.1 Pro availability to JetBrains, Xcode, and Eclipse IDEs. The model is now accessible via the Copilot model selector in all modes (agent, ask, edit) on these environments, in addition to the platforms already supported. In public preview for Enterprise, Business, Pro and Pro+ plans.

🔗 GitHub Changelog


GitHub Copilot: managing agent access by repository via API

March 24 — GitHub is releasing a public preview REST API to manage Copilot coding agent access at the organization repository level. Administrators can programmatically allow the agent on none, all, or specific repositories — useful for large-scale enterprise deployments.
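Such a call can be sketched as building the request before sending it with any HTTP client. The endpoint path and body schema below are assumptions for illustration — consult the GitHub changelog and REST API reference for the real contract:

```python
import json
from typing import List, Optional

def build_agent_access_request(org: str, mode: str,
                               repos: Optional[List[str]] = None) -> dict:
    """Describe a request that sets Copilot coding agent access for an
    organization: none, all, or a selected list of repositories.
    Path and field names are hypothetical, not GitHub's documented schema."""
    if mode not in ("none", "all", "selected"):
        raise ValueError("mode must be 'none', 'all', or 'selected'")
    body = {"enabled_repositories": mode}
    if mode == "selected":
        body["repositories"] = repos or []
    return {
        "method": "PUT",
        "path": f"/orgs/{org}/copilot/coding-agent/access",  # assumed path
        "body": json.dumps(body),
    }

req = build_agent_access_request("acme", "selected", ["acme/api", "acme/web"])
print(req["method"], req["path"])
```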

🔗 GitHub Changelog


GitHub Copilot: live logs in Raycast

March 20 — The GitHub Copilot extension for Raycast (the macOS/Windows launcher) now lets you monitor coding agent logs live without leaving the launcher. Via the “View Tasks” command, then selecting the session, developers can follow the agent’s progress in real time. Available for all paid Copilot subscribers.

🔗 GitHub Changelog


What this means

Claude Code’s auto mode is the most significant change in this period. It shifts the cognitive load away from the user — no need to approve each command anymore — while keeping a safety net through the classifier. It’s a step toward more autonomous development agents, but within a framework explicitly recommended for isolated environments. The engineering article on the multi-agent harness completes the picture: Anthropic’s trajectory is clearly toward agents that work for long periods autonomously, with internal supervision structures (dedicated evaluator, sprint contracts) rather than human oversight at every step.

On OpenAI’s side, visual shopping in ChatGPT marks a pivot toward mainstream commercial use cases. The Agentic Commerce Protocol positions ChatGPT as an intermediation layer between merchants and consumers — a different strategy from the pure API, which directly targets transactional value.

Grok Imagine reaching the top spot in the Arena ranking on launch with multi-image to video illustrates how quickly xAI iterates on video generation. GitHub Copilot, for its part, is systematically strengthening the autonomy of its coding agent: the ability to modify an existing PR directly further reduces back-and-forth between the agent and the developer.


Sources

This document was translated from the fr version to the en language using the gpt-5.4-mini model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator