Voice Mode in Claude Code, GPT-5.3 Instant for everyone, Gemini 3.1 Flash-Lite in preview

March 3, 2026 marks a busy day: Claude Code activates its Voice Mode in a gradual rollout, OpenAI rolls out GPT-5.3 Instant to all ChatGPT users with a notable reduction in hallucinations, and Google launches Gemini 3.1 Flash-Lite in preview — the most economical model in the Gemini 3 series. In parallel, OpenAI is already teasing GPT-5.4, FLUX.2 [pro] doubles its speed without quality loss, and Anthropic acknowledges an “unprecedented” growth that put its systems under pressure.

Voice Mode in Claude Code — push-to-talk, rollout ~5%

March 3, 2026 — Thariq (@trq212, Claude Code team at Anthropic) announces the gradual rollout of Voice Mode in Claude Code. The feature has been much anticipated by the developer community for several weeks.

How it works:

Aspect	Detail
Activation	Command `/voice` to enable/disable
Push-to-talk	Hold the space bar to speak, release to send
Transcription	Appears in real time in the terminal
Welcome note	Visible on the home screen when activated
Tokens	Voice transcription tokens do not count towards rate limits

Boris Cherny (@bcherny, lead Claude Code) confirms he uses this mode daily: he says he wrote “the majority of [his] CLI code this week” with Voice Mode. His feedback suggests the feature is ready for real work sessions, not just tests.

Availability: Active for about 5% of users as of March 3. The progressive rollout will continue in the following weeks. Free transcription (tokens outside quota) is a notable choice that removes a common friction for this type of feature.

The community is already requesting a bidirectional version — responses read aloud — as well as general availability of /remote-control. Both items remain on the roadmap.

🔗 Tweet @trq212 — Voice Mode rolling out 🔗 Tweet @bcherny — Experience report

GPT-5.3 Instant available for everyone — -26.8% hallucinations

March 3, 2026 — OpenAI deploys GPT-5.3 Instant to all ChatGPT users. This model replaces GPT-5.2 Instant as ChatGPT’s default model, with a primary focus on everyday quality rather than academic benchmarks.

The most concrete outcome of this update: a reduction in hallucinations.

Scenario	Hallucination reduction
With web access (high-stakes domains: medicine, law)	-26.8%
Without web access	-19.7%
User feedback (error reports)	-22.5% (web), -9.6% (without)

Other documented improvements in the OpenAI blog:

Fewer unnecessary refusals: reduction of defensive/moralizing preambles from GPT-5.2 — the model responds directly without superfluous warnings
Web search: better balance between web results and internal knowledge; fewer link lists, more relevant synthesis
Smoother tone: fewer assumptions about the user’s emotional state
Creative writing: more evocative and immersive prose

“GPT-5.3 Instant in ChatGPT is now rolling out to everyone. More accurate, less cringe.” — @OpenAI on X

API availability: ID gpt-5.3-chat-latest. GPT-5.2 Instant remains accessible in “Older models” for 3 months, then is retired on June 3, 2026. The Thinking and Pro updates are announced as “coming soon”.

Known limitation: tone in Japanese and Korean can still be rigid at times — being addressed.

🔗 GPT-5.3 Instant announcement 🔗 GPT-5.3 Instant System Card

Gemini 3.1 Flash-Lite — preview, 2.5× faster, $0.25/1M tokens

March 3, 2026 — Google launches Gemini 3.1 Flash-Lite in preview via the Gemini API in Google AI Studio and Vertex AI. It’s the most cost-effective model of the Gemini 3 series, designed for high-volume developer workloads.

Pricing and performance

Aspect	Value
Input price	$0.25 / 1M tokens
Output price	$1.50 / 1M tokens
Speed (TTFAT)	2.5× faster vs Gemini 2.5 Flash
Output speed	+45% vs Gemini 2.5 Flash (Artificial Analysis)
Elo score (Arena.ai)	1432
GPQA Diamond	86.9%
MMMU Pro	76.8%

These benchmarks place Flash-Lite above several larger previous-generation Gemini models — validating the efficiency approach of the 3.1 series.

Adaptive thinking levels

A notable feature: thinking levels (adaptive reasoning depths) are natively integrated into AI Studio and Vertex AI. Developers can dynamically adjust the depth of reasoning according to task complexity — useful for mixing low-cost simple tasks and complex analyses in the same pipeline without changing models.

Documented use cases

Large-scale multilingual translation, content moderation, e-commerce interface generation, dynamic dashboards, multi-step SaaS agents. Companies like Latitude, Cartwheel and Whering are already in early access.

🔗 Gemini 3.1 Flash-Lite announcement 🔗 Tweet @GoogleAI

Teaser GPT-5.4 — “5.4 sooner than you Think.”

March 3, 2026 — One hour after the GPT-5.3 Instant announcement, OpenAI posts a terse tweet: “5.4 sooner than you Think.” 800k views, 13k likes.

The unusual capitalization of “Think” is noted by the community — a possible reference to an improved thinking mode in GPT-5.4. No further details are available at this stage.

🔗 Teaser GPT-5.4 — @OpenAI

Claude scalability — unprecedented traffic, #1 App Store

March 3, 2026 — Later in the day, Thariq (@trq212) posts a message acknowledging scaling difficulties:

“We’ve seen unprecedented growth in Claude and Claude Code traffic this week that was genuinely hard to forecast. We appreciate you bearing with us as we scale.” — @trq212 on X

Context: Claude reached #1 in the App Store on March 1 (confirmed by Mike Krieger, CPO @mikeyk), and the Voice Mode launch generated an additional traffic spike. The npm package @anthropic-ai/claude-code records 9.5 million weekly downloads.

🔗 Tweet @trq212 — Scalability 🔗 Tweet @mikeyk — Claude #1 App Store

BFL FLUX.2 [pro] — 2× faster, same price, same quality

March 3, 2026 — Black Forest Labs announces a major update to FLUX.2 [pro]: the model is now 2× faster with no loss of quality and no price increase.

FLUX.2 [pro] covers three modes: text-to-image, image editing, and multi-reference. BFL’s tweet positions it as the “sweet spot of high quality + reasonable speed + broad capabilities” — notably for photorealism (product photos, graphic design) and consistent character rendering.

🔗 Tweet @bfl_ml — FLUX.2 [pro] update 🔗 FLUX.2 documentation

ElevenLabs at MWC — network voice assistant and Deloitte partnership

March 2, 2026 — ElevenLabs announces two partnerships from the Mobile World Congress Barcelona.

ElevenLabs × Deutsche Telekom — Magenta AI Call Assistant

Deutsche Telekom unveils the Magenta AI Call Assistant — presented as the first AI voice assistant integrated directly into the telecom network. Powered by ElevenLabs’ ElevenAgents platform, it works without an app to install, on any device capable of placing a call (smartphones and landlines).

Announced features: translation in 50 languages, smart call summarization, autonomous action within workflows.

ElevenLabs × Deloitte — omnichannel enterprise agents

ElevenLabs and Deloitte announce their first partnership. The goal: combine the ElevenLabs Agents platform with Deloitte’s consulting expertise to help companies deploy omnichannel voice agents — customer experience, sales, internal operations — integrated with existing enterprise systems. It’s ElevenLabs’ first partnership with a Big Four firm.

🔗 Tweet @elevenlabsio — Deutsche Telekom MWC 🔗 ElevenLabs × Deloitte blog

Briefs

Claude Code v2.1.64 (pre-release “next”)

Version 2.1.64 of Claude Code is published with tag next on npm — not yet promoted to latest (which remains 2.1.63) and absent from the official GitHub Releases. The changelog is not yet available; it’s likely a pre-release that includes Voice Mode.

🔗 npm @anthropic-ai/claude-code

Qwen 3.5 GPTQ-Int4 — quantization, vLLM and SGLang

March 3 — Alibaba/Qwen publishes GPTQ-Int4 weights for the Qwen 3.5 series with native vLLM and SGLang support. Result: less VRAM required, faster inference, easier local deployments on limited-GPU setups.

🔗 Tweet @Alibaba_Qwen — GPTQ-Int4

Qwen 3.5 Small on LM Studio, Ollama and MLX

March 2–3 — Qwen 3.5 Small models (0.8B–9B) are now available on the three main local inference platforms: LM Studio (~7 GB VRAM for 9B), Ollama and MLX. Local deployment is therefore operational the day after launch.

🔗 LM Studio · Ollama · MLX

Z.ai Startup Program — API credits and early access GLM-5

March 2 — Z.ai opens its Startup Program: free API credits, priority rate limits, early API access, and a dedicated community. Target: AI-native startups, agent builders, SaaS founders. The active model on the platform is GLM-5.

🔗 Tweet @Zai_org — Startup Program

March Pixel Drop — Gemini in apps, Circle to Search multi-object, Scam Detection in France

March 3 — March’s Pixel Drop brings several AI features to Pixel devices. Gemini can now perform tasks directly inside apps (commands, bookings, coffee — in beta). Circle to Search now recognizes all visible objects on a screen in a single search, with a “Try It On” button to virtually try clothes. Magic Cue suggests restaurants via Gemini directly in conversations. On the safety front, Scam Detection arrives in France, Italy, Spain, Mexico, Germany and Japan. Pixel Watch gains earthquake alerts and Satellite SOS in Europe and Canada.

🔗 March Pixel Drop — Google Blog

GPT-5.3 Instant System Card

The System Card accompanying GPT-5.3 Instant is published simultaneously. The safety approach is identical to GPT-5.2 Instant — the model is also referenced under gpt-5.3-instant.

🔗 GPT-5.3 Instant System Card

What this means

Voice Mode in Claude Code is the most structurally significant decision of the day for developers. Making transcription free (outside quota) removes the main economic barrier for this kind of feature — it’s a deliberate choice to maximize adoption, not a detail. Using the space bar as push-to-talk in a terminal is a minimalistic interface consistent with the tool.

On the model front, GPT-5.3 Instant and Gemini 3.1 Flash-Lite illustrate two different strategies: OpenAI improves the daily experience for the general public (fewer hallucinations, fewer unnecessary refusals), while Google optimizes cost/performance for high-volume API developers (2.5× faster, aggressive pricing). The GPT-5.4 teaser posted an hour after GPT-5.3’s launch suggests an accelerated deployment cadence at OpenAI in March 2026.

Anthropic’s mention of unprecedented traffic, combined with the #1 App Store ranking, confirms that Claude Code and the Claude app are moving from a niche to a much larger audience. The scalability issues are a sign of adoption exceeding projections, not a technical failure.

Sources - Tweet @trq212 — Voice Mode rolling out

This document was translated from the fr version into the en language using the gpt-5-mini model. For more information on the translation process, consult https://gitlab.com/jls42/ai-powered-markdown-translator