Search

Voice Mode in Claude Code, GPT-5.3 Instant for everyone, Gemini 3.1 Flash-Lite in preview

Voice Mode in Claude Code, GPT-5.3 Instant for everyone, Gemini 3.1 Flash-Lite in preview

March 3, 2026 marks a busy day: Claude Code activates its Voice Mode in a gradual rollout, OpenAI rolls out GPT-5.3 Instant to all ChatGPT users with a notable reduction in hallucinations, and Google launches Gemini 3.1 Flash-Lite in preview โ€” the most economical model in the Gemini 3 series. In parallel, OpenAI is already teasing GPT-5.4, FLUX.2 [pro] doubles its speed without quality loss, and Anthropic acknowledges an โ€œunprecedentedโ€ growth that put its systems under pressure.


Voice Mode in Claude Code โ€” push-to-talk, rollout ~5%

March 3, 2026 โ€” Thariq (@trq212, Claude Code team at Anthropic) announces the gradual rollout of Voice Mode in Claude Code. The feature has been much anticipated by the developer community for several weeks.

How it works:

AspectDetail
ActivationCommand /voice to enable/disable
Push-to-talkHold the space bar to speak, release to send
TranscriptionAppears in real time in the terminal
Welcome noteVisible on the home screen when activated
TokensVoice transcription tokens do not count towards rate limits

Boris Cherny (@bcherny, lead Claude Code) confirms he uses this mode daily: he says he wrote โ€œthe majority of [his] CLI code this weekโ€ with Voice Mode. His feedback suggests the feature is ready for real work sessions, not just tests.

Availability: Active for about 5% of users as of March 3. The progressive rollout will continue in the following weeks. Free transcription (tokens outside quota) is a notable choice that removes a common friction for this type of feature.

The community is already requesting a bidirectional version โ€” responses read aloud โ€” as well as general availability of /remote-control. Both items remain on the roadmap.

๐Ÿ”— Tweet @trq212 โ€” Voice Mode rolling out ๐Ÿ”— Tweet @bcherny โ€” Experience report


GPT-5.3 Instant available for everyone โ€” -26.8% hallucinations

March 3, 2026 โ€” OpenAI deploys GPT-5.3 Instant to all ChatGPT users. This model replaces GPT-5.2 Instant as ChatGPTโ€™s default model, with a primary focus on everyday quality rather than academic benchmarks.

The most concrete outcome of this update: a reduction in hallucinations.

ScenarioHallucination reduction
With web access (high-stakes domains: medicine, law)-26.8%
Without web access-19.7%
User feedback (error reports)-22.5% (web), -9.6% (without)

Other documented improvements in the OpenAI blog:

  • Fewer unnecessary refusals: reduction of defensive/moralizing preambles from GPT-5.2 โ€” the model responds directly without superfluous warnings
  • Web search: better balance between web results and internal knowledge; fewer link lists, more relevant synthesis
  • Smoother tone: fewer assumptions about the userโ€™s emotional state
  • Creative writing: more evocative and immersive prose

โ€œGPT-5.3 Instant in ChatGPT is now rolling out to everyone. More accurate, less cringe.โ€ โ€” @OpenAI on X

API availability: ID gpt-5.3-chat-latest. GPT-5.2 Instant remains accessible in โ€œOlder modelsโ€ for 3 months, then is retired on June 3, 2026. The Thinking and Pro updates are announced as โ€œcoming soonโ€.

Known limitation: tone in Japanese and Korean can still be rigid at times โ€” being addressed.

๐Ÿ”— GPT-5.3 Instant announcement ๐Ÿ”— GPT-5.3 Instant System Card


Gemini 3.1 Flash-Lite โ€” preview, 2.5ร— faster, $0.25/1M tokens

March 3, 2026 โ€” Google launches Gemini 3.1 Flash-Lite in preview via the Gemini API in Google AI Studio and Vertex AI. Itโ€™s the most cost-effective model of the Gemini 3 series, designed for high-volume developer workloads.

Pricing and performance

AspectValue
Input price$0.25 / 1M tokens
Output price$1.50 / 1M tokens
Speed (TTFAT)2.5ร— faster vs Gemini 2.5 Flash
Output speed+45% vs Gemini 2.5 Flash (Artificial Analysis)
Elo score (Arena.ai)1432
GPQA Diamond86.9%
MMMU Pro76.8%

These benchmarks place Flash-Lite above several larger previous-generation Gemini models โ€” validating the efficiency approach of the 3.1 series.

Adaptive thinking levels

A notable feature: thinking levels (adaptive reasoning depths) are natively integrated into AI Studio and Vertex AI. Developers can dynamically adjust the depth of reasoning according to task complexity โ€” useful for mixing low-cost simple tasks and complex analyses in the same pipeline without changing models.

Documented use cases

Large-scale multilingual translation, content moderation, e-commerce interface generation, dynamic dashboards, multi-step SaaS agents. Companies like Latitude, Cartwheel and Whering are already in early access.

๐Ÿ”— Gemini 3.1 Flash-Lite announcement ๐Ÿ”— Tweet @GoogleAI


Teaser GPT-5.4 โ€” โ€œ5.4 sooner than you Think.โ€

March 3, 2026 โ€” One hour after the GPT-5.3 Instant announcement, OpenAI posts a terse tweet: โ€œ5.4 sooner than you Think.โ€ 800k views, 13k likes.

The unusual capitalization of โ€œThinkโ€ is noted by the community โ€” a possible reference to an improved thinking mode in GPT-5.4. No further details are available at this stage.

๐Ÿ”— Teaser GPT-5.4 โ€” @OpenAI


Claude scalability โ€” unprecedented traffic, #1 App Store

March 3, 2026 โ€” Later in the day, Thariq (@trq212) posts a message acknowledging scaling difficulties:

โ€œWeโ€™ve seen unprecedented growth in Claude and Claude Code traffic this week that was genuinely hard to forecast. We appreciate you bearing with us as we scale.โ€ โ€” @trq212 on X

Context: Claude reached #1 in the App Store on March 1 (confirmed by Mike Krieger, CPO @mikeyk), and the Voice Mode launch generated an additional traffic spike. The npm package @anthropic-ai/claude-code records 9.5 million weekly downloads.

๐Ÿ”— Tweet @trq212 โ€” Scalability ๐Ÿ”— Tweet @mikeyk โ€” Claude #1 App Store


BFL FLUX.2 [pro] โ€” 2ร— faster, same price, same quality

March 3, 2026 โ€” Black Forest Labs announces a major update to FLUX.2 [pro]: the model is now 2ร— faster with no loss of quality and no price increase.

FLUX.2 [pro] covers three modes: text-to-image, image editing, and multi-reference. BFLโ€™s tweet positions it as the โ€œsweet spot of high quality + reasonable speed + broad capabilitiesโ€ โ€” notably for photorealism (product photos, graphic design) and consistent character rendering.

๐Ÿ”— Tweet @bfl_ml โ€” FLUX.2 [pro] update ๐Ÿ”— FLUX.2 documentation


ElevenLabs at MWC โ€” network voice assistant and Deloitte partnership

March 2, 2026 โ€” ElevenLabs announces two partnerships from the Mobile World Congress Barcelona.

ElevenLabs ร— Deutsche Telekom โ€” Magenta AI Call Assistant

Deutsche Telekom unveils the Magenta AI Call Assistant โ€” presented as the first AI voice assistant integrated directly into the telecom network. Powered by ElevenLabsโ€™ ElevenAgents platform, it works without an app to install, on any device capable of placing a call (smartphones and landlines).

Announced features: translation in 50 languages, smart call summarization, autonomous action within workflows.

ElevenLabs ร— Deloitte โ€” omnichannel enterprise agents

ElevenLabs and Deloitte announce their first partnership. The goal: combine the ElevenLabs Agents platform with Deloitteโ€™s consulting expertise to help companies deploy omnichannel voice agents โ€” customer experience, sales, internal operations โ€” integrated with existing enterprise systems. Itโ€™s ElevenLabsโ€™ first partnership with a Big Four firm.

๐Ÿ”— Tweet @elevenlabsio โ€” Deutsche Telekom MWC ๐Ÿ”— ElevenLabs ร— Deloitte blog


Briefs

Claude Code v2.1.64 (pre-release โ€œnextโ€)

Version 2.1.64 of Claude Code is published with tag next on npm โ€” not yet promoted to latest (which remains 2.1.63) and absent from the official GitHub Releases. The changelog is not yet available; itโ€™s likely a pre-release that includes Voice Mode.

๐Ÿ”— npm @anthropic-ai/claude-code

Qwen 3.5 GPTQ-Int4 โ€” quantization, vLLM and SGLang

March 3 โ€” Alibaba/Qwen publishes GPTQ-Int4 weights for the Qwen 3.5 series with native vLLM and SGLang support. Result: less VRAM required, faster inference, easier local deployments on limited-GPU setups.

๐Ÿ”— Tweet @Alibaba_Qwen โ€” GPTQ-Int4

Qwen 3.5 Small on LM Studio, Ollama and MLX

March 2โ€“3 โ€” Qwen 3.5 Small models (0.8Bโ€“9B) are now available on the three main local inference platforms: LM Studio (~7 GB VRAM for 9B), Ollama and MLX. Local deployment is therefore operational the day after launch.

๐Ÿ”— LM Studio ยท Ollama ยท MLX

Z.ai Startup Program โ€” API credits and early access GLM-5

March 2 โ€” Z.ai opens its Startup Program: free API credits, priority rate limits, early API access, and a dedicated community. Target: AI-native startups, agent builders, SaaS founders. The active model on the platform is GLM-5.

๐Ÿ”— Tweet @Zai_org โ€” Startup Program

March Pixel Drop โ€” Gemini in apps, Circle to Search multi-object, Scam Detection in France

March 3 โ€” Marchโ€™s Pixel Drop brings several AI features to Pixel devices. Gemini can now perform tasks directly inside apps (commands, bookings, coffee โ€” in beta). Circle to Search now recognizes all visible objects on a screen in a single search, with a โ€œTry It Onโ€ button to virtually try clothes. Magic Cue suggests restaurants via Gemini directly in conversations. On the safety front, Scam Detection arrives in France, Italy, Spain, Mexico, Germany and Japan. Pixel Watch gains earthquake alerts and Satellite SOS in Europe and Canada.

๐Ÿ”— March Pixel Drop โ€” Google Blog

GPT-5.3 Instant System Card

The System Card accompanying GPT-5.3 Instant is published simultaneously. The safety approach is identical to GPT-5.2 Instant โ€” the model is also referenced under gpt-5.3-instant.

๐Ÿ”— GPT-5.3 Instant System Card


What this means

Voice Mode in Claude Code is the most structurally significant decision of the day for developers. Making transcription free (outside quota) removes the main economic barrier for this kind of feature โ€” itโ€™s a deliberate choice to maximize adoption, not a detail. Using the space bar as push-to-talk in a terminal is a minimalistic interface consistent with the tool.

On the model front, GPT-5.3 Instant and Gemini 3.1 Flash-Lite illustrate two different strategies: OpenAI improves the daily experience for the general public (fewer hallucinations, fewer unnecessary refusals), while Google optimizes cost/performance for high-volume API developers (2.5ร— faster, aggressive pricing). The GPT-5.4 teaser posted an hour after GPT-5.3โ€™s launch suggests an accelerated deployment cadence at OpenAI in March 2026.

Anthropicโ€™s mention of unprecedented traffic, combined with the #1 App Store ranking, confirms that Claude Code and the Claude app are moving from a niche to a much larger audience. The scalability issues are a sign of adoption exceeding projections, not a technical failure.


Sources - Tweet @trq212 โ€” Voice Mode rolling out

This document was translated from the fr version into the en language using the gpt-5-mini model. For more information on the translation process, consult https://gitlab.com/jls42/ai-powered-markdown-translator