March 3, 2026 marks a busy day: Claude Code activates its Voice Mode in a gradual rollout, OpenAI rolls out GPT-5.3 Instant to all ChatGPT users with a notable reduction in hallucinations, and Google launches Gemini 3.1 Flash-Lite in preview โ the most economical model in the Gemini 3 series. In parallel, OpenAI is already teasing GPT-5.4, FLUX.2 [pro] doubles its speed without quality loss, and Anthropic acknowledges an โunprecedentedโ growth that put its systems under pressure.
Voice Mode in Claude Code โ push-to-talk, rollout ~5%
March 3, 2026 โ Thariq (@trq212, Claude Code team at Anthropic) announces the gradual rollout of Voice Mode in Claude Code. The feature has been much anticipated by the developer community for several weeks.
How it works:
| Aspect | Detail |
|---|---|
| Activation | Command /voice to enable/disable |
| Push-to-talk | Hold the space bar to speak, release to send |
| Transcription | Appears in real time in the terminal |
| Welcome note | Visible on the home screen when activated |
| Tokens | Voice transcription tokens do not count towards rate limits |
Boris Cherny (@bcherny, lead Claude Code) confirms he uses this mode daily: he says he wrote โthe majority of [his] CLI code this weekโ with Voice Mode. His feedback suggests the feature is ready for real work sessions, not just tests.
Availability: Active for about 5% of users as of March 3. The progressive rollout will continue in the following weeks. Free transcription (tokens outside quota) is a notable choice that removes a common friction for this type of feature.
The community is already requesting a bidirectional version โ responses read aloud โ as well as general availability of /remote-control. Both items remain on the roadmap.
๐ Tweet @trq212 โ Voice Mode rolling out ๐ Tweet @bcherny โ Experience report
GPT-5.3 Instant available for everyone โ -26.8% hallucinations
March 3, 2026 โ OpenAI deploys GPT-5.3 Instant to all ChatGPT users. This model replaces GPT-5.2 Instant as ChatGPTโs default model, with a primary focus on everyday quality rather than academic benchmarks.
The most concrete outcome of this update: a reduction in hallucinations.
| Scenario | Hallucination reduction |
|---|---|
| With web access (high-stakes domains: medicine, law) | -26.8% |
| Without web access | -19.7% |
| User feedback (error reports) | -22.5% (web), -9.6% (without) |
Other documented improvements in the OpenAI blog:
- Fewer unnecessary refusals: reduction of defensive/moralizing preambles from GPT-5.2 โ the model responds directly without superfluous warnings
- Web search: better balance between web results and internal knowledge; fewer link lists, more relevant synthesis
- Smoother tone: fewer assumptions about the userโs emotional state
- Creative writing: more evocative and immersive prose
โGPT-5.3 Instant in ChatGPT is now rolling out to everyone. More accurate, less cringe.โ โ @OpenAI on X
API availability: ID gpt-5.3-chat-latest. GPT-5.2 Instant remains accessible in โOlder modelsโ for 3 months, then is retired on June 3, 2026. The Thinking and Pro updates are announced as โcoming soonโ.
Known limitation: tone in Japanese and Korean can still be rigid at times โ being addressed.
๐ GPT-5.3 Instant announcement ๐ GPT-5.3 Instant System Card
Gemini 3.1 Flash-Lite โ preview, 2.5ร faster, $0.25/1M tokens
March 3, 2026 โ Google launches Gemini 3.1 Flash-Lite in preview via the Gemini API in Google AI Studio and Vertex AI. Itโs the most cost-effective model of the Gemini 3 series, designed for high-volume developer workloads.
Pricing and performance
| Aspect | Value |
|---|---|
| Input price | $0.25 / 1M tokens |
| Output price | $1.50 / 1M tokens |
| Speed (TTFAT) | 2.5ร faster vs Gemini 2.5 Flash |
| Output speed | +45% vs Gemini 2.5 Flash (Artificial Analysis) |
| Elo score (Arena.ai) | 1432 |
| GPQA Diamond | 86.9% |
| MMMU Pro | 76.8% |
These benchmarks place Flash-Lite above several larger previous-generation Gemini models โ validating the efficiency approach of the 3.1 series.
Adaptive thinking levels
A notable feature: thinking levels (adaptive reasoning depths) are natively integrated into AI Studio and Vertex AI. Developers can dynamically adjust the depth of reasoning according to task complexity โ useful for mixing low-cost simple tasks and complex analyses in the same pipeline without changing models.
Documented use cases
Large-scale multilingual translation, content moderation, e-commerce interface generation, dynamic dashboards, multi-step SaaS agents. Companies like Latitude, Cartwheel and Whering are already in early access.
๐ Gemini 3.1 Flash-Lite announcement ๐ Tweet @GoogleAI
Teaser GPT-5.4 โ โ5.4 sooner than you Think.โ
March 3, 2026 โ One hour after the GPT-5.3 Instant announcement, OpenAI posts a terse tweet: โ5.4 sooner than you Think.โ 800k views, 13k likes.
The unusual capitalization of โThinkโ is noted by the community โ a possible reference to an improved thinking mode in GPT-5.4. No further details are available at this stage.
๐ Teaser GPT-5.4 โ @OpenAI
Claude scalability โ unprecedented traffic, #1 App Store
March 3, 2026 โ Later in the day, Thariq (@trq212) posts a message acknowledging scaling difficulties:
โWeโve seen unprecedented growth in Claude and Claude Code traffic this week that was genuinely hard to forecast. We appreciate you bearing with us as we scale.โ โ @trq212 on X
Context: Claude reached #1 in the App Store on March 1 (confirmed by Mike Krieger, CPO @mikeyk), and the Voice Mode launch generated an additional traffic spike. The npm package @anthropic-ai/claude-code records 9.5 million weekly downloads.
๐ Tweet @trq212 โ Scalability ๐ Tweet @mikeyk โ Claude #1 App Store
BFL FLUX.2 [pro] โ 2ร faster, same price, same quality
March 3, 2026 โ Black Forest Labs announces a major update to FLUX.2 [pro]: the model is now 2ร faster with no loss of quality and no price increase.
FLUX.2 [pro] covers three modes: text-to-image, image editing, and multi-reference. BFLโs tweet positions it as the โsweet spot of high quality + reasonable speed + broad capabilitiesโ โ notably for photorealism (product photos, graphic design) and consistent character rendering.
๐ Tweet @bfl_ml โ FLUX.2 [pro] update ๐ FLUX.2 documentation
ElevenLabs at MWC โ network voice assistant and Deloitte partnership
March 2, 2026 โ ElevenLabs announces two partnerships from the Mobile World Congress Barcelona.
ElevenLabs ร Deutsche Telekom โ Magenta AI Call Assistant
Deutsche Telekom unveils the Magenta AI Call Assistant โ presented as the first AI voice assistant integrated directly into the telecom network. Powered by ElevenLabsโ ElevenAgents platform, it works without an app to install, on any device capable of placing a call (smartphones and landlines).
Announced features: translation in 50 languages, smart call summarization, autonomous action within workflows.
ElevenLabs ร Deloitte โ omnichannel enterprise agents
ElevenLabs and Deloitte announce their first partnership. The goal: combine the ElevenLabs Agents platform with Deloitteโs consulting expertise to help companies deploy omnichannel voice agents โ customer experience, sales, internal operations โ integrated with existing enterprise systems. Itโs ElevenLabsโ first partnership with a Big Four firm.
๐ Tweet @elevenlabsio โ Deutsche Telekom MWC ๐ ElevenLabs ร Deloitte blog
Briefs
Claude Code v2.1.64 (pre-release โnextโ)
Version 2.1.64 of Claude Code is published with tag next on npm โ not yet promoted to latest (which remains 2.1.63) and absent from the official GitHub Releases. The changelog is not yet available; itโs likely a pre-release that includes Voice Mode.
๐ npm @anthropic-ai/claude-code
Qwen 3.5 GPTQ-Int4 โ quantization, vLLM and SGLang
March 3 โ Alibaba/Qwen publishes GPTQ-Int4 weights for the Qwen 3.5 series with native vLLM and SGLang support. Result: less VRAM required, faster inference, easier local deployments on limited-GPU setups.
๐ Tweet @Alibaba_Qwen โ GPTQ-Int4
Qwen 3.5 Small on LM Studio, Ollama and MLX
March 2โ3 โ Qwen 3.5 Small models (0.8Bโ9B) are now available on the three main local inference platforms: LM Studio (~7 GB VRAM for 9B), Ollama and MLX. Local deployment is therefore operational the day after launch.
๐ LM Studio ยท Ollama ยท MLX
Z.ai Startup Program โ API credits and early access GLM-5
March 2 โ Z.ai opens its Startup Program: free API credits, priority rate limits, early API access, and a dedicated community. Target: AI-native startups, agent builders, SaaS founders. The active model on the platform is GLM-5.
๐ Tweet @Zai_org โ Startup Program
March Pixel Drop โ Gemini in apps, Circle to Search multi-object, Scam Detection in France
March 3 โ Marchโs Pixel Drop brings several AI features to Pixel devices. Gemini can now perform tasks directly inside apps (commands, bookings, coffee โ in beta). Circle to Search now recognizes all visible objects on a screen in a single search, with a โTry It Onโ button to virtually try clothes. Magic Cue suggests restaurants via Gemini directly in conversations. On the safety front, Scam Detection arrives in France, Italy, Spain, Mexico, Germany and Japan. Pixel Watch gains earthquake alerts and Satellite SOS in Europe and Canada.
๐ March Pixel Drop โ Google Blog
GPT-5.3 Instant System Card
The System Card accompanying GPT-5.3 Instant is published simultaneously. The safety approach is identical to GPT-5.2 Instant โ the model is also referenced under gpt-5.3-instant.
๐ GPT-5.3 Instant System Card
What this means
Voice Mode in Claude Code is the most structurally significant decision of the day for developers. Making transcription free (outside quota) removes the main economic barrier for this kind of feature โ itโs a deliberate choice to maximize adoption, not a detail. Using the space bar as push-to-talk in a terminal is a minimalistic interface consistent with the tool.
On the model front, GPT-5.3 Instant and Gemini 3.1 Flash-Lite illustrate two different strategies: OpenAI improves the daily experience for the general public (fewer hallucinations, fewer unnecessary refusals), while Google optimizes cost/performance for high-volume API developers (2.5ร faster, aggressive pricing). The GPT-5.4 teaser posted an hour after GPT-5.3โs launch suggests an accelerated deployment cadence at OpenAI in March 2026.
Anthropicโs mention of unprecedented traffic, combined with the #1 App Store ranking, confirms that Claude Code and the Claude app are moving from a niche to a much larger audience. The scalability issues are a sign of adoption exceeding projections, not a technical failure.
Sources - Tweet @trq212 โ Voice Mode rolling out
- Tweet @bcherny โ Voice Mode feedback
- Tweet @trq212 โ Claude scalability
- Tweet @mikeyk โ Claude #1 on the App Store
- npm @anthropic-ai/claude-code โ v2.1.64 next
- GPT-5.3 Instant announcement โ OpenAI
- GPT-5.3 Instant System Card
- Tweet @OpenAI โ GPT-5.3 Instant
- Teaser GPT-5.4 โ @OpenAI
- Gemini 3.1 Flash-Lite announcement โ Google Blog
- Tweet @GoogleAI โ Gemini 3.1 Flash-Lite
- Tweet @bfl_ml โ FLUX.2 [pro] 2ร faster
- Tweet @elevenlabsio โ Deutsche Telekom MWC
- ElevenLabs ร Deloitte blog
- Tweet @Alibaba_Qwen โ GPTQ-Int4
- Tweet @Alibaba_Qwen โ LM Studio
- Tweet @Alibaba_Qwen โ Ollama
- Tweet @Alibaba_Qwen โ MLX
- Tweet @Zai_org โ Z.ai Startup Program
- March Pixel Drop โ Google Blog
This document was translated from the fr version into the en language using the gpt-5-mini model. For more information on the translation process, consult https://gitlab.com/jls42/ai-powered-markdown-translator