Grok STT and TTS APIs at bargain prices, Claude for Word, Midjourney V8.1

On April 17, xAI launched two audio APIs, speech recognition (Speech to Text) and speech synthesis (Text to Speech), with pricing that undercuts all established competitors. Anthropic makes Claude accessible directly in Microsoft Word for its Pro, Max, Team, and Enterprise subscribers. Midjourney rolls out V8.1 with native 2K rendering, three times faster and three times cheaper than V8. Meanwhile, Luma and Wonder Project open Innovative Dreams, backed by AWS; MiniMax partners with NousResearch for MaxHermes; Kimi publishes a cross-datacenter inference architecture; and Google enriches Chrome with Gemini Skills.


Grok STT and TTS — the cheapest audio APIs on the market

April 17 — xAI launches two standalone audio APIs at once: a speech recognition API (Speech to Text, STT) and a speech synthesis API (Text to Speech, TTS). The pricing pitch is blunt: xAI positions both as the cheapest in their respective segments.

STT API (speech recognition)

Grok’s STT API offers two modes: REST batch at $0.10 per hour of audio and WebSocket streaming at $0.20 per hour. For the same two modes, ElevenLabs charges $0.22 and $0.39, AssemblyAI $0.21 and $0.45, and Deepgram $0.31 and $0.55.

| Provider | Batch (REST) | Streaming (WebSocket) |
|---|---|---|
| Grok | $0.10/h | $0.20/h |
| ElevenLabs | $0.22/h | $0.39/h |
| AssemblyAI | $0.21/h | $0.45/h |
| Deepgram | $0.31/h | $0.55/h |
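Because both modes are billed per hour of audio, comparing providers for a given workload is simple multiplication. The Python sketch below uses the list prices from the table above; the 10,000-hour monthly volume is a hypothetical example, and the sketch assumes purely linear billing with no volume discounts, minimums, or per-request rounding.

```python
# List prices from the table above, in USD per hour of audio.
STT_PRICES = {
    "Grok": {"batch": 0.10, "streaming": 0.20},
    "ElevenLabs": {"batch": 0.22, "streaming": 0.39},
    "AssemblyAI": {"batch": 0.21, "streaming": 0.45},
    "Deepgram": {"batch": 0.31, "streaming": 0.55},
}


def monthly_cost(provider: str, mode: str, hours: float) -> float:
    """Cost in USD for `hours` of audio, assuming purely linear per-hour billing."""
    return STT_PRICES[provider][mode] * hours


def price_ratio(provider: str, mode: str, baseline: str = "Grok") -> float:
    """How many times more expensive `provider` is than `baseline` for one mode."""
    return STT_PRICES[provider][mode] / STT_PRICES[baseline][mode]


if __name__ == "__main__":
    hours = 10_000  # hypothetical workload: 10,000 hours of audio per month
    for name in STT_PRICES:
        print(f"{name:<11} batch ${monthly_cost(name, 'batch', hours):>9,.2f}  "
              f"streaming ${monthly_cost(name, 'streaming', hours):>9,.2f}  "
              f"({price_ratio(name, 'streaming'):.2f}x Grok streaming)")
```

At these list prices, 10,000 hours of streaming transcription cost $2,000 with Grok versus $5,500 with Deepgram, a ratio of 2.75×.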

In terms of accuracy, Grok STT’s overall word error rate (WER) is 6.9%, compared with 9.0% for ElevenLabs, 11.0% for Deepgram, and 12.9% for AssemblyAI. Grok STT covers 25+ languages and offers word-level timestamps, speaker diarization (identifying who is speaking), multichannel support, and inverse text normalization (rendering spoken numbers and dates in written form, e.g. “April seventeenth” as “April 17”).
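WER is conventionally computed as (S + D + I) / N: the substitutions, deletions, and insertions needed to turn the model’s transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below computes it with a word-level edit distance; the whitespace tokenization and lowercasing are simplifying assumptions, and the benchmark behind xAI’s figures may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference.

    Computed as the word-level Levenshtein distance, with naive lowercasing
    and whitespace tokenization (real benchmarks normalize text more carefully).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev holds the previous DP row: edit distances from the first i-1
    # reference words to every prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion: reference word missing
                          curr[j - 1] + 1,     # insertion: extra hypothesis word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A WER of 6.9% therefore means roughly 7 word errors per 100 reference words. Because insertions count, WER can exceed 100% on a very poor transcript.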

TTS API (speech synthesis)

Grok’s TTS API is priced at $4.20 per million characters, compared with $30 from OpenAI, $40 from InWorld, $46.70 from Cartesia, and $50 from ElevenLabs. The API supports both REST and WebSocket streaming, and it introduces expressive tags ([laugh], [sigh], [whisper], <emphasis>, <slow>, <pause>) that control the tone and rhythm of the synthesized speech.

| Provider | Price per million characters |
|---|---|
| Grok | $4.20 |
| OpenAI | $30.00 |
| InWorld | $40.00 |
| Cartesia | $46.70 |
| ElevenLabs | $50.00 |
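Per-character pricing makes the gap easy to quantify. This sketch estimates what synthesizing a given text would cost at each listed price. It assumes every input character is billed, including spaces and expressive tags such as [laugh]; whether xAI or the other providers count tags toward billable characters is not stated in the announcement. The 500,000-character audiobook is a hypothetical workload.

```python
# List prices from the table above, in USD per million characters.
TTS_PRICE_PER_MILLION = {
    "Grok": 4.20,
    "OpenAI": 30.00,
    "InWorld": 40.00,
    "Cartesia": 46.70,
    "ElevenLabs": 50.00,
}


def tts_cost(text: str, provider: str) -> float:
    """Estimated cost in USD to synthesize `text`.

    Assumption: every input character (whitespace and tags included) is billed.
    """
    return len(text) * TTS_PRICE_PER_MILLION[provider] / 1_000_000


def multiple_of_grok(provider: str) -> float:
    """How many times Grok's per-character price the provider charges."""
    return TTS_PRICE_PER_MILLION[provider] / TTS_PRICE_PER_MILLION["Grok"]


if __name__ == "__main__":
    book = "x" * 500_000  # hypothetical 500,000-character audiobook
    for name in TTS_PRICE_PER_MILLION:
        print(f"{name:<10} ${tts_cost(book, name):6.2f}  "
              f"({multiple_of_grok(name):.1f}x Grok)")
```

Under that assumption the 500,000-character text costs about $2.10 with Grok and $25.00 with ElevenLabs; the competitors’ prices work out to roughly 7.1× (OpenAI) to 11.9× (ElevenLabs) Grok’s.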

xAI announces the launch of Grok speech to text and text to speech APIs. Grok STT has the world’s lowest word error rate and price. Grok TTS has the world’s most expressive voice and lowest price. — @xai on X

🔗 xAI announcement 🔗 Tweet @xai


Claude for Word — the Microsoft extension in beta

April 17 — Anthropic launches Claude for Word in beta for Pro, Max, Team, and Enterprise subscribers. The extension integrates directly into the Microsoft Word interface — with no separate window — and works at the document level.

| Feature | Description |
|---|---|
| Native tracked changes | All changes made by Claude appear as Word revisions that can be accepted or rejected |
| Comment management | Claude reads comments, edits the anchored text, and replies in the thread |
| Format preservation | Inherits heading styles, numbering, and defined terms |
| Cross-context | Shares context with the Excel and PowerPoint add-ins within the same conversation |
| Enterprise security | Sign-in via a Claude account or an existing cloud provider |

The supported formats are .docx and .docm. The extension is installed via the Microsoft Marketplace under the ID WA200010453.

🔗 claude.com/claude-for-word 🔗 Tweet @claudeai


Midjourney V8.1 — native 2K rendering, 3× faster

April 14 — Midjourney released V8.1 of its image generator. The update brings native 2K HD rendering, generates images three times faster than V8, and costs one-third as much.

V8.1 is a significant refinement of the V8 engine: images are rendered directly at 2K instead of being upscaled afterwards, which improves fine-detail fidelity and reduces the artifacts typical of enlargement steps. This combination of speed, price, and resolution makes V8.1 the most accessible version in the V8 lineup.


Luma × Wonder Project — the Innovative Dreams studio, backed by AWS

April 16 — Luma AI and Wonder Project (a faith & values production studio and Prime Video partner) jointly announce the launch of Innovative Dreams — a new film production company, R&D lab, and VFX enterprise, supported and funded by Amazon Web Services (AWS).

Innovative Dreams is presented as the first studio to deploy Realtime Hybrid Filmmaking at scale — an approach that combines performance capture, virtual production, and generative AI (notably Luma Agents) across every stage of production: concept, pre-visualization, shooting, and post-production.

| Aspect | Detail |
|---|---|
| CEO | Jon Erwin (Wonder Project founder) |
| CTO / Luma | Amit Jain (CEO of Luma AI) |
| Infrastructure | AWS cloud and AI for R&D and virtual production tools |
| Technology | Luma Agents + Realtime Hybrid Filmmaking |
| Site | MBS Media Campus, Manhattan Beach, California |
| First project | “The Old Stories: Moses” (3 episodes) with Ben Kingsley and O-T Fagbenle, for Prime Video |

The “Realtime Hybrid Filmmaking” approach removes the traditional delays between shooting, rendering, and editing. Actors can react to digital environments in real time, shortening the distance between creative idea and final pixel while preserving human performance. Innovative Dreams also offers its tools to other Hollywood studios.

🔗 Luma announcement 🔗 Tweet @LumaLabsAI


MiniMax M2.7 × NousResearch — MaxHermes, Hermes Agent with no setup

April 16 — MiniMax announces a deepened partnership with NousResearch to integrate the M2.7 model into the Hermes Agent harness. The announcement introduces MaxHermes — a managed cloud version of Hermes Agent accessible directly from @MiniMaxAgent, with no terminal setup or local installation.

Developing M2.7 and Hermes Agent together is meant to produce more capable agents: Hermes’ self-improving loop is designed to get the most out of M2.7 on agentic tasks. Users who run Hermes locally can also connect their agent to MaxHermes to use the managed cloud infrastructure.

🔗 Tweet @MiniMax_AI


Gemini Skills in Chrome — your prompts in one click

April 14 — Google Chrome integrates a new feature called “Skills” for Gemini in the browser. You can now save your most useful prompts and run them again with a single click, without retyping. A library of predefined prompts is also available to get started quickly.

The feature was announced on April 14 and confirmed as available on April 15, 2026, then included in the April 17 @GoogleAI weekly recap.

🔗 Tweet @googlechrome (Apr. 14) 🔗 Tweet @googlechrome (Apr. 15)


Gemini API — Prepay Billing in Google AI Studio

April 15 — Google AI Studio introduces “Prepay Billing” for the Gemini API. Developers can now buy credits in advance and consume them as they go, eliminating end-of-month billing surprises.

Automatic top-up is available when the balance is low. The feature is compatible with Spend Caps (launched previously) and Usage Tiers. It is available in the United States for new Google Cloud billing accounts, with a global rollout in the coming weeks. Established accounts with high usage levels will be able to switch to postpaid.
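The prepaid model can be pictured as a balance that each request draws down, with an automatic top-up below a threshold and a spend cap acting as a separate ceiling. The toy Python model below is a hypothetical illustration of how those pieces interact, not Google’s billing API: the class name, the threshold and top-up amounts, and the rule of refusing a request rather than letting the balance go negative are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrepaidAccount:
    """Hypothetical model of prepaid API credits (not Google's actual API).

    All amounts are in USD and purely illustrative.
    """
    balance: float
    auto_top_up_threshold: float = 10.0   # top up when balance drops below this
    auto_top_up_amount: float = 50.0      # credits bought per automatic top-up
    spend_cap: Optional[float] = None     # optional ceiling on total spend
    spent: float = 0.0

    def charge(self, cost: float) -> bool:
        """Deduct `cost` for one request; return False if it is refused."""
        if self.spend_cap is not None and self.spent + cost > self.spend_cap:
            return False  # the spend cap blocks the request
        if cost > self.balance:
            return False  # prepaid: the balance never goes negative
        self.balance -= cost
        self.spent += cost
        if self.balance < self.auto_top_up_threshold:
            self.balance += self.auto_top_up_amount  # automatic top-up
        return True


if __name__ == "__main__":
    acct = PrepaidAccount(balance=20.0, spend_cap=30.0)
    print(acct.charge(15.0), acct.balance)  # accepted; balance topped up
    print(acct.charge(20.0), acct.spent)    # refused by the spend cap
```

In the example run, the first $15 charge drops the balance to $5, which triggers a $50 top-up; the second $20 charge is refused because total spend would reach $35, above the $30 cap, even though the balance could cover it.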

🔗 Tweet @GoogleAIStudio


Kimi Prefill-as-a-Service — cross-datacenter inference

April 18 — Moonshot AI (Kimi) publishes a technical advance in inference infrastructure: Prefill-as-a-Service (PraaS). The architecture extends prefill/decode disaggregation, which runs the prompt-processing (prefill) phase and the token-generation (decode) phase on separate hardware, beyond a single cluster to a cross-datacenter setup with heterogeneous hardware.

The reported results are a 1.54× throughput gain and a 64% reduction in P90 TTFT (the 90th-percentile time to first token). The key enabler is the Kimi Linear hybrid model, which reduces the cost of transferring the KV cache (key-value cache) between datacenters. This is not a consumer launch but a research publication on distributed inference infrastructure, with a direct bearing on Kimi’s cost per token.
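A back-of-envelope calculation shows why a hybrid model matters when the KV cache has to cross datacenters. With standard attention, every layer stores one key and one value vector per KV head for every prompt token, so the cache grows linearly with prompt length; linear-attention layers instead keep a fixed-size state. The Python sketch below compares the two under hypothetical assumptions: the layer count, head sizes, 128k-token prompt, 100 Gbit/s link, and 3:1 linear-to-full-attention ratio are illustrative, and the real Kimi Linear layer mix and attention variants may differ.

```python
def kv_cache_bytes(tokens: int, full_attn_layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size that grows with prompt length.

    Each full-attention layer stores a key and a value vector per KV head
    per token: 2 * kv_heads * head_dim elements per token per layer.
    """
    return 2 * full_attn_layers * kv_heads * head_dim * tokens * bytes_per_elem


def linear_state_bytes(linear_layers: int, heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Fixed-size recurrent state of linear-attention layers.

    Assumes one head_dim x head_dim state matrix per head; independent of tokens.
    """
    return linear_layers * heads * head_dim * head_dim * bytes_per_elem


def transfer_seconds(num_bytes: int, gbit_per_s: float) -> float:
    """Time to ship num_bytes over a link of the given bandwidth."""
    return num_bytes * 8 / (gbit_per_s * 1e9)


if __name__ == "__main__":
    # Hypothetical model: 48 layers, 8 KV heads of dim 128, bf16, 128k-token prompt.
    layers, kv_heads, head_dim, tokens = 48, 8, 128, 128_000
    dense = kv_cache_bytes(tokens, layers, kv_heads, head_dim)
    # Hypothetical hybrid: one full-attention layer for every three linear ones.
    full = layers // 4
    linear = layers - full
    hybrid = (kv_cache_bytes(tokens, full, kv_heads, head_dim)
              + linear_state_bytes(linear, kv_heads, head_dim))
    link = 100.0  # hypothetical 100 Gbit/s inter-datacenter link
    print(f"dense : {dense / 1e9:.1f} GB, {transfer_seconds(dense, link):.2f} s")
    print(f"hybrid: {hybrid / 1e9:.1f} GB, {transfer_seconds(hybrid, link):.2f} s")
```

With these numbers the dense model would ship about 25.2 GB (about 2.0 s on the link) while the hybrid ships about 6.3 GB (about 0.5 s). This arithmetic only illustrates the mechanism; it does not reproduce the paper’s throughput and TTFT figures.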

🔗 Tweet @Kimi_Moonshot 🔗 arXiv paper


Claude Code v2.1.114 and Runway Seedance 2.0 API

April 18 — Claude Code v2.1.114 fixes a crash that occurred when a member of an agent team requested access to a tool through the permissions dialog.

April 16 — Runway makes Seedance 2.0 available to developers via the Runway API. Alongside the web launch (April 9), 1080p rendering (April 16), and the iOS app (April 17), API access completes the model’s multi-channel rollout. Documentation is available at dev.runwayml.com.

🔗 Claude Code CHANGELOG 🔗 Tweet @runwayml — Seedance API


What this means

The simultaneous launch of Grok’s STT and TTS APIs is the week’s most aggressive pricing move. With list prices roughly 2× lower than ElevenLabs and AssemblyAI for speech recognition, and about 7× to 12× lower than OpenAI and ElevenLabs for speech synthesis, xAI is clearly signaling that AI audio is becoming a commodity. That is likely to accelerate adoption among independent developers and startups while compressing margins for established players. The combination of the lowest word error rate among the compared providers, bargain pricing, and expressive tags makes these APIs a credible option for production use.

Claude for Word and Gemini Skills in Chrome reflect two different strategies: Anthropic integrates its model into existing office productivity tools, where its users already spend their days; Google, meanwhile, enriches its browser to make Gemini indispensable in daily use. Both approaches seek to reduce friction in accessing the model.

Luma × Wonder Project × AWS illustrates the emergence of a new Hollywood studio model: generative AI integrated into every stage of production, AWS cloud infrastructure, and the ambition to bring productions that had moved offshore back to Los Angeles. The announcement is as symbolic as it is technical: it presents Realtime Hybrid Filmmaking as an industrializable pipeline rather than just a concept.


Sources

This document was translated from the French version into English using the gpt-5.5 model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator