Z.ai launches GLM-5, its new flagship open-source model with 744 billion parameters, released under the MIT license and taking the top spot among open-source models on coding and agentic tasks. Anthropic publishes an ASL-4 sabotage risk report for Opus 4.6, OpenAI adds agentic primitives to its API, and Kimi reveals a system of 100 parallel sub-agents. On the ecosystem side, Runway raises $315 million and ElevenLabs launches an expressive mode for its voice agents.
Z.ai launches GLM-5: 744B parameters, open-source under MIT license
February 11 — Z.ai (Zhipu AI) launches GLM-5, its new frontier model designed for complex systems engineering and long-duration agentic tasks. Compared to GLM-4.5, the model grows from 355B parameters (32B active) to 744B parameters (40B active), with pre-training data increasing from 23T to 28.5T tokens.
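To put these figures in proportion, the short calculation below derives the share of parameters that is active and the growth factors between the two generations. It uses only the numbers quoted above; reading "active" as the parameters used per token is an assumption, and the function name is illustrative.

```python
# Parameter counts quoted in the announcement, in billions.
MODELS = {
    "GLM-4.5": {"total_b": 355, "active_b": 32},
    "GLM-5": {"total_b": 744, "active_b": 40},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that is active (assumed: per token)."""
    return active_b / total_b


for name, params in MODELS.items():
    print(f"{name}: {active_fraction(**params):.1%} of parameters active")

# Growth factors from GLM-4.5 to GLM-5.
total_growth = MODELS["GLM-5"]["total_b"] / MODELS["GLM-4.5"]["total_b"]
active_growth = MODELS["GLM-5"]["active_b"] / MODELS["GLM-4.5"]["active_b"]
data_growth = 28.5 / 23  # pre-training tokens, in trillions

print(f"total x{total_growth:.2f}, active x{active_growth:.2f}, data x{data_growth:.2f}")
```

The total parameter count roughly doubles while the active count grows by a quarter, so the active share falls from about 9% to about 5%.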
GLM-5 integrates DeepSeek Sparse Attention (DSA) to reduce deployment costs while preserving long-context capability, and introduces “slime”, an asynchronous reinforcement learning infrastructure that improves post-training throughput.
| Benchmark | GLM-5 | GLM-4.7 | Kimi K2.5 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|---|
| SWE-bench Verified | 77.8% | 73.8% | 76.8% | 80.9% | 76.2% |
| HLE (text) | 30.5 | 24.8 | 31.5 | 28.4 | 37.2 |
| HLE w/ Tools | 50.4 | 42.8 | 51.8 | 43.4 | 45.8 |
| Terminal-Bench 2.0 | 56.2 | 41.0 | 50.8 | 59.3 | 54.2 |
| Vending Bench 2 | $4,432 | $2,377 | $1,198 | $4,967 | $5,478 |
GLM-5 positions itself as the best open-source model on reasoning, coding, and agentic tasks, narrowing the gap with proprietary frontier models. On Vending Bench 2, a benchmark that simulates managing a vending machine over a year, GLM-5 finishes with a balance of $4,432, compared with $4,967 for Claude Opus 4.5.
Beyond code, GLM-5 can directly generate .docx, .pdf, and .xlsx files — proposals, financial reports, spreadsheets — delivered turnkey. Z.ai deploys an Agent mode with built-in skills for document creation, supporting multi-turn collaboration.
The model weights are published on Hugging Face under the MIT license. GLM-5 is compatible with Claude Code and OpenClaw, and available on OpenRouter. Deployment is progressive, starting with Coding Plan Max subscribers.
Anthropic publishes first ASL-4 sabotage risk report
February 11 — Anthropic publishes a sabotage risk report for Claude Opus 4.6, in anticipation of the ASL-4 (AI Safety Level 4) safety threshold for autonomous AI R&D.
Upon the release of Claude Opus 4.5, Anthropic committed to writing sabotage risk reports for every new frontier model. Rather than navigating vague thresholds, the company chose to proactively respect the higher ASL-4 safety standard.
| Element | Detail |
|---|---|
| Model evaluated | Claude Opus 4.6 |
| Safety threshold | ASL-4 (AI Safety Level 4) |
| Domain | Autonomous AI R&D |
| Format | Public PDF report |
| Precedent | Commitment made during Opus 4.5 launch |
This is a significant step in AI safety transparency: Anthropic is one of the first labs to publish such a sabotage report for a model in production.
“When we released Claude Opus 4.5, we knew future models would be close to our AI Safety Level 4 threshold for autonomous AI R&D. We therefore committed to writing sabotage risk reports for future frontier models. Today we’re delivering on that commitment for Claude Opus 4.6.” (@AnthropicAI on X)
OpenAI: new agentic primitives in the Responses API
February 10 — OpenAI introduces three new primitives in the Responses API for long-duration agentic work.
Server-side compaction
Server-side compaction lets agent sessions run for hours without hitting context limits, with compaction handled by OpenAI rather than by the client. Triple Whale, an early-access tester, reports reaching 150 tool calls and 5 million tokens in a single session without loss of accuracy.
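The announcement does not describe how compaction works internally. As a rough illustration of the general idea only, the sketch below replaces older turns with a single summary once a token budget is exceeded. Everything here is hypothetical: the `Turn` type, the word-count tokenizer, and the placeholder summarizer are stand-ins, and none of it reflects the Responses API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def compact(history: List[Turn], budget: int, keep_recent: int,
            summarize: Callable[[List[Turn]], str]) -> List[Turn]:
    """Replace older turns with one summary turn once history exceeds the budget."""
    total = sum(count_tokens(t.text) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # nothing to compact
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    return [Turn("system", summarize(old))] + recent


def naive_summarize(turns: List[Turn]) -> str:
    # Hypothetical placeholder; a real system would ask a model for the summary.
    return f"[summary of {len(turns)} earlier turns]"


history = [Turn("user", "word " * 10) for _ in range(6)]
compacted = compact(history, budget=30, keep_recent=2, summarize=naive_summarize)
print([t.text.strip() for t in compacted])
```

Keeping the most recent turns verbatim and summarizing only the older ones is one common design choice; the trade-off is that details outside the summary are no longer available to the model.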
Containers with networking
Containers hosted by OpenAI can now access the internet in a controlled manner: administrators define an allowlist of domains in the dashboard, each request must explicitly set a network_policy, and domain secrets can be injected without exposing raw values to the model.
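The domain restriction described above can be pictured as a host check applied to every outbound request. The sketch below is a generic illustration, not OpenAI's implementation: the domain names are placeholders, and treating subdomains of a listed domain as allowed is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical list; real entries would be configured by an administrator.
ALLOWED_DOMAINS = frozenset({"api.example.com", "example.org"})


def is_allowed(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Match on a dot boundary so "evil-example.org" does not pass as "example.org".
    return any(host == d or host.endswith("." + d) for d in allowed)


for url in ("https://api.example.com/v1", "https://evil-example.org/"):
    print(url, is_allowed(url))
```

Matching on the parsed hostname rather than on the raw URL string avoids accepting look-alike domains or URLs that merely contain an allowed name in their path.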
Skills in the API
Native support for the Agent Skills standard with a first pre-built skill (spreadsheets). Skills are reusable and versioned bundles that can be mounted in hosted shell environments, and models decide at runtime whether to invoke them.
| Primitive | Description | Status |
|---|---|---|
| Server-side compaction | Multi-hour sessions without context limits | Available |
| Containers with networking | Controlled internet access for hosted containers | Available |
| Skills in the API | Reusable bundles (first skill: spreadsheets) | Available |
Kimi Agent Swarm: orchestration of 100 sub-agents
February 10 — Kimi (Moonshot AI) unveils Agent Swarm, a multi-agent coordination capability allowing the parallelization of complex tasks with up to 100 specialized sub-agents.
The system can execute more than 1,500 tool calls and runs 4.5x faster than sequential execution. Use cases include simultaneous multi-file generation (Word, Excel, PDF), parallel content analysis, and creative generation in several styles at once. Agent Swarm addresses a structural limit of LLMs: reasoning degrades during long tasks as the context fills up.
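Kimi has not published the orchestration code, so the following is only a generic fan-out pattern that shows why parallel sub-agents can finish sooner than a sequential loop. The `run_swarm` and `fake_sub_agent` names are hypothetical, and the sub-agent is simulated with a short sleep.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_swarm(tasks: List[str],
                    worker: Callable[[str], Awaitable[str]],
                    max_parallel: int = 100) -> List[str]:
    """Run one sub-agent per task, with at most max_parallel in flight."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> str:
        async with gate:
            return await worker(task)

    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*(guarded(t) for t in tasks))


async def fake_sub_agent(task: str) -> str:
    # Hypothetical stand-in for a sub-agent working in its own fresh context.
    await asyncio.sleep(0.01)
    return f"done: {task}"


results = asyncio.run(run_swarm([f"file-{i}" for i in range(5)], fake_sub_agent))
print(results)
```

Because each sub-agent starts from its own context, no single context has to hold the whole task, which is the limit the paragraph above refers to.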
OpenAI Harness Engineering: zero lines of manual code with Codex
February 11 — OpenAI publishes a retrospective on building an internal software product with zero lines of manually written code. For 5 months, a team of 3 to 7 engineers used Codex exclusively to generate all of the code.
| Metric | Value |
|---|---|
| Lines of code generated | ~1 million |
| Pull requests | ~1,500 |
| PRs per engineer per day | 3.5 on average |
| Internal users | Several hundred |
| Estimated time | 1/10th of the time needed by hand |
| Codex sessions | Up to 6+ hours |
The “Harness Engineering” approach redefines the role of the engineer: designing environments, specifying intent, and building feedback loops for agents, rather than writing code. Documentation structured in the repo serves as a guide (AGENTS.md as table of contents), the architecture is rigid with linters and structural tests generated by Codex, and recurring tasks scan for deviations and open refactoring PRs automatically.
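As an illustration of what a structural test can look like, the sketch below flags imports that point from a lower architectural layer to a higher one. It is not taken from OpenAI's repository: the layer names and the `violations` helper are hypothetical, and a real harness would run such a check over every file in the repo.

```python
import ast

# Hypothetical layering: a module may import only from its own layer or lower ones.
LAYER_ORDER = ["domain", "services", "api"]


def layer_of(module: str):
    """Return the layer index for a dotted module name, or None if unlayered."""
    top = module.split(".")[0]
    return LAYER_ORDER.index(top) if top in LAYER_ORDER else None


def violations(module: str, source: str) -> list:
    """List absolute imports in `source` that reach above `module`'s layer."""
    own = layer_of(module)
    found = []
    if own is None:
        return found
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            target = layer_of(name)
            if target is not None and target > own:
                found.append(name)
    return found


print(violations("domain.user", "from api.routes import handler\nimport os"))
```

A check like this gives an agent a mechanical pass/fail signal about architecture, which is the kind of feedback loop the approach relies on.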
Runway raises $315 million in Series E
February 10 — Runway announces a $315 million Series E round at a valuation of $5.3 billion. The round is led by General Atlantic, with participation from NVIDIA, Adobe Ventures, AMD Ventures, Fidelity, AllianceBernstein, and others.
| Detail | Value |
|---|---|
| Amount | $315M |
| Series | E |
| Valuation | $5.3B ($3.3B in Series D) |
| Lead investor | General Atlantic |
| Total raised since 2018 | $860M |
Funds will be used to pre-train the next generation of “world models” — models capable of simulating the physical world — and deploy them in new products and industries. This announcement comes after the launch of Gen-4.5, Runway’s latest video generation model.
Cowork available on Windows
February 10 — Claude Cowork, the desktop application for multi-step tasks, is now available on Windows in research preview, with full feature parity with macOS.
| Feature | Description |
|---|---|
| File Access | Reading and writing local files |
| Plugins | Support for Cowork plugins |
| MCP Connectors | Integration with MCP servers |
| Folder Instructions | Claude.md style — natural language instructions per project |
Cowork on Windows is available for all paid Claude plans via claude.com/cowork.
Free features on the Claude free plan
February 11 — Anthropic expands features accessible on the free Claude plan. File creation, connectors, skills, and compaction are now available without a subscription. Compaction allows Claude to automatically summarize previous context so that long conversations can continue without restarting.
Claude Code Plan Mode in Slack
February 11 — The Claude Code integration in Slack receives Plan Mode. When given a coding task in Slack, Claude can now draft a plan before executing, so the approach can be validated before implementation.
| Feature | Description |
|---|---|
| Plan Mode | Plan elaboration before execution |
| Automatic detection | Intelligent routing between code and chat |
| PR Creation | “Create PR” button directly from Slack |
| Prerequisites | Pro, Max, Team or Enterprise Plan + connected GitHub |
ElevenLabs launches Expressive Mode for its voice agents
February 10 — ElevenLabs unveils Expressive Mode for ElevenAgents, an update that lets its AI voice agents adapt their tone, emotion, and emphasis in real time.
The mode relies on Eleven v3 Conversational, a voice synthesis model optimized for real-time dialogue, coupled with a new turn-taking system that reduces interruptions. The price remains at $0.08 per minute. In parallel, ElevenLabs restructures its platform into three product families: ElevenAgents (voice agents), ElevenCreative (creative tools), and ElevenAPI (developer platform).
Kimi K2.5 integrated on Qoder
February 9 — Qoder, an AI platform for developers, deploys Kimi K2.5 as the flagship model of its marketplace, citing a SWE-bench Verified score of 76.8% and a discounted rate (0.3x credits in the Efficient tier). The recommended workflow: use heavier models for design and architecture, then K2.5 for implementation.
What this means
Open-source continues to progress rapidly towards frontier models. Z.ai’s GLM-5 narrows the gap with Claude Opus 4.5 and GPT-5.2 on coding and agentic task benchmarks, while being available under the MIT license. The publication of the ASL-4 sabotage report by Anthropic establishes a precedent for safety transparency that other labs will likely be compelled to follow.
On the developer side, OpenAI’s agentic primitives (server-side compaction, containers with networking, skills in the API) and the “Harness Engineering” approach outline a future where autonomous agents manage multi-hour sessions. Kimi Agent Swarm pushes this logic further by orchestrating up to 100 sub-agents in parallel.
Sources
- Z.ai — GLM-5 Tech Blog
- Z.ai — GLM-5 Announcement on X
- Anthropic — Sabotage Risk Report thread
- OpenAIDevs — Agentic Primitives
- OpenAI — Harness Engineering
- Kimi — Agent Swarm
- Runway — Series E Funding
- Claude — Cowork Windows
- Claude — Free plan features
- Boris Cherny — Claude Code Slack
- ElevenLabs — Expressive Mode
- Qoder — Kimi K2.5