GLM-5 open-source, Sabotage Risk Report ASL-4, OpenAI launches agentic primitives

Z.ai launches GLM-5, its new flagship open-source model: 744 billion parameters, released under the MIT license, and ranked first among open-source models on coding and agentic tasks. Anthropic publishes an ASL-4 sabotage risk report for Claude Opus 4.6, OpenAI adds agentic primitives to its API, and Kimi unveils a system of up to 100 parallel sub-agents. On the ecosystem side, Runway raises $315 million and ElevenLabs launches an expressive mode for its voice agents.


Z.ai launches GLM-5: 744B parameters, open-source under MIT license

February 11 — Z.ai (Zhipu AI) launches GLM-5, its new frontier model designed for complex systems engineering and long-duration agentic tasks. Compared to GLM-4.5, the model grows from 355B parameters (32B active) to 744B parameters (40B active), with pre-training data increasing from 23T to 28.5T tokens.
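
The scale-up can be put in proportion with a few lines of arithmetic. This is only a sketch using the figures quoted above; the helper names are illustrative, and reading "active" as the parameters used per token (the usual mixture-of-experts meaning) is an assumption, since the announcement summary does not define the term.

```python
# Figures quoted in the announcement: parameters in billions, tokens in trillions.
MODELS = {
    "GLM-4.5": {"total_b": 355, "active_b": 32, "tokens_t": 23.0},
    "GLM-5": {"total_b": 744, "active_b": 40, "tokens_t": 28.5},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the parameters that is active (assumed: used per token)."""
    return active_b / total_b


def growth(old: float, new: float) -> float:
    """Multiplicative growth from the old value to the new one."""
    return new / old


if __name__ == "__main__":
    for name, m in MODELS.items():
        share = active_fraction(m["total_b"], m["active_b"])
        print(f"{name}: {share:.1%} of parameters active")

    old, new = MODELS["GLM-4.5"], MODELS["GLM-5"]
    print(f"total parameters: x{growth(old['total_b'], new['total_b']):.2f}")
    print(f"active parameters: x{growth(old['active_b'], new['active_b']):.2f}")
    print(f"pre-training tokens: x{growth(old['tokens_t'], new['tokens_t']):.2f}")
```

Under these figures, total parameters roughly double while active parameters grow by a quarter, so the active share drops from about 9% to about 5%.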

GLM-5 integrates DeepSeek Sparse Attention (DSA) to reduce deployment costs while preserving long-context capability, and introduces “slime”, an asynchronous reinforcement learning infrastructure that improves post-training throughput.

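| Benchmark | GLM-5 | GLM-4.7 | Kimi K2.5 | Claude Opus 4.5 | Gemini 3 Pro |
| --- | --- | --- | --- | --- | --- |
| SWE-bench Verified | 77.8% | 73.8% | 76.8% | 80.9% | 76.2% |
| HLE (text) | 30.5 | 24.8 | 31.5 | 28.4 | 37.2 |
| HLE w/ Tools | 50.4 | 42.8 | 51.8 | 43.4 | 45.8 |
| Terminal-Bench 2.0 | 56.2 | 41.0 | 50.8 | 59.3 | 54.2 |
| Vending Bench 2 | $4,432 | $2,377 | $1,198 | $4,967 | $5,478 |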

GLM-5 positions itself as the best open-source model on reasoning, coding, and agentic tasks, narrowing the gap with proprietary frontier models. On Vending Bench 2, a benchmark that simulates managing a vending machine over a year, GLM-5 finishes with a balance of $4,432, approaching Claude Opus 4.5 ($4,967).
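
How narrow the gap is can be read off the benchmark table directly. The sketch below compares GLM-5 with Claude Opus 4.5 only, using the scores listed above; the function names are illustrative, and treating a simple point difference as the measure of the gap is a simplification.

```python
# (GLM-5, Claude Opus 4.5) scores copied from the benchmark table.
SCORES = {
    "SWE-bench Verified": (77.8, 80.9),
    "HLE (text)": (30.5, 28.4),
    "HLE w/ Tools": (50.4, 43.4),
    "Terminal-Bench 2.0": (56.2, 59.3),
}
VENDING_BENCH_2 = (4432, 4967)  # final balance in dollars


def gap(glm5: float, opus: float) -> float:
    """Point difference; a negative value means GLM-5 trails."""
    return round(glm5 - opus, 1)


def ratio(glm5: float, opus: float) -> float:
    """GLM-5 result as a fraction of the Claude Opus 4.5 result."""
    return glm5 / opus


if __name__ == "__main__":
    for name, (glm5, opus) in SCORES.items():
        print(f"{name}: {gap(glm5, opus):+.1f} points")
    print(f"Vending Bench 2: {ratio(*VENDING_BENCH_2):.1%} of the Opus 4.5 balance")
```

On these numbers GLM-5 trails by about 3 points on the two coding benchmarks, leads on both HLE variants, and reaches roughly 89% of the Opus 4.5 balance on Vending Bench 2.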

Beyond code, GLM-5 can directly generate .docx, .pdf, and .xlsx files — proposals, financial reports, spreadsheets — delivered turnkey. Z.ai deploys an Agent mode with built-in skills for document creation, supporting multi-turn collaboration.

The model weights are published on Hugging Face under the MIT license. GLM-5 is compatible with Claude Code and OpenClaw, and available on OpenRouter. Deployment is progressive, starting with Coding Plan Max subscribers.

🔗 GLM-5 Technical Blog 🔗 Announcement on X


Anthropic publishes first ASL-4 sabotage risk report

February 11 — Anthropic publishes a sabotage risk report for Claude Opus 4.6, in anticipation of the ASL-4 (AI Safety Level 4) safety threshold for autonomous AI R&D.

Upon the release of Claude Opus 4.5, Anthropic committed to writing sabotage risk reports for every new frontier model. Rather than argue about whether a model has crossed a hard-to-pin-down threshold, the company chose to apply the stricter ASL-4 safety standard proactively.

| Element | Detail |
| --- | --- |
| Model evaluated | Claude Opus 4.6 |
| Safety threshold | ASL-4 (AI Safety Level 4) |
| Domain | Autonomous AI R&D |
| Format | Public PDF report |
| Precedent | Commitment made during Opus 4.5 launch |

This is a significant step in AI safety transparency: Anthropic is one of the first labs to publish such a sabotage report for a model in production.

“When we released Claude Opus 4.5, we knew future models would be close to our AI Safety Level 4 threshold for autonomous AI R&D. We therefore committed to writing sabotage risk reports for future frontier models. Today we’re delivering on that commitment for Claude Opus 4.6.” (@AnthropicAI on X)

🔗 Anthropic Thread


OpenAI: new agentic primitives in the Responses API

February 10 — OpenAI introduces three new primitives in the Responses API for long-duration agentic work.

Server-side compaction

Server-side compaction allows multi-hour agent sessions to run without hitting context limits, with the compaction itself managed server-side. Triple Whale, an early-access tester, reports completing a single session of 150 tool calls and 5 million tokens without loss of accuracy.
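
To make the idea concrete, here is a minimal client-side sketch of compaction in general. It is not OpenAI's implementation and does not call the Responses API: `Message`, `estimate_tokens`, `summarize`, and `compact` are hypothetical names, the four-characters-per-token estimate is a rough assumption, and the placeholder summarizer only truncates text where a real system would call a model.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    """Crude heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def summarize(messages: list[Message]) -> Message:
    """Placeholder summarizer; a real system would ask a model for a summary."""
    gist = " | ".join(m.content[:40] for m in messages)
    return Message("system", f"[summary of {len(messages)} earlier messages] {gist}")


def compact(history: list[Message], budget: int, keep_recent: int = 4) -> list[Message]:
    """Fold older messages into one summary once the token budget is exceeded."""
    total = sum(estimate_tokens(m.content) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent


if __name__ == "__main__":
    history = [Message("user", f"step {i}: " + "x" * 400) for i in range(10)]
    compacted = compact(history, budget=500)
    print(len(history), "->", len(compacted), "messages")
```

The design choice worth noting is that the most recent turns are kept verbatim and only older turns are summarized, so the agent keeps exact detail for the work in progress.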

Containers with networking

Containers hosted by OpenAI can now access the internet in a controlled manner. Administrators define a whitelist of domains in the dashboard, requests must explicitly define a network_policy, and domain secrets can be injected without exposing raw values to the model.
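
The allowlist behaviour can be illustrated with a generic host check. This is a sketch of domain allowlisting as a technique, not the schema of OpenAI's `network_policy`: the `NetworkPolicy` class, its `allowed_domains` field, and `is_allowed` are hypothetical, and letting subdomains of a listed domain through is an assumption.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass(frozen=True)
class NetworkPolicy:
    """Hypothetical policy object; the field name is illustrative only."""
    allowed_domains: frozenset = field(default_factory=frozenset)


def is_allowed(url: str, policy: NetworkPolicy) -> bool:
    """Allow a URL only if its host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in policy.allowed_domains
    )


if __name__ == "__main__":
    policy = NetworkPolicy(frozenset({"pypi.org", "api.github.com"}))
    for url in ("https://pypi.org/simple/", "https://evilpypi.org/", "https://example.com/"):
        print(url, is_allowed(url, policy))
```

Matching on the parsed hostname with a leading dot, rather than on a raw substring, keeps look-alike hosts such as `evilpypi.org` out.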

Skills in the API

Native support for the Agent Skills standard with a first pre-built skill (spreadsheets). Skills are reusable and versioned bundles that can be mounted in hosted shell environments, and models decide at runtime whether to invoke them.

| Primitive | Description | Status |
| --- | --- | --- |
| Server-side compaction | Multi-hour sessions without context limits | Available |
| Containers with networking | Controlled internet access for hosted containers | Available |
| Skills in the API | Reusable bundles (first skill: spreadsheets) | Available |

🔗 OpenAIDevs Thread


Kimi Agent Swarm: orchestration of 100 sub-agents

February 10 — Kimi (Moonshot AI) unveils Agent Swarm, a multi-agent coordination capability allowing the parallelization of complex tasks with up to 100 specialized sub-agents.

The system can execute more than 1,500 tool calls and runs 4.5x faster than sequential execution. Use cases include simultaneous multi-file generation (Word, Excel, PDF), parallel content analysis, and creative generation in several styles at once. Agent Swarm is designed to address a structural limit of LLMs: reasoning degrades over long tasks as the context fills up.
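
The fan-out pattern behind such a system can be sketched with `asyncio`. This is a generic illustration of bounded parallel orchestration, not Kimi's implementation: `run_sub_agent` and `run_swarm` are hypothetical names, and the short sleep stands in for model and tool latency.

```python
import asyncio

MAX_SUB_AGENTS = 100  # upper bound quoted in the announcement


async def run_sub_agent(task: str) -> str:
    """Hypothetical sub-agent; a real one would call a model and its tools."""
    await asyncio.sleep(0.01)  # stands in for model and tool latency
    return f"done: {task}"


async def run_swarm(tasks: list[str], limit: int = MAX_SUB_AGENTS) -> list[str]:
    """Run one sub-agent per task, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(task: str) -> str:
        async with semaphore:
            return await run_sub_agent(task)

    # gather returns results in the same order as the submitted tasks
    return await asyncio.gather(*(guarded(t) for t in tasks))


if __name__ == "__main__":
    results = asyncio.run(run_swarm([f"section-{i}" for i in range(8)]))
    print(results)
```

Each sub-agent here starts from its own small input instead of sharing one growing transcript, which is the property the announcement credits for avoiding long-context degradation.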

🔗 Kimi Announcement


OpenAI Harness Engineering: zero lines of manual code with Codex

February 11 — OpenAI publishes a retrospective on building an internal software product with zero manually written lines of code. For 5 months, a team of 3 to 7 engineers used Codex exclusively to generate all code.

| Metric | Value |
| --- | --- |
| Lines of code generated | ~1 million |
| Pull requests | ~1,500 |
| PRs per engineer per day | 3.5 on average |
| Internal users | Several hundred |
| Estimated time | 1/10th of the time needed by hand |
| Codex sessions | Up to 6+ hours |
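
These figures can be cross-checked against each other with a little arithmetic. The sketch below uses only the numbers in the table plus one assumption that is not in the source: about 21 working days per month.

```python
# Approximate figures quoted in the table.
LINES_OF_CODE = 1_000_000
PULL_REQUESTS = 1_500
PRS_PER_ENGINEER_DAY = 3.5
MONTHS = 5

# Assumption (not in the source): about 21 working days per month.
WORKING_DAYS_PER_MONTH = 21


def engineer_days(prs: int, prs_per_engineer_day: float) -> float:
    """Engineer-days needed to produce `prs` pull requests at the quoted rate."""
    return prs / prs_per_engineer_day


def implied_team_size(prs: int, prs_per_engineer_day: float, months: int,
                      days_per_month: int = WORKING_DAYS_PER_MONTH) -> float:
    """Average number of engineers implied by the PR count and duration."""
    return engineer_days(prs, prs_per_engineer_day) / (months * days_per_month)


if __name__ == "__main__":
    print(f"engineer-days: {engineer_days(PULL_REQUESTS, PRS_PER_ENGINEER_DAY):.0f}")
    print(f"implied team size: {implied_team_size(PULL_REQUESTS, PRS_PER_ENGINEER_DAY, MONTHS):.1f}")
    print(f"lines per PR: {LINES_OF_CODE / PULL_REQUESTS:.0f}")
```

Under that assumption the numbers imply an average of about 4 engineers, which sits inside the reported range of 3 to 7, and an average of roughly 670 lines per pull request.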

The “Harness Engineering” approach redefines the role of the engineer: designing environments, specifying intent, and building feedback loops for agents, rather than writing code. Documentation structured in the repo serves as a guide (AGENTS.md as table of contents), the architecture is rigid with linters and structural tests generated by Codex, and recurring tasks scan for deviations and open refactoring PRs automatically.
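
As an illustration of what a structural test can look like, the sketch below flags imports that reach from a lower architectural layer into a higher one. It is a generic example, not the checks used in the OpenAI project: the layer names, `layer_of`, and `violations` are all hypothetical.

```python
import ast
from typing import Optional

# Hypothetical layering: a module may import from its own layer or lower ones.
LAYERS = ["domain", "services", "api"]


def layer_of(module: str) -> Optional[int]:
    """Index of the layer a dotted module path belongs to, if any."""
    top = module.split(".")[0]
    return LAYERS.index(top) if top in LAYERS else None


def violations(module: str, source: str) -> list[str]:
    """List imports in `source` that reach upward from `module`'s layer."""
    own = layer_of(module)
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            target = layer_of(name)
            if own is not None and target is not None and target > own:
                found.append(f"{module} imports {name}")
    return found


if __name__ == "__main__":
    print(violations("domain.models", "from api.routes import router\n"))
```

A check like this can run in CI on every agent-generated pull request, so that architectural drift is rejected mechanically instead of being caught in review.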

🔗 Harness Engineering Blog


Runway raises $315 million in Series E

February 10 — Runway announces a $315 million Series E fundraising, bringing its valuation to $5.3 billion. The round is led by General Atlantic, with participation from NVIDIA, Adobe Ventures, AMD Ventures, Fidelity, AllianceBernstein, and others.

| Detail | Value |
| --- | --- |
| Amount | $315M |
| Series | E |
| Valuation | $5.3B (vs $3.3B in Series D) |
| Lead investor | General Atlantic |
| Total raised since 2018 | $860M |

Funds will be used to pre-train the next generation of “world models” — models capable of simulating the physical world — and deploy them in new products and industries. This announcement comes after the launch of Gen-4.5, Runway’s latest video generation model.

🔗 Official Announcement 🔗 Runway Post on X


Cowork available on Windows

February 10 — Claude Cowork, the desktop application for multi-step tasks, is now available on Windows in research preview with full feature parity compared to macOS.

| Feature | Description |
| --- | --- |
| File Access | Reading and writing local files |
| Plugins | Support for Cowork plugins |
| MCP Connectors | Integration with MCP servers |
| Folder Instructions | Claude.md style: natural language instructions per project |

Cowork on Windows is available for all paid Claude plans via claude.com/cowork.

🔗 Cowork Windows Announcement


Free features on the Claude free plan

February 11 — Anthropic expands features accessible on the free Claude plan. File creation, connectors, skills, and compaction are now available without a subscription. Compaction allows Claude to automatically summarize previous context so that long conversations can continue without restarting.

🔗 Free plan Announcement


Claude Code Plan Mode in Slack

February 11 — The Claude Code integration in Slack receives Plan Mode. When given a coding task in Slack, Claude can now draw up a plan before executing, so the approach can be validated before implementation.

| Feature | Description |
| --- | --- |
| Plan Mode | Plan elaboration before execution |
| Automatic detection | Intelligent routing between code and chat |
| PR Creation | “Create PR” button directly from Slack |
| Prerequisites | Pro, Max, Team or Enterprise Plan + connected GitHub |

🔗 Boris Cherny Thread


ElevenLabs launches Expressive Mode for its voice agents

February 10 — ElevenLabs unveils Expressive Mode for ElevenAgents, an evolution that makes its AI voice agents capable of adapting their tone, emotion, and emphasis in real-time.

The mode relies on Eleven v3 Conversational, a voice synthesis model optimized for real-time dialogue, coupled with a new turn-taking system that reduces interruptions. The price remains at $0.08 per minute. In parallel, ElevenLabs restructures its platform into three product families: ElevenAgents (voice agents), ElevenCreative (creative tools), and ElevenAPI (developer platform).

🔗 Expressive Mode Blog


Kimi K2.5 integrated on Qoder

February 9 — Qoder (AI platform for developers) deploys Kimi K2.5 as the flagship model of its marketplace, citing a SWE-bench Verified score of 76.8% and a discounted rate (0.3x credit in the Efficient tier). The recommended workflow: use heavy models for design and architecture, then K2.5 for implementation.

🔗 Qoder Announcement


What this means

Open-source continues to progress rapidly towards frontier models. Z.ai’s GLM-5 narrows the gap with Claude Opus 4.5 and GPT-5.2 on coding and agentic task benchmarks, while being available under the MIT license. The publication of the ASL-4 sabotage report by Anthropic establishes a precedent for safety transparency that other labs will likely be compelled to follow.

On the developer side, OpenAI’s agentic primitives (server-side compaction, network containers, API skills) and the “Harness Engineering” approach outline a future where autonomous agents manage multi-hour sessions. Kimi Agent Swarm pushes this logic even further with the orchestration of up to 100 sub-agents in parallel.

