Search

AI News Jan 23, 2026: Claude in Excel, Tasks Claude Code, Codex Agent Loop

AI News Jan 23, 2026: Claude in Excel, Tasks Claude Code, Codex Agent Loop

Busy Week for AI Agents

From January 21 to 23, 2026, several major announcements regarding coding agents and infrastructure. Anthropic launches Claude in Excel and publishes three articles on multi-agent systems, OpenAI details the internal architecture of Codex and its PostgreSQL infrastructure, Qwen open-sources its text-to-speech model, and Runway adds Image to Video to Gen-4.5.


Anthropic: Claude in Excel and Claude Code

Claude in Excel

January 23 — Claude is now available in Microsoft Excel in beta. The integration allows analyzing complete Excel workbooks with their nested formulas and dependencies between tabs.

Features:

  • Understanding of the entire workbook (formulas, multi-tab dependencies)
  • Explanations with cell-level citations
  • Updating assumptions while preserving formulas

Available for Claude Pro, Max, Team, and Enterprise subscribers.

🔗 Claude in Excel


Claude Code v2.1.19: Tasks System

January 23 — Version 2.1.19 introduces Tasks, a new task management system for complex multi-session projects.

We’re turning Todos into Tasks in Claude Code. Tasks are a new primitive that help Claude Code track and complete more complicated projects and collaborate on them across multiple sessions or subagents.

Thariq (@trq212), Claude Code team Anthropic

Tasks Features:

AspectDetail
Storage~/.claude/tasks (files, allows building tools on top)
CollaborationCLAUDE_CODE_TASK_LIST_ID=name claude to share between sessions
DependenciesTasks with dependencies and blockers stored in metadata
BroadcastUpdate of a Task broadcasted to all sessions on the same Task List
CompatibilityWorks with claude -p and AgentSDK

What it’s for: On a complex project (multi-file refactoring, migration, long feature), Claude can break down the work into tasks, track what is done and what remains. Tasks are persisted on disk — they survive context compaction, session closing, and restart. Multiple sessions or subagents can collaborate on the same task list in real-time.

In practice: Claude creates tasks (TaskCreate), lists them (TaskList), and updates their status (TaskUpdate: pending → in_progress → completed). Example on an authentication refactoring:

#1 [completed] Migrate session storage to Redis
#2 [in_progress] Implement refresh token rotation
#3 [pending] Add OAuth integration tests
#4 [pending] Update API documentation

Tasks are stored in ~/.claude/tasks/ and can be shared between sessions via CLAUDE_CODE_TASK_LIST_ID.

Other new features v2.1.19:

  • Shorthand $0, $1 for arguments in custom commands
  • VSCode session forking and rewind for everyone
  • Skills without permissions run without approval
  • CLAUDE_CODE_ENABLE_TASKS=false to temporarily disable

🔗 CHANGELOG Claude Code | Thread @trq212


Claude Code v2.1.18: Customizable Keybindings

Previous version adding the ability to configure keybindings by context and create chord sequences.

Command: /keybindings

⚠️ Note: This feature is currently in preview and is not available for all users.

🔗 Keybindings Documentation


Petri 2.0: Automated Alignment Audits

January 22 — Anthropic publishes Petri 2.0, an update to its automated behavioral audit tool for language models.

What it’s for: Petri tests if an LLM could behave mainly problematically — manipulation, deception, rule circumvention. The tool generates realistic scenarios and observes the model’s responses to detect unwanted behaviors before they occur in production.

ImprovementDescription
70 new scenariosExtended seed library to cover more edge cases
Eval-awareness mitigationsThe model must not know it is being tested — otherwise it adapts its behavior. Petri 2.0 improves scenario realism to avoid this detection.
Frontier comparisonsEvaluation results for recent models (Claude, GPT, Gemini)

🔗 Petri 2.0 | GitHub


Blog: When to Use (or Not) Multi-Agent Systems

January 23 — Anthropic publishes a pragmatic guide on multi-agent architectures. The main message: do not use multi-agent by default.

We’ve seen teams invest months building elaborate multi-agent architectures only to discover that improved prompting on a single agent achieved equivalent results.

The article identifies 3 cases where multi-agent truly brings value:

CaseProblemMulti-agent Solution
Context PollutionAn agent generates voluminous data of which only a summary is useful afterwardsA sub-agent retrieves 2000 tokens of history, returns just “order delivered” to the main agent
ParallelizationMultiple independent searches to be doneLaunch 5 agents in parallel on 5 different sources instead of processing them sequentially
SpecializationToo many tools (20+) in a single agent degrades its ability to choose the right oneSeparate into specialized agents: one for CRM, one for marketing, one for messaging

The trap to avoid: Dividing by type of work (one agent plans, another implements, another tests). Each handover loses context and degrades quality. It is better for a single agent to handle a feature from end to end.

Real cost: 3-10x more tokens than a single agent for the same task.

Other articles in the series:

Building agents with Skills (Jan 22)

Instead of building agents specialized by domain, Anthropic proposes building skills: collections of files (workflows, scripts, best practices) that a generalist agent loads on demand.

Progressive disclosure in 3 levels:

LevelContentSize
1Metadata (name, description)~50 tokens
2Full SKILL.md file~500 tokens
3Reference documentation2000+ tokens

Each level is loaded only if necessary. Result: an agent can have hundreds of skills without saturating its context.

🔗 Building agents with Skills


Anthropic identifies 8 trends for software development in 2026.

Key message: Engineers are moving from writing code to coordinating agents that write code.

Important nuance: AI is used in ~60% of work, but only 0-20% can be fully delegated — human supervision remains essential.

CompanyResult
RakutenClaude Code on vLLM codebase (12.5M lines), 7h of autonomous work
TELUS30% faster, 500k hours saved
Zapier89% AI adoption, 800+ internal agents

🔗 Eight trends 2026


OpenAI: Codex Architecture and Infrastructure

Unrolling the Codex agent loop

January 23 — OpenAI opens the scenes of Codex CLI. First article of a series on the internal functioning of their software agent.

What we learn:

The agent loop is simple in theory: user sends a request → model generates a response or requests a tool → agent executes the tool → model resumes with the result → until a final response. In practice, the subtleties are in context management.

Prompt caching — the key to performance:

Each conversation turn adds content to the prompt. Without optimization, it is quadratic in sent tokens. Prompt caching allows reusing calculations from previous turns. Condition: the new prompt must be an exact prefix of the old one. OpenAI details the pitfalls that break the cache (changing MCP tools order, modifying config mid-conversation).

Automatic compaction:

When context exceeds a threshold, Codex calls /responses/compact which returns a compressed version of the conversation. The model keeps latent understanding via an opaque encrypted_content.

Zero Data Retention (ZDR):

For clients who do not want their data stored, encrypted_content allows preserving the model’s reasoning between turns without storing data server-side.

First article of a series — the next ones will cover CLI architecture, tool implementation, and sandboxing.

🔗 Unrolling the Codex agent loop | Codex GitHub


Scaling PostgreSQL: 800 million ChatGPT users

January 22 — OpenAI details how PostgreSQL powers ChatGPT and the API for 800 million users with millions of requests per second.

MetricValue
Users800 million
ThroughputMillions of QPS
Replicas~50 multi-region read replicas
p99 LatencyDouble digit ms client-side
AvailabilityFive-nines (99.999%)

Architecture:

  • Single primary Azure PostgreSQL flexible server
  • PgBouncer for connection pooling (connection latency: 50ms → 5ms)
  • Write-heavy workloads migrated to Azure Cosmos DB
  • Cache locking to protect against cache miss storms
  • Cascading replication in testing to exceed 100 replicas

Only SEV-0 PostgreSQL in the last 12 months: during the viral launch of ChatGPT ImageGen (100M new users in one week, write traffic x10).

🔗 Scaling PostgreSQL


Qwen: Qwen3-TTS Open-Source

January 22-23 — Alibaba releases Qwen3-TTS in open-source under Apache 2.0 license.

FeatureDetail
LicenseApache 2.0
Voice cloningYes
MLX-Audio supportAvailable

Installation:

uv pip install -U mlx-audio --prerelease=allow

🔗 Qwen3-TTS on X


Runway: Gen-4.5 Image to Video

January 21 — Runway adds Image to Video functionality to Gen-4.5.

FeatureDescription
Image to VideoTransformation of an image into cinematic video
Camera controlPrecise camera control
Coherent narrativesCoherent narratives over time
Character consistencyCharacters that remain consistent

Available for all Runway paid plans. Temporary promo: 15% discount.

🔗 Runway on X


What This Means

This week marks a maturation of coding agents tools. The two giants (Anthropic and OpenAI) publish detailed technical documentation on their agent architecture — a sign that the market is moving from the “demo” phase to the “production” phase.

On the infrastructure side, OpenAI’s PostgreSQL article shows that a single-primary architecture can hold up at the scale of hundreds of millions of users with the right optimizations.

The arrival of Claude in Excel opens a new front: AI integrated directly into daily productivity tools.


Sources