
GitHub Copilot CLI enterprise plugins, VS Code BYOK + /chronicle, Claude Code 60+ fixes


GitHub Copilot is taking another step in enterprise adoption with centralized CLI plugin management, while April updates for VS Code bring BYOK, semantic search across all workspaces, and a searchable chat history. Claude Code continues its intense pace with 60 additional fixes this week. In parallel, Luma AI opens the API for its Uni-1.1 model, which leads the Human Preference Elo ranking, and Qwen3.6-35B-A3B posts a +8.2-point gain on the ODinW benchmark.


GitHub Copilot CLI — Enterprise-managed plugins in public preview

May 6 — GitHub is launching centralized management of Copilot CLI plugins for enterprises in public preview. Administrators can now define and distribute plugins (custom agents, skills, hooks, MCP configurations) to all users in their organization from a single settings.json file.

How it works in practice:

| Parameter | Value |
|---|---|
| Configuration file | .github-private/.github/copilot/settings.json |
| Required plans | Copilot Business, Copilot Enterprise |
| Status | Public preview |
| Installation | Automatic on authentication |

Copilot CLI pulls and applies these settings for all licensed users. Plugins can include custom agents, workflow hooks, and organization-wide MCP configurations. If the company already configured a source for custom agents via .github-private, that same repository is reused. Administrators can verify the configuration from the Agents page in enterprise settings, under AI controls.
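A hedged sketch of what such a centrally distributed settings.json might contain; the key names below are illustrative assumptions for the shape of the feature, not GitHub's documented schema:

```json
{
  "mcpServers": {
    "internal-docs": {
      "command": "npx",
      "args": ["@acme/internal-docs-mcp"]
    }
  },
  "agents": ["./agents/release-notes.agent.md"]
}
```

The point is the distribution model: one file in the organization's .github-private repository, pulled automatically by every licensed Copilot CLI install at authentication time.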

This feature fills an important gap between individual Copilot CLI adoption and organization-wide deployment: until now, each developer had to configure their plugins manually.

🔗 GitHub Changelog announcement


GitHub Copilot in VS Code — April 2026 updates (v1.116–v1.119)

May 6 — GitHub publishes the Copilot update roundup for VS Code covering v1.116 to v1.119 (April to early May 2026), following VS Code’s move to weekly stable releases.

Smarter context:

| Feature | Status |
|---|---|
| Semantic search (all workspaces) | Available |
| githubTextSearch (grep across GitHub repos) | Available |
| /chronicle (local chat history) | Experimental |
| Prompt cache + deferred tool loading | Available |

Semantic search is now active in all workspaces, no longer limited to indexed repositories. The githubTextSearch tool enables grep-style queries across entire GitHub repositories and organizations. The experimental /chronicle command creates a local database of chat history to retrieve past sessions, touched files, and referenced PRs.

Richer agent experience:

| Feature | Status |
|---|---|
| BYOK (Business + Enterprise) | Available |
| Integrated Browser | Available |
| Remote CLI monitoring | Experimental |
| Access to open terminals | Available |

BYOK (Bring Your Own Key) lets Copilot Business and Enterprise organizations connect their own API keys directly in VS Code: OpenRouter, Microsoft Foundry, Google, Anthropic, OpenAI, Ollama, and Foundry Local are supported. Agents can read and write in open terminals (REPLs, interactive scripts). The Integrated Browser feature lets users share browser tabs in real time as context for agents. Copilot CLI sessions can be controlled remotely from GitHub.com or the mobile app (experimental).

🔗 GitHub Changelog announcement


Claude Code — 60+ reliability fixes (week of May 8)

May 8 — The Claude Code team publishes a thread listing more than 60 fixes shipped this week, adding to the 50+ from the previous week.

“Last week we shipped 50+ Claude Code reliability fixes. This week it’s 60+ more. Smoother long-running sessions, a more efficient agent loop, auth that works in more environments, and terminal fixes.” — @ClaudeDevs on X

| Area | Notable fixes |
|---|---|
| Stability | claude -p accepts >10 MB via stdin, wake-from-sleep recovery |
| Agent loop | Prompt cache for sub-agents, opt-in 1h cache via ENABLE_PROMPT_CACHING_1H |
| Authentication | OAuth code pasted directly into the terminal (WSL, SSH, containers) |
| MCP | Automatic reconnection + clear status in /mcp, bounded memory fix |
| Terminal rendering | Cursor, VS Code, JetBrains scrolling fix; Japanese character fix |

To apply these fixes: claude update.
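From the shell, that looks like the following; the environment variable name comes from the fixes table above, but the value assigned to it is an assumption (the changelog names the variable, not its expected value):

```shell
# Pull in this week's reliability fixes
claude update

# Opt in to the 1-hour prompt cache for sub-agents
# (variable name from the changelog; "1" as the enable value is assumed)
export ENABLE_PROMPT_CACHING_1H=1
```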

🔗 Claude Code changelog


GitHub Copilot — Grok Code Fast 1 deprecation (May 15)

May 8 — GitHub announces the deprecation of Grok Code Fast 1 across all Copilot environments on May 15, 2026, one week after the announcement. The reason: the model itself is being deprecated by xAI.

| Model | Deprecation date | Suggested alternative |
|---|---|---|
| Grok Code Fast 1 | May 15, 2026 | GPT-5 mini, Claude Haiku 4.5 |

Copilot Enterprise administrators should verify that the alternative models are enabled in their model policies before that date. The transition after deprecation is automatic — no additional action is required to remove the model.

🔗 Deprecation announcement


Google Health App — Fitbit becomes Google Health with Gemini coach

May 8 — The Fitbit app is evolving into the new Google Health app. This redesign keeps all existing Fitbit features and adds a personalized health coach powered by Gemini. The coach analyzes data from wearables, preferred health apps, and medical records to provide proactive health guidance tailored to each user.

The app is compatible with Fitbit and Pixel Watch devices, and integrates with hundreds of third-party apps and devices.

🔗 @GoogleAI announcement


Gemini API — Multimodal File Search, Webhooks, Gemma 4 MTP 3x faster

May 8 — The weekly @GoogleAI roundup lists three developer releases from the week, 11 days before Google I/O:

| Feature | Date | Impact |
|---|---|---|
| Multimodal File Search | May 5 | Verifiable multimodal RAG with page citations |
| Gemini API Webhooks | May 4 | Replaces polling with push notifications |
| Gemma 4 MTP drafters | May 5 | Up to 3x faster inference |

The File Search tool now supports custom metadata and page citations, making it possible to build verifiable RAG (Retrieval-Augmented Generation) systems on multimodal sources. Webhooks remove the need for continuous polling on long-running tasks. MTP (Multi-Token Prediction) accelerators for Gemma 4 deliver up to 3x more inference speed in deployment workflows.
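The polling-to-push switch can be sketched as a small handler. This is a minimal illustration under assumed field names ("state", "operation"), not the actual Gemini API webhook schema:

```python
import json


def handle_webhook(raw_body: str) -> str:
    """Parse a pushed task notification and return its state.

    The field names are assumed for illustration; the real payload shape
    is defined by the Gemini API webhook documentation.
    """
    event = json.loads(raw_body)
    return event.get("state", "UNKNOWN")


# Instead of the client polling an operation endpoint in a loop, the API
# pushes a notification like this when a long-running task finishes:
payload = json.dumps({"operation": "batch-123", "state": "SUCCEEDED"})
print(handle_webhook(payload))  # SUCCEEDED
```

The client goes from O(polls) requests per task to exactly one inbound notification, which is the "replaces polling" impact listed in the table.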

🔗 Google Developers blog


Luma AI Uni-1.1 API — Public launch

May 5 — Luma AI is opening its Uni-1.1 API, making its Unified Intelligence model accessible to developers through a REST interface. The model combines reasoning and image generation in a single architecture — unlike the standard approach that assembles multiple separate models at inference time.

| Metric | Value |
|---|---|
| Human Preference Elo | #1 (global generation, style, guided reference) |
| Image Arena | Top 3 (Text-to-Image + Image Edit) |
| RISEBench spatial reasoning | Top of the ranking |
| References per request | Up to 9 images |
| Generation time | ~31 seconds per image |
| Production partners | Envato, Comfy, Runware, Flora, Krea, Magnific, Fal, LovArt |

The API offers two main endpoints: Generate Image (text-to-image with up to 9 reference images to preserve identity, composition, or style) and Modify Image (natural-language editing). Python and JavaScript/TypeScript SDKs are available. Two pricing tiers: Build (usage-based billing) and Scale (higher rate limits, dedicated support).
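A minimal sketch of enforcing the 9-reference limit client-side before calling Generate Image; the payload key names ("prompt", "reference_images") are assumptions for illustration, not Luma's actual request schema:

```python
def build_generate_request(prompt: str, references: list[str]) -> dict:
    """Build a hypothetical Generate Image payload.

    Uni-1.1 accepts up to 9 reference images per request (to preserve
    identity, composition, or style); key names here are illustrative.
    """
    if len(references) > 9:
        raise ValueError("Uni-1.1 accepts at most 9 reference images per request")
    return {"prompt": prompt, "reference_images": references}


req = build_generate_request("a watercolor fox", ["style1.png", "style2.png"])
print(len(req["reference_images"]))  # 2
```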

🔗 Luma AI announcement


NVIDIA + SakanaAI — ICML 2026 paper on TwELL sparse kernels

May 8 — NVIDIA AI and SakanaAI Labs jointly publish a research paper accepted at ICML 2026, focusing on sparse transformer kernels and data formats optimized for execution on modern NVIDIA GPUs. The project is called TwELL.

The core intuition: the human brain activates only the neurons needed for a given thought. Applied to language models, this means selectively computing active weights through structured sparsity, reducing compute load without sacrificing performance. This research aligns with NVIDIA’s direction toward more efficient inference, especially for Mixture-of-Experts (MoE) architectures. The tweet received 50,000 views and 66 reposts in the ML community.
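As a toy illustration of the structured-sparsity idea (not the TwELL kernel or data format itself), one can keep only the largest-magnitude weights in each fixed-size block and zero the rest; a kernel can then skip the zeroed positions because they follow a predictable per-block pattern:

```python
def prune_blockwise(weights: list[float], block: int = 4, keep: int = 1) -> list[float]:
    """Zero all but the `keep` largest-magnitude weights in each block.

    A dense kernel multiplies every weight; a structured-sparse kernel can
    skip the zeroed entries. Ties at the threshold keep extra weights.
    """
    pruned = []
    for i in range(0, len(weights), block):
        chunk = weights[i:i + block]
        # magnitude of the keep-th largest weight in this block
        threshold = sorted((abs(w) for w in chunk), reverse=True)[keep - 1]
        pruned.extend(w if abs(w) >= threshold else 0.0 for w in chunk)
    return pruned


print(prune_blockwise([0.1, -0.9, 0.05, 0.3]))  # [0.0, -0.9, 0.0, 0.0]
```

Real sparse formats (and the MoE routing the paper targets) are far more involved, but the payoff is the same: compute proportional to the active weights, not the full matrix.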

🔗 @NVIDIAAI tweet


Qwen3.6-35B-A3B — +8.2 points on the ODinW benchmark

May 9 — Tongyi Lab (Alibaba) announces a breakthrough in Instruction-Oriented Object Detection with the Qwen3.6-35B-A3B model. Unlike traditional detection, which simply localizes visual elements, this approach aims to semantically understand natural-language instructions to guide detection.

| Metric | Qwen3.5 | Qwen3.6-35B-A3B | Gain |
|---|---|---|---|
| ODinW score | 42.6 | 50.8 | +8.2 pts |

An interactive demo is available on ModelScope.

🔗 @Ali_TongyiLab tweet


Tongyi Lab — 1,200+ languages for global inclusion

May 9 — Tongyi Lab (Alibaba) publishes a video titled “1,200+ Languages. One Vision for AI Inclusion”, raising the question of fair access to AI for underrepresented language communities. The initiative targets coverage of more than 1,200 languages — far beyond the 92 languages of Qwen-MT announced in July 2025 — in response to the gap between global technology and the communities it is meant to serve.

🔗 @Ali_TongyiLab tweet


OpenAI Codex Switch — ChatGPT to Codex migration page

May 8 — OpenAI publishes a minimalist tweet pointing to chatgpt.com/codex/switch-to-codex/, whose only message is “Just gonna leave this here.” The tweet generates 517,000 views. This teaser fits Codex’s positioning strategy as the central development assistant for the ChatGPT platform. The landing page was not accessible at the time of the scan.

🔗 @OpenAI tweet


Briefs

  • OpenAI supply.openai.com — @OpenAIDevs posts a cryptic tweet: “Available until the goblins notice.” 🧌, linking to supply.openai.com. The page was not accessible at the time of the scan (274,000 views). 🔗 Tweet

What this means

Enterprise is becoming the central Copilot battleground. GitHub is laying the foundations for an organization-wide managed Copilot rollout: centralized plugins via .github-private, BYOK to connect your own models, remote CLI monitoring, and Integrated Browser as live context for agents. These features answer a real demand from IT leaders who want to standardize AI tooling without forcing each developer to configure their own stack. The simultaneous deprecation of Grok Code Fast 1 (replaced by GPT-5 mini or Claude Haiku 4.5) also shows how quickly third-party models are added and then removed in this ecosystem.

Claude Code is betting on reliability. 110+ fixes in two consecutive weeks on specific topics — long sessions, OAuth auth in constrained environments, MCP, terminal rendering — indicate that the Anthropic team has identified reliability as the main blocker to production adoption. The fixes for WSL, SSH, and containers explicitly target enterprise environments where the browser cannot reach localhost. The one-hour prompt cache opt-in for sub-agents is also a signal: long-running multi-agent workflows are becoming a priority use case.

Luma AI and the unified API: an architectural bet. Where most image-generation pipelines stitch together several specialized models, Uni-1.1 combines reasoning and generation in a single architecture. The ability to use up to 9 reference images per request — and the #1 Human Preference Elo results — suggests that this unified approach offers style consistency that is hard to achieve with assembled pipelines. The 8 partners already in production validate that the API is ready for real workloads.

Alibaba/Qwen is targeting multimodality and multilinguality. The +8.2-point gain on ODinW for Qwen3.6-35B-A3B on language-guided object detection, combined with the ambition to cover 1,200+ languages, points to a Tongyi Lab strategy focused on high social-impact use cases: industrial vision made accessible through text instructions, and AI usable by language communities currently underserved. These two directions converge in a common logic of broad accessibility.


Sources