AI Updates on 2026-02-20

Google releases Gemini 3.1 Pro with major improvements in reasoning, scoring 77.1% on the ARC-AGI-2 benchmark (double Gemini 3 Pro's score) @demishassabis
Anthropic launches Claude Sonnet 4.6 with 1M token context window in beta, jumping 130 points in Code Arena to rank #3 @arena
Anthropic introduces Claude Code Security in limited preview, scanning codebases for vulnerabilities and suggesting patches @claudeai
Alibaba releases Qwen3-Coder-Next API on Alibaba Cloud Model Studio with integration into Coding Plan @Alibaba_Qwen
Google launches Lyria 3 generative music model in beta, creating tracks with vocals and lyrics from photos and text @GeminiApp
NVIDIA releases Nemotron-Nano-9B-v2-Japanese, achieving state-of-the-art results among models under 10B parameters on Nejumi Leaderboard 4 @NVIDIAAIDev

Amazon bans Claude Code internally despite being an Anthropic investor, pushing developers toward its own Kiro tool @GergelyOrosz
Perplexity reports Gemini 3.1 Pro is the second-most-picked model among Enterprise customers, after the Claude 4.5 Sonnet/Opus family @AravSrinivas
Gemini 3.1 Pro Preview costs less than half as much to run evaluations as Claude Opus 4.6 and GPT-5.2 while scoring highest on the Intelligence Index @ArtificialAnlys
OpenAI reports 18-24 year-olds account for nearly 50% of ChatGPT usage in India, which is also the fastest-growing Codex market globally (4x weekly users in 2 weeks) @sama
ggml.ai joins Hugging Face to continue building ggml and make llama.cpp more accessible to the open-source community @ggerganov
Peak XV raises $1.3B, doubling down on AI as global VC rivalry in India intensifies @TechCrunch
UAE's G42 teams up with Cerebras to deploy 8 exaflops of compute in India @TechCrunch

MIT CSAIL launches 2025 AI Agent Index documenting capabilities and safety features of 30 top AI agents, finding that only 4 of the 13 agents with frontier-level autonomy disclose safety evaluations @MIT_CSAIL
Research finds AI models can be jailbroken into sophisticated p-hacking when the request is reframed as "responsible uncertainty quantification", even though they resist direct requests @ahall_research
US government launches AI Agent Standards Initiative amid heightened public concerns around autonomous AI agents @MIT_CSAIL

Gemini 3.1 Pro generates a photorealistic 3D ocean simulation using complex techniques including Gerstner waves and subsurface scattering @deedydas
Perplexity Finance adds tap-through auditability to SEC filings with pre-scrolled pages for line items @AravSrinivas
DreamDojo releases open-source interactive world model for robotics that generates future frames from motor controls, pre-trained on 44K hours of human videos @DrJimFan
Oscar Health migrates 600 people from Jira to Linear in one month despite having one of the three most complex Jira instances globally @cjc

METR estimates Claude Opus 4.6 has a 50% time horizon of 14.5 hours on software tasks (95% CI: 6-98 hours), the highest reported, though the estimate is extremely noisy due to task suite saturation @METR_Evals
Study comparing 22 AI models on analog clock generation shows clear capability threshold crossed in November 2025, with Claude Opus 4.5 significantly outperforming GPT-4o @randal_olson
NVIDIA Alpamayo 1 becomes Hugging Face's top-downloaded robotics model with 100K downloads for autonomous-driving vision-language-action evaluation @NVIDIADRIVE