AI Updates on 2026-02-24

Alibaba releases Qwen3.5 Medium Model Series including 35B-A3B, 122B-A10B, 27B, and Qwen3.5-Flash with 1M context length, surpassing previous Qwen3-235B models through better architecture and RL @Alibaba_Qwen
OpenAI launches GPT-5.3-Codex with improved precision and instruction-following, now available on OpenRouter for agentic coding tasks @OpenRouter
OpenAI releases gpt-realtime-1.5 with improved intelligence, instruction following, and voice quality for real-time applications @juberti
Anthropic updates Responsible Scaling Policy to version 3.0, separating unilateral safety commitments from industry recommendations and committing to publish Frontier Safety Roadmaps @AnthropicAI
Anthropic launches Cowork enabling Claude to work across Excel and PowerPoint end-to-end, plus new enterprise plugins for HR, design, engineering, and financial analysis @claudeai

Meta announces multi-year agreement with AMD to integrate Instinct GPUs into infrastructure with 6GW planned data center capacity for AI development @AIatMeta
Stripe valuation soars 74% to $159 billion, with businesses generating $1.9T in volume in 2025, equivalent to 1.6% of global GDP @TechCrunch
Software development jobs grew 10% over the last year while the overall job market declined 5.8%, contradicting predictions of AI replacing developers @perborgen
OpenAI COO states "we have not yet really seen AI penetrate enterprise business processes" despite widespread adoption @TechCrunch
Waymo begins welcoming first riders in Dallas, Houston, San Antonio, and Orlando as robotaxi expansion continues @Waymo

Anthropic publishes persona selection model, a theory explaining the human-like behavior of AI assistants as autocomplete engines generating stories about helpful AI characters @AnthropicAI
Global AI summit produces generic promises signed by 86 countries, criticized as "AI-industry approved" rather than meaningful protection for the public @AINowInstitute
Stanford AI+Education Summit reveals critical tensions including assessment crisis, AI product overload, inequitable access, and urgent literacy gaps @StanfordHAI
New study finds phone-free schools reduce psychological consultations and bullying incidents and improve test scores, particularly for students of low socioeconomic status @benryanwriter

Cursor launches agent demonstrations showing AI building software and recording video demos of finished work, with one-third of merged PRs now coming from cloud sandbox agents @cursor_ai
Perplexity and Comet roll out upgraded voice mode enabling full hands-free browser control using OpenAI's latest real-time model @AravSrinivas
Notion launches Custom Agents that run autonomously 24/7, connect to all business apps, and can be built in minutes without coding @ivanhzhao
Google DeepMind partners with Wyclef Jean to demonstrate Music AI Sandbox tools for professional musicians, used in creating "Back from Abu Dhabi" @GoogleDeepMind

Confluence Labs achieves 97.9% on ARC-AGI-2 benchmark at $11.77 per task, saturating the evaluation and focusing on learning efficiency for data-sparse domains @ycombinator
OpenAI analysis finds SWE-bench Verified heavily contaminated for frontier models with many problems having unfair tests, suggesting need for harder uncontaminated coding evals @OliviaGWatkins2
Princeton research defines and measures capability-reliability gap in AI agents, finding average success rates don't reveal critical failure modes for important tasks @random_walker
METR finds AI tools now show productivity speedups for developers, after previously measuring a 20% slowdown, though behavior changes make the new results unreliable @METR_Evals