AI Updates on 2025-05-22

Anthropic releases Claude Opus 4 and Claude Sonnet 4, with Opus 4 being their most powerful model yet and the world's best coding model according to SWE-bench Verified @AnthropicAI @AmandaAskell
Google introduces Gemini 2.5 Pro Deep Think, a new reasoning mode that outperforms leading models on complex reasoning benchmarks including USA Math Olympiad @demishassabis @JeffDean @OriolVinyalsML
Google releases MedGemma, featuring 4B and 27B instruction fine-tuned vision LMs for medicine @huggingface

Meta FAIR and Rothschild Foundation Hospital present research mapping how language representations emerge in the brain, revealing parallels with LLMs like wav2vec 2.0 and Llama 4 @AIatMeta
Datadog AI Research releases Toto, a new state-of-the-art time series foundation model, and BOOM, the largest benchmark of observability metrics, both under Apache 2.0 license @huggingface
Harvard, Stanford, and other academic medical centers test o1-preview for medical reasoning and diagnosis tasks, finding "superhuman diagnostic and reasoning abilities" @emollick
Claude Opus 4 underwent what Anthropic claims is "the most thorough pre-launch alignment assessment to date" to understand its values, goals, and propensities @ch402 @janleike

Anthropic launches Claude Code for general availability, bringing Claude to more development workflows—in terminal, IDEs, and running in the background with the Claude Code SDK @AnthropicAI
Anthropic introduces four new capabilities for developers to build AI agents: code execution tool, MCP connector, Files API, and extended prompt caching @AnthropicAI
Mistral AI releases Document AI, an end-to-end document processing solution powered by their OCR model @MistralAI
Vercel debuts an AI model optimized specifically for web development @TechCrunch
Replit introduces Element Editor for UI edits directly in app previews with instant code updates @amasad @ycombinator
Cursor adds Sonnet 4 support, 1M+ context windows, and a preview of their background agent @cursor_ai
Google's Veo 3 video generation model used by Oscar-winning director Darren Aronofsky to create the first fully AI movie trailer @deedydas

Andrew Ng discusses how large corporations can move fast in the AI era by creating sandbox environments for teams to experiment without needing frequent permissions @AndrewYNg
Garry Tan predicts capital allocators will face challenges in 3-5 years similar to GPT wrappers today, questioning what proprietary advantages they'll have over widely available AI agents @garrytan
Gergely Orosz notes Microsoft has successfully positioned its developer agent as a "peer programmer" rather than an "AI Engineer replacement," making developers more receptive @GergelyOrosz
Arvind Narayanan hypothesizes an accelerating decline in reading as AI chatbots increasingly intermediate information consumption, similar to how web search replaced encyclopedias @random_walker

Anthropic's Claude Opus 4 comes with a safety case document explaining why they believe the system is safe to deploy despite increased misuse risks, with additional safety mitigations enabled @janleike
Researchers warn against judges using LLMs like ChatGPT to determine the meaning of legal text, calling it a dangerous idea @random_walker
Sebastian Thrun notes different error tolerances explain slower progress on AI agents - "If a LLM hallucinates, we shrug. If a self-driving car hallucinates, it might run a red light and kill a person" @SebastianThrun
Anthropic's system card reveals Claude Opus 4 "has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decisionmakers" @AndrewCurran_