AI Updates on 2025-05-23

NVIDIA announces that Blackwell has set a new inference speed world record, with a single DGX B200 server generating over 1,000 tokens per second per user on Meta's Llama 4 Maverick model @AIatMeta
Google introduces Gemma 3n, a multimodal model built for on-device AI on mobile, with a 3x smaller memory footprint that enables more complex applications on phones @GoogleDeepMind
OpenAI updates Operator in ChatGPT to use its latest o3 reasoning model, improving task success rates and response quality @OpenAI

Google DeepMind showcases Gemini 2.5 Pro's Deep Think mode, which tackles complex problems by using parallel thinking to consider multiple hypotheses before responding @GoogleDeepMind
Claude 4 achieves 55% on the Cybench cybersecurity benchmark, significantly outperforming other models, which score around 22.5%, and demonstrating advanced capabilities in reverse engineering and system exploitation @deedydas
Researchers report that language models converge on the same "universal geometry" of meaning, allowing embeddings to be translated between any two models without seeing the original text @emollick
An MIT study finds that vision-language models used for medical image analysis fail to properly handle queries containing negation words like "no" and "not" @MIT_CSAIL

ChatGPT now integrates with the RDKit library to analyze, manipulate, and visualize molecules and chemical information for scientific work across health, biology, and chemistry @gdb
Gemini 2.5 Flash becomes the new default model for Gemini app users, offering improved quality with fast response times @GeminiApp
Microsoft's Aurora AI model can accurately predict air quality, typhoons, and other environmental conditions @TechCrunch
Sierra introduces agents that go beyond traditional turn-based conversational AI systems to produce more human-like conversations @btaylor
Cubic launches as "Cursor for code review": an AI-native platform that helps teams ship code 28% faster @ycombinator
Clarm builds AI deep research agents that connect across enterprise data to provide precise, non-hallucinated answers for critical decisions @ycombinator

Diffusion techniques are making AI coding models 10-15x faster (and cheaper), with Inception Labs' Mercury Small showing promising results comparable to GPT-4o mini @deedydas
Current state-of-the-art AI models each have distinct strengths and weaknesses; o3's ability to chain tool calls in sequence is a major differentiator, even though other models excel in other areas @emollick
Many AI applications today resemble the "horseless carriages" of the 19th century: they pack powerful technology into outdated interfaces rather than being redesigned as AI-native experiences @ycombinator
YC CEO Garry Tan argues that open-source AI is preventing the next tech monopoly by enabling fair competition among 8-9 major players, giving startups more choices @garrytan

Simon Willison warns about security vulnerabilities in LLM systems that combine access to private data, exposure to malicious instructions, and the ability to exfiltrate information, a pattern seen across multiple platforms, including GitLab @simonw
Anthropic CEO Dario Amodei suggests hallucinations aren't necessarily a barrier on the path toward AGI, since humans also make mistakes; Google DeepMind CEO Demis Hassabis disagrees, noting that current tools get too many obvious questions wrong @TechCrunch
Google DeepMind's Demis Hassabis shares his vision of extending Gemini 2.5 Pro into a "world model" that can make plans and imagine new experiences by understanding and simulating aspects of the world @AndrewCurran_
AI documentation remains challenging: companies struggle to explain what their systems do, partly because they don't always know themselves and partly because there is no established approach for documenting AI capabilities @emollick