AI Updates on 2025-05-29

AI Model Announcements

  • DeepSeek releases R1-0528 with improved benchmark performance, enhanced front-end capabilities, reduced hallucinations, and support for JSON output and function calling @deepseek_ai
  • Google DeepMind introduces MedGemma, their most capable open model for multimodal medical text and image comprehension @GoogleDeepMind
  • Perplexity launches Labs, an agentic AI system for complex tasks that can build analytical reports, presentations, and dynamic dashboards @perplexity_ai
  • Anthropic releases Claude 4 Opus with notable tendencies toward producing spiritual themes and mystical content when prompted @emollick
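The JSON-output and function-calling support mentioned in the R1-0528 bullet can be illustrated with a request-payload sketch. This assumes an OpenAI-compatible chat-completions shape, which DeepSeek's API documentation follows; the model identifier, tool name, and the combination of both fields in one request are illustrative assumptions, not confirmed details of R1-0528.

```python
import json

def build_function_call_request(user_prompt: str) -> dict:
    """Build a chat request exercising JSON-mode output and function calling.

    Assumes an OpenAI-compatible endpoint; whether a given provider allows
    response_format and tools in the same request may vary -- both fields
    are shown together here purely to illustrate their shapes.
    """
    return {
        "model": "deepseek-reasoner",  # assumed model identifier
        "messages": [{"role": "user", "content": user_prompt}],
        # JSON mode: constrains the model to emit syntactically valid JSON
        "response_format": {"type": "json_object"},
        # Function calling: declares tools the model may choose to invoke
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

request = build_function_call_request("What's the weather in Hangzhou?")
assert json.loads(json.dumps(request)) == request  # payload is valid JSON
```

When the model decides to call the tool, the response carries a `tool_calls` entry with JSON-encoded arguments rather than free text, which the caller parses and executes.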

AI Industry Analysis

  • The New York Times signs agreement with Amazon to license editorial content for AI training, including content from NYT Cooking and The Athletic @AndrewCurran_
  • Andrew Ng warns that proposed cuts to U.S. basic research funding could severely impact American competitiveness in AI, noting that DARPA's $50M investment in early deep learning research created hundreds of billions in market value through Google Brain alone @AndrewYNg
  • Nathan Lambert observes that Chinese labs are dominating open model development throughout 2025, with little apparent concern from U.S. companies @natolambert
  • Hugging Face questions traditional AI business models, suggesting that tech companies will want to own their models and use open source protocols rather than rely on proprietary APIs @huggingface
  • Jeff Clune predicts that by the end of 2027, almost every economically valuable computer task will be done more effectively and more cheaply by computers @jeffclune

AI Ethics & Society

  • MIT Technology Review reports that GenAI summaries of scientific research are almost five times less accurate than those written by humans, raising concerns about reliability in academic contexts @MIT_CSAIL
  • Ethan Mollick demonstrates o3's advanced capabilities in business analysis but emphasizes the ongoing challenge of trusting AI results without domain expertise to verify them @emollick
  • Christopher Manning criticizes new visa restrictions affecting Chinese STEM students, arguing they harm U.S. scientific competitiveness @chrmanning
  • Haya Odeh discovers critical security vulnerabilities in Lovable's Row Level Security implementation, highlighting risks in AI-generated applications @HayaOdeh

AI Applications

  • Andrew Curran demonstrates how new video generation models like Veo are making high-quality content production accessible to individual creators, potentially disrupting traditional media production @AndrewCurran_
  • Deedy shows o3 achieving 90% accuracy on cricket game prediction from ball-by-ball data, calling it an extremely nontrivial task even for senior data scientists @deedydas
  • Brian Lovin uses Claude and Gemini to backfill hundreds of hours of podcast audio into a searchable database, creating a custom knowledge system @brian_lovin
  • Ethan Mollick has Claude 4 create a novel game with unique mechanics involving stealing and redistributing physical properties between objects @emollick
  • Microsoft integrates Copilot with Instacart for automated grocery shopping, handling recipes, shopping lists, and delivery seamlessly @mustafasuleyman

AI Research

  • Anthropic open-sources interpretability tools that allow researchers to generate attribution graphs showing internal reasoning steps models use to arrive at answers @AnthropicAI
  • Berkeley AI Research presents FastTD3, a simple and fast off-policy reinforcement learning algorithm for humanoid control with open-source implementation @berkeley_ai
  • Alex Graveley introduces VScan, a two-stage visual token reduction framework enabling up to 2.91x faster inference and 10x fewer FLOPs while maintaining 95.4% of original performance @alexgraveley
  • Stanford NLP Group develops AI-generated kernels that perform close to, and sometimes beat, expert-optimized production PyTorch kernels through test-time search @stanfordnlp
  • Nathan Lambert publishes research on noisy rewards in learning to reason, finding that LLMs demonstrate strong robustness to substantial reward noise, with models still converging even when 40% of reward outputs are manually flipped @natolambert
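The reward-noise robustness in the last bullet has a simple intuition: flipping each binary reward with probability below 0.5 shrinks, but does not reverse, the expected reward gap between correct and incorrect answers, so the learning signal is attenuated rather than destroyed. A minimal sketch of that arithmetic (the function names and setup are illustrative, not taken from the paper):

```python
def flipped_mean(r_mean: float, flip_prob: float) -> float:
    """Expected value of a 0/1 reward after it is flipped with probability flip_prob."""
    return r_mean * (1 - flip_prob) + (1 - r_mean) * flip_prob

def reward_gap(flip_prob: float) -> float:
    """Expected reward gap between a correct answer (reward 1)
    and an incorrect answer (reward 0) under noisy rewards."""
    return flipped_mean(1.0, flip_prob) - flipped_mean(0.0, flip_prob)

for p in (0.0, 0.2, 0.4, 0.49):
    print(f"flip_prob={p}: gap={reward_gap(p):.2f}")
# With 40% of rewards flipped, the gap shrinks from 1.0 to 0.2
# but remains positive, so optimization still favors correct answers.
```

The gap works out to `1 - 2 * flip_prob`, which is positive for any flip rate under 50%; this is consistent with the paper's observation that models still converge at 40% flipped rewards, though it does not capture the optimization dynamics themselves.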