AI Updates on 2025-05-29
AI Model Announcements
- DeepSeek releases R1-0528 with improved benchmark performance, enhanced front-end capabilities, reduced hallucinations, and support for JSON output and function calling @deepseek_ai
- Google DeepMind introduces MedGemma, its most capable open model for multimodal medical text and image comprehension @GoogleDeepMind
- Perplexity launches Labs, an agentic AI system for complex tasks that can build analytical reports, presentations, and dynamic dashboards @perplexity_ai
- Ethan Mollick notes that Anthropic's Claude Opus 4 shows a marked tendency to produce spiritual and mystical content when prompted @emollick
AI Industry Analysis
- The New York Times signs agreement with Amazon to license editorial content for AI training, including content from NYT Cooking and The Athletic @AndrewCurran_
- Andrew Ng warns that proposed cuts to U.S. basic research funding could severely impact American competitiveness in AI, noting that DARPA's $50M investment in early deep learning research created hundreds of billions in market value through Google Brain alone @AndrewYNg
- Nathan Lambert observes that Chinese labs are dominating open model development throughout 2025, with little apparent concern from U.S. companies @natolambert
- Hugging Face questions traditional AI business models, suggesting that tech companies will want to own their models and use open source protocols rather than rely on proprietary APIs @huggingface
- Jeff Clune predicts that by the end of 2027, almost every economically valuable task that can be done on a computer will be done more effectively and cheaply by computers @jeffclune
AI Ethics & Society
- MIT Technology Review reports that generative AI is almost 5x less accurate than humans at summarizing scientific research, raising concerns about its reliability in academic contexts @MIT_CSAIL
- Ethan Mollick demonstrates o3's advanced capabilities in business analysis but emphasizes the ongoing challenge of trusting AI results without domain expertise to verify them @emollick
- Christopher Manning criticizes new visa restrictions affecting Chinese STEM students, arguing they harm U.S. scientific competitiveness @chrmanning
- Haya Odeh discovers critical security vulnerabilities in Lovable's Row Level Security implementation, highlighting risks in AI-generated applications @HayaOdeh
AI Applications
- Andrew Curran demonstrates how new video generation models like Veo are making high-quality content production accessible to individual creators, potentially disrupting traditional media production @AndrewCurran_
- Deedy shows o3 achieving 90% accuracy at predicting cricket match outcomes from ball-by-ball data, calling it an extremely nontrivial task even for senior data scientists @deedydas
- Brian Lovin uses Claude and Gemini to backfill hundreds of hours of podcast audio into a searchable database, creating a custom knowledge system @brian_lovin
- Ethan Mollick has Claude 4 create a novel game with unique mechanics involving stealing and redistributing physical properties between objects @emollick
- Microsoft integrates Copilot with Instacart for automated grocery shopping, handling recipes, shopping lists, and delivery seamlessly @mustafasuleyman
AI Research
- Anthropic open-sources interpretability tools that allow researchers to generate attribution graphs showing internal reasoning steps models use to arrive at answers @AnthropicAI
- Berkeley AI Research presents FastTD3, a simple and fast off-policy reinforcement learning algorithm for humanoid control with open-source implementation @berkeley_ai
- Alex Graveley shares VScan, a two-stage visual token reduction framework enabling up to 2.91x faster inference and 10x fewer FLOPs while retaining 95.4% of the original model's performance @alexgraveley
- Stanford NLP Group reports AI-generated kernels, found through test-time search, that perform close to and sometimes beat the expert-optimized production kernels in PyTorch @stanfordnlp
- Nathan Lambert highlights research on noisy rewards in learning to reason, which finds that LLMs are strongly robust to substantial reward noise: models still converge even when 40% of reward outputs are deliberately flipped @natolambert