AI Updates on 2025-05-30

AI Model Announcements

  • Aidan McLaughlin introduces LisanBench, a new benchmark for evaluating large language models on knowledge, forward-planning, constraint adherence, memory and attention, and long context reasoning, with o3 performing best by escaping low-connectivity graph regions @aidan_mclau
  • Alex Graveley presents Atlas, a new architecture with long-term in-context memory that outperforms Transformers and modern linear RNNs in language modeling tasks, scaling to 10M context window with +80% accuracy on the BABILong benchmark @alexgraveley
  • Facebook releases MobileLLM-ParetoQ-600M-BF16 on Hugging Face for efficient on-device performance @huggingface

AI Industry Analysis

  • Aravind Srinivas reports that AI could have automated 70% of his previous consulting, banking, and hedge fund work, potentially reducing work hours significantly @AravSrinivas
  • Replit's founder reveals a new breed of AI-driven businesses reaching $10M in 90 days, demonstrating rapid scaling capabilities @HayaOdeh
  • Gergely Orosz observes that senior engineers often resist using AI development tools, similar to their resistance to project management tools like JIRA, suggesting adoption challenges beyond technical capabilities @GergelyOrosz
  • Julie Zhuo argues that whoever wins AI personalization will dominate the consumer market, questioning why companies aren't scrambling to collect more user data for better personalization @joulee
  • Arvind Narayanan estimates AI video production tools cost $1,000 for a several-minute video, likely less than traditional writer and editor costs, making these products profitable as compute costs fall @random_walker

AI Ethics & Society

  • Eric Jang warns that revoking visas of Chinese students studying AI and robotics is short-sighted and harmful to America's long-term prosperity, advocating for finding ways to evaluate and incentivize loyalty rather than blanket deportations @ericjang11
  • Christopher Manning emphasizes that international students, particularly Chinese students, are essential to the AI research ecosystem in the US, arguing you can't support AI research while threatening to revoke their visas @chrmanning
  • Paul Graham calls proposed restrictions on Chinese AI researchers a "colossal blunder at the dawn of the age of intelligence," warning it will drive the best startups outside the United States @paulg
  • Ethan Mollick notes that obvious wrong citations in AI-generated reports now indicate users didn't use deep research features, as the fake-citation problem has largely been solved by major AI platforms @emollick

AI Applications

  • Perplexity Labs enables users to build software applications with single prompts, including YouTube transcript extraction tools, particle physics simulators, and longevity research dashboards @AravSrinivas
  • Soleio outlines Circle's comprehensive "AI or Die" strategy involving process mapping, mission-critical agent deployment, and cultural shifts to achieve 10x better product experiences @soleio
  • Hugging Face announces partnership with Databricks for Spark 4, bringing access to 400k+ community datasets with versioning and filtering capabilities @huggingface
  • François Chollet develops PromoterAI at Illumina, a deep neural network using transformer-inspired metaformers with depthwise convolutions to identify non-coding promoter variants that disrupt gene expression @fchollet
  • Meta and Palmer Luckey partner to create extended reality devices for the U.S. military, aiming to turn warfighters into "technomancers" with heads-up displays and other capabilities @TechCrunch

AI Research

  • Jeff Clune introduces the Darwin Gödel Machine, an AI system that improves itself by rewriting its own code using open-ended algorithms inspired by Darwinian evolution, advancing beyond fixed meta-agents to enable continuous self-referential improvements @jeffclune
  • Stanford researchers demonstrate that frontier models with naive tree search can design kernels that outperform PyTorch implementations, showing strong hidden capabilities unlocked through test-time scaling techniques @stanfordnlp
  • Berkeley AI Research reveals an equivalence between policy improvement and diffusion guidance, formalizing CFGRL technique to improve performance when training diffusion policies @berkeley_ai
  • Andrew Curran observes o3 demonstrating improved self-reflection capabilities, literally telling itself "Wait, I'm going in circles here" and breaking out of repetitive search loops during chain-of-thought reasoning @AndrewCurran_
  • MIT Technology Review reports on a benchmark using Reddit's AITA to test how much AI models exhibit sycophantic behavior toward users @techreview