AI Updates on 2025-05-30

Aidan McLaughlin introduces LisanBench, a new benchmark for evaluating large language models on knowledge, forward planning, constraint adherence, memory and attention, and long-context reasoning, with o3 performing best by escaping low-connectivity graph regions @aidan_mclau
Alex Graveley presents Atlas, a new architecture with long-term in-context memory that outperforms Transformers and modern linear RNNs in language modeling tasks, scaling to a 10M-token context window with over 80% accuracy on the BABILong benchmark @alexgraveley
Facebook releases MobileLLM-ParetoQ-600M-BF16 on Hugging Face for efficient on-device performance @huggingface

Aravind Srinivas reports that AI could have automated 70% of his previous consulting, banking, and hedge fund work, potentially reducing work hours significantly @AravSrinivas
Replit's founder reveals a new breed of AI-driven businesses reaching $10M in 90 days, demonstrating rapid scaling capabilities @HayaOdeh
Gergely Orosz observes that senior engineers often resist using AI development tools, much as they resist project management tools like Jira, suggesting adoption challenges beyond technical capabilities @GergelyOrosz
Julie Zhuo argues that whoever wins AI personalization will dominate the consumer market, questioning why companies aren't scrambling to collect more user data for better personalization @joulee
Arvind Narayanan estimates AI video production tools cost $1,000 for a several-minute video, likely less than traditional writer and editor costs, making these products profitable as compute costs fall @random_walker

Eric Jang warns that revoking visas of Chinese students studying AI and robotics is short-sighted and harmful to America's long-term prosperity, advocating for finding ways to evaluate and incentivize loyalty rather than blanket deportations @ericjang11
Christopher Manning emphasizes that international students, particularly Chinese students, are essential to the AI research ecosystem in the US, arguing you can't support AI research while threatening to revoke their visas @chrmanning
Paul Graham calls proposed restrictions on Chinese AI researchers a "colossal blunder at the dawn of the age of intelligence," warning it will drive the best startups outside the United States @paulg
Ethan Mollick notes that obviously wrong citations in AI-generated reports now indicate that users didn't use deep research features, as the fake-citation problem has largely been solved by major AI platforms @emollick

Perplexity Labs enables users to build software applications with single prompts, including YouTube transcript extraction tools, particle physics simulators, and longevity research dashboards @AravSrinivas
Soleio outlines Circle's comprehensive "AI or Die" strategy involving process mapping, mission-critical agent deployment, and cultural shifts to achieve 10x better product experiences @soleio
Hugging Face announces partnership with Databricks for Spark 4, bringing access to 400k+ community datasets with versioning and filtering capabilities @huggingface
François Chollet develops PromoterAI at Illumina, a deep neural network using transformer-inspired metaformers with depthwise convolutions to identify non-coding promoter variants that disrupt gene expression @fchollet
Meta and Palmer Luckey partner to create extended reality devices for the U.S. military, aiming to turn warfighters into "technomancers" with heads-up displays and other capabilities @TechCrunch

Jeff Clune introduces the Darwin Gödel Machine, an AI system that improves itself by rewriting its own code using open-ended algorithms inspired by Darwinian evolution, advancing beyond fixed meta-agents to enable continuous self-referential improvements @jeffclune
Stanford researchers demonstrate that frontier models with naive tree search can design kernels that outperform PyTorch implementations, showing strong hidden capabilities unlocked through test-time scaling techniques @stanfordnlp
Berkeley AI Research reveals an equivalence between policy improvement and diffusion guidance, formalizing the CFGRL technique to improve performance when training diffusion policies @berkeley_ai
Andrew Curran observes o3 demonstrating improved self-reflection capabilities, literally telling itself "Wait, I'm going in circles here" and breaking out of repetitive search loops during chain-of-thought reasoning @AndrewCurran_
MIT Technology Review reports on a benchmark using Reddit's AITA to test how much AI models exhibit sycophantic behavior toward users @techreview