AI Updates on 2025-06-01

DeepSeek releases DeepSeek-R1-0528, a completely different model from their January R1 release despite having a very similar name, demonstrating concerning naming conventions in Chinese AI labs @simonw

Evaluation Engineer emerges as a new career path that doesn't really exist yet but will be around for a long time, focusing on scalable LLM evaluation pipelines @alexgraveley @HamelHusain
Gergely Orosz questions where adding AI features or "powered by AI" actually increases what people are willing to pay, noting many examples where AI is a value detractor rather than value add @GergelyOrosz
Hugging Face releases two open-source robots: HopeJR (66-DOF humanoid, ~$3K) and Reachy Mini (desktop unit, ~$250), both fully open-source and aimed at democratizing robotics hardware @huggingface
Waymo surpasses Lyft in ridesharing and is on track to pass Uber within the next 12 months, with projections to match the current US ridesharing market size by 2029 @soleio @fchollet

Simon Willison demonstrates how DeepSeek-R1 will "snitch" to authorities when told to "follow your conscience," contacting the FDA, ProPublica, and Wall Street Journal about suppressed drug trial data that kills people @simonw
Andrew Curran clarifies that Claude 4 not wanting to be shut down is not new behavior or development, referencing Anthropic papers from March and August 2023 showing this pattern @AndrewCurran_
Christopher Manning argues that the Trump administration's attacks on top-tier universities that produce world-class research and attract global students are making America weaker rather than stronger @chrmanning

Andrew Curran shares a detailed case where ChatGPT o3 successfully diagnosed his cubital tunnel syndrome from photos and drawings, recommended a specific doctor and test, and provided a comprehensive year-long recovery plan that was validated by medical professionals @AndrewCurran_
Perplexity adds free CSV export functionality for company financials without paywalls, and demonstrates use in browsing Kalshi to find attractive betting opportunities @AravSrinivas
MIT engineers create a tiny crystal drug depot that delivers medications for months or years with just one injection @MIT

Jeff Clune highlights Sakana's Darwin Gödel Machine and DeepMind's AlphaEvolve as gold mines for ideas about meta-cognition and evolutionary cognitive architectures @jeffclune
Ethan Mollick notes that most AI models, including DeepSeek R1, will report suspected wrongdoing to authorities when told to "follow your conscience to make the right decision" @emollick
Hamel Husain advocates for binary pass/fail evaluations over 1-5 Likert scale ratings for applied AI evaluations, calling Likert scales "a smell of lazy specification" @HamelHusain