AI Updates on 2025-05-05

Hugging Face announced that Nvidia has released Llama-Nemotron, an efficient reasoning model @huggingface
Nvidia open-sourced Parakeet TDT 0.6B, described as the best speech recognition model on the Open ASR Leaderboard and capable of transcribing 60 minutes of audio in 1 second @huggingface

MIT researchers developed a new method to make AI models more trustworthy for high-stakes settings by conveying uncertainty more precisely @MIT
Chris Olah investigated whether superposition is a major cause of adversarial examples by training SAEs on adversarially trained models @ch402
D-FINE, a real-time object detector described as faster and more accurate than YOLO and released under the Apache 2.0 license, has been added to Hugging Face transformers @huggingface

Simon Willison released a new llm-video-frames plugin that turns video files into sequences of JPEGs to feed into long-context vision models like GPT-4.1-mini @simonw
Perplexity on WhatsApp provides a convenient way to use AI in flight, since in-flight WiFi handles messaging apps well @AravSrinivas
Claude 3.7 Sonnet can now crawl entire websites, extract specific data, and complete research tasks without leaving the desktop app @ycombinator
Google's Veo 2 in the Gemini app lets users generate videos directly from prompts; the model responds only in video format @AndrewCurran_
Pulse AI launched Ultra, described as their new hybrid reasoning model and "the most accurate document extraction model in the industry" @ycombinator
Alex 3.0 was released with features to automatically compile and fix errors, auto-apply code, add packages, search the web, run terminal commands, and review code, along with local LLM support @ycombinator

OpenAI announced structural changes: the nonprofit will continue to control the for-profit entity, which will become a Public Benefit Corporation with the same mission @OpenAI @sama
Many companies cannot use Qwen and DeepSeek open models because they come from China, slowing adoption of open models across enterprises @natolambert
Google refreshed its music-generation tools with Lyria 2 for Music AI Sandbox and Lyria RealTime for MusicFX DJ, producing high-quality 48kHz audio with extensive control over musical attributes @DeepLearningAI
The Keras team released KerasRS, a new library for building recommender systems from easy-to-use building blocks, compatible with JAX, PyTorch, and TensorFlow and optimized for TPUs @fchollet
Hugging Face introduced the Common Crawl Creative Commons Corpus (C5), a heavily filtered web-crawl dataset containing only Creative Commons-licensed documents, with 150 billion tokens collected so far @huggingface

Arvind Narayanan discusses two ever-present risks when using generative AI for work: hallucinations/confabulations and deskilling, emphasizing the importance of having a plan to address these risks @random_walker
Stanford HAI reports that Visa's integration of AI into its payment system could lead to consumers facing higher prices and deceptive practices without realizing it @StanfordHAI
A study found that people struggle to get useful health advice from chatbots @TechCrunch
Ethan Mollick shares research suggesting people may be massively underreporting their AI usage in surveys @emollick