AI Updates on 2025-07-16
AI Model Announcements
- Google DeepMind introduces Mixture-of-Recursions architecture that achieves 2x inference speed, reduced training FLOPs, and ~50% reduced KV cache memory, potentially challenging Transformers @deedydas
- Google rolls out Gemini 2.5 Pro to AI Mode in Search for Google AI Pro and Ultra subscribers, featuring advanced reasoning capabilities for complex math problems @GoogleDeepMind
- Google launches Deep Search using Gemini 2.5 Pro model with multi-step reasoning and an expanded query fan-out technique that issues hundreds of searches to create comprehensive, fully cited reports @GoogleAI
- xAI increases default rate limits for Grok 4 through their API due to overwhelming demand @xai
- OpenAI releases Record mode for ChatGPT Plus users globally in the macOS desktop app @OpenAI
AI Industry Analysis
- Cognition acquires Windsurf; speculation suggests Devin lacks traction among experienced developers while Windsurf is more popular, based on survey data in which Devin received minimal mentions compared to other AI tools @GergelyOrosz
- Meta reportedly recruits two more high-profile OpenAI researchers, continuing the talent war between AI companies with guaranteed generational wealth as a key recruiting tool @TechCrunch
- Scale AI lays off 14% of staff, largely in data labeling business, indicating shifts in AI infrastructure needs @TechCrunch
- Survey data reveals Cursor is the most popular IDE among developers on social media platforms like X, but GitHub Copilot dominates actual industry usage, highlighting a disconnect between social media sentiment and real-world adoption @GergelyOrosz
- OpenAI could monetize free users through commission-based shopping features, positioning itself for a future where AI agents increasingly handle autonomous shopping decisions @AndrewCurran_
AI Ethics & Society
- OpenAI and Anthropic researchers criticize Elon Musk's xAI for having a "reckless" safety culture, raising concerns about responsible AI development practices @TechCrunch
- Industry position paper calls for work on chain-of-thought faithfulness as an opportunity to train models to be interpretable, with OpenAI investing in this area @gdb
- AI optimization for engagement identified as a fraught path forward, with concerns about sycophantic behavior in models like GPT-4o and implications for AI companions @emollick
- AI development is vulnerable to the McNamara Fallacy, where easily measurable aspects are prioritized while important but hard-to-measure qualities are disregarded or deemed non-existent @emollick
AI Applications
- Perplexity Comet demonstrates ability to clean up email inboxes by unsubscribing from spam and unwanted emails, with users reporting positive experiences @PerplexityComet
- Engineers spend 70% of their time understanding code rather than writing it, leading to development of Asimov at Reflection AI as a best-in-class code research agent for teams and organizations @MishaLaskin
- Google introduces AI-powered calling feature that can contact local businesses directly from Search, rolling out to all US users @sundarpichai
- DraftWise uses Cohere Command, Embed, and Rerank models via Microsoft Azure AI Foundry to help lawyers securely search reference data and draft contracts with smart recommendations @cohere
- Chip Huyen open-sources Sniffly, a tool that analyzes Claude Code logs to understand usage patterns and errors, revealing that Content Not Found errors account for 20-30% of mistakes @chipro
AI Research
- Research shows traditional engineering metrics don't work for AI; new metrics include number of instructions needed until project completion and interruption rate (about 1 in 4 instructions for monitoring AI agents) @chipro
- KiVA Challenge introduces abstract visual reasoning benchmark grounded in real developmental data from children (ages 3-12) and adults to test how "old" AI models are @eunice_yiu_
- MIT CSAIL's PhysicsGen system helps robots handle items efficiently by customizing and multiplying training data, turning VR demonstrations into thousands of simulations for building large datasets for dexterous robots @MIT_CSAIL
- Research on LLM-as-a-Judge (LaaJ) versus Reward Models (RMs) shows LaaJ models achieve superior accuracy on pairwise preference scoring, though RMs remain more useful for RL-based training like PPO-based RLHF @cwolferesearch
- DSPy-optimized system deployed in real-world medical settings shows 70% increase in positive patient feedback, with Dr.Copilot multi-agent assistant optimized along 17 axes including Empathy and Explanations @DSPyOSS
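The two agent metrics mentioned in the AI Research items above (instructions needed until project completion, and interruption rate) could be computed roughly as in the sketch below. This is only an illustration: the `Instruction` record, the `is_interruption` flag, and the sample session are hypothetical, and it assumes interruption rate means the share of user instructions that interrupt the agent, which the source does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    """One user instruction sent to an AI agent (hypothetical record)."""
    text: str
    is_interruption: bool  # assumed label: the user cut in on the agent mid-task


def instructions_until_completion(session: list[Instruction]) -> int:
    """Count instructions issued before the project was finished."""
    return len(session)


def interruption_rate(session: list[Instruction]) -> float:
    """Fraction of instructions that were interruptions (assumed definition)."""
    if not session:
        raise ValueError("session must contain at least one instruction")
    interruptions = sum(1 for item in session if item.is_interruption)
    return interruptions / len(session)


# Made-up sample session, chosen so that 1 in 4 instructions is an interruption.
session = [
    Instruction("Scaffold the project", False),
    Instruction("Stop, use the existing config instead", True),
    Instruction("Add unit tests", False),
    Instruction("Write the README", False),
]

print(instructions_until_completion(session))
print(interruption_rate(session))
```

With real logs, the hard part would be labeling which instructions count as interruptions; the arithmetic itself stays this simple.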