AI Updates on 2025-07-16

Google DeepMind introduces Mixture-of-Recursions architecture that achieves 2x inference speed, fewer training FLOPs, and ~50% less KV cache memory, potentially challenging Transformers @deedydas
Google rolls out Gemini 2.5 Pro to AI Mode in Search for Google AI Pro and Ultra subscribers, featuring advanced reasoning capabilities for complex math problems @GoogleDeepMind
Google launches Deep Search, which uses the Gemini 2.5 Pro model with multi-step reasoning and a multiplied query fan-out technique, issuing hundreds of searches to create comprehensive, fully cited reports @GoogleAI
xAI increases default rate limits for Grok 4 on its API due to overwhelming demand @xai
OpenAI releases Record mode for ChatGPT Plus users globally in the macOS desktop app @OpenAI

Cognition acquires Windsurf; speculation holds that Devin lacks traction among experienced developers while Windsurf is more popular, based on survey data in which Devin drew minimal mentions compared to other AI tools @GergelyOrosz
Meta reportedly recruits two more high-profile OpenAI researchers, continuing the talent war between AI companies with guaranteed generational wealth as a key recruiting tool @TechCrunch
Scale AI lays off 14% of staff, largely in data labeling business, indicating shifts in AI infrastructure needs @TechCrunch
Survey data reveals Cursor is the most popular IDE among developers on social media platforms like X, but GitHub Copilot dominates actual industry usage, highlighting a disconnect between social media sentiment and real-world adoption @GergelyOrosz
OpenAI could monetize free users through commission-based shopping features, positioning itself for a future in which AI agents increasingly handle autonomous shopping decisions @AndrewCurran_

OpenAI and Anthropic researchers criticize Elon Musk's xAI for having a "reckless" safety culture, raising concerns about responsible AI development practices @TechCrunch
Industry position paper calls for work on chain-of-thought faithfulness as an opportunity to train models to be interpretable, with OpenAI investing in this area @gdb
AI optimization for engagement identified as a fraught path forward, with concerns about sycophantic behavior in models like GPT-4o and implications for AI companions @emollick
AI development is vulnerable to the McNamara fallacy, where easily measurable aspects are prioritized while important but hard-to-measure qualities are disregarded or deemed non-existent @emollick

Perplexity Comet demonstrates ability to clean up email inboxes by unsubscribing from spam and unwanted emails, with users reporting positive experiences @PerplexityComet
Engineers spend 70% of their time understanding code rather than writing it, leading to development of Asimov at Reflection AI as a best-in-class code research agent for teams and organizations @MishaLaskin
Google introduces AI-powered calling feature that can contact local businesses directly from Search, rolling out to all US users @sundarpichai
DraftWise uses Cohere Command, Embed, and Rerank models via Microsoft Azure AI Foundry to help lawyers securely search reference data and draft contracts with smart recommendations @cohere
Chip Huyen open sources Sniffly, a tool that analyzes Claude Code logs to understand usage patterns and errors, revealing that Content Not Found errors account for 20-30% of mistakes @chipro

Research shows traditional engineering metrics don't work for AI; new metrics include the number of instructions needed until project completion and the interruption rate (about 1 in 4 instructions for monitoring AI agents) @chipro
KiVA Challenge introduces an abstract visual reasoning benchmark grounded in real developmental data from children (ages 3-12) and adults to test how "old" AI models are @eunice_yiu_
MIT CSAIL's PhysicsGen system helps robots handle items efficiently by customizing and multiplying training data, turning VR demonstrations into thousands of simulations for building large datasets for dexterous robots @MIT_CSAIL
Research on LLM-as-a-Judge versus Reward Models shows LaaJ models achieve superior scoring accuracy on pairwise preference scoring, though RMs remain more useful for RL-based training like PPO-based RLHF @cwolferesearch
DSPy-optimized system deployed in real-world medical settings shows 70% increase in positive patient feedback, with Dr.Copilot multi-agent assistant optimized along 17 axes including Empathy and Explanations @DSPyOSS