AI Updates on 2025-08-02

Google announces Gemini 2.5 Deep Think achieving state-of-the-art performance across many challenging benchmarks @demishassabis
OpenAI teases upcoming launches over the next couple of months including new models, products, and features, warning of potential capacity crunches during rollout @sama
Early-access sightings of GPT-5-reasoning (medium) are reported, with select users apparently testing the model @AndrewCurran_

Anthropic revoked OpenAI's API access to its models due to terms of service violations, highlighting competitive tensions between AI companies @AndrewCurran_
Meta reportedly offered a researcher $1.5 billion over 6 years; the researcher ultimately declined, illustrating how intense the AI talent war has become @deedydas
Eugene Yan warns that AI coding tools help build faster but can create maintainability issues if code is generated without considering readability and extensibility, potentially increasing long-term ownership costs @eugeneyan
Paul Graham observes that startup partnerships with big companies rarely work as shortcuts to growth, with most attempts resulting in the startup being taken advantage of @paulg

AI has solved a fourth problem on FrontierMath Tier 4, a number theory problem that had won a prize for best submission @gdb
Breakthrough research shows a tiny 27M-parameter brain-inspired model, trained on only 1,000 samples, outperforming o3-mini-high on reasoning tasks, achieving 40% on ARC-AGI and solving complex sudoku puzzles and mazes @deedydas
Eric Jang predicts AI models will make novel math discoveries for simple unproven conjectures within 12 months and achieve rudimentary self-improvement within 24 months @ericjang11
Research reveals that traditional prompting techniques like threats, politeness, insults, and promising tips no longer significantly impact performance on challenging tasks for recent AI models @emollick
Chain-of-thought prompting no longer provides substantial performance improvements even for non-reasoning models, suggesting convergence in model capabilities @emollick

Ethan Mollick demonstrates Gemini 2.5 Deep Think creating a complete missile command game incorporating realistic relativity physics through simple prompts, with each iteration running without errors @emollick
Perplexity showcases Comet agent capabilities in comparison to ChatGPT Agent for real-world use cases @AravSrinivas
Browser-based AI agents demonstrate practical applications including finding working promo codes, managing YouTube content, creating product lists from tabs, and automating repetitive web tasks @garrytan
AI tools are accelerating scientific research through time-saving applications in data cleaning, exploratory analysis, writing, and research assistance when used carefully by humans @emollick

Ethan Mollick discusses the hypothetical consequences of Llama 4's relative failure, suggesting it could shift open-source AI development to China and drive companies toward closed models @emollick
Concerns are raised about AI-generated scientific abstracts, with discussion of how to balance the time savings against the need for human oversight in academic writing @emollick
Aidan McLaughlin criticizes barriers preventing AI researchers from accessing competitor models, arguing it hinders important qualitative research on model behavior @aidan_mclau