AI Updates on 2025-08-09

OpenAI completes GPT-5 rollout to 100% of Plus, Pro, Team, and Free users, with 2x rate limits for Plus and Team users over the weekend and mini versions of GPT-5 and GPT-5 thinking coming next week @OpenAI
xAI upgrades Grok 4 with enhanced PDF processing capabilities, now able to handle massive PDFs with hundreds of pages and improved content recognition @xai
Anthropic releases background task handling for Claude Code, allowing it to run bash commands, monitor logs in real-time, and debug issues while handling long-running tasks @_catwu

Sam Altman acknowledges GPT-5 rollout challenges, noting they underestimated user attachment to GPT-4o features and announcing plans to make GPT-5 "warmer" while facing severe capacity constraints @sama
Evaluation results show GPT-5 never tops agentic leaderboards compared to Claude Opus 4.1, though it offers better cost-accuracy tradeoffs and comes in much cheaper than comparable models @sayashk
Gergely Orosz criticizes vendor assessments ranking IBM above Cursor for AI coding tools, calling them "pay-to-play" where vendors pay heavily to get ranked higher than reality @GergelyOrosz
Paul Graham shares Replit's revenue growth data, describing it as "growth this fast at this scale" that is very rarely seen @paulg
ChatPRD reports GPT-5 shows 5x token usage, 3x longer documents, 3x generation time, and higher negative feedback rates in their testing, leading them to keep users on previous models @clairevo

Simon Willison warns about prompt injection vulnerabilities in Cursor's MCP implementation, where attackers can steal developer secrets through malicious Jira issues, calling it a "lethal trifecta" attack @simonw
Amanda Askell critiques a methodology testing AI safety, noting it measures how well Claude and Gemini can course-correct multiturn ChatGPT conversations rather than avoiding problematic situations initially @AmandaAskell
Ethan Mollick highlights the inconsistent GPT-5 user experience where users sometimes get the best available AI and sometimes one of the worst, with potential switching within single conversations @emollick

TechCrunch demonstrates GPT-5 creating interactive demos to explain scientific concepts like the Bernoulli effect and vibe coding to create language learning apps @TechCrunch
Jeremy Howard shares a tip that adding ". think hard" to ChatGPT GPT-5 prompts results in using the competent model 100% of the time versus the "crippled model" without it @jeremyphoward
Nathan Lambert reports GPT-5 performance in codex CLI seems fine and much better than previous attempts, though Claude Code has superior UX that's "cleaner and more intuitive in a product sense" @natolambert

METR research shows continued exponential progress in AI capabilities for sustained work with no unexpected leaps but also no walls, according to their latest benchmark measurements @emollick
Nathan Lambert explains that RL scaling differs fundamentally from pretraining because "with RL, you can pull your checkpoints out" while pretraining can't just "take where you are now" @natolambert
Nathan Lambert argues that scaling training clusters 10x may no longer be financially worth it, but this doesn't invalidate the bitter lesson, which points to ideas that pay off more effectively with current scaled compute @natolambert