AI Updates on 2025-07-10

xAI releases Grok 4 with state-of-the-art performance across multiple benchmarks, achieving #1 on Humanity's Last Exam (44.4%), GPQA (88.9%), AIME 2025 (100%), HMMT (Harvard-MIT Mathematics Tournament, 96.7%), USAMO 2025 (61.9%), ARC-AGI-2 (15.9%), and LiveCodeBench (79.4%) @deedydas
xAI prices Grok 4 at $3 per million input tokens and $15 per million output tokens with a 256k context window, and offers a multi-agent version, Grok 4 Heavy, at $300/month @AndrewCurran_
Google launches image-to-video generation in Veo 3 through the Gemini app, allowing users to create 8-second video clips with sound from photos @sundarpichai
Mistral AI releases Devstral Small and Devstral Medium 2507 with improved performance and cost efficiency for coding agents and software engineering tasks @MistralAI
Microsoft Research introduces BioEmu 1.1, a generative deep learning method that emulates protein equilibrium ensembles, reducing GPU-years to GPU-hours for molecular dynamics simulations @MSFTResearch
Google releases MedGemma, a state-of-the-art open weights multimodal model for longitudinal EHR data and medical imaging across radiology, dermatology, pathology, and ophthalmology @JeffDean
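As a rough illustration of the Grok 4 API rates quoted above ($3 per million input tokens, $15 per million output tokens), the sketch below estimates the cost of a single request. The function name and the token counts are hypothetical, chosen only for the example, and the calculation assumes flat per-token billing with no caching discounts or other fees.

```python
# Rates as reported in the item above; everything else is an illustrative assumption.
INPUT_USD_PER_MILLION = 3.0
OUTPUT_USD_PER_MILLION = 15.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming flat per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens and 2,000 generated tokens.
    print(f"${request_cost_usd(10_000, 2_000):.2f}")
```

At these rates, output tokens cost five times as much as input tokens, so the 2,000 generated tokens in the example cost as much as the 10,000 prompt tokens.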

Anthropic's annualized revenue grows from $1B to $4B in 2025, a pace described as unprecedented, while OpenAI reaches $10B @deedydas
AI now generates 35% of the code for new Microsoft products and has saved the company over half a billion dollars in call center costs while increasing customer satisfaction @AndrewCurran_
Microsoft announces mass layoffs despite all-time high valuation, revenue, and profits, highlighting the disconnect between financial performance and employment decisions @GergelyOrosz
Non-founder tech professionals now earn more than the best-paid athletes, indicating peak AI market conditions @GergelyOrosz
ByteDance is projected to match Meta's revenue scale by end of 2025, with both companies expected to reach $185-190B, though US regulatory risk remains a concern for TikTok @deedydas

xAI faces criticism for a lack of transparency around the Grok 4 launch, with no model card, no red-teaming documentation, and no explanation of the previous day's incident that required pulling Grok 3 @emollick
MIT Technology Review reports on a tool that strips away anti-AI protections from digital art, raising concerns about artist rights and intellectual property protection @techreview
Research suggests AI coding assistants may primarily make developers feel more productive rather than delivering actual productivity gains, similar to how Duolingo gamifies learning without effective teaching @fchollet
Study finds developers using AI tools show no significant speedup in task completion, with some evidence of slower performance on familiar tasks @emollick

Perplexity launches Comet, an AI-powered browser that can authenticate into user accounts and perform actions like unsubscribing from newsletters, rescheduling meetings, and managing emails @omooretweets
Andrew Ng introduces Agentic Document Extraction with field extraction capabilities, allowing users to extract specific fields from invoices, medical forms, and structured documents using natural language prompts @AndrewYNg
Perplexity partners with Coinbase to integrate real-time crypto data into Perplexity Finance, enabling AI-powered market analysis and trading insights @AravSrinivas
Hugging Face releases ScreenEnv, a fully sandboxed desktop environment for deploying AI agents that can see, click, type, browse, and manage applications with MCP support @amir_mahla
Odyssey demonstrates AI-generated 3D game engines that create interactive virtual worlds in which every frame is generated by AI in real time @emollick

Jeff Clune introduces Foundation Model Self-Play (FMSP), combining foundation model intelligence with self-play curriculum to explore diverse strategies in multi-agent games, successfully red-teaming GPT-4o-mini and breaking 6/7 defensive strategies @jeffclune
Stanford researchers present CellFlux, an image generative model that simulates cellular morphological changes from microscopy images, achieving 35% higher image fidelity and 12% greater biological accuracy for drug discovery applications @Zhang_Yu_hui
Google DeepMind publishes research on evaluating AI models' stealth and situational awareness capabilities to assess deceptive alignment risks, suggesting chain-of-thought monitoring as a defense mechanism @rohinmshah
Research on conformal prediction for long-tailed classification addresses the challenge of creating prediction sets that work well for both common and rare classes in machine learning applications @tifding