AI Updates on 2025-06-10
AI Model Announcements
- OpenAI announces o3-pro model with significant improvements over o3, featuring better performance in science, education, programming, data analysis, and writing @OpenAI
- OpenAI reduces o3 pricing by 80%, making it more accessible as a daily driver model (per-task cost arithmetic is sketched after this list) @sama
- Mistral AI releases Magistral, its first reasoning model, available in two variants: the open-source 24B-parameter Magistral Small and the enterprise-focused Magistral Medium @MistralAI
- Apple introduces the Foundation Models framework for accessing its new on-device LLMs, though early benchmarks show these models lag behind open models such as Gemma 3-4B and Qwen 3-4B @emollick
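To put the 80% price cut in concrete terms, here is a minimal per-task cost calculation. The token counts and the per-million-token rates below are illustrative assumptions, not a statement of OpenAI's actual price list; the point is simply that cutting both rates by 80% scales any per-task cost by 0.2.

```python
def cost_per_task(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one request given per-million-token rates."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical task: 5k input tokens, 2k output tokens,
# at assumed old rates of $10/M input and $40/M output.
old = cost_per_task(5_000, 2_000, 10.0, 40.0)
new = cost_per_task(5_000, 2_000, 10.0 * 0.2, 40.0 * 0.2)  # 80% cut on both rates

print(f"old: ${old:.3f}  new: ${new:.3f}  ratio: {new / old:.2f}")
# An 80% cut on both rates scales every per-task cost by exactly 0.2.
```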
AI Industry Analysis
- Meta reportedly investing $14 billion in Scale AI for a 49% stake, with key Scale talent potentially joining Meta as part of the deal @AndrewCurran_
- Meta offering $2M+ annual compensation packages for AI talent but still losing candidates to OpenAI and Anthropic; Anthropic remains the top destination for AI researchers, with an 80% retention rate @deedydas
- Cursor AI crosses $500M ARR milestone, demonstrating the massive success of AI coding tools in the developer market @GergelyOrosz
- Linear raises $82M Series C at $1.25B valuation, positioning itself as the purpose-built tool where teams, AI, and agents build software together @karrisaarinen
- Enterprise AI startup Glean achieves $7.2B valuation, highlighting continued investor appetite for AI enterprise solutions @TechCrunch
- Google raises Google Workspace pricing, citing the added value of AI, even as users report limited utility from features like Gemini integration @GergelyOrosz
AI Ethics & Society
- AI Now Institute emphasizes that resisting Big Tech AI's current path is essential to any emancipatory project grounded in justice and democratic self-determination @AINowInstitute
- Ethan Mollick warns that people are looking for reasons to dismiss AI capabilities, citing the pattern of "AI must fail" papers getting disproportionate attention while "AI does this well" research is ignored @emollick
- Concerns raised about xAI's Grok serving as an arbiter of truth on social media platforms, with calls for transparency about accuracy rates and effectiveness @emollick
- Pentagon reportedly gutting the team responsible for testing AI and weapons systems, raising concerns about AI safety oversight in military applications @techreview
AI Applications
- 1X AI unveils Redwood, a 160M parameter Vision-Language-Action model capable of end-to-end mobile manipulation tasks including object retrieval, door opening, and home navigation @ericjang11
- Perplexity introduces Memory feature and updates iOS voice assistant, with o3 model support now available for Pro users @AravSrinivas
- Claude Code launches with deeper VS Code and JetBrains IDE integration, allowing Claude to see open files, LSP diagnostics, and highlighted text @_catwu
- Windsurf introduces Planning mode for AI coding, using larger reasoning models to iterate on the long-term plan while the selected model takes short-term actions (a minimal planner/executor sketch follows this list) @windsurf_ai
- Yutori launches Scouts, AI agents that continuously monitor the web for specific information and provide automated alerts, functioning as an advanced version of Google Alerts @abhshkdz
- xAI partners with Polymarket to blend market predictions with X data and Grok's analysis for enhanced prediction capabilities @xai
- Google AI develops a flood forecasting system that models rainfall-streamflow relationships, enabling global flood predictions that help communities build resilience @GoogleAI
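The Windsurf item above describes a planner/executor split: a larger reasoning model maintains the long-term plan while a cheaper model carries out individual steps. The sketch below is a generic illustration of that pattern, not Windsurf's implementation; it assumes the `openai` Python SDK, and the model names `o3` and `gpt-4o-mini` are placeholders.

```python
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    """Single-turn chat completion; returns the model's text reply."""
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def run_task(goal: str, planner="o3", executor="gpt-4o-mini", max_steps=5):
    # The planner (larger reasoning model) writes and maintains the plan.
    plan = ask(planner, f"Write a short numbered plan to accomplish: {goal}")
    history = []
    for _ in range(max_steps):
        # The executor (cheaper model) takes one short-term action at a time.
        step = ask(executor,
                   f"Plan:\n{plan}\n\nDone so far:\n{history}\n\n"
                   "Carry out the next step and describe the result, or say DONE.")
        if "DONE" in step:
            break
        history.append(step)
        # The planner revises the long-term plan in light of what happened.
        plan = ask(planner,
                   f"Goal: {goal}\nCurrent plan:\n{plan}\nLatest result:\n{step}\n"
                   "Revise the plan if needed; otherwise return it unchanged.")
    return history
```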
AI Research
- o3-pro scores 59% on the ARC-AGI-1 benchmark at high reasoning effort, setting a new pricing frontier at $4.16 per task, while managing less than a 5% success rate on ARC-AGI-2 @arcprize
- Research on RLHF reveals potential issues with preference optimization, suggesting it may optimize for a "mythical user" that represents no one in reality @berkeley_ai
- Stanford researchers develop a "self-study" approach for long-context LLMs that compresses the KV cache, using 39x less memory and reaching 26x higher peak throughput while matching in-context learning quality (see the KV-cache sizing sketch after this list) @stanfordnlp
- Berkeley AI Research introduces the SPlus optimizer, which reaches Adam's performance in only 44% of the training steps across a variety of objectives @berkeley_ai
- Stanford HAI researchers use AI to analyze brain scans of students solving math problems, providing first insights into the neuroscience of math disabilities @StanfordHAI
- Research suggests that, given the same training intent, reasoning models consistently come across as "more safe" or "more cautious," potentially an effect of inference-time scaled reward modeling @natolambert
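For context on the KV-cache result above: the memory a transformer spends on its KV cache grows linearly with context length, which is what makes a 39x reduction meaningful. The sketch below applies the standard KV-cache size formula to a hypothetical 8B-class configuration (32 layers, 8 KV heads, head dim 128, fp16); it illustrates the scale of the savings, not the paper's "self-study" method.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Standard transformer KV-cache size: keys and values (factor 2)
    stored for every layer, KV head, head dimension, and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 8B-class config with a 128k-token context (illustrative only).
full = kv_cache_bytes(seq_len=128_000, n_layers=32, n_kv_heads=8, head_dim=128)
print(f"full cache:      {full / 2**30:.1f} GiB")    # ~15.6 GiB
print(f"at 39x smaller:  {full / 39 / 2**30:.2f} GiB")  # headline ratio from the item
```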