AI Updates on 2025-12-11
AI Model Announcements
- OpenAI releases GPT-5.2, described as the smartest generally-available model in the world, particularly strong at real-world knowledge work tasks including spreadsheets, presentations, and coding. The model comes in three variants: GPT-5.2 Instant for everyday work, GPT-5.2 Thinking for complex reasoning and long-context tasks, and GPT-5.2 Pro for difficult questions and scientific work @OpenAI
- GPT-5.2 achieves 55.6% on SWE-Bench Pro, 52.9% on ARC-AGI-2, and 40.3% on FrontierMath, with a 70.9% win/tie rate against industry experts on the GDPval benchmark, which measures knowledge work tasks across 44 occupations @sama
- GPT-5.2 Pro achieves a state-of-the-art 90.5% score on ARC-AGI-1 at $11.64 per task, a roughly 390x cost-efficiency improvement over last year's o3 preview, which scored 88% at $4,500 per task @arcprize
- Alibaba announces Qwen Learn Mode powered by Qwen3-Max, featuring Socratic-style dialogue and adaptive learning paths grounded in cognitive psychology @Alibaba_Qwen
- Cohere launches Rerank 4 with two versions (Fast and Pro), featuring the largest context window in their Rerank series, self-learning capabilities without annotated data, and support for over 100 languages with state-of-the-art retrieval in 10 major business languages @cohere
- Google introduces Gemini Deep Research agent for developers, built on Gemini 3 Pro and trained using multi-step reinforcement learning to autonomously navigate the web and produce detailed reports with citations. The agent achieves state-of-the-art performance on the DeepSearchQA benchmark and the highest score yet on BrowseComp @GoogleDeepMind
- Google updates Gemini TTS models with richer tone versatility, stricter adherence to style prompts, smarter context-aware speed adjustments, and consistent character voices in multi-speaker scenarios @OfficialLoganK
- Mistral AI announces Devstral 2 is #1 trending on OpenRouter and teases another model drop coming in a few days @MistralAI
- Google announces Gemini integration with Google Maps, serving up local results in a rich visual format with photos, ratings, and real-world information @GeminiApp
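The efficiency figure in the ARC Prize item above can be checked with simple arithmetic. The sketch below uses only the two per-task costs quoted in that item; the function and constant names are illustrative, and the comparison covers cost only, not the difference in scores (90.5% vs. 88%).

```python
# Per-task costs on ARC-AGI-1 as quoted in the ARC Prize item (USD).
# Names below are illustrative, not taken from any official tooling.
O3_PREVIEW_COST = 4500.00   # last year's o3 preview (88% score)
GPT_5_2_PRO_COST = 11.64    # GPT-5.2 Pro (90.5% score)


def cost_ratio(old_cost: float, new_cost: float) -> float:
    """Return how many times lower the new per-task cost is."""
    return old_cost / new_cost


ratio = cost_ratio(O3_PREVIEW_COST, GPT_5_2_PRO_COST)
print(f"Per-task cost fell by a factor of about {ratio:.0f}")
```

The ratio comes out a little under 390, which is consistent with the quoted 390x if that figure is rounded.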
AI Industry Analysis
- VC fundraising has dropped 75% from its 2022 peak to approximately $45B in Q3 2025, a level last seen 8 years ago, while capital deployment remains high at ~$330B over the last 4 quarters. The growing gap between funds deployed and funds raised suggests it will become significantly harder for startups to find capital @deedydas
- For the first time in history, over one-third of startups founded in 2025 were started by solo founders @julianweisser
- Perplexity announces adoption by law firm Gunderson Dettmer for legal services, highlighting lawyers' need for accurate AI that can pull references reliably @AravSrinivas
- Disney signs three-year licensing deal with OpenAI allowing Sora to generate AI videos featuring its 200 characters, with exclusivity for the first year. Disney will set guardrails for character usage and curate videos for Disney+ @TechCrunch
- Harness raises $240M at $5.5B valuation to automate AI's "after-code" gap in software delivery @TechCrunch
- Runware raises $50M Series A to help make image and video generation easier for developers @TechCrunch
- Port raises $100M at $800M valuation to compete with Spotify's Backstage for developer portals @TechCrunch
- Opera launches Neon, an AI-powered browser priced at $20 per month @TechCrunch
- Worktrace raises $9M seed round led by 8VC to help businesses uncover automation opportunities, founded by former OpenAI product manager Angela Jiang and UIUC CS professor Deepak Vasisht @worktrace_ai
- Vybe raises $10M seed round led by First Round to enable vibe-coding for internal business applications with production data integration @qhoang09
- Oboe raises $16M Series A led by a16z for personalized learning platform @NirZicherman
- Unconventional AI raises $475M seed round co-led by a16z to develop highly efficient AI-first chips using analog computing approaches inspired by biological brains @a16z
- Hugging Face announces text-generation-inference is now in maintenance mode, recommending users migrate to vLLM, SGLang, llama.cpp or MLX for optimized inference @LysandreJik
- Cursor introduces visual design editing directly in codebase, allowing users to select elements, modify them visually, and have Cursor write the code, aiming to bridge design and engineering workflows @cursor_ai
- Runway releases its first world model and adds native audio to latest video model @TechCrunch
- Rivian announces major autonomy push with custom silicon, lidar, and hints at robotaxis, with AI assistant coming to EVs in early 2026 @TechCrunch
AI Ethics & Society
- Ethan Mollick demonstrates GPT-5.2 Pro creating visually complex shader code in a single shot, highlighting the difficulty of distinguishing AI-generated content from human-created work @emollick
- OpenAI announces investment in cybersecurity preparedness as models grow more capable, working with global experts to strengthen safeguards and give defenders an advantage @OpenAI
- Disney issues cease-and-desist to Google claiming massive copyright infringement @TechCrunch
- TIME names "Architects of AI" as 2025 Person of the Year, including Fei-Fei Li, recognizing AI's transformational impact on humanity @drfeifei
- xAI partners with El Salvador to bring personalized Grok tutoring to over 1 million public school students, creating the world's first nationwide AI tutor program @xai
- Anthropic announces Model Context Protocol (MCP) is now part of the Agentic AI Foundation under the Linux Foundation, with OpenAI, Anthropic, and Block as co-founders @AnthropicAI
- ICML 2026 announces new policy allowing reviewers and authors to choose between conservative or permissive LLM use, with matching based on preferences @icmlconf
- Ethan Mollick notes that open weights AI models lack the same economics as open source software, with no clear path to capture value despite increasing model costs, raising questions about sustainability @emollick
- Stanford researchers find that 1 in 20 AI benchmarks have serious flaws, meaning the industry has been promoting underperforming models and penalizing better ones due to broken evaluation methods @StanfordHAI
AI Applications
- Linear introduces AI agent integration with Intercom, Zendesk, Gong, and Slack Workflows, enabling automatic issue creation from customer calls and tickets with a single click @karrisaarinen
- Google debuts Disco, a Gemini-powered tool for making web apps from browser tabs @TechCrunch
- Google launches AI try-on feature for clothes that works with just a selfie @TechCrunch
- Andrew Ng shares recipe for building highly autonomous agents using open source aisuite package, allowing frontier LLMs to use tools like disk access and web search for complex tasks, though noting most practical agents need more scaffolding @AndrewYNg
- Simon Willison publishes comprehensive guide on patterns for vibe-coding single-file HTML tools, covering CORS-enabled APIs, localStorage, URL state management, and rich copy-paste functionality after creating 150 different tools @simonw
- Microsoft Research introduces Agent Lightning, which decouples how agents work from how they're trained by turning each agent step into reinforcement learning data, enabling developers to improve agent performance with minimal code changes @MSFTResearch
- Satya Nadella demonstrates chain of debate app for deep research using multiple models and decision frameworks, announcing integration into Copilot @satyanadella
- Swiggy uses Microsoft Fabric to process billions of data points in near real-time for delivery innovations @satyanadella
AI Research
- On GDPval benchmark measuring well-specified knowledge work tasks across 44 occupations, GPT-5.2 Thinking is the first model to perform at human expert level, with GPT-5.2 Pro winning 71% of head-to-head comparisons against human experts on tasks requiring 4-8 hours as judged by other humans @emollick
- Francois Chollet announces ARC 3 benchmark releasing in Q1 2026 to target exploration, goal-setting, and interactive planning as new bottlenecks beyond fluid intelligence. He notes that while ARC 1 is saturating, state-of-the-art models are not yet human-level on an efficiency basis, and ARC 2 remains largely unsaturated @fchollet
- Mike Knoop estimates that humans are 10,000x more efficient than GPT-5.2 Pro at solving simple ARC v1 tasks on an energy basis, down from a 1,000,000x gap for last year's o3 preview @mikeknoop
- Google Deep