AI Updates on 2025-12-20
AI Model Announcements
- Alibaba releases Qwen-Image-Layered, an open-source model for native image decomposition with Photoshop-grade layering, physically isolated RGBA layers, and prompt-controlled structure supporting 3-10 layers with infinite decomposition depth @Alibaba_Qwen
- Google releases Gemini 3 Flash, bringing frontier-level performance at 3x faster speed than 2.5 Pro and a fraction of the cost, now available in Gemini App, AI Mode in Google Search, Google AI Studio, and Vertex AI @GoogleAI
- Anthropic releases Bloom, an open-source tool for generating behavioral misalignment evaluations for frontier AI models, allowing researchers to specify behaviors and quantify their frequency and severity across automatically generated scenarios @AnthropicAI
- Google releases multiple Gemma family updates including FunctionGemma (specialized version of Gemma 3 270M model), T5Gemma 2 (next evolution of encoder-decoder models), and Gemma Scope 2 (open suite of tools for language model interpretability) @GoogleAI
- Google's SynthID watermark can now verify AI-generated videos in addition to images, with verification available directly in the Gemini app @GoogleAI
- OpenAI introduces personalization settings in ChatGPT allowing users to adjust specific characteristics like warmth, enthusiasm, and emoji use, with tone modifications not impacting output accuracy @OpenAI
- OpenAI releases writing blocks feature in ChatGPT for easier email composition, allowing users to update and format text in chat, highlight for changes, accept or reject suggestions, and open directly in email clients @jamesfzhang
- Codex now officially supports skills per the agentskills.io standard, enabling reusable bundles of instructions, scripts, and resources that can be called directly or chosen automatically based on prompts @OpenAIDevs
- NotebookLM is now built on Gemini 3, bringing significant improvements to reasoning and multimodal understanding @NotebookLM
- Google Labs releases CC, an experimental AI productivity agent in Gmail for personalized daily briefings and custom email assistance @GoogleAI
- NotebookLM adds Data Tables as a new studio output for easy organization and synthesis of data across sources @GoogleAI
- Google's Playables Builder launches as a prototype web app on YouTube built with Gemini 3 Pro, enabling users to create games that are playable on YouTube from short text, video, or image prompts @GoogleAI
AI Industry Analysis
- Gergely Orosz observes that despite LLMs writing code 100x faster and in 100x greater volume than human developers, creating quality software remains difficult, highlighting that the hard part of software development was never just writing code but managing complexity, testing, and maintaining quality @GergelyOrosz
- Cursor acquires Graphite in continuation of its acquisition spree, signaling consolidation in the AI-powered development tools market @TechCrunch
- Investors are betting on AI for next year, with AI dominating their investment focus according to industry analysis @TechCrunch
- Ex-Splunk executives' startup Resolve AI hits $1 billion valuation with Series A funding, demonstrating continued strong investor appetite for AI infrastructure companies @TechCrunch
- Gergely Orosz identifies writing unit and integration tests as an excellent use case for AI in coding, noting that AI handles the tedious setup work while developers can focus on reviewing edge cases and ensuring test quality @GergelyOrosz
- Salesforce executives report that large language models cannot be trusted for full automation, leading them to develop a hybrid system with if-then deterministic features, representing a return to expert systems approaches from the 1980s @amir
- Gergely Orosz suggests git may face competition as the dominant version control system in the future, noting that git doesn't support agent trajectories and may not be efficient for the massive repositories that AI agents generate @GergelyOrosz
- Amazon reportedly plans to invest up to $10 billion in OpenAI, with concerns raised about circular revenue as OpenAI would use that money to buy Amazon's products @TechCrunch
AI Ethics & Society
- New York Governor Kathy Hochul signs RAISE Act to regulate AI safety, marking significant state-level AI regulation @TechCrunch
- Research paper reveals that 25 different AI models asked to write a metaphor about time nearly all produced "time is a river" or "time is a weaver," likely due to overlapping training, alignment processes, and synthetic data contamination, raising concerns about lack of idea diversity @MParakhin
- Santa Fe Institute publishes first mathematically precise framework for what it would mean for one universe to simulate another, showing that several longstanding claims about simulations break down under rigorous definition and suggesting the possibility that a universe capable of simulating another could be perfectly reproduced inside that simulation @sfiscience
AI Applications
- NVIDIA releases NitroGen, an open-source foundation model trained to play 1000+ games across RPG, platformer, battle royale, racing, 2D, and 3D genres, adapting the GR00T N1.5 robotics architecture for gaming with 40K+ hours of gameplay data to develop embodied reasoning, perception, and motor coordination @DrJimFan
- Antigravity's computer use capabilities are massively upgraded with Gemini 3 Flash, becoming both faster and better at performing long agentic tasks using the browser, including deep research and code visualization @_mohansolo
- Google's Nano Banana Pro unexpectedly demonstrates strong performance in creating PowerPoint presentations, representing an example of AI's jagged abilities leading to breakthroughs in unexpected areas @emollick
- Claude Code demonstrates capabilities beyond software development, proving effective for any task that can be accomplished by executing commands on a computer, suggesting a shift from application-specific tools to mode-based AI operation @simonw
- ChatGPT Pro users can now give friends 3 months of access to ChatGPT Plus, with share links available via email or notification for users who were Pro members as of December 1 @nickaturley
- SmolVLM from Hugging Face demonstrates real-time webcam capabilities running fully local on MacBook M3 using llama.cpp @DataChaz
- Sierra announces new capabilities focused on customer relationships rather than individual conversations, emphasizing the atomic unit of customer experience as a relationship @btaylor
AI Research
- METR evaluation shows Opus 4.5 achieving 4 hours 49 minutes at 50% success threshold for autonomous task duration, far above trend, though its 80% time horizon of 27 minutes remains similar to past models and below GPT-5.1-Codex-Max's 32 minutes, with the gap reflecting a flatter logistic success curve as Opus differentially succeeds on longer tasks @METR_Evals
- Analysis shows AI agent capabilities for coding tasks compared to human professionals are doubling approximately every 4 months, with Opus 4.5 putting progress roughly back on track for this exponential trend @aidigest_
- Researcher davidad predicts that by December 2026 the recursive self-improvement loop on algorithms will likely be closed, resulting in another inflection point to an even faster pace with perhaps around 70-80 day doubling time @davidad
- Stephen McAleer shifts his research focus to automated alignment research, emphasizing how important it is that alignment can keep up during the intelligence explosion as automated AI research arrives soon @McaleerStephen
- Users report that GPT-5.2 in Codex represents a dramatic step-change, feeling more significant than the transition from 3.5 to 4, with strong performance on large, real-world codebases and a methodical approach to tasks @Javi
- Research introduces MMGR (Multi-Modal Generative Reasoning) benchmark evaluating video models (Veo-3, Sora-2, Wan-2.2) and image models (Nano-banana/Pro, GPT-4o-image, Qwen-image) on physical, logical, 3D/2D spatial, and temporal reasoning, finding that while models excel at visual physics, they fail catastrophically at abstract logic (under 10% on ARC-AGI for most video models) and long-horizon planning @HaoyiQiu
- Berkeley AI introduces RETAIN, a new method for VLA (Vision-Language-Action) finetuning based on model merging that allows finetuning on narrow task data while maintaining broad generalization by directly merging base and finetuned policy in weight space @zhiyuan_zhou_
- Jeff Dean and Sanjay Ghemawat publish Performance Hints document externally, collecting performance optimization techniques ranging from high-level algorithmic improvements to low-level optimizations gathered from decades of changelists @JeffDean
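
The METR item above attributes the gap between the 50% and 80% time horizons to a flatter logistic success curve. As a rough illustration only, the sketch below assumes success probability follows a logistic curve in log2 of task duration (an assumed model form, not METR's published fitting code) and backs out the slope implied by the two horizons quoted for Opus 4.5. The function names `logistic_slope` and `success_probability` are hypothetical.

```python
import math


def logistic_slope(h50_min: float, h80_min: float) -> float:
    """Slope b of the assumed curve p(t) = 1 / (1 + exp(b * (log2(t) - log2(h50)))).

    At t = h80 the success probability is 0.8, so
    logit(0.8) = ln(4) = b * log2(h50 / h80).
    """
    return math.log(4) / math.log2(h50_min / h80_min)


def success_probability(task_min: float, h50_min: float, slope: float) -> float:
    """Probability of success on a task of the given duration under the assumed curve."""
    return 1.0 / (1.0 + math.exp(slope * (math.log2(task_min) - math.log2(h50_min))))


# Horizons quoted in the digest for Opus 4.5.
opus_h50 = 4 * 60 + 49  # 4 hours 49 minutes, expressed in minutes
opus_h80 = 27           # minutes

slope = logistic_slope(opus_h50, opus_h80)

# A smaller slope means a flatter curve: success falls off slowly as tasks get longer,
# which pushes the 50% horizon far out while leaving the 80% horizon short.
print(f"implied slope per doubling of task length: {slope:.3f}")
print(f"p(success) at 27 min:  {success_probability(opus_h80, opus_h50, slope):.2f}")
print(f"p(success) at 289 min: {success_probability(opus_h50, opus_h50, slope):.2f}")
```

Under this assumed form, a model with the same 80% horizon but a steeper slope would have a much shorter 50% horizon, which is one way to read the "flatter curve" remark.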