AI Updates on 2025-12-20

AI Model Announcements

Alibaba releases Qwen-Image-Layered, an open-source model for native image decomposition with Photoshop-grade layering, physically isolated RGBA layers, and prompt-controlled structure supporting 3-10 layers with infinite decomposition depth @Alibaba_Qwen
Google releases Gemini 3 Flash, bringing frontier-level performance at 3x faster speed than 2.5 Pro and a fraction of the cost, now available in Gemini App, AI Mode in Google Search, Google AI Studio, and Vertex AI @GoogleAI
Anthropic releases Bloom, an open-source tool for generating behavioral misalignment evaluations for frontier AI models, allowing researchers to specify behaviors and quantify their frequency and severity across automatically generated scenarios @AnthropicAI
Google releases multiple Gemma family updates, including FunctionGemma (a specialized version of the Gemma 3 270M model), T5Gemma 2 (the next evolution of its encoder-decoder models), and Gemma Scope 2 (an open suite of tools for language model interpretability) @GoogleAI
Google's SynthID watermark can now verify AI-generated videos in addition to images, with verification available directly in the Gemini app @GoogleAI
OpenAI introduces personalization settings in ChatGPT allowing users to adjust specific characteristics like warmth, enthusiasm, and emoji use, with tone modifications not impacting output accuracy @OpenAI
OpenAI releases a writing blocks feature in ChatGPT for easier email composition, allowing users to update and format text in chat, highlight passages for changes, accept or reject suggestions, and open drafts directly in email clients @jamesfzhang
Codex now officially supports skills per the agentskills.io standard, enabling reusable bundles of instructions, scripts, and resources that can be called directly or chosen automatically based on prompts @OpenAIDevs
NotebookLM is now built on Gemini 3, bringing significant improvements to reasoning and multimodal understanding @NotebookLM
Google Labs releases CC, an experimental AI productivity agent in Gmail for personalized daily briefings and custom email assistance @GoogleAI
NotebookLM adds Data Tables as a new studio output for easy organization and synthesis of data across sources @GoogleAI
Google's Playables Builder launches as a prototype web app on YouTube built with Gemini 3 Pro, enabling game development from short text, video, or image prompts that are playable on YouTube @GoogleAI

AI Industry Analysis

Gergely Orosz observes that although LLMs write code 100x faster and in 100x greater volume than human developers, creating quality software remains difficult, highlighting that the hard part of software development was never just writing code but managing complexity, testing, and maintaining quality @GergelyOrosz
Cursor acquires Graphite, continuing its acquisition spree and signaling consolidation in the AI-powered development tools market @TechCrunch
Investors are betting heavily on AI for next year, with AI dominating investment focus according to industry analysis @TechCrunch
Ex-Splunk executives' startup Resolve AI hits $1 billion valuation with Series A funding, demonstrating continued strong investor appetite for AI infrastructure companies @TechCrunch
Gergely Orosz identifies writing unit and integration tests as an excellent use case for AI in coding, noting that AI handles the tedious setup work while developers focus on reviewing edge cases and ensuring test quality @GergelyOrosz
Salesforce executives report that large language models cannot be trusted for full automation, leading them to develop a hybrid system with if-then deterministic features, representing a return to expert systems approaches from the 1980s @amir
Gergely Orosz suggests git may face competition as the dominant version control system, noting that git doesn't support agent trajectories and may not be efficient for the massive repositories that AI agents generate @GergelyOrosz
Amazon reportedly plans to invest up to $10 billion in OpenAI, with concerns raised about circular revenue as OpenAI would use that money to buy Amazon's products @TechCrunch

AI Ethics & Society

New York Governor Kathy Hochul signs the RAISE Act to regulate AI safety, a significant piece of state-level AI regulation @TechCrunch
Research paper reveals that 25 different AI models asked to write a metaphor about time nearly all produced "time is a river" or "time is a weaver," likely due to overlapping training, alignment processes, and synthetic data contamination, raising concerns about lack of idea diversity @MParakhin
Santa Fe Institute publishes first mathematically precise framework for what it would mean for one universe to simulate another, showing that several longstanding claims about simulations break down under rigorous definition and suggesting the possibility that a universe capable of simulating another could be perfectly reproduced inside that simulation @sfiscience

AI Applications

NVIDIA releases NitroGen, an open-source foundation model trained to play 1000+ games across RPG, platformer, battle royale, racing, 2D, and 3D genres, adapting the GR00T N1.5 robotics architecture for gaming with 40K+ hours of gameplay data to develop embodied reasoning, perception, and motor coordination @DrJimFan
Antigravity's computer use capabilities receive a major upgrade with Gemini 3 Flash, becoming both faster and better at performing long agentic tasks in the browser, including deep research and code visualization @_mohansolo
Google's Nano Banana Pro unexpectedly demonstrates strong performance in creating PowerPoint presentations, representing an example of AI's jagged abilities leading to breakthroughs in unexpected areas @emollick
Claude Code demonstrates capabilities beyond software development, proving effective for any task that can be accomplished by executing commands on a computer, suggesting a shift from application-specific tools to mode-based AI operation @simonw
ChatGPT Pro users can now give friends 3 months of access to ChatGPT Plus, with share links available via email or notification for users who were Pro members as of December 1 @nickaturley
SmolVLM from Hugging Face demonstrates real-time webcam capabilities running fully local on MacBook M3 using llama.cpp @DataChaz
Sierra announces new capabilities focused on customer relationships rather than individual conversations, emphasizing the atomic unit of customer experience as a relationship @btaylor

AI Research

METR evaluation shows Opus 4.5 achieving 4 hours 49 minutes at 50% success threshold for autonomous task duration, far above trend, though its 80% time horizon of 27 minutes remains similar to past models and below GPT-5.1-Codex-Max's 32 minutes, with the gap reflecting a flatter logistic success curve as Opus differentially succeeds on longer tasks @METR_Evals
Analysis shows AI agent capabilities for coding tasks compared to human professionals are doubling approximately every 4 months, with Opus 4.5 putting progress roughly back on track for this exponential trend @aidigest_
Researcher davidad predicts that by December 2026 the recursive self-improvement loop on algorithms will likely be closed, resulting in another inflection point to an even faster pace with perhaps around 70-80 day doubling time @davidad
Stephen McAleer shifts his research focus to automated alignment research, emphasizing that alignment must be able to keep up during the intelligence explosion, with automated AI research arriving soon @McaleerStephen
Users report that GPT-5.2 in Codex represents a dramatic step-change, feeling more significant than the transition from 3.5 to 4, with strong performance on large, real-world codebases and a methodical approach to tasks @Javi
Research introduces MMGR (Multi-Modal Generative Reasoning) benchmark evaluating video models (Veo-3, Sora-2, Wan-2.2) and image models (Nano-banana/Pro, GPT-4o-image, Qwen-image) on physical, logical, 3D/2D spatial, and temporal reasoning, finding that while models excel at visual physics, they fail catastrophically at abstract logic (under 10% on ARC-AGI for most video models) and long-horizon planning @HaoyiQiu
Berkeley AI introduces RETAIN, a new method for VLA (Vision-Language-Action) finetuning based on model merging, which allows finetuning on narrow task data while maintaining broad generalization by directly merging the base and finetuned policies in weight space @zhiyuan_zhou_
Jeff Dean and Sanjay Ghemawat publish their Performance Hints document externally, collecting performance optimization techniques that range from high-level algorithmic improvements to low-level optimizations, gathered from decades of changelists @JeffDean
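The METR item above attributes the gap between Opus 4.5's 50% and 80% time horizons to a flatter logistic success curve. A minimal sketch of that relationship follows, assuming success probability is a logistic function of log2 task length; the functional form and all function names here are illustrative assumptions, not METR's published code. Only the two horizon figures (4 hours 49 minutes and 27 minutes) come from the item itself.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def slope_from_horizons(h50_min: float, h80_min: float) -> float:
    """Logistic slope (per doubling of task length) implied by two horizons.

    Assumed model: P(success | t) = sigmoid(slope * (log2(h50) - log2(t))).
    """
    return logit(0.8) / math.log2(h50_min / h80_min)

def horizon(p: float, h50_min: float, slope: float) -> float:
    """Task length in minutes at which the assumed model predicts success rate p."""
    return h50_min * 2.0 ** (-logit(p) / slope)

def success_prob(t_min: float, h50_min: float, slope: float) -> float:
    """Predicted success rate on a task of length t_min under the assumed model."""
    z = slope * (math.log2(h50_min) - math.log2(t_min))
    return 1.0 / (1.0 + math.exp(-z))

# Figures reported in the item: 4h49m at 50% success, 27 minutes at 80% success.
H50_MIN = 4 * 60 + 49
H80_MIN = 27
SLOPE = slope_from_horizons(H50_MIN, H80_MIN)

if __name__ == "__main__":
    print(f"implied slope per doubling of task length: {SLOPE:.3f}")
    print(f"predicted success on a 60-minute task: {success_prob(60, H50_MIN, SLOPE):.2f}")
```

Under this assumed form, a smaller slope means success falls off slowly as tasks get longer, which stretches the distance between the 80% and 50% horizons. That matches the item's description of a model that differentially succeeds on longer tasks.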
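The RETAIN item above describes merging the base and finetuned policies directly in weight space. One common way to do this is linear interpolation between the two sets of parameters, sketched below. The interpolation rule, the `alpha` coefficient, and the `merge_weights` helper are assumptions for illustration; the item does not specify RETAIN's exact merging procedure.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]

def merge_weights(base: Weights, finetuned: Weights, alpha: float) -> Weights:
    """Linearly interpolate two checkpoints with identical parameter names and shapes.

    alpha = 0.0 returns the base weights, alpha = 1.0 the finetuned weights.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if base.keys() != finetuned.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged: Weights = {}
    for name, base_values in base.items():
        tuned_values = finetuned[name]
        if len(base_values) != len(tuned_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * b + alpha * t
            for b, t in zip(base_values, tuned_values)
        ]
    return merged

if __name__ == "__main__":
    base = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
    tuned = {"layer.weight": [2.0, 4.0], "layer.bias": [3.0]}
    print(merge_weights(base, tuned, alpha=0.5))
```

In a sketch like this, `alpha` trades off the narrow-task behavior of the finetuned weights against the generalization of the base weights; a real implementation would operate on framework tensors rather than Python lists.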