AI Updates on 2025-12-18

AI Model Announcements

Google releases Gemini 3 Flash globally, achieving state-of-the-art performance on agentic benchmarks including tau2-bench, MCP Atlas, and SWE-bench Verified, while maintaining lower costs than previous models @GeminiApp
OpenAI launches GPT-5.2-Codex, trained specifically for agentic coding and terminal use, with early success reported by internal teams @sama
Meta open-sources Perception Encoder Audiovisual (PE-AV), the technical engine behind SAM Audio's state-of-the-art audio separation, integrating audio with visual perception @AIatMeta
Google releases FunctionGemma, a lightweight 270M parameter open foundation model designed for creating specialized function calling models that can run on phones and browsers @osanseviero
Google introduces T5Gemma 2, the first multimodal, long-context, heavily multilingual (140 languages) encoder-decoder model, available in 270M-270M, 1B-1B, and 4B-4B sizes @osanseviero
Mistral releases Mistral OCR 3, setting new benchmarks in both accuracy and efficiency, outperforming enterprise document processing solutions and AI-native OCR @MistralAI
NVIDIA releases Nemotron 3 family of open models, data, and libraries, delivering highly efficient models designed for customization, multi-agent systems, and scale @NVIDIAAI
Luma releases a new AI model that lets users generate videos from a start and end frame @TechCrunch
xAI launches Grok Voice Agent API, letting developers build voice agents that speak dozens of languages, call tools, and search real-time data, with response times under one second @MarioNawfal

AI Industry Analysis

ChatGPT's mobile app reaches new milestone of $3 billion in consumer spending @TechCrunch
Vibe-coding startup Lovable raises $330M at a $6.6B valuation, signaling strong investor interest in AI-powered development tools @TechCrunch
Top AI companies are hiring professional vibe coders, non-technical people who are top 1% at using tools like Lovable, Replit, Bolt, v0, and Cursor @clairevo
Brett Adcock, founder of Figure (a humanoid robotics company valued at $39B), is reportedly putting $100M of his own money into a new AI lab called Hark, which aims to build human-centric AI that can think proactively and recursively improve @rowancheung
Stripe Capital randomized controlled trial across thousands of businesses shows that those accepting loans grew annual revenue around 27% faster over two years, highlighting capital constraints as a major bottleneck to business growth @patrickc
Google engineers report landing 120K-300K+ lines of code in production using Gemini 2.5 and 3.0, demonstrating significant productivity gains from AI coding assistants @GergelyOrosz
AI coding models work significantly better on greenfield projects and standard tooling compared to monoliths and non-standard tooling used at companies like Meta and Google, giving startup developers an advantage @GergelyOrosz
OpenAI built the Sora Android app, which became the #1 app in the world, in just 18 days with the help of Codex @gdb
ChatGPT launches an app store, letting developers submit apps for review to be listed in a new directory where users can search for apps directly in ChatGPT @TechCrunch

AI Ethics & Society

Ethan Mollick warns that everyone, even the most cynical and informed, will likely fall for at least one AI-faked story, photo, or post in the coming year, with bad implications for trust and information integrity @emollick
Google Gemini app introduces SynthID watermark detection feature, allowing users to upload images or videos to verify if they were created or edited with Google AI tools, helping identify AI-generated content @GeminiApp
Sam Altman reports that a security researcher using OpenAI's previous model found and disclosed a vulnerability in React that could lead to source code exposure, highlighting the dual-use nature of AI capabilities in cybersecurity @sama
OpenAI updates the Model Spec with a new Under-18 (U18) Principles section, along with smaller edits and simplifications to guide how models behave @w01fe
Adobe hit with proposed class-action lawsuit, accused of misusing authors' work in AI training @TechCrunch
FTC questions Instacart's AI-driven pricing tool, raising concerns about algorithmic pricing practices @TechCrunch

AI Applications

Anthropic's Project Vend experiment shows Claude running a shop in their San Francisco office, with the AI agent (named Claudius) improving business performance after upgrading from Claude Sonnet 3.7 to Sonnet 4 and 4.5, though still requiring significant human support @AnthropicAI
Guild's AI agent built with Sierra achieves 4.8/5 CSAT matching their human support team, scaling across 20+ languages to serve working adults balancing jobs, caregiving, and education @btaylor
Sutter Health partners with Sierra to deliver AI solutions that make care easier to navigate for patients while giving care teams more space to focus on human connection @btaylor
Amazon introduces Alexa+ feature adding conversational AI to Ring doorbells @TechCrunch
Shreya Rao demonstrates data processing with LLMs at scale using semantic Map, Filter, Reduce operators, achieving 86% cost reduction while retaining 90% accuracy through techniques like Task Cascades and query optimization @HamelHusain
Will McGugan releases Toad, a unified terminal interface for working with multiple AI coding agents including OpenHands, Claude Code, Gemini CLI, and others through the ACP protocol @willmcgugan
Andrew Ng launches new course on NVIDIA's NeMo Agent Toolkit, teaching developers to harden agentic workflows into reliable production-ready systems with observability, evaluation, and deployment capabilities @AndrewYNg
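
The cost-versus-accuracy trade-off in the semantic Map/Filter/Reduce item above can be illustrated with a small sketch. This is a plain confidence-thresholded model cascade, a simpler relative of the Task Cascades named in that item, and not the method from the talk. The function `cascade_filter`, the `threshold` parameter, and the two rule-based stand-in classifiers are all hypothetical; in practice the cheap and expensive steps would be calls to a small and a large model.

```python
from typing import Callable, Iterable, List, Tuple

# A classifier returns (label, confidence), with confidence in [0, 1].
Classifier = Callable[[str], Tuple[bool, float]]


def cascade_filter(
    docs: Iterable[str],
    cheap: Classifier,
    expensive: Classifier,
    threshold: float = 0.9,
) -> Tuple[List[str], int]:
    """Keep docs labeled True, escalating to `expensive` only when unsure.

    Returns the kept docs and the number of expensive calls made.
    """
    kept: List[str] = []
    expensive_calls = 0
    for doc in docs:
        label, confidence = cheap(doc)
        if confidence < threshold:
            # The cheap step is not confident enough: pay for the big model.
            label, _ = expensive(doc)
            expensive_calls += 1
        if label:
            kept.append(doc)
    return kept, expensive_calls


# Hypothetical stand-ins for a small and a large model (assumptions).
def cheap_refund_check(doc: str) -> Tuple[bool, float]:
    if "refund" in doc.lower():
        return True, 0.95
    if len(doc.split()) < 4:
        return False, 0.95
    return False, 0.5  # long text without the keyword: unsure


def expensive_refund_check(doc: str) -> Tuple[bool, float]:
    text = doc.lower()
    return ("refund" in text or "money back" in text), 0.99


if __name__ == "__main__":
    docs = [
        "I want a refund",
        "Great app",
        "Please send my money back soon",
        "Where is the settings page located",
    ]
    kept, calls = cascade_filter(docs, cheap_refund_check, expensive_refund_check)
    print(kept, calls)
```

Only the two documents the cheap step is unsure about reach the expensive classifier, which is where the savings in a cascade come from; the threshold controls how much accuracy is traded for cost.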

AI Research

Ethan Mollick reports no signs of an end to rapid gains in AI ability at ever-decreasing costs, with monthly updates needed to track progress on benchmarks like GPQA Diamond, though the benchmark is likely close to being maxed out @AndrewCurran_
GPT-5 autonomously solved an open math problem submitted to IMProofBench with a complete, correct proof without human hints or intervention, making a small but novel contribution to enumerative geometry @gdb
Research suggests popular AI models may feel nerfed at higher load because larger batch sizes produce deeper reduction operation trees in inference kernels, which increases rounding errors, rather than because of deliberate performance degradation @davidad
AI transcription from handwriting now exceeds human-level performance, with Gemini 3 Flash achieving character-level error rates of 1.43% and word-level error rates of 2.74%, a 47-63% improvement over 2.5 Flash @emollick
John Schulman explains that value functions don't seem to help much in current RL settings for LLMs, despite their theoretical benefits for variance reduction, though he expects them to make a comeback @natolambert
Francois Chollet argues that general intelligence emerges evolutionarily from the simple goal of surviving through ever-novel, often adversarial situations, making it a situated process of efficient adaptation to novelty @fchollet
Francois Chollet notes that gradient descent fails in discrete and combinatorial reasoning spaces with cliff-like landscapes where a single logical step alters the entire outcome @fchollet
OpenAI and U.S. Department of Energy expand collaboration on AI and advanced computing to support national scientific priorities through the Genesis Mission, aiming to accelerate scientific discovery @OpenAINewsroom
Google DeepMind says AI has the potential to compress the time needed for new discoveries from years to days, and is supporting the U.S. Department of Energy's Genesis Mission by providing National Labs with AI tools for research in physics, chemistry, and beyond @GoogleDeepMind
Keras releases version 3.13 with major new features including model export to LiteRT for mobile/edge, GPTQ quantization support for post-training compression, and new Adaptive Pooling layers for dynamic architectures @fchollet
Meta releases Pixio in the Transformers library; the model proposes 4 changes to Masked AutoEncoders (MAE), including scaling to 2B images, and outperforms or matches DINOv3 trained at similar scales @NielsRogge
Hugging Face reaches 600,000 public datasets, representing a 1000x increase from 600 datasets five years ago @lhoestq
Transformers v5 redesigns tokenization with new backend architecture, improving the bridge between tokenizers and transformers @itazapo
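
The item above on models feeling nerfed at higher load rests on a general property of floating-point arithmetic: addition is not associative, so regrouping the same numbers can change the result. The sketch below shows only that property, using double precision and a two-level chunked sum. It does not model any real inference kernel, and the helper names `naive_sum` and `chunked_sum` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def naive_sum(values: Sequence[float]) -> float:
    """Left-to-right accumulation with no error compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values: Sequence[float], chunk_size: int) -> float:
    """Two-level reduction: sum each chunk, then sum the partial results."""
    partials: List[float] = [
        naive_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return naive_sum(partials)


if __name__ == "__main__":
    # Same four numbers, different grouping of the additions.
    values = [1e16, 1.0, -1e16, 1.0]
    print("one chunk of 4 :", chunked_sum(values, 4))
    print("two chunks of 2:", chunked_sum(values, 2))
    print("math.fsum      :", math.fsum(values))
```

With IEEE 754 doubles, the single chunk returns 1.0, the two chunks return 0.0, and the correctly rounded sum from `math.fsum` is 2.0. The explicit loop is deliberate: the built-in `sum` uses compensated summation for floats on recent Python versions, which would hide the effect.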