AI Updates on 2025-12-18
AI Model Announcements
- Google releases Gemini 3 Flash globally, achieving state-of-the-art performance on agentic benchmarks including tau2-bench, MCP Atlas, and SWE-bench Verified, while costing less than previous models @GeminiApp
- OpenAI launches GPT-5.2-Codex, trained specifically for agentic coding and terminal use, with early success reported by internal teams @sama
- Meta open-sources Perception Encoder Audiovisual (PE-AV), the technical engine behind SAM Audio's state-of-the-art audio separation, integrating audio with visual perception @AIatMeta
- Google releases FunctionGemma, a lightweight 270M parameter open foundation model designed for creating specialized function calling models that can run on phones and browsers @osanseviero
- Google introduces T5Gemma 2, the first multimodal, long-context, heavily multilingual (140 languages) encoder-decoder model, available in 270M-270M, 1B-1B, and 4B-4B sizes @osanseviero
- Mistral releases Mistral OCR 3, setting new benchmarks in both accuracy and efficiency, outperforming enterprise document processing solutions and AI-native OCR @MistralAI
- NVIDIA releases Nemotron 3 family of open models, data, and libraries, delivering highly efficient models designed for customization, multi-agent systems, and scale @NVIDIAAI
- Luma releases a new AI model that lets users generate videos from a start and end frame @TechCrunch
- xAI launches the Grok Voice Agent API, letting developers build voice agents that speak dozens of languages, call tools, and search real-time data, with response times under one second @MarioNawfal
AI Industry Analysis
- ChatGPT's mobile app reaches a new milestone of $3 billion in consumer spending @TechCrunch
- Vibe-coding startup Lovable raises $330M at a $6.6B valuation, signaling strong investor interest in AI-powered development tools @TechCrunch
- Top AI companies are hiring professional vibe coders: non-technical people who are in the top 1% at using tools like Lovable, Replit, Bolt, v0, and Cursor @clairevo
- Brett Adcock, founder of Figure (a humanoid robotics company valued at $39B), is reportedly putting $100M of his own money into a new AI lab called Hark, which aims to build human-centric AI that can think proactively and recursively improve @rowancheung
- A Stripe Capital randomized controlled trial across thousands of businesses shows that businesses that accepted loans grew annual revenue around 27% faster over two years, highlighting capital constraints as a major bottleneck to business growth @patrickc
- Google engineers report landing 120K-300K+ lines of code in production using Gemini 2.5 and 3.0, demonstrating significant productivity gains from AI coding assistants @GergelyOrosz
- AI coding models work significantly better on greenfield projects and standard tooling compared to monoliths and non-standard tooling used at companies like Meta and Google, giving startup developers an advantage @GergelyOrosz
- OpenAI built the Sora Android app, which became the #1 app in the world, in just 18 days with the help of Codex @gdb
- ChatGPT launches an app store, letting developers submit apps for review to be listed in a new directory where users can search for apps directly in ChatGPT @TechCrunch
AI Ethics & Society
- Ethan Mollick warns that everyone, even the most cynical and informed, will likely fall for at least one AI-faked story, photo, or post in the coming year, with bad implications for trust and information integrity @emollick
- Google Gemini app introduces SynthID watermark detection feature, allowing users to upload images or videos to verify if they were created or edited with Google AI tools, helping identify AI-generated content @GeminiApp
- Sam Altman reports that a security researcher using OpenAI's previous model found and disclosed a vulnerability in React that could lead to source code exposure, highlighting the dual-use nature of AI capabilities in cybersecurity @sama
- OpenAI updates the Model Spec with a new Under-18 (U18) Principles section, along with smaller edits and simplifications to guide how models behave @w01fe
- Adobe hit with proposed class-action lawsuit, accused of misusing authors' work in AI training @TechCrunch
- FTC questions Instacart's AI-driven pricing tool, raising concerns about algorithmic pricing practices @TechCrunch
AI Applications
- Anthropic's Project Vend experiment shows Claude running a shop in their San Francisco office, with the AI agent (named Claudius) improving business performance after upgrading from Claude Sonnet 3.7 to Sonnet 4 and 4.5, though still requiring significant human support @AnthropicAI
- Guild's AI agent, built with Sierra, achieves a 4.8/5 CSAT score, matching its human support team, and scales across 20+ languages to serve working adults balancing jobs, caregiving, and education @btaylor
- Sutter Health partners with Sierra to deliver AI solutions that make care easier to navigate for patients while giving care teams more space to focus on human connection @btaylor
- Amazon introduces Alexa+ feature adding conversational AI to Ring doorbells @TechCrunch
- Shreya Rao demonstrates data processing with LLMs at scale using semantic Map, Filter, Reduce operators, achieving 86% cost reduction while retaining 90% accuracy through techniques like Task Cascades and query optimization @HamelHusain
- Will McGugan releases Toad, a unified terminal interface for working with multiple AI coding agents, including OpenHands, Claude Code, Gemini CLI, and others, through the Agent Client Protocol (ACP) @willmcgugan
- Andrew Ng launches new course on NVIDIA's NeMo Agent Toolkit, teaching developers to harden agentic workflows into reliable production-ready systems with observability, evaluation, and deployment capabilities @AndrewYNg
AI Research
- Ethan Mollick reports no signs of an end to rapid gains in AI ability at ever-decreasing costs, with monthly updates needed to track progress on benchmarks like GPQA Diamond, though the benchmark is likely close to being maxed out @AndrewCurran_
- GPT-5 autonomously solved an open math problem submitted to IMProofBench with a complete, correct proof without human hints or intervention, making a small but novel contribution to enumerative geometry @gdb
- Research suggests popular AI models may feel nerfed under heavy load not because of deliberate performance degradation, but because larger batch sizes produce deeper reduction trees in inference kernels, which increases rounding errors @davidad
- AI transcription of handwriting now exceeds human-level performance, with Gemini 3 Flash achieving a character-level error rate of 1.43% and a word-level error rate of 2.74%, a 47-63% improvement over Gemini 2.5 Flash @emollick
- John Schulman explains that value functions don't seem to help much in current RL settings for LLMs, despite their theoretical benefits for variance reduction, though he expects them to make a comeback @natolambert
- Francois Chollet argues that general intelligence emerges evolutionarily from the simple goal of surviving through ever-novel, often adversarial situations, making it a situated process of efficient adaptation to novelty @fchollet
- Francois Chollet notes that gradient descent fails in discrete and combinatorial reasoning spaces with cliff-like landscapes where a single logical step alters the entire outcome @fchollet
- OpenAI and U.S. Department of Energy expand collaboration on AI and advanced computing to support national scientific priorities through the Genesis Mission, aiming to accelerate scientific discovery @OpenAINewsroom
- Google DeepMind says AI has the potential to compress the time needed for new discoveries from years to days, and is supporting the U.S. Department of Energy's Genesis Mission by providing National Labs with AI tools for research in physics, chemistry, and beyond @GoogleDeepMind
- Keras releases version 3.13 with major new features including model export to LiteRT for mobile/edge, GPTQ quantization support for post-training compression, and new Adaptive Pooling layers for dynamic architectures @fchollet
- Meta releases Pixio in the Transformers library; it proposes four changes to Masked Autoencoders (MAE), including scaling to 2B images, and matches or outperforms DINOv3 trained at similar scales @NielsRogge
- Hugging Face reaches 600,000 public datasets, representing a 1000x increase from 600 datasets five years ago @lhoestq
- Transformers v5 redesigns tokenization with new backend architecture, improving the bridge between tokenizers and transformers @itazapo
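
A note on the rounding-error item above (the one about reduction trees at larger batch sizes): the property it relies on is that floating-point addition is not associative, so regrouping the same additions can change the result. The sketch below is only an illustration of that property in plain Python, not a reproduction of any inference kernel. The function names and the three input values are made up for the example and do not come from the cited research.

```python
import math


def sequential_sum(values):
    """Add left to right, like a single running accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Add pairwise by splitting the input in half, like a reduction tree."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return tree_sum(values[:mid]) + tree_sum(values[mid:])


# Illustrative inputs only (not taken from the source).
values = [0.1, 0.2, 0.3]

left_to_right = sequential_sum(values)  # groups as (0.1 + 0.2) + 0.3
tree = tree_sum(values)                 # groups as 0.1 + (0.2 + 0.3)
reference = math.fsum(values)           # correctly rounded sum of the inputs

print("same result:", left_to_right == tree)
print("difference: ", abs(left_to_right - tree))
print("vs fsum:    ", abs(left_to_right - reference), abs(tree - reference))
```

The two groupings differ by a tiny amount on three numbers. Whether, and by how much, such differences add up inside a real batched kernel depends on the hardware, the data type, and the kernel's reduction strategy, none of which this sketch models.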