AI Updates on 2026-02-04

AI Model Announcements

Qwen3-Coder-Next released as an 80B MoE model with only 3B active parameters, achieving 74.2% on SWE-Bench Verified and 44.3 on SWE-Bench Pro, now available on vLLM, LM Studio, Together AI, Kaggle, Hugging Face, and Ollama @Alibaba_Qwen
OpenAI makes GPT-5.2 and GPT-5.2-Codex 40% faster through an optimized inference stack; same model and weights, with lower latency @OpenAIDevs
Mistral AI announces Voxtral Transcribe 2 with state-of-the-art speech-to-text and speaker diarization; Voxtral Mini Transcribe 2 achieves 4% WER on FLEURS at $0.003/min, while Voxtral Realtime offers configurable real-time latency down to sub-200ms @MistralAI
Google releases Genie 3 world modeling prototype that lets users build and explore interactive worlds, with emergent capabilities like working GPS displays and physics simulation @GoogleAI
InternLM introduces Intern-S1-Pro, a 1T MoE open-source multimodal scientific reasoning model with SOTA performance on AI4Science tasks, featuring Fourier Position Encoding for better physical signal representation @intern_lm
ACE Music and StepFun release ACE-Step-v1.5 (2B), an open-source music generation model that runs locally on consumer GPUs, generates full songs in under 2 seconds on A100, and beats Suno on common evaluation metrics @acemusicAI
Perplexity upgrades Deep Research with Opus 4.5, achieving state-of-the-art performance on external benchmarks and outperforming other deep research tools on accuracy and reliability @perplexity_ai

AI Industry Analysis

Companies are citing AI as justification for layoffs, with experts suggesting it's more about appearing innovative to investors than actual AI replacement of workers @AINowInstitute
Old-school companies, laggards, and government agencies are adopting AI dev tooling at nearly the same pace as cutting-edge startups, trailing by months rather than years @GergelyOrosz
GitHub Copilot adoption is hindered by its keeping a far worse default model, leading teams to switch away and creating the perception that Copilot is outdated @GergelyOrosz
OpenAI's Mark Chen emphasizes that the majority of compute is allocated to foundational research and exploration, not product milestones, with hundreds of exploratory projects running @markchen90
Ben Horowitz argues top AI researchers command billion-dollar price tags because there are only about 40 people in the world who can do the job, with skills that are alchemistic and can't be learned in school @a16z
Kimi becomes the most-used model on OpenClaw via OpenRouter, with real usage data showing developers voting with their tokens @Kimi_Moonshot
ElevenLabs raises $500M Series D at $11B valuation led by Sequoia, with a16z quadrupling down and ICONIQ tripling down @TechCrunch
Positron raises $230M Series B to compete with Nvidia's AI chips @TechCrunch
Intel announces plans to start making GPUs, entering a market dominated by Nvidia @TechCrunch
Nvidia's H200 exports to China approved by U.S. Department of Commerce but delayed pending State Department review @jukan05
RunBuggy uses a Sierra AI agent for outbound calls, cutting calls by approximately 20% and manual operational touchpoints by approximately 15%, and saving the ops team approximately 1,000 hours monthly @btaylor
Adaption raises $50M to build adaptive AI systems that evolve in real time @adaptionlabs
Collaborative Computing Inc. emerges from stealth with Atelier, its first product, a collaborative computing environment for humans and AI @austinvhuang

AI Ethics & Society

Anthropic announces Claude will remain ad-free, stating advertising would be incompatible with their vision of a genuinely helpful assistant for work and deep thinking @claudeai
Sam Altman criticizes Anthropic's Super Bowl ad as dishonest, stating OpenAI would never run ads as depicted and emphasizing commitment to free access for billions who can't pay for subscriptions @sama
Altman accuses Anthropic of wanting to control what people do with AI, blocking companies they don't like from using their coding product, and trying to dictate other companies' business models @sama
The criminal legal system's growing reliance on privately developed technologies in the age of AI hype raises concerns about the privatization of state authority @AINowInstitute
Dylan Scandrett joins OpenAI as Head of Preparedness to lead efforts in preparing for and mitigating severe risks from extremely powerful models @sama
Ethan Mollick demonstrates AI-generated videos from Genie 3 reaching quality where physics and interactions are convincingly simulated, though some issues remain @emollick
Plain English instructions that agents can follow may become a new avenue for marketing but also present a security nightmare @emollick
Shane Legg disagrees with Nature article claiming AGI has arrived, arguing that if an AI is failing at trivial things, it falls short of AGI despite having some form of general intelligence @ShaneLegg

AI Applications

Andrej Karpathy enables fp8 training for GPT-2 reproduction, achieving 2.91 hours of training time on 8xH100 for approximately $20, representing a 600x cost reduction over 7 years @karpathy
Karpathy reflects on the one-year anniversary of vibe coding, noting its evolution from fun throwaway projects to agentic engineering, with oversight, as the default workflow for professionals @karpathy
Perplexity releases DRACO Benchmark for evaluating deep research agents across 100 tasks in 10 domains including Academic, Finance, Law, Medicine, and Technology @perplexity_ai
Google introduces scientific citations in Gemini with proper APA-style inline citations and detailed reference sections for scientific prompts @joshwoodward
Figma releases Vectorize feature that converts raster images into editable vectors with simplified and controlled color output @figma
Granola releases MCP integration working with ChatGPT, Claude, and other tools for AI-powered meeting notes @meetgranola
Windsurf introduces Tab v2, the world's first variable aggression Pareto Frontier Tab model, saving customers on average 54% more keystrokes @windsurf
Cursor builds fast with their own AI tools and uses Linear to track work across teams and keep everyone aligned @linear
Lenny Rachitsky demonstrates content becoming software with Cursor-enabled interactive blog posts @lennysan
Tesla's VP of AI argues self-driving is not a sensor problem but an AI problem, stating cameras have enough information and it's about extracting it @SawyerMerritt

AI Research

Stanford researchers develop QuantiPhy benchmark to evaluate and improve AI's ability to reason about physical properties, addressing current models' struggles with basic physics estimates @StanfordHAI
MIT engineers design new tissue model that more accurately mimics liver architecture including blood vessels and immune cells for discovering MASLD treatments @MIT
NVIDIA's Nemotron models win ViDoRe V3, with AI agents transforming PDFs and contracts into live insights for companies like EdisonSci, Docusign, and JusttFintech @NVIDIAAI
Jim Fan's team trains robot foundation model on world model backbone enabling zero-shot, open-world prompting for new verbs, nouns, and environments, calling it DreamZero or World Action Model @DrJimFan
Research shows model and data recipe co-evolution with World Action Models learning best from diverse data rather than repeated demos, with diversity outweighing repetitions @DrJimFan
DreamZero demonstrates significant robot-to-robot and human-to-robot transfer, adapting quickly to new hardware with only 55 trajectories while retaining zero-shot prompting ability @DrJimFan
Publishing work on AI faces challenges because the publication process is much slower than working papers, with peer reviewers asking authors to account for newer papers that build on the paper under review @emollick
Papers increasingly need to be built for easy updating as new models come out, with "AI can't do task X" papers needing to focus on trendlines rather than current capabilities @emollick
Rubrics-as-rewards for RL shows most added technical complexity is related to reward modeling rather than RL itself, with new developments likely to come from advancing generative reward models @cwolferesearch
The EB-JEPA open-source library makes JEPAs accessible and trainable on a single GPU in hours, providing a playground for learning latent representations across images, video, action-conditioned video, and planning @BasileTerv987
GPT-5.2 Pro demonstrates strongest statistical reasoning in experience, with ability to spot issues in analysis that Opus 4.5 and