AI Updates on 2026-02-04
AI Model Announcements
- Alibaba releases Qwen3-Coder-Next, an 80B MoE model with only 3B active parameters that achieves 74.2% on SWE-Bench Verified and 44.3% on SWE-Bench Pro; it is now available on vLLM, LM Studio, Together AI, Kaggle, Hugging Face, and Ollama @Alibaba_Qwen
- OpenAI makes GPT-5.2 and GPT-5.2-Codex 40% faster through an optimized inference stack; the model and weights are unchanged, only latency is lower @OpenAIDevs
- Mistral AI announces Voxtral Transcribe 2, offering state-of-the-art speech-to-text with speaker diarization; Voxtral Mini Transcribe 2 achieves 4% WER on FLEURS at $0.003/min, and Voxtral Realtime offers latency configurable down to sub-200ms @MistralAI
- Google releases Genie 3, a world-model prototype that lets users build and explore interactive worlds, with emergent capabilities such as working GPS displays and physics simulation @GoogleAI
- InternLM introduces Intern-S1-Pro, a 1T MoE open-source multimodal scientific reasoning model with SOTA performance on AI4Science tasks, featuring Fourier Position Encoding for better physical signal representation @intern_lm
- ACE Music and StepFun release ACE-Step-v1.5 (2B), an open-source music generation model that runs locally on consumer GPUs, generates full songs in under 2 seconds on an A100, and beats Suno on common evaluation metrics @acemusicAI
- Perplexity upgrades Deep Research with Opus 4.5, achieving state-of-the-art performance on external benchmarks and outperforming other deep research tools on accuracy and reliability @perplexity_ai
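For context on the 4% figure in the Voxtral item: word error rate (WER) is conventionally the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not Mistral's or the FLEURS evaluation code, it assumes plain whitespace tokenization with no text normalization, and the function name `wer` is hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length.

    Assumes simple whitespace tokenization with no casing or punctuation
    normalization (real benchmarks typically normalize text first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Rolling DP row: prev[j] is the edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over the dog"
    # One substitution (jumps -> jumped) and one deletion (lazy) over 9 words.
    print(f"WER = {wer(ref, hyp):.3f}")  # prints "WER = 0.222"
```

Read this way, a 4% WER means roughly 4 word errors per 100 reference words. Because insertions are counted, WER is not bounded by 100%.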
AI Industry Analysis
- Companies are citing AI as justification for layoffs, with experts suggesting it is more about appearing innovative to investors than about AI actually replacing workers @AINowInstitute
- Old school companies, laggards, and government agencies are adopting AI dev tooling at nearly the same pace as cutting-edge startups, only months behind rather than years @GergelyOrosz
- GitHub Copilot adoption is hindered by its far worse default model, which leads teams to switch away and creates the perception that Copilot is outdated @GergelyOrosz
- OpenAI's Mark Chen emphasizes that the majority of compute is allocated to foundational research and exploration, not product milestones, with hundreds of exploratory projects running @markchen90
- Ben Horowitz argues top AI researchers command billion-dollar price tags because there are only about 40 people in the world who can do the job, with skills that are alchemistic and can't be learned in school @a16z
- Kimi becomes the most-used model on OpenClaw via OpenRouter, with real usage data showing developers voting with their tokens @Kimi_Moonshot
- ElevenLabs raises $500M Series D at $11B valuation led by Sequoia, with a16z quadrupling down and ICONIQ tripling down @TechCrunch
- Positron raises $230M Series B to compete with Nvidia's AI chips @TechCrunch
- Intel announces plans to start making GPUs, entering a market dominated by Nvidia @TechCrunch
- Nvidia's H200 exports to China are approved by the U.S. Department of Commerce but delayed pending State Department review @jukan05
- RunBuggy uses Sierra AI agent for outbound calls, reducing calls by approximately 20%, reducing manual operational touchpoints by approximately 15%, and saving the ops team approximately 1,000 hours monthly @btaylor
- Adaption raises $50M to build adaptive AI systems that evolve in real time @adaptionlabs
- Collaborative Computing Inc. emerges from stealth with Atelier as their first product for collaborative computing environments for humans and AI @austinvhuang
AI Ethics & Society
- Anthropic announces Claude will remain ad-free, stating advertising would be incompatible with their vision of a genuinely helpful assistant for work and deep thinking @claudeai
- Sam Altman criticizes Anthropic's Super Bowl ad as dishonest, stating OpenAI would never run ads as depicted and emphasizing commitment to free access for billions who can't pay for subscriptions @sama
- Altman accuses Anthropic of wanting to control what people do with AI, blocking companies they don't like from using their coding product, and trying to dictate other companies' business models @sama
- The criminal legal system's growing reliance on privately developed technologies amid AI hype raises concerns about the privatization of state authority @AINowInstitute
- Dylan Scandrett joins OpenAI as Head of Preparedness to lead efforts in preparing for and mitigating severe risks from extremely powerful models @sama
- Ethan Mollick demonstrates AI-generated videos from Genie 3 reaching quality where physics and interactions are convincingly simulated, though some issues remain @emollick
- Plain English instructions that agents can follow may become a new avenue for marketing but also present a security nightmare @emollick
- Shane Legg disagrees with Nature article claiming AGI has arrived, arguing that if an AI is failing at trivial things, it falls short of AGI despite having some form of general intelligence @ShaneLegg
AI Applications
- Andrej Karpathy enables fp8 training for the GPT-2 reproduction, cutting training time to 2.91 hours on 8xH100 for approximately $20, a 600X cost reduction over 7 years @karpathy
- Karpathy reflects on the one-year anniversary of vibe coding, noting its evolution from fun throwaway projects to agentic engineering, now the default workflow for professionals working with oversight @karpathy
- Perplexity releases DRACO Benchmark for evaluating deep research agents across 100 tasks in 10 domains including Academic, Finance, Law, Medicine, and Technology @perplexity_ai
- Google introduces scientific citations in Gemini with proper APA-style inline citations and detailed reference sections for scientific prompts @joshwoodward
- Figma releases Vectorize feature that converts raster images into editable vectors with simplified and controlled color output @figma
- Granola releases MCP integration working with ChatGPT, Claude, and other tools for AI-powered meeting notes @meetgranola
- Windsurf introduces Tab v2, the world's first variable aggression Pareto Frontier Tab model, saving customers on average 54% more keystrokes @windsurf
- Cursor builds fast with their own AI tools and uses Linear to track work across teams and keep everyone aligned @linear
- Lenny Rachitsky demonstrates content becoming software with Cursor-enabled interactive blog posts @lennysan
- Tesla's VP of AI argues self-driving is not a sensor problem but an AI problem, stating cameras have enough information and it's about extracting it @SawyerMerritt
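The Karpathy fp8 item implies two back-of-the-envelope numbers that are easy to recompute: the effective price per H100-hour behind the ~$20 figure, and the yearly rate of cost decline implied by a 600X reduction over 7 years. The sketch below derives only from those stated figures. It assumes the $20 covers GPU rental alone and that the decline compounded at a constant yearly rate; the source claims neither, and all variable names are illustrative.

```python
import math

# Figures as stated in the digest item (approximate).
train_hours = 2.91       # wall-clock training time
num_gpus = 8             # one 8xH100 node
total_cost_usd = 20.0    # "approximately $20"
cost_reduction = 600.0   # "600X cost reduction"
years = 7

gpu_hours = train_hours * num_gpus
implied_rate = total_cost_usd / gpu_hours  # assumes cost is GPU rental only

# Assumption: a constant annual factor r with r**years == cost_reduction.
annual_factor = cost_reduction ** (1 / years)
halving_time_years = math.log(2) / math.log(annual_factor)

print(f"GPU-hours: {gpu_hours:.2f}")
print(f"Implied price: ${implied_rate:.2f} per H100-hour")
print(f"Implied yearly cost reduction: {annual_factor:.2f}x")
print(f"Implied cost halving time: {halving_time_years * 12:.1f} months")
```

Under these assumptions the figures work out to about 23.3 GPU-hours at roughly $0.86 per H100-hour, and a cost that falls about 2.5x per year (halving roughly every 9 months).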
AI Research
- Stanford researchers develop QuantiPhy benchmark to evaluate and improve AI's ability to reason about physical properties, addressing current models' struggles with basic physics estimates @StanfordHAI
- MIT engineers design new tissue model that more accurately mimics liver architecture including blood vessels and immune cells for discovering MASLD treatments @MIT
- NVIDIA's Nemotron models win ViDoRe V3, with AI agents transforming PDFs and contracts into live insights for companies like EdisonSci, Docusign, and JusttFintech @NVIDIAAI
- Jim Fan's team trains robot foundation model on world model backbone enabling zero-shot, open-world prompting for new verbs, nouns, and environments, calling it DreamZero or World Action Model @DrJimFan
- Research shows that model and data recipes co-evolve: World Action Models learn best from diverse data, with diversity outweighing repeated demos @DrJimFan
- DreamZero demonstrates significant robot-to-robot and human-to-robot transfer, adapting quickly to new hardware with only 55 trajectories while retaining zero-shot prompting ability @DrJimFan
- Publishing AI research is hampered because the formal publication process is much slower than releasing working papers, and peer reviewers ask authors to account for newer papers that build on the paper under review @emollick
- Papers increasingly need to be built for easy updating as new models come out, and "AI can't do task X" papers should focus on trendlines rather than current capabilities @emollick
- Work on rubrics-as-rewards for RL shows that most of the added technical complexity lies in reward modeling rather than in RL itself, so new developments will likely come from advancing generative reward models @cwolferesearch
- EB-JEPA open-source library makes JEPAs accessible and trainable on a single GPU in hours, providing playground for learning latent representations across images, video, action-conditioned video, and planning @BasileTerv987
- GPT-5.2 Pro demonstrates strongest statistical reasoning in experience, with ability to spot issues in analysis that Opus 4.5 and