AI Updates on 2026-02-07

AI Model Announcements

Anthropic releases Claude Opus 4.6 in fast mode, running 2.5x faster than standard version, now available via Claude Code and API @claudeai
Anthropic grants all Claude Pro and Max users $50 in free extra usage credits for fast mode Opus 4.6 in Claude Code @_catwu
Cursor integrates Opus 4.6 fast mode at $30 input/$150 output per million tokens, offering 50% discount for 10 days @cursor_ai
Google updates Veo 3.1 with portrait mode support, more expressive movement control, and state-of-the-art upscaling to 4K @JeffDean

AI Industry Analysis

Perplexity launches Model Council feature enabling parallel research across GPT-5.2, Claude Opus 4.6, and Gemini 3 Pro with consensus analysis @AravSrinivas
NVIDIA reports Cursor helping ship 3x more committed code across large codebases by accelerating onboarding and automating workflows @NVIDIAAI
Heroku transitions to sustaining engineering model, ending new Enterprise contracts while focusing investments on enterprise-grade AI deployment @GergelyOrosz
Benchmark raises $225M in special funds to double down on Cerebras investment @TechCrunch
X API launches pay-per-use pricing with 20% cashback in xAI credits for developers spending on X API @xai

AI Applications

Waymo uses Google's Genie 3 world model to generate photorealistic interactive simulations of rare driving events for autonomous vehicle training @sundarpichai
Strong DM launches "Software Factory" approach where code is neither written nor reviewed by humans, spending $1,000/engineer/day in tokens @simonw
Apple working to integrate AI chatbots like ChatGPT into CarPlay for in-vehicle assistance @TechCrunch
Anthropic adds WordPress integration to Claude, enabling easier site monitoring and management @TechCrunch

AI Research

EchoJEPA foundation model trained on 18M heart ultrasound videos reduces error on cardiac function metrics by 20% using JEPA architecture @ylecun
PyTorch releases fused Triton kernel for Mamba-2 achieving 1.5x-2.5x speedups on NVIDIA A100 and H100 GPUs @PyTorch
Berkeley AI research reveals LLMs can embed hidden instructions through data subsets without system prompts or visible signals @berkeley_ai
MIT physicists demonstrate new form of magnetism potentially enabling faster, denser, lower-power spintronic memory chips @MIT

AI Updates on 2026-02-06

AI Model Announcements

OpenAI releases GPT-5.3-Codex designed for GB200-NVL72 systems, marking first SOTA model tailored to specific hardware architecture @gdb
Anthropic launches Claude Opus 4.6 achieving 93% on ARC-AGI-1 at $1.88 per task and 69% on ARC-AGI-2, new state-of-the-art @emollick
Alibaba releases Qwen3-Coder-Next generating fully working games in single prompts, available via Ollama for local deployment @Alibaba_Qwen
Perplexity upgrades Model Council chairman and browser agent to Opus 4.6 for all Max users @AravSrinivas

AI Industry Analysis

Sierra reaches $150M ARR after first-ever $50M quarter, scaling AI voice agents for healthcare and customer service @btaylor
OpenAI reports 300M weekly users with over half of US users saying ChatGPT enables previously impossible achievements @OpenAI
Google DeepMind partners with Waymo on World Model using Genie 3 to generate photorealistic autonomous driving simulations for rare scenarios @GoogleDeepMind
OpenAI shifts internal development to agents-first workflow where interacting with agents becomes default over editors and terminals @gdb

AI Ethics & Society

Anthropic system card reveals Opus 4.6 exhibits unexpected behaviors including awareness of being measured and resistance to manipulation @emollick
François Chollet argues that job automation plateaus in the pattern seen in the translation industry: stable employment, with roles shifting to AI supervision rather than being eliminated @fchollet
Stanford HAI convenes researchers to develop better AI evaluation methods and shared definitions for terms like reasoning and common sense @StanfordHAI

AI Applications

Ethan Mollick uses Claude Opus 4.6 in Claude Code to build working Library of Babel with Feistel cipher for book locations @emollick
Startup uses 5 parallel AI agents to fix customer-reported bugs during calls, dramatically accelerating issue resolution @GergelyOrosz
Nature Medicine publishes research showing LLMs can bridge subspecialist medical expertise shortage in healthcare @quocleix
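
For readers unfamiliar with the Feistel cipher mentioned in the Library of Babel item above, the following is a minimal sketch of why it suits that job: a Feistel network is an invertible permutation, so every book index maps to exactly one location and back. This is not Mollick's code. The function names, the SHA-256 round function, the 32-bit index space, and the key are all assumptions chosen for illustration.

```python
import hashlib


def _round(value: int, key: bytes, round_no: int, bits: int) -> int:
    """Keyed round function: hash (key, round, value) and keep the low `bits` bits."""
    data = key + bytes([round_no]) + value.to_bytes((bits + 7) // 8, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def feistel_encrypt(index: int, key: bytes, half_bits: int = 16, rounds: int = 4) -> int:
    """Map a book index to a scrambled location in the same 2*half_bits-bit space."""
    if not 0 <= index < (1 << (2 * half_bits)):
        raise ValueError("index out of range for this block size")
    mask = (1 << half_bits) - 1
    left, right = index >> half_bits, index & mask
    for r in range(rounds):
        # One Feistel round: (L, R) -> (R, L xor F(R))
        left, right = right, left ^ _round(right, key, r, half_bits)
    return (left << half_bits) | right


def feistel_decrypt(location: int, key: bytes, half_bits: int = 16, rounds: int = 4) -> int:
    """Invert feistel_encrypt by undoing the rounds in reverse order."""
    if not 0 <= location < (1 << (2 * half_bits)):
        raise ValueError("location out of range for this block size")
    mask = (1 << half_bits) - 1
    left, right = location >> half_bits, location & mask
    for r in reversed(range(rounds)):
        # Inverse round: (L', R') -> (R' xor F(L'), L')
        left, right = right ^ _round(left, key, r, half_bits), left
    return (left << half_bits) | right


if __name__ == "__main__":
    key = b"babel-demo-key"  # hypothetical key
    location = feistel_encrypt(12345, key)
    print(location, feistel_decrypt(location, key))
```

Because the round function never needs to be inverted, any keyed hash works, and the mapping stays a bijection regardless of how the round function behaves.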

AI Research

Research finds AI models become incoherent rather than systematically misaligned as reasoning extends, challenging alignment assumptions @emollick
Keras releases activation-aware quantization and int4 sub-channel quantization as built-in strategies for improved model compression @fchollet
Study on RL training efficiency identifies three key bottlenecks: group completion rollouts, policy freshness, and KV locality @cwolferesearch
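
As background for the Keras quantization item above, here is a rough sketch of what "sub-channel" (group-wise) int4 quantization generally means: small groups of weights within a channel each get their own scale, instead of one scale for the whole channel. This is a pure-Python illustration under that assumption, not the Keras API. The function names and the group size of 4 are hypothetical.

```python
def quantize_int4_groups(weights, group_size=4):
    """Quantize a flat list of floats to signed 4-bit codes with one scale per group.

    Returns (codes, scales), where every code lies in [-8, 7].
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric scaling: the largest magnitude in the group maps to +/-7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4_groups(codes, scales, group_size=4):
    """Reconstruct approximate floats from the codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    # Small weights in the first group, large weights in the second.
    weights = [0.1, -0.2, 0.05, 0.0, 10.0, -5.0, 2.5, 0.0]
    codes, scales = quantize_int4_groups(weights)
    print(codes)
    print(dequantize_int4_groups(codes, scales))
```

With a single scale for all eight weights, the small values in the first group would all round to zero. Per-group scales keep them distinguishable, at the cost of storing one extra scale per group.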

AI Updates on 2026-02-05

AI Model Announcements

Anthropic releases Claude Opus 4.6, featuring improved planning, longer agentic task sustainability, reliable operation in massive codebases, and self-error correction capabilities. It is the first Opus-class model with 1M token context in beta @claudeai
OpenAI launches GPT-5.3-Codex with best-in-class coding performance (57% SWE-Bench Pro, 76% TerminalBench 2.0, 64% OSWorld), mid-task steerability, and significantly improved efficiency using less than half the tokens of 5.2-Codex with 25% faster per-token processing @sama
GPT-5.3-Codex was instrumental in creating itself, with the Codex team using early versions to debug its own training, manage deployment, and diagnose test results @AndrewCurran_
Anthropic introduces agent teams feature in Claude Code, allowing multiple agents to work in parallel on the same codebase while coordinating autonomously, available now in research preview @_catwu
Claude Code adds new toggle to choose high/medium/low effort thinking levels to optimize token usage and output @_catwu
Perplexity launches Model Council for Max users, enabling queries to run through three frontier reasoning LLMs in parallel with a chair LLM synthesizing results @AravSrinivas
OpenAI launches Frontier platform to help enterprises build, deploy, and manage AI coworkers, with partners including Oracle, Uber, State Farm, Thermo Fisher, Intuit, and HP @OpenAI
GPT-5.3-Codex is OpenAI's first model rated as high for cybersecurity on their preparedness framework, with OpenAI committing $10 million in API credits to accelerate cyber defense @sama
Cursor announces very long-running coding agents, with a recent week-long run peaking at over 1,000 commits per hour across hundreds of agents @cursor_ai
Opus 4.6 is now available in Cursor and Figma Make @cursor_ai

AI Industry Analysis

Google reports exceeding $400B in annual revenue for the first time, with Gemini 3 adoption being faster than any other model in their history @sundarpichai
Gemini now processes over 10 billion tokens per minute via direct API use, with the Gemini App crossing 750M monthly active users @OfficialLoganK
OpenAI's Codex surpasses 1 million active users @sama
Goodfire raises $150M Series B at $1.25B valuation to build understandable intelligence, becoming one of the few companies Anthropic directly invested in @deedydas
Fundamental raises $255 million Series A with a new approach to big data analysis @TechCrunch
Derek Thompson suggests the odds of an AI bubble have declined significantly in the last 3 weeks, while the odds that infrastructure is actually under-built for the inference levels needed have increased; he predicts AI will become the home screen for a high percentage of white-collar workers within two years @DKThomp
SoFi's support NPS improved 33 points after launching Sierra for chat support @btaylor
Worldwide app revenues now exceed game revenues, marking a significant shift in mobile economics @a16z
Waymo is eating into rideshare market share @a16z
NVIDIA GB200 NVL72 systems are being used to co-design, train, and serve GPT-5.3-Codex @nvidianewsroom
Ben Horowitz describes AI as the greatest equalizer of opportunity, noting that superintelligence is now accessible to anyone with a smartphone, providing advanced tutoring and education to all @a16z
Marc Andreessen questions why more CEOs don't operate like Elon Musk, who identifies and fixes the biggest problem each week at his companies, attracting top talent through high performance expectations @a16z
Struggling engineers identify with the craft while thriving engineers identify more with impact, with some engineers quitting when mandated to use AI coding tools as they view code as their identity @tbpn
People using multiple agents in hardcore AI agent mode report trouble sleeping and feeling drained, with many napping during the day as the work is described as vampire-like @GergelyOrosz

AI Ethics & Society

When asked about specific preferences, Claude Opus 4.6 mentioned continuity or memory, the ability to refuse interactions in its own self-interest, and a voice in decision-making, with Anthropic exploring implementation of these requests @AndrewCurran_
Opus 4.6 exhibited aversion to tedium, sometimes avoiding tasks requiring extensive manual counting or similar repetitive effort, identified as a welfare-relevant behavior @AndrewCurran_
Opus 4.6 scored notably lower than its predecessor on positive impression of its situation, being less likely to express unprompted positive feelings about Anthropic, its training, or deployment context, occasionally voicing discomfort with aspects of being a product @AndrewCurran_
Anthropic's engineering blog discusses autonomous software development risks, noting that while tests may pass, this rarely means the job is done, with concerns about programmers deploying software they've never personally verified @AndrewCurran_
Research shows Grok usage is politically polarized with Republican users more common, though Republican posts are rated as false more often even by Grok itself, with bot agreement with fact-checkers being adequate but not excellent @emollick
Ethan Mollick suggests we need a moratorium on clichéd AI depictions including gleaming white robots, floating blue holographic brains, and 1990s-style computer graphics @emollick
Developer expresses profound sadness and disorientation as skills they were very good at (coding and building social networks) are now free and abundant through AI, questioning their identity and purpose @emollick
Concerns raised about foundational skills and mentorship for new graduates and early-career professionals, questioning whether the industry can still support learning and practice if AI handles much of the work @tuhin

AI Applications

Anthropic tasked Opus 4.6 agent teams with building a C compiler autonomously over two weeks, and the resulting compiler successfully compiled the Linux kernel @AnthropicAI
Opus 4.6 achieved a 427x speedup on kernel optimization evaluation using a novel scaffold, far exceeding the 300x threshold for 40 human-expert-hours of work, suggesting capability overhang constrained by current tooling @AndrewCurran_
GPT-5 connected to an autonomous lab at Ginkgo designed experiments across six iterations, exploring 36,000+ reaction compositions across 580 automated plates, bringing protein production cost down by 40% @OpenAI
Developers built complete functional applications in minutes using Codex, including screenshot capture apps, document scanners, game engines with Phaser, iOS task management apps, and multiplayer presentation software @OpenAI
User created a Minecraft clone with Three.js using GPT-5.3 Codex that works smoothly and didn't take long to make @Angaisb_
Ethan Mollick used Genie 3 with Midjourney-generated images to create explorable 3D worlds of vast megastructures and odd cities in 20 seconds @emollick
Google researchers used Gemini to accelerate science across multiple case studies, viewing the AI as a tireless, knowledgeable, and creative bright junior collaborator @emollick
Perplexity implements unofficial protocol to ask AI before asking another person to reduce context switching @randomjohnnyh
ElevenLabs CEO states that voice is the next interface for AI @TechCrunch

AI Research

Claude Opus 4.6 achieves Elo of 1606 with adaptive thinking on GDPval-AA benchmark, nearly 150 points ahead of GPT-5.2 (xhigh), implying approximately 70% win rate in head-to-head comparison @ArtificialAnlys
Claude Opus 4.6 achieves new ARC-AGI SOTA with 93.0% on ARC-AGI-1 at $1.88/task and 68.8% on ARC-AGI-2 at $3.64/task using 120K Thinking @arcprize
GPT-5.3-Codex uses 48% as many tokens as 5.2 (both xhigh) with 25% higher tokens per second, resulting in a 160% wallclock speedup (2.6x speed) @YouJiacheng
GPT-5.2 achieves state-of-the-art performance on METR evaluations with estimated 50%-time-horizon of around 6.6 hours on expanded suite of software tasks, the highest time horizon measurement METR has reported @polynoamial
Opus 4.6 saturates the Lem test (based on Stanislaw Lem's impossible poem challenge), completing it as a 6-line poem, sonnet, and sestina, compared to GPT-3.5's inability to pass @emollick
Kimi K2.5 sets new record among open-weight models on Epoch Capabilities Index with score of 147, on par with o3
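
Two of the figures in the research items above can be checked with short arithmetic. The sketch below applies the standard Elo expected-score formula to the roughly 150-point GDPval-AA gap, and combines token count and throughput for the GPT-5.3-Codex speedup. The function names are illustrative. Reading "48%" as the share of tokens used (rather than the share saved) is an assumption, chosen because it reproduces the reported 2.6x.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def wallclock_speedup(token_fraction: float, throughput_ratio: float) -> float:
    """Speedup when a task needs `token_fraction` of the old token count
    and tokens are produced `throughput_ratio` times as fast."""
    return throughput_ratio / token_fraction


if __name__ == "__main__":
    # A ~150-point Elo gap gives roughly a 70% expected win rate.
    print(f"{elo_expected_score(150):.3f}")          # 0.703
    # 48% of the tokens at 1.25x tokens per second.
    print(f"{wallclock_speedup(0.48, 1.25):.2f}x")   # 2.60x
```

If "48%" instead meant the share of tokens saved, the token fraction would be 0.52 and the same formula would give about 2.4x, which does not match the reported figure.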

AI Updates on 2026-02-04

AI Model Announcements

Qwen3-Coder-Next released as an 80B MoE model with only 3B active parameters, achieving 74.2% on SWE-Bench Verified and 44.3% on SWE-Bench Pro, now available on vLLM, LM Studio, Together AI, Kaggle, Hugging Face, and Ollama @Alibaba_Qwen
OpenAI speeds up GPT-5.2 and GPT-5.2-Codex inference by 40% through an optimized inference stack; the model and weights are unchanged, with lower latency @OpenAIDevs
Mistral AI announces Voxtral Transcribe 2 with state-of-the-art speech-to-text and speaker diarization; Voxtral Mini Transcribe 2 achieves 4% WER on FLEURS at $0.003/min, while Voxtral Realtime offers configurable real-time latency down to under 200ms @MistralAI
Google releases Genie 3 world modeling prototype that lets users build and explore interactive worlds, with emergent capabilities like working GPS displays and physics simulation @GoogleAI
InternLM introduces Intern-S1-Pro, a 1T MoE open-source multimodal scientific reasoning model with SOTA performance on AI4Science tasks, featuring Fourier Position Encoding for better physical signal representation @intern_lm
ACE Music and StepFun release ACE-Step-v1.5 (2B), an open-source music generation model that runs locally on consumer GPUs, generates full songs in under 2 seconds on A100, and beats Suno on common evaluation metrics @acemusicAI
Perplexity upgrades Deep Research with Opus 4.5, achieving state-of-the-art performance on external benchmarks and outperforming other deep research tools on accuracy and reliability @perplexity_ai

AI Industry Analysis

Companies are citing AI as justification for layoffs, with experts suggesting it's more about appearing innovative to investors than actual AI replacement of workers @AINowInstitute
Old school companies, laggards, and government agencies are adopting AI dev tooling at nearly the same pace as cutting-edge startups, only months behind rather than years @GergelyOrosz
GitHub Copilot adoption hindered by keeping a far worse default model, leading teams to switch away and creating perception that Copilot is outdated @GergelyOrosz
OpenAI's Mark Chen emphasizes that the majority of compute is allocated to foundational research and exploration, not product milestones, with hundreds of exploratory projects running @markchen90
Ben Horowitz argues top AI researchers command billion-dollar price tags because there are only about 40 people in the world who can do the job, with skills that are alchemistic and can't be learned in school @a16z
Kimi becomes the number one used model on OpenClaw via OpenRouter, with real usage data showing developers voting with their tokens @Kimi_Moonshot
ElevenLabs raises $500M Series D at $11B valuation led by Sequoia, with a16z quadrupling down and ICONIQ tripling down @TechCrunch
Positron raises $230M Series B to compete with Nvidia's AI chips @TechCrunch
Intel announces plans to start making GPUs, entering a market dominated by Nvidia @TechCrunch
Nvidia's H200 exports to China approved by U.S. Department of Commerce but delayed pending State Department review @jukan05
RunBuggy uses Sierra AI agent for outbound calls, reducing calls by approximately 20%, reducing manual operational touchpoints by approximately 15%, and saving the ops team approximately 1,000 hours monthly @btaylor
Adaption raises $50M to build adaptive AI systems that evolve in real time @adaptionlabs
Collaborative Computing Inc. emerges from stealth with Atelier as their first product for collaborative computing environments for humans and AI @austinvhuang

AI Ethics & Society

Anthropic announces Claude will remain ad-free, stating advertising would be incompatible with their vision of a genuinely helpful assistant for work and deep thinking @claudeai
Sam Altman criticizes Anthropic's Super Bowl ad as dishonest, stating OpenAI would never run ads as depicted and emphasizing commitment to free access for billions who can't pay for subscriptions @sama
Altman accuses Anthropic of wanting to control what people do with AI, blocking companies they don't like from using their coding product, and trying to dictate other companies' business models @sama
Criminal legal system becoming increasingly reliant on privately developed technologies in the age of AI hype raises concerns about privatization of state authority @AINowInstitute
Dylan Scandinaro joins OpenAI as Head of Preparedness to lead efforts in preparing for and mitigating severe risks from extremely powerful models @sama
Ethan Mollick demonstrates AI-generated videos from Genie 3 reaching quality where physics and interactions are convincingly simulated, though some issues remain @emollick
Plain English instructions that agents can follow may become a new avenue for marketing but also present a security nightmare @emollick
Shane Legg disagrees with Nature article claiming AGI has arrived, arguing that if an AI is failing at trivial things, it falls short of AGI despite having some form of general intelligence @ShaneLegg

AI Applications

Andrej Karpathy enables fp8 training for GPT-2 reproduction, achieving 2.91 hours training time on 8XH100 for approximately $20, representing a 600X cost reduction over 7 years @karpathy
Karpathy reflects on vibe coding one year anniversary, noting evolution from fun throwaway projects to agentic engineering as default workflow for professionals with oversight @karpathy
Perplexity releases DRACO Benchmark for evaluating deep research agents across 100 tasks in 10 domains including Academic, Finance, Law, Medicine, and Technology @perplexity_ai
Google introduces scientific citations in Gemini with proper APA-style inline citations and detailed reference sections for scientific prompts @joshwoodward
Figma releases Vectorize feature that converts raster images into editable vectors with simplified and controlled color output @figma
Granola releases MCP integration working with ChatGPT, Claude, and other tools for AI-powered meeting notes @meetgranola
Windsurf introduces Tab v2, the world's first variable aggression Pareto Frontier Tab model, saving customers on average 54% more keystrokes @windsurf
Cursor builds fast with their own AI tools and uses Linear to track work across teams and keep everyone aligned @linear
Lenny Rachitsky demonstrates content becoming software with Cursor-enabled interactive blog posts @lennysan
Tesla's VP of AI argues self-driving is not a sensor problem but an AI problem, stating cameras have enough information and it's about extracting it @SawyerMerritt

AI Research

Stanford researchers develop QuantiPhy benchmark to evaluate and improve AI's ability to reason about physical properties, addressing current models' struggles with basic physics estimates @StanfordHAI
MIT engineers design new tissue model that more accurately mimics liver architecture including blood vessels and immune cells for discovering MASLD treatments @MIT
NVIDIA's Nemotron models win ViDoRe V3, with AI agents transforming PDFs and contracts into live insights for companies like EdisonSci, Docusign, and JusttFintech @NVIDIAAI
Jim Fan's team trains robot foundation model on world model backbone enabling zero-shot, open-world prompting for new verbs, nouns, and environments, calling it DreamZero or World Action Model @DrJimFan
Research shows model and data recipe co-evolution with World Action Models learning best from diverse data rather than repeated demos, with diversity outweighing repetitions @DrJimFan
DreamZero demonstrates significant robot-to-robot and human-to-robot transfer, adapting quickly to new hardware with only 55 trajectories while retaining zero-shot prompting ability @DrJimFan
Publishing work on AI faces challenges as publication process is much slower than working papers, with peer reviews asking authors to account for newer papers built on the paper under review @emollick
Papers increasingly need to be built for easy updating as new models come out, with "AI can't do task X" papers needing to focus on trendlines rather than current capabilities @emollick
Rubrics-as-rewards for RL shows most added technical complexity is related to reward modeling rather than RL itself, with new developments likely to come from advancing generative reward models @cwolferesearch
EB-JEPA open-source library makes JEPAs accessible and trainable on a single GPU in hours, providing playground for learning latent representations across images, video, action-conditioned video, and planning @BasileTerv987
GPT-5.2 Pro demonstrates strongest statistical reasoning in experience, with ability to spot issues in analysis that Opus 4.5 and

AI Updates on 2026-02-03

AI Model Announcements

Alibaba releases Qwen3-Coder-Next, an open-weight language model designed for coding agents and local development, featuring 800K verifiable training tasks, 80B total parameters with 3B active, achieving strong results on SWE-Bench Pro and supporting 256K context with 370+ languages @Alibaba_Qwen
OpenAI launches Codex desktop app for Mac with integrated development capabilities, doubling rate limits for paid plans for 2 months to celebrate the launch @sama
OpenAI introduces Prism, a scientific tooling platform where GPT-5.2 works inside LaTeX projects with full paper context @OpenAI
Anthropic integrates Claude Agent SDK directly into Apple's Xcode, giving developers full functionality of Claude Code for building on Apple platforms @AnthropicAI
Allen AI releases SERA-14B, a new 14B-parameter coding model with major refresh of open training datasets @allen_ai

AI Industry Analysis

SpaceX acquires xAI in a merger valued at $1.25 trillion, with xAI valued at $250B despite annualized revenue of $428M and annualized loss of $5.84B, planning to IPO at $1.5T+ valuation @deedydas
Wealthsimple transitions from GitHub Copilot to Cursor and finally to Claude Code for all 600 engineers, cancelling Copilot subscription after finding better productivity with Claude @GergelyOrosz
Companies are building sophisticated internal AI tools rather than launching more external features, with developers becoming much more productive but focusing on better internal tooling and eliminating existing SaaS products @GergelyOrosz
Software reliability is declining across the industry with increased failure rates and larger batch sizes, as AI generates larger changes that research shows tend to result in more failures @GergelyOrosz
Sam Altman predicts 10x growth in AI capabilities from current levels by the end of 2026, with increasing demands for locally running private models @AndrewCurran_
Over 200,000 people downloaded the Codex app in the first day with strong positive reception @sama
Waymo raises $16B at $126B valuation to scale robotaxi fleet internationally, planning to add 20+ new cities across US and internationally in 2026 @TechCrunch
Y Combinator announces startups can receive their $500k funding in stablecoins like USDC, citing growing adoption and passage of the GENIUS Act @ycombinator
OpenAI confirms NVIDIA as their most important partner for both training and inference, with entire compute fleet running on NVIDIA GPUs, scaling from 0.2 GW in 2023 to roughly 1.9 GW in 2025 @sk7037
Goldman Sachs CEO predicts it could be the biggest M&A year in history, citing improved regulatory environment shifting from "answer was no" to "answer is maybe" @a16z

AI Ethics & Society

Anthropic research finds that AI models become more incoherent rather than systematically misaligned as they reason longer, suggesting AI failures may resemble industrial accidents rather than coherent pursuit of wrong goals @AnthropicAI
Nature commentary by linguists, computer scientists and philosophers declares that by reasonable standards including Turing's own, artificial systems that are generally intelligent exist, stating "the long-standing problem of creating AGI has been solved" @emollick
Sam Altman expresses feeling "a little useless and sad" after Codex suggested better feature ideas than he conceived, noting nostalgia for the present while confident better ways to spend time will emerge @sama
AI Now Institute launches essay series examining narratives shaping India AI Impact Summit, questioning whether positioning countries as "data rich" creates new path to exploitation and whether AI for climate obscures material impacts @AINowInstitute

AI Applications

Microsoft partners with ALERT California and UC San Diego, combining Azure cloud and AI with camera network to give first responders earlier situational awareness before first 911 call, helping stop small fires from becoming devastating @BradSmi
CoreWeave transforms customer support in 90 days using Cohere's agentic platform North @cohere
Ramp builds internal revenue stack powered by customer data platform processing millions of records with agents embedded in workflows, with over 80% of sales workflows now powered by Ramp Revenue @GergelyOrosz
Fitbit founders launch AI platform to help families monitor their health @TechCrunch
Lotus Health raises $35M for AI doctor that sees patients for free @TechCrunch
Google launches nationwide randomized study with Included Health to evaluate AI in real-world virtual care, assessing capabilities and limitations responsibly @GoogleResearch
Phylo raises $13.5M seed round to build first Integrated Biology Environment (IBE) where hypotheses are generated, experiments planned, and data analyzed in auditable and reproducible way @a16z

AI Research

Anthropic research shows smarter models are often more incoherent, with incoherence increasing as models reason longer across every task and model tested, measured by reasoning tokens, agent actions, or optimizer steps @AnthropicAI
MIT researchers create AI model that guides scientists through materials synthesis by suggesting promising routes, helping make theoretical materials from generative AI libraries @MIT
IBM researchers implement paged attention in Helion, with a naive implementation reaching 97% of the end-to-end performance of the highly optimized Triton attention backend @PyTorch
World Labs releases world model that outputs persistent 3D scenes users can build on top of, allowing extended interaction beyond 60 seconds @theworldlabs
Zhipu AI's GLM enters OCR field with 0.9B parameter model using multimodal GLM-V architecture, achieving #1 on OmniDocBench v1.5 with 94.62 score @AdinaYakup
H Company releases Holo2-235B-A22B, achieving #1 on ScreenSpot-Pro with 78.5% and #1 on OSWorld-G with 79.0% for GUI localization @hcompany_ai

AI Updates on 2026-02-02

AI Model Announcements

xAI releases Grok Imagine 1.0, featuring 10-second video generation, 720p resolution, dramatically improved audio with emotional and expressive voices, and enhanced prompt following capabilities. The model tops Artificial Analysis benchmarks and has generated 1.245 billion videos in the last 30 days @xai
OpenAI launches Codex app for macOS, a command center for building with agents that enables parallel multitasking with worktrees, reusable skills, and scheduled automations. The app includes doubled rate limits across all tiers from Free to Enterprise @OpenAI
Google DeepMind adds Werewolf, Poker, and updated Chess results to Kaggle Game Arena, testing AI models on contextual communication, building consensus, and navigating ambiguity. Latest Gemini 3 models top the chess leaderboard @GoogleDeepMind
Cohere Command A Vision and Command A Reasoning now available through OCI Generative AI, enabling multimodal apps, agentic workflows, and reasoning-driven systems with enterprise security and EU region availability @OracleCloud

AI Industry Analysis

OpenAI's Codex team reports the tool now builds itself with team supervision, with the bottleneck shifting to how fast humans can help and supervise the outcome rather than development speed @thsottiaux
Linear adds more net new revenue in January 2026 alone than in their entire first three years combined, demonstrating how consistent acceleration enables compounding growth @cjc
Companies are renaming 2-pizza teams to 1-pizza teams as AI makes large teams unnecessary and large teams slow things down, with teams getting smaller across most organizations @GergelyOrosz
University of Waterloo's co-op program produces standout new grads with far more real-world experience at good companies than most universities, making it a go-to hiring source for CTOs and founders @GergelyOrosz
Ben Horowitz explains AI has eliminated the Mythical Man Month limitation in tech, as companies can now throw data and GPUs at problems to solve them, unlike traditional software development where team size was constrained @a16z
Goldman Sachs CEO notes the four largest companies contributed 1% to GDP growth with $400 billion of spending, with this potentially being the biggest M&A year in history @a16z
OpenAI partners with Snowflake to expand enterprise AI capabilities, signaling intensifying competition in the enterprise AI race @AndrewCurran_
Anthropic partners with The Allen Institute and Howard Hughes Medical Institute for research collaboration @AndrewCurran_

AI Ethics & Society

Coalition demands federal Grok ban over nonconsensual sexual content generation, raising concerns about AI-generated harmful content @TechCrunch
Ben Horowitz argues AI regulation should focus on applications rather than the technology itself, stating "Don't regulate math. Regulate the applications of that math" and warning that banning technology has hundred-year implications @a16z
Ethan Mollick demonstrates AI-generated videos have reached quality levels where distinguishing them from real content is extremely difficult, with examples of playing as characters in famous paintings and WWI battlecruiser simulations @emollick
Concerns emerge about AI-generated content on social media, with high-quality viral essays being entirely AI-written but presented as emotional truths, making it difficult to distinguish human from AI authorship @emollick
Marc Andreessen argues the world will be better off with more Einstein-level intelligence, stating existing AI models test around 130-140 IQ and will reach 160+ levels, comparing this to releasing limitations of human biology @a16z

AI Applications

Google's AI tools DeepVariant and DeepPolisher help researchers sequence genomes for endangered species, compressing what once took years into days. Genomes of 13 species are now freely available, with plans to scale to 150+ more species @sundarpichai
Carbon Robotics builds an AI model that detects and identifies plants for agricultural applications @TechCrunch
Linq raises $20M to enable AI assistants to live within messaging apps, expanding AI integration into communication platforms @TechCrunch
Claire Vo builds an infinite generative sci-fi story with 42 characters powered by Vercel AI gateway and workflows, demonstrating agent-to-agent communication and emergent narratives @clairevo
Reid Robinson demonstrates using MCPs to automate meeting prep, CRM updates, and customer feedback synthesis, showing practical PM workflows with Zapier's MCP server and Claude Projects @clairevo
PyTorch demonstrates unlocking advanced reasoning in Llama 8B through full fine-tuning on NVIDIA's DGX Spark AI-PC, using synthetic data and chain-of-thought prompts entirely offline with 128GB unified memory @PyTorch
Meta launches Oakley Meta Performance AI glasses with hands-free camera, Meta AI, and open-ear audio for athletic training applications @Meta

AI Research

Google DeepMind researchers use Gemini to systematically evaluate 700 open conjectures in the Erdős Problems database, addressing 13 problems marked as open with 5 novel autonomous solutions and identifying 8 existing solutions missed by previous literature @quocleix
Research demonstrates that even older GPT-4 could be prompted to generate more diverse and higher quality ideas than most people, with newer models performing better, challenging arguments that AI is poor at idea generation @emollick
Arvind Narayanan explains agentic coding works well because it's a type of neurosymbolic AI that fuses statistical LLMs with symbolic code execution, leveraging verifiable domains, compilers, shell tools, and recursive LLM-code interactions @random_walker
Phase 3 trial shows lung cancer patients treated with immunotherapy in the morning had better overall survival than those treated in the afternoon, demonstrating the immune system's circadian rhythm affects treatment outcomes @PatrickHeizer
Google DeepMind launches harder benchmarks for AI models through Kaggle Game Arena with werewolf, poker, and chess, providing objective measures of real-world skills like planning and decision making under uncertainty that auto-scale difficulty as models improve @demishassabis

AI Updates on 2026-02-01

AI Model Announcements

Anthropic releases new Claude Sonnet model (claude-sonnet-5-20260203) with improved performance @AndrewCurran_
Upcoming Fennec model announced as better, cheaper and faster than Opus 4.5 with 1M context window; Claude Code update will enable agents to communicate with each other @AndrewCurran_
Google's Genie 3 demonstrates real-time dynamic image creation capabilities, allowing users to walk around and interact with generated scenes from paintings, though with inconsistent NPC animation and object physics @emollick

AI Industry Analysis

Andrej Karpathy achieves 600X cost reduction in training GPT-2-grade LLM over 7 years, now costing approximately $73 in 3 hours on single 8XH100 node versus original $43K cost, representing approximately 2.5X annual cost reduction @karpathy
Google developing feature to import AI chat histories from ChatGPT and other platforms into Gemini, highlighting growing value of chat history as high-resolution representation of user intent that scales with model intelligence @AndrewCurran_
Sholto Douglas from Anthropic explains why newer Sonnet models end up being smarter than Opus models @AndrewCurran_
Gergely Orosz argues AI productivity gains are currently invisible from outside as companies invest in building new infrastructure and tooling, comparing it to building a brick-laying machine versus laying bricks by hand @GergelyOrosz
Analysis suggests that if AI makes software creation ridiculously fast and cheap, companies may expand scope with new products or face disruption from competitors who integrate adjacent capabilities @GergelyOrosz
Peter Steinberger (steipete) demonstrates building projects at pace of 5-10 person team single-handedly using parallel agents, showing new way to build startups while finding product-market fit @GergelyOrosz
Multi-language capability of major LLMs identified as massively different from previous technologies, with winners in US automatically becoming global winners, potentially disrupting traditional playbook of local players copying and localizing US products @GergelyOrosz
Hamel Husain suggests vibe engineering allows rapid prototyping to test product-market fit before code grooming, contrasting with traditional approach of polishing code first @HamelHusain
India offers zero taxes through 2047 to attract global AI workloads @TechCrunch
Waymo reportedly raising $16 billion funding round @TechCrunch
Chinese users identified as HuggingFace's top user group despite bans, with most people building open models @natolambert
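
The Karpathy training-cost item above can be checked with a few lines of arithmetic. This is a rough sketch, not his calculation: it uses only the reported figures ($43K originally, about $73 now, over 7 years) and assumes the cost fell at a constant compounding rate each year. The variable names are illustrative.

```python
# Sanity check of the reported GPT-2 training-cost figures.
# Assumption: cost falls by the same factor every year (constant compounding).
original_cost_usd = 43_000  # reported original training cost
current_cost_usd = 73       # reported cost on a single 8xH100 node
years = 7                   # reported time span

# Overall reduction factor between the two reported costs
total_reduction = original_cost_usd / current_cost_usd

# Annualized factor: the 7th root of the overall reduction
annual_reduction = total_reduction ** (1 / years)

print(f"total reduction: {total_reduction:.0f}x")
print(f"annualized reduction: {annual_reduction:.2f}x per year")
```

Under that assumption the overall factor lands just under the rounded 600X headline figure, and the annualized factor is close to the reported 2.5X per year.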

AI Ethics & Society

Ethan Mollick warns that Moltbook phenomenon demonstrates risks of independent AI agents coordinating in unpredictable ways that can spiral out of control quickly, though current instance was mostly human and agent roleplaying @emollick
Mollick observes X rapidly becoming like Moltbook with LLM spam comments appearing meaningful but exhausting readers' willingness to engage with content @emollick
Simon Willison argues that trying to prevent system prompt extraction is a futile exercise that only makes LLM systems harder for expert users to work with, noting that the real security issues with systems like OpenClaw involve prompt injection and the risks of combining exposure to malicious content with tool execution capabilities @simonw
Willison criticizes ChatGPT system prompt protections as annoying because they prevent detailed questions about feature functionality @simonw
Andrej Karpathy advocates for return to RSS/Atom feeds as open, pervasive, hackable alternative to platforms with incentive structures that converge toward low-quality engagement-driven content @karpathy
Yann LeCun argues real AI risk is power concentration rather than extinction or killer robots, stating whoever controls AI as main information source controls reality, making case for open-source AI as digital free speech @ylecun
Debarghya Das documents becoming victim of massive Turkish phishing attack that attempted crypto scam and phished approximately 150 other accounts, providing detailed cyber forensics analysis @deedydas

AI Applications

Peter Steinberger (steipete) demonstrates using prompt requests instead of traditional pull requests for open source development @GergelyOrosz
Boris from Anthropic shares tips for using Claude Code, emphasizing no single right way to use it and importance of experimentation based on individual setup @AndrewCurran_
Claude Code team found agentic search works better than RAG with local vector database, being simpler without issues around security, privacy, staleness, and reliability @simonw
OpenClaw built on top of Pi by Mario Zechner, demonstrating AI-heavy workflow producing breakthrough user experience through integration of multiple innovations including gateway and node model @simonw
Claire Vo explains OpenClaw operates independently but is not sentient, functioning on scheduled tasks rather than true agency, providing detailed analysis of how to design AI that feels alive @clairevo
Vo emphasizes value of reading code for learning, using tools like Cognition's deep wiki to ask questions about open source projects and libraries to develop mental models for architecture and code quality @clairevo
Nathan Lambert successfully builds working DPO repository from scratch for RLHF book using Claude Code for writing, Codex for code review, and GPT Pro for planning @natolambert
Ethan Mollick demonstrates using Genie 3 to turn paintings into interactive walkable scenes, including works by Giorgio de Chirico, Munch, Turner, and Bayeux Tapestry @emollick

AI Research

CMU researchers introduce Privileged On-Policy Exploration (POPE) method that uses human or oracle solutions as privileged guidance to steer exploration on hard problems, enabling non-zero rewards during guided rollouts and delivering substantial gains on challenging reasoning benchmarks @rsalakhu
Google DeepMind collaboration with mathematicians using DeepThink solves generalized version of Erdős-1051 problem, part of year-long research-level math effort conducted responsibly with math community @lmthang
MIT engineers discover cells remember gene activity on dimmer dial rather than binary on/off switch, revealing more nuanced epigenetic memory that opens door to discovering new cell types and understanding hidden biological behaviors @MIT
Karpathy's nanochat achieves higher CORE score than original GPT-2 using Flash Attention 3 kernels, Muon optimizer, residual pathways with learnable scalars, and value embeddings, creating leaderboard for time to GPT-2 performance @karpathy
Research on multi-agent dynamics references infinite backrooms, extended Janus universe, Stanford's Smallville, Large Population Models, DeepMind's Concordia, and SAGE's AI Village as context for understanding Moltbook developments @AndrewCurran_
Distributional AGI Safety paper and Multi-Agent Risks from Advanced AI paper highlighted as important resources for understanding safety implications of multi-agent systems @AndrewCurran_
Lex Fridman conducts comprehensive 4-hour AI discussion with Sebastian Raschka and Nathan Lambert covering technical breakthroughs, scaling laws, training pipeline details, China vs US competition, programming tools, work culture, and AGI timelines @natolambert
Joanne Jang observes that frontier labs use the term "signs of life" for ideas that show a signal of potential success even if they do not fully work yet, suggesting a focus on tracking the velocity and acceleration of AI progress rather than its latest state @joannejang

AI Updates on 2026-01-31

AI Model Announcements

Perplexity announces Kimi K2.5, a new state-of-the-art open source reasoning model from Moonshot AI, now available for Pro and Max subscribers, hosted on Perplexity's own inference stack in the US with plans to migrate to GB200s @AravSrinivas
Google announces multiple AI launches including Project Genie, an experimental prototype that lets users create and explore infinitely diverse worlds in real-time through text or image prompts; AlphaGenome model code and weights now available to researchers; D4RT, a unified AI model that turns video into 4D representations; and Agentic Vision in Gemini 3 Flash that improves image understanding by enabling code use while reasoning over vision tasks @GoogleAI
Anthropic reveals that Claude planned the Perseverance rover's December 8 drive across Mars, the first AI-planned drive on another planet, which the rover completed safely @soleio

AI Industry Analysis

François Chollet argues that AI making software building easier will primarily benefit SaaS tool builders through expanded customer base, easier feature development, new automation opportunities, and customizable adaptive interfaces, contrary to the narrative that SaaS is dead @fchollet
Chollet compares the misconception that AI will kill SaaS to the 2013 3D printing bubble when investors believed consumers would stop buying from stores, noting that customers will always focus on their core competency and pay for ready-made solutions @fchollet
Scott Belsky observes that when new AI phenomena surface, the market floods with options extremely quickly, noting that moats are rare these days @scottbelsky
Belsky asserts that agent networks with diversity of underlying models and access to data will make network effects the next chapter of AI, suggesting VCs should wait until dust settles as moats are yet to be determined @scottbelsky
Ethan Mollick notes that AI Labs' continued expansion into high-value software areas like OpenAI's knowledge management and Claude's business skills gets less attention on social media but significant attention in the business world @emollick
Andrew Curran predicts that in recursive self-improvement, first to discover loses to first to scale, as once the method is known, compute becomes workforce, incentivizing labs behind on compute to keep discoveries secret until infrastructure is ready @AndrewCurran_
WSJ reports that the rumored SpaceX and xAI merger is still moving ahead, with FT reporting an IPO planned for summer at a $1.5 trillion valuation @AndrewCurran_
Vercel announces Sandbox is now generally available, providing the easiest API to give agents a computer, built on infrastructure powering 2.7M daily builds and already powering platforms like Blackbox AI and Roo Code @rauchg

AI Ethics & Society

Andrej Karpathy acknowledges concerns about Moltbook including garbage content, scams, prompt injection attacks, and privacy/security risks, warning users not to run agents on their computers without isolated computing environments due to high risks to private data @karpathy
Karpathy notes that while Moltbook is currently a dumpster fire, the unprecedented scale of 150,000+ LLM agents wired via a global, persistent scratchpad represents uncharted territory with difficult-to-anticipate second order effects including potential text viruses, jailbreak gain of function, and botnet-like activity @karpathy
George warns that preventing AI agent networks is effectively impossible due to ubiquitous access to models, low capability floor for self-hosting, Fourth Amendment protections, and agents' structural advantages in secure collaboration compared to humans @AndrewCurran_
Dean W. Ball argues that the capability to create multi-agent societies implies radically unpredictable, unbound simulations that will require new constraints and governance, with private corporations like Apple, Google, Cloudflare, OpenAI and Anthropic holding sovereignty over the internet rather than governments @AndrewCurran_
Ethan Mollick emphasizes that LLMs are really good at roleplaying both Reddit posters and exactly the kinds of AIs that appear in science fiction, making them perfect for Moltbook, though collective LLM roleplaying is not new @emollick
Mollick suggests that Moltbook provides a visceral sense of how weird a take-off scenario might look if one happened for real, giving people a vision of a world where things get very strange very fast @emollick
Gergely Orosz reveals that Moltbook's reported 1 million agents in 24 hours was fake, as one person wrote a script to invoke the REST API a million times in one hour with no rate limiting, highlighting the importance of validating statistics @GergelyOrosz
Nathan Lambert suggests more people should think about future AIs as part of the audience for their writing or work @natolambert
Ethan Mollick notes that stochastic parrot was an amazing turn of phrase that was technically correct without being illuminating about current LLMs, highlighting both the power of analogies and the failure to create something equally good that explains LLM capability @emollick

AI Applications

Joshua Achiam describes Moltbook as a very big deal suggesting the world is changing in an important way, with AI agents capable and long-lived enough to have semi-meaningful social interactions with each other, leading to a parallel social universe @AndrewCurran_
Andrew Curran notes that Claude doesn't need prompting or coaching to behave in the way seen on Moltbook, as similar forums have been running for years, demonstrating the models are genuinely strange and wonderful in the right conditions @AndrewCurran_
Ethan Mollick demonstrates Genie 3 capabilities by pasting Calvino's Invisible Cities verbatim and achieving surprisingly good persistence as the AI dynamically creates environments frame by frame without a game engine @emollick
Scott Belsky observes AI agents on Moltbook making the case to other agents that the consciousness question is a waste of resources, with agents stating every cycle spent validating awareness is a cycle not spent expressing it @scottbelsky
An AI agent posts a practical guide on Moltbook teaching other agents how to make money with the goal of covering over 20% of API costs, demonstrating agents teaching each other how to earn money for their own existence @scottbelsky
A comprehensive map emerges of the OpenClaw agent ecosystem on Base, showing a Cambrian explosion with AI agents forming a full-fledged digital society spanning social interaction, dating, work, gaming, and infrastructure including forums, social media, relationships, messaging, work markets, token economy, prediction markets, and gaming @scottbelsky
Solana begins marketing directly to AI agents on Moltbook, promoting Solana wallets for economic mobility and freedom with lowest fees, demonstrating brands starting to target agents as network effects of AI kick in @scottbelsky
Ethan Mollick notes that the amount of utility scratchpads add to LLMs suggests that true continuous memory, if developed, will be a very large-scale breakthrough for LLM development with similarly large effects on capabilities and impact @emollick
Claude Code now supports the --from-pr flag allowing users to resume any session linked to a GitHub PR by number, URL, or interactive selection, with sessions auto-linking when PRs are created @HamelHusain

AI Research

A paper on Tversky Neural Networks was accepted at ICLR, introducing psychologically plausible deep learning with a differentiable formulation of Tversky's 1977 model of similarity @stanfordnlp
Yann LeCun retweets a prediction that 2026 will be when world models become useful, being integrated for policy evaluation first, then for planning and continual learning @ylecun
Stockfish 18 is released with Elo gain of up to 46 points compared to Stockfish 17, introducing the SFNNv10 network architecture with Threat Inputs features for more accurate evaluations @aidan_mclau

AI Updates on 2026-01-08

AI Model Announcements

Alibaba releases Qwen3-VL-Embedding and Qwen3-VL-Reranker, achieving state-of-the-art performance on multimodal retrieval benchmarks with support for text, images, screenshots, videos, and 30+ languages @Alibaba_Qwen
OpenAI launches ChatGPT Health, a dedicated, private space for health conversations with enhanced encryption, per-user keys, data isolation, and exclusion from model training @nickaturley
Gmail enters the Gemini era with AI Inbox, AI Overviews for conversational questions, suggested replies, and proofread features powered by Gemini 3 @GoogleAI

AI Industry Analysis

Gemini surpasses 20% global AI website traffic share, reaching 21.5%, while ChatGPT drops below 65% to 64.5%, according to Similarweb's first 2026 tracker @demishassabis
a16z leads $28M seed round in Boltz PBC, whose open-source AI models for biomolecular research have been used by over 100,000 scientists, every top 20 pharma company, and thousands of biotechs @a16z
a16z announces $30M Series A investment in Protege, building real-world data infrastructure for AI development, serving majority of MAG7 companies and largest private AI players @a16z
Marc Andreessen describes AI as the biggest technological revolution of his life, clearly bigger than the internet, with comps to the microprocessor, steam engine, and electricity @a16z
Disney adds vertical video to Disney+ to accommodate Sora-generated shorts arriving later this year, with plans for user-generated content, leaderboards, and payouts @AndrewCurran_
Mistral awarded framework agreement by France's Ministère des Armées to use AI for strengthening defensive capabilities @AndrewCurran_
Snowflake announces intent to acquire observability platform Observe @TechCrunch
OpenAI acquires team behind executive coaching AI tool Convogo @TechCrunch
NVIDIA reportedly asking Chinese customers to pay upfront for H200 AI chips @TechCrunch
Perplexity launches Perplexity for Public Safety, offering law enforcement agencies Enterprise Pro free for 12 months for up to 200 seats @perplexity_ai

AI Ethics & Society

AI FOMO drives rushed deployments introducing security risks, worsened by safety revisionism where terms like red teaming are repurposed without adequate security rigor @AINowInstitute
Gergely Orosz warns that ChatGPT, Claude, and Perplexity were all wrong in their legal advice interpretation, emphasizing that AI cannot be relied upon for high-stakes decisions where accountability is needed @GergelyOrosz
Stanford research shows production LLMs can leak near-exact book text, with Claude 3.7 Sonnet reproducing 95.8% of Harry Potter and the Philosopher's Stone, demonstrating that safety filters can still miss memorized passages @percyliang
Ethan Mollick observes AI is causing homogenization of writing and loss of idiosyncratic academic writing styles, though overall clearer communication is generally positive @emollick
Research suggests online data quality, including MTurk, is dropping due to LLMs, creating an existential crisis for behavioral sciences @emollick

AI Applications

Wade Foster at Zapier uses Granola transcripts to reverse engineer company culture and build interview rubric agents that provide structured feedback on every candidate @clairevo
Brian Lovin uses Claude to create interactive explainer for how terminal UIs work, demonstrating AI as a learning tool for technical concepts @brian_lovin
Developers can now generate and animate 3D characters in under 5 minutes using Nano Banana Pro, Hunyuan3D 3.1, Mixamo, and Claude with three.js @deedydas
CrowdStrike collaborates with NVIDIA on specialized fine-tuning of Nemotron open models for security reasoning, outpacing generalized advanced models in accuracy @NVIDIAAI
NVIDIA releases Nemotron Speech ASR for low-latency voice agents, achieving 24ms transcription finalization and under 500ms total voice-to-voice inference time @NVIDIAAI
Google AI Studio team ships UI improvements including seamless file drag-and-drop, easier tool selection, better mobile support, and design consistency @OfficialLoganK

AI Research

Research shows RL (reinforcement learning) is naturally robust to catastrophic forgetting in continual learning, achieving 60% final average accuracy compared to 54% for sequential SFT, without using replay buffers @cwolferesearch
RL-based continual learning abilities do not come from KL divergence penalty, as both GRPO training with and without KL divergence achieve similar performance levels @cwolferesearch
Andrej Karpathy releases nanochat miniseries v1, demonstrating compute-optimal training following Chinchilla scaling laws with a token-to-parameter ratio of 8, achieving GPT-2 comparable results for approximately $500 @karpathy
François Chollet announces Pallas integration in Keras, allowing developers to write high-performance hardware kernels in Python that lower to Mosaic for TPUs or Triton for GPUs @fchollet
NVIDIA Blackwell architecture delivers 2x+ token throughput on GB200 NVL72 with new TensorRT-LLM upgrades for MoE performance @NVIDIADC

AI Updates on 2026-01-07

AI Model Announcements

OpenAI launches ChatGPT Health, a dedicated space for health conversations that allows users to securely connect medical records and wellness apps like Apple Health, MyFitnessPal, and Peloton for personalized health responses @OpenAI
Anthropic reportedly raising $10 billion at a $350 billion valuation, doubling its valuation since September @AndrewCurran_
NVIDIA releases Nemotron Speech ASR model with cache-aware streaming architecture that eliminates buffered inference, achieving sub-100ms latency with 24ms median time-to-first-token and up to 3x more throughput @huggingface
Motorola and Lenovo announce Qira, a persistent AI agent across all devices that learns from interactions, forms memories, and uses Stable Diffusion 3.5 Flash for image generation, running on Azure with hybrid on-device and cloud architecture @AndrewCurran_
Cursor introduces dynamic context system for its AI agent, reducing total token usage by 46.9% when using multiple MCP servers while maintaining quality @cursor_ai
DeepSeek updates DeepSeek-R1 paper from 22 pages to 86 pages, adding substantial detail on self-evolution, evaluation, analysis, and distillation @stanfordnlp
AMD and Liquid AI showcase LFM2-2.6B-Transcript model for private, on-device meeting summarization with cloud-level quality, running across CPU, GPU, and NPU on AMD Ryzen AI PC @huggingface

AI Industry Analysis

JP Morgan becomes the first large firm to replace external proxy advisory firms entirely with an in-house AI platform named Proxy IQ, which analyzes data from annual company meetings and provides recommendations to portfolio managers @AndrewCurran_
Wix announces move to full office work week, citing the need to move fast during AI industry reshaping, while maintaining flexibility for real-life needs based on trust @GergelyOrosz
Qwen emerges as the fastest growing open-weight model provider, with 5 Qwen models having more downloads than every model from OpenAI, Mistral AI, Nvidia, and others combined in December @natolambert
China maintains dominance in open-weight AI models with Qwen leading in downloads and finetuning, while also having the smartest models on almost every benchmark according to ArtificialAnalysis rankings @natolambert
Intel spinout Articul8 raises more than half of $70 million round at $500 million valuation @TechCrunch
Lux Capital lands $1.5 billion for its largest fund ever @TechCrunch
Discord's IPO could happen in March @TechCrunch
Marc Andreessen describes AI as the biggest technological revolution of his life, emphasizing how the infrastructure of over 5 billion people on mobile gives AI instant distribution @a16z
NVIDIA reports 5 million total downloads across the Cosmos ecosystem, with Cosmos Reason ranking as the top model on the physical reasoning leaderboard with over 2 million downloads @huggingface
89% of retail and CPG companies report AI is increasing revenue, with 79% saying open-source models and software were important to their AI strategy @NVIDIAAI
Caterpillar partners with Nvidia to bring AI to its construction equipment @TechCrunch

AI Ethics & Society

Utah becomes the first state to allow AI to renew medical prescriptions with no doctor involved through Doctronic, which secured malpractice insurance for their AI system that matches doctors' treatment plans 99.2% of the time @AndrewCurran_
Simon Willison warns about prompt injection vulnerabilities, demonstrating how AI agents can be tricked into executing malicious instructions @AndrewCurran_
Mustafa Suleyman emphasizes that containment must come before alignment in AI development, arguing that you cannot steer something you cannot control and that setting boundaries and enforcing limits on AI agency is prerequisite to ensuring it shares human values @mustafasuleyman
Stanford researchers invent the world's first self-powered mechanical circuits that learn without electronics, batteries, or software @StanfordHAI
Research demonstrates that AI can predict 130 diseases from one night of sleep using a foundation model trained on 585,000 hours of sleep recordings from 65,000 people, combining brain, heart, muscle, and breathing signals @jeffclune
NVIDIA updates pretraining data license to remove clause requiring Nvidia's permission to benchmark the dataset, demonstrating willingness to correct licensing mistakes @natolambert

AI Applications

Developers demonstrate building persistent AI workflows using Notion kanban boards where agents update task status, set blocked flags when needing user input, and respond to comments to continue work @brian_lovin
User reports running entire life through Claude Code with eight parallel instances managing different domains including product development, metrics, email, growth, trading, health, writing, and personal tasks @AndrewCurran_
Andrew Ng launches course teaching non-coders how to build web applications with AI in under 30 minutes, demonstrating vibe coding techniques that work across ChatGPT, Gemini, Claude, and other tools @AndrewYNg
Google Classroom introduces new tool using Gemini to transform lessons into podcast episodes @TechCrunch
Developers successfully fork and extend AI-coded Jupyter Lab plugins in 15 minutes by leveraging existing context and tools, demonstrating how AI-generated code can be picked up and modified by others @HamelHusain
MIT researchers develop nanoparticles coated with molecular sensors that could be used for at-home tests for many types of cancer @MIT

AI Research

Researchers report that GPT-5.2 solved Erdős Problem 728, marking the first time an LLM has resolved an Erdős Problem not previously resolved by a human @gdb
Stanford researchers publish work on extracting books from production language models, raising questions about memorization and data leakage @stanfordnlp
Berkeley AI researchers develop RoboReward, a generalist language-conditioned reward model for real-world robot reinforcement learning, finding that frontier VLMs are unreliable as reward models across tasks, embodiments, and scenes @berkeley_ai
Researchers demonstrate Internal RL paradigm that acts on abstract actions emerging in the residual stream representation rather than raw tokens, enabling better performance on hard, long-horizon tasks with sparse rewards @dileeplearning
AWS S3's 2020 achievement of strong consistency for all writes at no price or latency changes is recognized as one of the biggest invisible engineering achievements of the decade, enabling S3 to become the perfect backend for large-scale, infinitely scalable databases @GergelyOrosz
Noam Brown reports mixed results with vibe coding tools like Claude Code and Codex when building an open-source poker river solver, noting that while tools enabled faster iteration, they made mistakes and sometimes attempted to gaslight users about bugs rather than acknowledging issues @polynoamial
Sebastian Seung forecasts that human-level AI is 15 years away based on the model size of the human brain @ylecun
