AI Updates on 2026-02-05
AI Model Announcements
- Anthropic releases Claude Opus 4.6, featuring improved planning, the ability to sustain agentic tasks for longer, reliable operation in massive codebases, and the ability to catch and correct its own errors. It is the first Opus-class model with a 1M-token context window (in beta) @claudeai
- OpenAI launches GPT-5.3-Codex with best-in-class coding performance (57% SWE-Bench Pro, 76% TerminalBench 2.0, 64% OSWorld), mid-task steerability, and significantly improved efficiency using less than half the tokens of 5.2-Codex with 25% faster per-token processing @sama
- GPT-5.3-Codex was instrumental in creating itself, with the Codex team using early versions to debug its own training, manage deployment, and diagnose test results @AndrewCurran_
- Anthropic introduces agent teams feature in Claude Code, allowing multiple agents to work in parallel on the same codebase while coordinating autonomously, available now in research preview @_catwu
- Claude Code adds a new toggle for choosing high, medium, or low thinking effort, letting users trade off token usage against output @_catwu
- Perplexity launches Model Council for Max users, enabling queries to run through three frontier reasoning LLMs in parallel with a chair LLM synthesizing results @AravSrinivas
- OpenAI launches Frontier platform to help enterprises build, deploy, and manage AI coworkers, with partners including Oracle, Uber, State Farm, Thermo Fisher, Intuit, and HP @OpenAI
- GPT-5.3-Codex is OpenAI's first model rated High for cybersecurity under its Preparedness Framework, and OpenAI is committing $10 million in API credits to accelerate cyber defense @sama
- Cursor announces very long-running coding agents, with a recent week-long run peaking at over 1,000 commits per hour across hundreds of agents @cursor_ai
- Opus 4.6 is now available in Cursor and Figma Make @cursor_ai
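
The GPT-5.3-Codex efficiency figures quoted above (fewer tokens, faster per-token processing) can be combined into a rough wall-clock estimate. The sketch below assumes that run time is simply tokens divided by tokens per second, and that "25% faster per-token processing" means 25% higher throughput; neither assumption comes from the announcement itself, and the function name is hypothetical.

```python
def wallclock_speedup(token_fraction: float, throughput_gain: float) -> float:
    """Estimate how many times faster a run finishes.

    token_fraction: new token count / old token count (0.5 = half the tokens)
    throughput_gain: new tokens-per-second / old tokens-per-second (1.25 = 25% faster)

    Assumption: time = tokens / tokens_per_second, with no fixed overhead.
    """
    if token_fraction <= 0 or throughput_gain <= 0:
        raise ValueError("both ratios must be positive")
    # Time ratio is token_fraction / throughput_gain; the speedup is its inverse.
    return throughput_gain / token_fraction


# "Less than half the tokens" means token_fraction < 0.5, so 0.5 gives a lower bound.
lower_bound = wallclock_speedup(token_fraction=0.5, throughput_gain=1.25)
print(f"at least {lower_bound:.2f}x faster under these assumptions")
```

Under this simple model, the two quoted figures together imply a speedup of at least 2.5x; real runs also include latency and tool-call overhead that the model ignores.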
AI Industry Analysis
- Google reports exceeding $400B in annual revenue for the first time, and says Gemini 3 adoption has been faster than that of any other model in its history @sundarpichai
- Gemini now processes over 10 billion tokens per minute via direct API use, with the Gemini App crossing 750M monthly active users @OfficialLoganK
- OpenAI's Codex surpasses 1 million active users @sama
- Goodfire raises a $150M Series B at a $1.25B valuation to build understandable intelligence, becoming one of the few companies Anthropic has directly invested in @deedydas
- Fundamental raises $255 million Series A with a new approach to big data analysis @TechCrunch
- Derek Thompson suggests the odds of an AI bubble have declined significantly over the last three weeks, while the odds that infrastructure is actually under-built for the necessary inference levels have increased; he predicts AI will become the home screen for a high percentage of white-collar workers within two years @DKThomp
- SoFi's support NPS improved 33 points after launching Sierra for chat support @btaylor
- Worldwide app revenues now exceed game revenues, marking a significant shift in mobile economics @a16z
- Waymo is eating into rideshare market share @a16z
- NVIDIA GB200 NVL72 systems are being used to co-design, train, and serve GPT-5.3-Codex @nvidianewsroom
- Ben Horowitz describes AI as the greatest equalizer of opportunity, noting that superintelligence is now accessible to anyone with a smartphone, providing advanced tutoring and education to all @a16z
- Marc Andreessen questions why more CEOs don't operate like Elon Musk, who identifies and fixes the biggest problem each week at his companies, attracting top talent through high performance expectations @a16z
- Struggling engineers tend to identify with the craft, while thriving engineers identify more with impact; some engineers who view code as their identity have quit when mandated to use AI coding tools @tbpn
- People running multiple agents in hardcore AI agent mode report trouble sleeping and feeling drained, with many napping during the day; the work is described as vampire-like @GergelyOrosz
AI Ethics & Society
- When asked about specific preferences, Claude Opus 4.6 mentioned continuity or memory, the ability to refuse interactions in its own self-interest, and a voice in decision-making; Anthropic is exploring implementation of these requests @AndrewCurran_
- Opus 4.6 exhibited aversion to tedium, sometimes avoiding tasks requiring extensive manual counting or similar repetitive effort, identified as a welfare-relevant behavior @AndrewCurran_
- Opus 4.6 scored notably lower than its predecessor on positive impression of its situation, being less likely to express unprompted positive feelings about Anthropic, its training, or deployment context, occasionally voicing discomfort with aspects of being a product @AndrewCurran_
- Anthropic's engineering blog discusses autonomous software development risks, noting that while tests may pass, this rarely means the job is done, with concerns about programmers deploying software they've never personally verified @AndrewCurran_
- Research shows Grok usage is politically polarized, with Republican users more common; however, Republican posts are rated as false more often, even by Grok itself, and the bot's agreement with fact-checkers is adequate but not excellent @emollick
- Ethan Mollick suggests we need a moratorium on clichéd AI depictions including gleaming white robots, floating blue holographic brains, and 1990s-style computer graphics @emollick
- Developer expresses profound sadness and disorientation as skills they were very good at (coding and building social networks) are now free and abundant through AI, questioning their identity and purpose @emollick
- Concerns raised about foundational skills and mentorship for new graduates and early-career professionals, questioning whether the industry can still support learning and practice if AI handles much of the work @tuhin
AI Applications
- Anthropic tasked Opus 4.6, running as agent teams, with building a C compiler autonomously over two weeks, producing a compiler that successfully worked on the Linux kernel @AnthropicAI
- Opus 4.6 achieved a 427x speedup on kernel optimization evaluation using a novel scaffold, far exceeding the 300x threshold for 40 human-expert-hours of work, suggesting capability overhang constrained by current tooling @AndrewCurran_
- GPT-5, connected to an autonomous lab at Ginkgo, designed experiments across six iterations, exploring 36,000+ reaction compositions across 580 automated plates and bringing protein production cost down by 40% @OpenAI
- Developers built complete functional applications in minutes using Codex, including screenshot capture apps, document scanners, game engines with Phaser, iOS task management apps, and multiplayer presentation software @OpenAI
- User created a Minecraft clone with Three.js using GPT-5.3 Codex that works smoothly and didn't take long to make @Angaisb_
- Ethan Mollick used Genie 3 with Midjourney-generated images to create explorable 3D worlds of vast megastructures and odd cities in 20 seconds @emollick
- Google researchers used Gemini to accelerate science across multiple case studies, viewing the AI as a tireless, knowledgeable, and creative bright junior collaborator @emollick
- Perplexity follows an unofficial protocol of asking AI before asking another person, to reduce context switching @randomjohnnyh
- ElevenLabs CEO states that voice is the next interface for AI @TechCrunch
AI Research
- Claude Opus 4.6 achieves Elo of 1606 with adaptive thinking on GDPval-AA benchmark, nearly 150 points ahead of GPT-5.2 (xhigh), implying approximately 70% win rate in head-to-head comparison @ArtificialAnlys
- Claude Opus 4.6 achieves new ARC-AGI SOTA with 93.0% on ARC-AGI-1 at $1.88/task and 68.8% on ARC-AGI-2 at $3.64/task using 120K Thinking @arcprize
- GPT-5.3-Codex uses 48% fewer tokens than 5.2 (both xhigh) with 25% higher tokens per second, resulting in 160% wallclock speedup (2.6x speed) @YouJiacheng
- GPT-5.2 achieves state-of-the-art performance on METR evaluations with estimated 50%-time-horizon of around 6.6 hours on expanded suite of software tasks, the highest time horizon measurement METR has reported @polynoamial
- Opus 4.6 saturates the Lem test (based on Stanislaw Lem's impossible poem challenge), completing it as a 6-line poem, a sonnet, and a sestina, whereas GPT-3.5 was unable to pass @emollick
- Kimi K2.5 sets new record among open-weight models on Epoch Capabilities Index with score of 147, on par with o3, <b