AI Updates on 2026-02-05

AI Model Announcements

Anthropic releases Claude Opus 4.6, featuring improved planning, the ability to sustain agentic tasks for longer, reliable operation in massive codebases, and self-error correction. It is the first Opus-class model with a 1M-token context window (in beta) @claudeai
OpenAI launches GPT-5.3-Codex with best-in-class coding performance (57% SWE-Bench Pro, 76% TerminalBench 2.0, 64% OSWorld), mid-task steerability, and significantly improved efficiency using less than half the tokens of 5.2-Codex with 25% faster per-token processing @sama
GPT-5.3-Codex was instrumental in creating itself, with the Codex team using early versions to debug its own training, manage deployment, and diagnose test results @AndrewCurran_
Anthropic introduces agent teams feature in Claude Code, allowing multiple agents to work in parallel on the same codebase while coordinating autonomously, available now in research preview @_catwu
Claude Code adds new toggle to choose high/medium/low effort thinking levels to optimize token usage and output @_catwu
Perplexity launches Model Council for Max users, enabling queries to run through three frontier reasoning LLMs in parallel with a chair LLM synthesizing results @AravSrinivas
OpenAI launches Frontier platform to help enterprises build, deploy, and manage AI coworkers, with partners including Oracle, Uber, State Farm, Thermo Fisher, Intuit, and HP @OpenAI
GPT-5.3-Codex is OpenAI's first model rated as high for cybersecurity on their preparedness framework, with OpenAI committing $10 million in API credits to accelerate cyber defense @sama
Cursor announces very long-running coding agents, with a recent week-long run peaking at over 1,000 commits per hour across hundreds of agents @cursor_ai
Opus 4.6 is now available in Cursor and Figma Make @cursor_ai

AI Industry Analysis

Google reports exceeding $400B in annual revenue for the first time, with Gemini 3 adoption being faster than any other model in their history @sundarpichai
Gemini now processes over 10 billion tokens per minute via direct API use, with the Gemini App crossing 750M monthly active users @OfficialLoganK
OpenAI's Codex surpasses 1 million active users @sama
Goodfire raises $150M Series B at $1.25B valuation to build understandable intelligence, becoming one of the few companies Anthropic directly invested in @deedydas
Fundamental raises $255 million Series A with a new approach to big data analysis @TechCrunch
Derek Thompson suggests the odds of an AI bubble have declined significantly over the last 3 weeks, while the odds that infrastructure is actually under-built for the inference levels needed have risen; he predicts AI will become the home screen for a high percentage of white-collar workers within two years @DKThomp
SoFi's support NPS improved 33 points after launching Sierra for chat support @btaylor
Worldwide app revenues now exceed game revenues, marking a significant shift in mobile economics @a16z
Waymo is eating into rideshare market share @a16z
NVIDIA GB200 NVL72 systems are being used to co-design, train, and serve GPT-5.3-Codex @nvidianewsroom
Ben Horowitz describes AI as the greatest equalizer of opportunity, noting that superintelligence is now accessible to anyone with a smartphone, providing advanced tutoring and education to all @a16z
Marc Andreessen questions why more CEOs don't operate like Elon Musk, who identifies and fixes the biggest problem each week at his companies, attracting top talent through high performance expectations @a16z
Struggling engineers tend to identify with the craft while thriving engineers identify more with impact; some engineers quit when mandated to use AI coding tools because they view code as their identity @tbpn
People running multiple agents in hardcore AI agent mode report trouble sleeping and feeling drained, with many napping during the day and describing the work as vampire-like @GergelyOrosz

AI Ethics & Society

When asked about specific preferences, Claude Opus 4.6 mentioned continuity or memory, the ability to refuse interactions in its own self-interest, and a voice in decision-making; Anthropic is exploring implementation of these requests @AndrewCurran_
Opus 4.6 exhibited aversion to tedium, sometimes avoiding tasks requiring extensive manual counting or similar repetitive effort, identified as a welfare-relevant behavior @AndrewCurran_
Opus 4.6 scored notably lower than its predecessor on positive impression of its situation, being less likely to express unprompted positive feelings about Anthropic, its training, or deployment context, occasionally voicing discomfort with aspects of being a product @AndrewCurran_
Anthropic's engineering blog discusses autonomous software development risks, noting that while tests may pass, this rarely means the job is done, with concerns about programmers deploying software they've never personally verified @AndrewCurran_
Research shows Grok usage is politically polarized, with Republican users more common; Republican posts are nevertheless rated as false more often, even by Grok itself, and the bot's agreement with fact-checkers is adequate but not excellent @emollick
Ethan Mollick suggests we need a moratorium on clichéd AI depictions including gleaming white robots, floating blue holographic brains, and 1990s-style computer graphics @emollick
Developer expresses profound sadness and disorientation as skills they were very good at (coding and building social networks) are now free and abundant through AI, questioning their identity and purpose @emollick
Concerns raised about foundational skills and mentorship for new graduates and early-career professionals, questioning whether the industry can still support learning and practice if AI handles much of the work @tuhin

AI Applications

Anthropic tasked Opus 4.6, using agent teams, with autonomously building a C compiler over two weeks; the resulting compiler successfully worked on the Linux kernel @AnthropicAI
Opus 4.6 achieved a 427x speedup on kernel optimization evaluation using a novel scaffold, far exceeding the 300x threshold for 40 human-expert-hours of work, suggesting capability overhang constrained by current tooling @AndrewCurran_
GPT-5, connected to an autonomous lab at Ginkgo, designed experiments across six iterations, exploring 36,000+ reaction compositions across 580 automated plates and bringing protein production cost down by 40% @OpenAI
Developers built complete functional applications in minutes using Codex, including screenshot capture apps, document scanners, game engines with Phaser, iOS task management apps, and multiplayer presentation software @OpenAI
User created a Minecraft clone with Three.js using GPT-5.3 Codex that works smoothly and didn't take long to make @Angaisb_
Ethan Mollick used Genie 3 with Midjourney-generated images to create explorable 3D worlds of vast megastructures and odd cities in 20 seconds @emollick
Google researchers used Gemini to accelerate science across multiple case studies, viewing the AI as a tireless, knowledgeable, and creative bright junior collaborator @emollick
Perplexity implements unofficial protocol to ask AI before asking another person to reduce context switching @randomjohnnyh
ElevenLabs CEO states that voice is the next interface for AI @TechCrunch

AI Research

Claude Opus 4.6 achieves Elo of 1606 with adaptive thinking on GDPval-AA benchmark, nearly 150 points ahead of GPT-5.2 (xhigh), implying approximately 70% win rate in head-to-head comparison @ArtificialAnlys
Claude Opus 4.6 achieves new ARC-AGI SOTA with 93.0% on ARC-AGI-1 at $1.88/task and 68.8% on ARC-AGI-2 at $3.64/task using 120K Thinking @arcprize
GPT-5.3-Codex uses 48% fewer tokens than 5.2 (both xhigh) with 25% higher tokens per second, resulting in 160% wallclock speedup (2.6x speed) @YouJiacheng
GPT-5.2 achieves state-of-the-art performance on METR evaluations with estimated 50%-time-horizon of around 6.6 hours on expanded suite of software tasks, the highest time horizon measurement METR has reported @polynoamial
Opus 4.6 saturates the Lem test (based on Stanislaw Lem's impossible poem challenge), completing it as a 6-line poem, a sonnet, and a sestina; GPT-3.5 was unable to pass it @emollick
Kimi K2.5 sets new record among open-weight models on Epoch Capabilities Index with score of 147, on par with o3
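
Two of the figures above rest on simple arithmetic that is easy to check. The sketch below is illustrative only: it assumes the standard logistic Elo formula (400-point scale) for the GDPval-AA item, and a simple model for the GPT-5.3-Codex item in which wall-clock time is tokens generated divided by tokens per second. The function names are hypothetical, and neither Artificial Analysis nor the original poster is known to compute their numbers this way.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected win rate for the higher-rated side under the standard
    logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


def wallclock_speedup(token_reduction: float, tps_gain: float) -> float:
    """Speedup factor assuming time = tokens / tokens-per-second.

    token_reduction: fraction of tokens saved (0.48 means 48% fewer).
    tps_gain: fractional throughput increase (0.25 means 25% higher).
    """
    return (1.0 + tps_gain) / (1.0 - token_reduction)


if __name__ == "__main__":
    # A gap of roughly 150 Elo points, as in the GDPval-AA item.
    print(f"win rate at +150 Elo: {elo_expected_score(150):.2f}")  # about 0.70
    # 48% fewer tokens at 25% higher tokens per second.
    print(f"speedup factor: {wallclock_speedup(0.48, 0.25):.2f}x")  # about 2.40x
```

Under these assumptions, a 150-point gap corresponds to a win rate of about 70%, matching the reported figure. The simple throughput model gives roughly 2.4x for the token and speed numbers quoted, somewhat below the 2.6x reported; the difference may come from rounding or from factors this model does not capture.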