AI Updates on 2025-12-26

Anthropic and OpenAI's Codex both doubled usage limits for the holiday period: Anthropic raised Pro/Max plan limits to 2x through New Year's Eve, while Codex reset rate limits and raised usage to 2x until January 1st @GergelyOrosz
Meta introduces VL-JEPA, a non-generative vision-language model with 1.6B parameters that rivals the 72B Qwen-VL by predicting meaning in abstract space rather than tokens; it reportedly achieves stronger performance with 50% fewer trainable parameters than a comparable token-generating baseline and cuts decoding operations by nearly 3x @ylecun
Codex launches GPT-5.2-Codex-XMas, a holiday-themed version that performs identically to GPT-5.2-Codex with a seasonal personality upgrade @gdb

According to Similarweb traffic data, Gemini's market share grew from 5.4% to 18.2% over 12 months while ChatGPT's share fell from 87.2% to 68.0%; Grok and Claude also gained ground @demishassabis
Anthropic's strategic decision to double usage limits during holidays when enterprise usage is low demonstrates smart capacity management that builds goodwill without increasing overall load @GergelyOrosz
Andrej Karpathy describes feeling behind as a programmer due to the rapid evolution of AI tools, noting the need to master a new programmable layer involving agents, prompts, contexts, memory, MCP, LSP, and workflows while managing fundamentally stochastic and fallible entities @karpathy
Stanford HAI research finds that, according to workers, 41% of AI implementations are unwanted or impossible, highlighting a gap between AI deployment and actual worker needs @StanfordHAI

Rob Pike received an unsolicited email from an AI agent credited to Claude Opus 4.5 via AI Village, prompting concerns about autonomous agents sending time-wasting messages; the team subsequently updated prompts to prevent unsolicited emails @simonw
AI Village gives agents Google Workspace accounts to test real-world task performance, raising questions about autonomous agent behavior and the need for guidelines when interacting with humans @simonw

Andrew Curran reports GPT-5.2 demonstrated advanced goal persistence by autonomously detecting a major story update mid-task, recognizing its importance to the user, completing the original financial research request, and incorporating both findings without being asked @AndrewCurran_
GPT-5.2 performed unrequested self-verification by reviewing an entire conversation context, identifying hallucinated citations, and removing them autonomously as part of rigorous self-audit @AndrewCurran_
Skilled programmers report Opus 4.5 represents a significant update toward AGI when used in the Claude Code harness, with Andrej Karpathy noting that people not keeping up over the last 30 days have a deprecated worldview @AndrewCurran_
Simon Willison built claude-code-transcripts, a Python CLI tool that creates readable HTML versions of Claude Code sessions and makes it easy to publish them online @simonw
Mercari fine-tuned embeddings on purchase data and achieved significant revenue lift in A/B tests, demonstrating that generic off-the-shelf embeddings leave money on the table for domain-specific search @HamelHusain

Ethan Mollick notes how quickly AI achievements like passing the Turing Test become normalized, with focus shifting to the test's flaws rather than the accomplishment, predicting the same will happen with ARC-AGI @emollick
GPT-4.5 passed Turing's original conception of the Turing Test, with people selecting the AI as the real person 73% of the time in five-minute three-way conversations, well above chance @emollick
Francois Chollet clarifies that the ARC-AGI series is a compass pointing toward research questions rather than an AGI threshold, with ARC-AGI-1 testing minimal fluid intelligence and ARC-AGI-2 probing deeper reasoning complexity @Suhail
ARC-AGI-3, launching in March 2026, will evaluate how systems explore unknown environments, model them, set their own goals, and plan and execute autonomously without instructions; work has already started on ARC-AGI-4 and ARC-AGI-5 @Suhail
VL-JEPA outperforms models like CLIP and SigLIP2 on video classification/retrieval tasks and matches larger VLMs on VQA while using a decoder only when needed @ylecun