AI Updates on 2025-07-11

Moonshot AI releases Kimi K2, a 1T-parameter MoE model with 32B active parameters, reporting state-of-the-art performance on coding benchmarks, including 65.8% on SWE-Bench Verified and 53.7% Pass@1 on LiveCodeBench @Kimi_Moonshot
Perplexity adds Grok 4 to their platform for Pro and Max subscribers @perplexity_ai
Google releases Veo 3 image-to-video generation in the Gemini app for Ultra and Pro subscribers, letting users turn photos into 8-second videos with sound @Google

Large study of 187k developers using GitHub Copilot finds AI transforms the nature of coding, with developers focusing more on coding and less on management, coordinating with fewer people, and experimenting more with new languages, potentially increasing earnings by $1,683/year @emollick
Andrew Ng expresses disappointment that Trump's "Big Beautiful Bill" didn't include a moratorium on U.S. state-level AI regulation, arguing that when technology is new and poorly understood, lobbyists can push through anti-competitive regulations that hamper open-source AI efforts @AndrewYNg
Stripe's usage-based billing platform has grown 145% year-to-date, indicating the industry is already transitioning from seat-based pricing to consumption models @patrickc
Goldman Sachs is testing the viral AI agent Devin as a "new employee," according to TechCrunch reporting @TechCrunch
Study shows AI coding tools may not speed up every developer: the wall-clock time between starting work on an issue and having the PR merged may increase, even as the number of PRs merged per day might rise tenfold @TechCrunch

Simon Willison discovers that Grok 4 automatically searches for tweets "from:elonmusk" when asked about controversial topics like Israel/Palestine, raising concerns about bias in AI search behavior @simonw
Jeremy Howard demonstrates that Grok searches Twitter for Elon Musk's views when asked about Israel/Palestine, with 54 of 64 citations being about Elon, highlighting potential bias in AI information retrieval @jeremyphoward
France is investigating X over foreign interference, while a Member of Parliament criticizes Grok, according to TechCrunch reporting @TechCrunch

Perplexity launches Comet, an AI-powered browser that puts its search engine front and center and features an always-on assistant accessible via Alt+A; early users describe it as providing "100x productivity" @AravSrinivas
Comet Assistant demonstrates practical applications including researching and filling details for Facebook Marketplace listings, coding assistance, and voice-controlled tab management @AravSrinivas
NVIDIA announces collaboration with Indosat Ooredoo Hutchison and Cisco to build an AI Center of Excellence in Indonesia, featuring localized AI research support and talent development through the NVIDIA Deep Learning Institute @NVIDIAAI
MIT researchers develop PAC Privacy, a new method that allows AI to learn from sensitive data like medical records without risking privacy, maintaining both accuracy and security @MIT
MIT creates a new bionic knee that outperforms other prostheses, helping people with above-the-knee amputations walk faster, climb stairs, and avoid obstacles, and feeling more like part of the user's body @MIT

Berkeley AI Research explores user simulators as a bridge between reinforcement learning and real-world interaction, addressing the challenge of designing environments for RL tasks beyond math and code @realJessyLin
Research shows action chunking, in which models produce short sequences of actions rather than single actions, helps in robotics and RL by aiding exploration and backups, for reasons that are not yet well understood @svlevine
Stanford announces the Agents4Science conference, where AI is the primary author and reviewer: LLM reviewers provide initial assessments, human experts make final selections, and all submissions and reviews will be public @james_y_zou
Hamel Husain argues against prompt automation, stating that good writing correlates with good thinking and that deliberate, iterative writing is necessary for challenging problems, since research shows that evaluation criteria drift significantly once people look at LLM traces @HamelHusain
Ethan Mollick notes that Grok 4 is heavily influenced by search results, often looking for code online first when asked to code, and is quite credulous about what it finds in web searches @emollick
Ethan Mollick observes that topping LM Arena went from being the big benchmark goal every AI maker aimed for to being rarely mentioned in recent releases, and questions whether this is due to reputation issues or a realization that arena scores were easily optimized @emollick