Code Review in the Age of Agents

In World War I, new technology like the machine gun, barbed wire, and artillery suddenly shifted the balance from offense to defense. Cavalry charges were out — trenches and machine guns were in. The same epochal shift is happening right now in software engineering, but on the side of offense. AI coding agents (Claude Code, Codex, Cursor) have made writing code dramatically faster. But reviewing code – defense – is still happening at human speed. The bottleneck has flipped. ...

March 4, 2026 · Jim Robinson-Bohnslav

Vibe checking GPT-OSS with vLLM, Modal, and Textual

OpenAI just open-sourced their first model since GPT-2. I wanted to try out a few prompts, check the vibes, and see OpenAI’s raw reasoning traces for the first time. Alas, gpt-oss.com was down for launch day and most inference providers weren’t up yet. My old code didn’t use the Responses API, so I couldn’t see the reasoning trace or change the reasoning effort. So I mashed up five blog posts into a Modal vLLM server and a Textual Python client where we can chat with GPT-OSS-120b! The tokens per second are excellent on a single H100. Shout out to OpenAI and vLLM for great day-one performance. ...

August 6, 2025 · Jim Robinson-Bohnslav

In Defense of Muon: A Deep Dive into Moonshot's K2 Optimizer (A Translated Analysis)

About the translation: This is a translation of the original blog post by toothacher17. The original post is in Chinese and can be found here. The author’s tweet about it is here. I translated it using Google Translate, DeepSeek-R1, Gemini 2.5 Pro, and o3. This translation was edited by Kimi K2-Instruct at kimi.com. Original post author: toothacher17. Original link: https://www.zhihu.com/question/1927140506573435010/answer/1927378524513219780. Source: Zhihu. Copyright belongs to the author. For commercial reprint, please contact the author for authorization. For non-commercial reprint, please indicate the source. ...

July 12, 2025 · toothacher17

MLLMs, VLMs, LVLMs, LMMs...

There exists a class of models whose inputs are text prompts + images or video. Their outputs are text. Example: “Explain the joke in this tweet. Be concise.” Answer, courtesy of GPT-4o: The joke humorously compares “the talk” about sensitive topics with explaining to kids why there’s a server at home. The mock children’s book title exaggerates the idea, poking fun at tech enthusiasts whose home servers are significant enough to require a formal explanation to their kids. ...

December 11, 2024 · Jim Robinson-Bohnslav