Jim Robinson-Bohnslav

Vibe checking GPT-OSS with vLLM, Modal, and Textual

OpenAI just open-sourced their first model since GPT-2. I wanted to try out a few prompts, check the vibes, and see OpenAI’s raw reasoning traces for the first time. Alas, gpt-oss.com was down on launch day and most inference providers weren’t up yet. My old code didn’t use the Responses API, so I couldn’t see the reasoning trace or change the reasoning effort. So I basically mashed up five blog posts into a Modal vLLM server and a Textual Python client where we can chat with GPT-OSS-120b! The tokens per second are excellent on a single H100. Shout out to OpenAI and vLLM for great day-one performance. ...

In Defense of Muon: A Deep Dive into Moonshot's K2 Optimizer (A Translated Analysis)

About the translation: This is a translation of the original blog post by toothacher17. The original post is in Chinese and can be found here. The author’s tweet about it is here. I translated it using Google Translate, DeepSeek-R1, Gemini 2.5 Pro, and o3. This translation was edited by Kimi K2-Instruct at kimi.com. Original post: Author: toothacher17. Original link: https://www.zhihu.com/question/1927140506573435010/answer/1927378524513219780. Source: Zhihu. Copyright belongs to the author. For commercial reprint, please contact the author for authorization. For non-commercial reprint, please indicate the source. ...

MLLMs, VLMs, LVLMs, LMMs...

There exists a class of models whose inputs are text prompts plus images or video, and whose outputs are text. Example: “Explain the joke in this tweet. Be concise.” Answer, courtesy of GPT-4o: The joke humorously compares “the talk” about sensitive topics with explaining to kids why there’s a server at home. The mock children’s book title exaggerates the idea, poking fun at tech enthusiasts whose home servers are significant enough to require a formal explanation to their kids. ...