Mohamed Arbi Nsibi

CUDA Agent paper notes

March 8, 2026
Notes on CUDA Agent and how it uses an agentic RL workflow with tools for CUDA kernel optimization.

the core of this paper is about moving from “guessing code” to a structured agentic RL system that actually uses developer tools to self-correct

RFT (Rejection Fine-Tuning) and the warm-up

the authors found that starting RL from scratch is unstable because the model does not initially know how to use the agentic tools like the profiler or shell

the warm-up:

RFT:
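the general idea behind rejection fine-tuning can be sketched in a few lines: sample many tool-using trajectories, reject the ones that fail, and fine-tune only on what survives. this is a minimal sketch of that filter, not the paper's recipe. the `Trajectory` record, its fields, and the `min_speedup` threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One sampled attempt at a task (hypothetical record, not from the paper)."""
    task_id: str
    steps: list[str]   # tool calls and model messages, in order
    compiled: bool     # did the final kernel build?
    correct: bool      # did its output match the reference?
    speedup: float     # baseline_time / kernel_time


def rejection_filter(trajectories, min_speedup=1.0):
    """Keep only trajectories worth imitating; reject everything else."""
    return [
        t for t in trajectories
        if t.compiled and t.correct and t.speedup >= min_speedup
    ]
```

the surviving trajectories would then be used as ordinary supervised fine-tuning data, so the model has seen working tool usage before RL starts.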

performance and metrics

the paper evaluates on KernelBench, which splits tasks into three levels of difficulty. the authors do not only measure whether the generated code runs. they also measure whether it is faster than the industry-standard baseline
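to make "correct and faster" concrete, here is a minimal sketch of a KernelBench-style `fast_p` score: the fraction of tasks where the kernel is correct and its speedup over the baseline is greater than a threshold `p`. the dictionary keys (`correct`, `baseline_ms`, `kernel_ms`) are assumptions for this example, not names from the benchmark or the paper.

```python
def fast_p(results, p=1.0):
    """Fraction of tasks that are correct AND more than p times faster than baseline."""
    if not results:
        return 0.0
    hits = sum(
        1 for r in results
        if r["correct"] and r["baseline_ms"] / r["kernel_ms"] > p
    )
    return hits / len(results)


results = [
    {"correct": True,  "baseline_ms": 10.0, "kernel_ms": 5.0},   # 2.0x, counts
    {"correct": True,  "baseline_ms": 10.0, "kernel_ms": 20.0},  # slower, rejected
    {"correct": False, "baseline_ms": 10.0, "kernel_ms": 1.0},   # fast but wrong
    {"correct": True,  "baseline_ms": 10.0, "kernel_ms": 8.0},   # 1.25x, counts
]
print(fast_p(results, p=1.0))  # 0.5
```

note that the third entry is the fastest kernel in the list and still scores zero, because a wrong answer never counts.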


agentic workflow

instead of one-shot generation, the model follows a structured workflow. it is trained to:
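the difference from one-shot generation can be sketched as a feedback loop: propose a kernel, run it through the tools, feed the result back, and keep the best working version. this is a generic sketch under stated assumptions, not the paper's actual pipeline. `propose` and `evaluate` are hypothetical callables standing in for the model and for the compile, test, and profile tools.

```python
def optimize_kernel(propose, evaluate, max_rounds=5):
    """Iteratively refine a kernel using tool feedback.

    propose(feedback)  -> kernel source as a string      (hypothetical model call)
    evaluate(source)   -> (ok, runtime_ms, feedback)     (hypothetical tool call)
    """
    best_source, best_ms = None, float("inf")
    feedback = ""  # nothing to react to on the first round
    for _ in range(max_rounds):
        source = propose(feedback)
        ok, runtime_ms, feedback = evaluate(source)
        # only a kernel that builds and passes the checks can become the best one
        if ok and runtime_ms < best_ms:
            best_source, best_ms = source, runtime_ms
    return best_source, best_ms
```

the design choice that matters here is that failed attempts are not wasted: their compiler errors or profiler output become the input to the next proposal.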

why it matters

most coding LLMs are stochastic parrots for syntax. this new approach is closer to a reasoning engine for hardware: it doesn't just know CUDA syntax, it learns how to navigate GPU architecture by observing the results of its own experiments in a sandbox

