KernelGeneration is a repository created to evaluate GPU kernel metrics, including:
- correctness
- performance
- portability
## Requirements

PyTorch nightly (CUDA or ROCm). Install the nightly build that matches your system before running the installer.

- CUDA example:

      pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu124

- ROCm example:

      pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.4
Use the official PyTorch install selector to pick the correct nightly wheel for your OS/driver stack.
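After installing a nightly wheel, it can help to confirm which accelerator backend the build actually targets before running the installer. The check below is an optional sketch, not part of this repository; the helper name `backend_summary` is hypothetical:

```python
# Optional sanity check: report which backend the installed torch wheel targets.
# Hypothetical helper, not part of kernelGen.
import importlib.util


def backend_summary() -> str:
    """Return a short description of the torch build, or a hint if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed - install a nightly wheel first"
    import torch
    if torch.version.cuda:                    # CUDA wheels report a CUDA version
        return f"CUDA {torch.version.cuda}"
    if getattr(torch.version, "hip", None):   # ROCm wheels report a HIP version
        return f"ROCm/HIP {torch.version.hip}"
    return "CPU-only build"


print(backend_summary())
```

A CUDA nightly should print something like `CUDA 12.4`; a ROCm nightly prints a HIP version instead.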
### (Optional) KernelLLM prompts

If you want to try KernelLLM-style prompting, see the templates in `./prompt/`.
Note: the installer does not clone KernelLLM or install `transformers`/`accelerate`.
After setup, export:
export TRITONBENCH_RUN_CONFIG="$(pwd)/benchmark_helion_runner.yaml"
export TRITONBENCH_HELION_PATH="$(pwd)/helion"

## Repository layout

    kernelGen/
    ├── README.md
    ├── install.py
    ├── benchmark_helion_runner.yaml
    ├── prompt/
    │   └── ...          # optional prompt templates for KernelLLM-style tests
    └── operators/
        ├── bf16_layernorm/
        └── bf16_matmul/
- `install.py`: installs TritonBench, clones Helion (the kernel-gen-rh branch) and installs it in editable mode, and copies the local operators into TritonBench. It does not install PyTorch or KernelLLM.
- `operators/`: GPU kernel operators (copied into TritonBench during install).
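In outline, the installer's steps could be sketched as below. The helper names and the exact git/pip flags are illustrative assumptions, not the actual contents of `install.py`:

```python
# Illustrative sketch of the three steps install.py performs; not the real source.
import shutil
from pathlib import Path

HELION_BRANCH = "kernel-gen-rh"  # the branch named in this README


def clone_cmd(repo_url: str, branch: str, dest: str) -> list[str]:
    """git command to clone only the branch the installer needs."""
    return ["git", "clone", "--branch", branch, "--single-branch", repo_url, dest]


def editable_install_cmd(path: str) -> list[str]:
    """pip command for an editable install of the cloned checkout."""
    return ["pip", "install", "-e", path]


def copy_operators(src: Path, tritonbench_ops: Path) -> None:
    """Copy each local operator directory into TritonBench's operator tree."""
    tritonbench_ops.mkdir(parents=True, exist_ok=True)
    for op_dir in src.iterdir():
        if op_dir.is_dir():
            shutil.copytree(op_dir, tritonbench_ops / op_dir.name, dirs_exist_ok=True)
```

The commands would be run with `subprocess.run(cmd, check=True)`; the real script may differ in flags and ordering.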
## Operators

- MatMul
- LayerNorm: WIP (optional KernelLLM prompts available in `./prompt/`)
- WIP: GELU
## Backends

- TorchInductor
- Triton
- Helion
- KernelLLM (optional, prompts only via `./prompt/`)
- WIP: Mako (based on KernelLLM)
## Quick start

- Install PyTorch nightly (see Requirements above).
- Run the installer (clones TritonBench, clones & installs Helion, and copies operators):
python install.py
- Export environment variables (adjust paths as needed):

      export TRITONBENCH_RUN_CONFIG="$(pwd)/benchmark_helion_runner.yaml"
      export TRITONBENCH_HELION_PATH="$(pwd)/helion"
- Run the benchmark:
python tritonbench/run.py
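The two exports and the run command above can also be combined in a small driver script. This is a hypothetical convenience wrapper, not part of the repository; uncomment the `subprocess.run` line to actually launch the benchmark:

```python
# Hypothetical wrapper combining the env exports and the run command above.
import os
import sys
from pathlib import Path


def benchmark_invocation(repo_root: Path) -> tuple[list[str], dict[str, str]]:
    """Build the command and environment for the benchmark without launching it."""
    env = dict(os.environ)
    env["TRITONBENCH_RUN_CONFIG"] = str(repo_root / "benchmark_helion_runner.yaml")
    env["TRITONBENCH_HELION_PATH"] = str(repo_root / "helion")
    cmd = [sys.executable, "tritonbench/run.py"]
    return cmd, env


cmd, env = benchmark_invocation(Path.cwd())
# import subprocess; subprocess.run(cmd, env=env, check=True)  # launch for real
```

Keeping the paths derived from one `repo_root` argument avoids the two exports drifting out of sync when the checkout moves.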