Experimental evaluation of GPU kernels (MatMul, LayerNorm, GELU) across multiple backends. Operators are adapted from tutorials and examples and modified for this study.


KernelGeneration

KernelGeneration is a repository for evaluating GPU kernels on three metrics:

  • correctness
  • performance
  • portability
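
Correctness here means a backend's output matches a trusted reference implementation within a tolerance. A minimal sketch of such a tolerance check is below; the function names and thresholds are illustrative, not taken from this repo (TritonBench applies its own accuracy checks):

```python
# Sketch of the kind of tolerance check used to judge kernel correctness.
# Names and thresholds are illustrative, not from this repository.

def max_rel_error(reference, candidate, eps=1e-12):
    """Largest element-wise relative error between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("shape mismatch")
    worst = 0.0
    for r, c in zip(reference, candidate):
        denom = max(abs(r), eps)  # guard against division by zero
        worst = max(worst, abs(r - c) / denom)
    return worst

def is_correct(reference, candidate, tol=1e-2):
    # bf16 carries roughly 3 decimal digits of precision, so tolerances
    # for bf16 kernels are necessarily loose.
    return max_rel_error(reference, candidate) <= tol
```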

Requirements

  • PyTorch nightly (CUDA or ROCm). Install the nightly that matches your system before running the installer.

    • CUDA example:
      pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu124
    • ROCm example:
      pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.4

    Use the official PyTorch install selector to pick the correct nightly wheel for your OS/driver stack.

  • (Optional) KernelLLM prompts
    If you want to try KernelLLM-style prompting, see the templates in ./prompt/.
    Note: the installer does not clone KernelLLM or install transformers/accelerate.
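
Before running the installer, it is worth confirming that a usable PyTorch build is actually present. A small sanity-check sketch (the function name and report keys are illustrative; it uses importlib so it degrades gracefully when torch is missing):

```python
# Sanity-check that PyTorch is installed and sees a GPU before proceeding.
# The report keys are illustrative, not part of any project API.
import importlib.util

def check_pytorch():
    report = {"torch_installed": False, "version": None, "gpu_available": False}
    if importlib.util.find_spec("torch") is None:
        return report
    import torch
    report["torch_installed"] = True
    report["version"] = torch.__version__
    # torch.cuda covers both CUDA and ROCm builds (ROCm wheels expose
    # the same torch.cuda interface).
    report["gpu_available"] = torch.cuda.is_available()
    return report
```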

Environment

After setup, export:

export TRITONBENCH_RUN_CONFIG="$(pwd)/benchmark_helion_runner.yaml"
export TRITONBENCH_HELION_PATH="$(pwd)/helion"
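
A small helper mirroring the exports above can fail early with a clear message if either variable is unset; the function name is illustrative:

```python
# Resolve the two TritonBench variables, failing early if either is unset.
# The helper name is illustrative, not part of TritonBench.
import os

REQUIRED_VARS = ("TRITONBENCH_RUN_CONFIG", "TRITONBENCH_HELION_PATH")

def resolve_env(env=os.environ):
    values = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            raise RuntimeError(f"{name} is not set; export it before benchmarking")
        values[name] = value
    return values
```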

Project Structure (before running install.py)

kernelGen/
├── README.md
├── install.py
├── benchmark_helion_runner.yaml
├── prompt/
│   └── ...                # optional prompt templates for KernelLLM-style tests
└── operators/
    ├── bf16_layernorm/
    └── bf16_matmul/
  • install.py — installs TritonBench, clones Helion (kernel-gen-rh branch) and installs it in editable mode, and copies local operators into TritonBench.
    It does not install PyTorch or KernelLLM.
  • operators/ — GPU kernel operators (copied into TritonBench during install).
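
A hypothetical reconstruction of the steps install.py performs, per the description above: clone TritonBench, clone Helion's kernel-gen-rh branch, install Helion in editable mode, and copy the local operators in. The repository URLs and the copy destination below are placeholders, not the real values (check install.py itself); dry_run=True returns the planned commands without executing anything:

```python
# Hypothetical sketch of install.py's steps. URLs and the copy destination
# are placeholders -- see install.py for the real values.
import subprocess

TRITONBENCH_URL = "https://github.com/example/tritonbench"  # placeholder
HELION_URL = "https://github.com/example/helion"            # placeholder

def plan_install():
    return [
        ["git", "clone", TRITONBENCH_URL],
        ["git", "clone", "--branch", "kernel-gen-rh", HELION_URL],
        ["pip", "install", "-e", "helion"],            # editable install
        ["cp", "-r", "operators", "tritonbench"],      # destination is illustrative
    ]

def run_install(dry_run=True):
    commands = plan_install()
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```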

Evaluated Kernels

  • MatMul
  • LayerNorm (WIP; optional KernelLLM prompts available in ./prompt/)
  • WIP: GELU

Backends Tested

  • TorchInductor
  • Triton
  • Helion
  • KernelLLM (optional, prompts only via ./prompt/)
  • WIP: Mako (based on KernelLLM)

Usage

  1. Install PyTorch nightly (see Requirements above).
  2. Run the installer (clones TritonBench, clones & installs Helion, and copies operators):
    python install.py
  3. Export environment variables (adjust paths as needed):
    export TRITONBENCH_RUN_CONFIG="$(pwd)/benchmark_helion_runner.yaml"
    export TRITONBENCH_HELION_PATH="$(pwd)/helion"
  4. Run the benchmark:
    python tritonbench/run.py 
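
The performance metric comes from the usual warmup-then-measure pattern. The timer below is an illustrative sketch of that pattern only; tritonbench/run.py does its own (GPU-event-based) timing:

```python
# Illustrative warmup-then-measure timer; not TritonBench's actual harness.
import time
from statistics import median

def bench(fn, *args, warmup=5, iters=20):
    """Return the median wall-clock latency of fn(*args) in milliseconds."""
    for _ in range(warmup):      # warm caches / trigger JIT before measuring
        fn(*args)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e3)
    return median(samples)       # median is robust to scheduler noise
```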
