cuda : add FILL op support #17851

JayZenith · 2025-12-08T00:21:16Z

Add CUDA backend support for GGML_OP_FILL, which was previously missing (the CPU and Vulkan backends already had it). This operation is used by the Qwen3-Next model (discussion in #16623).

Added fill.cu and fill.cuh with a simple CUDA kernel.
Added a dispatch case in ggml_cuda_compute_forward().
Declared support in ggml_backend_cuda_device_supports_op().

Tested with test-backend-ops -o FILL on Tesla T4:
FILL(type=f32,ne=[10,10,4,3],c=0.000000): OK
FILL(type=f32,ne=[303,207,11,3],c=2.000000): OK
FILL(type=f32,ne=[800,600,4,4],c=-152.000000): OK
FILL(type=f32,ne=[2048,512,2,2],c=3.500000): OK
4/4 tests passed

am17an

I'm not sure why this shouldn't just be a cudaMemsetAsync.

JayZenith · 2025-12-08T03:05:39Z

@am17an cudaMemsetAsync repeats a single byte value and doesn't interpret floats/doubles. It works for 0.0f (all bytes zero) but fails for values like 1.0f (0x3F800000), where it would write 0x3F to every byte. This kernel writes the full float/double per element, so it works for any value. Essentially, byte-wise vs. element-wise writing.
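The byte-wise vs. element-wise distinction can be reproduced on the host without a GPU. The following is only a rough sketch, not the code from this PR: it uses std::memset as a stand-in for a memset-style call, and the helper names (fill_bytewise, fill_elementwise, float_bits, check_fill_demo) are made up for illustration. It assumes a 32-bit IEEE-754 float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Byte-wise fill (hypothetical helper): host-side stand-in for a memset-style
// call. Only the low 8 bits of byte_value are used, repeated into every byte.
inline std::vector<float> fill_bytewise(std::size_t n, int byte_value) {
    std::vector<float> out(n);
    if (n > 0) {
        std::memset(out.data(), byte_value, n * sizeof(float));
    }
    return out;
}

// Element-wise fill (hypothetical helper): writes the complete float value
// into each element, which is what a per-element fill kernel does.
inline std::vector<float> fill_elementwise(std::size_t n, float c) {
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = c;
    }
    return out;
}

// Returns the raw bit pattern of a float without violating aliasing rules.
inline std::uint32_t float_bits(float x) {
    std::uint32_t u = 0;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Small usage example comparing both approaches.
inline void check_fill_demo() {
    // 1.0f is 0x3F800000, so repeating any single byte cannot produce it.
    assert(float_bits(1.0f) == 0x3F800000u);
    assert(float_bits(fill_bytewise(4, 0x3F)[0]) == 0x3F3F3F3Fu);
    assert(fill_bytewise(4, 0x3F)[0] != 1.0f);

    // Zero is the special case where byte-wise filling happens to work.
    assert(fill_bytewise(4, 0x00)[0] == 0.0f);

    // Element-wise filling works for arbitrary constants.
    assert(fill_elementwise(4, 1.0f)[3] == 1.0f);
    assert(fill_elementwise(4, -152.0f)[0] == -152.0f);
}
```

Calling check_fill_demo() from a main function runs all of the comparisons; the only constant for which the two approaches agree here is zero.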

am17an · 2025-12-08T03:39:01Z

You also need to enable this kernel via ggml_backend_cuda_device_supports_op. Right now I'm not sure how it's passing test-backend-ops for you.

ggml/src/ggml-cuda/fill.cu

cuda : add FILL op support

d91f4f9

github-actions bot added the "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Dec 8, 2025

loci-dev mentioned this pull request Dec 8, 2025

UPSTREAM PR #17851: cuda : add FILL op support auroralabs-loci/llama.cpp#481

Open

am17an reviewed Dec 8, 2025

ggml/src/ggml-cuda/fill.cu (outdated, resolved)

am17an reviewed Dec 8, 2025

ggml/src/ggml-cuda/fill.cu (outdated, resolved)

JayZenith force-pushed the cuda-fill-op branch 3 times, most recently from d22704c to 43f3b5f Compare December 8, 2025 04:09

am17an approved these changes Dec 8, 2025


JayZenith force-pushed the cuda-fill-op branch from 43f3b5f to ae71397 Compare December 8, 2025 04:57

cuda : add missing FILL op files

179ddb5

JayZenith force-pushed the cuda-fill-op branch from ae71397 to 179ddb5 Compare December 8, 2025 08:38

am17an merged commit 51e0c2d into ggml-org:master Dec 8, 2025
78 checks passed

gabe-l-hart mentioned this pull request Dec 10, 2025

feat: llama.cpp bump (17f7f4) for SSM performance improvements ollama/ollama#13408

Merged
