@JayZenith
Contributor

@JayZenith JayZenith commented Dec 8, 2025

Add CUDA backend support for GGML_OP_FILL, which was previously missing (the CPU and Vulkan backends already had it). This operation is used by the Qwen3-Next model (discussion in #16623).

  • Added fill.cu and fill.cuh with a simple CUDA kernel.
  • Added dispatch case in ggml_cuda_compute_forward()
  • Declared support in ggml_backend_cuda_device_supports_op()

Tested with test-backend-ops -o FILL on Tesla T4:
FILL(type=f32,ne=[10,10,4,3],c=0.000000): OK
FILL(type=f32,ne=[303,207,11,3],c=2.000000): OK
FILL(type=f32,ne=[800,600,4,4],c=-152.000000): OK
FILL(type=f32,ne=[2048,512,2,2],c=3.500000): OK
4/4 tests passed

@github-actions github-actions bot added the "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Dec 8, 2025
Collaborator

@am17an am17an left a comment

I'm not sure why this shouldn't just be a cudaMemsetAsync.

@JayZenith
Contributor Author

@am17an cudaMemsetAsync writes a single byte value repeatedly and doesn't interpret floats/doubles. It works for 0.0f (all bytes zero) but fails for values like 1.0f (0x3F800000), since writing 0x3F to every byte gives 0x3F3F3F3F. This kernel writes the full float/double per element, so it works for any value. Essentially, byte-wise vs. element-wise writing.
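To make the byte-wise vs. element-wise point concrete, here is a minimal host-side C++ sketch. It is not the code from fill.cu: std::memset stands in for cudaMemsetAsync (assuming both simply replicate one byte value across the buffer), and the names fill_bytewise, fill_elementwise, and float_bits are hypothetical helpers introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Byte-wise fill. std::memset is used here as a CPU stand-in for
// cudaMemsetAsync (assumption: both replicate a single byte value).
std::vector<float> fill_bytewise(std::size_t n, unsigned char byte) {
    std::vector<float> out(n);
    if (n > 0) {
        std::memset(out.data(), byte, n * sizeof(float));
    }
    return out;
}

// Element-wise fill. Hypothetical CPU analogue of a fill kernel:
// every element receives the complete float value c.
std::vector<float> fill_elementwise(std::size_t n, float c) {
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = c;
    }
    return out;
}

// Returns the raw 32-bit pattern of a float so the two fills can be compared.
std::uint32_t float_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return bits;
}
```

With these helpers, fill_bytewise(4, 0x3F) produces elements whose bit pattern is 0x3F3F3F3F (roughly 0.747), whereas fill_elementwise(4, 1.0f) produces 0x3F800000, i.e. exactly 1.0f. Only constants whose four bytes are identical, such as 0.0f, can be produced by the byte-wise path.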

@am17an
Collaborator

am17an commented Dec 8, 2025

You also need to enable this kernel via ggml_backend_cuda_device_supports_op. Right now I'm not sure how it's passing test-backend-ops for you.

@JayZenith JayZenith force-pushed the cuda-fill-op branch 3 times, most recently from d22704c to 43f3b5f Compare December 8, 2025 04:09
@am17an am17an merged commit 51e0c2d into ggml-org:master Dec 8, 2025
78 checks passed
