
Conversation

@YaelGitAccount
Contributor

This PR adds initial support for the Eagle2-VL multimodal models (1B / 2B) in the MTMD pipeline.

The update introduces a dedicated converter path and runtime builder for the Eagle2-VL vision tower and its 2-layer projector.
All changes are fully self-contained and do not affect any existing model architectures.

Converter (convert_hf_to_gguf.py)

  • Registers a new model handler Eagle2VLVisionModel.
  • Writes VisionProjectorType=EAGLE2VL into GGUF metadata.
  • Extracts Eagle2-VL vision metadata (image/patch size, mean/std, block count, RMSNorm eps).
  • Supports metadata-driven spatial merge (spatial_merge_size, default: 2×2).
  • Canonicalizes projector weights (mm.0, mm.2) to [n_in, n_out]; supports optional biases.
  • Converts Conv3D patch-embed kernels into two Conv2D kernels when present.
  • Normalizes HF checkpoint prefixes to align with MTMD conventions.
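
As a rough illustration of the Conv3D → Conv2D step above, here is a NumPy sketch (the kernel shapes are illustrative assumptions, not the converter's actual values): a patch-embed kernel with temporal depth 2 is sliced along its temporal axis into two ordinary Conv2D kernels.

```python
import numpy as np

# Hypothetical Conv3D patch-embed kernel: [out_ch, in_ch, t, kh, kw] with t == 2.
conv3d = np.random.randn(1280, 3, 2, 14, 14).astype(np.float32)

# Slice along the temporal axis -> two [out_ch, in_ch, kh, kw] Conv2D kernels.
conv2d_a = conv3d[:, :, 0, :, :]
conv2d_b = conv3d[:, :, 1, :, :]
```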

GGUF (gguf-py/gguf/constants.py)

  • Adds the new projector type EAGLE2VL.

Runtime (tools/mtmd/clip.cpp)

  • Adds a dedicated build_eagle2vl() vision path:
    • ViT with learned absolute position embeddings (including dynamic-resize support).
    • Metadata-driven spatial merge prior to the projector.
    • 2-layer MLP projector (mm.0 → GELU → mm.2) using canonical [n_in, n_out] weights.
  • Updates dispatcher to route PROJECTOR_TYPE_EAGLE2VL to the new builder.
  • Final embedding dimension derived from mm_2_w->ne[1].
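
Conceptually, the 2×2 spatial merge folds each 2×2 patch neighborhood into the channel dimension, quartering the token count before the projector. A minimal NumPy sketch (grid size and channel count are illustrative, not taken from the model):

```python
import numpy as np

merge = 2                  # spatial_merge_size from metadata (default 2)
h, w, c = 4, 4, 8          # illustrative patch grid and embedding dim
tokens = np.arange(h * w * c, dtype=np.float32).reshape(h, w, c)

# [h, w, c] -> [h/m, m, w/m, m, c] -> [h/m, w/m, m, m, c] -> [h*w/m^2, m*m*c]
merged = (tokens.reshape(h // merge, merge, w // merge, merge, c)
                .transpose(0, 2, 1, 3, 4)
                .reshape((h // merge) * (w // merge), merge * merge * c))
```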

Integration & Compatibility

  • Loader extended to read Eagle2-VL projector tensors.
  • No CLI changes.
  • No impact on other projector types or existing model architectures.

Validation

Tested locally on Eagle2-VL 1B and 2B checkpoints:

  • GGUF conversion produces expected metadata.
  • Vision tower + spatial merge + projector run end-to-end.
  • All matmuls operate on canonical weights (no runtime transposes).
  • Inference completes successfully.

Scope

This PR focuses on Eagle2-VL (1B / 2B).
Support for additional Eagle2 variants (e.g., 9B) will be handled in a follow-up.

Closes #16704


@ngxson ngxson left a comment


Can you explicitly confirm whether part of this PR was generated by AI? I feel very suspicious about some of the redundant code.

While you said in the PR description that you tested it, you haven't even mentioned a link to the model, nor how you tested it.

learned_pos_embd,
nullptr);

// keep runtime quiet in normal runs; shapes are correct by construction

Some indentations seem off here.

Comment on lines 1128 to 1142
if (model.mm_0_b) {
    embeddings = ggml_add(ctx0, embeddings, model.mm_0_b);
}

embeddings = ggml_gelu(ctx0, embeddings);

GGML_ASSERT(model.mm_2_w != nullptr);
// keep [n_in, n_tokens] layout for the second matmul as well
embeddings = ggml_reshape_2d(ctx0, embeddings, embeddings->ne[0], embeddings->ne[1]);
embeddings = ggml_cont_2d(ctx0, embeddings, embeddings->ne[0], embeddings->ne[1]);
// Weights are canonicalized at conversion time to [n_in, n_out]; multiply directly.
embeddings = ggml_mul_mat(ctx0, model.mm_2_w, embeddings);
if (model.mm_2_b) {
    embeddings = ggml_add(ctx0, embeddings, model.mm_2_b);
}

Better to replace this whole block with build_ffn.

Comment on lines 3916 to 3925
mlp_pos = name.find("mlp1.")
if mlp_pos != -1:
    mlp_suffix = name[mlp_pos + len("mlp1."):]
    # Skip LayerNorm (mlp1.0.*)
    if mlp_suffix.startswith("0."):
        return []
    # Map first Linear (mlp1.1.*) -> mm.0.*
    if mlp_suffix.startswith("1."):
        new_name = "mm.0." + mlp_suffix[2:]
        if new_name.endswith(".weight"):

I think all of this code is redundant. This model (https://huggingface.co/nvidia/Eagle2-1B) has a simple .mlp.fc1 and .mlp.fc2 MLP; there is no nested mlp1.1.* as you described.

]

# 5) Conv3D patch embed -> two Conv2D kernels
if name.endswith("patch_embed.proj.weight") and data_torch.ndim == 5:

Are you sure about this? It seems like bad copy-paste code from QwenVL.


Labels

examples, python (python script changes)


Development

Successfully merging this pull request may close these issues.

Feature Request: Add support for Eagle2_VL (Eagle2_5_VLForConditionalGeneration) multimodal models
