Commit a6ef245

chore(model gallery): add qwen3-vl-30b-a3b-instruct (#6960)
Signed-off-by: Ettore Di Giacinto <[email protected]>
1 parent 88cb379 commit a6ef245

2 files changed: 71 additions, 3 deletions

gallery/index.yaml

Lines changed: 63 additions & 0 deletions
@@ -1,4 +1,67 @@
 ---
+- &qwen3vl
+  url: "github:mudler/LocalAI/gallery/qwen3.yaml@master"
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
+  license: apache-2.0
+  tags:
+    - llm
+    - gguf
+    - gpu
+    - image-to-text
+    - multimodal
+    - cpu
+    - qwen
+    - qwen3
+    - thinking
+    - reasoning
+  name: "qwen3-vl-30b-a3b-instruct"
+  urls:
+    - https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF
+  description: |
+    Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
+
+    This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.
+
+    Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on-demand deployment.
+
+    #### Key Enhancements:
+
+    * **Visual Agent**: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks.
+
+    * **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos.
+
+    * **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI.
+
+    * **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing.
+
+    * **Enhanced Multimodal Reasoning**: Excels in STEM/Math—causal analysis and logical, evidence-based answers.
+
+    * **Upgraded Visual Recognition**: Broader, higher-quality pretraining is able to “recognize everything”—celebrities, anime, products, landmarks, flora/fauna, etc.
+
+    * **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing.
+
+    * **Text Understanding on par with pure LLMs**: Seamless text–vision fusion for lossless, unified comprehension.
+
+    #### Model Architecture Updates:
+
+    1. **Interleaved-MRoPE**: Full‑frequency allocation over time, width, and height via robust positional embeddings, enhancing long‑horizon video reasoning.
+
+    2. **DeepStack**: Fuses multi‑level ViT features to capture fine-grained details and sharpen image–text alignment.
+
+    3. **Text–Timestamp Alignment:** Moves beyond T‑RoPE to precise, timestamp‑grounded event localization for stronger video temporal modeling.
+
+    This is the weight repository for Qwen3-VL-30B-A3B-Instruct.
+  overrides:
+    mmproj: mmproj/mmproj-F16.gguf
+    parameters:
+      model: Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
+  files:
+    - filename: Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
+      sha256: 75d8f4904016d90b71509c8576ebd047a0606cc5aa788eada29d4bedf9b761a6
+      uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF/Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
+    - filename: mmproj/mmproj-F16.gguf
+      sha256: 7e7cec67a3a887bddbf38099738d08570e85f08dd126578fa00a7acf4dacef01
+      uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF/mmproj-F16.gguf
 - &jamba
   icon: https://cdn-avatars.huggingface.co/v1/production/uploads/65e60c0ed5313c06372446ff/QwehUHgP2HtVAMW5MzJ2j.png
   name: "ai21labs_ai21-jamba-reasoning-3b"
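The new entry's `files` list pairs each `uri` with a `sha256` digest. The Go sketch below shows one way such an entry could be consumed. It assumes that `huggingface://<owner>/<repo>/<file>` maps to a Hugging Face `resolve/main` download URL and that the digest is checked over the downloaded bytes. `resolveHuggingFaceURI` and `verifySHA256` are hypothetical helpers written for illustration; they are not LocalAI's actual downloader API, which may handle branches and edge cases differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// resolveHuggingFaceURI (hypothetical) turns an assumed
// huggingface://<owner>/<repo>/<path/to/file> URI into a Hugging Face
// "resolve" URL on the main branch. Paths after the repo may contain slashes.
func resolveHuggingFaceURI(uri string) (string, error) {
	const scheme = "huggingface://"
	if !strings.HasPrefix(uri, scheme) {
		return "", fmt.Errorf("not a huggingface URI: %q", uri)
	}
	parts := strings.SplitN(strings.TrimPrefix(uri, scheme), "/", 3)
	if len(parts) < 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", fmt.Errorf("expected owner/repo/file in %q", uri)
	}
	owner, repo, file := parts[0], parts[1], parts[2]
	return fmt.Sprintf("https://huggingface.co/%s/%s/resolve/main/%s", owner, repo, file), nil
}

// verifySHA256 (hypothetical) reports whether data hashes to the expected
// hex digest, i.e. the kind of check a gallery sha256 field enables.
func verifySHA256(data []byte, expectedHex string) bool {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) == strings.ToLower(expectedHex)
}

func main() {
	url, err := resolveHuggingFaceURI("huggingface://unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF/mmproj-F16.gguf")
	if err != nil {
		panic(err)
	}
	fmt.Println(url)

	// SHA-256 of the empty input, used only as a self-checking example.
	fmt.Println(verifySHA256([]byte{}, "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"))
}
```

Note that the local `filename` does not have to match the remote path: the projector is fetched from `mmproj-F16.gguf` at the repository root but stored as `mmproj/mmproj-F16.gguf`, which is the local name that `overrides.mmproj` refers to.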

gallery/qwen3.yaml

Lines changed: 8 additions & 3 deletions
@@ -6,15 +6,20 @@ config_file: |
   backend: "llama-cpp"
   template:
     chat_message: |
-      <|im_start|>{{ .RoleName }}
-      {{ if .FunctionCall -}}
-      {{ else if eq .RoleName "tool" -}}
+      <|im_start|>{{if eq .RoleName "tool" }}user{{else}}{{ .RoleName }}{{end}}
+      {{ if eq .RoleName "tool" -}}
+      <tool_response>
       {{ end -}}
       {{ if .Content -}}
       {{.Content }}
       {{ end -}}
+      {{ if eq .RoleName "tool" -}}
+      </tool_response>
+      {{ end -}}
       {{ if .FunctionCall -}}
+      <tool_call>
       {{toJson .FunctionCall}}
+      </tool_call>
       {{ end -}}<|im_end|>
     function: |
       <|im_start|>system
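This hunk changes how tool traffic is rendered: a `tool` message is now emitted under the `user` role with its content wrapped in `<tool_response>` tags, and an assistant's function call is wrapped in `<tool_call>` tags. The Go sketch below renders the new `chat_message` template with the standard `text/template` package to show the effect. Assumptions: the `message` struct only mimics the fields the template references (`RoleName`, `Content`, `FunctionCall`) and is not LocalAI's real type, and `toJson` is reimplemented here with `encoding/json` rather than taken from LocalAI.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// chatMessage is the post-change chat_message template from gallery/qwen3.yaml,
// without the YAML indentation (the block scalar strips it before parsing).
const chatMessage = `<|im_start|>{{if eq .RoleName "tool" }}user{{else}}{{ .RoleName }}{{end}}
{{ if eq .RoleName "tool" -}}
<tool_response>
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if eq .RoleName "tool" -}}
</tool_response>
{{ end -}}
{{ if .FunctionCall -}}
<tool_call>
{{toJson .FunctionCall}}
</tool_call>
{{ end -}}<|im_end|>`

// message is a hypothetical stand-in for the data passed to the template.
type message struct {
	RoleName     string
	Content      string
	FunctionCall any
}

// render executes the template for one message, with a local toJson helper.
func render(m message) (string, error) {
	funcs := template.FuncMap{
		"toJson": func(v any) (string, error) {
			b, err := json.Marshal(v)
			return string(b), err
		},
	}
	tmpl, err := template.New("chat_message").Funcs(funcs).Parse(chatMessage)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, m); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// A tool result: rendered as a user turn wrapped in <tool_response>.
	toolTurn, err := render(message{RoleName: "tool", Content: "22C"})
	if err != nil {
		panic(err)
	}
	fmt.Println(toolTurn)

	// An assistant function call: serialized as JSON inside <tool_call>.
	callTurn, err := render(message{
		RoleName:     "assistant",
		FunctionCall: map[string]any{"name": "get_weather"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(callTurn)
}
```

The `-}}` trim markers remove the newline after each control action, so a plain `user` or `assistant` turn still renders as the role line, the content, and `<|im_end|>`, with no empty lines left by the skipped tool branches.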
